DataStage is used in large organizations as an interface between systems: it handles the extraction, transformation, and loading of data from one system into another. DataStage interfaces are called “jobs”, and they can be configured to run on a single server or across multiple servers in a grid architecture. DataStage uses a two-tier infrastructure, with clients connected directly to the DSEngine. The DSEngine stores all metadata and runtime information and controls the execution of jobs. The server is where the developed jobs reside and run. It can be hosted on both UNIX and Windows servers; earlier versions of DataStage supported only UNIX servers. Connections to the server are made through the DataStage clients, which are Windows-based applications with tools for building DataStage jobs. On the DataStage server, work is organized into “projects”.

[Figure: DataStage overview]

DataStage has two engines: the parallel engine and the server engine. On UNIX, the server engine is located in a directory called DSEngine, whose location is recorded in a file named .dshome in the root directory. On Windows, the server engine is located in the folder <>:\IBM\InformationServer\Server and the parallel engine in <>:\IBM\InformationServer\PXEngine.
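
On UNIX, the lookup through .dshome can be scripted. The following is a minimal Python sketch, assuming the .dshome file holds a single line containing the absolute path of the DSEngine directory. The function name read_dshome and its default argument are illustrative choices, not part of any DataStage API.

```python
from pathlib import Path


def read_dshome(dshome_file: Path = Path("/.dshome")) -> Path:
    """Return the engine directory recorded in a .dshome file.

    Assumption: the file contains one line with the absolute path of
    the DSEngine directory; surrounding whitespace is ignored.
    """
    content = dshome_file.read_text(encoding="utf-8").strip()
    if not content:
        raise ValueError(f"{dshome_file} is empty")
    return Path(content)
```

Calling read_dshome() with no argument reads /.dshome; passing another path makes it possible to try the function against a copy of the file.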

Development work in DataStage is organized into a number of work areas called “projects”. Each project has its own local repository, where its designs and its technical and process metadata are stored. By default, projects are created in the ‘Projects’ subfolder of the dshome directory, but you can also create a project in a directory of your choosing. To find the directory of a particular project, log on to the DataStage Administrator console.
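
For a quick look at what sits in the default ‘Projects’ folder, a short script can list it. This is a sketch only, assuming each project occupies one subdirectory of that folder. The function name list_projects is hypothetical, and the folder path is passed in explicitly because it depends on the installation. Projects created in other directories will not appear, so the Administrator console remains the reliable source.

```python
from pathlib import Path


def list_projects(projects_dir: Path) -> list[str]:
    """Return the sorted names of the subdirectories of projects_dir.

    Assumption: each project is stored as one subdirectory of the
    'Projects' folder; plain files in that folder are ignored.
    """
    return sorted(
        entry.name for entry in projects_dir.iterdir() if entry.is_dir()
    )
```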