DataStage Experience Summary


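* Keep or drop columns.

* Add new columns and assign them values. The assignment must take the form column = column, for example:

new_columnname = old_columnname;

An assignment from a string literal, such as new_columnname = "hf", is an error.

* Null handling

destinationColumn = handle_null(sourceColumn, value)

destinationColumn = make_null(sourceColumn, value). In practice this one was problematic: it did not handle null values.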

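2.5 Lookup/Join

Null handling

1. When using Lookup Failure = Continue, set the non-key columns of the reference link to Nullable, even if the reference data itself contains no nulls. This ensures that Lookup assigns null to the reference non-key columns of rows that have no match.

2. If the reference non-key columns are not set to Nullable, Lookup assigns a default value to the rows that have no match:

Integer: the default value is 0

Varchar/Char: the default value is the empty string (zero length)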
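The two rules above can be illustrated outside DataStage. The following is a minimal Python sketch, not DataStage code: the function name lookup_continue, the TYPE_DEFAULTS table, and the sample reference data are all hypothetical and exist only to show what an unmatched row receives in each case.

```python
# Hypothetical model of Lookup Failure = Continue; not a DataStage API.
TYPE_DEFAULTS = {"integer": 0, "varchar": ""}  # defaults described in the text


def lookup_continue(key, reference, col_type, nullable):
    """Return the reference value for key, or the fallback for an unmatched row."""
    if key in reference:
        return reference[key]
    # No match: a Nullable column receives null (None here),
    # a non-nullable column receives the default for its type.
    return None if nullable else TYPE_DEFAULTS[col_type]


reference = {1: 100, 2: 200}  # made-up reference data
print(lookup_continue(3, reference, "integer", nullable=True))
print(lookup_continue(3, reference, "integer", nullable=False))
```

With a non-nullable column, an unmatched row becomes indistinguishable from a row whose reference value really is 0 or the empty string, which is why the Nullable setting is recommended.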

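2.6 Points to note about default and implicit type conversion in DataStage

When mapping data from source to target, if the types differ, some conversions must be done explicitly with functions in a Modify or Transformer stage, while others are done automatically by the system. Note the following during type conversion:

1. In the default conversion from a variable-length to a fixed-length string, parallel jobs fill the remaining length with the pad character given by the environment variable APT_STRING_PADCHAR. In the author's environment this was a space (ASCII 0x20). It is not known which stage actually performs the padding.

2. APT_STRING_PADCHAR overrides the built-in default pad character, which is null (ASCII 0x00).

3. The PadString function pads a variable-length string to a specified length with a specified character. Its argument cannot be a fixed-length string; a fixed-length string must first be converted to variable length.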
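The effect of the pad character is easy to see in a small example. This is a Python sketch under stated assumptions, not DataStage code: pad_to_fixed is a hypothetical helper, and rejecting over-long values is a choice made for the sketch only.

```python
def pad_to_fixed(value: str, length: int, pad_char: str = "\x00") -> str:
    """Pad value on the right to exactly length characters with pad_char."""
    if len(pad_char) != 1:
        raise ValueError("pad_char must be a single character")
    if len(value) > length:
        # Sketch-only choice: refuse rather than truncate.
        raise ValueError("value is longer than the fixed length")
    return value.ljust(length, pad_char)


# Null padding versus space padding of the same value.
print(repr(pad_to_fixed("ab", 5)))
print(repr(pad_to_fixed("ab", 5, " ")))
```

Null-padded values can look identical to unpadded ones when displayed, so comparing repr() output (or lengths) is a quick way to see which pad character was used.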

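2.7 View the data as soon as an input or output is configured

Each time you configure an input or output, use View Data right away. Do not wait until run time and then go back to hunt for errors.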

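2.8 Date data is troublesome

DataStage treats dates as timestamp by default. You can change the type to date, but this often leads to errors. For Oracle source and target tables, date data needs no conversion at all; just use the default. For Informix and some other databases, the values must be converted with the Oconv and Iconv functions, and the date format in the output SQL must be changed to match.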

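2.9 Swapping rows and columns: Horizontal Pivot (Pivot stage)

Columns become rows, that is, a wide table becomes a narrow one: there are fewer columns and more records, so the number of columns changes.

Note: in the Derivation of the Pivot stage's Output, list the source columns of the pivoted column, separated by commas.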

Example:

Pivot input records:

Id  col1      col2      col3
2   Rootpath  Workdate  EdsDbname
3   Rootpath  Workdate  AsdmDbname

Pivot output records:

Id  Column
2   Rootpath
2   Workdate
2   EdsDbname
3   Rootpath
3   Workdate
3   AsdmDbname
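The transformation in this example can be expressed in a few lines of Python. This is only an illustrative sketch of what the pivot does to the data; horizontal_pivot and its parameters are hypothetical names, not part of DataStage.

```python
def horizontal_pivot(rows, key, source_columns, target):
    """Turn each wide row into one narrow row per source column."""
    output = []
    for row in rows:
        for column in source_columns:
            output.append({key: row[key], target: row[column]})
    return output


wide = [
    {"Id": 2, "col1": "Rootpath", "col2": "Workdate", "col3": "EdsDbname"},
    {"Id": 3, "col1": "Rootpath", "col2": "Workdate", "col3": "AsdmDbname"},
]
for record in horizontal_pivot(wide, "Id", ["col1", "col2", "col3"], "Column"):
    print(record["Id"], record["Column"])
```

The source_columns list plays the role of the comma-separated derivation: it names the input columns that feed the single output column.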

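2.10 Swapping rows and columns: Vertical Pivot

The Pivot stage turns a wide table into a narrow one, which is a Horizontal Pivot. In practice the opposite is also needed: turning a narrow table into a wide one, which is a Vertical Pivot.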

For example, the input records are:

Id  Column
2   Rootpath
2   Workdate
2   EdsDbname
3   Rootpath
3   Workdate
3   AsdmDbname

The output records we want are:

Id  NewCol
2   Rootpath,Workdate,EdsDbname
3   Rootpath,Workdate,AsdmDbname

The solution is as follows.

Server job approach:

Sequential File ---> Transformer ---> Hashed File

Source table structure:

Id      varchar 10
Column  varchar 10

Define the Transformer as follows.

Stage variables:

currentKey
    Initial value = ""
    Derivation = L1.Id

newRecord
    Initial value = ""
    Derivation = If currentKey = lastKey Then newRecord : "," : L1.Column Else L1.Column

lastKey
    Initial value = ""
    Derivation = currentKey

L2 derivations:

L2.key = L1.Id
L2.line = newRecord

Target table structure:

Id      varchar 10 (marked as the key)
Newcol  varchar 200

(Note: stage variables are evaluated in the order in which they are listed, so lastKey must come after newRecord.)
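To make the evaluation order concrete, the following Python sketch replays the three stage variables row by row. It is a model of the logic only, under two assumptions that are not stated in the source: the input arrives grouped by Id, and a write to the hashed file with an existing key overwrites the earlier record. The function name is hypothetical.

```python
def vertical_pivot_server(rows):
    """Replay the stage variables for (Id, Column) rows that arrive grouped by Id."""
    hashed_file = {}   # keyed target: a later write with the same key overwrites
    new_record = ""    # stage variable newRecord
    last_key = ""      # stage variable lastKey
    for row_id, column in rows:
        current_key = row_id                                  # 1. currentKey
        if current_key == last_key:                           # 2. newRecord
            new_record = new_record + "," + column
        else:
            new_record = column
        last_key = current_key                                # 3. lastKey, evaluated last
        hashed_file[row_id] = new_record
    return hashed_file


rows = [("2", "Rootpath"), ("2", "Workdate"), ("2", "EdsDbname"),
        ("3", "Rootpath"), ("3", "Workdate"), ("3", "AsdmDbname")]
print(vertical_pivot_server(rows))
```

If lastKey were assigned before newRecord, the comparison currentKey = lastKey would always be true and every value would be appended to one ever-growing string, which is why the order matters.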

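If the values of Newcol should instead go into separate columns, in the following layout:

Id  Col1      Col2      Col3
2   Rootpath  Workdate  EdsDbname
3   Rootpath  Workdate  AsdmDbname

the solution is to read the value of Newcol into a stage variable (called NewCord here) and then use Field(NewCord, ",", 1), Field(NewCord, ",", 2), and so on, assigning each result to its own column.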

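Parallel job approach:

(Following the server job approach and then switching the job to sequential execution also works.)

1. Use a Sort stage to partition and sort on the key column Id, and set Create Key Change Column = True to produce a KeyChange column. The first record of each key is flagged 1 and the others 0.

The result is as follows:

Id  Column      KeyChange
--  ----------  ---------
2   Rootpath    1
2   Workdate    0
2   EdsDbname   0
3   Rootpath    1
3   Workdate    0
3   AsdmDbname  0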

2. Create a stage variable in the Transformer stage and set its value according to KeyChange.

For example, create the variable svBuildColumn with this derivation:

If DSLink12.keyChange = 1 Then DSLink12.Column Else svBuildColumn : "$" : DSLink12.Column

The result is as follows:

Id  Column      KeyChange  svBuildColumn
--  ----------  ---------  ----------------------------
2   Rootpath    1          Rootpath
2   Workdate    0          Rootpath$Workdate
2   EdsDbname   0          Rootpath$Workdate$EdsDbname
3   Rootpath    1          Rootpath
3   Workdate    0          Rootpath$Workdate
3   AsdmDbname  0          Rootpath$Workdate$AsdmDbname

3. Use a Remove Duplicates stage on the key column Id to remove duplicate rows, retaining the last row (Retain Last).

The result is as follows:

Id  svBuildColumn
--  ----------------------------
2   Rootpath$Workdate$EdsDbname
3   Rootpath$Workdate$AsdmDbname

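4. To put the values of svBuildColumn into separate columns, use Field(NewCord, "$", 1), Field(NewCord, "$", 2), and so on, assigning each result to its own column.

The final result has one row per Id, with the values spread across Col1, Col2 and Col3.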
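The four steps can be traced end to end in a short Python sketch. It models the data flow only and is not DataStage code: both function names are hypothetical, and field is a simplified stand-in for the Field function that returns an empty string when the requested occurrence does not exist.

```python
def vertical_pivot_parallel(rows, separator="$"):
    """Sort by Id, flag key changes, build the string, keep the last row per Id."""
    ordered = sorted(rows, key=lambda row: row[0])   # step 1: sort on Id (stable)
    retained = {}
    built = ""
    previous_id = object()                           # sentinel: differs from every Id
    for row_id, column in ordered:
        key_change = 1 if row_id != previous_id else 0          # KeyChange column
        built = column if key_change == 1 else built + separator + column  # step 2
        retained[row_id] = built                     # step 3: the last row per Id wins
        previous_id = row_id
    return retained


def field(text, delimiter, occurrence):
    """Return the occurrence-th (1-based) delimited piece of text, or ''."""
    parts = text.split(delimiter)
    return parts[occurrence - 1] if 1 <= occurrence <= len(parts) else ""


rows = [("2", "Rootpath"), ("2", "Workdate"), ("2", "EdsDbname"),
        ("3", "Rootpath"), ("3", "Workdate"), ("3", "AsdmDbname")]
for row_id, value in vertical_pivot_parallel(rows).items():
    # step 4: split the built string into Col1..Col3
    print(row_id, [field(value, "$", n) for n in (1, 2, 3)])
```

Because the string is rebuilt from scratch whenever KeyChange is 1, only the last row of each key carries the complete value, which is what Retain Last relies on.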

2.11 Error when viewing data in the Oracle EE stage, and how to fix it

The error messages are as follows:

##I TOSH 000002 04:05:22(001) <main_program> orchgeneral: loaded
##I TOSH 000002 04:05:22(002) <main_program> orchsort: loaded
##I TOSH 000002 04:05:22(003) <main_program> orchstats: loaded
##E TOSH 000205 04:05:22(004) <main_program> PATH search failure:
##E TOSH 000000 04:05:22(005) <main_program> Error loading "orchoracle": Could not load "orchoracle": The specified module could not be found.
##E TOSH 000000 04:05:22(006) <main_program> Could not locate operator definition, wrapper, or Unix command for "oraread"; please check that all needed libraries are preloaded, and check the PATH for the wrappers
##E TCOS 000029 04:05:22(007) <main_program> Creation of step finished with status = FAILED

Solution (running 7.5x2 EE on the Windows platform):

1. cd to your C:\Ascential\DataStage\PXEngine\install
2. type sh
3. ORACLE_HOME="C:/Your_Oracle_Client"
4. export ORACLE_HOME
5. APT_ORCHHOME="C:/Ascential/DataStage/PXEngine"
6. export APT_ORCHHOME
7. sh install.liborchoracle

You will then see these messages on the screen:

Installing Oracle driver
Using C:/Your_Oracle_Client as ORACLE_HOME
Installing driver for Oracle version 9i or 10g
Oracle installation is complete.

Reboot the machine after the above is done.

2.12 Using the DataStage SAP stage

See the attachment.

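2.13 Using the Column Import stage

This stage outputs the data of a single column into several columns, that is, it splits one column into many.

The input data must be fixed-length or have recognizable delimiters, and must be of type String or Binary. The output data can be of any type.

[Figure: columns after the split]

2.14 Using the Column Export stage

The opposite of the Column Import stage: it merges several columns of different types into a single column of type String or Binary.

[Figure: column after the merge]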
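As a rough illustration of what the two stages do to a record, here is a Python sketch for the delimited case. The function names, the "|" delimiter, and the sample values are assumptions made for the example; the real stages are configured through their properties rather than written as code.

```python
def column_import(raw, delimiter, converters):
    """Split one string column into several typed columns."""
    parts = raw.split(delimiter)
    if len(parts) != len(converters):
        raise ValueError("expected %d fields, got %d" % (len(converters), len(parts)))
    return [convert(part) for convert, part in zip(converters, parts)]


def column_export(values, delimiter):
    """Merge several columns of different types into one string column."""
    return delimiter.join(str(value) for value in values)


columns = column_import("7|abc|2.5", "|", [int, str, float])
print(columns)
print(column_export(columns, "|"))
```

The converters list stands in for the output column definitions: the input is always a string, while each output column may have its own type.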

2.15 Resolving the error "Cannot find any process number for stages in Job Jobname"

This error appears when using Director to run Cleanup Resources or Clear Status File.

The cause is incorrect permission settings. The following correct settings were applied to solve the problem:

-rwsr-x--x 1 root dstage 1519616 Nov 13 2003 dsdlockd
-rwsr-x--x 1 root dstage 1499136 Nov 13 2003 dslictool
-rwsr-x--x 1 root dstage 3678208 Nov 13 2003 dstskup
-rwsr-x--x 1 root dstage 1519616 Nov 13 2003 list_readu
-rwsr-x--x 1 root dstage 1486848 Nov 13 2003 upduvtrans
-rwsr-x--x 1 root dstage   53248 Nov 13 2003 uv
-rwsr-x--x 2 root dstage 3796992 Nov 13 2003 uvbackup
-rwsr-x--x 1 root dstage   49152 Nov 13 2003 uvdls
-rwsr-x--x 2 root dstage 3796992 Nov 13 2003 uvrestore
-rwsr-x--x 1 root dstage   16384 Nov 13 2003 uvsetacc

The settings for all of the above files were found to be incorrect: dsadm was the owner instead of root, and the permissions were also wrong.

2.16 Unable to create RT_CONFIGnnn

The two most common causes of this problem are:

Is the file system that holds your project directory full?

Do you have write permission on your project directory?
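Both questions can be checked with the Python standard library. This is a minimal sketch, assuming it is run as the same user that runs the job; diagnose_project_dir, the min_free_bytes threshold, and the example path are hypothetical.

```python
import os
import shutil


def diagnose_project_dir(path, min_free_bytes=1):
    """Report free space and write access for a project directory."""
    usage = shutil.disk_usage(path)  # total, used and free bytes of the file system
    return {
        "free_bytes": usage.free,
        "has_space": usage.free >= min_free_bytes,
        # Creating a file in a directory needs write and search permission.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    # Replace "." with the project directory to check.
    print(diagnose_project_dir("."))
```

On the command line, df -k on the project directory and ls -ld on the same path answer the same two questions.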

2.17 Viewing the back-end processes that belong to jobs and clients

$ ps -fu dsadm

UID    PID    PPID   C  STIME     TTY  TIME   CMD
dsadm  11779  11776  0  09:02:02  ?    14     phantom DSD.StageRun loadDataDayAg. loadupdIRCashIVDayAg.xfm 3 0/0
dsadm  1761   1760   2  08:56:27  ?    23:16  phantom DSD.RUN Batch::MasterControlOrderDetail.0 ParameterFile=/var/opt/dat

The former is a job stage thread; the latter is a job main thread.

dsadm  29865  29863     Oct 25    ?    2:57   dsapi_slave 8 7 0 (user client database slave)
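When the listing is long, the three kinds of process can be told apart by their command text. The Python sketch below classifies a command line using only the markers visible in the listing above (DSD.StageRun, DSD.RUN, dsapi_slave); the function name and the returned labels are hypothetical.

```python
def classify_ds_process(command: str) -> str:
    """Classify a ps command line by the markers shown in the listing."""
    if "DSD.StageRun" in command:
        return "job stage thread"
    if "DSD.RUN" in command:
        return "job main thread"
    if "dsapi_slave" in command:
        return "client connection"
    return "other"


for command in (
    "phantom DSD.StageRun loadDataDayAg. loadupdIRCashIVDayAg.xfm 3 0/0",
    "phantom DSD.RUN Batch::MasterControlOrderDetail.0",
    "dsapi_slave 8 7 0",
):
    print(classify_ds_process(command), "<-", command)
```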

2.18 Force-killing DS processes

cd $DSHOME/bin
list_readu
ps -ef | grep username

2.19 Viewing the Server Engine processes

$ netstat -a | grep uv      or      $ netstat -a | grep dsrpc

*.uvrpc    *.*    0    0    24576    0    LISTEN         ------ Daemon listener
...........                               ESTABLISHED    ------ Clients attached

$ ipcs                  ------ Shared memory usage

$ ps -ef | grep uni     ------ Engine daemon

root 12970 1 0 Oct 09 ? 0:11 /opt/Ascential/DataStage/unishared/unirpc/unirpcd    ------ Engine daemon

2.20 Viewing server locks

$ cd `cat /.dshome`
$ . ./dsenv
$ bin/uvsh              ------ DS Engine command prompt

The steps above are equivalent to DataStage Administrator ---> Projects (tab) ---> Command (button).

>DS.TOOLS               ------ DS.TOOLS utility
Verb "DS.TOOLS" is not in your VOC.
>LOGTO your project name
>DS.TOOLS
Which would you like? (1-6)? 5
Which would you like? (1-11)?
>LISTU                  ------ Users in DataStage
>LIST.READU             ------ List lock table command

LIST.READU is equivalent to $DSHOME/bin/list_readu.

>QUIT

Job PIDs and locks can likewise be viewed from DataStage Director ---> Jobs ---> Cleanup Resources.

4. DataStage file system mount points

$ cd `cat /.dshome`
$ df -k .

5. DataStage engine daemon

$ cd /etc/rc2.
$ more S999ds.rc

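2.21 What to do when the service will not start on UNIX

Before diagnosing why startup fails, a word on how to stop the service.

Everyone knows the start and stop commands. The point to note is that before stopping the service you should make sure there are no client connections and no port connections:

1. Use ps -ef | grep ds to check for client connections. If clients are still connected, you cannot find out who they are, and a restart is urgent, you can log off all connections from Director.

2. Use netstat -a | grep dsrpc to check the network connections and make sure only the LISTEN state remains.

If the service is stopped this way, it will restart without trouble.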
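The second check can be automated by scanning captured netstat output. The Python sketch below assumes the connection state is the last whitespace-separated field of each line, which is typical but depends on the platform; the function name and the sample lines are made up for the example.

```python
def dsrpc_only_listening(netstat_output: str) -> bool:
    """True if every dsrpc line is in LISTEN state and at least one such line exists."""
    states = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and "dsrpc" in line:
            states.append(fields[-1])  # assumption: the state is the last column
    return bool(states) and all(state == "LISTEN" for state in states)


sample = (
    "tcp 0 0 *.dsrpc *.* LISTEN\n"
    "tcp 0 0 host.dsrpc client.1234 ESTABLISHED\n"
)
print(dsrpc_only_listening(sample))
```

A result of False means either that clients are still attached or that no dsrpc listener was found at all, so the raw output is still worth reading in that case.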

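After running the restart command, use ps -ef | grep dsrpcd to check whether the service has started. If it has not, check the following:

1. Use ps to see whether any client is still connected, and kill those processes.

2. Use netstat to check the network state for TCP states such as FIN_WAIT_2 or CLOSE_WAIT. If any are present, use the ndd command to release the DataStage port connections, as follows (on HP-UX, for example):

Find the process number:

ndd -get /dev/tcp tcp_status | grep -e state -e FIN_WAIT_2

Disconnect and release the port:

ndd -set /dev/tcp tcp_discon 0x<process number>

If startup still fails after all of the above has been checked, start the service with

./bin/dsrpcd -d9 > /tmp/dsrpcd.log 2>&1 &

and then read the startup log in dsrpcd.log and resolve the problem according to what it shows.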

2.22 Locked by other user

Start DataStage Director and choose Job ---> Cleanup Resources.
