英文文献及翻译.docx

Uploaded by: b****5 Document ID: 10184936 Upload date: 2023-02-09 Format: DOCX Pages: 13 Size: 62.69 KB
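English Literature and Translation

Name: 郭鑫 (Guo Xin)

Student ID: 120360114

Major: Information Management and Information Systems

Class: 1203601

Advisor: 胡仕成 (Hu Shicheng)

School of Economics and Management

Harbin Institute of Technology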

Analysis of Audit Data Based on Data Mining

[Abstract] Based on the current state of computer auditing, this paper proposes a data-mining-based process for analyzing audit data and applies the DBSCAN clustering algorithm to find audit evidence.

[Keywords] Computer Audit, Data Mining, Clustering Algorithms, Noise Data

As the economy and information technology continue to evolve, many companies have introduced ERP and similar systems, which log data on many business activities in real time and build up large data warehouses of business management data. Obtaining useful audit data from these massive amounts of data is one application of computer auditing. For auditors, the problem is how to extract comprehensive, high-quality audit data from the mass of data held by the audited entity in order to identify audit evidence. This paper discusses the problem using data mining techniques and proposes a solution.

Data mining is the process of extracting hidden, previously unknown, but potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy, and random real-world data [1]. In practice, the quality of the data and the way it is stored are very important to carrying out a computer audit successfully and obtaining audit evidence. Because the hardware and software platforms of audited entities' information systems are heterogeneous, and because of possible deliberate concealment, fraud, and so on, the collected audit data must be checked, controlled, and analyzed to ensure that the computer audit proceeds smoothly and that the audit findings are correct.

1 Audit Data Collection

Audit data collection means obtaining, for the purposes of a computer audit, the necessary electronic data from the audited entity's financial and business information systems and other data sources, and converting it into an appropriate format [3]. In general, the data collection methods used in computer auditing include the following:

(1) Using the data export function of the audited entity's information system. Most information management systems provide a data export function, which auditors can use to export the entity's financial data directly and so complete data collection.

(2) Using general-purpose data processing software. Tools such as Access and SQL Server have powerful data import, export, and conversion functions. Auditors can use them for data collection; for example, raw data supplied by the audited entity in text format can be converted into database table format.

(3) Using audit software. Examples include the on-site audit implementation system (AO) and the audit office system (OA), built under the national "Golden Audit Project" begun in 2002, as well as other computer-assisted audit tools, domestic financial audit software, and audit data collection and analysis software, all of which can be used to complete audit data collection.

(4) Using a dedicated interface. When the structure of the data provided by the audited entity differs substantially from the data structures of the auditors' existing data processing software, an interface program can be developed with the help of specialist programmers to complete data collection, although the cost is relatively high.
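Method (2), converting raw text data into database table format, can be sketched as follows. This is only a minimal illustration using Python's standard `csv` and `sqlite3` modules rather than Access or SQL Server; the sample data, the table name `ledger`, and the helper `load_text_into_table` are hypothetical and do not come from the paper.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited export from an audited entity's system.
RAW_TEXT = "voucher_no\taccount\tamount\n1001\tCash\t250.00\n1002\tBank\t-80.50\n"


def load_text_into_table(raw_text, conn, table="ledger"):
    """Parse delimited text and store it as a database table.

    Every column is stored as TEXT; type conversion is left to a later
    cleaning step. Returns the number of rows loaded.
    """
    reader = csv.reader(io.StringIO(raw_text), delimiter="\t")
    header = next(reader)  # first line holds the field names
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # skip empty lines
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_text_into_table(RAW_TEXT, conn)
accounts = conn.execute(
    'SELECT account FROM ledger ORDER BY voucher_no'
).fetchall()
```

Keeping every field as text at this stage avoids silently altering the collected data before it has been analyzed.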

2 Data Cleaning

When data mining classification is applied to audit data, the database must be preprocessed to improve the accuracy, efficiency, and scalability of classification. Preprocessing includes data cleaning, correlation analysis, and data conversion.

Reference [4] defines data cleaning as finding and eliminating errors and inconsistencies in data in order to improve data quality. When audit data is collected from heterogeneous databases, errors and inconsistencies are generally unavoidable, such as fraudulent data, duplicated data, and missing or erroneous values. In line with the audit data quality characteristics proposed in [5], the collected raw data has to be cleaned, that is, turned from "dirty" into "clean" data. Improving the quality of audit data in this way is key to ensuring correct audit findings.
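Two of the problems named above, duplicated data and missing values, can be screened with a few lines of code. The sketch below is illustrative only: the record layout and the function `clean_records` are assumptions, and a real audit would also need rules for fraudulent or inconsistent values.

```python
def clean_records(records, required_fields):
    """Drop exact duplicates and set aside records with missing fields.

    Returns (clean, rejected). Rejected records are kept rather than
    discarded, so that an auditor can still inspect them.
    """
    seen = set()
    clean, rejected = [], []
    for record in records:
        key = tuple(sorted(record.items()))
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        if any(record.get(field) in (None, "") for field in required_fields):
            rejected.append(record)  # a required value is missing
        else:
            clean.append(record)
    return clean, rejected


# Hypothetical raw data with one duplicate and one missing amount.
raw = [
    {"voucher_no": "1001", "amount": "250.00"},
    {"voucher_no": "1001", "amount": "250.00"},
    {"voucher_no": "1002", "amount": ""},
    {"voucher_no": "1003", "amount": "75.10"},
]
clean, rejected = clean_records(raw, ["voucher_no", "amount"])
```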

The general process of data cleaning is shown in Figure 2.

(1) Data analysis: to produce clean data, the data must be analyzed in detail, including its formats and categories, for example the type, width, and meaning of each field in the collected financial data.

(2) Schema conversion: schema conversion mainly means mapping the source data onto the target data model, for example converting attributes and field constraints and mapping between different data sets in the database. Sometimes several data tables need to be merged into one two-dimensional table, and sometimes one table has to be split into several two-dimensional tables.

(3) Data validation: the schema conversion of the previous step should be evaluated and tested, and the data is cleaned properly only after repeated analysis, design, calculation, and re-analysis. Without data validation, some erroneous data may not be obvious and may not be screened out. For example, when schema conversion splits one data set into several tables, the primary key values of the parent table and the foreign key values of the child table may become inconsistent, producing isolated records. These undermine the correctness of the audit evidence that auditors collect and so affect the accuracy of the audit findings.

(4) Data write-back: the "dirty" data in the original data source is replaced with the "clean" data, so that cleaning does not have to be redone at the next data collection.

Sometimes data has to be cleaned repeatedly: auditors may need several cleaning passes over the collected electronic data to obtain high-quality audit data.
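The validation step's example of isolated records, where a child table's foreign key has no matching primary key in the parent table, lends itself to a simple check. The following is a sketch under assumed table and field names (`vouchers`, `entries`, `voucher_id`); none of them appear in the paper.

```python
def find_orphans(parent_rows, child_rows, parent_key, foreign_key):
    """Return child rows whose foreign key matches no parent primary key."""
    parent_ids = {row[parent_key] for row in parent_rows}
    return [row for row in child_rows if row[foreign_key] not in parent_ids]


# Hypothetical parent and child tables produced by splitting one data set.
vouchers = [{"voucher_id": 1}, {"voucher_id": 2}]
entries = [
    {"entry_id": 10, "voucher_id": 1},
    {"entry_id": 11, "voucher_id": 2},
    {"entry_id": 12, "voucher_id": 5},  # no voucher 5 exists
]
orphans = find_orphans(vouchers, entries, "voucher_id", "voucher_id")
```

Running such a check after each conversion pass gives a concrete signal for whether another round of cleaning is needed.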

3 Implementing Data Mining

After preprocessing, the audit database contains a number of data sets, and each data set contains a number of data records, or tuples. How to mine meaningful audit data from these two-dimensional tables is crucial. This paper uses a clustering algorithm to mine the audit data.

3.1 Overview of the Algorithm

3.1.1 Clustering Algorithms

Clustering groups data objects according to their similarity and discovers how the data is distributed, so that data within each cluster is highly similar and data in different clusters is as different as possible [6]. The main difference from classification is that classification relies on prior knowledge of the characteristics of the data, whereas clustering has to discover those characteristics. As a data mining function, cluster analysis can be used as a standalone tool to observe the distribution of the data and the characteristics of each class, and to analyze specific classes further. Clustering can also deal effectively with noisy data, such as the isolated points, missing values, and erroneous data that databases commonly contain.

Clustering algorithms are usually divided into five categories [7]: ① partitioning methods, such as CLARANS; ② hierarchical methods, such as CURE and BIRCH; ③ density-based methods, such as DBSCAN, OPTICS, GDBSCAN, and DBRS; ④ grid-based methods, such as STING and WaveCluster; ⑤ model-based methods, such as COBWEB. Among these, the DBSCAN algorithm is good at filtering out noise data. This paper discusses using DBSCAN to process audit data in order to identify abnormal data and find audit evidence.

3.1.2 The DBSCAN Algorithm

The basic idea of the DBSCAN algorithm [8] is that, for each object in a cluster, the neighborhood of a given radius d around that object must contain at least a given minimum number of objects, MinPts (also called the density).

To generate a cluster, DBSCAN first selects an arbitrary object p from the data set DB and finds all objects in DB within the neighborhood of radius d around p. If the number of objects in this neighborhood is smaller than MinPts, p is marked as noise data. Otherwise, p and its neighborhood form an initial cluster N, which contains p and all objects directly density-reachable from p. The algorithm then checks whether each object q in N is a core object. If it is, the objects in the d-neighborhood of q that are not yet in N are added to N, and each newly added object is checked in turn to see whether it is a core object; if so, the expansion is repeated until the cluster can no longer be extended. DBSCAN then selects another object in DB that has not yet been identified as belonging to a cluster or as noise and repeats the procedure, until every object in DB has been identified either as a member of a cluster or as noise data.
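The procedure described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes two-dimensional points, Euclidean distance, a neighborhood that includes the object itself, and a brute-force neighborhood query. The names `dbscan`, `region_query`, and `NOISE` are chosen here for illustration.

```python
import math

NOISE = -1        # label for noise data
UNVISITED = None  # label for objects not yet examined


def region_query(points, index, d):
    """Indices of all points within distance d of points[index] (itself included)."""
    px, py = points[index]
    return [i for i, (x, y) in enumerate(points)
            if math.hypot(x - px, y - py) <= d]


def dbscan(points, d, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for p in range(len(points)):
        if labels[p] is not UNVISITED:
            continue
        neighbors = region_query(points, p, d)
        if len(neighbors) < min_pts:
            labels[p] = NOISE  # may be relabeled later as a border object
            continue
        labels[p] = cluster_id  # p is a core object: start a new cluster
        queue = [q for q in neighbors if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster_id  # border object: join, do not expand
                continue
            if labels[q] is not UNVISITED:
                continue  # already assigned to a cluster
            labels[q] = cluster_id
            q_neighbors = region_query(points, q, d)
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # q is a core object: keep expanding
        cluster_id += 1
    return labels


# Hypothetical data: two dense groups and one isolated point.
points = [(1, 1), (1.2, 1.1), (0.9, 1.0), (1.1, 0.8),
          (8, 8), (8.1, 8.2), (7.9, 8.1), (8.2, 7.9),
          (4.5, 15)]
labels = dbscan(points, d=0.5, min_pts=3)
noise = [points[i] for i, label in enumerate(labels) if label == NOISE]
```

In an audit setting the points left in `noise` are the candidates an auditor would examine first. The brute-force query costs quadratic time overall, so large data sets would need a spatial index.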

Running the DBSCAN clustering algorithm over the collected data is a process of repeated neighborhood queries and comparisons. The noise data that remains at the end is what is commonly called abnormal data, and it is very helpful to auditors in forming audit judgments. Figure 3 shows noise data and clusters in two-dimensional coordinates.

3.2 Definition of the Data Model

3.2.1 Distance Between Records

Let R_i and R_j be any two records in the data set DB. The distance between them is defined as:

d_{ij} = \sqrt{(R_{ix} - R_{jx})^2 + (R_{iy} - R_{jy})^2}

where R_i(R_{ix}, R_{iy}) and R_j(R_{jx}, R_{jy}) are the points representing R_i and R_j in two-dimensional coordinates, and d_{ij} is the distance between them in that two-dimensional space. If d_{ij} is greater than the given value d, R_i and R_j are considered not to belong to the same cluster.
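Assuming the distance between two records is the ordinary Euclidean distance between their points in the plane, the comparison against the given value d can be sketched as follows. The function names are illustrative and not taken from the paper.

```python
import math


def distance(ri, rj):
    """Euclidean distance between two records given as (x, y) points."""
    return math.hypot(ri[0] - rj[0], ri[1] - rj[1])


def within_radius(ri, rj, d):
    """True when rj lies inside the neighborhood of radius d around ri."""
    return distance(ri, rj) <= d


d_ij = distance((0, 0), (3, 4))
```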

3.2.2 Preprocessing the Audit Data

The data to be mined is selected from a two-dimensional table: first the columns (fields or attributes) are selected, and then the rows (records or tuples). To obtain valid audit evidence and arrive at correct audit findings, the source data sometimes has to be converted first.
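The column-then-row selection can be illustrated with a small sketch. The table contents, the field names, and the helper `select` are hypothetical. The conversion step is shown here as turning a text amount into a number, which is only one possible kind of conversion.

```python
def select(records, columns, row_filter):
    """Select columns first, then keep the rows that pass row_filter."""
    projected = [{name: record[name] for name in columns} for record in records]
    return [row for row in projected if row_filter(row)]


# Hypothetical ledger table held as a list of records.
ledger = [
    {"voucher_no": "1001", "account": "Cash", "amount": "250.00", "clerk": "A"},
    {"voucher_no": "1002", "account": "Bank", "amount": "12000.00", "clerk": "B"},
    {"voucher_no": "1003", "account": "Cash", "amount": "9800.50", "clerk": "A"},
]

# Convert the amount from text to a number before filtering on it.
converted = [dict(row, amount=float(row["amount"])) for row in ledger]
large = select(converted, ["voucher_no", "amount"],
               lambda row: row["amount"] > 1000)
```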

Becauseof
