Literature Translation: Data Types Generalization for Data Mining Algorithms.docx

Uploaded by: b****4  Document ID: 4263793  Upload date: 2022-11-28  Format: DOCX  Pages: 18  Size: 570.49 KB
Literature Translation: Data Types Generalization for Data Mining Algorithms

English Translation

 

Department

Major

Class

Student name

Student ID

Advisor

 

 

Data Types Generalization for Data Mining Algorithms

 

Abstract

With the growth of database applications, mining interesting information from huge databases has become a major concern, and a variety of mining algorithms have been proposed in recent years. The data processed in data mining may be obtained from many sources in which different data types are used. However, no algorithm can be applied to all applications, because it is difficult to fit every data type to a given algorithm, so the selection of an appropriate mining algorithm depends not only on the goal of the application but also on data fittability. Transforming a non-fitting data type into the target one is therefore an important task in data mining, but it is often tedious or complex because many data types exist in the real world. Merging the similar data types of a selected mining algorithm into a generalized data type appears to be a good way to reduce the transformation complexity. In this work, the data types fittability problem for six kinds of widely used data mining techniques is discussed, and a data type generalization process consisting of merging and transforming phases is proposed. In the merging phase, the original data types of the data sources to be mined are first merged into generalized ones. The transforming phase then converts the generalized data types into the target ones for the selected mining algorithm. Using the data type generalization process, the user can select an appropriate mining algorithm based only on the goal of the application, without considering the data types.

1. Introduction

In recent years, the amount of data of all kinds has grown rapidly. Widely available, low-cost computer technology now makes it possible both to collect historical data and to institute on-line analysis of newly arriving data. Automated data generation and gathering lead to tremendous amounts of data stored in databases. Although we are flooded with data, we lack knowledge. Data mining is the automated discovery of non-trivial, previously unknown, and potentially useful knowledge embedded in databases. Different kinds of data mining methods and algorithms have been proposed, each of which has its own advantages and suitable application domains. However, it is difficult for users to choose an appropriate one by themselves. This is because the data provided cannot be used directly by data mining algorithms. Since most data mining algorithms can only be applied to some specific data types, the types of data stored in databases restrict the choice of data mining methods. If certain kinds of knowledge need to be obtained using some data mining algorithm, data types transformation should be done first, and this is what we call “the data types fittability problem” for data mining. For the time being, there is no tool that can help users do this kind of data types transformation. In this paper, we survey and analyze the data types fittability problem for data mining algorithms, and then propose a “data types generalization process” to solve the data types fittability problem for the attributes in relational databases.

The “data types generalization process”, which consists of merging and transforming phases, is a procedure for transforming the data types of the attributes contained in relations (tables). In the merging phase, the original data types of the data sources to be mined are first merged into generalized ones. The transforming phase then converts the generalized data types into the target ones for the selected mining algorithm. Using the data type generalization process, the user can select an appropriate mining algorithm based only on the goal of the application, without considering the data types.
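The two phases can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the generalized types (NUMERIC, CATEGORICAL), the merge table, and the conversion rules below are hypothetical and are not taken from the paper.

```python
from enum import Enum


class GeneralizedType(Enum):
    # Hypothetical generalized types; the paper may define a different set.
    NUMERIC = "numeric"
    CATEGORICAL = "categorical"


# Hypothetical merge table: original column type -> generalized type.
MERGE_TABLE = {
    "int": GeneralizedType.NUMERIC,
    "float": GeneralizedType.NUMERIC,
    "char": GeneralizedType.CATEGORICAL,
    "varchar": GeneralizedType.CATEGORICAL,
    "boolean": GeneralizedType.CATEGORICAL,
}


def merge_phase(schema):
    """Merging phase: map each attribute's original type to a generalized type."""
    return {attr: MERGE_TABLE[orig] for attr, orig in schema.items()}


def transform_phase(column, generalized, target):
    """Transforming phase: convert a column from its generalized type to the target type."""
    if generalized is target:
        return list(column)
    if generalized is GeneralizedType.CATEGORICAL and target is GeneralizedType.NUMERIC:
        # Assign integer codes in order of first appearance (an arbitrary choice).
        codes = {}
        return [codes.setdefault(value, len(codes)) for value in column]
    if generalized is GeneralizedType.NUMERIC and target is GeneralizedType.CATEGORICAL:
        # Split the value range into two equal-width bins (an arbitrary choice).
        midpoint = (min(column) + max(column)) / 2
        return ["low" if value <= midpoint else "high" for value in column]
    raise ValueError(f"no transformation from {generalized} to {target}")


schema = {"age": "int", "city": "varchar"}
generalized = merge_phase(schema)
city_codes = transform_phase(
    ["Taipei", "Tokyo", "Taipei"], generalized["city"], GeneralizedType.NUMERIC
)
```

The point of the split is that the transforming phase only needs one rule per pair of generalized types, rather than one rule per pair of original types.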

2. Related work

As mentioned above, because many data mining algorithms can only be applied to a restricted range of data types, users may need to perform data types transformation before the selected algorithm is executed. In this paper, we propose a general concept called the “data types generalization process”, which provides a procedure for this kind of data types transformation. Data types generalization can be seen as a pre-processing step of data mining. Of course, other pre-processing steps, such as data selection, data cleaning, dimension (attribute) reduction, and missing data handling, may also need to be performed before running the selected data mining algorithm. The whole process of data mining is the so-called KDD (knowledge discovery in databases) process, as shown in Figure 1.

Figure 1: The KDD process and the role of data types generalization.

There is a major difference between the data types generalization process and other data mining pre-processes. Other pre-processes (such as missing value handling) are all independent of the selected data mining method. That is, they can be done without knowing which data mining algorithm will be used. The data types generalization process, in contrast, clearly depends on the desired mining method. The purpose of data transformation using data types generalization is to make the specified data set suitable for the mining algorithm. To achieve this goal, we must therefore survey both the data types in databases and their relations with various data mining methods. The flow of solving a data mining problem with data transformation is illustrated in Figure 2.

Figure 2: Solving data mining problems with data types transformation.

Some researchers have proposed generalizing the data contained in attributes using "attribute-oriented induction", which offers two major advantages for the mining of large databases. First, it allows the raw data to be handled at higher conceptual levels. Generalization is performed with the use of "attribute concept hierarchies", where the leaves of a given attribute's concept hierarchy correspond to the attribute's values in the data (referred to as primitive level data). Generalization of the training data is achieved by replacing primitive level data with higher level concepts.

In fact, data generalization using attribute concept hierarchies is a kind of data type transformation that reduces the number of distinct values contained in attributes. We first provide a typical description of the data types fittability problem and a data types generalization process to define and solve the data types transformation problem for attributes. Data generalization using concept hierarchies is then included in the process as one way of performing a specified data types transformation.
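To make the idea concrete, the following minimal sketch replaces primitive level values with their parent concepts. The hierarchy for a "city" attribute is a made-up example, and the function names are our own.

```python
# Hypothetical concept hierarchy for a "city" attribute: child -> parent concept.
CITY_HIERARCHY = {
    "Taipei": "Taiwan",
    "Hsinchu": "Taiwan",
    "Tokyo": "Japan",
    "Osaka": "Japan",
    "Taiwan": "Asia",
    "Japan": "Asia",
}


def generalize(value, hierarchy, levels=1):
    """Climb up to `levels` steps in the hierarchy, stopping at the root."""
    for _ in range(levels):
        if value not in hierarchy:
            break  # already at the top-level concept
        value = hierarchy[value]
    return value


def generalize_column(column, hierarchy, levels=1):
    """Replace every primitive level value in a column by a higher level concept."""
    return [generalize(value, hierarchy, levels) for value in column]


cities = ["Taipei", "Tokyo", "Hsinchu", "Osaka"]
countries = generalize_column(cities, CITY_HIERARCHY)
```

Here four distinct city values collapse into two country values, which is the reduction in distinct values described above.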

Another line of related work surveys how to transform data into numerical values. Almost all data-driven algorithms use numeric inputs. From a computer processing point of view, computations with numbers are easier and more efficient. Therefore, if the input values are non-numeric (e.g., text strings), in many cases they should be intelligently converted to meaningful numerical values. Numerical values can be seen as a data type, and transforming data into numerical values is a kind of data types transformation. These strategies are included in the data types generalization process for performing data types transformation.

3. Analysis of the data types fittability problem
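Two common ways of converting non-numeric values to numbers are sketched below: ordinal encoding for values with a natural order, and one-hot encoding for values without one. These are standard techniques that we substitute for illustration; the surveyed work may use other strategies, and the function names are our own.

```python
def ordinal_encode(column, order):
    """Map ordered labels to their rank; `order` lists labels from lowest to highest."""
    rank = {label: position for position, label in enumerate(order)}
    return [rank[value] for value in column]


def one_hot_encode(column):
    """Map unordered labels to 0/1 indicator vectors, one position per category."""
    categories = sorted(set(column))
    vectors = [[1 if value == category else 0 for category in categories] for value in column]
    return categories, vectors


risk_codes = ordinal_encode(["low", "high", "medium"], order=["low", "medium", "high"])
colour_categories, colour_vectors = one_hot_encode(["red", "blue", "red"])
```

Ordinal codes preserve the ordering of the labels, while one-hot vectors avoid implying an order that the data does not have.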

In recent years, due to the explosion of information and the rapid growth of database applications, data mining techniques have become more and more important. For this reason, different kinds of data mining methods and algorithms have been proposed. However, it is difficult for users without prior knowledge of data mining to choose a suitable one by themselves. The kind of data mining method that should be applied depends on both the characteristics of the data to be mined and the kind of knowledge to be found through the data mining process. Hence, the types of data stored in databases play an important role in the data mining process and restrict the data mining methods that users can choose. Each kind of data mining method can only be applied to the particular databases that suit it, and this is what we call "the data types fittability problem" for data mining. To solve this problem, we need to investigate the relationships between the characteristics of the data to be mined and the various kinds of data mining techniques. With these relationships, we can clearly analyze the data types fittability problem and determine whether a data types transformation can be performed. Analyzing these relationships is therefore preparatory work for our data types generalization process, and it explains why the data types generalization process can solve the data fittability problem. We now illustrate the analysis as follows.

3.1 Four kinds of data forms for data mining
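One simple way to picture the fittability problem is as a lookup of the data types each mining technique accepts, followed by a check of which attributes fall outside that set. The table below is purely illustrative: the technique names and their accepted types are our assumptions, not the relationships derived in this paper.

```python
# Illustrative assumption: data types accepted by a few mining techniques.
ACCEPTED_TYPES = {
    "association_rules": {"categorical"},
    "k_means_clustering": {"numeric"},
    "decision_tree": {"categorical", "numeric"},
}


def non_fitting_attributes(schema, technique):
    """Return the attributes whose data type the chosen technique does not accept."""
    accepted = ACCEPTED_TYPES[technique]
    return sorted(attr for attr, data_type in schema.items() if data_type not in accepted)


schema = {"age": "numeric", "city": "categorical"}
needs_transformation = non_fitting_attributes(schema, "k_means_clustering")
```

Any attribute returned by the check is a candidate for data types transformation before the technique can be run.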

Data mining techniques can usually be applied to four kinds of data forms: textual, temporal, transactional, and relational. Different data forms are used to store different kinds of data types. We describe each kind of data form in the following:

(1) Textual data forms: Textual data forms are used to represent texts or documents. Basically, this kind of data form can be seen as a very large set of characters.

(2) Temporal data forms: Time-series data is stored in temporal data forms. Data that varies with time (such as historical data) can be stored in the form of numerical time series.

(3) Transactional data forms: For example, the past transactions of a market can be stored in transactional data forms. Each transaction records a list of the items bought in that transaction.

(4) Relational data forms: Relational data forms are the most widely used data forms and can store different kinds of data. The basic units of relational data forms are relations (
