Cluster Analysis Literature: English Translation (Word format).doc

Uploaded by: b****2 | Document ID: 14186251 | Uploaded: 2022-10-19 | Format: DOC | Pages: 14 | Size: 223.50 KB
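Translation title: Data Mining: Cluster Analysis

Major: Automation

Name: ****

Class and student ID: ****

Supervisor: ******

Source of the translated text: Data Mining, by Ian H. Witten and Eibe Frank

April 26, 2010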

Clustering

5.1 INTRODUCTION

Clustering is similar to classification in that data are grouped. However, unlike classification, the groups are not predefined. Instead, the grouping is accomplished by finding similarities between data according to characteristics found in the actual data. The groups are called clusters. Some authors view clustering as a special type of classification. In this text, however, we follow a more conventional view in that the two are different. Many definitions for clusters have been proposed:

• Set of like elements. Elements from different clusters are not alike.

• The distance between points in a cluster is less than the distance between a point in the cluster and any point outside it.
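The distance-based definition can be checked mechanically. The following is a minimal sketch, assuming a strict reading of that definition (the largest distance inside a cluster must be smaller than the smallest distance from the cluster to any outside point) and Euclidean distance; the function name and the sample points are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def satisfies_distance_definition(clusters):
    """Return True if, for every cluster, the largest distance between two of
    its points is smaller than the smallest distance from one of its points
    to any point outside it (a strict reading of the definition)."""
    for i, cluster in enumerate(clusters):
        outside = [p for j, other in enumerate(clusters) if j != i for p in other]
        max_inside = max((dist(p, q) for p, q in combinations(cluster, 2)),
                         default=0.0)
        min_outside = min((dist(p, q) for p in cluster for q in outside),
                          default=float("inf"))
        if max_inside >= min_outside:
            return False
    return True

# Hypothetical 2-D points: two well-separated groups.
tight = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11)]]
# The same kind of points, but grouped so that far-apart points share a cluster.
mixed = [[(0, 0), (10, 10)], [(0, 1), (10, 11)]]

print(satisfies_distance_definition(tight))  # True
print(satisfies_distance_definition(mixed))  # False
```

Other readings of the definition (for example, comparing average rather than extreme distances) are possible and would give a looser test.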

A term similar to clustering is database segmentation, where like tuples (records) in a database are grouped together. This is done to partition or segment the database into components that then give the user a more general view of the data. In this text, we do not differentiate between segmentation and clustering. A simple example of clustering is found in Example 5.1. This example illustrates the fact that determining how to do the clustering is not straightforward.

As illustrated in Figure 5.1, a given set of data may be clustered on different attributes. Here a group of homes in a geographic area is shown. The first type of clustering is based on the location of the home. Homes that are geographically close to each other are clustered together. In the second clustering, homes are grouped based on the size of the house.
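The effect of the chosen attribute can be sketched in a few lines. This is an illustration only: the homes, their coordinates and sizes are made-up data, and the grouping rule (link two homes when they are within a fixed gap, then take the linked chains as clusters) is a simple stand-in technique, not a method described in the text.

```python
from math import dist

def threshold_clusters(items, key, max_gap):
    """Group items so that two items share a cluster whenever a chain of
    items links them with each step no longer than max_gap."""
    clusters = []
    for item in items:
        near, far = [], []
        for c in clusters:
            close = any(dist(key(item), key(other)) <= max_gap for other in c)
            (near if close else far).append(c)
        # Merge the new item with every cluster it is close to.
        merged = [item] + [other for c in near for other in c]
        clusters = far + [merged]
    return clusters

def names(clusters):
    """Reduce clusters to sorted lists of home names for easy comparison."""
    return sorted(sorted(home[0] for home in c) for c in clusters)

# Hypothetical homes: (name, (x_km, y_km), size_sq_ft)
homes = [
    ("A", (0.0, 0.0), 900),
    ("B", (0.5, 0.2), 2500),
    ("C", (0.3, 0.6), 1000),
    ("D", (5.0, 5.0), 2400),
    ("E", (5.4, 4.8), 950),
    ("F", (5.2, 5.5), 2600),
]

by_location = threshold_clusters(homes, key=lambda h: h[1], max_gap=1.0)
by_size = threshold_clusters(homes, key=lambda h: (h[2],), max_gap=200)

print(names(by_location))  # [['A', 'B', 'C'], ['D', 'E', 'F']]
print(names(by_size))      # [['A', 'C', 'E'], ['B', 'D', 'F']]
```

The same six homes fall into different groups depending on whether location or size is used, which is the point the figure makes.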

Clustering has been used in many application domains, including biology, medicine, anthropology, marketing, and economics. Clustering applications include plant and animal classification, disease classification, image processing, pattern recognition, and document retrieval. One of the first domains in which clustering was used was biological taxonomy. Recent uses include examining Web log data to detect usage patterns.

When clustering is applied to a real-world database, many interesting problems occur:

• Outlier handling is difficult. Here the elements do not naturally fall into any cluster. They can be viewed as solitary clusters. However, if a clustering algorithm attempts to find larger clusters, these outliers will be forced to be placed in some cluster. This process may result in the creation of poor clusters by combining two existing clusters and leaving the outlier in its own cluster.

• Dynamic data in the database implies that cluster membership may change over time.

• Interpreting the semantic meaning of each cluster may be difficult. With classification, the labeling of the classes is known ahead of time. However, with clustering, this may not be the case. Thus, when the clustering process finishes creating a set of clusters, the exact meaning of each cluster may not be obvious. Here is where a domain expert is needed to assign a label or interpretation for each cluster.

• There is no one correct answer to a clustering problem. In fact, many answers may be found. The exact number of clusters required is not easy to determine. Again, a domain expert may be required. For example, suppose we have a set of data about plants that have been collected during a field trip. Without any prior knowledge of plant classification, if we attempt to divide this set of data into similar groupings, it would not be clear how many groups should be created.

• Another related issue is what data should be used for clustering. Unlike learning during a classification process, where there is some a priori knowledge concerning what the attributes of each classification should be, in clustering we have no supervised learning to aid the process. Indeed, clustering can be viewed as similar to unsupervised learning.
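Two of the problems above, outliers and the lack of a single correct number of clusters, show up even in a one-dimensional toy case. The sketch below uses hypothetical plant heights and a simple gap-based split (a stand-in rule chosen for brevity, not an algorithm from the text); changing one parameter changes both the number of clusters and the fate of the outlier.

```python
def split_by_gap(values, max_gap):
    """Sort the values and start a new group whenever the gap to the
    previous value exceeds max_gap."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= max_gap:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups

# Hypothetical plant heights in cm; 90 is an outlier.
heights_cm = [10, 12, 13, 30, 32, 35, 90]

for max_gap in (5, 20, 60):
    print(max_gap, split_by_gap(heights_cm, max_gap))
# 5  [[10, 12, 13], [30, 32, 35], [90]]
# 20 [[10, 12, 13, 30, 32, 35], [90]]
# 60 [[10, 12, 13, 30, 32, 35, 90]]
```

With a small gap the outlier forms a solitary cluster; with a large gap it is forced into a cluster with everything else. None of the three answers is wrong on its face, which is why a domain expert may be needed to choose among them.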

We can then summarize some basic features of clustering (as opposed to classification):

• The (best) number of clusters is not known.

• There may not be any a priori knowledge concerning the clusters.

• Cluster results are dynamic.

The clustering problem is stated as shown in Definition 5.1. Here we assume that the number of clusters to be created is an input value, $k$. The actual content (and interpretation) of each cluster $K_j$, $1 \le j \le k$, is determined as a result of the function definition. Without loss of generality, we will take the result of solving a clustering problem to be a set of clusters:

$K = \{K_1, K_2, \ldots, K_k\}$.

DEFINITION 5.1. Given a database $D = \{t_1, t_2, \ldots, t_n\}$ of tuples and an integer value $k$, the clustering problem is to define a mapping $f : D \to \{1, \ldots, k\}$ where each $t_i$ is assigned to one cluster $K_j$, $1 \le j \le k$. A cluster $K_j$ contains precisely those tuples mapped to it;
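The relationship between the mapping f and the resulting clusters in Definition 5.1 can be sketched as follows. This is an illustration under stated assumptions: f is taken to return a cluster index between 1 and k, the helper name `clusters_from_mapping` is hypothetical, and the sample database and the particular f are made up.

```python
def clusters_from_mapping(database, f, k):
    """Build clusters 1..k, where cluster j holds exactly the tuples t
    in the database with f(t) == j."""
    clusters = {j: [] for j in range(1, k + 1)}
    for t in database:
        j = f(t)
        if j not in clusters:
            raise ValueError(f"f mapped {t!r} to {j}, outside 1..{k}")
        clusters[j].append(t)
    return clusters

# Hypothetical database of one-attribute tuples.
D = [(1,), (2,), (8,), (9,), (20,)]

def f(t):
    """A made-up mapping: assign a tuple to cluster 1, 2 or 3 by value range."""
    if t[0] < 5:
        return 1
    if t[0] < 15:
        return 2
    return 3

K = clusters_from_mapping(D, f, k=3)
print(K)  # {1: [(1,), (2,)], 2: [(8,), (9,)], 3: [(20,)]}
```

Every tuple lands in exactly one cluster because f is a function, so the clusters partition the database; a clustering algorithm is, in effect, a way of choosing f.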
