Big Data Mining: Foreign-Language Literature with Translation (Word download).docx

VH Shastri, V Sreeprada

Source: International Journal of Emerging Trends and Technology in Computer Science, 2016, 38(2): 99-103

Word count: 2,291 English words, 12,196 characters; 3,868 Chinese characters

Foreign-language text:

A Study of Data Mining with Big Data

Abstract: Data has become an important part of every economy, industry, organization, business, function and individual. Big Data is a term used to identify large data sets, typically those whose size exceeds that of a typical database. Big Data introduces unique computational and statistical challenges. Big Data is at present expanding in most domains of engineering and science. Data mining helps to extract useful data from huge data sets, which are hard to handle because of their volume, variability and velocity. This article presents the HACE theorem, which characterizes the features of the Big Data revolution, and proposes a Big Data processing model from the data mining perspective.

Keywords: Big Data, Data Mining, HACE theorem, structured and unstructured.

I. Introduction

Big Data refers to the enormous amount of structured and unstructured data that overflows the organization. If this data is properly used, it can lead to meaningful information. Big Data includes a large amount of data that requires a lot of processing in real time. It provides room to discover new values, to gain in-depth knowledge from hidden values, and to manage the data effectively. A database is an organized collection of logically related data that can be easily managed, updated and accessed. Data mining is the process of discovering interesting knowledge, such as associations, patterns, changes, anomalies and significant structures, from large amounts of data stored in databases or other repositories.

Big Data has three V's as its characteristics: volume, velocity and variety. Volume is the amount of data generated every second. The data is in a state of rest. It is also known for its scale characteristics. Velocity is the speed with which the data is generated; the data arrives at high speed. Data generated from social media is an example. Variety means that different types of data can be taken, such as audio, video or documents. They can be numerals, images, time series, arrays, etc.

Data mining analyses data from different perspectives and summarizes it into useful information that can be used for business solutions and for predicting future trends. Data mining (DM), also called Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns such as association rules. It applies many computational techniques from statistics, information retrieval, machine learning and pattern recognition. Data mining extracts only the required patterns from the database in a short time span. Based on the type of patterns to be mined, data mining tasks can be classified into summarization, classification, clustering, association and trend analysis.

Big Data is expanding in all domains of science and engineering, including the physical, biological and biomedical sciences.

II. BIG DATA with DATA MINING

Generally, Big Data refers to a collection of large volumes of data generated from various sources such as the internet, social media, business organizations, sensors, etc. We can extract useful information from it with the help of data mining, a technique for discovering patterns, as well as descriptive, understandable models, from large-scale data.

Volume is the size of the data, which reaches terabytes and petabytes and beyond. The scale and growth in size make the data difficult to store and analyse using traditional tools. Big Data techniques should be used to mine large amounts of data within a predefined period of time. Traditional database systems were designed to address small amounts of data that were structured and consistent, whereas Big Data includes a wide variety of data such as geospatial data, audio, video, unstructured text and so on.

Big Data mining refers to the activity of going through big data sets to look for relevant information. To process large volumes of data from different sources quickly, Hadoop is used. Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Its distributed file system supports fast data transfer rates among nodes and allows the system to continue operating uninterrupted when a node fails. It runs MapReduce for distributed data processing and works with both structured and unstructured data.
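The MapReduce model mentioned above can be illustrated with a minimal, single-process sketch of a word count. This is not Hadoop's API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration, and the map calls that Hadoop would distribute across nodes run sequentially here.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # In a real cluster each document split would be mapped on a different
    # node; here the map calls simply run one after another (an assumption
    # made to keep the sketch self-contained).
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["big data needs data mining", "data mining finds patterns"]
    print(word_count(docs))
```

Because the map step touches one document at a time and the reduce step touches one key at a time, both steps can be spread over many machines, which is the property that makes the model suitable for large data sets.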

III. BIG DATA characteristics - HACE THEOREM

We have a large volume of heterogeneous data. There exist complex relationships among the data. We need to discover useful information from this voluminous data.

Let us imagine a scenario in which blind men are asked to draw an elephant. From the limited information each blind man collects, he may take the trunk for a wall, a leg for a tree, the body for a wall and the tail for a rope. The blind men can exchange information with each other.

Figure 1: Blind men and the giant elephant

Some of the characteristics are:

i. Vast data with heterogeneous and diverse sources: One of the fundamental characteristics of Big Data is the large volume of data represented by heterogeneous and diverse dimensions. For example, in the biomedical world, a single human being is represented by name, age, gender, family history, etc., while images and videos are used for X-ray and CT scans. Heterogeneity refers to the different types of representation of the same individual, and diversity refers to the variety of features used to represent a single piece of information.

ii. Autonomous with distributed and decentralized control: The sources are autonomous, i.e., automatically generated; they generate information without any centralized control. We can compare this with the World Wide Web (WWW), where each server provides a certain amount of information without depending on other servers.

iii. Complex and evolving relationships: As the size of the data becomes infinitely large, the relationships that exist within it also grow large. In the early stages, when the data is small, there is no complexity in the relationships among the data. Data generated from social media and other sources have complex relationships.

IV. TOOLS: OPEN SOURCE REVOLUTION

Large companies such as Facebook, Yahoo, Twitter and LinkedIn benefit from and contribute to open source projects. In Big Data mining, there are many open source initiatives. The most popular of them are:

Apache Mahout: Scalable machine learning and data mining open source software based mainly on Hadoop. It has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining.

R: Open source programming language and software environment designed for statistical computing and visualization. R was designed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, beginning in 1993, and is used for statistical analysis of very large data sets.

MOA: Stream data mining open source software to perform data mining in real time. It has implementations of classification, regression, clustering, frequent itemset mining and frequent graph mining. It started as a project of the Machine Learning group of the University of Waikato, New Zealand, famous for the WEKA software. The streams framework provides an environment for defining and running stream processes using simple XML-based definitions and is able to use MOA, Android and Storm.

SAMOA: A new, upcoming software project for distributed stream mining that will combine S4 and Storm with MOA.

Vowpal Wabbit: Open source project started at Yahoo! Research and continuing at Microsoft Research to design a fast, scalable, useful learning algorithm. VW is able to learn from terafeature data sets. Through parallel learning, it can exceed the throughput of any single machine's network interface when doing linear learning.
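Two ideas from the tool descriptions above, mining frequent items from a stream and learning a linear model one example at a time, can be sketched in plain Python. The sketch does not use the MOA or Vowpal Wabbit APIs. It swaps in two textbook techniques as stand-ins: the Misra-Gries summary for frequent items and stochastic gradient descent for linear learning. The function names misra_gries and sgd_linear are hypothetical.

```python
def misra_gries(stream, k):
    """Return candidate frequent items using at most k - 1 counters.

    Any item occurring more than n / k times in a stream of length n is
    guaranteed to be among the returned keys; the counts are lower bounds.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all, dropping those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def sgd_linear(stream, n_features, learning_rate=0.1):
    """Fit y ~ w . x + b one example at a time with squared-error SGD."""
    weights = [0.0] * n_features
    bias = 0.0
    for features, target in stream:
        prediction = bias + sum(w * x for w, x in zip(weights, features))
        error = prediction - target
        # Gradient step on 0.5 * error ** 2 for this single example.
        weights = [
            w - learning_rate * error * x for w, x in zip(weights, features)
        ]
        bias -= learning_rate * error
    return weights, bias


if __name__ == "__main__":
    print(misra_gries(["x", "y", "x", "z", "x", "y", "x"], k=3))
    # Noise-free examples of y = 2x + 1, replayed as a long stream.
    examples = [((i / 10,), 2 * (i / 10) + 1) for i in range(11)] * 500
    print(sgd_linear(examples, n_features=1))
```

Both functions read each element once and keep only a small, fixed amount of state, which is what allows this style of algorithm to keep up with data arriving in real time.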

V. DATA MINING for BIG DATA

Data mining is the process by which data coming from different sources is analysed to discover useful information. Data mining contains several algorithms, which fall into 4 categories. They are:

1. Association Rule

2. Clustering

3. Classification

4. Regression

Association is used to search for relationships between variables. It is applied in searching for frequently visited items. In short, it establishes relationships among objects. Clustering discovers groups and structures in the data. Classification deals with associating an unknown structure with a known structure. Regression finds a function to model the data.
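As a concrete illustration of association, the sketch below computes the two standard measures behind association rules, support and confidence, and uses a brute-force search for frequent item pairs. It is a minimal example under stated assumptions, not the Apriori algorithm itself (it omits candidate pruning), and the shopping-basket data and function names are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base


def frequent_pairs(transactions, min_support):
    """Brute-force search for item pairs whose support meets the threshold."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for pair in combinations(items, 2):
        value = support(pair, transactions)
        if value >= min_support:
            result[pair] = value
    return result


if __name__ == "__main__":
    # Hypothetical baskets, used only to exercise the functions.
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print(frequent_pairs(baskets, min_support=0.5))
    print(confidence({"bread"}, {"milk"}, baskets))
```

A rule such as "bread implies milk" is usually kept only when both its support and its confidence exceed chosen thresholds; the thresholds themselves are a modelling choice.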

The different data mining algorithms are:

Category        Algorithm
Association     Apriori, FP-growth
Clustering      K-Means, Expectation
Classification  Decision trees, SVM
Regression      Multivariate linear regression

Table 1. Classification of Algorithms

Data mining algorithms can be converted into MapReduce algorithms for Big Data on a parallel computing basis.
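One way to picture such a conversion is a single K-Means iteration written as a map step and a reduce step. The sketch below is an assumption-laden illustration rather than a description of any particular framework: the partitions stand in for data blocks held on different nodes, everything runs in one process, and the function names are hypothetical.

```python
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    distances = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    return distances.index(min(distances))


def map_assign(partition, centroids):
    """Map step: emit (cluster index, (point, 1)) for each point."""
    return [(nearest(point, centroids), (point, 1)) for point in partition]


def reduce_update(pairs, centroids):
    """Reduce step: average the points assigned to each cluster."""
    sums = {}
    counts = defaultdict(int)
    for cluster, (point, count) in pairs:
        if cluster not in sums:
            sums[cluster] = list(point)
        else:
            sums[cluster] = [s + p for s, p in zip(sums[cluster], point)]
        counts[cluster] += count
    updated = list(centroids)  # clusters with no points keep their centroid
    for cluster, total in sums.items():
        updated[cluster] = tuple(s / counts[cluster] for s in total)
    return updated


def kmeans_step(partitions, centroids):
    """Run one K-Means iteration over all partitions."""
    pairs = [
        pair
        for partition in partitions
        for pair in map_assign(partition, centroids)
    ]
    return reduce_update(pairs, centroids)


if __name__ == "__main__":
    partitions = [
        [(0.0, 0.0), (0.0, 2.0)],
        [(10.0, 10.0), (10.0, 12.0)],
    ]
    print(kmeans_step(partitions, [(1.0, 1.0), (9.0, 9.0)]))
```

The map step needs only the current centroids and one partition, so it can run independently on each node; only the per-cluster sums and counts have to be brought together in the reduce step.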

Big Data | Data Mining
It is everything in the world now. | It is the old Big Data.
Size of the data is larger. | Size of the data is smaller.
Involves storage and processing of large data sets. | Interesting patterns can be found.
Big Data is the term for a large data set. | Data mining refers to the activity of going through a big data set to look for relevant information.
Big Data is the asset. | Data mining is the handler which provides beneficial results.
"Big data" varies depending on the capabilities of the organization managing the set, and on the capabilities of the applications that are traditionally used to process and analyse the data. | Data mining refers to operations that involve relatively sophisticated search operations.

Table 2. Differences between Data Mining and Big Data

VI. Challenges in BIG DATA

M
