Data Mining Tutorial

Authors:

Seth Paul, Jamie MacLennan, Zhaohui Tang, Scott Oveson

Abstract:

Microsoft® SQL Server™ 2005 provides an integrated environment for creating and working with data mining models. This tutorial uses four scenarios (targeted mailing, forecasting, market basket, and sequence clustering) to demonstrate how to use the mining model algorithms, mining model viewers, and data mining tools that are included in this release of SQL Server.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2003 Microsoft Corporation. All rights reserved.

Microsoft is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Introduction

The data mining tutorial is designed to walk you through the process of creating data mining models in Microsoft SQL Server 2005. The data mining algorithms and tools in SQL Server 2005 make it easy to build a comprehensive solution for a variety of projects, including market basket analysis, forecasting analysis, and targeted mailing analysis. The scenarios for these solutions are explained in greater detail later in the tutorial.

The most visible components in SQL Server 2005 are the workspaces that you use to create and work with data mining models. The online analytical processing (OLAP) and data mining tools are consolidated into two working environments: Business Intelligence Development Studio and SQL Server Management Studio.

Using Business Intelligence Development Studio, you can develop an Analysis Services project disconnected from the server. When the project is ready, you can deploy it to the server. You can also work directly against the server. The main function of SQL Server Management Studio is to manage the server. Each environment is described in more detail later in this introduction. For more information on choosing between the two environments, see "Choosing Between SQL Server Management Studio and Business Intelligence Development Studio" in SQL Server Books Online.

All of the data mining tools are available in the data mining editor. Using the editor, you can manage mining models, create new models, view models, compare models, and create predictions based on existing models.

After you build a mining model, you will want to explore it, looking for interesting patterns and rules. Each mining model viewer in the editor is customized to explore models built with a specific algorithm. For more information about the viewers, see "Viewing a Data Mining Model" in SQL Server Books Online.

Often your project will contain several mining models, so before you can use a model to create predictions, you need to be able to determine which model is the most accurate. For this reason, the editor contains a model comparison tool called the Mining Accuracy Chart tab. Using this tool, you can compare the predictive accuracy of your models and determine the best model.
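The Mining Accuracy Chart tab does this comparison interactively, but the underlying idea of scoring several models against the same held-out cases can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how the tab is implemented: it assumes you already have each model's predictions for the same test cases, it scores them by simple accuracy (the fraction of correct predictions) rather than the charts the tool displays, and the model names, the `accuracy` and `best_model` helpers, and all data values are hypothetical.

```python
from typing import Dict, List, Sequence


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of test cases where the prediction matches the actual value."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one test case")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def best_model(predictions: Dict[str, List[int]], actual: List[int]) -> str:
    """Return the name of the model with the highest accuracy on the test cases."""
    scores = {name: accuracy(preds, actual) for name, preds in predictions.items()}
    return max(scores, key=lambda name: scores[name])


# Hypothetical held-out test cases: 1 = customer bought a bike, 0 = did not.
actual = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical predictions from three candidate models for the same cases.
predictions = {
    "Decision Trees": [1, 0, 1, 0, 0, 0, 1, 0],  # 7 of 8 correct
    "Naive Bayes": [1, 1, 1, 0, 0, 1, 1, 0],     # 5 of 8 correct
    "Clustering": [0, 0, 1, 0, 1, 0, 1, 0],      # 5 of 8 correct
}

for name, preds in predictions.items():
    print(f"{name}: {accuracy(preds, actual):.3f}")
print("Best model:", best_model(predictions, actual))
```

With this made-up data the sketch reports Decision Trees as the best model (0.875 versus 0.625 for the other two). Plain accuracy can be misleading when one outcome is much rarer than the other, which is one reason comparison tools also plot lift rather than relying on a single score.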

To create predictions, you will use the Data Mining Extensions (DMX) language. DMX extends SQL with commands to create, modify, and predict against mining models. For more information about DMX, see "Data Mining Extensions (DMX) Reference" in SQL Server Books Online. Because creating a prediction can be complicated, the data mining editor contains a tool called Prediction Query Builder, which allows you to build queries using a graphical interface. You can also view the DMX code that is generated by the query builder.

Just as important as the tools that you use to create and work with data mining models are the mechanics by which the models are created. The key to creating a mining model is the data mining algorithm. The algorithm finds patterns in the data that you pass it and translates them into a mining model; it is the engine behind the process. SQL Server 2005 includes nine algorithms:

∙ Microsoft Decision Trees

∙ Microsoft Clustering

∙ Microsoft Naïve Bayes

∙ Microsoft Sequence Clustering

∙ Microsoft Time Series

∙ Microsoft Association

∙ Microsoft Neural Network

∙ Microsoft Linear Regression

∙ Microsoft Logistic Regression

Using a combination of these nine algorithms, you can create solutions to common business problems. These algorithms are described in more detail later in this tutorial.

Some of the most important steps in creating a data mining solution are consolidating, cleaning, and preparing the data to be used to create the mining models. SQL Server 2005 includes the Data Transformation Services (DTS) working environment, which contains tools that you can use to clean, validate, and prepare your data. For more information on using DTS in conjunction with a data mining solution, see "DTS Data Mining Tasks and Transformations" in SQL Server Books Online.

To demonstrate the SQL Server data mining features, this tutorial uses a new sample database called AdventureWorksDW. The database is included with SQL Server 2005, and it supports OLAP and data mining functionality. To make the sample database available, you need to select it at installation time in the "Advanced" dialog box for component selection.

The audience for this tutorial is business analysts, developers, and database administrators who have used data mining tools before and are familiar with data mining concepts. If you are new to data mining, download "Preparing and Mining Data with Microsoft SQL Server 2000 and Analysis Services" (

AdventureWorks

AdventureWorksDW is based on a fictional bicycle manufacturing company named Adventure Works Cycles. Adventure Works produces and distributes metal and composite bicycles to North American, European, and Asian commercial markets. The base of operations is located in Bothell, Washington, with 500 employees, and several regional sales teams are located throughout its market base.

Adventure Works sells products wholesale to specialty shops and to individuals through the Internet. For the data mining exercises, you will work with the AdventureWorksDW Internet sales tables, which contain realistic patterns that work well for data mining exercises.

For more information on Adventure Works Cycles, see "Sample Databases and Business Scenarios" in SQL Server Books Online.

Database Details

The Internet sales schema contains information about 9,242 customers. These customers live in six countries, which are combined into three regions:

∙ North America (83%)

∙ Europe (12%)

∙ Australia (7%)

The database contains data for three fiscal years: 2002, 2003, and 2004.

The products in the database are broken down by subcategory, model, and product.

Business Intelligence Development Studio

Business Intelligence Development Studio is a set of tools designed for creating business intelligence projects. Because Business Intelligence Development Studio was created as an integrated development environment (IDE) in which you can create a complete solution, you work disconnected from the server. You can change your data mining objects as much as you want, but the changes are not reflected on the server until after you deploy the project.

Working in an IDE is beneficial for the following reasons:

∙ You have powerful customization tools available to configure Business Intelligence Development Studio to suit your needs.

∙ You can integrate your Analysis Services project with a variety of other business intelligence projects, encapsulating your entire solution in a single view.

∙ Full source control integration enables your entire team to collaborate in creating a complete business intelligence solution.

The Analysis Services project is the entry point for a business intelligence solution. An Analysis Services project encapsulates mining models and OLAP cubes, along with supplemental objects that make up the Analysis Services database. From Business Intelligence Development Studio, you can create and edit Analysis Services objects within a project and deploy the project to the appropriate Analysis Services server or servers.

If you are working with an existing Analysis Services project, you can also use Business Intelligence Development Studio to work connected to the server. In this way, changes are reflected directly on the server without having to deploy the solution.

SQL Server Management Studio

SQL Server Management Studio is a collection of administrative and scripting tools for working with Microsoft SQL Server components. This workspace differs from Business Intelligence Development Studio in that you are working in a connected environment where actions are prop
