How to choose algorithms for Microsoft Azure Machine Learning

18/12/2017

The answer to the question "What machine learning algorithm should I use?" is always "It depends." It depends on the size, quality, and nature of the data. It depends on what you want to do with the answer. It depends on how the math of the algorithm was translated into instructions for the computer you are using. And it depends on how much time you have. Even the most experienced data scientists can't tell which algorithm will perform best before trying them.

The Machine Learning Algorithm Cheat Sheet

The Microsoft Azure Machine Learning Algorithm Cheat Sheet helps you choose the right machine learning algorithm for your predictive analytics solutions from the Microsoft Azure Machine Learning library of algorithms. This article walks you through how to use it.

Note: To download the cheat sheet and follow along with this article, go to Machine learning algorithm cheat sheet for Microsoft Azure Machine Learning Studio.

This cheat sheet has a very specific audience in mind: a beginning data scientist with undergraduate-level machine learning, trying to choose an algorithm to start with in Azure Machine Learning Studio. That means that it makes some generalizations and oversimplifications, but it points you in a safe direction. It also means that there are lots of algorithms not listed here. As Azure Machine Learning grows to encompass a more complete set of available methods, we'll add them.

These recommendations are compiled feedback and tips from many data scientists and machine learning experts. We didn't agree on everything, but I've tried to harmonize our opinions into a rough consensus. Most of the statements of disagreement begin with "It depends…"

How to use the cheat sheet

Read the path and algorithm labels on the chart as "For [path label], use [algorithm]." For example, "For speed, use two class logistic regression." Sometimes more than one branch applies. Sometimes none of them are a perfect fit. They're intended to be rule-of-thumb recommendations, so don't worry about it being exact. Several data scientists I talked with said that the only sure way to find the very best algorithm is to try all of them.

Here's an example from the Azure AI Gallery of an experiment that tries several algorithms against the same data and compares the results: Compare Multi-class Classifiers: Letter recognition.

Tip: To download and print a diagram that gives an overview of the capabilities of Machine Learning Studio, see Overview diagram of Azure Machine Learning Studio capabilities.
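
The advice to try several algorithms against the same data and compare the results can be sketched in a few lines. This is a minimal illustration in plain Python, not the Azure AI Gallery experiment: the synthetic two-blob data set, the two toy classifiers (nearest centroid and 1-nearest neighbor), and every function name here are assumptions made up for the example.

```python
import random


def make_data(n, seed=0):
    """Synthetic data (an assumption): two Gaussian blobs in 2-D, labeled 0 and 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 3.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_nearest_centroid(train):
    """Return a predictor that picks the class whose mean point is closest."""
    sums = {}
    for (x, y), label in train:
        s = sums.setdefault(label, [0.0, 0.0, 0])
        s[0] += x
        s[1] += y
        s[2] += 1
    centroids = {lab: (s[0] / s[2], s[1] / s[2]) for lab, s in sums.items()}
    return lambda p: min(centroids, key=lambda lab: dist2(p, centroids[lab]))


def fit_one_nn(train):
    """Return a predictor that copies the label of the closest training point."""
    return lambda p: min(train, key=lambda ex: dist2(p, ex[0]))[1]


def accuracy(predict, test):
    """Fraction of held-out examples predicted correctly."""
    return sum(predict(p) == y for p, y in test) / len(test)


def compare(algorithms, train, test):
    """Train every candidate on the same data and score it on the same test set."""
    return {name: accuracy(fit(train), test) for name, fit in algorithms.items()}


data = make_data(400)
train, test = data[:300], data[300:]
results = compare(
    {"nearest centroid": fit_nearest_centroid, "1-nearest neighbor": fit_one_nn},
    train,
    test,
)
for name, acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.2f}")
```

The point of the design is that every candidate sees the same training split and is scored on the same held-out split, so the numbers are directly comparable.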

Flavors of machine learning

Supervised

Supervised learning algorithms make predictions based on a set of examples. For instance, historical stock prices can be used to hazard guesses at future prices. Each example used for training is labeled with the value of interest, in this case the stock price. A supervised learning algorithm looks for patterns in those value labels. It can use any information that might be relevant (the day of the week, the season, the company's financial data, the type of industry, the presence of disruptive geopolitical events), and each algorithm looks for different types of patterns. After the algorithm has found the best pattern it can, it uses that pattern to make predictions for unlabeled testing data: tomorrow's prices.

Supervised learning is a popular and useful type of machine learning. With one exception, all the modules in Azure Machine Learning are supervised learning algorithms. There are several specific types of supervised learning that are represented within Azure Machine Learning: classification, regression, and anomaly detection.

Classification. When the data are being used to predict a category, supervised learning is also called classification. This is the case when assigning an image as a picture of either a "cat" or a "dog". When there are only two choices, it's called two-class or binomial classification. When there are more categories, as when predicting the winner of the NCAA March Madness tournament, this problem is known as multi-class classification.

Regression. When a value is being predicted, as with stock prices, supervised learning is called regression.

Anomaly detection. Sometimes the goal is to identify data points that are simply unusual. In fraud detection, for example, any highly unusual credit card spending patterns are suspect. The possible variations are so numerous and the training examples so few that it's not feasible to learn what fraudulent activity looks like. The approach that anomaly detection takes is to simply learn what normal activity looks like (using a history of non-fraudulent transactions) and identify anything that is significantly different.
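
The anomaly detection idea described above, learning only what normal activity looks like and flagging anything far from it, can be illustrated with a deliberately simple z-score rule. This is a sketch under stated assumptions, not how any Azure Machine Learning module works: the transaction amounts are made up, and the threshold of 3 standard deviations and both function names are hypothetical choices.

```python
import statistics


def fit_normal_profile(amounts):
    """Summarize normal (non-fraudulent) history by its mean and standard deviation."""
    return statistics.mean(amounts), statistics.stdev(amounts)


def is_anomalous(amount, profile, threshold=3.0):
    """Flag an amount whose distance from the mean exceeds `threshold` standard deviations."""
    mean, stdev = profile
    return abs(amount - mean) / stdev > threshold


# Made-up history of non-fraudulent transaction amounts (an assumption).
normal_history = [12.5, 40.0, 23.1, 8.75, 31.0, 19.9, 27.4, 15.0, 22.3, 35.6]
profile = fit_normal_profile(normal_history)

new_transactions = [18.0, 29.5, 950.0]
flags = [is_anomalous(amount, profile) for amount in new_transactions]
for amount, flagged in zip(new_transactions, flags):
    print(f"{amount:>7.2f} -> {'suspect' if flagged else 'normal'}")
```

No fraudulent examples are used for training. The model only describes normal spending, which matches the reasoning in the text about fraud examples being too few and too varied to learn from.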

Unsupervised

In unsupervised learning, data points have no labels associated with them. Instead, the goal of an unsupervised learning algorithm is to organize the data in some way or to describe its structure. This can mean grouping it into clusters or finding different ways of looking at complex data so that it appears simpler or more organized.
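
As one concrete instance of grouping unlabeled data into clusters, here is a minimal k-means sketch in plain Python. The article does not name a specific clustering method, so k-means is my choice for illustration. The six sample points, the naive initialization from the first k points, and the function names are all assumptions.

```python
def nearest_index(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2
        + (point[1] - centroids[i][1]) ** 2,
    )


def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; returns final centroids and cluster assignments."""
    centroids = list(points[:k])  # naive initialization: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster; keep it in place if empty.
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c
            else centroids[i]
            for i, c in enumerate(clusters)
        ]
    assignments = [nearest_index(p, centroids) for p in points]
    return centroids, assignments


# Unlabeled points (made up): two visibly separate groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.1, 7.9), (7.8, 8.2)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
print(centroids)
```

The algorithm never sees a label. The cluster indices it returns describe structure found in the data, and their numbering is arbitrary.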

Reinforcement learning

In reinforcement learning, the algorithm gets to choose an action in response to each data point. The learning algorithm also receives a reward signal a short time later, indicating how good the decision was. Based on this, the algorithm modifies its strategy in order to achieve the highest reward. Currently there are no reinforcement learning algorithm modules in Azure Machine Learning. Reinforcement learning is common in robotics, where the set of sensor readings at one point in time is a data point, and the algorithm must choose the robot's next action. It is also a natural fit for Internet of Things applications.
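
The loop of choosing an action, receiving a reward, and adjusting the strategy can be sketched with an epsilon-greedy multi-armed bandit. This is a simplification I am assuming for illustration: a bandit has no per-data-point state such as sensor readings, so it shows only the reward-driven update, not a full reinforcement learning system. The reward probabilities, the epsilon value, and the function name are hypothetical.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: mostly exploit the best-looking action, sometimes explore."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    counts = [0] * n_actions    # how often each action was chosen
    values = [0.0] * n_actions  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: random action
        else:
            action = max(range(n_actions), key=values.__getitem__)  # exploit
        # Simulated environment: reward 1 with the action's hidden probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean update: this is where the strategy is modified.
        values[action] += (reward - values[action]) / counts[action]
    return counts, values


counts, values = run_bandit([0.2, 0.8])
print("times each action was chosen:", counts)
print("estimated reward per action:", [round(v, 2) for v in values])
```

Over time the running averages steer the choice toward the action with the higher reward probability, while the occasional random pick keeps the estimates for the other action from going stale.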

Considerations when choosing an algorithm

Accuracy

Getting the most accurate answer possible isn't always necessary. Sometimes an approximation is adequate, depending on what you want to use it for. If that's the case, you may be able to cut your processing time dramatically by sticking with more approximate methods. Another advantage of more approximate methods is that they naturally tend to avoid overfitting.

Training time

The number of minutes or hours necessary to train a model varies a great deal between algorithms. Training time is often closely tied to accuracy; one typically accompanies the other. In addition, some algorithms are more sensitive to the number of data points than others. When time is limited, it can drive the choice of algorithm, especially when the data set is large.

Linearity

Lots of machine learning algorithms make use of linearity. Linear classification algorithms assume that classes can be separated by a straight line (or its higher-dimensional analog). These include logistic regression and support vector machines (as implemented in Azure Machine Learning). Linear regression algorithms assume that data trends follow a straight line. These assumptions aren't bad for some problems, but on others they bring accuracy down.

[Figure: Non-linear class boundary. Relying on a linear classification algorithm would result in low accuracy.]

[Figure: Data with a nonlinear trend. Using a linear regression method would generate much larger errors than necessary.]

Despite their dangers, linear algorithms are very popular as a first line of attack. They tend to be algorithmically simple and fast to train.
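
The cost of the linearity assumption on a non-linear class boundary can be shown with a tiny experiment. The sketch below trains a perceptron, a basic linear classifier that I am using as a stand-in (the article itself mentions logistic regression and support vector machines), on two four-point data sets: logical AND, which a straight line can separate, and XOR, which no straight line can. The data sets and function names are assumptions for illustration.

```python
def predict(model, x1, x2):
    """Linear decision rule: class 1 on the positive side of the line, else class 0."""
    w, b = model
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0


def train_perceptron(examples, epochs=50):
    """Classic perceptron rule: nudge the line toward each misclassified point."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), y in examples:
            error = y - predict((w, b), x1, x2)  # 0 when the point is already correct
            w[0] += error * x1
            w[1] += error * x2
            b += error
    return w, b


def accuracy(model, examples):
    """Fraction of examples the linear model classifies correctly."""
    return sum(predict(model, x1, x2) == y for (x1, x2), y in examples) / len(examples)


# Linearly separable: output is 1 only when both inputs are 1.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
# Not linearly separable: output is 1 when exactly one input is 1.
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(train_perceptron(AND_DATA), AND_DATA)
xor_acc = accuracy(train_perceptron(XOR_DATA), XOR_DATA)
print(f"AND (linear boundary):     {and_acc:.2f}")
print(f"XOR (non-linear boundary): {xor_acc:.2f}")
```

The same algorithm, with the same training budget, fits the separable problem perfectly but cannot get all four XOR points right, because at most three of them can lie on the correct side of any single line.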

Number of parameters

Parameters are the knobs a data scientist gets to turn when setting up an algorithm. They are numbers that affect the algorithm's behavior, such as error tolerance or number of iterations, or options between variants of how the algorithm behaves.
