Data Mining Techniques on Course Selection System Application


Abstract:

This technical report describes a course selection system application based on data mining techniques. The application provides association analysis, classification, clustering, and outlier detection. Which technique is applied depends on the users' requirements, their specifications, and the aspect of the data they are interested in. The main users of the application are the people who arrange course information, such as the course information registrar. The goal of the application is to give the registrar detailed information about which courses should be added, reduced, or canceled, together with course popularity analysis and support for operations on courses. The report first introduces the background knowledge and then explains in detail how that knowledge is applied in the application, that is, the design and implementation approaches. Conclusions are drawn at the end of the report.

Keywords:

data mining, course selection, application

 

1 Introduction

Data mining is an indispensable step in the process of knowledge discovery in databases. In general, data mining is a process that searches for hidden information in massive data by using specific algorithms. It is closely related to fields such as computer science, and it achieves this goal by drawing on statistics, online analytical processing, information retrieval, machine learning, expert systems (which depend on past rules of thumb), pattern recognition, and so on. Data mining techniques can be applied to a wide range of real-life problems.

The example in this report is course selection, which is a routine activity in college. Different students have different preferences in course selection, so some professors have more students than expected and others have fewer. If the number of students is much higher than expected, the professor must apply for a bigger classroom or open another session of the course at a different time. On the other hand, if the number of students is lower than expected, the professor must apply for a smaller classroom or even cancel the course. In this way, professors and students can make the most of the teaching resources. All of these decisions are made by the course information registrar, who can use the course selection system application to support them. All in all, the application helps the course information registrar arrange courses more efficiently.

2 Existing Database

The existing database is the data to which the course information registrar has access, that is, the raw data that the registrar analyzes with this application.

✓ Student Information:

Includes student personal information: student name, gender, date of birth, student ID, class number, major, grade, course selections, and so on.

✓ Professor Information:

Includes professor personal information: professor name, gender, date of birth, professor ID, courses to teach, grade, and so on.

✓ Classroom Information:

Includes classroom information: classroom number, size, available times, and so on.

✓ Course Information:

Includes course information: course name, course number, course hours, credit, and so on.

✓ History Record:

Records data about the courses that professors have taught, including the student list, professor list, and class list of every past course. This is a massive amount of data, which is stored in the database to be analyzed.
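As a rough illustration, the records listed above could be modeled as follows. This is only a sketch: every class name, field name, and helper function here is an assumption made for the example, and only a few of the listed attributes are included.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Student:
    student_id: str
    name: str
    gender: str
    major: str
    grade: int
    # Course numbers the student has selected (hypothetical representation).
    selected_courses: List[str] = field(default_factory=list)


@dataclass
class Course:
    course_number: str
    name: str
    hours: int
    credit: float


@dataclass
class Classroom:
    room_number: str
    size: int  # number of seats


@dataclass
class HistoryRecord:
    # One past offering of a course: who taught it, where, and who attended.
    course_number: str
    professor_id: str
    room_number: str
    student_ids: List[str]


def enrollment(record: HistoryRecord) -> int:
    """Number of students who attended this offering."""
    return len(record.student_ids)


def fits(record: HistoryRecord, room: Classroom) -> bool:
    """True if the classroom has enough seats for the recorded enrollment."""
    return enrollment(record) <= room.size
```

A check such as `fits` corresponds to the registrar's decision about whether a course needs a bigger or smaller classroom.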

The relationships among the data above can be described by the following graph:

3 Data Mining Techniques

3.1 Association Analysis

3.1.1 Theory

The main aspect of association analysis is frequent pattern analysis. A frequent pattern is a pattern (a set of items, subsequences, substructures, and so on) that occurs frequently in a data set. The concept was first proposed in the context of frequent itemsets and association rule mining, which aims at finding inherent regularities in data. Applications related to frequent pattern analysis include basket data analysis, cross-marketing, catalog design, sales campaign analysis, web log (click stream) analysis, and DNA sequence analysis. One scalable mining method is the Apriori algorithm, which can be described by the following flowchart:

By using the Apriori algorithm, frequent n-itemsets can be found for each value of n, where n is the number of items in an itemset. For example, by running the Apriori algorithm on the items bought in a supermarket, the clerk may find that people tend to buy beer and diapers at the same time, which helps the supermarket manager adopt sales strategies that increase sales volume. Similarly, the Apriori algorithm applies to the course selection system application.
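The level-wise search of Apriori can be sketched in a few lines of Python. This is a minimal, unoptimized version written for illustration. The function name `apriori`, the use of an absolute support count as the threshold, and the toy baskets are all assumptions of this example, not part of the application.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: scan the transactions once for this level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Invented supermarket baskets, echoing the beer-and-diaper example.
baskets = [
    {"beer", "diaper", "milk"},
    {"beer", "diaper"},
    {"milk", "bread"},
    {"beer", "diaper", "bread"},
]
frequent = apriori(baskets, min_support=2)
```

With these baskets, the only frequent 2-itemset is {beer, diaper}, which appears in three of the four transactions.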

3.1.2 Implementation

In the course selection system application, the course information registrar first logs in to the system and then chooses a data mining technique and a raw database. In this case, the technique is association analysis and the data are the student and course databases. The registrar runs the Apriori algorithm on the student and course databases and sets the itemset size as desired, for example two-item itemsets. The application then shows the frequent patterns ranked from the highest to the lowest number of occurrences, which helps the registrar analyze the data and make decisions. The pattern with the highest count identifies the two courses that are most closely related, which indicates that a large share of students select these two courses together. This suggests that the registrar should place these two courses next to each other in the course selection system so that students can select them conveniently. For example, suppose the registrar finds that 60% of students select Advanced Mathematics and Mathematical Modelling at the same time. Based on this result, the registrar should list these two courses on the same web page, which makes it more convenient for students to select related courses.
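For the two-item case, the ranking step can be illustrated by counting course pairs directly. Note that this sketch is a plain pair count, not the full Apriori algorithm. The function name `rank_course_pairs`, the dictionary layout, and the five student records are all invented for the example.

```python
from collections import Counter
from itertools import combinations


def rank_course_pairs(selections, min_support=0.0):
    """Rank course pairs by the fraction of students who selected both.

    selections maps a student ID to the set of courses that student chose.
    Returns a list of ((course_a, course_b), support), highest support first.
    """
    n = len(selections)
    if n == 0:
        return []

    pair_counts = Counter()
    for courses in selections.values():
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(courses), 2):
            pair_counts[pair] += 1

    ranked = [(pair, count / n) for pair, count in pair_counts.items()
              if count / n >= min_support]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical selections: 3 of 5 students take both mathematics courses.
selections = {
    "s1": {"Advanced Mathematics", "Mathematical Modelling"},
    "s2": {"Advanced Mathematics", "Mathematical Modelling", "Java"},
    "s3": {"Advanced Mathematics", "Mathematical Modelling"},
    "s4": {"Java"},
    "s5": {"Advanced Mathematics", "Java"},
}
ranked = rank_course_pairs(selections)
top_pair, top_support = ranked[0]
```

Here `top_pair` is the pair of mathematics courses with a support of 60%, which is the kind of result that would lead the registrar to list the two courses on the same page.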

3.2 Classification

3.2.1 Theory

Classification is also called supervised learning: the training data (observations, measurements, and so on) are accompanied by labels indicating the class of each observation, and new data are classified based on the training set. Classification predicts categorical class labels (discrete or nominal). It constructs a model based on the training set and the values (class labels) of a classifying attribute, and then uses that model to classify new data.

One way to classify data is the decision tree. Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node.

The basic decision tree algorithm is a greedy algorithm. The tree is constructed in a top-down, recursive, divide-and-conquer manner. At the start, all the training examples are at the root. Attributes are categorical (continuous-valued attributes are discretized in advance). Examples are partitioned recursively based on selected attributes. Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain). Partitioning stops when one of the following conditions holds:

✓ All samples for a given node belong to the same class.

✓ There are no remaining attributes for further partitioning; majority voting is employed to classify the leaf.

✓ There are no samples left.

The attribute to split on is selected by its information gain, which is calculated in the following steps:

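✓ Let $p_i$ be the probability that an arbitrary tuple in $D$ belongs to class $C_i$, estimated by $|C_{i,D}| / |D|$.

✓ Expected information (entropy) needed to classify a tuple in $D$, where $m$ is the number of classes:

$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$$

✓ Information needed (after using $A$ to split $D$ into $v$ partitions $D_1, \dots, D_v$) to classify $D$:

$$\mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j)$$

✓ Information gained by branching on attribute $A$:

$$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$$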

After calculating the information gain of each attribute, select the attribute with the highest information gain. This approach minimizes the expected number of tests needed to classify a given tuple and guarantees that a simple (but not necessarily the simplest) tree is found. All in all, the decision tree is well suited to course selection information.
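The information gain calculation can be checked on a small example. The sketch below follows the steps above directly. The function names, the dictionary-based row format, and the four student records are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Expected information needed to classify a tuple, in bits."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def info_after_split(rows, attribute, target):
    """Weighted entropy of the partitions produced by splitting on attribute."""
    n = len(rows)
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    return sum(len(part) / n * entropy(part) for part in partitions.values())


def information_gain(rows, attribute, target):
    """Entropy before the split minus the weighted entropy after it."""
    before = entropy([row[target] for row in rows])
    return before - info_after_split(rows, attribute, target)


# Invented records: in this toy data, major fully determines the choice.
students = [
    {"major": "CS", "gender": "M", "selects_java": "yes"},
    {"major": "CS", "gender": "F", "selects_java": "yes"},
    {"major": "Language", "gender": "M", "selects_java": "no"},
    {"major": "Language", "gender": "F", "selects_java": "no"},
]
gain_major = information_gain(students, "major", "selects_java")
gain_gender = information_gain(students, "gender", "selects_java")
```

In this toy data, splitting on major removes all uncertainty (a gain of 1 bit), while splitting on gender removes none, so major would be chosen as the test attribute.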

3.2.2 Implementation

In the course selection system application, different courses attract different groups of students. By analyzing the student attributes and constructing a decision tree for each course, patterns can be found that are helpful for analysis. The training data set is the student database. First, calculate the information gain of each attribute and select the attribute with the highest information gain. Then run the decision tree algorithm and classify the data. In more detail, suppose the attribute major has the highest information gain. Then major becomes the root of the decision tree; the information gain is recalculated within each branch, the attribute with the highest gain is chosen again, and so on. The process can be described by the following flowchart:
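The recursive procedure described above can be sketched as a small ID3-style tree builder. This is an illustrative sketch under stated assumptions, not the application's actual code: the function names, the nested-dictionary tree format, and the seven student records are all invented for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def gain(rows, attribute, target):
    """Information gain of splitting rows on attribute."""
    n = len(rows)
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def build_tree(rows, attributes, target):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    labels = [row[target] for row in rows]
    # Stop: all samples at this node belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no attributes remain, so fall back to majority voting.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy choice: split on the attribute with the highest gain.
    best = max(attributes, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    # Only values present in rows are branched on, so no subset is empty.
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining, target)
    return {best: branches}


def classify(tree, row):
    """Follow the tests from the root down to a leaf label.

    Raises KeyError for an attribute value never seen during training.
    """
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][row[attribute]]
    return tree


# Invented training records for one course (Java).
records = [
    {"major": "CS", "gender": "M", "java": "yes"},
    {"major": "CS", "gender": "F", "java": "yes"},
    {"major": "CS", "gender": "F", "java": "yes"},
    {"major": "CS", "gender": "F", "java": "yes"},
    {"major": "Language", "gender": "M", "java": "yes"},
    {"major": "Language", "gender": "F", "java": "no"},
    {"major": "Language", "gender": "F", "java": "no"},
]
tree = build_tree(records, ["major", "gender"], "java")
```

On these records, major has the higher information gain and becomes the root, and only the Language branch needs a second test on gender.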

In this application, running the decision tree algorithm reveals such patterns. For example, for each course, such as Java, a decision tree can be drawn based on the algorithm. The decision tree is as follows:

In this diagram, the root is major, which has the highest information gain: students whose major is computer science will select the Java course. For students whose major is language, the decision depends on gender. If the gender is male, the student will select the c
