Data Mining Techniques on Course Selection System Application.docx
Abstract:
This technical report describes a course selection system application based on data mining techniques. The application provides four functions: association analysis, classification, clustering, and outlier detection. Each technique is applied according to the users' requirements, specifications, and the aspects of the data they are interested in. The main users of the application are the people who arrange course information, such as the course information registrar. The goal of the application is to give the registrar detailed information about which courses should be added, reduced, or canceled, together with course popularity analysis and related course operations. The report first introduces the necessary background knowledge and then explains in detail how that knowledge is applied in the application, that is, the design and implementation approaches. A conclusion is drawn at the end of the report.
Keywords:
data mining, course selection, application
1 Introduction
Data mining is an indispensable step in the process of knowledge discovery in databases. In general, data mining searches massive data for hidden information by using specific algorithms. It is closely related to computer science and draws on statistics, online analytical processing, information retrieval, machine learning, expert systems (which rely on past rules of thumb), and pattern recognition to achieve this goal. Data mining techniques apply to a wide range of real-life problems. Course selection, the example in this report, is a routine part of college life. Different students have different course preferences, so some professors end up with more students than expected and others with fewer. If the number of students is much larger than expected, the professor must apply for a bigger classroom or open another course at a different time. If the number of students is smaller than expected, the professor must apply for a smaller classroom or even cancel the course. In this way, professors and students can make the most of the teaching resources. All of these decisions are made by the course information registrar, who can use the course selection system application to support them. In short, the application helps the registrar arrange courses more efficiently.
2 Existing Database
The existing database is the data that the course information registrar has access to, that is, the raw data that the registrar analyzes with this application.
✓ Student Information:
Student personal information, including student name, gender, date of birth, student ID, class number, major, grade, selected courses, etc.
✓ Professor Information:
Professor personal information, including professor name, gender, date of birth, professor ID, courses to teach, grade, etc.
✓ Classroom Information:
Classroom information, including classroom number, size, available time, etc.
✓ Course Information:
Course information, including course name, course number, course hours, credits, etc.
✓ History Record:
Records of the courses that professors have taught, including the student list, professor list, and class list of every past course. This is a massive amount of data that is stored in the database for analysis.
The relationships among the data above can be described by the following graph:
3 Data Mining Techniques
3.1 Association Analysis
3.1.1 Theory
The main aspect of association analysis is frequent pattern analysis. A frequent pattern is a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set. The concept was first proposed in the context of frequent itemsets and association rule mining, which aims to find inherent regularities in data. Applications of frequent pattern analysis include basket data analysis, cross-marketing, catalog design, sales campaign analysis, web log (click stream) analysis, and DNA sequence analysis. One scalable frequent pattern mining method is the Apriori algorithm, which can be described by the following flowchart:
With the Apriori algorithm, frequent n-itemsets can be found for increasing n, where n is the number of items in an itemset. For example, by running the Apriori algorithm on a supermarket's purchase records, a clerk may find that customers often buy beer and diapers together, which helps the supermarket manager adopt sales strategies that increase sales volume. The Apriori algorithm applies equally to the course selection system application.
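The level-wise search described above can be illustrated with a minimal Python sketch. It is not the report's implementation: the function name `apriori`, the absolute-count threshold `min_support`, and the sample baskets are all assumptions made for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset seen in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Count step: scan the transactions once per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical supermarket baskets (made-up data).
baskets = [
    {"beer", "diapers", "milk"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"diapers", "milk"},
    {"beer", "diapers", "bread"},
]
frequent_itemsets = apriori(baskets, min_support=3)
```

On this toy data, the only frequent 2-itemset is beer and diapers, which mirrors the supermarket example. The prune step is what distinguishes Apriori from brute-force counting: a candidate is discarded without scanning the data if any of its subsets is already known to be infrequent.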
3.1.2 Implementation
In the course selection system application, the course information registrar first logs into the system and then chooses a data mining technique and a raw database. Here the technique is association analysis and the data come from the student and course databases. The registrar runs the Apriori algorithm on these data and sets the itemset size as desired, for example 2-itemsets. The application then shows the frequent patterns ranked from the highest to the lowest occurrence count, which helps the registrar analyze the data and make decisions. The pattern with the highest count identifies the two most closely related courses, that is, the pair that the most students select together. This suggests that the registrar should place these two courses next to each other in the course selection system so that students can select them conveniently. For example, suppose the registrar finds that 60% of students select Advanced Mathematics and Mathematical Modelling together. Based on this result, the registrar should place the two courses on the same web page, which makes it convenient for students to select related courses.
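For the 2-itemset case, the ranking step can be sketched directly by counting course pairs. The snippet below is an illustrative assumption, not the application's code: the function `rank_course_pairs`, the student IDs, and the selection data are hypothetical, chosen so that the top pair reaches the 60% figure used in the example.

```python
from collections import Counter
from itertools import combinations

def rank_course_pairs(selections):
    """Return (pair, support) tuples sorted from most to least frequent pair."""
    pair_counts = Counter()
    for courses in selections.values():
        # Sorting gives each pair one canonical order, so (A, B) and (B, A) are counted together.
        for pair in combinations(sorted(set(courses)), 2):
            pair_counts[pair] += 1
    n = len(selections)
    # Rank by descending count; ties are broken alphabetically for a stable order.
    ranked = sorted(pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(pair, count / n) for pair, count in ranked]

# Hypothetical student -> selected courses mapping (made-up data).
selections = {
    "s1": ["Advanced Mathematics", "Mathematical Modelling", "Java"],
    "s2": ["Advanced Mathematics", "Mathematical Modelling"],
    "s3": ["Advanced Mathematics", "Mathematical Modelling", "English"],
    "s4": ["Java", "English"],
    "s5": ["Advanced Mathematics", "Java"],
}
ranked_pairs = rank_course_pairs(selections)
top_pair, top_support = ranked_pairs[0]
```

Here `top_pair` is the pair the registrar would place on the same page, and `top_support` is the fraction of students who selected both courses.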
3.2 Classification
3.2.1 Theory
Classification is a form of supervised learning, in which the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation, and new data are classified based on the training set. Classification predicts categorical class labels (discrete or nominal): it constructs a model from the training set and the values (class labels) of a classifying attribute, and then uses that model to classify new data. One classification method is the decision tree. Decision tree induction is the learning of decision trees from class-labeled training tuples. A decision tree is a flowchart-like tree structure in which each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node in a tree is the root node. The basic decision tree algorithm is a greedy algorithm. The tree is constructed in a top-down, recursive, divide-and-conquer manner. At the start, all the training examples are at the root. Attributes are categorical (continuous-valued attributes are discretized in advance). Examples are partitioned recursively based on selected attributes. Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain). Partitioning stops when any of the following conditions holds:
✓ All samples for a given node belong to the same class.
✓ There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf.
✓ There are no samples left.
The attribute to split on is selected by information gain, which is calculated as follows:
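✓ Let $p_i$ be the probability that an arbitrary tuple in $D$ belongs to class $C_i$, estimated by $|C_{i,D}|/|D|$.
✓ Expected information (entropy) needed to classify a tuple in $D$, where $m$ is the number of classes: $Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$
✓ Information needed (after using $A$ to split $D$ into $v$ partitions $D_1, \dots, D_v$) to classify $D$: $Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$
✓ Information gained by branching on attribute $A$: $Gain(A) = Info(D) - Info_A(D)$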
The attribute with the highest information gain is selected as the splitting attribute. This approach minimizes the expected number of tests needed to classify a given tuple and guarantees that a simple (but not necessarily the simplest) tree is found. For these reasons, the decision tree is well suited to the course selection data.
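As a worked illustration of the entropy and information gain calculation, the following Python sketch evaluates both quantities on a small table. The student rows, the attribute names, and the target column `selects_java` are made-up assumptions and are not taken from the report's database.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Info(D): expected information needed to classify a tuple in D."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(rows, attribute, target):
    """Gain(A) = Info(D) - Info_A(D), where rows is a list of dicts."""
    base = entropy([r[target] for r in rows])
    # Split D into partitions D_j, one per observed value of the attribute.
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[target])
    # Info_A(D): partition entropies weighted by |D_j| / |D|.
    weighted = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return base - weighted

# Hypothetical training tuples (made-up data).
students = [
    {"major": "computer science", "gender": "male", "selects_java": "yes"},
    {"major": "computer science", "gender": "female", "selects_java": "yes"},
    {"major": "language", "gender": "male", "selects_java": "yes"},
    {"major": "language", "gender": "female", "selects_java": "no"},
    {"major": "art", "gender": "male", "selects_java": "no"},
    {"major": "art", "gender": "female", "selects_java": "no"},
]
gain_major = info_gain(students, "major", "selects_java")
gain_gender = info_gain(students, "gender", "selects_java")
```

On this toy table, `gain_major` is larger than `gain_gender`, so major would be chosen as the first splitting attribute.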
3.2.2 Implementation
In the course selection system application, different courses may be selected by different groups of students. By analyzing the student attributes and constructing a decision tree for each course, patterns can be found that are helpful for analysis. The training data set is the student database. First, calculate the information gain of each attribute and select the attribute with the highest information gain. Then run the decision tree algorithm and classify the data. In more detail, suppose the attribute major has the highest information gain; major then becomes the root of the decision tree. The information gain is recalculated within each branch, the attribute with the highest gain is selected again, and so on. The process can be described by the following flowchart:
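The recursive procedure described above can also be sketched in Python as a minimal ID3-style builder. This is an assumption-laden illustration rather than the application's implementation: the nested-dict tree format, the helper names, and the training rows are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def split(rows, attribute):
    """Group rows by their value of the given attribute."""
    parts = {}
    for r in rows:
        parts.setdefault(r[attribute], []).append(r)
    return parts

def gain(rows, attribute, target):
    remainder = sum(len(p) / len(rows) * entropy([r[target] for r in p])
                    for p in split(rows, attribute).values())
    return entropy([r[target] for r in rows]) - remainder

def build_tree(rows, attributes, target):
    labels = [r[target] for r in rows]
    # Stop: all samples at this node belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no attributes remain, so the leaf takes the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a, target))
    rest = [a for a in attributes if a != best]
    # Branches are created only for observed values, so no branch is ever empty.
    return {best: {value: build_tree(part, rest, target)
                   for value, part in split(rows, best).items()}}

def classify(tree, row):
    """Follow branches until a leaf label is reached (raises KeyError on unseen values)."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][row[attribute]]
    return tree

# Hypothetical training tuples (made-up data).
students = [
    {"major": "computer science", "gender": "male", "selects_java": "yes"},
    {"major": "computer science", "gender": "female", "selects_java": "yes"},
    {"major": "language", "gender": "male", "selects_java": "yes"},
    {"major": "language", "gender": "female", "selects_java": "no"},
    {"major": "art", "gender": "male", "selects_java": "no"},
    {"major": "art", "gender": "female", "selects_java": "no"},
]
tree = build_tree(students, ["major", "gender"], "selects_java")
```

With this toy data, major becomes the root and only the language branch needs a further test on gender. Because the sketch branches only on values that occur in the data, the "no samples left" stopping condition never arises here; an implementation that branches on every possible attribute value would need to handle it.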
In this application, running the decision tree algorithm reveals such patterns. For example, for each course, such as Java, a decision tree can be drawn based on the algorithm. The decision tree is as follows:
In this diagram, the root is major, which has the highest information gain. A student whose major is computer science will select the Java course. For a student whose major is language, the decision depends on gender. If the gender is male, the student will select the c