Data Mining Lab Report
given a set of points in some space, it groups together points that are closely packed together (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most common clustering algorithms and also one of the most cited in the scientific literature.

II. Experiment Design

1. K-Means

Algorithm idea:
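Pick any k points from the point set as the initial centers. Compare every point against the k centers and assign it to the cluster whose center is nearest. Once all points have been assigned, recompute the center of each cluster. Repeat this process until the centers no longer change.

Harbin Institute of Technology (哈尔滨工业大学), Page 2 of 10, Designed by 谢浩哲

Program flowchart:
[Figure: K-Means program flowchart]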
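The loop described above (assign, recompute, repeat until stable) can be sketched in a few lines. This is only an illustration under stated assumptions, not the report's implementation: the class `KMeansStepSketch` and its methods are hypothetical names, points are plain `double[2]` arrays instead of the report's `Point` and `Cluster` classes, distance is assumed to be Euclidean, and an empty cluster simply keeps its previous center.

```java
import java.util.Arrays;

// Hypothetical sketch: points and centroids are double[2] arrays,
// not the Point and Cluster classes used in the report.
class KMeansStepSketch {
    // Squared Euclidean distance between two 2-D points (assumed metric).
    static double squaredDistance(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }

    // Index of the centroid closest to p; ties go to the lowest index.
    static int closestCentroid(double[] p, double[][] centroids) {
        int best = 0;
        for (int i = 1; i < centroids.length; ++i) {
            if (squaredDistance(p, centroids[i]) < squaredDistance(p, centroids[best])) {
                best = i;
            }
        }
        return best;
    }

    // One iteration: assign each point to its nearest centroid, then
    // return the mean of every group. An empty group keeps its old
    // centroid here (an assumption made for this sketch).
    static double[][] updateCentroids(double[][] points, double[][] centroids) {
        int k = centroids.length;
        double[][] sums = new double[k][2];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = closestCentroid(p, centroids);
            sums[c][0] += p[0];
            sums[c][1] += p[1];
            counts[c]++;
        }
        double[][] updated = new double[k][2];
        for (int i = 0; i < k; ++i) {
            if (counts[i] == 0) {
                updated[i] = centroids[i].clone();
            } else {
                updated[i][0] = sums[i][0] / counts[i];
                updated[i][1] = sums[i][1] / counts[i];
            }
        }
        return updated;
    }

    // Repeat the update step until the centroids stop changing.
    static double[][] run(double[][] points, double[][] initialCentroids) {
        double[][] centroids = initialCentroids;
        while (true) {
            double[][] updated = updateCentroids(points, centroids);
            if (Arrays.deepEquals(updated, centroids)) {
                return updated;
            }
            centroids = updated;
        }
    }
}
```

Calling `KMeansStepSketch.run(points, initialCentroids)` returns the final centers; the stopping test is exact equality of the recomputed centers, which mirrors the "centers no longer change" condition in the description.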
Core code:

    public class KMeans {
        // Point, Cluster and the helpers getInitialClusters,
        // getClosestClusters and isClustersTheSame are not part of this excerpt.
        public Cluster[] getClusters(int k, Point[] points) {
            if (k >= points.length) {
                return null;
            }

            Cluster[] clusters = getInitialClusters(k, points);
            Cluster[] newClusters = null;
            do {
                // Assign the points to the current clusters and recompute the centers.
                newClusters = getClusters(k, points, clusters);

                // Stop once the centers no longer change.
                if (isClustersTheSame(clusters, newClusters)) {
                    break;
                }

                clusters = newClusters;
            } while (true);
            return clusters;
        }

        private Cluster[] getClusters(int k, Point[] points, Cluster[] cluster) {
            // Assign every point to its closest cluster.
            for (int i = 0; i < points.length; ++i) {
                Point currentPoint = points[i];
                Cluster c = getClosestClusters(currentPoint, cluster);
                c.points.add(currentPoint);
            }

            // Build the new clusters from the recomputed centers.
            Cluster[] newClusters = new Cluster[k];
            for (int i = 0; i < k; ++i) {
                Cluster c = cluster[i];
                int numberOfPointsInCluster = c.points.size();

                if (numberOfPointsInCluster == 0) {
                    // If the cluster is empty, reseed it from a random point
                    int randomIndex = (int) (Math.random() * points.length);
                    newClusters[i] = new Cluster(points[randomIndex]);
                } else {
                    // If the cluster is not empty, use the mean of its points
                    double newCentroidX = 0;
                    double newCentroidY = 0;
                    for (int j = 0; j < numberOfPointsInCluster; ++j) {
                        Point p = c.points.get(j);
                        newCentroidX += p.x;
                        newCentroidY += p.y;
                    }
                    newCentroidX /= numberOfPointsInCluster;
                    newCentroidY /= numberOfPointsInCluster;

                    Cluster newCluster = new Cluster(new Point(newCentroidX, newCentroidY));
                    newClusters[i] = newCluster;
                }
            }
            return newClusters;
        }
    }

2. AGNES (hierarchical clustering)

Algorithm idea:
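The algorithm uses Group Average as the merge criterion. In the first iteration, the pair with the smallest Group Average among the n points is merged; the merged cluster is added to the list, the two original clusters are removed, and the Group Average between the new cluster and the other n-2 clusters is recomputed. These steps are repeated until all clusters have been merged.

Program flowchart:
[Figure: AGNES program flowchart]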
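The Group Average criterion itself is not shown in the report, so the following is a minimal sketch of one common definition: the mean of all pairwise distances between a point of one cluster and a point of the other. The class `GroupAverageSketch`, the `double[2]` point representation, and the use of Euclidean distance are assumptions made for illustration only.

```java
import java.util.List;

// Hypothetical sketch of the Group Average proximity between two clusters.
// Clusters are lists of double[2] points; the report's Cluster class is not used.
class GroupAverageSketch {
    // Euclidean distance between two 2-D points (assumed metric).
    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // Mean of the distances over every pair (p, q) with p in c1 and q in c2.
    static double groupAverage(List<double[]> c1, List<double[]> c2) {
        if (c1.isEmpty() || c2.isEmpty()) {
            throw new IllegalArgumentException("clusters must not be empty");
        }
        double total = 0;
        for (double[] p : c1) {
            for (double[] q : c2) {
                total += distance(p, q);
            }
        }
        // Cast before multiplying so large cluster sizes cannot overflow int.
        return total / ((double) c1.size() * c2.size());
    }
}
```

With this definition, a smaller value means the two clusters are closer on average, which is why the merge step picks the pair with the minimum value.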
Core code:

    import java.util.List;

    public class Agnes {
        public Cluster getCluster(List<Cluster> clusters) {
            while (clusters.size() > 1) {
                double minProximity = Double.MAX_VALUE;
                int minProximityIndex1 = 0, minProximityIndex2 = 0;

                // Find the pair of clusters with the smallest proximity.
                for (int i = 0; i < clusters.size(); ++i) {
                    for (int j = i + 1; j < clusters.size(); ++j) {
                        double proximity = getProximity(clusters.get(i), clusters.get(j));

                        if (proximity < minProximity) {
                            minProximity = proximity;
                            minProximityIndex1 = i;
                            minProximityIndex2 = j;
                        }
                    }
                }

                // Merge the pair and replace the two originals with the merged cluster.
                Cluster c = new Cluster(clusters.get(minProximityIndex1), clusters.get(minProximityIndex2));
                clusters.add(c);
                // Remove the higher index first so the lower index stays valid.
                clusters.remove(minProximityIndex2);
                clusters.remove(minProximityIndex1);
            }
            return clusters.size() == 0 ? null : clusters.get(0);
        }
    }

3. DBSCAN

Algorithm idea:
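First, identify the Core Points in the full point set (by counting the points in each point's neighborhood), then identify the Border Points among the remaining points (those that lie in the neighborhood of a Core Point). Next, if two Core Points are connected to each other, they belong to the same cluster; all Core Points are merged into a number of clusters in this way. Finally, check every Border Point, find which Core Point's neighborhood it lies in, and add it to that Core Point's cluster.

Program flowchart:
[Figure: DBSCAN program flowchart]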
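The first two steps above (finding Core Points, then Border Points) can be sketched as a simple labelling pass. This is an illustrative sketch under assumptions, not the report's code: the class `DbscanLabelSketch` and its `Label` enum are hypothetical, points are `double[2]` arrays, distance is Euclidean, and a point's neighborhood count is assumed to include the point itself.

```java
// Hypothetical sketch: label each point as CORE, BORDER or NOISE.
class DbscanLabelSketch {
    enum Label { CORE, BORDER, NOISE }

    // True if b lies in the eps-neighborhood of a (Euclidean distance assumed).
    static boolean withinEps(double[] a, double[] b, double eps) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]) <= eps;
    }

    static Label[] classify(double[][] points, int minPoints, double eps) {
        int n = points.length;
        Label[] labels = new Label[n];

        // Pass 1: a point is CORE if its neighborhood holds at least
        // minPoints points, counting the point itself (an assumption).
        for (int i = 0; i < n; ++i) {
            int neighbors = 0;
            for (int j = 0; j < n; ++j) {
                if (withinEps(points[i], points[j], eps)) {
                    neighbors++;
                }
            }
            if (neighbors >= minPoints) {
                labels[i] = Label.CORE;
            }
        }

        // Pass 2: a non-core point is BORDER if it lies in the neighborhood
        // of some core point, otherwise it is NOISE.
        for (int i = 0; i < n; ++i) {
            if (labels[i] == Label.CORE) {
                continue;
            }
            labels[i] = Label.NOISE;
            for (int j = 0; j < n; ++j) {
                if (labels[j] == Label.CORE && withinEps(points[i], points[j], eps)) {
                    labels[i] = Label.BORDER;
                    break;
                }
            }
        }
        return labels;
    }
}
```

The sketch is quadratic in the number of points; it only illustrates the labelling rule and does not perform the later steps of grouping Core Points into clusters or attaching Border Points to them.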
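The core code of the algorithm is implemented below (it covers only identifying Core Points and grouping the Core Points into clusters):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    public class Dbscan {
        public List<Cluster> getClusters(List<Point> points, int minPoints, double eps) {
            List<Point> corePoints = getCorePoints(points, minPoints, eps);
            // The map's key type was lost in extraction; Integer is assumed here.
            Map<Integer, Cluster> clusters = getClustersOfCorePoints(corePoints, eps);

            List<Point> borderPoints = getBorderPoints(points, corePoints, minPoints, eps);
            getClustersOfBorderPoints(corePoints, borderPoints, clusters, eps);

            return new ArrayList<Cluster>(clusters.values());
        }

        private List<Point> getCorePoints(List<Point> points, int minPoints, double eps) {
            List<Point> corePoints = new ArrayList<Point>();

            for (int i = 0; i < points.size(); ++i) {
                Point currentPoint = points.get(i);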