Given a set of points in some space, it groups together points that are closely packed together (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most common clustering algorithms and also
most cited in scientific literature.

II. Experiment Design

1. K-Means

Algorithm idea: Pick any k points from the point set as the initial centers. Compare each point with the k centers and assign it to the cluster of the closest center. Once every point has been assigned, recompute the center of each cluster. Repeat this process until the centers no longer change.

Harbin Institute of Technology    Designed by 谢浩哲

Program flowchart: (figure omitted)

Core code:

    public class KMeans {
        public Cluster[] getClusters(int k, Point[] points) {
            // Clustering needs more points than clusters.
            if (k >= points.length) {
                return null;
            }

            Cluster[] clusters = getInitialClusters(k, points);
            Cluster[] newClusters = null;
            do {
                newClusters = getClusters(k, points, clusters);

                // Stop once an iteration leaves the clusters unchanged.
                if (isClustersTheSame(clusters, newClusters)) {
                    break;
                }

                clusters = newClusters;
            } while (true);
            return clusters;
        }

        private Cluster[] getClusters(int k, Point[] points, Cluster[] cluster) {
            // Assign every point to its closest cluster.
            for (int i = 0; i < points.length; ++i) {
                Point currentPoint = points[i];
                Cluster c = getClosestClusters(currentPoint, cluster);
                c.points.add(currentPoint);
            }

            // Build the next generation of clusters from the new centroids.
            Cluster[] newClusters = new Cluster[k];
            for (int i = 0; i < k; ++i) {
                Cluster c = cluster[i];
                int numberOfPointsInCluster = c.points.size();

                if (numberOfPointsInCluster == 0) {
                    // If the cluster is empty, reseed it with a random point
                    int randomIndex = (int) (Math.random() * points.length);
                    newClusters[i] = new Cluster(points[randomIndex]);
                } else {
                    // If the cluster is not empty, use the mean of its points
                    double newCentroidX = 0;
                    double newCentroidY = 0;
                    for (int j = 0; j < numberOfPointsInCluster; ++j) {
                        Point p = c.points.get(j);
                        newCentroidX += p.x;
                        newCentroidY += p.y;
                    }
                    newCentroidX /= numberOfPointsInCluster;
                    newCentroidY /= numberOfPointsInCluster;

                    Cluster newCluster = new Cluster(new Point(newCentroidX, newCentroidY));
                    newClusters[i] = newCluster;
                }
            }
            return newClusters;
        }
    }

2. AGNES (Hierarchical Clustering)

Algorithm idea: The algorithm uses Group Average as the merge criterion. In the first iteration, the pair with the smallest Group Average among the n points is merged. The merged cluster is added to the list, the two original clusters are removed, and the Group Average between the new cluster and each of the other n-2 clusters is recomputed. These steps are repeated until all clusters have been merged.

Program flowchart: (figure omitted)

Core code:

    public class Agnes {
        public Cluster getCluster(List<Cluster> clusters) {
            while (clusters.size() > 1) {
                double minProximity = Double.MAX_VALUE;
                int minProximityIndex1 = 0, minProximityIndex2 = 0;

                // Find the pair of clusters with the smallest proximity.
                for (int i = 0; i < clusters.size(); ++i) {
                    for (int j = i + 1; j < clusters.size(); ++j) {
                        double proximity = getProximity(clusters.get(i), clusters.get(j));

                        if (proximity < minProximity) {
                            minProximity = proximity;
                            minProximityIndex1 = i;
                            minProximityIndex2 = j;
                        }
                    }
                }

                Cluster c = new Cluster(clusters.get(minProximityIndex1), clusters.get(minProximityIndex2));
                clusters.add(c);
                // Remove the larger index first so the smaller one stays valid.
                clusters.remove(minProximityIndex2);
                clusters.remove(minProximityIndex1);
            }
            return clusters.size() == 0 ? null : clusters.get(0);
        }
    }

3. DBSCAN

Algorithm idea: First, identify the Core Points among all points (by counting the points inside each point's neighborhood). Then identify the Border Points among the remaining points (those that lie inside the neighborhood of a Core Point). Next, if two Core Points are connected to each other, they belong to the same Cluster, so all Core Points are merged into a number of Clusters. Finally, check every Border Point to see which Core Point's neighborhood it lies in, and merge it into that Core Point's cluster.

Program flowchart: (figure omitted)

Below is the core code of this algorithm (covering only the identification of Core Points and the grouping of Core Points into clusters):

    public class Dbscan {
        public List<Cluster> getClusters(List<Point> points, int minPoints, double eps) {
            List<Point> corePoints = getCorePoints(points, minPoints, eps);
            // The Map's type parameters were lost in extraction; <Point, Cluster> is assumed.
            Map<Point, Cluster> clusters = getClustersOfCorePoints(corePoints, eps);

            List<Point> borderPoints = getBorderPoints(points, corePoints, minPoints, eps);
            getClustersOfBorderPoints(corePoints, borderPoints, clusters, eps);

            return new ArrayList<Cluster>(clusters.values());
        }

        private List<Point> getCorePoints(List<Point> points, int minPoints, double eps) {
            List<Point> corePoints = new ArrayList<Point>();

            for (int i = 0; i < points.size(); ++i) {
                Point currentPoint = points.get(i);
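
The Core Point identification step described in the DBSCAN algorithm idea can be illustrated with a minimal, standalone sketch. It is not the report's own implementation: the class name DbscanCorePointSketch, the nested Point type, the use of Euclidean distance, and the convention that a point counts as its own neighbor are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DbscanCorePointSketch {
    // Hypothetical 2-D point type; the report's own Point class is not shown.
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    // Assumption: the neighborhood is measured with Euclidean distance.
    static double distance(Point a, Point b) {
        return Math.hypot(a.x - b.x, a.y - b.y);
    }

    // A point is treated as a Core Point when at least minPoints points
    // (including the point itself, by assumption) lie within distance eps.
    static List<Point> getCorePoints(List<Point> points, int minPoints, double eps) {
        List<Point> corePoints = new ArrayList<>();
        for (Point current : points) {
            int neighbors = 0;
            for (Point other : points) {
                if (distance(current, other) <= eps) {
                    ++neighbors;
                }
            }
            if (neighbors >= minPoints) {
                corePoints.add(current);
            }
        }
        return corePoints;
    }

    public static void main(String[] args) {
        // Three points packed closely together and one isolated point.
        List<Point> points = Arrays.asList(
                new Point(0, 0), new Point(0, 1), new Point(1, 0), new Point(10, 10));
        List<Point> core = getCorePoints(points, 3, 1.5);
        System.out.println("Core points found: " + core.size());
    }
}
```

With minPoints = 3 and eps = 1.5, the three tightly packed points each see three points in their neighborhood and qualify as Core Points, while the isolated point at (10, 10) does not. The double loop costs O(n^2) distance computations, which is acceptable for small experiments; a spatial index would be the usual choice for larger point sets.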