Data Mining Lab Report


given a set of points in some space, it groups together points that are closely packed together (points with many nearby neighbors), marking as outliers points that lie alone in low-density regions (whose nearest neighbors are too far away). DBSCAN is one of the most common clustering algorithms and also among the most cited in the scientific literature.

II. Experiment Design

1. K-Means algorithm idea:

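Pick any k points from the point set as the initial centers. Compare every point against the k centers and assign it to the cluster of its nearest center. Once the assignment is finished, recompute the center of each cluster. Repeat this process until the centers no longer change.

Harbin Institute of Technology, Page 2 of 10, Designed by 谢浩哲

[Figure: program flowchart]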

Core code:

    public class KMeans {
        public Cluster[] getClusters(int k, Point[] points) {
            // k must be positive and smaller than the number of points
            if (k <= 0 || k >= points.length) {
                return null;
            }

            Cluster[] clusters = getInitialClusters(k, points);
            Cluster[] newClusters = null;
            do {
                newClusters = getClusters(k, points, clusters);

                // Stop once an iteration leaves the centers unchanged
                if (isClustersTheSame(clusters, newClusters)) {
                    break;
                }
                clusters = newClusters;
            } while (true);
            return clusters;
        }

        private Cluster[] getClusters(int k, Point[] points, Cluster[] cluster) {
            // Assignment step: attach every point to its closest cluster
            for (int i = 0; i < points.length; ++i) {
                Point currentPoint = points[i];
                Cluster c = getClosestClusters(currentPoint, cluster);
                c.points.add(currentPoint);
            }

            // Update step: build new clusters around the recomputed centers
            Cluster[] newClusters = new Cluster[k];
            for (int i = 0; i < k; ++i) {
                Cluster c = cluster[i];
                int numberOfPointsInCluster = c.points.size();

                if (numberOfPointsInCluster == 0) {
                    // If the cluster is empty, reseed it with a random point
                    int randomIndex = (int) (Math.random() * points.length);
                    newClusters[i] = new Cluster(points[randomIndex]);
                } else {
                    // If the cluster is not empty, use the mean of its points
                    double newCentroidX = 0;
                    double newCentroidY = 0;
                    for (int j = 0; j < numberOfPointsInCluster; ++j) {
                        Point p = c.points.get(j);
                        newCentroidX += p.x;
                        newCentroidY += p.y;
                    }
                    newCentroidX /= numberOfPointsInCluster;
                    newCentroidY /= numberOfPointsInCluster;

                    Cluster newCluster = new Cluster(new Point(newCentroidX, newCentroidY));
                    newClusters[i] = newCluster;
                }
            }
            return newClusters;
        }
    }
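The core code calls helpers such as getClosestClusters and isClustersTheSame without showing them. The following is only a minimal sketch of what such helpers might look like, not the report's own implementation. The class name KMeansHelpers, the Point and Cluster layouts, the singular name getClosestCluster, and the tolerance parameter are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper sketch; names and signatures are assumptions, not the report's code.
class KMeansHelpers {
    // Minimal 2-D point, mirroring the p.x / p.y fields used in the core code.
    static class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // Minimal cluster: a centroid plus the points currently assigned to it.
    static class Cluster {
        final Point centroid;
        final List<Point> points = new ArrayList<>();
        Cluster(Point centroid) { this.centroid = centroid; }
    }

    // Squared Euclidean distance; enough for comparing which centroid is nearer.
    static double squaredDistance(Point a, Point b) {
        double dx = a.x - b.x, dy = a.y - b.y;
        return dx * dx + dy * dy;
    }

    // Returns the cluster whose centroid is nearest to p.
    static Cluster getClosestCluster(Point p, Cluster[] clusters) {
        Cluster best = clusters[0];
        for (Cluster c : clusters) {
            if (squaredDistance(p, c.centroid) < squaredDistance(p, best.centroid)) {
                best = c;
            }
        }
        return best;
    }

    // Two clusterings count as the same when no centroid moved by more than tolerance.
    static boolean isClustersTheSame(Cluster[] a, Cluster[] b, double tolerance) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; ++i) {
            if (squaredDistance(a[i].centroid, b[i].centroid) > tolerance * tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Cluster[] clusters = {
            new Cluster(new Point(0, 0)),
            new Cluster(new Point(10, 10))
        };
        Cluster nearest = getClosestCluster(new Point(8, 9), clusters);
        System.out.println(nearest == clusters[1]); // prints "true"
    }
}
```

Comparing centroids with a small tolerance rather than exact equality is one possible design choice; it avoids looping on floating-point noise, at the cost of stopping slightly before exact convergence.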

2. AGNES (hierarchical clustering) algorithm idea:

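The algorithm uses Group Average as the merge criterion. In the first iteration, the pair with the smallest Group Average among the n points is merged. The merged cluster is added to the list, the two original clusters are removed, and the Group Average between the new cluster and the other n - 2 clusters is recomputed. These steps repeat until all clusters have been merged.

[Figure: program flowchart]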

Core code:

    public class Agnes {
        public Cluster getCluster(List<Cluster> clusters) {
            while (clusters.size() > 1) {
                double minProximity = Double.MAX_VALUE;
                int minProximityIndex1 = 0, minProximityIndex2 = 0;

                // Find the pair of clusters with the smallest proximity
                for (int i = 0; i < clusters.size(); ++i) {
                    for (int j = i + 1; j < clusters.size(); ++j) {
                        double proximity = getProximity(clusters.get(i), clusters.get(j));

                        if (proximity < minProximity) {
                            minProximity = proximity;
                            minProximityIndex1 = i;
                            minProximityIndex2 = j;
                        }
                    }
                }

                // Merge the pair, then remove the larger index first so the
                // smaller index still refers to the intended element
                Cluster c = new Cluster(clusters.get(minProximityIndex1), clusters.get(minProximityIndex2));
                clusters.add(c);
                clusters.remove(minProximityIndex2);
                clusters.remove(minProximityIndex1);
            }
            return clusters.size() == 0 ? null : clusters.get(0);
        }
    }
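The listing relies on getProximity, which is not shown. As a hedged sketch of the Group Average criterion named in the algorithm idea, the proximity of two clusters can be taken as the mean of all pairwise distances between their points. The class name GroupAverage, the Point and Cluster layouts, and the use of Euclidean distance are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Group Average proximity; not the report's own code.
class GroupAverage {
    static class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // A cluster is either a single point or the union of two merged clusters.
    static class Cluster {
        final List<Point> points = new ArrayList<>();
        Cluster(Point p) { points.add(p); }
        Cluster(Cluster a, Cluster b) {
            points.addAll(a.points);
            points.addAll(b.points);
        }
    }

    // Euclidean distance between two points (an assumed metric).
    static double distance(Point a, Point b) {
        return Math.hypot(a.x - b.x, a.y - b.y);
    }

    // Group Average: mean distance over every pair (p, q) with p in a and q in b.
    static double getProximity(Cluster a, Cluster b) {
        double sum = 0;
        for (Point p : a.points) {
            for (Point q : b.points) {
                sum += distance(p, q);
            }
        }
        return sum / (a.points.size() * b.points.size());
    }

    public static void main(String[] args) {
        Cluster a = new Cluster(new Cluster(new Point(0, 0)), new Cluster(new Point(0, 2)));
        Cluster b = new Cluster(new Point(0, 4));
        // Pairwise distances are 4 and 2, so the average is 3.
        System.out.println(getProximity(a, b)); // prints "3.0"
    }
}
```

Recomputing every pairwise distance on each call is simple but quadratic in cluster size. Caching proximities between iterations would be a natural refinement, though nothing in the text says the report does so.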

3. DBSCAN algorithm idea:

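First, identify the Core Points in the full point set by counting the points inside each point's neighborhood. Then identify the Border Points among the remaining points, that is, points that lie inside the neighborhood of a Core Point. Next, two Core Points that are connected to each other belong to the same cluster, so all Core Points are merged into a number of clusters. Finally, check each Border Point to see which Core Point's neighborhood it falls in, and merge it into that Core Point's cluster.

[Figure: program flowchart]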

The following is the core code of this algorithm (covering only the identification of Core Points and the grouping of Core Points into clusters):

    public class Dbscan {
        public List<Cluster> getClusters(List<Point> points, int minPoints, double eps) {
            List<Point> corePoints = getCorePoints(points, minPoints, eps);
            Map clusters = getClustersOfCorePoints(corePoints, eps);

            List<Point> borderPoints = getBorderPoints(points, corePoints, minPoints, eps);
            getClustersOfBorderPoints(corePoints, borderPoints, clusters, eps);

            return new ArrayList(clusters.values());
        }

        private List<Point> getCorePoints(List<Point> points, int minPoints, double eps) {
            List<Point> corePoints = new ArrayList<>();

            for (int i = 0; i < points.size(); ++i) {
                Point currentPoint = points.get(i);
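The Core Point and Border Point identification steps from the algorithm idea can be sketched as below. This is an illustrative sketch under stated assumptions rather than the report's implementation. The class name DbscanSketch, the helper countNeighbors, the convention that a point counts as its own neighbor, and the three-argument getBorderPoints are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of Core Point / Border Point identification; not the report's code.
class DbscanSketch {
    static class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static double distance(Point a, Point b) {
        return Math.hypot(a.x - b.x, a.y - b.y);
    }

    // Counts the points within eps of p. The point itself is included (assumed convention).
    static int countNeighbors(Point p, List<Point> points, double eps) {
        int count = 0;
        for (Point q : points) {
            if (distance(p, q) <= eps) {
                ++count;
            }
        }
        return count;
    }

    // A Core Point has at least minPoints points in its eps-neighborhood.
    static List<Point> getCorePoints(List<Point> points, int minPoints, double eps) {
        List<Point> corePoints = new ArrayList<>();
        for (Point p : points) {
            if (countNeighbors(p, points, eps) >= minPoints) {
                corePoints.add(p);
            }
        }
        return corePoints;
    }

    // A Border Point is not a Core Point but lies within eps of at least one Core Point.
    static List<Point> getBorderPoints(List<Point> points, List<Point> corePoints, double eps) {
        List<Point> borderPoints = new ArrayList<>();
        for (Point p : points) {
            if (corePoints.contains(p)) {
                continue;
            }
            for (Point core : corePoints) {
                if (distance(p, core) <= eps) {
                    borderPoints.add(p);
                    break;
                }
            }
        }
        return borderPoints;
    }

    public static void main(String[] args) {
        List<Point> points = Arrays.asList(
            new Point(0, 0), new Point(0, 1), new Point(1, 0),
            new Point(2.4, 0),   // close to (1, 0) only: a Border Point
            new Point(10, 10));  // isolated: noise
        List<Point> core = getCorePoints(points, 3, 1.5);
        List<Point> border = getBorderPoints(points, core, 1.5);
        System.out.println("core=" + core.size() + " border=" + border.size()); // prints "core=3 border=1"
    }
}
```

Points that end up in neither list, such as (10, 10) in the example, are the outliers in low-density regions.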
