Lab Report: Cluster Analysis

Principles: K-means clustering, medoid-based (K-medoids) clustering, hierarchical clustering, and clustering with the EM algorithm.

Task: Perform a cluster-mining analysis of the iris dataset.

Requirements: Explore the basic characteristics of the iris data, apply several clustering methods, and obtain basic conclusions with a brief explanation.

Analysis report

Loading and inspecting the data:

    data(iris)
    rm(list = ls())
    gc()
    #>          used (Mb) gc trigger (Mb) max used (Mb)
    #> Ncells 431730 23.1     929718 49.7   607591 32.5
    #> Vcells 787605  6.1    8388608 64.0  1592403 12.2
    data(iris)
    data <- iris
    head(data)
    #>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    #> 1          5.1         3.5          1.4         0.2  setosa
    #> 2          4.9         3.0          1.4         0.2  setosa
    #> 3          4.7         3.2          1.3         0.2  setosa
    #> 4          4.6         3.1          1.5         0.2  setosa
    #> 5          5.0         3.6          1.4         0.2  setosa
    #> 6          5.4         3.9          1.7         0.4  setosa

K-means cluster analysis

    newiris <- iris
    newiris$Species <- NULL        # drop the label and cluster on the four measurements
    kc <- kmeans(newiris, 3)       # 3 clusters; the result depends on the random initial centers
    table(iris$Species, kc$cluster)
    #>             
    #>               1  2  3
    #>   setosa      0 50  0
    #>   versicolor 48  0  2
    #>   virginica  14  0 36
    plot(newiris[c("Sepal.Length", "Sepal.Width")], col = kc$cluster)
    points(kc$centers[, c("Sepal.Length", "Sepal.Width")], col = 1:3, pch = 8, cex = 2)

K-medoids cluster analysis

    install.packages("cluster")
    library(cluster)
    iris.pam <- pam(iris, 3)       # note: iris still contains Species, which pam() converts to integer codes;
                                   # use iris[, 1:4] to cluster on the measurements alone
    table(iris$Species, iris.pam$clustering)
    #>             
    #>               1  2  3
    #>   setosa     50  0  0
    #>   versicolor  0  3 47
    #>   virginica   0 49  1
    layout(matrix(c(1, 2), 1, 2))  # two plots side by side
    plot(iris.pam)                 # cluster plot and silhouette plot
    layout(matrix(1))              # back to a single plot

Hierarchical clustering (hc)

    iris.hc <- hclust(dist(iris[, 1:4]))
    plot(iris.hc, hang = -1)
    plot(iris.hc, labels = FALSE, hang = -1)   # same dendrogram without leaf labels (plclust() is deprecated)
    re <- rect.hclust(iris.hc, k = 3)
    iris.id <- cutree(iris.hc, 3)
    sapply(unique(iris.id), function(g) iris$Species[iris.id == g])

The printed list gives the species of every member of each of the three clusters. Counted by species: cluster 1 contains all 50 setosa; cluster 2 contains 23 versicolor and 49 virginica (72 flowers); cluster 3 contains 27 versicolor and 1 virginica (28 flowers).

    plot(iris.hc)
    rect.hclust(iris.hc, k = 4, border = "light grey")   # light-grey boxes around the 4-cluster solution
    rect.hclust(iris.hc, k = 3, border = "dark grey")    # dark-grey boxes around the 3-cluster solution
    rect.hclust(iris.hc, k = 7, which = c(2, 6), border = "dark grey")   # box only clusters 2 and 6 of the 7-cluster solution

DBSCAN: density-based clustering

    install.packages("fpc")
    library(fpc)
    ds1 = dbscan(iris[, 1:4], eps = 1, MinPts = 5)   # radius eps = 1, density threshold MinPts = 5
    ds1
    #> dbscan Pts=150 MinPts=5 eps=1
    #>         1   2
    #> border  0   1
    #> seed   50  99
    #> total  50 100
    ds2 = dbscan(iris[, 1:4], eps = 4, MinPts = 5)
    ds3 = dbscan(iris[, 1:4], eps = 4, MinPts = 2)
    ds4 = dbscan(iris[, 1:4], eps = 8, MinPts = 2)
    par(mfcol = c(2, 2))
    plot(ds1, iris[, 1:4], main = "1: MinPts=5 eps=1")
    plot(ds3, iris[, 1:4], main = "3: MinPts=2 eps=4")
    plot(ds2, iris[, 1:4], main = "2: MinPts=5 eps=4")
    plot(ds4, iris[, 1:4], main = "4: MinPts=2 eps=8")
    d = dist(iris[, 1:4])          # distance matrix of the samples
    max(d); min(d)                 # largest and smallest pairwise distances
    #> [1] 7.085196
    #> [1] 0
    install.packages("ggplot2")
    library(ggplot2)
    interval = cut_interval(d, 30) # 30 equal-width distance bins
    table(interval)
    #> interval
    #>     [0,0.236] (0.236,0.472] (0.472,0.709] (0.709,0.945]  (0.945,1.18]   (1.18,1.42]
    #>            88           585           876           891           831           688
    #>   (1.42,1.65]   (1.65,1.89]   (1.89,2.13]   (2.13,2.36]    (2.36,2.6]    (2.6,2.83]
    #>           543           369           379           339           335           406
    #>   (2.83,3.07]   (3.07,3.31]   (3.31,3.54]   (3.54,3.78]   (3.78,4.01]   (4.01,4.25]
    #>           458           459           465           480           468           505
    #>   (4.25,4.49]   (4.49,4.72]   (4.72,4.96]    (4.96,5.2]    (5.2,5.43]   (5.43,5.67]
    #>           349           385           321           291           187           138
    #>    (5.67,5.9]    (5.9,6.14]   (6.14,6.38]   (6.38,6.61]   (6.61,6.85]   (6.85,7.09]
    #>            97            92            78            50            18             4
    which.max(table(interval))     # the most populated distance bin
    #> (0.709,0.945] 
    #>             4 
    for (i in 3:5) {
      for (j in 1:10) {
        ds = dbscan(iris[, 1:4], eps = i, MinPts = j)
        print(ds)
      }
    }
    #> dbscan Pts=150 MinPts=1 eps=3
    #>         1
    #> seed  150
    #> total 150

All 30 runs (eps = 3, 4, 5 with MinPts = 1, ..., 10) print the same result: a single cluster in which all 150 flowers are seed points.

    # plot three of the 30 DBSCAN results
    ds5 = dbscan(iris[, 1:4], eps = 3, MinPts = 2)
    ds6 = dbscan(iris[, 1:4], eps = 4, MinPts = 5)
    ds7 = dbscan(iris[, 1:4], eps = 5, MinPts = 9)
    par(mfcol = c(1, 3))
    plot(ds5, iris[, 1:4], main = "1: MinPts=2 eps=3")
    plot(ds6, iris[, 1:4], main = "2: MinPts=5 eps=4")
    plot(ds7, iris[, 1:4], main = "3: MinPts=9 eps=5")

EM: expectation-maximization clustering

    install.packages("mclust")
    library(mclust)
    fit_EM = Mclust(iris[, 1:4])
    #> fitting ...
    #>   |==================================================| 100%
    summary(fit_EM)
    #> ----------------------------------------------------
    #> Gaussian finite mixture model fitted by EM algorithm
    #> ----------------------------------------------------
    #>
    #> Mclust VEV (ellipsoidal, equal shape) model with 2 components:
    #>
    #>  log.likelihood   n df       BIC       ICL
    #>        -215.726 150 26 -561.7285 -561.7289
    #>
    #> Clustering table:
    #>   1   2
    #>  50 100
    summary(fit_EM, parameters = TRUE)
    # prints the same header, model line and clustering table as summary(fit_EM), followed by:
    #> Mixing probabilities:
    #>         1         2
    #> 0.3333319 0.6666681
    #>
    #> Means:
    #>                   [,1]     [,2]
    #> Sepal.Length 5.0060022 6.261996
    #> Sepal.Width  3.4280049 2.871999
    #> Petal.Length 1.4620007 4.905992
    #> Petal.Width  0.2459998 1.675997
    #>
    #> Variances:
    #> [,,1]
    #>              Sepal.Length Sepal.Width Petal.Length Petal.Width
    #> Sepal.Length   0.15065114  0.13080115   0.02084463  0.01309107
    #> Sepal.Width    0.13080115  0.17604529   0.01603245  0.01221458
    #> Petal.Length   0.02084463  0.01603245   0.02808260  0.00601568
    #> Petal.Width    0.01309107  0.01221458   0.00601568  0.01042365
    #> [,,2]
    #>              Sepal.Length Sepal.Width Petal.Length Petal.Width
    #> Sepal.Length    0.4000438  0.10865444    0.3994018  0.14368256
    #> Sepal.Width     0.1086544  0.10928077    0.1238904  0.07284384
    #> Petal.Length    0.3994018  0.12389040    0.6109024  0.25738990
    #> Petal.Width     0.1436826  0.07284384    0.2573899  0.16808182
    plot(fit_EM)                   # plots of the EM clustering result
    #> Model-based clustering plots:
    #>
    #> 1: BIC
    #> 2: classification
    #> 3: uncertainty
    #> 4: density
    #>
    #> Selection:

Options 1, 2, 3 and 4 were selected in turn, drawing the BIC, classification, uncertainty and density plots (figures omitted); entering 0 at the next Selection prompt closes the menu.

    iris_BIC = mclustBIC(iris[, 1:4])
    #> fitting ...
    #>   |==================================================| 100%
    iris_BICsum = summary(iris_BIC, data = iris[, 1:4])
    iris_BICsum                    # BIC values of iris for each model type and number of clusters
    #> Best BIC values:
    #>              VEV,2        VEV,3       VVV,2
    #> BIC      -561.7285 -562.5522369 -574.01783
    #> BIC diff    0.0000   -0.8237748  -12.28937
    #>
    #> Classification table for model (VEV,2):
    #>
    #>   1   2
    #>  50 100
    iris_BIC
    #> Bayesian Information Criterion (BIC):
    #>          EII        VII        EEI        VEI        EVI        VVI     EEE
    #> 1 -1804.0854 -1804.0854 -1522.1202 -1522.1202 -1522.1202 -1522.1202 -829.97
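The K-means step of this report calls R's kmeans(). To show what that call computes, here is a minimal pure-Python sketch of Lloyd's algorithm, which alternates a nearest-center assignment step with a mean-update step until the assignments stop changing. It is an illustration under assumptions, not R's implementation: R's kmeans() uses the Hartigan-Wong algorithm with random starts by default, whereas this sketch uses a deterministic farthest-first start; the function name and toy points are hypothetical.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's K-means on a list of equal-length tuples; returns (labels, centers)."""
    # Farthest-first start (an assumption of this sketch): begin with the first
    # point, then repeatedly add the point farthest from all chosen centers.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        # Assignment step: every point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical toy data: two well-separated groups of three points.
pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels, centers = kmeans(pts, 2)
print(labels, centers)
```

Because the result of Lloyd's algorithm depends on the starting centers, a poor start can settle in a local optimum; this is the same reason the report's kmeans() table can change from run to run.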
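The K-medoids step uses cluster::pam(), whose documentation describes a BUILD phase that picks initial medoids greedily and a SWAP phase that exchanges medoids with other points while the total distance decreases. The sketch below is a simplified version of that idea in pure Python, assuming Euclidean distance and an exhaustive "best improving swap" loop; it is not the cluster package's code, and the function names and toy data are hypothetical.

```python
import math
from itertools import product


def total_cost(points, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Simplified PAM: greedy BUILD, then SWAP while a swap lowers the cost.

    Medoids are indices into `points`, so every cluster center is a real
    observation rather than a mean.
    """
    n = len(points)
    medoids = []
    # BUILD: repeatedly add the point that lowers the total cost the most.
    for _ in range(k):
        best = min(
            (i for i in range(n) if i not in medoids),
            key=lambda i: total_cost(points, medoids + [i]),
        )
        medoids.append(best)
    # SWAP: try every (medoid, non-medoid) exchange and apply the best improving one.
    while True:
        current = total_cost(points, medoids)
        best_swap = None
        for mi, h in product(range(k), range(n)):
            if h in medoids:
                continue
            candidate = medoids[:mi] + [h] + medoids[mi + 1:]
            cost = total_cost(points, candidate)
            if cost < current - 1e-12:
                current, best_swap = cost, candidate
        if best_swap is None:
            break  # no exchange improves the cost: a local optimum
        medoids = best_swap
    labels = [min(range(k), key=lambda c: math.dist(p, points[medoids[c]])) for p in points]
    return medoids, labels


# Hypothetical toy data: two groups of three points.
pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
medoids, labels = pam(pts, 2)
print(medoids, labels)
```

Using actual observations as centers is what makes K-medoids less sensitive to outliers than K-means, at the price of evaluating many candidate swaps.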
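The hierarchical step builds a tree with hclust() and cuts it into three groups with cutree(). R's hclust() uses complete linkage by default, so this sketch assumes complete linkage: start with singletons and repeatedly merge the two clusters whose farthest pair of members is closest. Stopping once k clusters remain gives the same partition as cutting the finished tree into k groups. The function name and toy points are hypothetical, and the naive search is only meant for small inputs.

```python
import math


def complete_linkage(points, k):
    """Agglomerative clustering with complete linkage, stopped at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: distance between the farthest pair of members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        m = len(clusters)
        x, y = min(
            ((x, y) for x in range(m) for y in range(x + 1, m)),
            key=lambda xy: linkage(clusters[xy[0]], clusters[xy[1]]),
        )
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


# Hypothetical toy data: three separated groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
labels = complete_linkage(pts, 3)
print(labels)
```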
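Before scanning eps values, the DBSCAN section bins all pairwise distances into 30 equal-width intervals with ggplot2's cut_interval() and reports the fullest bin with which.max(). A pure-Python sketch of that computation follows, assuming right-closed bins with the lowest bin closed on both ends, which matches printed labels such as [0,0.236] and (0.236,0.472]. The function name and toy points are hypothetical, and the sketch assumes the distances are not all equal (bin width greater than 0).

```python
import math
from itertools import combinations


def distance_bins(points, n_bins):
    """Count pairwise distances in n_bins equal-width, right-closed bins."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    lo, hi = min(dists), max(dists)
    width = (hi - lo) / n_bins  # assumes hi > lo
    counts = [0] * n_bins
    for d in dists:
        b = math.ceil((d - lo) / width) - 1  # right-closed: a boundary value falls in the lower bin
        counts[min(max(b, 0), n_bins - 1)] += 1  # lo goes to bin 0; guard rounding at hi
    return lo, hi, counts


# Hypothetical 1-D toy data; the pairwise distances are 1, 2, 4, 1, 3, 2.
pts = [(0.0,), (1.0,), (2.0,), (4.0,)]
lo, hi, counts = distance_bins(pts, 3)
fullest = counts.index(max(counts))  # plays the role of which.max(table(interval))
print(lo, hi, counts, fullest)
```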
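The fpc::dbscan() printouts in this report split each cluster into "seed" and "border" points. A minimal pure-Python sketch of DBSCAN makes the distinction concrete: a core (seed) point has at least MinPts points within distance eps, clusters grow outward from core points, and non-core points reached from a core point become border points. Counting the point itself toward MinPts and using `<= eps` are assumptions of this sketch (implementations differ on such details); the function name and toy data are hypothetical. The second call mirrors the report's observation that eps values of 3 to 5 put all 150 flowers in one cluster: once eps exceeds the gap between groups, everything merges.

```python
import math


def dbscan(points, eps, min_pts):
    """Return (labels, is_core); labels are 1, 2, ... per cluster and 0 for noise."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]  # neighborhood includes the point itself
    labels = [0] * n
    cluster = 0
    for i in range(n):
        if labels[i] != 0 or not is_core[i]:
            continue
        cluster += 1  # start a new cluster from an unassigned core point
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == 0:
                    labels[q] = cluster
                    if is_core[q]:  # only core points pass the cluster on
                        stack.append(q)
    return labels, is_core


# Hypothetical toy data: two square groups and one far-away outlier.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
labels, core = dbscan(pts, eps=1.5, min_pts=3)
print(labels)  # the outlier keeps label 0 (noise)
# With eps larger than the gap between the two groups, everything merges.
labels_big, core_big = dbscan(pts[:8], eps=20, min_pts=3)
print(labels_big)
```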
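Mclust() fits a multivariate Gaussian mixture by EM and then chooses the covariance model (VEV here) and the number of components by BIC. To show the EM loop itself, the sketch below is deliberately reduced to one dimension with free means, variances and mixing weights, and no model selection. The E-step computes each point's membership probabilities (responsibilities); the M-step re-estimates the parameters as responsibility-weighted averages. The function names, starting values and toy data are assumptions, and this is not mclust's implementation.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, means, variances, weights, n_iter=200, tol=1e-8):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances, loglik, resp)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(xs)
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: resp[i][c] = P(component c | x_i) under the current parameters.
        resp, ll = [], 0.0
        for x in xs:
            dens = [weights[c] * normal_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: responsibility-weighted mixing proportions, means and variances.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / n
            means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, xs)) / nc
            variances[c] = max(var, 1e-6)  # floor keeps a component from collapsing
        if ll - prev_ll < tol:  # EM never lowers the likelihood; stop when it stalls
            break
        prev_ll = ll
    return weights, means, variances, ll, resp


# Hypothetical toy data: two groups on a line.
xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
w, mu, var, ll, resp = em_gmm_1d(xs, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
labels = [max(range(2), key=lambda c: r[c]) for r in resp]  # hard assignment by largest responsibility
print(w, mu, var, labels)
```

Taking the component with the largest responsibility is how a soft mixture fit becomes a hard clustering table like the "1: 50, 2: 100" split reported by summary(fit_EM).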
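mclust reports BIC with the sign convention BIC = 2 * loglik - df * ln(n), so larger (less negative) values are better; that is why VEV,2 at -561.7285 ranks ahead of VVV,2 at -574.01783. As a worked check, plugging the printed summary values (log-likelihood -215.726, n = 150, df = 26) into that formula reproduces the reported BIC. The df count for a 4-variable VEV model with 2 components is also recomputed from mclust's parametrization (variable volume, equal shape, variable orientation); the helper name is hypothetical.

```python
import math


def mclust_bic(loglik, df, n):
    """BIC in mclust's convention: 2*loglik - df*ln(n); larger values are better."""
    return 2 * loglik - df * math.log(n)


# Free parameters of a VEV mixture with G components in d dimensions:
# mixing weights + means + volumes (vary) + shape (equal) + orientations (vary).
G, d = 2, 4
df_vev = (G - 1) + G * d + G + (d - 1) + G * d * (d - 1) // 2

# Values printed by summary(fit_EM) for the VEV model with 2 components.
bic = mclust_bic(-215.726, df=df_vev, n=150)
print(df_vev, round(bic, 4))
```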