Data Mining Review Questions and Answers (数据挖掘复习题和答案.docx)

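I. Consider the training examples in the table for a binary classification problem.

1. What is the entropy of the entire training set with respect to the class attribute?
2. What are the information gains of a1 and a2 on this training set?
3. For the continuous attribute a3, compute the information gain for every possible split.
4. According to information gain, which of a1, a2, and a3 gives the best split?
5. According to classification error rate, which of a1 and a2 gives the best split?
6. According to the Gini index, which of a1 and a2 gives the best split?

Answer 1: P(+) = 4/9 and P(−) = 5/9, so the entropy is −4/9 log2(4/9) − 5/9 log2(5/9) = 0.9911.
Answer 2: (probably not on the exam)
Answer 3:
Answer 4: According to information gain, a1 produces the best split.
Answer 5: For attribute a1: error rate = 2/9. For attribute a2: error rate = 4/9. Therefore, according to error rate, a1 produces the best split.
Answer 6:

II. Consider the following data set for a binary classification problem.

1. Compute the information gain. Which attribute would the decision tree induction algorithm choose?
2. Compute the Gini index. Which attribute would decision tree induction choose? (This answer is fine.)
3. Figure 4-13 shows that entropy and the Gini index are both monotonically increasing on [0, 0.5] and monotonically decreasing on [0.5, 1]. Is it possible for information gain and the gain in the Gini index to favor different attributes? Explain your reasoning.

Yes. Even though these measures have similar ranges and monotonic behavior, their respective gains, Δ, which are scaled differences of the measures, do not necessarily behave in the same way, as illustrated by the results in parts (a) and (b).

Bayesian classification

1. P(A = 1|−) = 2/5 = 0.4, P(B = 1|−) = 2/5 = 0.4, P(C = 1|−) = 1, P(A = 0|−) = 3/5 = 0.6, P(B = 0|−) = 3/5 = 0.6, P(C = 0|−) = 0; P(A = 1|+) = 3/5 = 0.6, P(B = 1|+) = 1/5 = 0.2, P(C = 1|+) = 2/5 = 0.4, P(A = 0|+) = 2/5 = 0.4, P(B = 0|+) = 4/5 = 0.8, P(C = 0|+) = 3/5 = 0.6.
2.
3. P(A = 0|+) = (2 + 2)/(5 + 4) = 4/9, P(A = 0|−) = (3 + 2)/(5 + 4) = 5/9, P(B = 1|+) = (1 + 2)/(5 + 4) = 3/9, P(B = 1|−) = (2 + 2)/(5 + 4) = 4/9, P(C = 0|+) = (3 + 2)/(5 + 4) = 5/9, P(C = 0|−) = (0 + 2)/(5 + 4) = 2/9.
4. Let P(A = 0, B = 1, C = 0) = K.
5. When one of the conditional probabilities is zero, estimating the conditional probabilities with the m-estimate approach is better, because we do not want the entire expression to become zero.

1. P(A = 1|+) = […], P(B = 1|+) = […], P(C = 1|+) = […], P(A = 1|−) = […], P(B = 1|−) = […], and P(C = 1|−) = […].
2. Let R : (A = 1, B = 1, C = 1) be the test record. To determine its class, we need to compute P(+|R) and P(−|R). Using Bayes' theorem, P(+|R) = P(R|+)P(+)/P(R) and P(−|R) = P(R|−)P(−)/P(R). Since P(+) = P(−) = […] and P(R) is constant, R can be classified by comparing P(+|R) and P(−|R), which, because the priors are equal, reduces to comparing P(R|+) and P(R|−). For this question, P(R|+) = P(A = 1|+) × P(B = 1|+) × P(C = 1|+) = […] and P(R|−) = P(A = 1|−) × P(B = 1|−) × P(C = 1|−) = […]. Since P(R|+) is larger, the record is assigned to the (+) class.
3. P(A = 1) = […], P(B = 1) = […], and P(A = 1, B = 1) = P(A) × P(B) = […]. Therefore, A and B are independent.
4. P(A = 1) = […], P(B = 0) = […], and P(A = 1, B = 0) = P(A = 1) × P(B = 0) = […]. A and B are still independent.
5. Compare P(A = 1, B = 1|+) = […] against P(A = 1|+) = […] and P(B = 1|+) = […]. Since the product of P(A = 1|+) and P(B = 1|+) is not the same as P(A = 1, B = 1|+), A and B are not conditionally independent given the class.

III. Perform single-link and complete-link hierarchical clustering using the similarity matrix in the table below. Draw dendrograms to show the results; each dendrogram should clearly show the order of the merges.

There are no apparent relationships between s1, s2, c1, and c2.
A2: Percentage of frequent itemsets = 16/32 = 50% (including the null set).
A4: False alarm rate is the ratio of I to the total number of itemsets. Since the count of I = 5, the false alarm rate is 5/32 = 15.6%.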
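The impurity measures used in the decision tree questions (entropy, Gini index, classification error) can be checked with a short script. The following is a minimal Python sketch, not the course's reference solution. The function names are illustrative, and the child counts for the split are hypothetical: the training table is not reproduced here, so they were chosen only to be consistent with the 2/9 error rate given for a1.

```python
from math import log2


def entropy(counts):
    """Entropy in bits of a list of class counts; empty classes contribute 0."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)


def gini(counts):
    """Gini index of a list of class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def error_rate(counts):
    """Classification error when the node predicts its majority class."""
    return 1.0 - max(counts) / sum(counts)


def weighted_impurity(children, impurity):
    """Average impurity of the child nodes, weighted by their sizes."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * impurity(child) for child in children)


def gain(parent, children, impurity):
    """Impurity of the parent minus the weighted impurity of its children."""
    return impurity(parent) - weighted_impurity(children, impurity)


# Class counts [+, -] for the whole training set: P(+) = 4/9, P(-) = 5/9.
parent = [4, 5]
print(round(entropy(parent), 4))  # 0.9911

# ASSUMPTION: hypothetical child counts for a binary split on a1.
a1_children = [[3, 1], [1, 4]]
print(weighted_impurity(a1_children, error_rate))  # approximately 2/9
print(gain(parent, a1_children, entropy))
print(gain(parent, a1_children, gini))
```

Passing a different impurity function to `gain` is what lets information gain and Gini gain rank candidate attributes differently, as discussed in question 3 of Part II.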

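The m-estimate calculations in the Bayesian classification answers follow the form (n_c + m·p)/(n + m). The sketch below assumes m = 4 and p = 1/2, which is inferred from expressions such as (2 + 2)/(5 + 4) rather than stated in the text. The per-class counts are read off the conditional probabilities listed in answer 1 (5 records per class), and the function and variable names are illustrative.

```python
from fractions import Fraction


def m_estimate(n_c, n, m=4, p=Fraction(1, 2)):
    """m-estimate of a conditional probability: (n_c + m*p) / (n + m).

    ASSUMPTION: m = 4 and p = 1/2 are inferred from (2 + 2)/(5 + 4).
    """
    return (n_c + m * p) / Fraction(n + m)


# Counts of records matching each attribute value of the test record
# (A = 0, B = 1, C = 0), taken from the listed conditional probabilities.
N_PER_CLASS = 5
counts = {
    "+": {"A=0": 2, "B=1": 1, "C=0": 3},
    "-": {"A=0": 3, "B=1": 2, "C=0": 0},
}


def likelihood(cls, smooth):
    """P(record | class) under the naive Bayes independence assumption."""
    result = Fraction(1)
    for n_c in counts[cls].values():
        if smooth:
            result *= m_estimate(n_c, N_PER_CLASS)
        else:
            result *= Fraction(n_c, N_PER_CLASS)
    return result


for cls in ("+", "-"):
    print(cls, likelihood(cls, smooth=False), likelihood(cls, smooth=True))
```

Without smoothing, the single zero estimate P(C = 0|−) = 0 forces the whole product for the (−) class to zero; with the m-estimate every factor stays positive, which is the point made in answer 5.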