Hadoop Cluster Environment Setup
I. Preparing the Environment
1.1. Installation Packages
1) Prepare 4 PCs
2) Install and configure the Linux system:
CentOS-7.0-1406-x86_64-DVD.iso
3) Install and configure the Java environment:
jdk-8u121-linux-x64.gz
4) Install and configure Hadoop:
hadoop-2.7.4-x64.tar.gz
5) Install and configure HBase:
hbase-1.2.1-bin.tar.gz
1.2. Network Configuration
Hostname    IP
master      172.16.18.102
slave1      172.16.18.103
slave2      172.16.18.104
slave3      172.16.18.105
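Once the four machines are up, a quick loop like the following (a minimal sketch; adjust the list to your own addressing plan) confirms they are all reachable:
# for ip in 172.16.18.102 172.16.18.103 172.16.18.104 172.16.18.105; do ping -c 1 $ip > /dev/null && echo "$ip OK" || echo "$ip unreachable"; done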
1.3. Common Commands
# systemctl start foo.service      # start a service
# systemctl stop foo.service       # stop a service
# systemctl restart foo.service    # restart a service
# systemctl status foo.service     # show a service's status (running or not)
# systemctl enable foo.service     # enable a service at boot
# systemctl disable foo.service    # disable a service at boot
# systemctl is-enabled iptables.service    # check whether a service starts at boot
# reboot                           # reboot the host
# shutdown -h now                  # shut down immediately
# source /etc/profile              # make profile changes take effect immediately
# yum install net-tools            # install basic network tools (ifconfig, netstat, ...)
II. Installing and Configuring CentOS
2.1 Installing CentOS
1) Boot from the installation media CentOS-7.0-1406-x86_64-DVD.iso to start the installer.
2) Select "Install CentOS 7" and press Enter to continue.
3) Choose a language. The default is English; Chinese is fine for learning, but use English in a production environment.
4) Configure the network and hostname. Hostname: master; turn the network on and configure a static IPv4 address.
5) Choose the installation destination; select manual partitioning with standard partitions, click "create them automatically", click Done, and accept the changes.
6) Set the root password. Password: Jit123
7) Reboot; the installation is complete.
2.2 Configuring the IP Address
2.2.1 Check the current IP address
# ip addr
or
# ip link
2.2.2 Configure the IP address and gateway
# cd /etc/sysconfig/network-scripts    # enter the network configuration directory
# find ifcfg-em*                       # locate the NIC configuration file, e.g. ifcfg-em1
# vi ifcfg-em1                         # edit the NIC configuration file
or
# vi /etc/sysconfig/network-scripts/ifcfg-em1    # edit the NIC configuration file
Settings:
BOOTPROTO=static        # static for a fixed IP, dhcp for dynamic
ONBOOT=yes              # bring the interface up at boot
IPADDR=172.16.18.102    # IP address
NETMASK=255.255.255.0   # subnet mask
GATEWAY=172.16.18.1
DNS1=219.149.194.55
# systemctl restart network.service    # restart the network
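After the restart, a quick check (a minimal sketch, assuming the NIC is em1 as above) confirms the address took effect and the gateway answers:
# ip addr show em1 | grep inet    # the configured address should be listed
# ping -c 3 172.16.18.1           # the gateway should reply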
2.2.3 Configure hosts
# vi /etc/hosts
Add the entries:
172.16.18.102 master
172.16.18.103 slave1
172.16.18.104 slave2
172.16.18.105 slave3
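With the same /etc/hosts on every machine, name resolution can be checked from any host (a minimal sketch):
# for h in master slave1 slave2 slave3; do ping -c 1 $h > /dev/null && echo "$h resolves"; done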
2.3 Disabling the Firewall
# systemctl status firewalld.service     # check the firewall status
# systemctl stop firewalld.service       # stop the firewall
# systemctl disable firewalld.service    # keep the firewall from starting at boot
2.4 Time Synchronization
# yum install -y ntp         # install the ntp service
# ntpdate cn.pool.ntp.org    # synchronize the clock from the network
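ntpdate performs a one-off sync. To keep the cluster clocks aligned over time, the sync can be repeated from cron (a minimal sketch; the hourly schedule is an assumption, pick whatever suits your environment):
# crontab -e
0 * * * * /usr/sbin/ntpdate cn.pool.ntp.org    # resync once an hour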
2.5 Installing and Configuring the JDK
2.5.1 Remove the bundled JDK
A freshly installed CentOS ships with OpenJDK. Running java -version prints something like:
java version "1.6.0"
OpenJDK Runtime Environment (build 1.6.0-b09)
OpenJDK 64-Bit Server VM (build 1.6.0-b09, mixed mode)
It is best to remove OpenJDK first and then install the Oracle (Sun) JDK.
First list the installed Java packages:
rpm -qa | grep java
which shows something like:
java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
Remove them:
rpm -e --nodeps java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
rpm -e --nodeps java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
Other useful queries:
rpm -qa | grep gcj
rpm -qa | grep jdk
If the OpenJDK packages cannot be found that way, they can also be removed with yum:
yum -y remove java java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
yum -y remove java java-1.6.0-openjdk-1.6.0.0-1.7.b09.el5
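Since the exact package names vary between releases, the matching packages can also be removed in one pass (a minimal sketch built on the rpm queries above):
# rpm -qa | grep -E 'openjdk|gcj' | xargs -r rpm -e --nodeps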
2.5.2 Install the JDK
Upload the jdk-8u121-linux-x64.gz package to root's home directory.
# mkdir /home
# tar -zxvf jdk-8u121-linux-x64.gz -C /home/
# rm -rf jdk-8u121-linux-x64.gz
2.5.3 Copy the JDK to the other hosts
# scp -r /home/jdk1.8.0_121 root@slave1:/home/
# scp -r /home/jdk1.8.0_121 root@slave2:/home/
# scp -r /home/jdk1.8.0_121 root@slave3:/home/
2.5.4 Configure the JDK environment variables on every host
# vi /etc/profile
Append:
export JAVA_HOME=/home/jdk1.8.0_121
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# source /etc/profile    # make the changes take effect
# java -version          # check the Java version
Create the hadoop user (run on every host):
[root@Master1 ~]# groupadd hadoop              # create the group
[root@Master1 ~]# useradd -g hadoop hadoop     # create the hadoop user in the hadoop group
[root@Master1 ~]# passwd hadoop                # set its password
2.6 Configuring Passwordless SSH
Check the ssh service status on each host:
# systemctl status sshd.service    # check the ssh service status
# yum install openssh-server openssh-clients    # install ssh; skip if already installed
# systemctl start sshd.service     # start ssh; skip if already running
Generate a key pair on each host (run on every machine):
# su - hadoop                # switch to the hadoop user
# ssh-keygen -t rsa -P ''    # generate the key pair (press Enter to accept the default file location)
On slave1:
# cp ~/.ssh/id_rsa.pub ~/.ssh/slave1.id_rsa.pub
# scp ~/.ssh/slave1.id_rsa.pub hadoop@master:~/.ssh
On slave2:
# cp ~/.ssh/id_rsa.pub ~/.ssh/slave2.id_rsa.pub
# scp ~/.ssh/slave2.id_rsa.pub hadoop@master:~/.ssh
On slave3:
# cp ~/.ssh/id_rsa.pub ~/.ssh/slave3.id_rsa.pub
# scp ~/.ssh/slave3.id_rsa.pub hadoop@master:~/.ssh
On master:
# cd ~/.ssh
# cat id_rsa.pub >> authorized_keys
# cat slave1.id_rsa.pub >> authorized_keys
# cat slave2.id_rsa.pub >> authorized_keys
# cat slave3.id_rsa.pub >> authorized_keys
# scp authorized_keys hadoop@slave1:~/.ssh
# scp authorized_keys hadoop@slave2:~/.ssh
# scp authorized_keys hadoop@slave3:~/.ssh
On every host, restrict the key file permissions:
su - hadoop
chmod 600 ~/.ssh/authorized_keys
Test the passwordless login:
ssh slave1    # the first login asks for a yes confirmation; if no password prompt follows, the setup works
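The whole mesh can be confirmed in one pass from master as the hadoop user (a minimal sketch):
$ for h in slave1 slave2 slave3; do ssh $h hostname; done    # each hostname should print without a password prompt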
III. Installing and Configuring Hadoop
3.1 Installing Hadoop
Upload the hadoop-2.7.4.tar.gz package to root's home directory.
# tar -zxvf hadoop-2.7.4.tar.gz -C /home/hadoop
# rm -rf hadoop-2.7.4.tar.gz
# mkdir /home/hadoop/hadoop-2.7.4/tmp
# mkdir /home/hadoop/hadoop-2.7.4/logs
# mkdir /home/hadoop/hadoop-2.7.4/hdf
# mkdir /home/hadoop/hadoop-2.7.4/hdf/data
# mkdir /home/hadoop/hadoop-2.7.4/hdf/name
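The five mkdir calls can also be collapsed into one (a minimal sketch using mkdir -p with brace expansion):
# mkdir -p /home/hadoop/hadoop-2.7.4/{tmp,logs,hdf/data,hdf/name}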
3.1.1 Configure hadoop-env.sh
Edit etc/hadoop/hadoop-env.sh and define the following parameter:
# set to the root of your Java installation
export JAVA_HOME=/home/jdk1.8.0_121
3.1.2 Modify yarn-env.sh
Replace the commented default
# export JAVA_HOME=/home/y/libexec/jdk1.7.0/
with
export JAVA_HOME=/home/jdk1.8.0_121
3.1.3 Modify slaves
# vi /home/hadoop/hadoop-2.7.4/etc/hadoop/slaves
Delete the line:
localhost
Add:
slave1
slave2
slave3
3.1.4 Modify core-site.xml
# vi /home/hadoop/hadoop-2.7.4/etc/hadoop/core-site.xml
Configuration:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.4/tmp</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
    <!-- read/write buffer size in bytes; 131072 bytes = 128 KB -->
  </property>
</configuration>
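Once HADOOP_HOME is on the PATH (section 3.3), the effective value of any configuration key can be verified with the getconf tool (a minimal sketch):
# hdfs getconf -confKey fs.default.name    # should print hdfs://master:9000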
3.1.5 Modify hdfs-site.xml
# vi /home/hadoop/hadoop-2.7.4/etc/hadoop/hdfs-site.xml
Configuration:
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop-cluster1</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/hadoop-2.7.4/hdf/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/hadoop-2.7.4/hdf/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <!-- number of replicas; 1 is fine for a pseudo-distributed setup -->
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
3.1.6 Modify mapred-site.xml
# cp /home/hadoop/hadoop-2.7.4/etc/hadoop/mapred-site.xml.template /home/hadoop/hadoop-2.7.4/etc/hadoop/mapred-site.xml
# vi /home/hadoop/hadoop-2.7.4/etc/hadoop/mapred-site.xml
Configuration:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
3.1.7 Modify yarn-site.xml
# vi /home/hadoop/hadoop-2.7.4/etc/hadoop/yarn-site.xml
Configuration (note the shuffle handler class lives under org.apache.hadoop.mapred, not org.apache.mapred):
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
3.2 Copy Hadoop to the Other Hosts
# scp -r /home/hadoop/hadoop-2.7.4 hadoop@slave1:/home/hadoop
# scp -r /home/hadoop/hadoop-2.7.4 hadoop@slave2:/home/hadoop
# scp -r /home/hadoop/hadoop-2.7.4 hadoop@slave3:/home/hadoop
3.3 Configure the Hadoop Environment Variables on Every Host
# su - root
# vi /etc/profile
Append:
export HADOOP_HOME=/home/hadoop/hadoop-2.7.4
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_LOG_DIR=/home/hadoop/hadoop-2.7.4/logs
export YARN_LOG_DIR=$HADOOP_LOG_DIR
# source /etc/profile    # make the changes take effect
3.4 Format the NameNode
# cd /home/hadoop/hadoop-2.7.4/sbin
# hdfs namenode -format
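A successful format populates the metadata directory configured in hdfs-site.xml, which can be checked directly (a minimal sketch):
# ls /home/hadoop/hadoop-2.7.4/hdf/name/current    # a VERSION file and an fsimage should now exist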
3.5 Start Hadoop
Start HDFS and YARN:
# cd /home/hadoop/hadoop-2.7.4/sbin
# start-all.sh
Check the startup through the web UIs:
http://172.16.18.102:50070          # HDFS NameNode UI
http://172.16.18.102:8088/cluster   # YARN ResourceManager UI
Check the processes:
# jps
If master shows ResourceManager, SecondaryNameNode, and NameNode, the startup succeeded, for example:
2212 ResourceManager
2484 Jps
1917 NameNode
2078 SecondaryNameNode
If each slave shows DataNode and NodeManager, the startup succeeded, for example:
17153 DataNode
17334 Jps
17241 NodeManager
Stop Hadoop with:
# stop-all.sh
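start-all.sh is deprecated in Hadoop 2.x; the two layers can also be started separately (a minimal sketch; the history server line is optional and matches the mapred-site.xml settings above):
# start-dfs.sh     # NameNode, SecondaryNameNode, DataNodes
# start-yarn.sh    # ResourceManager, NodeManagers
# mr-jobhistory-daemon.sh start historyserver    # optional job history server (ports 10020/19888)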
IV. Installing and Configuring ZooKeeper
4.1 Configure the ZooKeeper Environment Variables
# vi /etc/profile
export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.6
export PATH=$ZOOKEEPER_HOME/bin:$PATH
# source /etc/profile
4.2 Configure ZooKeeper
1. Download ZooKeeper from the official website.
2. Set up ZooKeeper on slave1, slave2, and slave3, i.e.:
slave1  172.16.18.103
slave2  172.16.18.104
slave3  172.16.18.105
3. Upload zookeeper-3.4.6.tar.gz to the root directory of the server and extract it:
# tar -zxvf zookeeper-3.4.6.tar.gz -C /home/hadoop
4. Create a zookeeper-data directory under the ZooKeeper directory, and copy conf/zoo_sample.cfg to zoo.cfg:
# mkdir /home/hadoop/zookeeper-3.4.6/zookeeper-data
# cp /home/hadoop/zookeeper-3.4.6/conf/zoo_sample.cfg /home/hadoop/zookeeper-3.4.6/conf/zoo.cfg
5. Modify zoo.cfg:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/hadoop/zookeeper-3.4.6/zookeeper-data
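A three-node ensemble on slave1, slave2, and slave3 is normally completed with a client port and the server list in zoo.cfg, plus a myid file on each node (a minimal sketch following standard ZooKeeper 3.4 conventions):
# the port at which the clients will connect
clientPort=2181
# ensemble members: server.<id>=<host>:<peer-port>:<leader-election-port>
server.1=slave1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888
Each node then stores its own id in dataDir (1 on slave1, 2 on slave2, 3 on slave3), e.g. on slave1:
# echo 1 > /home/hadoop/zookeeper-3.4.6/zookeeper-data/myid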