Sqoop Installation and Usage (sqoop-1.4.5 on hadoop-1.0.4)
1. What is Sqoop?
Sqoop (SQL-to-Hadoop) is a tool for conveniently migrating data between traditional relational databases and Hadoop. It exploits MapReduce's parallelism to speed up batch data transfer. It has evolved into two major lines: Sqoop1 and Sqoop2.
Sqoop is the bridge between relational databases and Hadoop: it supports moving data in both directions between relational databases and Hive, HDFS, and HBase, and it can do both full-table and incremental imports.
So why choose Sqoop?
Efficient, controllable use of resources: task parallelism and timeouts can be tuned.
Automatic data-type mapping and conversion, which users can also override.
Support for most mainstream databases: MySQL, Oracle, SQL Server, DB2, and so on.
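As a quick illustration of those knobs, here is a minimal sketch of an import in which the host, database, table, column, and credentials are all hypothetical: -m sets the degree of parallelism, and --map-column-java overrides the automatic type mapping for one column.
sqoop import --connect jdbc:mysql://db.example.com:3306/sales --username scott -P --table orders -m 4 --map-column-java note=String --target-dir /user/sqoop/orders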
2. Sqoop1 vs. Sqoop2: Similarities and Differences
The two versions are completely incompatible with each other.
They are distinguished by version number. Apache releases: 1.4.x (Sqoop1) and 1.99.x (Sqoop2). CDH releases: Sqoop-1.4.3-cdh4 (Sqoop1) and Sqoop2-1.99.2-cdh4.5.0 (Sqoop2).
Sqoop2's improvements over Sqoop1: it introduces a Sqoop server that centralizes the management of connectors and related resources; it offers multiple access methods (CLI, Web UI, REST API); and it introduces a role-based security mechanism.
3. Architecture of Sqoop1 and Sqoop2
[Figure: Sqoop architecture diagram 1]
[Figure: Sqoop architecture diagram 2]
4. Strengths and Weaknesses of Sqoop1 vs. Sqoop2
Architecture:
Sqoop1: just a single Sqoop client.
Sqoop2: adds a Sqoop server that centralizes connector management, plus a REST API and Web UI, and introduces a permission and security mechanism.
Deployment:
Sqoop1: simple to deploy; installation requires root privileges; connectors must conform to the JDBC model.
Sqoop2: somewhat more complex architecture; configuration and deployment are more involved.
Usage:
Sqoop1: the command-line-only interface is error-prone; the format is tightly coupled; not all data types are supported; the security mechanism is incomplete (for example, passwords are exposed on the command line).
Sqoop2: multiple interaction modes (command line, Web UI, REST API); connectors are managed centrally and all connections live on the Sqoop server; a complete permission-management mechanism; connectors are standardized and responsible only for reading and writing data.
5. Installing and Deploying Sqoop
5.0 Environment
hadoop: hadoop-1.0.4
sqoop: sqoop-1.4.5.bin__hadoop-1.0.0
5.1 Download and Unpack the Package
tar -zxvf sqoop-1.4.5.bin__hadoop-1.0.0.tar.gz
ln -s ./package/sqoop-1.4.5.bin__hadoop-1.0.0/ sqoop
5.2配置环境变量和配置文件
cdsqoop/conf/
mvsqoop-env-template.shsqoop-env.sh
visqoop-env.sh
在sqoop-env.sh中添加如下代码
# Set path to where bin/hadoop is available
#export HADOOP_COMMON_HOME=
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.2.0
# Set path to where hadoop-*-core.jar is available
#export HADOOP_MAPRED_HOME=
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.2.0
# Set the path to where bin/hbase is available
#export HBASE_HOME=
export HBASE_HOME=/home/hadoop/hbase-0.96.2-hadoop2
# Set the path to where bin/hive is available
#export HIVE_HOME=
export HIVE_HOME=/home/hadoop/apache-hive-0.13.1-bin
# Set the path for where zookeper config dir is
#export ZOOCFGDIR=
export ZOOCFGDIR=/home/hadoop/zookeeper-3.4.5
(If the data transfers do not involve HBase or Hive, the HBase and Hive settings can be omitted. If the cluster has a standalone ZooKeeper ensemble, configure ZooKeeper; otherwise, leave it unset.)
5.3 Copy the Required Libraries into sqoop/lib
Required jars: the hadoop-core jar, the Oracle JDBC driver (ojdbc6.jar), and the MySQL JDBC driver (mysql-connector-java-5.1.17.jar).
cp ~/hadoop/hadoop-core-2.2.0.jar ~/sqoop/lib/
cp ~/ojdbc6.jar ~/sqoop/lib/
cp ~/mysql-connector-java-5.1.17.jar ~/sqoop/lib/
5.4 Add Environment Variables
vi ~/.bash_profile
Add the following:
# Sqoop
export SQOOP_HOME=/home/hadoop/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
source ~/.bash_profile
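At this point the installation can be sanity-checked with Sqoop's built-in commands (the version banner should read 1.4.5; the exact output varies by build):
sqoop version
sqoop help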
5.5 Test the Connection to an Oracle Database
① Connect to the Oracle database and list all databases:
[hadoop@eb179 sqoop]$ sqoop list-databases --connect jdbc:oracle:thin:@10.1.69.173:1521:ORCLBI --username huangq -P
or: sqoop list-databases --connect jdbc:oracle:thin:@10.1.69.173:1521:ORCLBI --username huangq --password 123456
or, for MySQL: sqoop list-databases --connect jdbc:mysql://172.19.17.119:3306/ --username hadoop --password hadoop
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: $HADOOP_HOME is deprecated.
14/08/17 11:59:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
Enter password:
14/08/17 11:59:27 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
14/08/17 11:59:27 INFO manager.SqlManager: Using default fetchSize of 1000
14/08/17 11:59:51 INFO manager.OracleManager: Time zone has been set to GMT
MRDRP
MKFOW_QH
② Import an Oracle Table into HDFS
Note: by default Sqoop uses four map tasks, each writing the data it imports to its own file in the same target directory; -m 1 in this example restricts the job to a single map task. Plain-text files cannot store binary fields and cannot distinguish a null value from the string "null". Running the command below also generates a Java class file for the table (ORD_UV.java here, visible with ls ORD_UV.java). Code generation is an integral part of a Sqoop import: before writing the source database's rows to HDFS, Sqoop first deserializes them using the generated code.
[hadoop@eb179 ~]$ sqoop import --connect jdbc:oracle:thin:@10.1.69.173:1521:ORCLBI --username huangq --password 123456 --table ORD_UV -m 1 --target-dir /user/sqoop/test --direct-split-size 67108864
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: $HADOOP_HOME is deprecated.
14/08/17 15:21:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/08/17 15:21:34 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/08/17 15:21:34 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
14/08/17 15:21:34 INFO manager.SqlManager: Using default fetchSize of 1000
14/08/17 15:21:34 INFO tool.CodeGenTool: Beginning code generation
14/08/17 15:21:46 INFO manager.OracleManager: Time zone has been set to GMT
14/08/17 15:21:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM ORD_UV t WHERE 1=0
14/08/17 15:21:46 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop
Note: /tmp/sqoop-hadoop/compile/328657d577512bd2c61e07d66aaa9bb7/ORD_UV.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/08/17 15:21:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/328657d577512bd2c61e07d66aaa9bb7/ORD_UV.jar
14/08/17 15:21:47 INFO manager.OracleManager: Time zone has been set to GMT
14/08/17 15:21:47 INFO manager.OracleManager: Time zone has been set to GMT
14/08/17 15:21:47 INFO mapreduce.ImportJobBase: Beginning import of ORD_UV
14/08/17 15:21:47 INFO manager.OracleManager: Time zone has been set to GMT
14/08/17 15:21:49 INFO db.DBInputFormat: Using read commited transaction isolation
14/08/17 15:21:49 INFO mapred.JobClient: Running job: job_201408151734_0027
14/08/17 15:21:50 INFO mapred.JobClient: map 0% reduce 0%
14/08/17 15:22:12 INFO mapred.JobClient: map 100% reduce 0%
14/08/17 15:22:17 INFO mapred.JobClient: Job complete: job_201408151734_0027
14/08/17 15:22:17 INFO mapred.JobClient: Counters: 18
14/08/17 15:22:17 INFO mapred.JobClient:   Job Counters
14/08/17 15:22:17 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=15862
14/08/17 15:22:17 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/08/17 15:22:17 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/08/17 15:22:17 INFO mapred.JobClient:     Launched map tasks=1
14/08/17 15:22:17 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/08/17 15:22:17 INFO mapred.JobClient:   File Output Format Counters
14/08/17 15:22:17 INFO mapred.JobClient:     Bytes Written=1472
14/08/17 15:22:17 INFO mapred.JobClient:   FileSystemCounters
14/08/17 15:22:17 INFO mapred.JobClient:     HDFS_BYTES_READ=87
14/08/17 15:22:17 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=33755
14/08/17 15:22:17 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1472
14/08/17 15:22:17 INFO mapred.JobClient:   File Input Format Counters
14/08/17 15:22:17 INFO mapred.JobClient:     Bytes Read=0
14/08/17 15:22:17 INFO mapred.JobClient:   Map-Reduce Framework
14/08/17 15:22:17 INFO mapred.JobClient:     Map input records=81
14/08/17 15:22:17 INFO mapred.JobClient:     Physical memory (bytes) snapshot=192405504
14/08/17 15:22:17 INFO mapred.JobClient:     Spilled Records=0
14/08/17 15:22:17 INFO mapred.JobClient:     CPU time spent (ms)=1540
14/08/17 15:22:17 INFO mapred.JobClient:     Total committed heap usage (bytes)=503775232
14/08/17 15:22:17 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2699571200
14/08/17 15:22:17 INFO mapred.JobClient:     Map output records=81
14/08/17 15:22:17 INFO mapred.JobClient:     SPLIT_RAW_BYTES=87
14/08/17 15:22:17 INFO mapreduce.ImportJobBase: Transferred 1.4375 KB in 29.3443 seconds (50.1631 bytes/sec)
14/08/17 15:22:17 INFO mapreduce.ImportJobBase: Retrieved 81 records.
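As noted before the import, a plain-text import cannot tell a null column apart from the literal string "null". A hedged variant of the same command (the target directory here is hypothetical) uses Sqoop's --null-string and --null-non-string options to write a chosen token, such as Hive's default \N, for null columns:
sqoop import --connect jdbc:oracle:thin:@10.1.69.173:1521:ORCLBI --username huangq -P --table ORD_UV -m 1 --target-dir /user/sqoop/test_nulls --null-string '\\N' --null-non-string '\\N'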
③ Export to Oracle and Import into HBase
Use export to push data from HDFS into a remote database:
sqoop export --connect jdbc:oracle:thin:@192.168.**.**:**:** --username ** --password ** -m 1 --table VEHICLE --export-dir /user/root/VEHICLE
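If some rows may already exist in the target table, the same export can run in update mode instead of insert mode. A sketch, assuming (hypothetically) that ID is VEHICLE's primary key; --update-key turns each exported row into an UPDATE statement keyed on that column:
sqoop export --connect jdbc:oracle:thin:@192.168.**.**:**:** --username ** --password ** -m 1 --table VEHICLE --export-dir /user/root/VEHICLE --update-key ID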
Import data into HBase:
sqoop import --connect jdbc:oracle:thin:@192.168.**.**:**:** --username ** --password ** -m 1 --table VEHICLE --hbase-create-table --hbase-table VEHICLE --hbase-row-key ID --column-family VEHICLEINFO --split-by ID
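To spot-check the result, a few rows can be scanned from the HBase shell (LIMIT is standard scan syntax):
echo "scan 'VEHICLE', {LIMIT => 3}" | hbase shell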
5.6 Test Usage with a MySQL Database
Prerequisite: the MySQL JDBC jar has been copied into sqoop/lib.
① Test the database connection:
sqoop list-databases --connect jdbc:mysql://192.168.10.63 --username root --password 123456
② Using Sqoop
(Each of the commands below is written here on a single line; every option must be separated from the next by a space, so don't drop the spaces.)
(None of the following six commands has been verified to run successfully.)
<1> hdfs -> mysql
sqoop export --connect jdbc:mysql://192.168.10.63/ipj --username root --password 123456 --table ipj_flow_user --export-dir hdfs://192.168.10.63:8020/user/flow/part-m-00000
Prerequisites:
(1) The HDFS path /user/flow/part-m-00000 must exist.
(2) If the cluster uses LZO compression, LZO must also be installed and configured successfully on the local machine.
(3) Every node in the Hadoop cluster must have privileges to access MySQL.
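Prerequisite (3) is typically satisfied by granting the Sqoop user access from any cluster host. A sketch using the mysql client, reusing the root/123456 credentials from the command above ('%' matches every host, which is convenient on a test cluster but too permissive for production):
mysql -u root -p -e "GRANT ALL PRIVILEGES ON ipj.* TO 'root'@'%' IDENTIFIED BY '123456'; FLUSH PRIVILEGES;"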
<2> mysql -> hdfs
sqoop import --connect jdbc:mysql://192.168.10.63/ipj --table ipj_flow_user
<3> mysql -> hbase
sqoop import --connect jdbc:mysql://192.168.10.63/ipj --table ipj_flow_user --hbase-table ipj_statics_test --hbase-create-table --hbase-row-key id --column-family imei
<4> hbase -> mysql
Sqoop does not directly support exporting HBase data to MySQL. Three approaches are commonly used:
Method one: flatten the HBase data into HDFS files, then export those with Sqoop.
Method two: load the HBase data into a Hive table, then export from Hive to MySQL (sketched below).
Method three: read the table directly with the HBase Java API and write to MySQL yourself, without Sqoop.
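A sketch of method two, assuming Hive was built with its HBase storage handler and that ipj_statics_test has the single column family imei (both assumptions): expose the HBase table to Hive as an external table, materialize it as a plain Hive table, then export that table's warehouse directory with Sqoop (the warehouse path shown is Hive's default and may differ on your cluster).
hive -e "CREATE EXTERNAL TABLE hbase_ipj (id STRING, val STRING) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,imei:val') TBLPROPERTIES ('hbase.table.name' = 'ipj_statics_test')"
hive -e "CREATE TABLE ipj_flat AS SELECT * FROM hbase_ipj"
sqoop export --connect jdbc:mysql://192.168.10.63/ipj --username root --password 123456 --table ipj_flat --export-dir /user/hive/warehouse/ipj_flat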
<5> mysql -> hive
sqoop import --connect jdbc:mysql://192.168.10.63/ipj --table hive_table_test --hive-import --hive-table hive_test_table
(Optionally add the --create-hive-table flag, which makes the job fail if the Hive table already exists.)
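A quick way to confirm the import landed in Hive (assumes the hive CLI is on the PATH):
hive -e "SELECT * FROM hive_test_table LIMIT 5"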
<6> hive -> mysql
sqoop export --connect jdbc:mysql://192.168.10.63/ipj --username hive --password 123456 --table target_table --export-dir /user/hive/warehouse/uv/dt=mytable
Prerequisite: the target table must already exist in MySQL.
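For example, if the Hive table's layout were (id, uv, dt), the target table could be created ahead of time as below. The schema is purely hypothetical; the column order must match the Hive data exactly, since export matches columns by position:
mysql -u hive -p -e "CREATE TABLE ipj.target_table (id INT, uv BIGINT, dt VARCHAR(20));"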
③ Other Sqoop Operations
<1> List all databases in MySQL:
sqoop list-databases --connect jdbc:mysql://192.168.10.63:3306/ --username root --password 123456
<2> List all tables in a given MySQL database:
sqoop list-tables --connect jdbc:mysql://192.168.10.63:3306/ipj --username root --password 123456
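In the same spirit, sqoop eval runs an ad-hoc SQL statement against the database and prints the result, which is handy for sanity-checking a connection before launching a full job (the table name is the one used above):
sqoop eval --connect jdbc:mysql://192.168.10.63:3306/ipj --username root --password 123456 --query "SELECT COUNT(*) FROM ipj_flow_user"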
6. Sqoop1 Performance
Test data: