Understanding Hadoop's Local Run Mode in Depth

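Hadoop can run in three modes:

local (standalone) mode, pseudo-distributed mode, and cluster mode. They are defined as follows:

1. Standalone mode, also called local mode (standalone or local mode)

No daemons need to be running; everything executes inside a single JVM.

Because MapReduce programs are easy to test and debug in local mode, this mode is best suited to the development phase.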

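2. Pseudo-distributed mode

When all of Hadoop's Java processes (daemons) run on a single physical machine, Hadoop is running in pseudo-distributed mode, as the following jps output shows:

[root@hadoop20 dir2]# jps
8993 Jps
7409 SecondaryNameNode
7142 NameNode
7260 DataNode
8685 NodeManager
8590 ResourceManager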

3. Cluster mode

When Hadoop's Java processes run on several physical machines, Hadoop is running in cluster mode (a cluster has master nodes and worker nodes), as the following jps output from each host shows:

[root@hadoop11 local]# jps
18046 NameNode
30927 Jps
18225 SecondaryNameNode

[root@hadoop22 ~]# jps
9741 ResourceManager
16569 Jps

[root@hadoop33 ~]# jps
12775 DataNode
20189 Jps
12653 NodeManager

[root@hadoop44 ~]# jps
10111 DataNode
17519 Jps
9988 NodeManager

[root@hadoop55 ~]# jps
11563 NodeManager
11686 DataNode
19078 Jps

[root@hadoop66 ~]# jps
10682 DataNode
10560 NodeManager
18085 Jps

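Note:

Pseudo-distributed mode simulates a cluster on a single server. Only the number of machines differs; the communication mechanism and the execution flow are the same as in a real cluster, so Hadoop's pseudo-distributed mode can be regarded as a special case of cluster mode.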
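The three definitions can be condensed into a small check. What follows is an illustrative sketch in plain Java, not part of Hadoop: the class name RunModeSketch, the classify method, and the host-to-daemons map are all assumptions made for this example. It classifies a deployment by how many hosts have Hadoop daemons running, which is the information the jps listings above convey.

```java
import java.util.List;
import java.util.Map;

// Illustrative only: RunModeSketch and classify are hypothetical names,
// not Hadoop APIs. The input mimics what jps reports on each host.
final class RunModeSketch {
    enum Mode { LOCAL, PSEUDO_DISTRIBUTED, CLUSTER }

    // daemonsByHost maps a host name to the Hadoop daemons running on it.
    static Mode classify(Map<String, List<String>> daemonsByHost) {
        long hostsWithDaemons = daemonsByHost.values().stream()
                .filter(daemons -> !daemons.isEmpty())
                .count();
        if (hostsWithDaemons == 0) {
            return Mode.LOCAL;              // no daemons: everything runs in one JVM
        }
        if (hostsWithDaemons == 1) {
            return Mode.PSEUDO_DISTRIBUTED; // all daemons on a single machine
        }
        return Mode.CLUSTER;                // daemons spread over several machines
    }

    public static void main(String[] args) {
        Map<String, List<String>> pseudo = Map.of(
                "hadoop20", List.of("NameNode", "DataNode", "SecondaryNameNode",
                        "ResourceManager", "NodeManager"));
        System.out.println(classify(pseudo));
    }
}
```

The sketch deliberately ignores which daemons run where; it only captures the distinction the definitions draw between zero, one, and several machines.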

To make the rest of the article easier to follow, here is Hadoop's architecture first:

[Figure: Hadoop architecture]

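As the architecture shows, HDFS and MapReduce are Hadoop's standard file system and standard compute framework. They are not the only options, however.

Hadoop can just as well work with other file systems (for example the local file system, such as NTFS on Windows or ext4 on Linux) and other compute frameworks (such as Spark or Storm). This is exactly what makes Hadoop loosely coupled.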

In Hadoop's configuration, the file system to use is specified in core-site.xml:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop11:9000</value>
</property>
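The value of fs.defaultFS is a URI: its scheme selects the file system and its authority names the NameNode host and port. The sketch below uses only java.net.URI from the JDK to show how such a value, and the file:/// paths used later in this article, break down into parts. The class name FsUriSketch and the helper effectiveScheme are assumptions for illustration; Hadoop's own resolution logic lives in its FileSystem class and is not reproduced here. The sketch only mirrors the rule that a path without a scheme falls back to the default file system.

```java
import java.net.URI;

// Illustrative only: FsUriSketch and effectiveScheme are hypothetical names.
final class FsUriSketch {
    // Returns the scheme of a path, falling back to the default file system's
    // scheme when the path itself carries none (for example "/dir").
    static String effectiveScheme(String path, String defaultFs) {
        String scheme = URI.create(path).getScheme();
        return scheme != null ? scheme : URI.create(defaultFs).getScheme();
    }

    public static void main(String[] args) {
        String defaultFs = "hdfs://hadoop11:9000";
        URI uri = URI.create(defaultFs);
        System.out.println("scheme = " + uri.getScheme());
        System.out.println("host   = " + uri.getHost());
        System.out.println("port   = " + uri.getPort());

        // A fully qualified local path keeps its own scheme.
        System.out.println(effectiveScheme("file:///usr/local/word.txt", defaultFs));
        // A bare path is interpreted against the default file system.
        System.out.println(effectiveScheme("/dir", defaultFs));
    }
}
```

This is why the listings in this article spell out file:/// or hdfs:// in every path: a fully qualified path does not depend on what fs.defaultFS happens to be.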

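The rest of this article explains Hadoop's local run mode in detail for both the Linux and the Windows development environments. The key points are:

Hadoop's local execution mode:

1. Running the main method directly from Eclipse on Windows submits the job to the local executor, LocalJobRunner.

-- Input and output data can be placed on a local path (c:/wc/srcdata/)

-- Input and output data can also be placed in HDFS (hdfs://hadoop20:9000/dir)

2. Running the main method directly from Eclipse on Linux, as long as no YARN-related configuration is added, also submits the job to LocalJobRunner.

-- Input and output data can be placed on a local path (/usr/local/)

-- Input and output data can also be placed in HDFS (hdfs://hadoop20:9000/dir)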
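In Hadoop 2.x the "YARN-related configuration" mentioned above is chiefly the mapreduce.framework.name property, whose default value is local. The following sketch models a configuration as a plain Map to show the decision; it does not call Hadoop, and the class name FrameworkChoiceSketch and method runnerFor are hypothetical. Other values the property accepts are intentionally left out of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: mimics how the configured framework name decides where
// a job is submitted. Names other than the property key are hypothetical.
final class FrameworkChoiceSketch {
    static final String FRAMEWORK_KEY = "mapreduce.framework.name";

    static String runnerFor(Map<String, String> conf) {
        // With nothing configured, the default "local" applies.
        String framework = conf.getOrDefault(FRAMEWORK_KEY, "local");
        switch (framework) {
            case "local":
                return "LocalJobRunner";   // job runs inside the client JVM
            case "yarn":
                return "YARN";             // job is submitted to the ResourceManager
            default:
                throw new IllegalArgumentException(
                        "value not covered by this sketch: " + framework);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(runnerFor(conf));   // no YARN settings present

        conf.put(FRAMEWORK_KEY, "yarn");
        System.out.println(runnerFor(conf));   // YARN explicitly configured
    }
}
```

The point of the sketch is that local execution is the fallback: a job launched from the IDE goes to LocalJobRunner unless the configuration on the classpath says otherwise.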

We start with the Linux development environment:

Taking the WordCount program as an example, with both the input file and the output directory on local paths, the code is as follows:

package MapReduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCount
{
    public static String path1 = "file:///usr/local/word.txt"; // file:/// denotes a path in the local file system
    public static String path2 = "file:///usr/local/dir1";

    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        FileSystem fileSystem = FileSystem.get(conf);
        // Remove a leftover output directory; the job refuses to start if it already exists.
        if (fileSystem.exists(new Path(path2)))
        {
            fileSystem.delete(new Path(path2), true);
        }

        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);

        FileInputFormat.setInputPaths(job, new Path(path1));
        job.setInputFormatClass(TextInputFormat.class);
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setNumReduceTasks(1);
        job.setPartitionerClass(HashPartitioner.class);

        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(path2));

        job.waitForCompletion(true);
    }

    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable>
    {
        @Override
        protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException
        {
            // Each input line holds tab-separated words; emit (word, 1) for every word.
            String[] splited = v1.toString().split("\t");
            for (String string : splited)
            {
                context.write(new Text(string), new LongWritable(1L));
            }
        }
    }

    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable>
    {
        @Override
        protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context) throws IOException, InterruptedException
        {
            // Sum the counts emitted for this word.
            long sum = 0L;
            for (LongWritable v2 : v2s)
            {
                sum += v2.get();
            }
            context.write(k2, new LongWritable(sum));
        }
    }
}

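While the program is running, the relevant Java processes are:

[root@hadoop20 local]# jps
7621                  // the running Eclipse instance
9833 Jps
9790 WordCount        // the WordCount program

Now we check the result on the local file system:

[root@hadoop20 dir1]# pwd
/usr/local/dir1
[root@hadoop20 dir1]# more part-r-00000
hello   2
me      1
you     1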
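To see what the mapper and reducer compute without any Hadoop dependency, the same logic can be sketched in plain Java. This is a simplified model, not the job itself: the class name WordCountSketch is hypothetical, and the two input lines are an assumption about the contents of word.txt, chosen so that the counts match the part-r-00000 output shown above. A TreeMap stands in for the shuffle phase, which hands keys to the reducer in sorted order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a single-process model of the WordCount job's logic.
final class WordCountSketch {
    // "Map" phase: split each line on tabs and emit one (word, 1) pair per token.
    static List<Map.Entry<String, Long>> map(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split("\t")) {
                pairs.add(Map.entry(word, 1L));
            }
        }
        return pairs;
    }

    // "Shuffle" and "reduce" phases: group the pairs by word in key order
    // and sum the values of each group.
    static Map<String, Long> reduce(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed contents of word.txt: two lines of tab-separated words.
        List<String> lines = List.of("hello\tyou", "hello\tme");
        reduce(map(lines)).forEach(
                (word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Under the assumed input, the sketch produces the same three word counts as the job, one tab-separated pair per line.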

Next, we read the input from a path in HDFS while still writing the output to the local Linux file system. First, we start the HDFS distributed file system on Linux.

[root@hadoop20 dir]# start-dfs.sh
Starting namenodes on [hadoop20]
hadoop20: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-hadoop20.out
hadoop20: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop20.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-hadoop20.out

[root@hadoop20 dir]# jps
10260 SecondaryNameNode
7621
10360 Jps
9995 NameNode
10110 DataNode

Again using the WordCount program as the example, the code is as follows:

package MapReduce;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCount
{
    public static String path1 = "hdfs://hadoop20:9000/word.txt"; // read the test data from HDFS
    public static String path2 = "file:///usr/local/dir2";        // write the output to the local file system

    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        // By default this returns the local file system instance (here, the Linux file system).
        FileSystem fileSystem = FileSystem.get(conf);
        if (fileSystem.exists(new Path(path2)))
        {
            fileSystem.delete(new Path(path2), true);
        }

        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);

        FileInputFormat.setInputPaths(job, new Path(path1));
        job.setInputFormatClass(TextInputFormat.class);
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setNumReduceTasks(1);
        job.setPartitionerClass(HashPartitioner.class);

        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(path2));

        job.waitForCompletion(true);
    }

    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable>
    {
        @Override
        protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException
        {
            String[] splited = v1.toString().split("\t");
            for (String string : splited)
            {
                context.write(new Text(string), new LongWritable(1L));
            }
        }
    }

    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable>
    {
        @Override
        protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context) throws IOException, InterruptedException
        {
            long sum = 0L;
            for (LongWritable v2 : v2s)
            {
                sum += v2.get();
            }
            context.write(k2, new LongWritable(sum));
        }
    }
}

The result is as follows:

[root@hadoop20 dir2]# more part-r-00000
hello   2
me      1
you     1
[root@hadoop20 dir2]# pwd
/usr/local/dir2
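The output lands in a single file named part-r-00000 because the listings configure one reduce task together with HashPartitioner. The sketch below reproduces the partitioning formula that HashPartitioner applies, (hashCode & Integer.MAX_VALUE) % numReduceTasks, and the five-digit file naming. It is a stand-alone illustration: the class name PartitionSketch and both method names are hypothetical, and it hashes plain Java objects, whereas the real job hashes Text keys, so the partition numbers for more than one reducer would generally differ from Hadoop's.

```java
// Illustrative only: models which reduce task (and output file) a key goes to.
final class PartitionSketch {
    // Masking with Integer.MAX_VALUE clears the sign bit, so the result of
    // the modulo is never negative even when hashCode() is.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Reduce output files are numbered by partition, padded to five digits.
    static String outputFileName(int partition) {
        return String.format("part-r-%05d", partition);
    }

    public static void main(String[] args) {
        String[] words = {"hello", "me", "you"};
        for (String word : words) {
            int p = partition(word, 1); // one reduce task, as in the listings
            System.out.println(word + " -> " + outputFileName(p));
        }
    }
}
```

With a single reduce task every key maps to partition 0, which is why all three words end up in the same file.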

Next, we switch both the input path and the output path to paths in HDFS.

The code is as follows:

package MapReduce;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCount
{
    public static String path1 = "hdfs://hadoop20:9000/word.txt"; // read the test data from HDFS
    public static String path2 = "hdfs://hadoop20:9000/dir3";

    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        // Resolve the file system from the output URI. FileSystem.get(conf) alone returns the
        // default (local) file system here and would reject an hdfs:// path.
        FileSystem fileSystem = FileSystem.get(URI.create(path2), conf);
        if (fileSystem.exists(new Path(path2)))
        {
            fileSystem.delete(new Path(path2), true);
        }

        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCount.class);

        FileInputFormat.setInputPaths(job, new Path(path1));
        job.setInputFormatClass(TextInputFormat.class);
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        job.setNumReduceTasks(1);
        job.setPartitionerClass(HashPartitioner.class);

        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(path2));

        job.waitForCompletion(true);
    }

    public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable>
    {
        @Override
        protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException
        {
            String[] splited = v1.toString().split("\t");
            for (String string : splited)
            {
                context.write(new Text(string), new LongWritable(1L));
            }
        }
    }

    public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable>
    {
        @Override
        protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context) throws IOException, InterruptedException
        {
            long sum = 0L;
            for (LongWritable v2 : v2s)
            {
                sum += v2.get();
            }
            context.write(k2, new LongWritable(sum));
        }
    }
}
