Setting Up a Hadoop Development Environment with Eclipse on Windows 7
Most tutorials online develop Hadoop applications with Eclipse installed on Linux, but many Java programmers are not that familiar with Linux and need to develop Hadoop programs on Windows instead. After some experimentation, here is a summary of how to develop Hadoop code with Eclipse on Windows.
1. Download the dedicated Hadoop plugin jar for Eclipse
The Hadoop version is 2.3.0 and the cluster is set up on CentOS 6.x. The plugin package download address is:
2. Put the plugin package into the eclipse/plugins directory
To make things easier later on, I put in as many of the jars as possible, as shown in the figure below:
3. Restart Eclipse and configure the Hadoop installation directory
If the plugin installed successfully, opening Window > Preferences shows a Hadoop Map/Reduce entry in the left pane; click it and set the Hadoop installation path in the right pane.
4. Configure Map/Reduce Locations
Open Window > Open Perspective > Other, select Map/Reduce, and click OK. A Map/Reduce Locations view appears at the bottom right, as shown in the figure below:
Click the Map/Reduce Locations tab, then click the small elephant icon on its right to open the Hadoop Location configuration window:
Enter any Location Name you like, then configure Map/Reduce Master and DFS Master; the Host and Port must match the settings in core-site.xml.
Look up the setting in core-site.xml:

fs.default.name
hdfs://name01:9000

Configure the dialog accordingly:
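The Host and Port fields are just the two components of the fs.default.name URI above. A plain-JDK sketch (outside Hadoop, purely to illustrate the split; the hostname and port are the ones from core-site.xml):

```java
import java.net.URI;

public class FsUriSplit {
    public static void main(String[] args) {
        // fs.default.name value from core-site.xml above
        URI fsUri = URI.create("hdfs://name01:9000");
        System.out.println("Host: " + fsUri.getHost()); // -> name01
        System.out.println("Port: " + fsUri.getPort()); // -> 9000
    }
}
```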
Click the "Finish" button to close the window.
Click DFS Locations > myhadoop (the location name configured in the previous step) in the left pane. If you can see user, the installation succeeded; however, opening it shows an error:

Error: Permission denied: user=root, access=READ_EXECUTE, inode="/tmp":hadoop:supergroup:drwx------

as shown in the figure below. This is a permission problem: make the hadoop user the owner of every Hadoop-related directory under /tmp/ and grant them 777 permissions.
cd /tmp/
chmod 777 /tmp/
chown -R hadoop.hadoop /tmp/hsperfdata_root
After reconnecting, DFS Locations displays normally.
Map/Reduce Master: the address of the cluster's Map/Reduce service; it should match the mapred.job.tracker setting in mapred-site.xml.
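As an alternative to loosening permissions with chmod 777, the mismatch (the Eclipse client acts as user=root while the HDFS files belong to hadoop) can also be addressed on the Windows side: the Hadoop 2.x client honors the HADOOP_USER_NAME environment variable or system property when deciding which user to act as. A minimal sketch, assuming "hadoop" is the cluster user used throughout this tutorial:

```java
public class HadoopUserName {
    public static void main(String[] args) {
        // Must be set before the first FileSystem/Job call in this JVM;
        // "hadoop" is the user that owns the HDFS directories in this tutorial.
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        System.out.println(System.getProperty("HADOOP_USER_NAME"));
    }
}
```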
(1) Clicking the location produces the error:

An internal error occurred during: "Connecting to DFS hadoop name01". java.net.UnknownHostException: name01

Setting the Host field to the IP address 192.168.52.128 instead fixes it, and the location opens normally, as shown in the figure below:
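The root cause is that the Windows machine cannot resolve the hostname name01. Besides using the IP directly, adding the line 192.168.52.128 name01 to C:\Windows\System32\drivers\etc\hosts also works. A plain-JDK sketch of the lookup that is failing (the hostname is the one from the error above):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {
    // Same kind of lookup the DFS client performs when connecting to "name01".
    static boolean resolvable(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Prints false unless name01 is in DNS or the local hosts file.
        System.out.println("name01 resolvable: " + resolvable("name01"));
    }
}
```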
5. Create the WordCount project
File > New > Project, choose Map/Reduce Project, and enter the project name WordCount.
Create a class named WordCount in the WordCount project. If you get the error:

Invalid Hadoop Runtime specified; please click 'Configure Hadoop install directory' or fill in library location input field

the cause is a wrong directory choice: the install directory must not sit directly under the drive root as E:\hadoop; changing it to E:\u\hadoop\ works, as shown below:
Click Next through the wizard and then Finish to complete the project creation. The Eclipse console then prints:

14-12-9 4:03:10 PM: Eclipse is running in a JRE, but a JDK is required. Some Maven plugins may not work when importing projects or updating source folders.
14-12-9 4:03:13 PM: Refreshing [/WordCount/pom.xml]
14-12-9 4:03:14 PM: Refreshing [/WordCount/pom.xml]
14-12-9 4:03:14 PM: Refreshing [/WordCount/pom.xml]
14-12-9 4:03:14 PM: Updating index central|http://repo1.maven.org/maven2
14-12-9 4:04:10 PM: Updated index for central|http://repo1.maven.org/maven2
6. Importing the lib jars
The Hadoop jars that need to be added are:
all jars under /hadoop-2.3.0/share/hadoop/common, plus all jars in its lib subdirectory;
all jars under /hadoop-2.3.0/share/hadoop/hdfs, excluding those in its lib subdirectory;
all jars under /hadoop-2.3.0/share/hadoop/mapreduce, excluding those in its lib subdirectory;
all jars under /hadoop-2.3.0/share/hadoop/yarn, excluding those in its lib subdirectory.
Roughly 18 jars in total.
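A quick way to confirm the jars actually made it onto the build path is to try loading one representative class per group; the class names below are real Hadoop 2.x classes, one from each directory listed above:

```java
public class ClasspathProbe {
    // Returns true if the named class can be loaded from the current classpath.
    static boolean present(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] probes = {
            "org.apache.hadoop.conf.Configuration",          // common
            "org.apache.hadoop.hdfs.DistributedFileSystem",  // hdfs
            "org.apache.hadoop.mapreduce.Job",               // mapreduce
            "org.apache.hadoop.yarn.conf.YarnConfiguration", // yarn
        };
        for (String p : probes) {
            System.out.println(p + " -> " + (present(p) ? "OK" : "MISSING"));
        }
    }
}
```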
7. The code for submitting a MapReduce job to the cluster directly from Eclipse is as follows:

package wc;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class W2 {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The source text is cut off here, mid-statement: "System.setProperty(..."
        // The remainder of main (argument parsing via GenericOptionsParser, Job
        // setup with the mapper/reducer classes above, input/output paths, and
        // job.waitForCompletion) is not recoverable from this copy.
    }
}
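The mapper and reducer logic above can be sanity-checked without any cluster; the following plain-JDK sketch reproduces the same tokenize-then-sum behavior in memory (StringTokenizer on the map side, a HashMap accumulating sums on the reduce side):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class LocalWordCount {
    // Same logic as TokenizerMapper + IntSumReducer, but over an
    // in-memory string instead of HDFS input splits.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "hello" appears twice, the other words once
        System.out.println(count("hello hadoop hello eclipse"));
    }
}
```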
8. Running
8.1 Create the input directory on HDFS

[hadoop@name01 hadoop-2.3.0]$ hadoop fs -ls /
[hadoop@name01 hadoop-2.3.0]$ hadoop fs -mkdir input
mkdir: `input': No such file or directory
[hadoop@name01 hadoop-2.3.0]$

PS: hadoop fs needs the full path to create a directory.
If the Apache Hadoop version is 0.x or 1.x:

bin/hadoop fs -mkdir -p /in
bin/hadoop fs -put /home/du/input /in

If the Apache Hadoop version is 2.x:

bin/hdfs dfs -mkdir -p /in
bin/hdfs dfs -put /home/du/input /in

For vendor distributions such as Cloudera CDH, IBM BI, or Hortonworks HDP, the first form of the command also works.
Mind the full path when creating directories; note that the HDFS root directory is /.
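The reason hadoop fs -mkdir input failed is that a path without a leading / is resolved relative to the user's HDFS home directory, /user/<username>, which did not exist yet. The resolution rule itself can be sketched with plain java.nio (/user/hadoop is the conventional home for the hadoop user assumed in this tutorial):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class HdfsPathRule {
    public static void main(String[] args) {
        // HDFS resolves a relative name against /user/<username>:
        Path home = Paths.get("/user", "hadoop");
        Path relative = home.resolve("input"); // "input" -> /user/hadoop/input
        Path absolute = Paths.get("/in");      // a leading '/' is kept as-is
        System.out.println(relative);
        System.out.println(absolute);
    }
}
```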
8.2 Copy the local README.txt into the HDFS input directory

[hadoop@name01 hadoop-2.3.0]$ find . -name README.txt
./share/doc/hadoop/common/README.txt
[hadoop@name01 ~]$ hadoop fs -copyFromLocal ./src/hadoop-2.3.0/share/doc/hadoop/common/README.txt /data/input
[hadoop@name01 ~]$
[hadoop@name01 ~]$ hadoop fs -ls /
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2014-12-15 23:34 /data
-rw-r--r--   3 hadoop supergroup         88 2014-08-26 02:21 /input
You have new mail in /var/spool/mail/root
[hadoop@name01 ~]$

8.3 After the Hadoop job finishes, check the output
(1) Directly on the Hadoop server:

[hadoop@name01 ~]$ hadoop fs -ls /data/
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2014-12-15 23:29 /data/input
drwxr-xr-x   - hadoop supergroup          0 2014-12-15 23:34 /data/output
[hadoop@name01 ~]$
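Each reducer writes its results as one word<TAB>count line per key (the part-r-00000 file under /data/output, the default TextOutputFormat layout). A plain-JDK sketch of loading such output back into a map; the sample lines here are illustrative, not taken from the actual run:

```java
import java.util.HashMap;
import java.util.Map;

public class ReadWordCountOutput {
    // Parses "word<TAB>count" lines, the format TextOutputFormat writes.
    static Map<String, Integer> parse(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] kv = line.split("\t");
            counts.put(kv[0], Integer.parseInt(kv[1]));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative sample lines, not real job output
        String[] sample = { "hadoop\t3", "eclipse\t1" };
        System.out.println(parse(sample));
    }
}
```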
(2) In Eclipse.
(3) In the console, which shows:

2014-12-16 15:34:01,303 INFO [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(996)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2014-12-16 15:34:01,309 INFO [main] jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2014-12-16 15:34:02,047 INFO [main] input.FileInputFormat (FileInputFormat.java:listStatus(287)) - Total input paths to process : 1
2014-12-16 15:34:02,120 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(396)) - number of splits:1
2014-12-16 15:34:02,323 INFO [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(479)) - Submitting tokens for job: job_local1764589720_0001
2014-12-16 15:34:02,367 WARN [main] conf.Configuration (Configuration.java:loadProperty(2345)) - file:/tmp/hadoop-hadoop/mapred/staging/hadoop1764589720/.staging/job_local1764589720_0001/job.xml: an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-12-16 15:34:02,368 WARN [main] conf.Configuration (Configuration.java:loadProperty(2345)) - file:/tmp/hadoop-hadoop/mapred/staging/hadoop1764589720/.staging/job_local1764589720_0001/job.xml: an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-12-16 15:34:02,682 WARN [main] conf.Configuration (Configuration.java:loadProperty(2345)) - file:/tmp/hadoop-hadoop/mapred/local/localRunner/hadoop/job_local1764589720_0001/job_local1764589720_0001.xml: an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
2014-12-16 15:34:02,682 WARN [main] conf.Configuration (Configuration.java:loadProperty(2345)) - file:/tmp/hadoop-hadoop/mapred/local/localRunner/hadoop/job_local1764589720_0001/job_local1764589720_0001.xml: an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
2014-12-16 15:34:02,703 INFO [main] mapreduce.Job (Job.java:submit(1289)) - The url to track the job: http://localhost:8080/
2014-12-16 15:34:02,704 INFO [main] mapreduce.Job (Job.java:monitorAndPrintJob(1334)) - Running job: job_local1764589720_0001
2014-12-16 15:34:02,707 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(471)) - OutputCommitter set in config null
2014-12-16 15:34:02,719 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:createOutputCommitter(489)) - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2014-12-16 15:34:02,853 INFO [Thread-4] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(448)) - Waiting for map tasks
2014-12-16 15:34:02,857 INFO [LocalJobRunner Map Task Executor #0] mapred.LocalJobRunner (LocalJobRunner.java:run(224)) - Starting task: attempt_local1764589720_0001_m_000000_0
2014-12-16 15:34:02,919 INFO [LocalJobRunner Map Task Executor #0] util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:isAvailable(129)) - ProcfsBasedProcessTree currently is supported only on Linux.
2014-12-16 15:34:03,281 INFO [LocalJobRunner Map Task Executor #0] mapred.Task (Task.java:initialize(581)) - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@2e1022ec
2014-12-16 15:34:03,287 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:runNewMapper(733)) - Processing split: hdfs://192.168.52.128:9000/data/input/README.txt:0+1366
2014-12-16 15:34:03,304 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:createSortingCollector(388)) - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2014-12-16 15:34:03,340 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:setEquator(1181)) - (EQUATOR) 0 kvi 26214396(104857584)
2014-12-16 15:34:03,341 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(975)) - mapreduce.task.io.sort.mb: 100
2014-12-16 15:34:03,341 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(976)) - soft limit at 83886080
2014-12-16 15:34:03,341 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(977)) - bufstart = 0; bufvoid = 104857600
2014-12-16 15:34:03,341 INFO [LocalJobRunner Map Task Executor #0] mapred.MapTask (MapTask.java:init(978)) - kvstart = 26214396; length = 6553600
2014-12-16 15:34:03,708 INFO [main] mapreduce.Job (Job.java:monitor