Prerequisites:
With Hadoop up and running, place the WordCount.java file in the Hadoop installation directory, and under that directory create an input directory named input containing two input files, file1 and file2, where:
file1 contains:
hello world
file2 contains:
hello Hadoop
hello mapreduce
Once everything is ready, run the job from the command line. The commands for each step are explained below:
1) Create the input directory on the cluster:
xiaoqian@ubuntu:~/opt/hadoop$ sudo bin/hadoop fs -mkdir input3
2) Upload the file* files from the local input directory to the input3 directory on the cluster:
xiaoqian@ubuntu:~/opt/hadoop$ sudo bin/hadoop fs -put input/file* input3
3) Compile WordCount.java, placing the class files in the wordcount_classes directory under the current directory (create wordcount_classes first if it does not exist, since javac -d will not create it):
xiaoqian@ubuntu:~/opt/hadoop$ javac -classpath hadoop-0.20.1-core.jar:lib/commons-cli-1.2.jar -d wordcount_classes WordCount.java
4) Package the compiled classes into a jar:
xiaoqian@ubuntu:~/opt/hadoop$ jar -cvf wordcount.jar -C wordcount_classes/ .
5) Run the WordCount program on the cluster, using input3 as the input directory and output3 as the output directory:
xiaoqian@ubuntu:~/opt/hadoop$ sudo bin/hadoop jar wordcount.jar org.apache.hadoop.examples.WordCount input3 output3
14/04/21 17:56:52 INFO input.FileInputFormat: Total input paths to process : 2
14/04/21 17:56:52 INFO mapred.JobClient: Running job: job_201404211455_0013
14/04/21 17:56:53 INFO mapred.JobClient:  map 0% reduce 0%
14/04/21 17:57:02 INFO mapred.JobClient:  map 100% reduce 0%
14/04/21 17:57:14 INFO mapred.JobClient:  map 100% reduce 100%
14/04/21 17:57:16 INFO mapred.JobClient: Job complete: job_201404211455_0013
14/04/21 17:57:16 INFO mapred.JobClient: Counters: 17
14/04/21 17:57:16 INFO mapred.JobClient:   Job Counters
14/04/21 17:57:16 INFO mapred.JobClient:     Launched reduce tasks=1
14/04/21 17:57:16 INFO mapred.JobClient:     Launched map tasks=2
14/04/21 17:57:16 INFO mapred.JobClient:     Data-local map tasks=2
14/04/21 17:57:16 INFO mapred.JobClient:   FileSystemCounters
14/04/21 17:57:16 INFO mapred.JobClient:     FILE_BYTES_READ=71
14/04/21 17:57:16 INFO mapred.JobClient:     HDFS_BYTES_READ=41
14/04/21 17:57:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=212
14/04/21 17:57:16 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=37
14/04/21 17:57:16 INFO mapred.JobClient:   Map-Reduce Framework
14/04/21 17:57:16 INFO mapred.JobClient:     Reduce input groups=0
14/04/21 17:57:16 INFO mapred.JobClient:     Combine output records=5
14/04/21 17:57:16 INFO mapred.JobClient:     Map input records=3
14/04/21 17:57:16 INFO mapred.JobClient:     Reduce shuffle bytes=47
14/04/21 17:57:16 INFO mapred.JobClient:     Reduce output records=0
14/04/21 17:57:16 INFO mapred.JobClient:     Spilled Records=10
14/04/21 17:57:16 INFO mapred.JobClient:     Map output bytes=65
14/04/21 17:57:16 INFO mapred.JobClient:     Combine input records=6
14/04/21 17:57:16 INFO mapred.JobClient:     Map output records=6
14/04/21 17:57:16 INFO mapred.JobClient:     Reduce input records=5
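The Map-Reduce Framework counters above can be sanity-checked with a plain-JDK simulation of the map, combine, and reduce phases. This is a sketch for illustration, not the Hadoop API; the class and method names here are made up, and the input lines are the file contents from the setup above:

```java
import java.util.*;

public class WordCountSim {
    // Map phase: split each input line into words (each word is a (word, 1)
    // pair), mirroring what WordCount's mapper does with StringTokenizer.
    static List<String> map(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer st = new StringTokenizer(line);
            while (st.hasMoreTokens()) words.add(st.nextToken());
        }
        return words;
    }

    // Combine/reduce phase: sum the counts per word.
    static Map<String, Integer> reduce(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        // File contents from the setup above.
        List<String> file1 = List.of("hello world");
        List<String> file2 = List.of("hello Hadoop", "hello mapreduce");

        List<String> map1 = map(file1), map2 = map(file2);
        // 3 input lines -> Map input records=3; 2 + 4 words -> Map output records=6.
        System.out.println("Map output records=" + (map1.size() + map2.size()));

        // The combiner runs once per map task, i.e. per file here:
        // file1 yields 2 distinct words, file2 yields 3 -> Combine output
        // records=5, which is also Reduce input records.
        System.out.println("Combine output records="
                + (reduce(map1).size() + reduce(map2).size()));

        // The reducer merges the combined pairs from both map tasks.
        List<String> all = new ArrayList<>(map1);
        all.addAll(map2);
        System.out.println(reduce(all)); // prints {Hadoop=1, hello=3, mapreduce=1, world=1}
    }
}
```

Note that WordCount is case-sensitive, so "Hadoop" keeps its capital letter in this simulation.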
6) View the output:
xiaoqian@ubuntu:~/opt/hadoop$ sudo bin/hadoop fs -cat output3/part-r-00000
hadoop 1
hello 3
mapreduce 1
world 1
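Each line of the part-r-00000 file is a key and a value separated by a tab, which is the default separator of Hadoop's TextOutputFormat. A minimal sketch of reading such a file back into a map (the class name and sample lines are assumptions for illustration):

```java
import java.util.*;

public class PartFileParser {
    // TextOutputFormat writes one "key<TAB>value" pair per line by default.
    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] kv = line.split("\t", 2);
            counts.put(kv[0], Integer.parseInt(kv[1].trim()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines matching the listing above.
        List<String> sample = List.of(
                "hadoop\t1", "hello\t3", "mapreduce\t1", "world\t1");
        System.out.println(parse(sample)); // prints {hadoop=1, hello=3, mapreduce=1, world=1}
    }
}
```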