
Using the hadoop-examples jar, you can generate random data, sort it, and measure the total elapsed time to get a rough estimate of the cluster's performance.


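Generate the random data with the following command:

$ bin/hadoop jar hadoop-examples-1.2.1.jar randomwriter -D test.randomwrite.bytes_per_map=100 -D test.randomwriter.maps_per_host=10 data/unsorted-data

The first option sets the number of bytes each map writes, and the second sets the number of maps that run on each host. Do not put spaces around the `=` in a -D option, or the value will not be parsed as part of the property.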
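To make the two options easier to vary between runs, the command can be assembled from parameters. The following is a minimal sketch: `randomwriter_cmd` is a hypothetical helper name, it only prints the command line rather than running it, and the property names are simply the ones used in the command above.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the randomwriter command line.
#   $1 = bytes each map should write   (test.randomwrite.bytes_per_map)
#   $2 = number of maps per host       (test.randomwriter.maps_per_host)
#   $3 = HDFS output directory
randomwriter_cmd() {
    printf 'bin/hadoop jar hadoop-examples-1.2.1.jar randomwriter -D test.randomwrite.bytes_per_map=%s -D test.randomwriter.maps_per_host=%s %s\n' "$1" "$2" "$3"
}

# Same settings as in this post: 100 bytes per map, 10 maps per host.
randomwriter_cmd 100 10 data/unsorted-data
```

In the run shown in this post the maps wrote 48164 bytes in total even though the target was 100 bytes per map. Presumably this is because each map writes whole records of random size until it passes its target, so treat the byte setting as a lower bound rather than an exact size.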

Running it produces the following output.

hadoop@hadoop-VirtualBox:/usr/local/hadoop-1.2.1$ bin/hadoop jar hadoop-examples-1.2.1.jar randomwriter -D test.randomwrite.bytes_per_map=100 -D test.randomwriter.maps_per_host=10 data/unsorted-data
Running 10 maps.
Job started: Wed Oct 23 11:30:18 KST 2013
13/10/23 11:30:18 INFO mapred.JobClient: Running job: job_201310231129_0001
13/10/23 11:30:19 INFO mapred.JobClient:  map 0% reduce 0%
13/10/23 11:30:26 INFO mapred.JobClient:  map 20% reduce 0%
13/10/23 11:30:29 INFO mapred.JobClient:  map 30% reduce 0%
13/10/23 11:30:30 INFO mapred.JobClient:  map 40% reduce 0%
13/10/23 11:30:33 INFO mapred.JobClient:  map 60% reduce 0%
13/10/23 11:30:37 INFO mapred.JobClient:  map 80% reduce 0%
13/10/23 11:30:41 INFO mapred.JobClient:  map 90% reduce 0%
13/10/23 11:30:42 INFO mapred.JobClient:  map 100% reduce 0%
13/10/23 11:30:42 INFO mapred.JobClient: Job complete: job_201310231129_0001
13/10/23 11:30:42 INFO mapred.JobClient: Counters: 21
13/10/23 11:30:42 INFO mapred.JobClient:   Job Counters 
13/10/23 11:30:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=39263
13/10/23 11:30:42 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/10/23 11:30:42 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/10/23 11:30:42 INFO mapred.JobClient:     Launched map tasks=10
13/10/23 11:30:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
13/10/23 11:30:42 INFO mapred.JobClient:   File Input Format Counters 
13/10/23 11:30:42 INFO mapred.JobClient:     Bytes Read=0
13/10/23 11:30:42 INFO mapred.JobClient:   File Output Format Counters 
13/10/23 11:30:42 INFO mapred.JobClient:     Bytes Written=49284
13/10/23 11:30:42 INFO mapred.JobClient:   org.apache.hadoop.examples.RandomWriter$Counters
13/10/23 11:30:42 INFO mapred.JobClient:     BYTES_WRITTEN=48164
13/10/23 11:30:42 INFO mapred.JobClient:     RECORDS_WRITTEN=10
13/10/23 11:30:42 INFO mapred.JobClient:   FileSystemCounters
13/10/23 11:30:42 INFO mapred.JobClient:     HDFS_BYTES_READ=1190
13/10/23 11:30:42 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=570920
13/10/23 11:30:42 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=49284
13/10/23 11:30:42 INFO mapred.JobClient:   Map-Reduce Framework
13/10/23 11:30:42 INFO mapred.JobClient:     Map input records=10
13/10/23 11:30:42 INFO mapred.JobClient:     Physical memory (bytes) snapshot=597352448
13/10/23 11:30:42 INFO mapred.JobClient:     Spilled Records=0
13/10/23 11:30:42 INFO mapred.JobClient:     CPU time spent (ms)=3930
13/10/23 11:30:42 INFO mapred.JobClient:     Total committed heap usage (bytes)=349700096
13/10/23 11:30:42 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3946369024
13/10/23 11:30:42 INFO mapred.JobClient:     Map input bytes=0
13/10/23 11:30:42 INFO mapred.JobClient:     Map output records=10
13/10/23 11:30:42 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1190
Job ended: Wed Oct 23 11:30:42 KST 2013
The job took 23 seconds.

The log shows that 10 maps ran and the job took 23 seconds. It also shows that only the map phase ran; no reduce tasks were launched.
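Rather than reading these numbers off the log by eye, a counter can be pulled out with a small filter. This is only a sketch: `counter_value` is a hypothetical helper, and it assumes the `name=value` counter layout of the JobClient output shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one counter from a JobClient log on stdin.
#   $1 = counter name exactly as it appears in the log, e.g. "Launched map tasks"
# Prints nothing if the counter does not appear in the log.
counter_value() {
    awk -v name="$1" '
        {
            # Require a space before the name so that "BYTES_WRITTEN"
            # does not also match "FILE_BYTES_WRITTEN".
            pos = index($0, " " name "=")
            if (pos > 0) {
                print substr($0, pos + length(name) + 2)
                exit
            }
        }'
}

# Two counter lines copied from the randomwriter log in this post.
counter_value "Launched map tasks" <<'EOF'
13/10/23 11:30:42 INFO mapred.JobClient:     Launched map tasks=10
13/10/23 11:30:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
EOF
```

Asking the same log for "Launched reduce tasks" prints nothing, which is consistent with the job being map-only.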

You can check that the files were created with the hadoop dfs -ls command.
hadoop@hadoop-VirtualBox:/usr/local/hadoop-1.2.1$ bin/hadoop dfs -ls data/unsorted-data
Found 12 items
-rw-r--r--   1 hadoop supergroup          0 2013-10-23 11:30 /user/hadoop/data/unsorted-data/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 2013-10-23 11:30 /user/hadoop/data/unsorted-data/_logs
-rw-r--r--   1 hadoop supergroup        883 2013-10-23 11:30 /user/hadoop/data/unsorted-data/part-00000
-rw-r--r--   1 hadoop supergroup       5007 2013-10-23 11:30 /user/hadoop/data/unsorted-data/part-00001
-rw-r--r--   1 hadoop supergroup       8689 2013-10-23 11:30 /user/hadoop/data/unsorted-data/part-00002
-rw-r--r--   1 hadoop supergroup       1089 2013-10-23 11:30 /user/hadoop/data/unsorted-data/part-00003
-rw-r--r--   1 hadoop supergroup       1346 2013-10-23 11:30 /user/hadoop/data/unsorted-data/part-00004
-rw-r--r--   1 hadoop supergroup      14021 2013-10-23 11:30 /user/hadoop/data/unsorted-data/part-00005
-rw-r--r--   1 hadoop supergroup       2297 2013-10-23 11:30 /user/hadoop/data/unsorted-data/part-00006
-rw-r--r--   1 hadoop supergroup       2094 2013-10-23 11:30 /user/hadoop/data/unsorted-data/part-00007
-rw-r--r--   1 hadoop supergroup       6844 2013-10-23 11:30 /user/hadoop/data/unsorted-data/part-00008
-rw-r--r--   1 hadoop supergroup       7014 2013-10-23 11:30 /user/hadoop/data/unsorted-data/part-00009
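As a quick sanity check, the part file sizes in the listing can be added up and compared with the HDFS_BYTES_WRITTEN counter from the job log. The sketch below assumes the column layout of the `hadoop dfs -ls` output shown above (size in the fifth column, path in the last); `sum_part_bytes` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: sum the sizes (5th column) of part-* files in
# `hadoop dfs -ls` output read from stdin. _SUCCESS and _logs are skipped.
sum_part_bytes() {
    awk '$NF ~ /\/part-[0-9]+$/ { total += $5 } END { print total + 0 }'
}

# Intended usage against a live cluster (not run here):
#   bin/hadoop dfs -ls data/unsorted-data | sum_part_bytes
```

For the ten part files listed above the sizes add up to 49284 bytes, which matches the HDFS_BYTES_WRITTEN=49284 counter reported by the randomwriter job.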
Run the sort program with the following command:

$ bin/hadoop jar hadoop-examples-1.2.1.jar sort data/unsorted-data data/sorted-data

The output is as follows:

hadoop@hadoop-VirtualBox:/usr/local/hadoop-1.2.1$ bin/hadoop jar hadoop-examples-1.2.1.jar sort data/unsorted-data data/sorted-data
Running on 1 nodes to sort from hdfs://localhost:9000/user/hadoop/data/unsorted-data into hdfs://localhost:9000/user/hadoop/data/sorted-data with 1 reduces.
Job started: Wed Oct 23 11:46:39 KST 2013
13/10/23 11:46:39 INFO mapred.FileInputFormat: Total input paths to process : 10
13/10/23 11:46:39 INFO mapred.JobClient: Running job: job_201310231129_0002
13/10/23 11:46:40 INFO mapred.JobClient:  map 0% reduce 0%
13/10/23 11:46:46 INFO mapred.JobClient:  map 20% reduce 0%
13/10/23 11:46:50 INFO mapred.JobClient:  map 30% reduce 0%
13/10/23 11:46:51 INFO mapred.JobClient:  map 40% reduce 0%
13/10/23 11:46:52 INFO mapred.JobClient:  map 50% reduce 0%
13/10/23 11:46:53 INFO mapred.JobClient:  map 60% reduce 0%
13/10/23 11:46:55 INFO mapred.JobClient:  map 70% reduce 0%
13/10/23 11:46:56 INFO mapred.JobClient:  map 80% reduce 20%
13/10/23 11:46:58 INFO mapred.JobClient:  map 100% reduce 20%
13/10/23 11:47:03 INFO mapred.JobClient:  map 100% reduce 100%
13/10/23 11:47:03 INFO mapred.JobClient: Job complete: job_201310231129_0002
13/10/23 11:47:03 INFO mapred.JobClient: Counters: 30
13/10/23 11:47:03 INFO mapred.JobClient:   Job Counters 
13/10/23 11:47:03 INFO mapred.JobClient:     Launched reduce tasks=1
13/10/23 11:47:03 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=30804
13/10/23 11:47:03 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/10/23 11:47:03 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/10/23 11:47:03 INFO mapred.JobClient:     Launched map tasks=10
13/10/23 11:47:03 INFO mapred.JobClient:     Data-local map tasks=10
13/10/23 11:47:03 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=16025
13/10/23 11:47:03 INFO mapred.JobClient:   File Input Format Counters 
13/10/23 11:47:03 INFO mapred.JobClient:     Bytes Read=49284
13/10/23 11:47:03 INFO mapred.JobClient:   File Output Format Counters 
13/10/23 11:47:03 INFO mapred.JobClient:     Bytes Written=48540
13/10/23 11:47:03 INFO mapred.JobClient:   FileSystemCounters
13/10/23 11:47:03 INFO mapred.JobClient:     FILE_BYTES_READ=48299
13/10/23 11:47:03 INFO mapred.JobClient:     HDFS_BYTES_READ=50444
13/10/23 11:47:03 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=725019
13/10/23 11:47:03 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=48540
13/10/23 11:47:03 INFO mapred.JobClient:   Map-Reduce Framework
13/10/23 11:47:03 INFO mapred.JobClient:     Map output materialized bytes=48353
13/10/23 11:47:03 INFO mapred.JobClient:     Map input records=10
13/10/23 11:47:03 INFO mapred.JobClient:     Reduce shuffle bytes=48353
13/10/23 11:47:03 INFO mapred.JobClient:     Spilled Records=20
13/10/23 11:47:03 INFO mapred.JobClient:     Map output bytes=48244
13/10/23 11:47:03 INFO mapred.JobClient:     Total committed heap usage (bytes)=1454899200
13/10/23 11:47:03 INFO mapred.JobClient:     CPU time spent (ms)=3180
13/10/23 11:47:03 INFO mapred.JobClient:     Map input bytes=48324
13/10/23 11:47:03 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1160
13/10/23 11:47:03 INFO mapred.JobClient:     Combine input records=0
13/10/23 11:47:03 INFO mapred.JobClient:     Reduce input records=10
13/10/23 11:47:03 INFO mapred.JobClient:     Reduce input groups=10
13/10/23 11:47:03 INFO mapred.JobClient:     Combine output records=0
13/10/23 11:47:03 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1634041856
13/10/23 11:47:03 INFO mapred.JobClient:     Reduce output records=10
13/10/23 11:47:03 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=4325195776
13/10/23 11:47:03 INFO mapred.JobClient:     Map output records=10
Job ended: Wed Oct 23 11:47:03 KST 2013
The job took 24 seconds.
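Since the benchmark figure is the combined time of the generate and sort jobs, the "The job took N seconds." lines from both logs can be added up. A minimal sketch, assuming both logs have been saved or piped together; `total_job_seconds` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: add up every "The job took N seconds." line on stdin.
total_job_seconds() {
    awk '/^The job took [0-9]+ seconds\.$/ { total += $4 } END { print total + 0 }'
}

# The two elapsed-time lines from the randomwriter and sort runs in this post.
printf '%s\n' 'The job took 23 seconds.' 'The job took 24 seconds.' | total_job_seconds
```

For the two runs in this post this gives 23 + 24 = 47 seconds. With only about 48 KB of data most of that time is likely job startup overhead, so a much larger bytes-per-map setting would be needed for a meaningful measurement.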
The final result can be verified with the following command:

$ bin/hadoop jar hadoop-test-1.2.1.jar testmapredsort -sortInput data/unsorted-data -sortOutput data/sorted-data

hadoop@hadoop-VirtualBox:/usr/local/hadoop-1.2.1$ bin/hadoop jar hadoop-test-1.2.1.jar testmapredsort -sortInput data/unsorted-data -sortOutput data/sorted-data

SortValidator.RecordStatsChecker: Validate sort from hdfs://localhost:9000/user/hadoop/data/unsorted-data (12 files), hdfs://localhost:9000/user/hadoop/data/sorted-data (1 files) into hdfs://localhost:9000/tmp/sortvalidate/recordstatschecker with 1 reducer.
Job started: Wed Oct 23 12:52:05 KST 2013
13/10/23 12:52:06 INFO mapred.FileInputFormat: Total input paths to process : 11
13/10/23 12:52:06 INFO mapred.JobClient: Running job: job_201310231129_0003
13/10/23 12:52:07 INFO mapred.JobClient:  map 0% reduce 0%
13/10/23 12:52:12 INFO mapred.JobClient:  map 18% reduce 0%
13/10/23 12:52:16 INFO mapred.JobClient:  map 36% reduce 0%
13/10/23 12:52:19 INFO mapred.JobClient:  map 54% reduce 0%
13/10/23 12:52:21 INFO mapred.JobClient:  map 63% reduce 0%
13/10/23 12:52:22 INFO mapred.JobClient:  map 72% reduce 18%
13/10/23 12:52:24 INFO mapred.JobClient:  map 81% reduce 18%
13/10/23 12:52:25 INFO mapred.JobClient:  map 90% reduce 18%
13/10/23 12:52:26 INFO mapred.JobClient:  map 100% reduce 18%
13/10/23 12:52:31 INFO mapred.JobClient:  map 100% reduce 30%
13/10/23 12:52:33 INFO mapred.JobClient:  map 100% reduce 100%
13/10/23 12:52:34 INFO mapred.JobClient: Job complete: job_201310231129_0003
13/10/23 12:52:34 INFO mapred.JobClient: Counters: 30
13/10/23 12:52:34 INFO mapred.JobClient:   Job Counters 
13/10/23 12:52:34 INFO mapred.JobClient:     Launched reduce tasks=1
13/10/23 12:52:34 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=32749
13/10/23 12:52:34 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/10/23 12:52:34 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/10/23 12:52:34 INFO mapred.JobClient:     Launched map tasks=11
13/10/23 12:52:34 INFO mapred.JobClient:     Data-local map tasks=11
13/10/23 12:52:34 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=20592
13/10/23 12:52:34 INFO mapred.JobClient:   File Input Format Counters 
13/10/23 12:52:34 INFO mapred.JobClient:     Bytes Read=97824
13/10/23 12:52:34 INFO mapred.JobClient:   File Output Format Counters 
13/10/23 12:52:34 INFO mapred.JobClient:     Bytes Written=179
13/10/23 12:52:34 INFO mapred.JobClient:   FileSystemCounters
13/10/23 12:52:34 INFO mapred.JobClient:     FILE_BYTES_READ=171
13/10/23 12:52:34 INFO mapred.JobClient:     HDFS_BYTES_READ=99098
13/10/23 12:52:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=693753
13/10/23 12:52:34 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=179
13/10/23 12:52:34 INFO mapred.JobClient:   Map-Reduce Framework
13/10/23 12:52:34 INFO mapred.JobClient:     Map output materialized bytes=231
13/10/23 12:52:34 INFO mapred.JobClient:     Map input records=20
13/10/23 12:52:34 INFO mapred.JobClient:     Reduce shuffle bytes=231
13/10/23 12:52:34 INFO mapred.JobClient:     Spilled Records=22
13/10/23 12:52:34 INFO mapred.JobClient:     Map output bytes=260
13/10/23 12:52:34 INFO mapred.JobClient:     Total committed heap usage (bytes)=1636040704
13/10/23 12:52:34 INFO mapred.JobClient:     CPU time spent (ms)=3350
13/10/23 12:52:34 INFO mapred.JobClient:     Map input bytes=96768
13/10/23 12:52:34 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1274
13/10/23 12:52:34 INFO mapred.JobClient:     Combine input records=20
13/10/23 12:52:34 INFO mapred.JobClient:     Reduce input records=11
13/10/23 12:52:34 INFO mapred.JobClient:     Reduce input groups=2
13/10/23 12:52:34 INFO mapred.JobClient:     Combine output records=11
13/10/23 12:52:34 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1821650944
13/10/23 12:52:34 INFO mapred.JobClient:     Reduce output records=2
13/10/23 12:52:34 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=4718657536
13/10/23 12:52:34 INFO mapred.JobClient:     Map output records=20
Job ended: Wed Oct 23 12:52:34 KST 2013
The job took 28 seconds.

SUCCESS! Validated the MapReduce framework's 'sort' successfully.

The SUCCESS message confirms that the sort produced a correct result.
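When the benchmark is scripted, the validation step can be turned into a pass/fail check by looking for the SUCCESS line. This is a sketch that relies only on the message text shown in the output above; `sort_validated` is a hypothetical helper name, and whether the message goes to stdout or stderr is an assumption, so the usage comment merges both.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 only if the validator output on stdin
# contains the SUCCESS line shown in this post.
sort_validated() {
    grep -q '^SUCCESS! Validated the MapReduce framework'
}

# Intended usage against a live cluster (not run here):
#   bin/hadoop jar hadoop-test-1.2.1.jar testmapredsort \
#       -sortInput data/unsorted-data -sortOutput data/sorted-data 2>&1 \
#       | sort_validated && echo "sort output validated"
```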


