...
Code Block | ||
---|---|---|
| ||
[hue@sandbox jars]$ hdfs dfs -ls /user/hue/opengeorgia Found 1 items -rwxr-xr-x 3 hue hue 7612715 2014-04-29 16:53 /user/hue/opengeorgia/salaryTravelReport.csv [hue@sandbox jars]$ hadoop jar hadoop-exploration-0.0.1-SNAPSHOT.jar lestermartin.hadoop.exploration.opengeorgia.GenerateStatistics opengeorgia/salaryTravelReport.csv opengeorgia/mroutput ... MANY LINES REMOVED ... 14/04/29 19:29:42 INFO input.FileInputFormat: Total input paths to process : 1 14/04/29 19:29:42 INFO mapreduce.JobSubmitter: number of splits:1 ... MANY LINES REMOVED ... 14/04/29 19:29:43 INFO mapreduce.Job: Running job: job_1398691536449_0080 14/04/29 19:29:50 INFO mapreduce.Job: Job job_1398691536449_0080 running in uber mode : false 14/04/29 19:29:50 INFO mapreduce.Job: map 0% reduce 0% 14/04/29 19:29:58 INFO mapreduce.Job: map 100% reduce 0% 14/04/29 19:30:05 INFO mapreduce.Job: map 100% reduce 100% 14/04/29 19:30:05 INFO mapreduce.Job: Job job_1398691536449_0080 completed successfully 14/04/29 19:30:05 INFO mapreduce. XXXXXXxXXXXXXXXX |
...
Job: Counters: 43
File System Counters
FILE: Number of bytes read=1279390
FILE: Number of bytes written=2726197
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=7612865
HDFS: Number of bytes written=13583
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=48256
Total time spent by all reduces in occupied slots (ms)=34216
Map-Reduce Framework
Map input records=76943
Map output records=44986
Map output bytes=1189412
Map output materialized bytes=1279390
Input split bytes=144
Combine input records=0
Combine output records=0
Reduce input groups=181
Reduce shuffle bytes=1279390
Reduce input records=44986
Reduce output records=181
Spilled Records=89972
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=146
CPU time spent (ms)=5440
Physical memory (bytes) snapshot=598822912
Virtual memory (bytes) snapshot=2392014848
Total committed heap usage (bytes)=507117568
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=7612721
File Output Format Counters
Bytes Written=13583 |
Here's a few records from the output (not annotating the "... MANY LINES REMOVED ...
" as seen above).
Code Block | ||
---|---|---|
| ||
[hue@sandbox jars]$ hdfs dfs -cat opengeorgia/mroutput/part-r-00000
ASSISTANT PRINCIPAL {436,3418.530029296875,119646.3125,76514.0635219189}
AUDIOLOGIST {9,36329.58984375,102240.4609375,73038.71267361111}
BUS DRIVER {1321,289.9800109863281,59479.87890625,21016.95711573356}
CROSSING GUARD {100,188.1699981689453,77890.5703125,4792.818007659912}
CUSTODIAL PERSONNEL {1393,358.0199890136719,83233.4765625,27299.229113730096}
DEPUTY/ASSOC/ASSISTANT SUPT {36,15089.6298828125,208606.71875,121005.71796332466}
ELEMENTARY COUNSELOR {265,1472.8499755859375,96220.078125,58518.51150455115}
ESOL TEACHER {450,1595.3599853515625,92835.65625,51652.298943684895}
GRADE 1 TEACHER {1041,1912.280029296875,103549.28125,49438.147760777836}
GRADE 10 TEACHER {59,3861.159912109375,94732.7421875,55489.47224659031}
GRADE 11 TEACHER {27,5537.2998046875,101728.1875,58273.76153790509}
GRADE 12 TEACHER {13,40919.26171875,92376.703125,66923.12620192308}
GRADE 2 TEACHER {1010,1843.0999755859375,95968.078125,50479.97351545579}
GRADE 3 TEACHER {1036,730.75,102263.2890625,50409.13624861433}
GRADE 4 TEACHER {873,2955.43994140625,96430.078125,52342.17116019652}
GRADE 5 TEACHER {872,1400.0,104698.0,52721.27942734465}
GRADE 6 TEACHER {239,1665.0799560546875,85595.921875,48597.56130636686}
GRADE 7 TEACHER {257,1615.260009765625,90778.078125,50304.22050220772}
GRADE 8 TEACHER {240,1746.1600341796875,85965.3828125,51121.745357767744}
GRADE 9 TEACHER {35,4128.5,90588.578125,55027.259151785714}
GRADES 6-8 TEACHER {1607,-909.8400268554688,91402.0390625,49418.24870481651}
GRADES 9-12 TEACHER {3171,200.0,119430.15625,51375.531383229296}
GRADES K-5 TEACHER {165,150.0,87925.921875,44650.884348366475}
GRADUATION SPECIALIST {63,6351.25,91945.0625,58873.631510416664}
HIGH SCHOOL COUNSELOR {225,1100.0,111393.84375,63814.18197102864}
KINDERGARTEN TEACHER {1054,1615.4100341796875,103798.0,52818.983106001506}
LIBRARIAN/MEDIA SPECIALIST {342,3208.25,97282.1875,58324.767315423975}
MIDDLE SCHOOL COUNSELOR {131,3362.93994140625,99340.078125,61327.238445252864}
MILITARY SCIENCE TEACHER {98,2328.9599609375,100116.0,62636.252752810105}
PRINCIPAL {318,2202.22998046875,159299.515625,102604.1484375}
SUBSTITUTE TEACHER {3816,-1006.5,77007.859375,8846.330627846432}
SUPERINTENDENT {3,216697.15625,411545.8125,299117.1979166667}
TEACHER SUPPORT SPECIALIST {204,2409.64990234375,96133.21875,62175.75722608379}
TECHNICAL INSTITUTE PRESIDENT {1,96884.2421875,96884.2421875,96884.2421875} |
Again, many lines were removed as I just wanted an illustrative example. Being married to a Georgia educator myself, I probably am looking at these numbers and making more observations than most. For example, do we really have a Kindergarten Teacher making over $100K/year? Really?? Heck, even the highest paid Military Science Teacher is pulling in six-figures (he probably wasn't doing that on active duty!!). I also feel for the poor Substitue Teacher that was in the hole over $1000. I can say with certainty that the average pay for Principals of $102K/year surely isn't enough as that's a job with a TON of responsibilities.
Nonetheless, the goal was see if we could answer the Simple Open Georgia Use Case question which we did. The next installments will be doing the same thing, but with Pig and then Hive. Let's make sure we check to see that the average salary for the 9 Audiologists is $73,038.71 and the highest paid 3rd Grade Teacher is $102,263.29 when we perform this analysis again.