lestermartin.hadoop.exploration.opengeorgia

This wiki page accompanies the Open Georgia Analysis effort. It covers the Java code found in the following locations on GitHub > lestermartin > hadoop-exploration:

hadoop-exploration / src / main / java / lestermartin / hadoop / exploration / opengeorgia
hadoop-exploration / src / test / java / lestermartin / hadoop / exploration / opengeorgia

This code solves the Simple Open Georgia Use Case using the following classes.

The Mapper

For each row of CSV data passed to its map() method (see Format & Sample Data for Open Georgia for more details), the TitleMapper class constructs a SalaryReport object using the crude & primitive parsing logic of SalaryReportBuilder.

The mapper then bails out if the record doesn't meet the basic Simple Open Georgia Use Case criteria. A record that gets past this initial filtering is emitted as a key-value pair (KVP) of the job title and the salary value that goes along with it.
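The parse, filter, and emit flow described above can be sketched in plain Java without any Hadoop classes. This is only an illustration, not the code in the repository: the class name TitleMapSketch, the column positions, and the filter values "LBOE" and "2010" are all assumptions made for the example, and the sample rows in the usage note are made up.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Plain-Java sketch of the map step; hypothetical, not the repository's TitleMapper.
final class TitleMapSketch {

    // Assumed column positions; the real CSV layout may differ.
    static final int TITLE = 1;
    static final int SALARY = 2;
    static final int ORG_TYPE = 4;
    static final int YEAR = 6;

    // Assumed filter values standing in for the use case criteria.
    static final String WANTED_ORG_TYPE = "LBOE";
    static final String WANTED_YEAR = "2010";

    // Returns (job title, salary) for a row that passes the filter, or empty otherwise.
    static Optional<Map.Entry<String, Double>> map(String csvLine) {
        // Naive split: this breaks on quoted fields that contain commas.
        String[] fields = csvLine.split(",", -1);
        if (fields.length <= YEAR) {
            return Optional.empty(); // malformed row, skip it
        }
        boolean matches = WANTED_ORG_TYPE.equals(fields[ORG_TYPE].trim())
                && WANTED_YEAR.equals(fields[YEAR].trim());
        if (!matches) {
            return Optional.empty(); // bail out: row is outside the use case
        }
        try {
            double salary = Double.parseDouble(fields[SALARY].trim());
            return Optional.of(new SimpleEntry<>(fields[TITLE].trim(), salary));
        } catch (NumberFormatException e) {
            return Optional.empty(); // unparseable salary, skip it
        }
    }
}
```

Calling TitleMapSketch.map() with a made-up row such as "DOE JANE,GRADE 4 TEACHER,52000.50,0.00,LBOE,EXAMPLE COUNTY BOARD OF EDUCATION,2010" yields the pair (GRADE 4 TEACHER, 52000.5), while the same row with a year of 2009 yields nothing. In a real Mapper the pair would be written to the job context rather than returned.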

The Reducer

SalaryStatisticsReducer simply calculates the total number of people for the given job title along with the minimum/maximum/average statistics.
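As a rough illustration of the arithmetic involved, the statistics for one job title can be computed in a single pass over its salary values. The sketch below is plain Java with a hypothetical class name (SalaryStatsSketch); it does not reproduce the repository's reducer or its output format.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the per-title statistics; hypothetical, not the repository's reducer.
final class SalaryStatsSketch {
    final long count;
    final double min;
    final double max;
    final double average;

    private SalaryStatsSketch(long count, double min, double max, double average) {
        this.count = count;
        this.min = min;
        this.max = max;
        this.average = average;
    }

    // Computes count, min, max, and average in one pass over the salaries.
    static SalaryStatsSketch reduce(Iterable<Double> salaries) {
        long count = 0;
        double sum = 0.0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double salary : salaries) {
            count++;
            sum += salary;
            min = Math.min(min, salary);
            max = Math.max(max, salary);
        }
        if (count == 0) {
            throw new IllegalArgumentException("at least one salary is required");
        }
        return new SalaryStatsSketch(count, min, max, sum / count);
    }

    public static void main(String[] args) {
        List<Double> salaries = Arrays.asList(40000.0, 50000.0, 60000.0); // made-up values
        SalaryStatsSketch stats = reduce(salaries);
        System.out.println(stats.count + " " + stats.min + " " + stats.max + " " + stats.average);
    }
}
```

A single pass matters here because the values handed to a Hadoop reduce() call can only be iterated once, so the count, sum, minimum, and maximum all have to be accumulated in the same loop.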

The Driver

GenerateStatistics pulls it all together so the MapReduce job can be run.
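Between the map and reduce steps, the framework groups every emitted salary under its job title before the reducer sees it. A small in-memory stand-in for that grouping, using only the JDK, might look like the following. It is not a Hadoop driver and not the GenerateStatistics class; the name LocalJobSketch and its input shape are assumptions made for illustration.

```java
import java.util.DoubleSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the grouping a MapReduce job performs; hypothetical.
final class LocalJobSketch {

    // Groups parallel arrays of (title, salary) pairs by title and summarizes each group.
    static Map<String, DoubleSummaryStatistics> run(String[] titles, double[] salaries) {
        if (titles.length != salaries.length) {
            throw new IllegalArgumentException("titles and salaries must have the same length");
        }
        // TreeMap keeps titles sorted, similar to how a reducer receives its keys in order.
        Map<String, DoubleSummaryStatistics> byTitle = new TreeMap<>();
        for (int i = 0; i < titles.length; i++) {
            byTitle.computeIfAbsent(titles[i], t -> new DoubleSummaryStatistics())
                   .accept(salaries[i]);
        }
        return byTitle;
    }
}
```

Each DoubleSummaryStatistics value exposes getCount(), getMin(), getMax(), and getAverage(), which correspond to the four figures the job reports per job title.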

The write-up use mapreduce to calculate salary statistics for georgia educators (first of a three-part series) walks through the complete solution. Alternative ways to produce these same results are presented in use pig to calculate salary statistics for georgia educators (second of a three-part series) and use hive to calculate salary statistics for georgia educators (third of a three-part series).