Wiki page associated with the Open Georgia Analysis effort, specifically the Java code found on GitHub > lestermartin > hadoop-exploration.

...

Tip

This content has moved to https://github.com/lestermartin/hadoop-exploration/tree/master/src/main/java/lestermartin/hadoop/exploration/opengeorgia

...

This code solves the Simple Open Georgia Use Case using the following classes.

The Mapper

On each invocation of the map() method, the TitleMapper class takes one row of CSV data (see Format & Sample Data for Open Georgia for more details) and constructs a SalaryReport object using the crude, primitive parsing logic of SalaryReportBuilder.

The mapper then skips any record that does not meet the basic Simple Open Georgia Use Case criteria.  A record that passes this initial filter is emitted as a key-value pair (KVP) of the job title and the salary value that goes along with it.
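The parse, filter, and emit steps can be sketched in plain Java without any Hadoop dependency. This is a minimal illustration and not the actual TitleMapper: the class name TitleMapperSketch, the seven-column order, and the filter values ("LBOE" and 2010) are all assumptions made for the example, and the sample rows are made up.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the mapper logic; not the repository's TitleMapper.
class TitleMapperSketch {
    // ASSUMPTION: columns are NAME,TITLE,SALARY,TRAVEL,ORG TYPE,ORG,YEAR.
    // ASSUMPTION: the use case keeps only this org type and year.
    static final String ASSUMED_ORG_TYPE = "LBOE";
    static final int ASSUMED_YEAR = 2010;

    // Returns a (title, salary) pair, or empty when the row is filtered out.
    static Optional<Map.Entry<String, Double>> map(String csvRow) {
        // Crude parsing: a naive split breaks on quoted fields containing commas.
        String[] fields = csvRow.split(",");
        if (fields.length != 7) {
            return Optional.empty();
        }
        try {
            String title = fields[1].trim();
            double salary = Double.parseDouble(fields[2].trim());
            String orgType = fields[4].trim();
            int year = Integer.parseInt(fields[6].trim());
            if (!ASSUMED_ORG_TYPE.equals(orgType) || year != ASSUMED_YEAR) {
                return Optional.empty(); // does not meet the criteria
            }
            return Optional.of(new SimpleEntry<>(title, salary));
        } catch (NumberFormatException e) {
            return Optional.empty(); // header row or malformed numbers
        }
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        System.out.println(map("DOE JANE,GRADE 4 TEACHER,52000.00,0.00,LBOE,SAMPLE BOARD,2010"));
        System.out.println(map("NAME,TITLE,SALARY,TRAVEL,ORG TYPE,ORG,YEAR"));
    }
}
```

In a real Hadoop mapper the pair would be written to the job context rather than returned, but the filtering decision is the same.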

The Reducer

SalaryStatisticsReducer calculates, for each job title, the total number of people along with the minimum, maximum, and average salary.
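The per-title aggregation can be sketched the same way in plain Java. The class name SalaryStatisticsSketch and the tab-separated output layout are hypothetical choices for this example; the real reducer's output format may differ.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical stand-in for the reducer logic; not the repository's SalaryStatisticsReducer.
class SalaryStatisticsSketch {
    // Computes count, minimum, maximum, and average in a single pass
    // over all salary values that arrived for one job title.
    static String reduce(String title, Iterable<Double> salaries) {
        int count = 0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double total = 0.0;
        for (double salary : salaries) {
            count++;
            min = Math.min(min, salary);
            max = Math.max(max, salary);
            total += salary;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no salaries for title: " + title);
        }
        // ASSUMPTION: output layout is title, count, min, max, average.
        return String.format(Locale.ROOT, "%s\t%d\t%.2f\t%.2f\t%.2f",
                title, count, min, max, total / count);
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        System.out.println(reduce("GRADE 4 TEACHER", Arrays.asList(40000.0, 50000.0, 60000.0)));
    }
}
```

A single pass matters here because a Hadoop reducer receives its values as a one-shot iterable, so the count and the running total have to be accumulated together instead of re-reading the values to compute the average.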

The Driver

GenerateStatistics pulls it all together so the MapReduce job can be run.

...