This wiki page is part of the Open Georgia Analysis effort. It covers the Java code found on GitHub under lestermartin > hadoop-exploration.

This code solves the Simple Open Georgia Use Case using the following classes.

The Mapper

The TitleMapper class takes each row of CSV data passed to its map() method (see Format & Sample Data for Open Georgia for more details) and constructs a SalaryReport object using the crude & primitive parsing logic of SalaryReportBuilder.

The mapper then drops the record if it does not meet the basic Simple Open Georgia Use Case criteria. If the record passes this initial filter, the mapper emits a key-value pair (KVP) of the job title and the corresponding salary.
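The parse, filter, and emit steps described above can be sketched in plain Java without any Hadoop types. This is only an illustration, not the TitleMapper source: the class name TitleMapperSketch, the column order, and the filter values ("LBOE" and "2010") are assumptions made for the example, and the sample rows are made up.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

/** Plain-Java sketch of the map step; no Hadoop types are used. */
class TitleMapperSketch {

    // Assumed column order (hypothetical): NAME,TITLE,SALARY,TRAVEL,ORG_TYPE,ORG,YEAR
    static final int TITLE = 1;
    static final int SALARY = 2;
    static final int ORG_TYPE = 4;
    static final int YEAR = 6;

    // Placeholder filter values; the real criteria are defined by the use case.
    static final String WANTED_ORG_TYPE = "LBOE";
    static final String WANTED_YEAR = "2010";

    /** Returns a (title, salary) pair for a row that passes the filter, otherwise empty. */
    static Optional<Map.Entry<String, Double>> map(String csvRow) {
        // Crude parsing: a plain split breaks on quoted fields that contain commas.
        String[] fields = csvRow.split(",", -1);
        if (fields.length <= YEAR) {
            return Optional.empty(); // malformed row
        }
        boolean matches = WANTED_ORG_TYPE.equals(fields[ORG_TYPE].trim())
                && WANTED_YEAR.equals(fields[YEAR].trim());
        if (!matches) {
            return Optional.empty(); // fails the filter, so nothing is emitted
        }
        try {
            double salary = Double.parseDouble(fields[SALARY].trim());
            Map.Entry<String, Double> pair = new SimpleEntry<>(fields[TITLE].trim(), salary);
            return Optional.of(pair);
        } catch (NumberFormatException e) {
            return Optional.empty(); // unparseable salary
        }
    }

    public static void main(String[] args) {
        // Made-up rows: the first passes the assumed filter, the second does not.
        System.out.println(map("DOE JANE,GRADE 4 TEACHER,52000.00,0.00,LBOE,SAMPLE COUNTY BOARD OF EDUCATION,2010"));
        System.out.println(map("ROE RICHARD,PROFESSOR,90000.00,150.00,OTHER,SAMPLE UNIVERSITY,2010"));
    }
}
```

Returning an empty Optional stands in for "emit nothing"; in a real Hadoop mapper the equivalent is returning from map() without writing to the context.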

The Reducer

SalaryStatisticsReducer calculates the total number of people for the given job title, along with the minimum, maximum, and average salary.
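A minimal sketch of that calculation, again in plain Java rather than the Hadoop Reducer API. The names SalaryStatisticsSketch and Stats are hypothetical and the salaries are made up; the actual SalaryStatisticsReducer may be organized differently.

```java
import java.util.Arrays;

/** Plain-Java sketch of the reduce step for one job title; no Hadoop types are used. */
class SalaryStatisticsSketch {

    /** The four values reported for a single job title. */
    static final class Stats {
        final long count;
        final double min;
        final double max;
        final double average;

        Stats(long count, double min, double max, double average) {
            this.count = count;
            this.min = min;
            this.max = max;
            this.average = average;
        }
    }

    /** Single pass over every salary emitted for one title. */
    static Stats reduce(Iterable<Double> salaries) {
        long count = 0;
        double sum = 0.0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double salary : salaries) {
            count++;
            sum += salary;
            min = Math.min(min, salary);
            max = Math.max(max, salary);
        }
        if (count == 0) {
            throw new IllegalArgumentException("a title needs at least one salary");
        }
        return new Stats(count, min, max, sum / count);
    }

    public static void main(String[] args) {
        // Made-up salaries for a single title.
        Stats s = reduce(Arrays.asList(40000.0, 50000.0, 60000.0));
        System.out.printf("count=%d min=%.2f max=%.2f avg=%.2f%n", s.count, s.min, s.max, s.average);
    }
}
```

The single pass matters because a reducer typically sees its values as an iterator that can only be walked once, so count, sum, minimum, and maximum are all tracked together.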

The Driver

GenerateStatistics pulls it all together so the MapReduce job can be run.
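To show what "pulling it all together" involves, the sketch below imitates the map, shuffle, and reduce phases in memory. It deliberately does not use the Hadoop job API and is not the GenerateStatistics source: LocalJobSketch, its run() method, and the two-column "title,salary" sample input are assumptions made for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Function;

/** In-memory imitation of a MapReduce run: map each line, group by key, reduce each group. */
class LocalJobSketch {

    static <K extends Comparable<K>, V, R> SortedMap<K, R> run(
            List<String> lines,
            Function<String, Optional<Map.Entry<K, V>>> mapper,
            Function<List<V>, R> reducer) {
        // Map + shuffle: collect every emitted value under its key, sorted by key.
        SortedMap<K, List<V>> grouped = new TreeMap<>();
        for (String line : lines) {
            mapper.apply(line).ifPresent(kv ->
                    grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue()));
        }
        // Reduce: one call per distinct key.
        SortedMap<K, R> results = new TreeMap<>();
        grouped.forEach((key, values) -> results.put(key, reducer.apply(values)));
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical, already-filtered "title,salary" lines.
        List<String> lines = Arrays.asList("TEACHER,40000", "PRINCIPAL,90000", "TEACHER,60000");

        Function<String, Optional<Map.Entry<String, Double>>> mapper = line -> {
            String[] f = line.split(",");
            Map.Entry<String, Double> kv = new SimpleEntry<>(f[0], Double.parseDouble(f[1]));
            return Optional.of(kv);
        };
        Function<List<Double>, DoubleSummaryStatistics> reducer =
                values -> values.stream().mapToDouble(Double::doubleValue).summaryStatistics();

        run(lines, mapper, reducer).forEach((title, stats) ->
                System.out.println(title + " -> " + stats));
    }
}
```

In a real job the framework performs the grouping step across the cluster; the driver only names the mapper, reducer, input, and output, then submits the job.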


See use mapreduce to calculate salary statistics for georgia educators (first of a three-part series), which pulls it all together. Alternative ways to produce the same results are presented in use pig to calculate salary statistics for georgia educators (second of a three-part series) and use hive to calculate salary statistics for georgia educators (third of a three-part series).
