
This is the parent page for exploratory analysis efforts on the Open Georgia (Transparency in Government) website, http://www.open.georgia.gov/.

As the above screen capture indicates, Open Georgia provides public records on how and where money is spent in the State.  While there are many avenues to explore on the site, this analysis effort homes in on the salaries/expenses data presented, especially for the local boards of education.  Preparing Open Georgia Test Data walks you through downloading and preparing the data, and Format & Sample Data for Open Georgia shows what the resulting data looks like.

For this analysis effort, the Hortonworks Sandbox was used.  Specifically, version 2.0 was used; every effort was made to ensure the code will also run on versions 1.3 and 2.1, but it was not tested on them.

To get started, just load the salaryTravelReport.csv file that you create (or simply download) from the instructions in Preparing Open Georgia Test Data into HDFS itself.  For the examples provided, log into the Sandbox's Hue UI as the user hue and from the File Browser create an opengeorgia folder within /user/hue and then upload the file.  You should see something similar to the following once that is done.

Now we've got some data and are ready to answer the following question as a simple, illustrative example of the kinds of analysis that could be done.

For all Open Georgia Salary/Travel data loaded in HDFS that is aligned with Fiscal Year 2010 and Organization Type of Local Boards of Education, produce a distinct list of all Job Titles along with the total number of employees aligned with each Job Title & the minimum/maximum/average salaries for each of the identified Job Titles.
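To make the question concrete, here is a minimal Python sketch of the calculation on a few in-memory records. It is not the approach used in the blog entries, only an illustration of the filter, group, and aggregate steps. The field names (`title`, `salary`, `org_type`, `year`), the organization type value, and the sample records are all hypothetical; the real file layout is described in Format & Sample Data for Open Georgia.

```python
from collections import defaultdict


def salary_stats_by_title(records, fiscal_year, org_type):
    """Summarize salaries per job title for one fiscal year and organization type.

    `records` is an iterable of dicts with the hypothetical keys
    'title', 'salary', 'org_type' and 'year' (assumed names, not the
    actual column headers of salaryTravelReport.csv).
    """
    salaries = defaultdict(list)
    for rec in records:
        # Keep only the rows that match the requested year and organization type.
        if rec["year"] == fiscal_year and rec["org_type"] == org_type:
            salaries[rec["title"]].append(rec["salary"])

    # One summary entry per distinct job title.
    return {
        title: {
            "count": len(vals),
            "min": min(vals),
            "max": max(vals),
            "avg": sum(vals) / len(vals),
        }
        for title, vals in salaries.items()
    }


# Made-up sample records for illustration only (not real Open Georgia data).
ORG = "Local Boards of Education"  # assumed value; the real file may use a code
sample = [
    {"title": "TEACHER", "salary": 40000.0, "org_type": ORG, "year": 2010},
    {"title": "TEACHER", "salary": 50000.0, "org_type": ORG, "year": 2010},
    {"title": "PRINCIPAL", "salary": 80000.0, "org_type": ORG, "year": 2010},
    {"title": "TEACHER", "salary": 99000.0, "org_type": ORG, "year": 2011},
    {"title": "AUDITOR", "salary": 60000.0, "org_type": "State Agencies", "year": 2010},
]

stats = salary_stats_by_title(sample, 2010, ORG)
for title in sorted(stats):
    print(title, stats[title])
```

The 2011 record and the record from another organization type are filtered out, so only the 2010 local board of education titles appear in the result. The tools listed below answer the same question at scale against the file in HDFS.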

As for which tools to use, the following list of blog entries (if they are not linked, they are coming soon) presents varying tool options to address questions such as this.

  • use mapreduce to calculate salary statistics for georgia educators (first of a three-part series)
  • use pig to calculate salary statistics for georgia educators (second of a three-part series)
  • use hive to calculate salary statistics for georgia educators (third of a three-part series)

 
