If the intention were to build an automated "scraping" application to retrieve all the information available on Open Georgia, then a rigorous, production-ready ELT (not ETL) solution would need to be developed, tested, and deployed to ensure that all data was presented and that it met a high standard of data quality. For the Open Georgia Analysis effort, we just want to focus on the salary data presented for teachers and staff across several local boards of education.

...

Quite obviously, these 76,943 (very well-formed) records do not meet Gartner's 3Vs (volume, velocity, variety) definition of "big data", but they do allow us to exercise the Hadoop and big data tooling to perform some data analysis. I also did the following bit of manual pre-processing (which would normally happen in the ELT space) prior to concatenating these six files together.

...