My landing page for "all things Hadoop", Big Data, and related technologies. The content is rather unstructured right now, but I'll get there. Take a look at David Streever's Hadoop space.
NOTE: The remainder of this document has no real structure; it is a parking lot of ideas and links that I will SOMEDAY come back to and organize. Thanks, Lester Martin.
Figure out what the best practice is for parsing JSON input in Hadoop. Some notes at http://stackoverflow.com/questions/16825821/parsing-json-input-in-hadoop-java to get this topic going.
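As a first stab at the JSON question, here is a minimal sketch of a mapper that could run under Hadoop Streaming instead of a Java InputFormat. It is not a claim about best practice. It assumes one JSON object per line, and the function names and the `id` key field are hypothetical, picked only for illustration.

```python
import json
import sys


def parse_records(lines):
    """Yield (record, None) for each valid JSON object, (None, line) otherwise.

    Assumption: every input line holds one complete JSON object.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            yield None, line  # malformed JSON
            continue
        if isinstance(record, dict):
            yield record, None
        else:
            yield None, line  # valid JSON, but not an object


def map_lines(lines, key_field="id"):
    """Emit tab-separated key/value strings, the Hadoop Streaming default.

    key_field is a hypothetical field name; bad lines are silently skipped.
    """
    for record, _bad in parse_records(lines):
        if record is None:
            continue
        key = record.get(key_field, "")
        yield f"{key}\t{json.dumps(record, sort_keys=True)}"


def main():
    # Entry point when the script is used as a streaming mapper.
    for out in map_lines(sys.stdin):
        print(out)
```

To use it as a mapper, call `main()` at the bottom of the script. Multi-line or pretty-printed JSON would break the one-record-per-line assumption and would need a different approach.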
Based on thoughts from http://blog.cloudera.com/blog/2011/09/snappy-and-hadoop/ and http://stackoverflow.com/questions/5377118/how-to-convert-txt-file-to-hadoops-sequence-file-format, write a simple utility that converts text files to sequence files and compresses them with Snappy. Or... am I overthinking this, and is there a far easier way to do it?
Of course... I want, no, NEED, to build a Hadoop cluster with Raspberry Pi devices, as seen in the following URLs:
Maybe I could do it with Java on the BeagleBoard? Maybe just write a very straightforward post like http://java.dzone.com/articles/getting-hadoop-and-running.
How about a project using the BLS OES datasets?
Investigate the MongoDB Connector for Hadoop, as called out at http://www.mongodb.com/press/integration-hadoop-and-mongodb-big-data%E2%80%99s-two-most-popular-technologies-gets-significant.