This is the first of a three-part series showing alternative Hadoop tools applied to the Open Georgia Analysis. The data we are working against looks like the following, which is included from the Format & Sample Data for Open Georgia wiki page.


The following describes the format of the dataset used for Open Georgia Analysis; the dataset itself was created by the process described in Preparing Open Georgia Test Data.

NAME (String) | TITLE (String) | SALARY (float) | TRAVEL (float) | ORG TYPE (String) | ORG (String) | YEAR (int)
ABBOTT,DEEDEE W | GRADES 9-12 TEACHER | 52,122.10 | 0.00 | LBOE | ATLANTA INDEPENDENT SCHOOL SYSTEM | 2010
ALLEN,ANNETTE D | SPEECH-LANGUAGE PATHOLOGIST | 92,937.28 | 260.42 | LBOE | ATLANTA INDEPENDENT SCHOOL SYSTEM | 2010
BAHR,SHERREEN T | GRADE 5 TEACHER | 52,752.71 | 0.00 | LBOE | COBB COUNTY SCHOOL DISTRICT | 2010
BAILEY,ANTOINETTE R | SCHOOL SECRETARY/CLERK | 19,905.90 | 0.00 | LBOE | COBB COUNTY SCHOOL DISTRICT | 2010
BAILEY,ASHLEY N | EARLY INTERVENTION PRIMARY TEACHER | 43,992.82 | 120.00 | LBOE | COBB COUNTY SCHOOL DISTRICT | 2010
CALVERT,RONALD MARTIN | STATE PATROL (SP) | 51,370.40 | 62.00 | SABAC | PUBLIC SAFETY, DEPARTMENT OF | 2010
CAMERON,MICHAEL D | PUBLIC SAFETY TRN (AL) | 34,748.60 | 259.35 | SABAC | PUBLIC SAFETY, DEPARTMENT OF | 2010
DAAS,TARWYN TARA | GRADES 9-12 TEACHER | 41,614.50 | 0.00 | LBOE | FULTON COUNTY BOARD OF EDUCATION | 2011
DABBS,SANDRA L | GRADES 9-12 TEACHER | 79,801.59 | 41.00 | LBOE | FULTON COUNTY BOARD OF EDUCATION | 2011
E'LOM,SOPHIA L | IS PERSONNEL - GENERAL ADMIN | 75,509.00 | 613.73 | LBOE | FULTON COUNTY BOARD OF EDUCATION | 2012
EADDY,FENNER R | SUBSTITUTE | 13,469.00 | 0.00 | LBOE | FULTON COUNTY BOARD OF EDUCATION | 2012
EADY,ARNETTA A | ASSISTANT PRINCIPAL | 71,879.00 | 319.60 | LBOE | FULTON COUNTY BOARD OF EDUCATION | 2012
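
To make the column types above concrete, here is a minimal sketch of turning one row into typed values. SalaryRecord, parseAmount, and fromFields are hypothetical names, not classes from the repo. The sketch assumes the seven values have already been split apart, since the raw file's delimiter and quoting are not shown here. The one thing it takes from the sample is that amounts are displayed with a thousands separator (52,122.10), which has to be stripped before parsing.

```java
/** Hypothetical typed view of one Open Georgia row; not a class from the repo. */
public class SalaryRecord {
    final String name;
    final String title;
    final float salary;
    final float travel;
    final String orgType;
    final String org;
    final int year;

    SalaryRecord(String name, String title, float salary, float travel,
                 String orgType, String org, int year) {
        this.name = name;
        this.title = title;
        this.salary = salary;
        this.travel = travel;
        this.orgType = orgType;
        this.org = org;
        this.year = year;
    }

    /** Parses an amount as displayed in the sample data, e.g. "52,122.10". */
    static float parseAmount(String text) {
        // Drop the thousands separator before handing the text to the parser.
        return Float.parseFloat(text.trim().replace(",", ""));
    }

    /** Builds a record from seven already-split values, in header-row order. */
    static SalaryRecord fromFields(String[] f) {
        if (f.length != 7) {
            throw new IllegalArgumentException("expected 7 fields, got " + f.length);
        }
        return new SalaryRecord(
                f[0].trim(), f[1].trim(),
                parseAmount(f[2]), parseAmount(f[3]),
                f[4].trim(), f[5].trim(),
                Integer.parseInt(f[6].trim()));
    }

    public static void main(String[] args) {
        // First sample row from the table above.
        SalaryRecord r = fromFields(new String[] {
            "ABBOTT,DEEDEE W", "GRADES 9-12 TEACHER", "52,122.10", "0.00",
            "LBOE", "ATLANTA INDEPENDENT SCHOOL SYSTEM", "2010"});
        System.out.println(r.title + " (" + r.year + ") salary=" + r.salary);
    }
}
```

Note that NAME and ORG can themselves contain commas (for example PUBLIC SAFETY, DEPARTMENT OF), so a naive split on commas would not be a safe way to produce the seven values.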


In this first installment, let's jump right in where Hadoop began: MapReduce. After you visit Preparing Open Georgia Test Data and get some test data loaded into HDFS, clone my GitHub repo as referenced in GitHub > lestermartin > hadoop-exploration. Once you have the code up in your favorite IDE (mine is IntelliJ on my MacBook Pro), home in on the lestermartin.hadoop.exploration.opengeorgia package (that last link has details on the major MapReduce stereotypes). You can then build the jar file with Maven.
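
If the mapper and reducer roles are new to you, the following plain-Java sketch may help before you open the repo. It is not the GenerateStatistics code and has no Hadoop dependency. MapReduceSketch and its methods are hypothetical, and the choice of statistic (count, minimum, maximum, and average salary per title, ignoring year) is an assumption made purely for illustration. A TreeMap stands in for the shuffle that the framework performs between the two phases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** In-memory illustration of map, shuffle, and reduce; hypothetical, no Hadoop classes. */
public class MapReduceSketch {

    /** Map phase: for one {title, salary} row, emit the pair (title, salary). */
    static void map(String[] row, Map<String, List<Float>> shuffle) {
        String title = row[0];
        float salary = Float.parseFloat(row[1].replace(",", ""));
        // In a real job the framework groups emitted pairs by key; this map imitates that.
        shuffle.computeIfAbsent(title, k -> new ArrayList<>()).add(salary);
    }

    /** Reduce phase: collapse every salary seen for one title into a tab-separated line. */
    static String reduce(String title, List<Float> salaries) {
        float min = Float.MAX_VALUE;
        float max = -Float.MAX_VALUE;
        double sum = 0;
        for (float s : salaries) {
            min = Math.min(min, s);
            max = Math.max(max, s);
            sum += s;
        }
        double avg = sum / salaries.size();
        return String.format(Locale.ROOT, "%s\t%d\t%.2f\t%.2f\t%.2f",
                title, salaries.size(), min, max, avg);
    }

    /** Runs both phases over the rows and returns one output line per distinct title. */
    static List<String> run(List<String[]> rows) {
        Map<String, List<Float>> shuffle = new TreeMap<>(); // keys kept in sorted order
        for (String[] row : rows) {
            map(row, shuffle);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Float>> e : shuffle.entrySet()) {
            out.add(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Titles and salaries taken from three rows of the sample data.
        List<String[]> rows = Arrays.asList(
                new String[] {"GRADES 9-12 TEACHER", "52,122.10"},
                new String[] {"GRADES 9-12 TEACHER", "41,614.50"},
                new String[] {"SUBSTITUTE", "13,469.00"});
        for (String line : run(rows)) {
            System.out.println(line);
        }
    }
}
```

The point of the split is that map looks at one record in isolation while reduce sees every value for a single key, which is what lets the real framework spread both phases across many machines.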

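As with all three posts in this series, let's use the Hortonworks Sandbox to run everything. Make sure the hue user has a folder to put your jar in, and then copy the jar there. As the session below shows, the hue user's home directory on the Sandbox is /usr/lib/hue, which is why the scp targets /usr/lib/hue/jars.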

HW10653:target lmartin$ ssh root@127.0.0.1 -p 2222
root@127.0.0.1's password: 
Last login: Tue Apr 29 16:48:05 2014 from 10.0.2.2
[root@sandbox ~]# su hue
[hue@sandbox root]$ cd ~
[hue@sandbox ~]$ mkdir jars
[hue@sandbox ~]$ exit
exit
[root@sandbox ~]# exit
logout
Connection to 127.0.0.1 closed.
HW10653:target lmartin$ ls 
classes                    maven-archiver
generated-sources            surefire-reports
generated-test-sources            test-classes
hadoop-exploration-0.0.1-SNAPSHOT.jar
HW10653:target lmartin$ scp -P 2222 hadoop-exploration-0.0.1-SNAPSHOT.jar root@127.0.0.1:/usr/lib/hue/jars
root@127.0.0.1's password: 
hadoop-exploration-0.0.1-SNAPSHOT.jar         100%   22KB  22.2KB/s   00:00    
HW10653:target lmartin$ ssh root@127.0.0.1 -p 2222
root@127.0.0.1's password: 
Last login: Tue Apr 29 17:48:35 2014 from 10.0.2.2
[root@sandbox ~]# su hue
[hue@sandbox root]$ cd ~/jars
[hue@sandbox jars]$ ls -l
total 24
-rw-r--r-- 1 root root 22678 Apr 29 18:49 hadoop-exploration-0.0.1-SNAPSHOT.jar

Now go ahead and kick it off.

[hue@sandbox jars]$ hdfs dfs -ls /user/hue/opengeorgia
Found 1 items
-rwxr-xr-x   3 hue hue    7612715 2014-04-29 16:53 /user/hue/opengeorgia/salaryTravelReport.csv
[hue@sandbox jars]$ hadoop jar hadoop-exploration-0.0.1-SNAPSHOT.jar lestermartin.hadoop.exploration.opengeorgia.GenerateStatistics opengeorgia/salaryTravelReport.csv mroutput

   ... MANY LINES REMOVED ...
