WORK IN PROGRESS!!!!
The Apache Pig project's User Defined Functions page gives a pretty good overview of how to create a UDF. In fact, I stole my simple UDF from there. For Pig UDFs, the obligatory "Hello World" program is actually a "Convert to Upper Case" function. For this effort, I'm using the Hortonworks Sandbox (version 2.0). Once you have it up and running, follow along and we'll get your first UDF created and placed on HDFS, where others can easily share it.
...
```
hw10653:~ lmartin$ ssh root@127.0.0.1 -p 2222
root@127.0.0.1's password:
Last login: Fri Mar 28 21:34:47 2014 from 10.0.2.2
[root@sandbox ~]# su hue
[hue@sandbox root]$ cd ~
[hue@sandbox ~]$ mkdir exampleudf
[hue@sandbox ~]$ cd exampleudf/
[hue@sandbox exampleudf]$ vi UPPER.java
```
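This post does not reproduce the body of UPPER.java. For orientation, the upper-case example in the Apache Pig UDF manual extends `EvalFunc<String>` and upper-cases the first field of the input tuple. The plain-Java sketch below captures just that logic, under some assumptions. `UpperSketch` is a hypothetical name. A `List<?>` stands in for Pig's `Tuple`, so the sketch runs without the Pig jar. The real class would live in the `exampleudf` package, which the `DEFINE` statement later in the post implies.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the logic inside an UPPER-style Pig eval function.
// A real UDF would be declared roughly as
//   package exampleudf;
//   public class UPPER extends org.apache.pig.EvalFunc<String>
// and override exec(org.apache.pig.data.Tuple), which may throw IOException.
// Here a List<?> stands in for the Tuple so this compiles without Pig.
public class UpperSketch {

    // Returns the first field upper-cased, or null when there is nothing to convert.
    public static String exec(List<?> input) {
        // Guard against a missing or empty tuple.
        if (input == null || input.isEmpty()) {
            return null;
        }
        Object field = input.get(0);
        if (field == null) {
            return null; // a null chararray stays null
        }
        // Assume the field is a chararray, which Pig hands over as java.lang.String.
        String str = (String) field;
        // Locale.ROOT keeps the result independent of the JVM's default locale.
        return str.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(exec(Arrays.asList("Now is the time"))); // NOW IS THE TIME
        System.out.println(exec(Collections.emptyList()));          // null
    }
}
```

The Pig manual's version calls plain `toUpperCase()`. Using `Locale.ROOT` is my own swap. It avoids surprises such as the Turkish dotted capital I on a JVM whose default locale is Turkish.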
...
As you are starting to see, the goal is to create a SIMPLE User-Defined Function. It gives you a strawman that you can build your own slick new function on top of. That, or pay a decent Java Hadoop programmer to do it for you; heck, I'm not allergic to a little moonlighting.
Then just compile the class and jar it up (your JDK and Pig version numbers might vary slightly). If you have trouble compiling or jarring it, or don't even want to try, just download exampleudf.jar directly and put it into the HDFS directory described further down in the post.
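One detail is worth checking whichever route you take: the jar layout. For `DEFINE SIMPLEUPPER exampleudf.UPPER()` to resolve, the class must sit at `exampleudf/UPPER.class` inside the jar. The hypothetical `JarLayoutSketch` below builds a throwaway jar in memory with `java.util.jar` and lists its entries, which is roughly what `jar tf exampleudf.jar` shows. The entry bytes are placeholders, not a compiled class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Hypothetical helper: shows the entry path a class loader looks for inside a jar
// for a given fully qualified class name.
public class JarLayoutSketch {

    // "exampleudf.UPPER" -> "exampleudf/UPPER.class"
    public static String entryFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Writes a jar (no manifest) with one placeholder entry per class name.
    public static byte[] buildJar(List<String> classNames) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes)) {
            for (String name : classNames) {
                jar.putNextEntry(new JarEntry(entryFor(name)));
                jar.write(new byte[] {(byte) 0xCA, (byte) 0xFE}); // placeholder, not real bytecode
                jar.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Lists the entry names stored in a jar.
    public static List<String> listEntries(byte[] jarBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            JarEntry entry;
            while ((entry = in.getNextJarEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] jar = buildJar(Collections.singletonList("exampleudf.UPPER"));
        System.out.println(listEntries(jar)); // [exampleudf/UPPER.class]
    }
}
```

If your real jar lists `UPPER.class` at the top level instead, the class was jarred without its package directory, and Pig will not find `exampleudf.UPPER`.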
...
Now that we've created it, let's share it. The best way to make it accessible to everyone is to put the jar file on HDFS itself. Since we are using the Sandbox, we could just use Hue, but everything is more fun at the command line.
```
[hue@sandbox ~]$ hadoop fs -mkdir shared
[hue@sandbox ~]$ hadoop fs -mkdir shared/pig
[hue@sandbox ~]$ hadoop fs -mkdir shared/pig/udfs
[hue@sandbox ~]$ ls -l *.jar
-rw-rw-r-- 1 hue hue 1534 Mar 29 00:54 exampleudf.jar
[hue@sandbox ~]$ hadoop fs -put exampleudf.jar shared/pig/udfs/exampleudf.jar
[hue@sandbox ~]$ hadoop fs -ls /user/hue/shared/pig/udfs
Found 1 items
-rw-r--r-- 3 hue hue 1534 2014-03-29 00:59 /user/hue/shared/pig/udfs/exampleudf.jar
```
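The relative paths above (`shared`, `shared/pig/udfs/...`) end up under `/user/hue` because `hadoop fs` resolves relative paths against the user's HDFS home directory, `/user/<username>`. The hypothetical `HdfsPathSketch` below mimics that resolution with plain `java.net.URI`; Hadoop's own `Path` class is not involved. The NameNode address `sandbox.hortonworks.com:8020` comes from the Hue error message later in the post. The sketch also shows that the `hdfs:///...` form used by `REGISTER`, which omits the host, names the same path.

```java
import java.net.URI;

// Hypothetical illustration of how a relative HDFS path maps to an absolute one,
// using plain URI resolution rather than Hadoop's Path class.
public class HdfsPathSketch {

    // Resolves a relative path against a home-directory URI.
    public static URI resolve(URI home, String relative) {
        String base = home.toString();
        // A trailing slash makes resolve() treat the last segment as a directory.
        if (!base.endsWith("/")) {
            base = base + "/";
        }
        return URI.create(base).resolve(relative);
    }

    public static void main(String[] args) {
        URI home = URI.create("hdfs://sandbox.hortonworks.com:8020/user/hue");
        URI jar = resolve(home, "shared/pig/udfs/exampleudf.jar");
        System.out.println(jar);
        // hdfs://sandbox.hortonworks.com:8020/user/hue/shared/pig/udfs/exampleudf.jar

        // REGISTER's hdfs:/// form leaves out the host, but the path is identical.
        URI registered = URI.create("hdfs:///user/hue/shared/pig/udfs/exampleudf.jar");
        System.out.println(registered.getPath().equals(jar.getPath())); // true
    }
}
```

The Sandbox runs a Hadoop 2.x stack, so `hadoop fs -mkdir -p shared/pig/udfs` should create the whole directory chain in one command instead of three.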
...
Info: For this compiled UDF library to be accessible to everyone, the jar file's HDFS permissions must grant read access to all users. The -rw-r--r-- mode in the listing above already does.
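You can check this straight from the `hadoop fs -ls` listing. The hypothetical `ReadableByAllSketch` below uses `java.nio.file.attribute.PosixFilePermissions` to parse the nine permission characters, after dropping the leading file-type character. It then tests whether "others" may read. In the HDFS permission model, a reader also needs the execute (x) bit on each directory along the path to reach a child. So the same check with `OTHERS_EXECUTE` applies to `/user/hue/shared/pig/udfs` and its parents. The sketch assumes the plain 10-character form without ACL markers.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper for reading permission strings as printed by `hadoop fs -ls`.
public class ReadableByAllSketch {

    // Accepts "-rw-r--r--" (with the file-type character) or "rw-r--r--".
    static Set<PosixFilePermission> parse(String mode) {
        String bits = mode.length() == 10 ? mode.substring(1) : mode;
        return PosixFilePermissions.fromString(bits);
    }

    // A file is readable by everyone when "others" have the r bit.
    public static boolean readableByAll(String mode) {
        return parse(mode).contains(PosixFilePermission.OTHERS_READ);
    }

    // A directory can be traversed by everyone when "others" have the x bit.
    public static boolean traversableByAll(String mode) {
        return parse(mode).contains(PosixFilePermission.OTHERS_EXECUTE);
    }

    public static void main(String[] args) {
        System.out.println(readableByAll("-rw-r--r--"));    // true: the jar in the listing above
        System.out.println(readableByAll("-rw-------"));    // false
        System.out.println(traversableByAll("drwxr-xr-x")); // true
    }
}
```

If a bit is missing, `hadoop fs -chmod a+r` on the jar (and `a+x` on any directory that lacks it) should fix it.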
Now create a file (for example, typingText.txt) containing some random text and get it into HDFS as shown below. The Pig script that follows loads it from /user/hue/testData/typingText.txt.
...
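```
-- Make the classes in the shared jar available to this script
REGISTER 'hdfs:///user/hue/shared/pig/udfs/exampleudf.jar';
-- Give the UDF a short alias
DEFINE SIMPLEUPPER exampleudf.UPPER();
-- Load the text file, one row:chararray field per line
typing_line = LOAD '/user/hue/testData/typingText.txt' AS (row:chararray);
-- Upper-case every row with the UDF, then print the results
upper_typing_line = FOREACH typing_line GENERATE SIMPLEUPPER(row);
DUMP upper_typing_line;
```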
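To make the script's data flow concrete, here is a local emulation in Java that does not use Pig. `PigScriptEmulation` and its sample lines are hypothetical. It assumes the default `PigStorage` loader, which splits each line on tab characters. With the one-field schema `(row:chararray)`, a tab-free line becomes `row` as a whole. My understanding is that any extra tab-separated columns would be dropped, and the sketch models that assumption. `DUMP` prints each one-field tuple wrapped in parentheses, as the output at the end of the post shows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical local emulation of:
//   typing_line = LOAD ... AS (row:chararray);
//   upper_typing_line = FOREACH typing_line GENERATE SIMPLEUPPER(row);
//   DUMP upper_typing_line;
// It runs on an in-memory list of lines, not on Pig or HDFS.
public class PigScriptEmulation {

    // PigStorage's default delimiter is a tab; keep only the first column,
    // assuming columns beyond the declared schema are dropped.
    static String loadRow(String line) {
        int tab = line.indexOf('\t');
        return tab < 0 ? line : line.substring(0, tab);
    }

    // GENERATE SIMPLEUPPER(row), then format each result as DUMP does: "(value)".
    public static List<String> run(List<String> lines) {
        List<String> dumped = new ArrayList<>();
        for (String line : lines) {
            String upper = loadRow(line).toUpperCase(Locale.ROOT);
            dumped.add("(" + upper + ")");
        }
        return dumped;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("hello pig", "second line\tignored column");
        run(sample).forEach(System.out::println);
        // (HELLO PIG)
        // (SECOND LINE)
    }
}
```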
The logical thing would be to use Hue's Pig UI to run this super simple script, but I simply cannot figure out why it fails with the following error every time.
```
2014-03-29 01:15:19,712 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Pathname /tmp/udfs/'hdfs:/user/hue/shared/pig/udfs/exampleudf.jar' from hdfs://sandbox.hortonworks.com:8020/tmp/udfs/'hdfs:/user/hue/shared/pig/udfs/exampleudf.jar' is not a valid DFS filename.
```
...
```
[hue@sandbox ~]$ pig test-UPPER.pig
2014-03-29 01:20:40,579 [main] INFO org.apache.pig.Main - Apache Pig version 0.12.0.2.0.6.0-76 (rexported) compiled Oct 17 2013, 20:44:07
2014-03-29 01:20:40,580 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/lib/hue/pig_1396081240577.log
... LOTS of lines removed ...
2014-03-29 01:21:12,501 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-03-29 01:21:12,502 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR COUNTRY.)
( NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR COUNTRY)
(. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR COUNTR)
(Y. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR COUNT)
(RY. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR COUN)
(TRY. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR COU)
(NTRY. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR CO)
(UNTRY. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR C)
(OUNTRY. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR )
(COUNTRY. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR)
```
It worked! You did it!! Everything has been CAPITALIZED!!! Awesome! Congratulations!!!!