WORK IN PROGRESS!!!!

The Apache Pig project's User Defined Functions page gives a pretty good overview of how to create a UDF.  In fact, I stole my simple UDF from there.  For Pig UDFs, the obligatory "Hello World" program is actually a "Convert to Upper Case" function.  For this effort, I'm using the Hortonworks Sandbox (version 2.0).  Once you have that setup operational, follow along and we'll get your first UDF created and placed on HDFS where others can easily share it.

...

Code Block
languagebash
hw10653:~ lmartin$ ssh root@127.0.0.1 -p 2222
root@127.0.0.1's password: 
Last login: Fri Mar 28 21:34:47 2014 from 10.0.2.2
[root@sandbox ~]# su hue
[hue@sandbox root]$ cd ~
[hue@sandbox ~]$ mkdir exampleudf
[hue@sandbox ~]$ cd exampleudf/
[hue@sandbox exampleudf]$ vi UPPER.java

...

As you are starting to see, the goal is to create a SIMPLE User-Defined Function.  This will give you a strawman that you can build your own slick new function on top of.  That, or pay some decent Java Hadoop programmer to do it for you.  Heck, I'm not allergic to a little moonlighting.  (wink)
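For reference, here is a minimal sketch of what UPPER.java can look like, modeled on the UPPER example in the Pig UDF documentation.  The package name exampleudf is inferred from the DEFINE statement used later in this post; treat the rest as an approximation, not necessarily the exact file.  It is wrapped in a shell heredoc so you can paste it from your home directory (the parent of exampleudf/) instead of typing it into vi.

```shell
# Run from the parent of exampleudf/ (the hue user's home directory in this post)
mkdir -p exampleudf

# Write the UDF source; the quoted 'EOF' keeps the shell from touching the Java text
cat > exampleudf/UPPER.java <<'EOF'
package exampleudf;

import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Sketch modeled on the Pig documentation's UPPER example
public class UPPER extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Nulls and empty tuples come back as null
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        try {
            String str = (String) input.get(0);
            return str.toUpperCase();
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row", e);
        }
    }
}
EOF
```

Extending EvalFunc<String> and overriding exec(Tuple) is all a simple scalar UDF needs; Pig calls exec once per input row.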

Then just compile the class and jar it up (your JDK and Pig version numbers might vary slightly).  If you have trouble compiling/jarring it, or don't even want to try, then just download exampleudf.jar directly and load it into the directory described further down in the post.
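A rough sketch of the compile-and-jar step follows.  The PIG_JAR default is a hypothetical placeholder, not a path taken from the Sandbox, so point it at wherever the Pig jar lives on your machine.  If javac complains about missing Hadoop classes, appending the output of `hadoop classpath` to the -cp value is one thing to try.

```shell
# Hypothetical location: set PIG_JAR to the Pig jar on your machine
PIG_JAR="${PIG_JAR:-/usr/lib/pig/pig.jar}"

if [ -f "$PIG_JAR" ] && command -v javac >/dev/null 2>&1; then
    # Compile from the parent of exampleudf/ so the package path lines up
    javac -cp "$PIG_JAR" exampleudf/UPPER.java
    # Bundle only the compiled class into the jar
    jar -cf exampleudf.jar exampleudf/UPPER.class
    # List the contents to confirm exampleudf/UPPER.class made it in
    jar -tf exampleudf.jar
else
    echo "Set PIG_JAR to your Pig jar and make sure a JDK is installed" >&2
fi
```

Running these from the directory above exampleudf/ matters: the class is looked up as exampleudf.UPPER, so the jar entry has to be exampleudf/UPPER.class.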

...

Now that we've got it created, let's share it.  The best way to make it accessible to everyone is to put the jar file on HDFS itself.  Since we are using the Sandbox, we could just use Hue, but everything is always more fun at the command line.

Code Block
languagebash
[hue@sandbox ~]$ hadoop fs -mkdir shared
[hue@sandbox ~]$ hadoop fs -mkdir shared/pig
[hue@sandbox ~]$ hadoop fs -mkdir shared/pig/udfs
[hue@sandbox ~]$ ls -l *.jar
-rw-rw-r-- 1 hue hue 1534 Mar 29 00:54 exampleudf.jar
[hue@sandbox ~]$ hadoop fs -put exampleudf.jar shared/pig/udfs/exampleudf.jar
[hue@sandbox ~]$ hadoop fs -ls /user/hue/shared/pig/udfs
Found 1 items
-rw-r--r--   3 hue hue       1534 2014-03-29 00:59 /user/hue/shared/pig/udfs/exampleudf.jar

...

Info

For this compiled UDF library to be accessible to everyone, the jar file's HDFS permissions must allow read access for all users.
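If the permissions need adjusting, something along these lines should do it.  This is a sketch that assumes the relative paths used earlier in this post (they resolve under /user/hue); the function name share_udf_jar is made up for illustration.  The -rw-r--r-- in the listing earlier suggests the jar is already world-readable on the Sandbox, so this may be a no-op there.  Other users also need to be able to traverse the parent directories, including /user/hue itself, which this sketch does not touch.

```shell
# Hypothetical helper: open the jar and the directories leading to it for all users
share_udf_jar() {
    # Directories need read + execute so others can list and traverse them
    hadoop fs -chmod 755 shared shared/pig shared/pig/udfs
    # The jar itself only needs to be readable
    hadoop fs -chmod 644 shared/pig/udfs/exampleudf.jar
}

# Only call it where a Hadoop client is installed
if command -v hadoop >/dev/null 2>&1; then
    share_udf_jar
    hadoop fs -ls shared/pig/udfs
fi
```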

Now, create a file (example: typingText.txt) with some random text and get it into HDFS as shown below.
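If you don't have a text file handy, here is one way to make one.  Judging from the output at the end of this post, the test file held a typing-drill sentence wrapped at 69 characters; the original casing is a guess, and any text will do.  The testData directory name comes from the LOAD path in the Pig script.

```shell
# A typing-drill sentence with a trailing space so repeats run together cleanly
sentence='Now is the time for all good men to come to the aid of their country. '

# Repeat the sentence, wrap the stream at 69 characters, and keep ten rows
for i in $(seq 1 10); do
    printf '%s' "$sentence"
done | fold -w 69 | head -n 10 > typingText.txt

# Upload only where a Hadoop client is installed; the relative path resolves under /user/hue
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -mkdir -p testData
    hadoop fs -put typingText.txt testData/typingText.txt
    hadoop fs -ls testData
fi
```

Wrapping one character short of the 70-character repeat is what makes each row start one character later in the sentence than the row before it.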

...

Code Block
languagetext
titletest-UPPER.pig
REGISTER 'hdfs:///user/hue/shared/pig/udfs/exampleudf.jar';
DEFINE SIMPLEUPPER exampleudf.UPPER();

typing_line = LOAD '/user/hue/testData/typingText.txt' AS (row:chararray);

upper_typing_line = FOREACH typing_line GENERATE SIMPLEUPPER(row);

DUMP upper_typing_line;

The logical thing would be to use the Pig UI component of Hue to run this super simple function, but I simply cannot figure out why it complains with the following error each time.

Code Block
languagetext
2014-03-29 01:15:19,712 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Pathname /tmp/udfs/'hdfs:/user/hue/shared/pig/udfs/exampleudf.jar' from hdfs://sandbox.hortonworks.com:8020/tmp/udfs/'hdfs:/user/hue/shared/pig/udfs/exampleudf.jar' is not a valid DFS filename.

...

Code Block
languagebash
[hue@sandbox ~]$ pig test-UPPER.pig
2014-03-29 01:20:40,579 [main] INFO  org.apache.pig.Main - Apache Pig version 0.12.0.2.0.6.0-76 (rexported) compiled Oct 17 2013, 20:44:07
2014-03-29 01:20:40,580 [main] INFO  org.apache.pig.Main - Logging error messages to: /usr/lib/hue/pig_1396081240577.log

... LOTS of lines removed ...

2014-03-29 01:21:12,501 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2014-03-29 01:21:12,502 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR COUNTRY.)
( NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR COUNTRY)
(. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR COUNTR)
(Y. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR COUNT)
(RY. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR COUN)
(TRY. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR COU)
(NTRY. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR CO)
(UNTRY. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR C)
(OUNTRY. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR )
(COUNTRY. NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THEIR)

It worked!  You did it!!  Everything has been CAPITALIZED!!!  Awesome!  Congratulations!!!!