Oozie

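Oozie is a workflow scheduler for managing Hadoop data-processing jobs on your cluster. Beyond simple scheduling, it runs multi-step workflows of actions (for example MapReduce, Pig, or Java jobs) and coordinator jobs that trigger workflows on a schedule or when input data becomes available.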

Configure Oozie

Install Oozie

Cloudera Manager distributes Oozie in CDH and offers the following services:

  • Oozie Server: Install the Oozie Server on a host that does not run an HBase Master or an HDFS NameNode, because Oozie can use a large amount of memory. Only one Oozie Server is required.

Oozie Configuration

Configuration | Description | Value | Calculation
Oozie Server Data Directory | Directory where the Oozie Server places its data. Only applicable when using Derby as the database type. | /space1/oozie/data | Not on root.

Configure Oozie’s Whitelist

If the whitelist has not been configured, submitting a job fails. For example, a dry run of the no-op example from ~/oozie/examples/apps/no-op:

~/oozie/examples/apps/no-op$ oozie job -oozie http://oozie.servername01:11000/oozie -config job.properties -dryrun

returns the following error:

Error: E0901 : E0901: Namenode [servername01:8020] not allowed, not in Oozies whitelist

In Cloudera Manager, go to the Oozie service and add the properties below to this setting:

Oozie Server Default Group > Advanced > Oozie Server Configuration Safety Valve for oozie-site.xml

<property>
   <name>oozie.service.HadoopAccessorService.jobTracker.whitelist</name>
   <value>servername01:8021,servername02:8021</value>
   <description>Whitelisted job tracker for Oozie service.</description>
</property>
<property>
   <name>oozie.service.HadoopAccessorService.nameNode.whitelist</name>
   <value>servername01:8020,servername02:8020</value>
   <description>Whitelisted NameNode for Oozie service.</description>
</property>


Note: Leave <value> empty to allow all servers. Leave it empty on YARN clusters as well.
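With many NameNode or JobTracker hosts, typing the comma-separated <value> by hand is error-prone. The shell sketch below prints a <property> block for the safety valve from a property name, a port, and a list of hostnames. The function names join_hosts and print_whitelist_property are hypothetical helpers, not part of Oozie or Cloudera Manager.

```shell
# Hypothetical helper: build "host1:port,host2:port,..." from a port and hosts.
# $1 = port, remaining arguments = hostnames.
join_hosts() {
  local port="$1" out="" host
  shift
  for host in "$@"; do
    # ${out:+$out,} adds a comma only once out is non-empty.
    out="${out:+$out,}${host}:${port}"
  done
  printf '%s\n' "$out"
}

# Hypothetical helper: print one <property> block for oozie-site.xml.
# $1 = property name, $2 = port, remaining arguments = hostnames.
print_whitelist_property() {
  local name="$1" port="$2"
  shift 2
  printf '<property>\n'
  printf '   <name>%s</name>\n' "$name"
  printf '   <value>%s</value>\n' "$(join_hosts "$port" "$@")"
  printf '</property>\n'
}
```

For example, print_whitelist_property oozie.service.HadoopAccessorService.nameNode.whitelist 8020 servername01 servername02 prints the NameNode block shown above without its <description> element.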

Oozie logs are in /var/log/oozie on the Oozie Server host.

Test Oozie

Load the examples and shared libraries.

  1. Create an oozie subdirectory in your home folder: >>mkdir ~/oozie
  2. Copy /usr/lib/oozie/oozie-sharelib.tar.gz to your oozie folder: >>cp /usr/lib/oozie/oozie-sharelib.tar.gz ~/oozie
  3. Change to your oozie folder and extract the archive: >>cd ~/oozie && tar xvfz oozie-sharelib.tar.gz
  4. From your oozie folder, make a subdirectory called examples: >>mkdir examples
  5. Copy /usr/share/doc/oozie/oozie-examples.tar.gz to your oozie folder: >>cp /usr/share/doc/oozie/oozie-examples.tar.gz ~/oozie
  6. Extract the archive: >>tar xvfz oozie-examples.tar.gz
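The six steps above can be collected into one shell function. This is a sketch that assumes the same CDH package paths as the steps; prepare_local_oozie and the DRY_RUN switch are hypothetical names. With DRY_RUN=1 the commands are only printed, which is a cheap way to review them before running them for real.

```shell
# Hypothetical dry-run wrapper: with DRY_RUN=1 the command is printed
# instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: steps 1-6 above in one function.
# $1 = local working folder (defaults to ~/oozie).
# The archive paths are the CDH package paths used in the steps.
prepare_local_oozie() {
  local dir="${1:-$HOME/oozie}"
  run mkdir -p "$dir" "$dir/examples" || return 1
  run cp /usr/lib/oozie/oozie-sharelib.tar.gz /usr/share/doc/oozie/oozie-examples.tar.gz "$dir" || return 1
  # tar -C unpacks into $dir without changing the current directory.
  run tar -xzf "$dir/oozie-sharelib.tar.gz" -C "$dir" || return 1
  run tar -xzf "$dir/oozie-examples.tar.gz" -C "$dir"
}
```

DRY_RUN=1 prepare_local_oozie prints the four commands; prepare_local_oozie on its own runs them against ~/oozie.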

You should now have an oozie folder in your home folder, containing share and examples folders. Next, copy this data to HDFS (run these commands from ~/oozie, because the local paths are relative):

  1. Make an oozie directory in HDFS: >>sudo -u hdfs hadoop fs -mkdir /user/oozie
  2. Make a home directory for the hdfs user in HDFS: >>sudo -u hdfs hadoop fs -mkdir /user/hdfs
  3. Copy the shared libraries: >>sudo -u hdfs hadoop fs -put share /user/oozie
  4. Copy the example files: >>sudo -u hdfs hadoop fs -put examples /user/hdfs
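The four HDFS steps can likewise be wrapped in a function. This sketch mirrors the commands above; upload_oozie_to_hdfs and DRY_RUN are hypothetical names. Because the commands run as the hdfs user, that user may need read access to your local ~/oozie folder.

```shell
# Hypothetical dry-run wrapper: with DRY_RUN=1 the command is printed
# instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: steps 1-4 above.
# $1 = local folder holding the unpacked share and examples folders
#      (defaults to ~/oozie).
upload_oozie_to_hdfs() {
  local src="${1:-$HOME/oozie}"
  # -mkdir fails if the directory already exists; that is safe to ignore here,
  # so these two lines do not stop the function.
  run sudo -u hdfs hadoop fs -mkdir /user/oozie
  run sudo -u hdfs hadoop fs -mkdir /user/hdfs
  run sudo -u hdfs hadoop fs -put "$src/share" /user/oozie || return 1
  run sudo -u hdfs hadoop fs -put "$src/examples" /user/hdfs
}
```

Run DRY_RUN=1 upload_oozie_to_hdfs first to check the paths, then upload_oozie_to_hdfs to copy the files.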

Run the examples from your local ~/oozie folder, because the -config paths are local and relative:

sudo -u hdfs oozie job -oozie http://localhost:11000/oozie -config examples/apps/no-op/job.properties -run

sudo -u hdfs oozie job -oozie http://localhost:11000/oozie -config examples/apps/java-main/job.properties -run

sudo -u hdfs oozie job -oozie http://localhost:11000/oozie -config examples/apps/pig/job.properties -run

Each run prints a job ID. To check a run's status, use the following command, replacing the ID with the one your run printed:

sudo -u hdfs oozie job -oozie http://localhost:11000/oozie -info 0000002-130228105804751-oozie-oozi-W
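Running the three examples and copying each ID by hand can be scripted. The sketch below assumes that oozie job ... -run prints a line of the form job: <id>, as in the submit example later in this section; extract_job_id and run_example are hypothetical helper names. The oozie CLI also reads the OOZIE_URL environment variable when -oozie is omitted; the sketch still passes -oozie explicitly to match the commands above.

```shell
# Oozie server URL from the commands above; override it for a remote server.
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

# Hypothetical helper: read CLI output on stdin and print the ID from the
# first "job: <id>" line. Prints nothing if no such line exists.
extract_job_id() {
  sed -n 's/^job: *//p' | head -n 1
}

# Hypothetical helper: run one example by name (no-op, java-main, pig)
# from ~/oozie, then print its status.
run_example() {
  local name="$1" id
  id="$(sudo -u hdfs oozie job -oozie "$OOZIE_URL" \
        -config "examples/apps/$name/job.properties" -run | extract_job_id)"
  if [ -z "$id" ]; then
    echo "no job ID returned for $name" >&2
    return 1
  fi
  echo "$name -> $id"
  sudo -u hdfs oozie job -oozie "$OOZIE_URL" -info "$id"
}
```

From ~/oozie: for ex in no-op java-main pig; do run_example "$ex"; done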

Oozie Commands

Get all running jobs:

oozie jobs -oozie http://oozie.servername01:11000/oozie | grep RUNNING

If the job is a coordinator rather than a workflow, list coordinator jobs instead:

oozie jobs -jobtype coordinator -oozie http://oozie.servername01:11000/oozie

Find the job ID in the listing, then kill the job:

oozie job -oozie http://oozie.servername01:11000/oozie -kill jobid
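Listing running jobs and killing them one at a time can be combined. The sketch assumes the job ID is the first column of each row that oozie jobs prints and that the status appears as its own column; running_job_ids, kill_running_jobs, and the run wrapper are hypothetical names. Because -kill cannot be undone, the wrapper only prints the kill commands unless DRY_RUN=0 is set.

```shell
OOZIE_URL="${OOZIE_URL:-http://oozie.servername01:11000/oozie}"

# Hypothetical wrapper: defaults to dry run (DRY_RUN unset means 1)
# because killing a job cannot be undone.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Read "oozie jobs" output on stdin and print the first column (assumed to be
# the job ID) of every row that has a column exactly equal to RUNNING.
running_job_ids() {
  awk '{ for (i = 2; i <= NF; i++) if ($i == "RUNNING") { print $1; next } }'
}

# Kill every RUNNING job of the given type: wf (default) or coordinator.
kill_running_jobs() {
  oozie jobs -oozie "$OOZIE_URL" -jobtype "${1:-wf}" | running_job_ids |
    while read -r id; do
      run oozie job -oozie "$OOZIE_URL" -kill "$id"
    done
}
```

kill_running_jobs coordinator lists the kill commands for running coordinators; DRY_RUN=0 kill_running_jobs actually kills the running workflows.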

Submit a job (this creates the job in the PREP state but does not start it):

oozie job -oozie http://oozie.servername01:11000/oozie -config oozieProject/workflowHdfsAndEmailActions/job.properties -submit

Oozie prints the new job ID:

job: 0000001-130712212133144-oozie-oozi-W

Start the submitted job:

oozie job -oozie http://oozie.servername01:11000/oozie -start 0000001-130712212133144-oozie-oozi-W

Check the status:

oozie job -oozie http://oozie.servername01:11000/oozie -info 0000001-130712212133144-oozie-oozi-W
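After -start, you can poll -info until the job finishes instead of re-running the status command by hand. The sketch assumes that the first line of the -info output starting with Status holds the job-level status (for example Status : RUNNING), with the per-action table after it; job_status and wait_for_job are hypothetical names, and the 10-second interval is an arbitrary default.

```shell
OOZIE_URL="${OOZIE_URL:-http://oozie.servername01:11000/oozie}"

# Read "oozie job -info" output on stdin and print the job-level status
# from the first line that starts with "Status".
job_status() {
  awk -F: '/^Status/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Poll a job until it leaves PREP/RUNNING, then print its final status.
# $1 = job ID, $2 = seconds between checks (default 10).
wait_for_job() {
  local id="$1" interval="${2:-10}" status
  while :; do
    status="$(oozie job -oozie "$OOZIE_URL" -info "$id" | job_status)"
    case "$status" in
      PREP|RUNNING) sleep "$interval" ;;
      "") echo "could not read status for $id" >&2; return 1 ;;
      *) printf '%s\n' "$status"; return 0 ;;
    esac
  done
}
```

For example: wait_for_job 0000001-130712212133144-oozie-oozi-W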

Troubleshooting

Oozie Web Console is Disabled: Install the Ext JS Library to Enable It

Download the Ext JS 2.2 library (ext-2.2) used by the Oozie web console.

Copy the extracted ext-2.2 folder to /var/lib/oozie/.

Set permissions on the folder: sudo chmod -R 755 /var/lib/oozie/ext-2.2/

Then you can browse to http://oozie.servername01:11000/oozie/
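The install steps above can be scripted once the library has been downloaded. This sketch assumes the download is a zip archive (called ext-2.2.zip here) that unpacks into an ext-2.2 folder, and that unzip is installed; install_extjs is a hypothetical name. DRY_RUN=1 prints the commands instead of running them.

```shell
# Hypothetical dry-run wrapper: with DRY_RUN=1 the command is printed
# instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: unpack the downloaded archive into Oozie's lib
# directory and make the resulting ext-2.2 folder world-readable.
# $1 = path to the downloaded zip, $2 = destination (default /var/lib/oozie).
install_extjs() {
  local zip="$1" dest="${2:-/var/lib/oozie}"
  if [ -z "$zip" ]; then
    echo "usage: install_extjs ext-2.2.zip [dest]" >&2
    return 1
  fi
  run sudo unzip -q "$zip" -d "$dest" || return 1
  run sudo chmod -R 755 "$dest/ext-2.2"
}
```

For example: install_extjs ~/ext-2.2.zip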

Oozie: Cannot Create Oozie Workflow: Not able to cache shareLib

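Stop the Oozie Server, then choose Install Oozie ShareLib from the Oozie service's Actions menu in Cloudera Manager. Start the Oozie Server again when the command finishes.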

Run the following command to test the sharelib:

oozie admin -oozie http://oozie.servername01:11000/oozie -shareliblist pig

You should see output similar to the following:

[Available ShareLib]
pig
hdfs://oozie.servername01:8020/user/oozie/share/lib/lib_20140627103340/pig/ant-1.6.5.jar
hdfs://oozie.servername01:8020/user/oozie/share/lib/lib_20140627103340/pig/antlr-2.7.7.jar
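To check a library quickly, you can count the jar entries that -shareliblist reports. The sketch assumes each jar appears on its own line as an hdfs:// URL ending in .jar, as in the output above; count_sharelib_jars and check_sharelib are hypothetical names. A count of zero suggests the sharelib has not been installed or cached.

```shell
OOZIE_URL="${OOZIE_URL:-http://oozie.servername01:11000/oozie}"

# Read "oozie admin -shareliblist <lib>" output on stdin and print the
# number of lines containing an hdfs:// URL that ends in .jar.
count_sharelib_jars() {
  grep -c 'hdfs://[^[:space:]]*\.jar'
}

# Report how many jars the server lists for one library (default pig);
# return non-zero when none are listed.
check_sharelib() {
  local lib="${1:-pig}" n
  n="$(oozie admin -oozie "$OOZIE_URL" -shareliblist "$lib" | count_sharelib_jars)"
  if [ "${n:-0}" -gt 0 ]; then
    echo "$lib: $n jars"
  else
    echo "$lib: no jars found" >&2
    return 1
  fi
}
```

For example: check_sharelib pig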

Resolution: This turned out to be a problem with Hue; see Hue’s problems for details. I have kept it in this troubleshooting section because the information is still useful for Oozie.

Log: vi /var/log/oozie/oozie-cmf-oozie5-OOZIE_SERVER-oozie.servername01.log.out

2014-06-27 10:04:29,064 ERROR org.apache.oozie.service.ShareLibService: SERVER[oozie.servername01] USER[-] GROUP[-] Not able to cache shareLib. Admin need to issue oozlie cli command to update sharelib.
java.lang.NullPointerException
        at org.apache.oozie.service.ShareLibService.init(ShareLibService.java:108)
        at org.apache.oozie.service.Services.setServiceInternal(Services.java:368)
        at org.apache.oozie.service.Services.setService(Services.java:354)
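On a busy server the log is long, so searching for the ShareLibService error can be quicker than paging through it in vi. This sketch assumes the Cloudera Manager log naming shown above (*.log.out under /var/log/oozie); sharelib_errors is a hypothetical name.

```shell
# Print ShareLibService ERROR lines with line numbers. With no arguments,
# search the Oozie server logs under /var/log/oozie (assumed *.log.out).
# With several files, grep prefixes each match with its file name.
sharelib_errors() {
  if [ "$#" -eq 0 ]; then
    set -- /var/log/oozie/*.log.out
  fi
  grep -n 'ERROR org.apache.oozie.service.ShareLibService' "$@"
}
```

For example: sharelib_errors /var/log/oozie/oozie-cmf-oozie5-OOZIE_SERVER-oozie.servername01.log.out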