Oozie is a workflow coordination service that manages data processing jobs on your cluster. It is essentially a task scheduler that supports advanced, multi-step workflows.
Configure Oozie
Install Oozie
Cloudera Manager distributes Oozie in CDH and offers the following services:
- Oozie Server – Install the Oozie Server on a host that is not used by an HBase Master or an HDFS NameNode, because Oozie can use a large amount of memory. Only one Oozie Server is required.
Oozie Configuration
Configure Oozie’s Whitelist
You will receive the following error if the whitelist has not been configured:
~/oozie/examples/apps/no-op$ oozie job -oozie http://oozie.servername01:11000/oozie -config job.properties -dryrun
Error: E0901 : E0901: Namenode [servername01:8020] not allowed, not in Oozies whitelist
In Cloudera Manager, browse to the Oozie service and add the whitelist properties to this safety valve:
Oozie Server Default Group > Advanced > Oozie Server Configuration Safety Valve for oozie-site.xml
<property>
  <name>oozie.service.HadoopAccessorService.jobTracker.whitelist</name>
  <value>servername01:8021,servername02:8021</value>
  <description>Whitelisted job tracker for Oozie service.</description>
</property>
<property>
  <name>oozie.service.HadoopAccessorService.nameNode.whitelist</name>
  <value>servername01:8020,servername02:8020</value>
  <description>Whitelisted NameNode for Oozie service.</description>
</property>
Note: Leave the <value> blank to accept all servers (or for YARN).
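To decide which entries the whitelist needs, it can help to list the hosts a job will actually ask Oozie to contact. The sketch below assumes the job.properties file uses the nameNode= and jobTracker= keys, as the bundled examples do. The function name whitelist_entries is hypothetical and not part of Oozie.

```shell
# Print "key host:port" for the nameNode and jobTracker lines of a job.properties file.
# Assumption: values look like hdfs://host:port or host:port.
whitelist_entries() {
    props_file="$1"
    # Drop the key and any scheme prefix (for example hdfs://), keep host:port.
    sed -n -E 's#^(nameNode|jobTracker)=([a-zA-Z]+://)?([^/]+).*$#\1 \3#p' "$props_file"
}

# Example (not run here):
#   whitelist_entries ~/oozie/examples/apps/no-op/job.properties
```

Each host:port printed should appear in the matching whitelist value, unless the value is left blank.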
Logs are available at: /var/log/oozie on the Oozie server.
Test Oozie
Load the examples and shared libraries.
- Create an oozie subdirectory in your home folder >>mkdir ~/oozie
- Copy /usr/lib/oozie/oozie-sharelib.tar.gz to your oozie directory >>cp /usr/lib/oozie/oozie-sharelib.tar.gz ~/oozie
- Change to your oozie directory and extract the archive >>cd ~/oozie; tar xvfz oozie-sharelib.tar.gz
- From your oozie folder, make a subdirectory called examples >>mkdir examples
- Copy /usr/share/doc/oozie/oozie-examples.tar.gz to your oozie folder. >>cp /usr/share/doc/oozie/oozie-examples.tar.gz ~/oozie
- Extract the archive >>tar xvfz oozie-examples.tar.gz
You should now have an oozie folder under your home folder that contains a share folder and an examples folder. Next, copy this data to HDFS.
- Make an oozie directory in HDFS. >>sudo -u hdfs hadoop fs -mkdir /user/oozie
- Make a home directory for the hdfs user in HDFS. >>sudo -u hdfs hadoop fs -mkdir /user/hdfs
- Copy the Oozie shared library files. >>sudo -u hdfs hadoop fs -put share /user/oozie
- Copy the example files. >>sudo -u hdfs hadoop fs -put examples /user/hdfs
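The extract-and-upload steps above can be collected into one script. This is a minimal sketch, not an official installer. The names run, stage_oozie_examples, DRY_RUN, SHARELIB_TGZ and EXAMPLES_TGZ are hypothetical. The default archive locations are assumptions, so override them if your packages install the tarballs elsewhere.

```shell
# Run a command, or only print it when DRY_RUN=1 (handy for reviewing the steps first).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Extract both archives into a work directory, then upload them to HDFS.
stage_oozie_examples() {
    work_dir="${1:-$HOME/oozie}"
    # Assumed default archive locations; override via the environment if they differ.
    sharelib_tgz="${SHARELIB_TGZ:-/usr/lib/oozie/oozie-sharelib.tar.gz}"
    examples_tgz="${EXAMPLES_TGZ:-/usr/share/doc/oozie/oozie-examples.tar.gz}"

    run mkdir -p "$work_dir"
    run tar xzf "$sharelib_tgz" -C "$work_dir"   # yields share/ per the steps above
    run tar xzf "$examples_tgz" -C "$work_dir"   # yields examples/
    run sudo -u hdfs hadoop fs -mkdir /user/oozie
    run sudo -u hdfs hadoop fs -mkdir /user/hdfs
    run sudo -u hdfs hadoop fs -put "$work_dir/share" /user/oozie
    run sudo -u hdfs hadoop fs -put "$work_dir/examples" /user/hdfs
}

# Example (not run here): DRY_RUN=1 stage_oozie_examples
```

Running it with DRY_RUN=1 first prints the seven commands without touching the local disk or HDFS.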
Run the examples:
sudo -u hdfs oozie job -oozie http://localhost:11000/oozie -config examples/apps/no-op/job.properties -run
sudo -u hdfs oozie job -oozie http://localhost:11000/oozie -config examples/apps/java-main/job.properties -run
sudo -u hdfs oozie job -oozie http://localhost:11000/oozie -config examples/apps/pig/job.properties -run
For each of the runs above, you can check its status with the following command (replace the ID with the one each run returns):
sudo -u hdfs oozie job -oozie http://localhost:11000/oozie -info 0000002-130228105804751-oozie-oozi-W
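Instead of re-running the -info command by hand, the status check can be scripted. The sketch below assumes the -info output contains a line of the form "Status : RUNNING", and that PREP and RUNNING are the only states worth waiting on. The names job_status, wait_for_job and POLL_SECONDS are hypothetical.

```shell
# Read `oozie job -info` output on stdin and print the value of its Status line.
job_status() {
    awk -F' *: *' '$1 == "Status" { print $2; exit }'
}

# Poll a job until it leaves PREP/RUNNING, then print the final status.
wait_for_job() {
    oozie_url="$1"
    job_id="$2"
    while :; do
        status=$(oozie job -oozie "$oozie_url" -info "$job_id" | job_status)
        case "$status" in
            PREP|RUNNING) sleep "${POLL_SECONDS:-10}" ;;
            *) echo "$status"; return 0 ;;
        esac
    done
}

# Example (not run here):
#   wait_for_job http://localhost:11000/oozie 0000002-130228105804751-oozie-oozi-W
```

An empty result means no Status line was found, which usually indicates that the -info call itself failed.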
Oozie Commands
Get all running jobs:
oozie jobs -oozie http://oozie.servername01:11000/oozie | grep RUNNING
Or if the job is a coordinator, not a workflow:
oozie jobs -jobtype coordinator -oozie http://oozie.servername01:11000/oozie
Get the job IDs of the requested jobs, then kill each job:
oozie job -oozie http://oozie.servername01:11000/oozie -kill jobid
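To collect the job IDs without copying them by hand, the listing can be filtered. This is a sketch that assumes the job ID is the first column of the `oozie jobs` listing and that the status appears as its own whitespace-separated field. The function name running_job_ids is hypothetical.

```shell
# Read an `oozie jobs` listing on stdin and print the IDs of rows whose status is RUNNING.
running_job_ids() {
    awk '{ for (i = 2; i <= NF; i++) if ($i == "RUNNING") { print $1; break } }'
}

# Example (not run here): review the list before killing anything.
#   oozie jobs -oozie http://oozie.servername01:11000/oozie | running_job_ids
```

Matching the whole field rather than using grep avoids picking up rows where RUNNING is only part of an application name.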
Submit job (readies a job to be run):
oozie job -oozie http://oozie.servername01:11000/oozie -config oozieProject/workflowHdfsAndEmailActions/job.properties -submit
job: 0000001-130712212133144-oozie-oozi-W
Run job:
oozie job -oozie http://oozie.servername01:11000/oozie -start 0000001-130712212133144-oozie-oozi-W
Check the status:
oozie job -oozie http://oozie.servername01:11000/oozie -info 0000001-130712212133144-oozie-oozi-W
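The submit, start and check steps can be chained by capturing the ID that -submit prints. The sketch below assumes the submit output contains a line of the form "job: <id>", as in the sample output above. The names extract_job_id and submit_and_start are hypothetical.

```shell
# Read `oozie job ... -submit` output on stdin and print the job ID from the "job: <id>" line.
extract_job_id() {
    sed -n 's/^job: *//p' | head -n 1
}

# Submit a job, start it, and print its ID so it can be passed to -info later.
submit_and_start() {
    oozie_url="$1"
    props_file="$2"
    job_id=$(oozie job -oozie "$oozie_url" -config "$props_file" -submit | extract_job_id)
    if [ -z "$job_id" ]; then
        echo "submit did not return a job ID" >&2
        return 1
    fi
    oozie job -oozie "$oozie_url" -start "$job_id" || return 1
    echo "$job_id"
}

# Example (not run here):
#   id=$(submit_and_start http://oozie.servername01:11000/oozie job.properties)
#   oozie job -oozie http://oozie.servername01:11000/oozie -info "$id"
```

Checking for an empty ID before calling -start keeps a failed submit from turning into a confusing second error.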
Troubleshooting
Oozie Web Console is Disabled: To Enable Oozie Web Console Install the Ext JS Library
Download the ext-2.2 Oozie Ext JS library.
Copy the extracted ext-2.2 folder to /var/lib/oozie/
Set permissions on the folder: sudo chmod -R 755 /var/lib/oozie/ext-2.2/
Then you can browse to http://oozie.servername01:11000/oozie/
Oozie: Cannot Create Oozie Workflow: Not able to cache shareLib
Stop the Oozie Server, then choose Install Oozie ShareLib from the Actions menu.
Run the following command to test the ShareLib:
oozie admin -oozie http://oozie.servername01:11000/oozie -shareliblist pig
You should see the following:
[Available ShareLib]
pig
hdfs://oozie.servername01:8020/user/oozie/share/lib/lib_20140627103340/pig/ant-1.6.5.jar
hdfs://oozie.servername01:8020/user/oozie/share/lib/lib_20140627103340/pig/antlr-2.7.7.jar
…
Resolution: This turned out to be a problem with Hue; see Hue’s problems for more details. I left this in the troubleshooting section because it is useful information about Oozie.
Log: vi /var/log/oozie/oozie-cmf-oozie5-OOZIE_SERVER-oozie.servername01.log.out
2014-06-27 10:04:29,064 ERROR org.apache.oozie.service.ShareLibService: SERVER[oozie.servername01] USER[-] GROUP[-] Not able to cache shareLib. Admin need to issue oozlie cli command to update sharelib.
java.lang.NullPointerException
at org.apache.oozie.service.ShareLibService.init(ShareLibService.java:108)
at org.apache.oozie.service.Services.setServiceInternal(Services.java:368)
at org.apache.oozie.service.Services.setService(Services.java:354)
…