Hue is a graphical user interface for Hadoop. Hue applications are collected into a desktop-style environment and delivered as a Web application, requiring no additional installation for individual users.
Configure Hue
Install Hue
Cloudera Manager distributes Hue in CDH and offers the following services:
- Hue Server – For small clusters of fewer than 10 nodes, you can place the Hue service on the same node as the active HDFS NameNode. For larger clusters or for production use, expect Hue to require more memory, and configure Hue to use MySQL instead of the default PostgreSQL database. To use Hue with HBase, make sure that the HBase Thrift service is installed (see HBase for more information about the HBase Thrift service).
Install and Configure Hue
- Browse to Cloudera Manager and click the down arrow next to the “host”
- Select “Add Service”
- Select Hue
- On the “Add Service Wizard” page, click the box under Hue Service
- Select the node running the active NameNode (NN)
- Click Continue
- The service then installs and restarts
- Deploy the client configuration (this will likely require a restart)
- To configure the service, click Hue on the Cloudera Manager site
- Click Configuration
- Search for each configuration below using the search box located under “Filters”
Configure an LDAP Backend
On the main Cloudera Manager site, click on Hue, and select Configurations. Click on the Security category.
For more information, refer to the Hue Installation Guide: http://cloudera.github.io/hue/docs-2.0.1/manual.html
Test Hue
- Connect to http://hue.servername01:9090
- Log in with your Hue credentials
If you fail to connect to the Hue UI, there may be a problem with the HBase Thrift server. After you install Hue, make sure that your HBase installation includes the HBase Thrift Server; otherwise the Hue HBase browser reports this error: HBase browser couldn’t connect to localhost:9090
Here is the reason why: Hue 2.5.0 introduced a feature called “HBase Browser” that lets users quickly browse huge tables and access HBase content. You can also create new tables, add data, modify existing cells, and filter data with the auto-completing search bar. If you click the “HBase Browser” icon and get “API error: couldn’t connect to localhost:9090”, you probably do not have an HBase Thrift server running.
And how to fix this: In Cloudera Manager, go to “All Services” -> “hbase1” -> “Instances”, then under “Role Instances”, click “Add”, choose a node to be the “HBase Thrift Server”, and start the Thrift server. By default, Hue looks for the Thrift server on localhost, port 9090, so make sure Hue is configured with the node that actually runs the Thrift server.
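Before changing any configuration, it can help to confirm whether anything is listening on the Thrift port at all. The following is a minimal sketch in Python, not part of Hue or Cloudera Manager; the function name thrift_port_open is hypothetical, and the default port 9090 is taken from the error message above. It only tests that a TCP connection can be opened, not that the listener is really an HBase Thrift server.

```python
import socket


def thrift_port_open(host, port=9090, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This only proves that something is listening on the port; it does not
    verify that the listener speaks the HBase Thrift protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False
```

For example, running thrift_port_open("localhost") on the Hue node tests the default location that Hue tries; pass the hostname of the node you chose as the Thrift server to test that node instead.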
Inspecting the Hue Database
Hue requires an SQL database to store small amounts of data, including user account information and the history of job submissions and Hive queries. By default, Hue is configured to use either PostgreSQL or an embedded SQLite database for this purpose, and should require no configuration or management by the administrator. However, MySQL is the recommended database; this section contains instructions for configuring Hue to access MySQL and other databases.
The default SQLite database used by Hue is located in /usr/share/hue/desktop/desktop.db or, as in the example below, in /var/lib/hue/desktop.db, depending on how Hue was installed. You can inspect this database from the command line using the sqlite3 program.
Pig Scripts are located in the following tables:
pig_document
pig_pigscript
For example:
# sqlite3 /var/lib/hue/desktop.db
SQLite version 3.6.22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .tables
sqlite> .schema auth_user
sqlite> select username from auth_user;
admin
test
sample
sqlite> .quit
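The same inspection can be scripted. The sketch below uses Python's standard sqlite3 module and opens the file read-only so that it cannot modify the Hue database or create an empty file at a mistyped path. The function names are hypothetical; the auth_user table and its username column come from the session above.

```python
import sqlite3


def _connect_read_only(db_path):
    # mode=ro opens an existing file read-only and fails if it is missing.
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def list_tables(db_path):
    """Return the table names in the database, sorted by name."""
    conn = _connect_read_only(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def list_usernames(db_path):
    """Return the usernames stored in auth_user, sorted alphabetically."""
    conn = _connect_read_only(db_path)
    try:
        rows = conn.execute(
            "SELECT username FROM auth_user ORDER BY username"
        ).fetchall()
    finally:
        conn.close()
    return [username for (username,) in rows]
```

Calling list_tables("/var/lib/hue/desktop.db") should include pig_document and pig_pigscript if the Pig tables exist.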
Troubleshooting
Hue Cannot See Pig Scripts After Upgrade
Missing Pig scripts: After upgrading from CDH 4.7 to CDH 5.1.0, the Hue landing page displays this error:
Server Error (500)
Sorry, there’s been an error. An email was sent to your administrators. Thank you for your patience.
More Info:
File Name Line Number Function Name
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.6/site-packages/Django-1.4.5-py2.6.egg/django/core/handlers/base.py 111 get_response
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/views.py 56 home
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/desktop/core/src/desktop/api.py 37 _get_docs
…
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.6/site-packages/Django-1.4.5-py2.6.egg/django/db/models/sql/compiler.py 763 results_iter
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.6/site-packages/Django-1.4.5-py2.6.egg/django/db/models/sql/compiler.py 818 execute_sql
/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.6/site-packages/Django-1.4.5-py2.6.egg/django/db/backends/sqlite3/base.py 344 execute
Cause: Upgrade failed to create all the required tables for Hue.
Resolution:
1. Go to the Hue directory:
cd /var/lib/hue
2. Back up the database:
cp desktop.db desktop.db.back
3. Sync the database by running syncdb:
/opt/cloudera/parcels/CDH/lib/hue/build/env/bin/hue syncdb --noinput
4. Run the migrations, deleting any ghost migration records:
/opt/cloudera/parcels/CDH/lib/hue/build/env/bin/hue migrate --delete-ghost-migrations
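If you repeat this procedure, a fixed backup name such as desktop.db.back gets overwritten each time. The sketch below is one way to keep timestamped copies instead; the function name backup_database and the naming scheme are assumptions, not part of Hue. It performs a plain file copy, so it is safest to run while the Hue service is stopped.

```python
import shutil
import time
from pathlib import Path


def backup_database(db_path):
    """Copy db_path to a timestamped sibling file and return the new path.

    Example result: /var/lib/hue/desktop.db.20141014-131000.back
    """
    src = Path(db_path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.{stamp}.back")
    # copy2 preserves modification time and permission bits where possible.
    shutil.copy2(src, dest)
    return dest
```

For example, backup_database("/var/lib/hue/desktop.db") would be run before the syncdb and migrate steps.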
Hue is Running Slow
On some installations the Hue service shares its node with the Cloudera Manager Service, which can use quite a bit of memory. If Hue is running slowly, the node may be too busy. Restart the Cloudera Manager Service and watch memory usage. Consider reinstalling Hue on another node.
- Check whether memory is a problem on the node: browse to Cloudera Manager and select the node.
- How much memory is used? Is it in the red zone, or yellow? For example, 80% used is generally fine for Hue; much higher than that and you will notice slowness.
- If too much memory is in use, restart the Cloudera Manager Service.
- In Cloudera Manager, click Clusters, and select Cloudera Manager Service.
- Within the Cloudera Manager Service, click Actions, Restart.
- Make sure the service comes back up. You should notice that the memory used has gone down quite a bit and Hue is a little more responsive.
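If you prefer to check memory from a shell on the node rather than in Cloudera Manager, the sketch below computes the percentage of memory in use from /proc/meminfo-style text. This is an assumption-laden illustration: it applies only to Linux, the function names are hypothetical, and the number it reports may not match exactly what Cloudera Manager displays. The 80% guideline is the one mentioned above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_percent(meminfo):
    """Return the percentage of memory in use.

    Uses MemAvailable when the kernel reports it; otherwise falls back to
    MemFree + Buffers + Cached as a rough estimate for older kernels.
    """
    total = meminfo["MemTotal"]
    if "MemAvailable" in meminfo:
        available = meminfo["MemAvailable"]
    else:
        available = (
            meminfo["MemFree"]
            + meminfo.get("Buffers", 0)
            + meminfo.get("Cached", 0)
        )
    return 100.0 * (total - available) / total
```

On the Hue node, read the file with open("/proc/meminfo").read(), pass the text to parse_meminfo, and compare the result of memory_used_percent against the 80% guideline.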
Hue is not responding – DatabaseError: database is locked
Problem: Hue does not open; the website spins but never presents a page.
Resolution: I had to restart the service twice. The second time, I took Hue completely down for about a minute to make sure the database had stopped completely. I then started the service and Hue was able to connect.
In the log, I see the following:
DatabaseError: database is locked
[14/Oct/2014 13:10:00 -0700] base ERROR Internal Server Error: /pig/dashboard/
Traceback (most recent call last):
File “/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/core/handlers/base.py”, line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File “/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/apps/oozie/src/oozie/views/dashboard.py”, line 88, in decorate
return view_func(request, *args, **kwargs)
File “/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/apps/pig/src/pig/views.py”, line 58, in dashboard
hue_jobs = Document.objects.available(PigScript, request.user, with_history=True)
File “/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/db/models/query.py”, line 445, in get_or_create
return self.get(**lookup), False
File “/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/db/models/sql/compiler.py”, line 818, in execute_sql
cursor.execute(sql, params)
File “/opt/cloudera/parcels/CDH-5.1.0-1.cdh5.1.0.p0.53/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/db/backends/sqlite3/base.py”, line 344, in execute
return Database.Cursor.execute(self, query, params)
DatabaseError: database is locked
[14/Oct/2014 15:00:51 -0700] api ERROR An error happen while watching the demo running: ‘NoneType’ object has no attribute ‘group’
[14/Oct/2014 15:00:51 -0700] api ERROR An error happen while watching the demo running: ‘NoneType’ object has no attribute ‘group’
[14/Oct/2014 15:00:52 -0700] api ERROR An error happen while watching the demo running: ‘NoneType’ object has no attribute ‘group’
Hue: Cannot access Spark from Hue
An error happened with the Spark Server:
HTTPConnectionPool(host=’localhost’, port=8090): Max retries exceeded with url: /jobs (Caused by <class ‘socket.error’>: [Errno 111] Connection refused)
Resolution: In Cloudera Manager, open the Hue configuration, select Advanced, and locate Hue Server Advanced Configuration Snippet (Safety Valve) for hue_safety_valve_server.ini.
Add the following section:
[spark]
# URL of the REST Spark Job Server.
server_url=http://spark.rest.servername01:18080/
See the Configure Spark section for more information.
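A typo in the safety valve snippet is easy to miss, so it can be worth checking the text before pasting it into Cloudera Manager. The sketch below parses the [spark] section shown above with Python's configparser and extracts the host and port. The function name is hypothetical, and configparser handles only flat sections like this one; it is not how Hue itself reads its configuration, and it does not understand the nested [[...]] sections used elsewhere in hue.ini.

```python
import configparser
from urllib.parse import urlparse

# The snippet from this section, as it would be pasted into the safety valve.
SNIPPET = """
[spark]
# URL of the REST Spark Job Server.
server_url=http://spark.rest.servername01:18080/
"""


def spark_server_endpoint(ini_text):
    """Return (host, port) from [spark] server_url, or raise ValueError."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_option("spark", "server_url"):
        raise ValueError("missing [spark] server_url")
    url = urlparse(parser.get("spark", "server_url"))
    if url.scheme not in ("http", "https") or not url.hostname or url.port is None:
        raise ValueError("server_url must look like http(s)://host:port/")
    return url.hostname, url.port
```

The returned host and port are the ones Hue will try to reach, so they should match the node and port where the Spark Job Server actually listens rather than the localhost address shown in the error.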
Hue: Cannot run Pig Scripts from Hue to YARN (using MRv2)
Resolution: YARN’s resources were set too low (memory was set to 50 MB, when it should have been set to 1 GB).
I tried to narrow down the problem I was having with running Pig scripts through Hue and YARN. Here is what I did:
1. Create a Pig Script in Hue:
offers = LOAD '/tmp/datafile.txt' USING PigStorage AS (name:CHARARRAY);
The script succeeds.
2. However, when I add a dump to the script, like this, the job never finishes:
offers = LOAD '/tmp/datafile.txt' USING PigStorage AS (name:CHARARRAY);
dump offers;
To see the log, click the status of the Pig job in the top right corner to open its Oozie workflow, then click the log icon to the right of the Pig action. The logs there are more informative.
For example, in the log I see this line: 2014-08-14 16:24:35,692 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher – More information at: http://node.servername05:50030/jobdetails.jsp?jobid=job_1408018429315_0002
The script never moves past 0% and repeats Heart beat over and over again. The job displays in Oozie but never goes anywhere (the job is stuck on RUNNING). This same script worked in CDH 4.7 using MRv1. I can’t find much in the logs to help identify a problem; the job just never finishes.
Here is an excerpt from the job’s log:
2014-08-19 14:31:01,128 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher – More information at: http://node.servername05:50030/jobdetails.jsp?jobid=job_1408403413938_0014
2014-08-19 14:31:01,227 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher – 0% complete
Heart beat
…
The following did not work:
- Reinstalling the Oozie sharelib: stop Oozie; under Actions, select Install Sharelib; make sure that the sharelib is the one for YARN (oozie-sharelib-yarn.tar.gz).
- Stopping the Hue server, running Synchronize Database, and restarting Hue.
- Applying the change from step #5 in http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/. The symptoms described there look very similar to this problem, but the change did not help.
For information on how to configure YARN, see Configure Yarn, specifically Configure Yarn Resources.
Hue: Cannot Open Workflow Editor
Problem: Hue cannot edit an Oozie workflow.
Open Hue, click Workflow, and select Editor.
Receive the error: Server Error (500)
Resolution: After some debugging (see below), the fix was to run the following from a bash shell on the node that is running Hue:
/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/build/env/bin/hue migrate --delete-ghost-migrations
Results from the migrate command:
Running migrations for desktop:
– Migrating forwards to 0007_auto__add_documentpermission__add_documenttag__add_document.
> desktop:0007_auto__add_documentpermission__add_documenttag__add_document
– Loading initial data for desktop.
Installed 0 object(s) from 0 fixture(s)
…
Running migrations for oozie:
– Migrating forwards to 0025_change_examples_path_format.
> oozie:0022_auto__chg_field_mapreduce_node_ptr__chg_field_start_node_ptr
> oozie:0023_auto__add_field_node_data__add_field_job_data
> oozie:0024_auto__chg_field_subworkflow_sub_workflow
> oozie:0025_change_examples_path_format
– Migration ‘oozie:0025_change_examples_path_format’ is marked for no-dry-run.
– Loading initial data for oozie.
Installed 0 object(s) from 0 fixture(s)
…
south.exceptions.GhostMigrations:
! These migrations are in the database but not on disk:
<oozie: 0022_change_examples_path_format>
! I'm not trusting myself; either fix this yourself by fiddling
! with the south_migrationhistory table, or pass --delete-ghost-migrations
! to South to have it delete ALL of these records (this may not be good).
The error points to a problem in Oozie:
[30/Jun/2014 09:14:17 -0700] base ERROR Internal Server Error: /oozie/list_workflows/
Traceback (most recent call last):
File “/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/core/handlers/base.py”, line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File “/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/apps/oozie/src/oozie/views/editor.py”, line 64, in list_workflows
data = Document.objects.available(Workflow, request.user)
…
File “/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/db/models/sql/compiler.py”, line 818, in execute_sql
cursor.execute(sql, params)
File “/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/db/backends/sqlite3/base.py”, line 344, in execute
return Database.Cursor.execute(self, query, params)
DatabaseError: no such table: desktop_documenttag
[30/Jun/2014 09:14:17 -0700] middleware INFO Processing exception: no such table: desktop_documenttag: Traceback (most recent call last):
File “/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/core/handlers/base.py”, line 111, in get_response
response = callback(request, *callback_args, **callback_kwargs)
…
File “/opt/cloudera/parcels/CDH-5.0.2-1.cdh5.0.2.p0.13/lib/hue/build/env/lib/python2.7/site-packages/Django-1.4.5-py2.7.egg/django/db/backends/sqlite3/base.py”, line 344, in execute
return Database.Cursor.execute(self, query, params)
DatabaseError: no such table: desktop_documenttag
[30/Jun/2014 09:14:16 -0700] access INFO 192.168.200.157 admin – “GET /oozie/list_workflows/ HTTP/1.1”
Hue: Cannot Create a New Workflow
Problem: Users receive a 500 Server error when they open the Workflow Editor and attempt to create a new workflow.
Error: User: httpfs is not allowed to impersonate hue (error 500)
On Hue’s web UI we see the following: 500 Server error: Sorry, there’s been an error. An email was sent to your administrators. Thank you for your patience.
Within Hue’s log file we see:
sudo less /var/log/hue/runcpserver.log
[12/Sep/2017 14:42:58 -0700] connectionpool INFO Resetting dropped connection: servername01
[12/Sep/2017 14:42:58 -0700] middleware INFO Processing exception: RemoteException: User: httpfs is not allowed to impersonate hue (error 500): Traceback (most recent call last):
File “/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hue/build/env/lib/python2.7/site-packages/Django-1.6.10-py2.7.egg/django/core/handlers/base.py”, line 112, in get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File “/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hue/build/env/lib/python2.7/site-packages/Django-1.6.10-py2.7.egg/django/db/transaction.py”, line 371, in inner
return func(*args, **kwargs)
…
WebHdfsException: RemoteException: User: httpfs is not allowed to impersonate hue (error 500)
Narrow down the error within httpfs:
less /var/log/hadoop-httpfs/hadoop-cmf-hdfs-HTTPFS-httpfs.servername01.log.out
Resolution:
The impersonation error from HttpFS gave me the clue. We set proxy groups in HDFS to tighten permissions on this service; these impersonation restrictions were added to protect our HttpFS service from unauthorized reads and writes.
Find the hadoop.proxyuser.httpfs.groups configuration in HDFS and add hue.
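To confirm the setting after changing it, you can inspect the deployed Hadoop configuration. The sketch below reads Hadoop-style site XML text (the property is commonly found in core-site.xml, which is an assumption about your deployment) and reports whether a given group is allowed by hadoop.proxyuser.httpfs.groups. The function names are hypothetical; a value of * is treated as allowing every group.

```python
import xml.etree.ElementTree as ET

PROPERTY = "hadoop.proxyuser.httpfs.groups"


def proxy_groups(site_xml_text, name=PROPERTY):
    """Return the groups listed for the property, or [] if it is not set."""
    root = ET.fromstring(site_xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value", default="")
            # The value is a comma-separated list of group names.
            return [g.strip() for g in value.split(",") if g.strip()]
    return []


def allows_group(site_xml_text, group):
    """Return True if the group is listed, or if the wildcard * is set."""
    groups = proxy_groups(site_xml_text)
    return "*" in groups or group in groups
```

With the fix applied, allows_group(xml_text, "hue") should return True for the configuration file that the HDFS service actually uses.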