To support the collection of various time series data for dashboarding and alerting purposes, I recommend OpenTSDB, a time series database built on top of HBase (http://opentsdb.net/).
Installation
OpenTSDB Setup instructions
Setting up test environment on your Linux VM:
- Set up an HBase cluster on your dev machine (this can be done through Cloudera Manager/CDH). Test that it is running by visiting http://<yourdevmachine>:60010 in a web browser.
- Obtain the Debian package for opentsdb from their releases page:
- https://github.com/OpenTSDB/opentsdb/releases
- you can download the .deb file by running: wget https://github.com/OpenTSDB/opentsdb/releases/download/v2.2.0/opentsdb-2.2.0_all.deb
- Install the Debian package by running: sudo dpkg -i opentsdb-2.2.0_all.deb
- Install the other dependency, gnuplot: sudo apt-get install -y gnuplot
- Create the opentsdb tables in hbase by running:
- env COMPRESSION=SNAPPY HBASE_HOME=/opt/cloudera/parcels/CDH/lib/hbase /usr/share/opentsdb/tools/create_table.sh
- This command does not require root or sudo
- Add the following configuration to the opentsdb config file at /etc/opentsdb/opentsdb.conf: tsd.core.auto_create_metrics = true
- Start the TSDB daemon by running: sudo service opentsdb start
- Verify and test
- logs are stored in /var/log/opentsdb/opentsdb.log
- you can view the basic ui at http://<yourdevmachine>:4242
How to Use OpenTSDB
Writing data to OpenTSDB
Writing data to OpenTSDB can be done either through its REST API or via direct socket connections. The general message written to TSDB requires the following information:
- metric name
- Namespacing of metrics is important to be able to quickly query data in the way you expect viewers of the data would want to query it.
- Example name: webservers.sys.cpu.user.all
- Case sensitive
- No spaces in name allowed
- Allowed chars: a to z, A to Z, 0 to 9, -, _, ., /
- Decide up front how to name the time series, because the metric name is the default aggregation unit when a query specifies no tags
- Example: If you care about the total size of a kafka topic, you can still gather metrics for each partition but you would identify each partition via a tag and use the same metric name for all partitions
- “kafka.topic.mykafkatopic.size 1466017245 100 partition=0”
- “kafka.topic.mykafkatopic.size 1466017245 150 partition=1”
- “kafka.topic.mykafkatopic.size 1466017245 200 partition=2”
- A query for kafka.topic.mykafkatopic.size would automatically sum each of the partitions to produce a total size across all partitions, which is the most common query for this metric anyway
- You can still query for each partition individually by adding the tag “partition=0” to the query
- More details at: http://opentsdb.net/docs/build/html/user_guide/writing.html#naming-schema
- timestamp
- Value of seconds or milliseconds since epoch
- generally should stick to seconds of precision
- More details at: http://opentsdb.net/docs/build/html/user_guide/writing.html#timestamps
- value
- TSDB currently supports numeric values only
- Can be int or float
- More details at: http://opentsdb.net/docs/build/html/user_guide/writing.html#integer-values
- list of tags
- Tags are possible dimensions to slice a particular metric name
- Each metric must have at least 1 tag
- Recommended: use a host=<hostname> tag for the metric
- Tags give flexibility when querying metrics, but too many will hurt performance
- Recommended: keep the number of tags per metric below 8
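The four fields above can be combined into a single line for the socket interface. The following is a minimal sketch, not part of OpenTSDB itself: `format_put_line` is a hypothetical helper name, and applying the metric character rule to tag names and values as well is an assumption made here for simplicity.

```python
import re

# Characters listed above as allowed: a-z, A-Z, 0-9, '-', '_', '.', '/'
_ALLOWED = re.compile(r"^[A-Za-z0-9\-_./]+$")


def format_put_line(metric, timestamp, value, tags):
    """Build one 'put' line from metric name, timestamp, value and tags."""
    if not tags:
        raise ValueError("each data point needs at least one tag")
    # Assumption: tag keys and values follow the same character rule as metrics.
    names = [metric] + [str(k) for k in tags] + [str(v) for v in tags.values()]
    for name in names:
        if not _ALLOWED.match(name):
            raise ValueError("invalid characters in %r" % name)
    tag_text = " ".join("%s=%s" % (key, tags[key]) for key in sorted(tags))
    return "put %s %d %s %s" % (metric, int(timestamp), value, tag_text)
```

For instance, `format_put_line("kafka.topic.mykafkatopic.size", 1466017245, 200, {"partition": 2})` yields the third Kafka example line from the naming section, prefixed with `put`.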
Send data via REST API example
- See the tcollector main class or details at http://opentsdb.net/docs/build/html/api_http/put.html
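As a rough illustration of the REST route, the sketch below posts one data point as JSON to the `/api/put` endpoint using only the Python standard library. The function names and the base URL passed in are placeholders chosen for this example; check the linked documentation for the full set of accepted fields and options.

```python
import json
import urllib.request


def build_put_request(tsd_url, metric, timestamp, value, tags):
    """Prepare (but do not send) a POST to the /api/put endpoint."""
    body = json.dumps({
        "metric": metric,
        "timestamp": timestamp,
        "value": value,
        "tags": tags,
    }).encode("utf-8")
    return urllib.request.Request(
        tsd_url.rstrip("/") + "/api/put",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Against a local test instance this could be called as `send(build_put_request("http://localhost:4242", "kafka.topic.mykafkatopic.size", 1466017245, 200, {"partition": "2"}))`, where the host is an assumption about your setup.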
Send data via simple socket example
- General example: put <metric> <timestamp> <value> <tagk1=tagv1[ tagk2=tagv2 …tagkN=tagvN]>
- CLI example: echo "put kafka.topic.mykafkatopic.size 1466017245 200 partition=2" | nc <tsd host> 4242
- This pipes the echo output over a simple TCP socket to the TSD server
- Each put line carries a single data point
- Must include a newline at the end (which echo does by default)
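The same thing can be done from Python instead of `nc`. This is a minimal sketch under the assumption that the TSD listens on the default port; `send_put_line` is a hypothetical helper name, and it deliberately opens a new connection per call to stay short (a real sender would likely keep the connection open).

```python
import socket


def send_put_line(host, port, line, timeout=5.0):
    """Write a single 'put' line to the TSD over a plain TCP socket."""
    if not line.endswith("\n"):
        line += "\n"  # the trailing newline terminates the command
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line.encode("ascii"))
```

Usage would look like `send_put_line("localhost", 4242, "put kafka.topic.mykafkatopic.size 1466017245 200 partition=2")`, with the host name standing in for your own TSD host.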
Installing tcollector – Sending your dev vm’s system metrics to your local OpenTSDB instance to have some test data to work with
- Obtain the latest version at https://github.com/OpenTSDB/tcollector/releases
- untar to directory of choice
- (untar to /opt/tcollector-v1.3.1) sudo tar -xzf v1.3.1.tar.gz -C /opt/
- run sudo ./tcollector start to start the collector daemon
- logs to /var/log/tcollector.log
- by default, will start all collectors that match running services on your system
- collectors are installed under /opt/tcollector-v1.3.1/collectors
- can manually modify which collectors to run by enabling the execute bit on the collectors
- start|stop the tcollectors by running sudo ./tcollector <action>
- Some collectors require configuration parameters under /opt/tcollector-v1.3.1/collectors/etc/
- Additional documentation: http://opentsdb.net/docs/build/html/user_guide/utilities/tcollector.html
The included collectors with the tcollector package provide a lot of examples to send data, all written in Python. More details at: http://opentsdb.net/docs/build/html/user_guide/writing.html#input-methods
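For orientation, here is a minimal sketch of what a custom collector might look like. It rests on my understanding that a collector is simply a script that prints lines of the form `metric timestamp value tag=value` to stdout, which tcollector then forwards to the TSD; verify this against the bundled collectors before relying on it. The metric name `example.loadavg.1min` and all function names are made up for illustration.

```python
import sys
import time


def format_sample(metric, timestamp, value, tags=None):
    """One output line in the 'metric timestamp value tag=value' layout."""
    line = "%s %d %s" % (metric, timestamp, value)
    for key in sorted(tags or {}):
        line += " %s=%s" % (key, tags[key])
    return line


def emit_load_average(out=sys.stdout, path="/proc/loadavg", now=None):
    """Print the 1-minute load average (first field of /proc/loadavg on Linux)."""
    with open(path) as handle:
        load = float(handle.read().split()[0])
    timestamp = int(time.time() if now is None else now)
    out.write(format_sample("example.loadavg.1min", timestamp, load) + "\n")
    out.flush()  # flush so the parent process sees each line promptly


def run_forever(interval=15):
    """Emit one sample every `interval` seconds until the process is stopped."""
    while True:
        emit_load_average()
        time.sleep(interval)
```

A long-running collector script would end with a call to `run_forever()`; the 15-second interval is an arbitrary choice for this sketch.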
Querying Data from OpenTSDB
A basic frontend is available over HTTP on the same port used for writes (default 4242). Metric names can be searched, with autocomplete suggestions to help find them. For dashboards, a frontend like Grafana makes it a little easier to query TSDB, but knowing the basics of querying metric names, setting tags, and downsampling is still important.
OpenTSDB stores data at much higher precision than older metric systems, down to the second. That allows querying at very high resolution, but over large time windows it can return more data points than we need. Rather than generating rollups and discarding the raw data, OpenTSDB downsamples at query time. Downsampling means picking an aggregation function (sum, avg, count, max, min, etc.) and applying it over each downsampling window. For example, if I query 7 days' worth of data points for a metric that we collect every 15 seconds, we might downsample to 5-minute windows or coarser so that we're not overwhelmed by the number of data points. Downsampling is optional, so you can always drill down to the raw data.
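To make the arithmetic concrete, the sketch below averages points into fixed windows and applies it to the 7-day, 15-second example. It illustrates the idea only and is not OpenTSDB's implementation; `downsample_avg` is a hypothetical name, and the constant value 1.0 is dummy data.

```python
from collections import defaultdict


def downsample_avg(points, window_seconds):
    """Average (timestamp, value) pairs into fixed windows.

    Each point falls into the window starting at
    timestamp - (timestamp % window_seconds).
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        buckets[timestamp - timestamp % window_seconds].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# One point every 15 seconds for 7 days, downsampled to 5-minute windows.
raw = [(t, 1.0) for t in range(0, 7 * 24 * 3600, 15)]
reduced = downsample_avg(raw, 300)
print(len(raw), len(reduced))  # prints "40320 2016"
```

In an actual query the equivalent request is expressed through the downsample part of the metric specifier (for example `5m-avg`), so the reduction happens on the server.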
More info about the theory of querying OpenTSDB available at: http://opentsdb.net/docs/build/html/user_guide/query/aggregators.html
In general, we’ll be querying data either to display in dashboards or programmatically, to fetch individual data points for alerting. Both cases use the OpenTSDB HTTP API to retrieve data.
Example query to retrieve the last value of a metric for threshold checking
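A minimal sketch of such a check, assuming the `/api/query` endpoint with a short relative window (`5m-ago`) and taking the newest point from the returned `dps` map. The function names, the 5-minute window, and the choice to look only at the first returned series are assumptions made for this example.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(tsd_url, metric, tags=None, start="5m-ago", aggregator="sum"):
    """Build a GET URL for /api/query covering a short, recent window."""
    m = "%s:%s" % (aggregator, metric)
    if tags:
        m += "{%s}" % ",".join("%s=%s" % (k, tags[k]) for k in sorted(tags))
    query = urllib.parse.urlencode({"start": start, "m": m})
    return tsd_url.rstrip("/") + "/api/query?" + query


def latest_value(series_list):
    """Return (timestamp, value) of the newest point in the first series, or None."""
    if not series_list or not series_list[0].get("dps"):
        return None
    dps = series_list[0]["dps"]
    newest = max(dps, key=int)  # JSON keys are timestamp strings
    return int(newest), dps[newest]


def exceeds_threshold(tsd_url, metric, limit, tags=None):
    """True when the newest value is above limit; raises if no data came back."""
    url = build_query_url(tsd_url, metric, tags)
    with urllib.request.urlopen(url, timeout=5) as resp:
        latest = latest_value(json.loads(resp.read().decode("utf-8")))
    if latest is None:
        raise LookupError("no recent data for %s" % metric)
    return latest[1] > limit
```

For example, `exceeds_threshold("http://localhost:4242", "kafka.topic.mykafkatopic.size", 1000)` would compare the summed size across partitions against 1000, with the host and the limit being placeholders.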
More info at: http://opentsdb.net/docs/build/html/api_http/index.html
References
More info at http://opentsdb.net/docs/build/html/index.html