Cube

Time Series Visualization

Cube is an open-source system for visualizing time series data, built on MongoDB, Node and D3. If you send Cube timestamped events (with optional structured data), you can easily build realtime visualizations of aggregate metrics for internal dashboards. For example, you might use Cube to monitor traffic to your website, counting the number of requests in 5-minute intervals:

sum(request)

By applying median rather than sum, you can instead observe request duration:

median(request(duration_ms))

Cube supports filters for aggregating a subset of events: you might filter requests for a specific path, host, or user-agent, or plot the count of slow requests (>500ms) over time. You can also evaluate arithmetic expressions, even combining multiple event types, say to compare requests to signups for conversion tracking.

Cube speaks WebSockets for low-latency, asynchronous input and output: new events are streamed in, and requested metrics are streamed out as they are computed. (You can also POST events to Cube, if that’s your thing, and collectd integration is included!) Metrics are cached in capped collections, and simple reductions such as sum and max use pyramidal aggregation to improve performance. Visualizations are generated client-side and assembled into dashboards with a few mouse clicks.

Building a Dashboard in 60 Seconds

Collecting Data

Events are streamed into Cube in JSON format. Each event has two required fields: type and time. The type is a namespace for partitioning events; each type will be stored in a separate collection, and you can customize the associated indexes or metric cache size. The time is specified in ISO 8601. An optional data field contains whatever else you want to associate with the event.

{
  "type": "request",
  "time": "2011-09-12T21:33:12Z",
  "data": {
    "host": "web14",
    "path": "/search",
    "query": {"q": "flowers"},
    "duration_ms": 241,
    "status": 200,
    "user_agent": "Chrome/13.0.782.112"
  }
}

Processes that send events to Cube are called emitters. You can write them in any language, though the provided examples favor JavaScript. Emitters connect over WebSockets or HTTP POST to collectors, which then save the events to MongoDB and invalidate associated cached metrics. Events can also include an id attribute, allowing you to replace an earlier event with new data.
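As a rough illustration of an emitter, here is a minimal Python sketch that builds an event in the shape shown above and posts it over HTTP. The collector URL (host, port, and path) and the choice to send a JSON array are assumptions made for this example, not details confirmed by the text; check your Cube installation before relying on them. The helper names make_event and post_events are hypothetical.

```python
import json
from datetime import datetime, timezone
from urllib import request as urlrequest

# Assumed collector endpoint; adjust to match your Cube deployment.
COLLECTOR_URL = "http://localhost:1080/1.0/event/put"


def make_event(event_type, data=None, time=None, event_id=None):
    """Build an event dict with the required type and time fields."""
    time = time or datetime.now(timezone.utc)
    event = {
        "type": event_type,
        # ISO 8601 in UTC, e.g. "2011-09-12T21:33:12Z"
        "time": time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    if data is not None:
        event["data"] = data
    if event_id is not None:
        event["id"] = event_id  # lets a later event replace this one
    return event


def post_events(events, url=COLLECTOR_URL):
    """POST a JSON array of events to the (assumed) collector URL."""
    body = json.dumps(events).encode("utf-8")
    req = urlrequest.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlrequest.urlopen(req) as resp:
        return resp.status


event = make_event(
    "request",
    data={"host": "web14", "path": "/search", "duration_ms": 241, "status": 200},
    time=datetime(2011, 9, 12, 21, 33, 12, tzinfo=timezone.utc),
)
# post_events([event])  # uncomment once a collector is running
```

Building the event is kept separate from sending it so the payload can be inspected or logged without a running collector.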

Query Formulation

To pose questions about your data, you formulate queries. Cube has a small query language backed by PEG.js; internally, these queries are translated to MongoDB queries. Each query first identifies a set of events, such as those with the type "request". You can filter events with conditional operators such as eq, lt, gt, and ge.

Multiple filters are intersected by chaining them together. For example, to consider only requests whose duration was more than 250ms but less than 500ms, say request.gt(duration_ms, 250).lt(duration_ms, 500). Filters also work on arrays: if any of the elements in the array match, then the event is accepted. For example, you can use eq to filter events by label.
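The filter semantics described above (chained filters are intersected, and an array field matches if any element matches) can be modeled in a few lines of Python. This is only an in-memory sketch of the behavior, not how Cube evaluates filters, since Cube translates queries to MongoDB queries. The labels field and the helper names are hypothetical, and only operators named in the text are included.

```python
import operator

# Comparison operators named in the text; a real deployment may support more.
OPS = {"eq": operator.eq, "gt": operator.gt, "ge": operator.ge, "lt": operator.lt}


def matches(event_data, op, field, value):
    """True if the field satisfies op; for arrays, any element may match."""
    if field not in event_data:
        return False
    actual = event_data[field]
    candidates = actual if isinstance(actual, list) else [actual]
    return any(OPS[op](item, value) for item in candidates)


def apply_filters(events, filters):
    """Keep events whose data passes every (op, field, value) filter."""
    return [e for e in events if all(matches(e["data"], *f) for f in filters)]


events = [
    {"type": "request", "data": {"duration_ms": 241, "labels": ["search"]}},
    {"type": "request", "data": {"duration_ms": 310, "labels": ["search", "slow"]}},
    {"type": "request", "data": {"duration_ms": 720, "labels": ["slow"]}},
]

# Same intent as request.gt(duration_ms, 250).lt(duration_ms, 500)
mid_range = apply_filters(
    events, [("gt", "duration_ms", 250), ("lt", "duration_ms", 500)]
)

# Array match: an event is accepted if any label equals "slow"
slow = apply_filters(events, [("eq", "labels", "slow")])
```

Here mid_range keeps only the 310 ms request, while slow keeps the two requests whose labels include "slow".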

After identifying a set of events, you derive a value for each event. This value defaults to 1, which is handy for counting. You can specify a particular data field, such as request(duration_ms), or an arithmetic expression such as payment(cents / 100). These values are fed into a reduce function such as sum, max, or median.

You can also use arithmetic expressions for advanced metrics. For example, to find the fraction of page requests that triggered a server error, query sum(request.ge(status, 500)) / sum(request).
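To make the pipeline concrete (select events, derive a value per event, reduce, then combine with arithmetic), here is a small Python model of what the example queries compute. It is a sketch of the semantics over an in-memory list with made-up sample data, not Cube's evaluator; reduce_events is a hypothetical helper.

```python
from statistics import median


def reduce_events(events, reducer, value=lambda data: 1):
    """Derive one value per event (default 1) and reduce them."""
    return reducer([value(e["data"]) for e in events])


requests = [
    {"data": {"status": 200, "duration_ms": 120}},
    {"data": {"status": 200, "duration_ms": 241}},
    {"data": {"status": 500, "duration_ms": 900}},
    {"data": {"status": 503, "duration_ms": 650}},
]

# sum(request): each event contributes the default value 1
total = reduce_events(requests, sum)

# median(request(duration_ms))
typical = reduce_events(requests, median, lambda d: d["duration_ms"])

# sum(request.ge(status, 500)) / sum(request)
errors = [e for e in requests if e["data"]["status"] >= 500]
error_rate = reduce_events(errors, sum) / total
```

With this sample data, two of the four requests have a status of 500 or above, so error_rate is 0.5.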

Query Evaluation

Queries are sent to evaluators, which reply with computed metrics. If you use Cube’s dashboard constructor, you don’t have to talk to evaluators explicitly; that’s handled by the visualization components. But to give an example:

{
  "expression": "sum(request)",
  "start": "2011-09-10T12:37:12Z",
  "stop": "2011-09-13T04:00:02Z",
  "step": 300000
}

The step field specifies how frequently you want to compute the specified metric expression. A value of 300,000 milliseconds corresponds to five-minute intervals; thus, the earliest computed metric above is at 12:40 UTC. Cube supports a fixed set of intervals, including five minutes, one hour, and one day.

Restricting steps to specific time intervals allows Cube to employ pyramidal aggregation for simple reductions, improving performance. For example, if you ask for the number of events in a particular day, Cube can use previously computed hourly sums to compute the daily total without scanning all the day’s events. Likewise, if some hourly sums are missing but five-minute sums are available for those hours, the hours can be recomputed quickly, bubbling up to the daily total.
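The idea behind pyramidal aggregation can be sketched as follows: sums for a coarse interval are built from the sums of the finer interval beneath it, rather than from raw events. This is an illustrative Python model, not Cube's implementation; the roll_up helper and the sample counts are hypothetical, and bins are keyed by their start time in milliseconds.

```python
from collections import defaultdict

FIVE_MINUTES = 300_000  # milliseconds
HOUR = 3_600_000
DAY = 86_400_000


def roll_up(sums, coarse_step):
    """Combine finer-grained sums into coarser bins keyed by bin start (ms)."""
    coarse = defaultdict(int)
    for bin_start, value in sums.items():
        coarse[bin_start - bin_start % coarse_step] += value
    return dict(coarse)


# Hypothetical five-minute counts covering one day: 288 bins of 10 events each
five_minute_sums = {t: 10 for t in range(0, DAY, FIVE_MINUTES)}

hourly_sums = roll_up(five_minute_sums, HOUR)  # 24 bins
daily_sums = roll_up(hourly_sums, DAY)         # 1 bin
```

The daily total is computed from 24 hourly values instead of every event in the day. This shortcut works for reductions whose partial results can be combined, such as sum; a median cannot be derived from the medians of its sub-intervals in the same way.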

Metrics are cached greedily to capped collections as they are computed. You can configure the cache size per event type. (When the capped collection runs out of space, Cube overwrites old cached values.) Cube can return some results immediately if the query is partially cached; remaining results will stream in asynchronously, in arbitrary order, to visualization components:

{"time": "2011-09-10T12:40:00Z", "value": 42}
{"time": "2011-09-10T12:50:00Z", "value": 47}
{"time": "2011-09-10T12:45:00Z", "value": 48}
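Because results may arrive in arbitrary order, a client typically needs to slot each one into place by its time. The Python sketch below lists the expected bin times for a query and fills them in as messages arrive. It assumes that bins start at the first step boundary at or after start (consistent with the 12:40 example above) and that stop is exclusive; the second assumption is not stated in the text. The stop time is shortened here to keep the example small.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse(ts):
    """Parse a UTC ISO 8601 timestamp such as 2011-09-10T12:40:00Z."""
    return datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)


def expected_times(start, stop, step_ms):
    """List bin times from the first step boundary at or after start,
    up to stop (assumed exclusive)."""
    step = timedelta(milliseconds=step_ms)
    t = parse(start)
    offset = (t - EPOCH) % step
    if offset:
        t += step - offset  # round up to the next step boundary
    end = parse(stop)
    times = []
    while t < end:
        times.append(t.strftime(FMT))
        t += step
    return times


# One slot per expected bin, initially empty (None)
slots = dict.fromkeys(
    expected_times("2011-09-10T12:37:12Z", "2011-09-10T12:55:00Z", 300_000)
)

# Messages in the order they arrived
for message in [
    {"time": "2011-09-10T12:40:00Z", "value": 42},
    {"time": "2011-09-10T12:50:00Z", "value": 47},
    {"time": "2011-09-10T12:45:00Z", "value": 48},
]:
    slots[message["time"]] = message["value"]
```

After the loop, slots holds the values in time order (42, 48, 47) regardless of arrival order, and any bin still set to None has not been received yet.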

For specialized applications, you can write your own visualization components. You can also query Cube for raw events using the same query language (at a different endpoint, /event/get rather than /metric/get). For example, a map of recent payments taken with Square can be rendered in-browser by streaming raw events.

Building Visualizations

After installing Cube and writing an emitter or two, use Cube’s graphical constructor to put together a realtime dashboard with a bit of typing and a few clicks. See the 60-second video above as an example. Drag and drop charts onto the board, and then configure the backing query and display parameters. Cube dashboards are also powered by WebSockets: anyone viewing or editing the dashboard will see your edits in realtime.

Half-Baked, but Still Tasty

This is a work in progress! We’re working on more advanced queries, visualization components, coordinated views, documentation and countless other goodies. This is an experimental version 0 release: we expect to make non-backwards-compatible changes in the near future as we refine and add new features! Cube is used internally at Square, but not quite battle-hardened. Do, however, get in touch if you want to be involved in development!
