Posted on Mar 22, 18:38:21 CET 2016 - SHARE:
This blog post shows how to use the new "UQI" query interface of upscaledb 2.2.0. It's the follow-up to the series "Building an embedded column store database with upscaledb" (part one, part two and part three).
In the previous parts I demonstrated how to use upscaledb to create multiple indexes on one or several columns, and how to perform range queries with cursors. This part will show how to run queries with the UQI query interface.
upscaledb has a lot to offer regarding analytics: it has a full set of APIs dedicated to analytical calculations. These APIs are called "UQI" (Upscaledb Query Interface) and declared in the header file `ups/upscaledb_uqi.h`. These functions were rewritten for version 2.2.0 and now support an SQL-ish query language which can be extended through custom plugins.
This query interface can run full-table scans over a single database and apply aggregation functions and filter predicates. Functions and predicates can be extended through user-supplied plugins. Joins are not supported.
Internally, the UQI functions can operate directly on the B+-tree node data. B+-trees store keys and records very compact and efficiently. While you could also use cursors to implement this functionality, a cursor requires the key- and record data to be copied from the B+-tree to the application. The new UQI interface does the opposite: it performs the query in the B+-tree and does not copy any data. This makes the UQI queries much faster.
Here is a list of the most important APIs:
Performs a "select" query. The query syntax is described below. Results
are returned in result
. This is a shortcut for uqi_select_range
.
UPS_EXPORT ups_status_t UPS_CALLCONV uqi_select(ups_env_t *env, const char *query, uqi_result_t **result);
Performs a "select" query. If begin
is not NULL then the query
starts at the specified position, otherwise it starts at the beginning of the
database. Afterwards, the begin
cursor is advanced to the next
element after end
. If end
is not NULL then the query
stops at the specified position, otherwise it stops at the end of the database.
This API can be used for pagination.
UPS_EXPORT ups_status_t UPS_CALLCONV uqi_select_range(ups_env_t *env, const char *query, ups_cursor_t *begin, const ups_cursor_t *end, uqi_result_t **result);
Returns the number of rows from a result
object.
UPS_EXPORT uint32_t UPS_CALLCONV uqi_result_get_row_count(uqi_result_t *result);
Returns the key type from a result
object. Key types are
UPS_TYPE_UINT8
, UPS_TYPE_UINT32
,
UPS_TYPE_BINARY
etc.
UPS_EXPORT uint32_t UPS_CALLCONV uqi_result_get_key_type(uqi_result_t *result);
Returns the record type from a result
object. Record types are
UPS_TYPE_UINT8
, UPS_TYPE_UINT32
,
UPS_TYPE_BINARY
etc.
UPS_EXPORT uint32_t UPS_CALLCONV uqi_result_get_record_type(uqi_result_t *result);
Returns a key from the specified row of an result object. I.e. uqi_result_get_key(&result, 0, &key);
returns the first key from the query results.
UPS_EXPORT void UPS_CALLCONV uqi_result_get_key(uqi_result_t *result, uint32_t row, ups_key_t *key);
Returns a record from the specified row of an result object. I.e. uqi_result_get_record(&result, 0, &record);
returns the first record from the query results.
UPS_EXPORT void UPS_CALLCONV uqi_result_get_record(uqi_result_t *result, uint32_t row, ups_record_t *record);
Returns the serialized key data. If the key type has fixed length (i.e. UPS_TYPE_UINT32
) then the returned pointer can be casted directly into a uint32_t *
pointer. The size
returns the size of the integer array.
UPS_EXPORT void *UPS_CALLCONV uqi_result_get_key_data(uqi_result_t *result, uint32_t *size);
Returns the serialized record data. If the record type has fixed length (i.e. UPS_TYPE_UINT32
) then the returned pointer can be casted directly into a uint32_t *
pointer. The size
returns the size of the integer array.
UPS_EXPORT void *UPS_CALLCONV uqi_result_get_record_data(uqi_result_t *result, uint32_t *size);
The query format is simple and mostly self-explanatory. Key words are not case sensitive. Here are a few examples:
Calculates the sum of all keys from database 1
SUM($key) from database 1
Counts all keys in database 13
count($key) from database 13
Counts all distinct keys in database 13 (see below for the meaning of 'distinct', which differs from a standard SQL database)
distinct count($key) from database 13
Counts all distinct keys in database 13 with even records (the "even" predicate has to be supplied by the user)
distinct count($key) from database 13 where even($record)
Note that you can only SUM up records (or keys) if their type is numeric!
SUM($key) from database 1
SUM($record) from database 1
The TOP and BOTTOM queries are the only queries which accept the LIMIT keyword. If you specify a LIMIT for any other function the the UQI parser will currently fail with an error. You can also query TOP/BOTTOM queries over the keys, but since keys are already sorted it would be much cheaper to use a cursor instead (with the BOTTOM elements starting at the beginning of the database, and the TOP elements at the end).
TOP($record) from database 1 LIMIT 20
BOTTOM($record) from database 1 LIMIT 10
Again you can calculate the minimum and maximum from keys as well (by using MIN($keys) from database 1
), but since keys are already sorted it would be much more efficient to use a cursor and retrieve the first (or last) element in the database.
MIN($record) from database 1
MAX($record) from database 1
A predicate can be supplied with the WHERE
clause. In
release 2.2.0 there are no predicates supplied, but they can
be written as a plugin. A predicate can also be loaded from an external (dynamic) library.
Counts all distinct keys in database 13 with even records. The "even" predicate is loaded from a dynamic library, and therefore has to be quoted.
distinct count($key) from database 13 where "even@plugin.so"($record)
The "distinct" keyword ignores duplicate keys and skips them when processing the input data. Its meaning is therefore different from DISTINCT in an SQL database, where it filters duplicate results.
For performance reasons, plugins have to be written in C or C++. With some knowledge in C/C++, writing a plugin is not difficult, but it extends the scope of this blog post a bit. We've created a sample which demonstrates how an application can supply its own plugins for aggregating and filtering data: uqi2.c.
The UQI interface is brand new, and fairly unique. If you have any questions then please do not hesitate to write!