Tutorial

Analytics

upscaledb supports analytical functions that are executed directly in the B-Tree and are extremely fast. These functions are still experimental, so their interfaces are not yet stable.

The API is a set of public functions which can perform analytical calculations. They are declared in the header file ups/upscaledb_uqi.h (https://github.com/cruppstahl/upscaledb/blob/master/include/ups/upscaledb_uqi.h).

UPS_EXPORT ups_status_t UPS_CALLCONV
uqi_count(ups_db_t *db, ups_txn_t *txn, uqi_result_t *result);

uqi_count is the equivalent of the SQL COUNT operation. It counts all keys of a database, including duplicate keys if the database has any. The result of the operation is stored in result.u.result_u64 (the u member of uqi_result_t is a union).

UPS_EXPORT ups_status_t UPS_CALLCONV
uqi_count_distinct(ups_db_t *db, ups_txn_t *txn, uqi_result_t *result);

Same as uqi_count, but duplicate keys are counted only once (SQL: COUNT DISTINCT).
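The difference between uqi_count and uqi_count_distinct can be illustrated without the library. The following is a minimal sketch, not upscaledb code: it assumes the keys are available as a sorted array of 32-bit integers (the order in which a B-Tree would yield them), and the names sketch_count and sketch_count_distinct are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a database: an array of keys in sorted order,
 * so that duplicate keys sit next to each other. Not upscaledb code. */

/* Like SQL COUNT: every key counts, duplicates included. */
uint64_t sketch_count(const uint32_t *keys, size_t n) {
  assert(keys != NULL || n == 0);
  return (uint64_t)n;
}

/* Like SQL COUNT DISTINCT: because the keys are sorted, a key is new
 * exactly when it differs from its predecessor. */
uint64_t sketch_count_distinct(const uint32_t *keys, size_t n) {
  uint64_t distinct = 0;
  assert(keys != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    if (i == 0 || keys[i] != keys[i - 1])
      distinct++;
  }
  return distinct;
}
```

For the keys 1, 2, 2, 3, 3, 3 the first function yields 6 and the second yields 3.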

UPS_EXPORT ups_status_t UPS_CALLCONV
uqi_count_if(ups_db_t *db, ups_txn_t *txn, uqi_bool_predicate_t *pred,
        uqi_result_t *result);

Same as uqi_count, but only counts keys for which pred (a structure which describes a callback function) returns true. In SQL this is similar to adding a WHERE clause to a COUNT: SELECT COUNT(col) WHERE …
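To make the idea of a predicate structure more concrete, here is a minimal, library-free sketch. The struct layout (a callback pointer plus a caller-supplied context) is an assumption for illustration and is not taken from ups/upscaledb_uqi.h; sketch_predicate_t, key_greater_than and sketch_count_if are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical predicate: a callback plus a caller-supplied context.
 * This is NOT the actual layout of uqi_bool_predicate_t. */
typedef struct {
  int (*func)(const void *key_data, size_t key_size, void *context);
  void *context;
} sketch_predicate_t;

/* Example callback: true if a uint32_t key is greater than the
 * threshold that the context pointer refers to. */
int key_greater_than(const void *key_data, size_t key_size, void *context) {
  uint32_t key;
  uint32_t threshold = *(const uint32_t *)context;
  if (key_size != sizeof(key))
    return 0; /* not a 32-bit key: never matches */
  memcpy(&key, key_data, sizeof(key));
  return key > threshold;
}

/* Counts only the keys accepted by the predicate, similar to
 * COUNT combined with a WHERE clause. */
uint64_t sketch_count_if(const uint32_t *keys, size_t n,
        const sketch_predicate_t *pred) {
  uint64_t count = 0;
  assert(pred != NULL && pred->func != NULL);
  for (size_t i = 0; i < n; i++) {
    if (pred->func(&keys[i], sizeof(keys[i]), pred->context))
      count++;
  }
  return count;
}
```

Passing the threshold through the context pointer keeps the callback reusable: the same function can serve different conditions without global state.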

UPS_EXPORT ups_status_t UPS_CALLCONV
uqi_count_distinct_if(ups_db_t *db, ups_txn_t *txn, uqi_bool_predicate_t *pred,
        uqi_result_t *result);

Same as uqi_count_if, but for distinct keys.

In addition to counting, there are a few other operations that are currently implemented:

UPS_EXPORT ups_status_t UPS_CALLCONV
uqi_average(ups_db_t *db, ups_txn_t *txn, uqi_result_t *result);

Calculates the average of all keys. Also exists as uqi_average_if with a predicate.

UPS_EXPORT ups_status_t UPS_CALLCONV
uqi_sum(ups_db_t *db, ups_txn_t *txn, uqi_result_t *result);

Calculates the sum of all keys. Also exists as uqi_sum_if.
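As a rough illustration of what sum and average compute and how a result might be returned through a union, consider the following sketch. It does not use upscaledb: sketch_result_t only imitates the result.u.result_u64 access described earlier, the result_double member is an assumption, and the behaviour for an empty key set (returning an error) is a choice made for this example, not documented library behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type that imitates the result.u.result_u64 access;
 * NOT the real uqi_result_t. result_double is an assumed member. */
typedef struct {
  union {
    uint64_t result_u64;
    double result_double;
  } u;
} sketch_result_t;

/* Sums all keys into a 64-bit accumulator, so that many
 * 32-bit keys do not overflow the result. */
void sketch_sum(const uint32_t *keys, size_t n, sketch_result_t *result) {
  uint64_t sum = 0;
  assert(result != NULL);
  for (size_t i = 0; i < n; i++)
    sum += keys[i];
  result->u.result_u64 = sum;
}

/* Averages all keys. Returns 0 on success, or -1 if there are no keys
 * (an arbitrary convention chosen for this sketch). */
int sketch_average(const uint32_t *keys, size_t n, sketch_result_t *result) {
  uint64_t sum = 0;
  assert(result != NULL);
  if (n == 0)
    return -1;
  for (size_t i = 0; i < n; i++)
    sum += keys[i];
  result->u.result_double = (double)sum / (double)n;
  return 0;
}
```

For the keys 10, 20, 20, 30 the sum is 80 and the average is 20.0; duplicate keys contribute once per occurrence in this sketch.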

These functions are just small examples of what is possible in a key/value store which is optimized for analytics. Which analytical functions are the most important for you?