upscaledb - embedded database technology

How to optimize performance?

Every application is different, so it is difficult to give general rules for optimization. If you want to squeeze out the last few percent of performance, it is essential to write a benchmark that mimics the behaviour of your application and lets you test the various upscaledb settings, or to use ups_bench, a tool created for exactly this purpose (see Benchmarking). That said, the following options can improve performance.
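If you write your own benchmark, its core can be as small as a timing loop. The following is a minimal plain-C sketch; bench_ops_per_sec and sum_op are hypothetical names, and sum_op is only a placeholder for the ups_db_insert/ups_db_find calls that would mimic your application's access pattern. It measures wall-clock time with C11 timespec_get, which is not guaranteed to be monotonic.

```c
#include <stdint.h>
#include <time.h>

/* One benchmark operation; `i` is the iteration index, `ctx` is user state. */
typedef void (*bench_op_fn)(uint64_t i, void *ctx);

/* Placeholder workload (hypothetical): replace with upscaledb calls that
 * mimic your application, e.g. inserting or looking up key `i`. */
void sum_op(uint64_t i, void *ctx) {
  *(uint64_t *)ctx += i;
}

/* Runs `op` n times and returns the throughput in operations per second
 * (0.0 if the elapsed time could not be measured). */
double bench_ops_per_sec(bench_op_fn op, void *ctx, uint64_t n) {
  struct timespec start, end;
  if (timespec_get(&start, TIME_UTC) == 0)
    return 0.0;
  for (uint64_t i = 0; i < n; i++)
    op(i, ctx);
  if (timespec_get(&end, TIME_UTC) == 0)
    return 0.0;
  double secs = (double)(end.tv_sec - start.tv_sec)
              + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
  return secs > 0.0 ? (double)n / secs : 0.0;
}
```

Run the same workload once per configuration (cache size, key type, page size, ...) and compare the reported rates.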

Cache size

Try to choose a cache size large enough for the working set of the index to fit into the cache; the cache size is set with UPS_PARAM_CACHE_SIZE when calling ups_env_create or ups_env_open. Again, a benchmark application helps you figure out which size works best for you.

Key size

Keep keys as small as possible: the more keys fit into a database page, the less I/O is required. You can set the key size with UPS_PARAM_KEY_SIZE when creating a new Database. Fixed-length keys are more efficient than variable-length keys.
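The effect of key size on I/O can be estimated with simple arithmetic: fewer keys per page means a lower fanout and a deeper tree, and every extra level is potentially one more page read per lookup. The sketch below is plain C; the page header and per-key overhead constants are made-up assumptions for illustration and do not describe upscaledb's actual page layout.

```c
#include <stdint.h>

/* Hypothetical overheads (bytes); NOT upscaledb's real page layout. */
#define HYPOTHETICAL_PAGE_HEADER      64u
#define HYPOTHETICAL_PER_KEY_OVERHEAD  8u

/* Rough estimate of how many fixed-size keys fit into one btree page. */
uint32_t keys_per_page(uint32_t page_size, uint32_t key_size) {
  if (page_size <= HYPOTHETICAL_PAGE_HEADER)
    return 0;
  return (page_size - HYPOTHETICAL_PAGE_HEADER)
         / (key_size + HYPOTHETICAL_PER_KEY_OVERHEAD);
}

/* Number of btree levels needed to index n keys with the given fanout
 * (0 if the fanout is too small to form a useful tree). */
uint32_t tree_height(uint64_t n, uint32_t fanout) {
  if (fanout < 2)
    return 0;
  uint32_t height = 1;
  uint64_t capacity = fanout;            /* keys reachable with `height` levels */
  while (capacity < n) {
    height++;
    if (capacity > UINT64_MAX / fanout)  /* next level covers any n */
      break;
    capacity *= fanout;
  }
  return height;
}
```

With these assumed constants, a 16 KB page holds 1020 keys of 8 bytes but only 127 keys of 120 bytes, so one million keys need 2 levels in the first case and 3 in the second.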

Key type

Better than specifying a key size is to specify the actual key type: use UPS_PARAM_KEY_TYPE in ups_env_create_db. Typed keys allow a denser btree layout, which saves I/O and makes better use of CPU caches. Also, the key comparison is inlined and does not require a callback function, further improving performance. A word of warning, though: fixed-size keys are always stored in the btree node. If the key size is very large, the btree can only store a few keys per node, so the tree's fanout will be low and the tree will become deep. In such cases it might be better NOT to specify the key size; upscaledb will then move a key to a blob if it becomes too large. Use ups_bench to test the different configurations.

Record size

If all your records have the same size, also specify UPS_PARAM_RECORD_SIZE when calling ups_env_create_db. Small records are packed into the Btree leaves and do not require allocation of external blob space, further increasing performance.

Page size

The default page size is usually a good start. However, if your keys are large, you might want to increase the page size (UPS_PARAM_PAGE_SIZE in ups_env_create); otherwise pages have to be split too often. Again, it helps to write a benchmark or to use ups_bench for testing.

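Compiler flags

The GNU Compiler Collection offers a few switches that squeeze out extra performance:

-mfpmath=sse -Ofast -flto -march=native -funroll-loops

These switches can be enabled at configure time. Since upscaledb is written mostly in C++, pass them in CXXFLAGS as well as in CFLAGS:

./configure CFLAGS="-mfpmath=sse -Ofast -flto -march=native -funroll-loops" CXXFLAGS="-mfpmath=sse -Ofast -flto -march=native -funroll-loops"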

Transactions

Transaction states are stored in memory and are consolidated with the B+Tree index at runtime. This consolidation is tricky when duplicate keys are involved, so performance will be somewhat better if duplicate keys are disabled. Also, when inserting values it is VERY important to use the UPS_OVERWRITE flag whenever possible: an insert with UPS_OVERWRITE does not need to check whether the key already exists, and therefore does not require any disk I/O.

How to improve error resilience?

Choose one or more of the following options:

UPS_ENABLE_TRANSACTIONS: Transactions make sure that no data is lost and that the database file is always in a consistent state.
UPS_ENABLE_RECOVERY: Writes all modified pages to a write-ahead log file. If the application crashes, upscaledb reads the log file and recovers itself.
UPS_ENABLE_FSYNC: Calls fsync() to flush modified buffers to the hard disk. This protects against system crashes (e.g. power failures), but costs a lot of performance.

Is upscaledb thread safe? Does it support concurrency?

upscaledb is thread-safe and can be used from multiple threads without problems. However, it is not yet concurrent; it uses a big lock to make sure that only one thread can access the upscaledb environment at a time.

In addition, ups_db_find (and ups_cursor_find) return temporary pointers that can be overwritten by subsequent calls, including calls from other threads. If this is a problem, use UPS_RECORD_USER_ALLOC or Transactions. See the documentation of ups_record_t in upscaledb.h for more information.

Are upscaledb files little endian or big endian?

upscaledb files are stored in host byte order (host endian). If you open a big-endian file on a little-endian machine (or vice versa), you will get corrupt data. Also, some compression algorithms only work on little-endian machines. Use ups_export and ups_import to move the data from a little-endian to a big-endian machine and vice versa.
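Because files are written in host byte order, it can help to check the byte order of the machines involved before copying a database file between them. A minimal plain-C sketch; host_is_little_endian is a hypothetical helper, not part of upscaledb.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores multi-byte integers little-endian, else 0.
 * The lowest-addressed byte of 0x0102 is 0x02 on little-endian hosts. */
int host_is_little_endian(void) {
  uint16_t probe = 0x0102;
  unsigned char bytes[sizeof probe];
  memcpy(bytes, &probe, sizeof probe);
  return bytes[0] == 0x02;
}
```

If the two machines disagree, export the data with ups_export on one side and import it with ups_import on the other instead of copying the file.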

Why is the library so huge?

The upscaledb library can be up to 12 MB in size (this varies from platform to platform). The reason is that all possible Btree configurations are mapped to C++ template classes, which causes the compiler to generate a lot of debug information. To remove the debug information, simply strip the library.

Can I use upscaledb in commercial projects?

Yes. You can use upscaledb in open source AND closed source projects, for commercial or non-commercial purposes. upscaledb is released under the Apache License, Version 2.0. See the file COPYING, which is also distributed in the source tarball.

Frequently Asked Questions