The most important [and the most difficult] part of your application stack is the data. The state is the only thing that matters. Go vs Rust - doesn’t matter. Kubernetes vs Nomad - doesn’t matter. AWS vs GCP - doesn’t matter. The only thing that matters is how your state is stored and configured.
For personal projects I really do not like working with an external data store like Postgres or Redis. I want something that is part of the app, i.e. an embedded data store. SQLite is the most common embedded data store, but it does not handle concurrent writes well (only one writer can hold the write lock at a time), so I tend to avoid it. I also don’t really need a full-fledged SQL database - I just want a simple key value store.
Unfortunately the selection is limited because building a data store is difficult. The ones with the most mind share are RocksDB, Xodus, LMDB, and MapDB. Badger and Bolt are great candidates if you are working with Go, but they are not available for the JVM.
MapDB uses mmap for file persistence. I get why a data store would want to do that - treat the file as memory and let the OS deal with persistence - but I am not a fan, both because of corruption issues and because a storage manager can do a much better job than the OS of deciding which pages to commit and when. Apart from that, the whole point of building a data store product is to control persistence - why would we outsource that to the OS?
MongoDB used mmap when it first launched. Once they acquired WiredTiger they got rid of mmap. LevelDB uses mmap, and once Facebook forked it to build RocksDB the first thing they did was get rid of mmap.
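To make the mmap trade-off concrete, here is a minimal sketch using only the JDK's `FileChannel.map`. It is not how MapDB or any of the stores above are implemented; the function name `writeViaMmap` and the temp file are made up for illustration. The point is that `put` only touches memory, and without an explicit `force()` the OS chooses when the dirty page reaches disk.

```kotlin
import java.io.RandomAccessFile
import java.nio.channels.FileChannel
import java.nio.file.Files

// Hypothetical helper: writes `payload` through a memory-mapped region of a
// temp file, then reads the file back through the normal file API.
fun writeViaMmap(payload: ByteArray): ByteArray {
    val path = Files.createTempFile("mmap-demo", ".bin")
    path.toFile().deleteOnExit()
    RandomAccessFile(path.toFile(), "rw").use { file ->
        // Map the first payload.size bytes of the file into memory
        // (this also grows the file to that size).
        val region = file.channel.map(FileChannel.MapMode.READ_WRITE, 0, payload.size.toLong())
        // This is a plain memory write; nothing is guaranteed to be on disk yet.
        region.put(payload)
        // force() asks the OS to write the dirty pages out now. Leaving it out
        // hands the "when" entirely to the OS.
        region.force()
    }
    return Files.readAllBytes(path)
}

fun main() {
    println(String(writeViaMmap("hello".toByteArray()), Charsets.UTF_8))
}
```

A store that manages its own file writes decides the flush order and timing itself, which is the control the paragraphs above argue for.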
LMDB is considered one of the most performant embedded data stores out there, but there are two reasons I do not use it in my projects. The first is that it has a global writer lock 👎 and the second is that the byte buffers need to be flipped before they are persisted, which makes it a bit un-ergonomic to use. I have also tried setting it up multiple times, but it throws an exception and doesn’t really work for some reason?
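For anyone who has not run into the flipping problem, here is a small sketch with plain `java.nio.ByteBuffer`. No LMDB binding is involved; I am assuming the complaint is about the standard buffer contract, and `toReadableBuffer` is a made-up helper name.

```kotlin
import java.nio.ByteBuffer

// Hypothetical helper: fills a direct buffer with `text` and hands it back
// ready to be read by whoever consumes it next.
fun toReadableBuffer(text: String): ByteBuffer {
    val bytes = text.toByteArray(Charsets.UTF_8)
    val buffer = ByteBuffer.allocateDirect(bytes.size)
    buffer.put(bytes) // position now sits at the end of the written data
    buffer.flip()     // limit = position, position = 0: readable from the start
    return buffer
}

fun main() {
    // Forgetting flip(): a reader sees the unwritten tail, not the 3 bytes of "key".
    val unflipped = ByteBuffer.allocateDirect(16).put("key".toByteArray())
    println(unflipped.remaining()) // 13

    val flipped = toReadableBuffer("key")
    println(flipped.remaining()) // 3
}
```

Having to remember that `flip()` on every key and every value is the un-ergonomic part.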
RocksDB is an embedded key value store created by Facebook. It is a fork of Google’s LevelDB. It uses an LSM-tree for storage as opposed to the B-tree found in other data stores such as Postgres. RocksDB is written in C++ and has a Java wrapper. It is an extremely popular storage manager, used as the storage layer for CockroachDB and Rockset analytics. There is even a RocksDB-based storage engine for MySQL called MyRocks. Other companies dealing with large scale data, such as Pinterest and LinkedIn, are also known to use RocksDB in their in-house data store products. Stream.io’s in-house feed platform is also built using RocksDB.
What I like about RocksDB:
- Gives me exactly what I want in an embedded key value data store - gets, puts, and range reads
- Has a built-in backup API - this is a must have feature for a data store to be used as a building block
- Excellent write performance
- Storage efficient
The way I view RocksDB is that Facebook took the storage manager out of Cassandra and made it a library. It has the same LSM-tree style storage, with a MemTable that gets flushed into immutable SSTables on disk.
It is like having a persistent concurrent skip list from the JVM collections library.
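To show what that comparison means in practice, here is the in-memory version of the three operations I care about, using `ConcurrentSkipListMap` from the JDK. This is only the analogy, not RocksDB's API; the key names and the `rangeRead` helper are made up for illustration.

```kotlin
import java.util.concurrent.ConcurrentSkipListMap

// Hypothetical helper: returns all entries with from <= key < toExclusive, in key order.
fun rangeRead(
    store: ConcurrentSkipListMap<String, String>,
    from: String,
    toExclusive: String
): List<Pair<String, String>> =
    store.subMap(from, true, toExclusive, false).map { (key, value) -> key to value }

fun main() {
    val store = ConcurrentSkipListMap<String, String>()

    // puts
    store["user:1"] = "alice"
    store["user:2"] = "bob"
    store["order:1"] = "book"

    // get
    println(store["user:1"]) // alice

    // range read: ';' is the character right after ':', so this covers every "user:" key
    println(rangeRead(store, "user:", "user;")) // [(user:1, alice), (user:2, bob)]
}
```

The difference is that the skip list disappears when the process exits, while RocksDB keeps the same sorted view on disk.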
What I don’t like:
- It is written in C++, which makes it difficult to debug and understand, adds JNI overhead, and can cause memory leaks if objects are not freed properly
- Facebook group for support is basically a ghost town
- Lots of deletes/edits can potentially lead to compaction issues - just a fact of life when working with LSM style data stores
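The compaction point is easier to see with a toy model. The sketch below is not RocksDB's implementation; `ToyLsm` and everything in it are hypothetical, and the "SSTables" stay in memory for brevity. It shows the general LSM idea: a delete is just another write (a tombstone), so overwritten values and tombstones pile up in the immutable runs until a compaction merges them away.

```kotlin
import java.util.TreeMap

// Toy LSM: a sorted MemTable that is flushed into immutable sorted runs.
// A null value is a tombstone marking a deleted key.
class ToyLsm(private val memTableLimit: Int) {
    private var memTable = TreeMap<String, String?>()
    // Newest run first. Real SSTables live on disk; these are kept in memory.
    private val runs = ArrayDeque<Map<String, String?>>()

    fun put(key: String, value: String) = write(key, value)
    fun delete(key: String) = write(key, null) // a delete is just another write

    private fun write(key: String, value: String?) {
        memTable[key] = value
        if (memTable.size >= memTableLimit) flush()
    }

    private fun flush() {
        runs.addFirst(memTable) // the flushed run is never modified again
        memTable = TreeMap()
    }

    // Newest data wins: check the MemTable first, then runs from newest to oldest.
    fun get(key: String): String? {
        if (memTable.containsKey(key)) return memTable[key]
        for (run in runs) if (run.containsKey(key)) return run[key]
        return null
    }

    // Entries held across all runs, including shadowed values and tombstones.
    fun storedEntries(): Int = runs.sumOf { it.size }

    // Full compaction: merge oldest to newest so newer entries overwrite older
    // ones, then drop the tombstones since nothing older is left to shadow.
    fun compact() {
        val merged = TreeMap<String, String?>()
        for (run in runs.asReversed()) merged.putAll(run)
        merged.values.removeIf { it == null }
        runs.clear()
        if (merged.isNotEmpty()) runs.addFirst(merged)
    }
}

fun main() {
    val db = ToyLsm(memTableLimit = 2)
    db.put("a", "1"); db.put("b", "2")  // flushed as the first run
    db.delete("a"); db.put("b", "3")    // flushed as the second run
    println(db.storedEntries())         // 4 entries stored for 1 live key
    db.compact()
    println(db.storedEntries())         // 1
    println(db.get("a"))                // null
    println(db.get("b"))                // 3
}
```

With a delete-heavy or edit-heavy workload the gap between stored entries and live keys keeps growing between compactions, which is where the trouble comes from.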
CockroachDB recently switched from RocksDB to their own in-house RocksDB clone called Pebble.
Xodus is an embedded key value store created by JetBrains for their YouTrack product. It uses a B-tree for storage [with the option to use a Patricia trie instead] and is thus optimised for reads.
What I like about Xodus:
- Has a really clean API - especially regarding transactions, which allows for higher level constructs to be built
- B-tree means the storage is optimised for reads - most apps are read heavy
- Has a backup API built in
- Provides an entity store and a virtual file system
- Written in Java/Kotlin
What I don’t like:
- Lack of adoption - other than YouTrack I am not aware of other products that are using this
- Storage inefficient [see below]
- Performance is OK [see below]
For a small pet project either RocksDB or Xodus is fine, but I wanted to see how these data stores perform when working with 10 million keys. The three areas I wanted to test were reads, writes, and storage space used.
The tests were performed on a DigitalOcean droplet with 4 CPUs, 8GB of RAM, and 160GB of NVMe storage, using OpenJDK 13 with -Xms and -Xmx both set to 6g. All the code was written in Kotlin.
The write test was a single thread running a loop of 100k transactions, each committing 100 keys at once, for 10 million keys in total. The key and value were both UUID strings.
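The shape of that loop looks roughly like the sketch below. This is not the original benchmark code: `BatchStore`, `InMemoryStore`, and `runWriteTest` are hypothetical names, and the in-memory store is a stand-in so the sketch runs without RocksDB or Xodus on the classpath. A real run would put a RocksDB write batch or a Xodus transaction behind `writeBatch`.

```kotlin
import java.util.UUID
import java.util.concurrent.ConcurrentSkipListMap

// Hypothetical minimal interface; one call is one committed transaction.
interface BatchStore {
    fun writeBatch(entries: List<Pair<String, String>>)
    fun size(): Int
}

// In-memory stand-in so the sketch runs without any database dependency.
class InMemoryStore : BatchStore {
    private val data = ConcurrentSkipListMap<String, String>()
    override fun writeBatch(entries: List<Pair<String, String>>) { data.putAll(entries) }
    override fun size(): Int = data.size
}

// Commits `batches` batches of `batchSize` UUID pairs on the calling thread and
// returns the elapsed time in seconds (UUID generation is included in the timing).
fun runWriteTest(store: BatchStore, batches: Int, batchSize: Int): Double {
    val start = System.nanoTime()
    repeat(batches) {
        val batch = List(batchSize) {
            UUID.randomUUID().toString() to UUID.randomUUID().toString()
        }
        store.writeBatch(batch)
    }
    return (System.nanoTime() - start) / 1e9
}

fun main() {
    val store = InMemoryStore()
    // The test described above used batches = 100_000; this is scaled down.
    val seconds = runWriteTest(store, batches = 1_000, batchSize = 100)
    println("wrote ${store.size()} keys in $seconds s")
}
```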
The findings are as follows:
Xodus averaged about 1.18k writes per second while RocksDB did 178k writes per second.
Xodus took 141 minutes to write 10 million keys while RocksDB took 56 seconds. RocksDB is optimised for writes - a write is a simple append to the MemTable - but wow, that is a huge difference.
RocksDB’s write rate stayed consistent from 0 keys all the way to 10 million keys, at ~5 seconds per million keys. It did not degrade as the data set grew larger. Xodus, on the other hand, degraded as the number of keys in the database grew. The first million keys were written in 49 seconds and the last million keys were written in 35 minutes. Performance degradation like this makes me wonder if something was off in my code?
Another thing to note is that something really funky started happening after the first 6 million keys were written in the Xodus database - disk reads on the droplet shot through the roof. At one point it was 200MB/s of reads, with an average of 150MB/s for the last 4 million keys.
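The totals quoted above can be turned into rates with one division each. The sketch below only redoes that arithmetic from the stated durations; the approximate results in the comments are rounded.

```kotlin
// Average writes per second for `keys` writes that took `seconds` in total.
fun writesPerSecond(keys: Long, seconds: Double): Double = keys / seconds

fun main() {
    val total = 10_000_000L
    println(writesPerSecond(total, 56.0))           // RocksDB, 56 seconds: about 178.6k/s
    println(writesPerSecond(total, 141 * 60.0))     // Xodus, 141 minutes: about 1.18k/s
    println(writesPerSecond(1_000_000, 49.0))       // Xodus, first million: about 20.4k/s
    println(writesPerSecond(1_000_000, 35 * 60.0))  // Xodus, last million: about 476/s
}
```

So the Xodus write rate fell by a factor of roughly 43 between the first and the last million keys.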
The storage test measured the size of the database after the 10 million keys were written.
RocksDB used 760MB of storage while Xodus used 5GB.
RocksDB is freaking fantastic here. There were 10 million entries of 72 bytes each [two UUIDs at 36 bytes each], which is 720MB of raw data, so RocksDB added only about 6% on top of that while Xodus used roughly 7 times the raw size.
The read test iterated forward through all 10 million keys and decoded them back into strings.
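Here is the same storage arithmetic as a runnable check. I am assuming decimal units (1MB = 10^6 bytes, 1GB = 10^9 bytes), and `rawBytes` is a made-up helper name.

```kotlin
import java.util.UUID

// Raw payload in bytes: each entry is a key plus a value, both UUID strings.
fun rawBytes(entries: Long): Long {
    // A UUID rendered as a string is 36 ASCII characters, so 36 bytes in UTF-8.
    val uuidBytes = UUID.randomUUID().toString().toByteArray(Charsets.UTF_8).size
    return entries * 2 * uuidBytes
}

fun main() {
    val raw = rawBytes(10_000_000)
    println(raw)                    // 720000000 bytes, i.e. 720MB
    println(760_000_000.0 / raw)    // RocksDB: about 1.06x the raw data
    println(5_000_000_000.0 / raw)  // Xodus: about 6.9x the raw data
}
```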
Xodus took 30.9 seconds while RocksDB took 13.2 seconds.
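The read loop has roughly the shape below. Again this is a stand-in, not the benchmark code: a `TreeMap` ordered byte by byte plays the role of the store's forward iterator, and `newByteOrderedStore` and `scanAndDecode` are hypothetical names. I am assuming byte-wise key ordering here, which is how byte-array keys are usually sorted in these stores.

```kotlin
import java.util.Arrays
import java.util.TreeMap
import java.util.UUID

// Stand-in for an ordered store: keys are compared byte by byte (Java 9+).
fun newByteOrderedStore(): TreeMap<ByteArray, ByteArray> =
    TreeMap<ByteArray, ByteArray>(Comparator<ByteArray> { a, b -> Arrays.compareUnsigned(a, b) })

// Walks every entry in key order and decodes key and value back into strings.
fun scanAndDecode(store: TreeMap<ByteArray, ByteArray>): List<Pair<String, String>> =
    store.map { (key, value) -> String(key, Charsets.UTF_8) to String(value, Charsets.UTF_8) }

fun main() {
    val store = newByteOrderedStore()
    repeat(5) {
        val key = UUID.randomUUID().toString().toByteArray(Charsets.UTF_8)
        val value = UUID.randomUUID().toString().toByteArray(Charsets.UTF_8)
        store[key] = value
    }
    scanAndDecode(store).forEach { (key, value) -> println("$key -> $value") }
}
```

In the real test the decoding cost is the same for both stores, so the difference in timings comes from how fast each one can hand over the next entry.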
Data storage is difficult. Building something that is correct and performant is really difficult. It is always easier to criticise than to build.
For small projects I think I will use Xodus for the nicer API but if I expect the state to have more than a few million keys I will use RocksDB.
I hope that the JVM world gets something like BadgerDB but in pure Kotlin.