[MDEV-13845] benchmark RocksDB engine Created: 2017-09-19  Updated: 2021-09-16  Resolved: 2021-09-16

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - RocksDB
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Axel Schwenke Assignee: Axel Schwenke
Resolution: Fixed Votes: 0
Labels: benchmarking

Attachments: File MDEV-13845-B.ods     File MDEV-13845-upstream.ods     File MDEV-13845.ods     File my.cnf    
Issue Links:
Relates
relates to MDEV-17261 sysbench oltp read only too slow for ... Closed
relates to MDEV-18080 Run MyRocks benchmark: MariaDB vs Per... Open

 Description   

Check RocksDB performance for common database operations (reads/writes). Especially how it compares to InnoDB in terms of throughput and storage needs.



 Comments   
Comment by Sergei Petrunia [ 2018-09-23 ]

See also MDEV-17261

Comment by Sergei Petrunia [ 2018-09-23 ]

axel, can you re-run the benchmark with the patch for MDEV-17261 ? (Just pushed it to 10.2, fix should be trivially applicable to 10.3?)

Comment by Axel Schwenke [ 2018-11-08 ]

New results (MyRocks and InnoDB from MariaDB 10.3.10) in spreadsheet MDEV-13845-B.ods The changes from MDEV-17261 are generally an improvement, especially for workloads containing reads and at high concurrency. There are some minor regressions up to -4%, but it's not clear how to attribute those. After all there were many changes between 10.3.6 and 10.3.10.
InnoDB ist still much faster than MyRocks - when it has enough memory. If memory buffers are small (smaller than the dataset) then MyRocks shines.
The data set was about 10GB in InnoDB vs. 5GB in MyRocks (40 mio rows). For the memory-based measurement we had 16G buffer pool / block cache and for the IO bound measurement it was 4GB buffer pool / block cache. MyRocks uses optimized settings (see attached my.cnf).

Comment by Sergei Petrunia [ 2018-11-11 ]

axel, thanks for the new results.

  • I think, a slowdown of a few percents is not the primary concern ATM.
  • The difference wit InnoDB for in-memory workloads seems to be too big. I would like to investigate it but I'm a bit hesitant in which direction to dig.
  • Where does the rocksdb_default_cf_options and other MyRocks settings in the attached my.cnf come from?
  • Will it require a lot of effort to also compare with MyRocks upstream (their "official" clone/builds steps are here: https://github.com/facebook/mysql-5.6/wiki/Build-Steps)

Questions about the configuration

  • Which version of sysbench are you using (0.x or 1.x series)?
  • The binlog was disabled for this workload, correct (don't see it in my.cnf)?
  • Can you paste the command line that was used to run sysbench (both the "prepare" and "run" ones?)
Comment by Axel Schwenke [ 2018-11-15 ]

psergey some answers to your questions:

Comment by Mark Callaghan [ 2018-11-21 ]

In the comment above this one there are links to 2 github directories, and each of those directories has many files like my.cnf.X. How do I find the the my.cnf used for MyRocks and InnoDB from these many possible my.cnf files?

Comment by Sergei Petrunia [ 2018-11-28 ]

As far as I understand, the two digit number is the run number.
There is a DESC file
https://github.com/hgxl64/mariadb-benchmarks/blob/master/Sysbench-0.5/MDEV-13845/RocksDB-10.3.10/DESC
which says

01 ... MyRocks (10.3.6), from memory
02 ... MyRocks (10.3.6), IO bound
03 ... InnoDB (10.3.10), from memory
04 ... InnoDB (10.3.10), IO bound
05 ... MyRocks (10.3.10), from memory
06 ... MyRocks (10.3.10), IO bound

Comment by Sergei Petrunia [ 2018-11-28 ]

This agrees with my.cnf files. Also, some of them are symlinks.

 my.cnf.01   # MyRocks, "in-memory", rocksdb_block_cache_size= 16G
 my.cnf.02   # MyRocks,  "on-disk", same as above but rocksdb_block_cache_size is 4G.
 my.cnf.03   # InnoDB, "in-memory", innodb_buffer_pool=16G
 my.cnf.04   # InnoDB, "on-disk", same as above but innodb_buffer_pool=4G
 my.cnf.05 -> my.cnf.01 # and run on MariaDB 10.3.10
 my.cnf.06 -> my.cnf.02 # and run on MariaDB 10.3.10

mdcallag, so as far as my understanding goes, the issue (MyRocks slower than InnoDB for in-memory workload, on recent build) was observed on these:

InnoDB: https://github.com/hgxl64/mariadb-benchmarks/blob/master/Sysbench-0.5/MDEV-13845/RocksDB-10.3.10/my.cnf.03
MyRocks: https://github.com/hgxl64/mariadb-benchmarks/blob/master/Sysbench-0.5/MDEV-13845/RocksDB-10.3.10/my.cnf.01

Comment by Sergei Petrunia [ 2018-11-28 ]

I haven't yet tried to reproduce myself. It would be interesting to do so (to check if there are some hidden factors at play)

Comment by Mark Callaghan [ 2018-11-28 ]

— Overview

From sysbench on my hardware (at home and at work), MyRocks in FB MySQL 5.6.35 vs InnoDB from upstream

  • in-memory - MyRocks gets 70% to 80% of InnoDB
  • io-bound - MyRocks is close to InnoDB - sometimes better, sometimes worse

From results by Alex (MDEV-13845-B.ods) for sysbench read-only I see:

  • in-memory - MyRocks is in the range of 70% to 80% of InnoDB except at very high concurrency (clients >> CPU cores)
  • IO-bound - MyRocks does better than InnoDB

For sysbench read-write with 22% writes I see:

  • in-memory and IO-bound - MyRocks QPS is saturated to ~35k - this is something to debug as I don't see that in my tests from late 2017

in-memory, big server - http://smalldatum.blogspot.com/2017/12/sysbench-in-memory-and-fast-server.html
in-memory, small server - http://smalldatum.blogspot.com/2017/11/sysbench-in-memory-small-server-innodb.html
io-bound, big server - http://smalldatum.blogspot.com/2017/12/sysbench-io-bound-and-fast-server.html
io-bound, small server - http://smalldatum.blogspot.com/2017/11/sysbench-io-bound-small-server-myrocks_29.html

While these blog posts don't always share the absolute values for sysbench QPS, I have that data too archived elsewhere.

— Summary

Questions about HW:

  • how much RAM in test server?
  • how many CPU cores (real, and then with HT enabled)?
  • what is the storage device?

Things to debug:
1) why does MyRocks do lousy at very high concurrency for in-memory sysbench read-only
2) why does MyRocks saturate at ~35k QPS for sysbench read-write

PMP stack traces will help figure out both issues.

Things that might help me:

  • what version of InnoDB is used by MariaDB 10.3? Is it most similar to 5.6, 5.7 or 5.8?
  • what version of MyRocks is used?

— Feedback on my.cnf

From the my.cnf files that Sergey listed:
From InnoDB: https://github.com/hgxl64/mariadb-benchmarks/blob/master/Sysbench-0.5/MDEV-13845/RocksDB-10.3.10/my.cnf.03
MyRocks: https://github.com/hgxl64/mariadb-benchmarks/blob/master/Sysbench-0.5/MDEV-13845/RocksDB-10.3.10/my.cnf.01

This is small (~100MB/s) and might cause writes to be throttled. I frequently don't set it for benchmarks.

  • rocksdb_rate_limiter_bytes_per_sec=104857600

From the other options:
rocksdb_default_cf_options="write_buffer_size=256m;target_file_size_base=32m;max_bytes_for_level_base=512m;max_write_buffer_number=4;\
level0_file_num_compaction_trigger=4;level0_slowdown_writes_trigger=20;level0_stop_writes_trigger=30;\
block_based_table_factory=

{cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=0}

;\
level_compaction_dynamic_level_bytes=true;optimize_filters_for_hits=true;memtable_prefix_bloom_size_ratio=0.05;prefix_extractor=capped:12;\
compaction_pri=kMinOverlappingRatio;compression=kZSTD;bottommost_compression=kZSTD;compression_opts=-14:1:0"

The number of background threads used for compaction/flush was not set and should be set for a benchmark. The default is probably too small and the outcome will be more write stalls. On current MyRocks the option is rocksdb_max_background_jobs.

Compression is zstd for all levels. For in-memory benchmark I would disable compression as that can help write throughput and reduce CPU load. If database is small enough to fit in-memory than there isn't much to be gained from compression. If you must use MyRocks compression in this case then switch to lz4 or snappy. For IO-bound tests I use no compression for L0, L1 and L2 then lz4 or snappy for remaining levels except the max, then zstd for the max. Or just use snappy/lz4 for all levels. zlib and zstd impose big CPU overhead with little gain for L0, L1, L2.

I would start with a normal bloom filter. The above used a prefix bloom filter. So rather than:
block_based_table_factory=

{cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=0}

;\
memtable_prefix_bloom_size_ratio=0.05;prefix_extractor=capped:12;\

I would use:
block_based_table_factory=

{cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1}

;\

My general tuning advice is at:

Comment by Mark Callaghan [ 2018-11-28 ]

I don't like that MyRocks appears to be worse at too-much concurrency than InnoDB on the sysbench read-only and in-memory workload. But I am also not that interested in that workload – 256 or 128 concurrent queries with an in-memory database and a server with much less than 128 CPU cores means the CPU is significantly oversubscribed and that is not a good way to run MySQL or MariaDB.

Alas the MySQL community has been putting too much focus on that result for a long time. So I won't blame Axel here.

Comment by Axel Schwenke [ 2018-12-04 ]

Hi mdcallag. Some answers to your questions:

Questions about HW:
how much RAM in test server?
how many CPU cores (real, and then with HT enabled)?
what is the storage device?

The test server has 128G of RAM, so the test uses only a small portion. There are 16 physical cores (in 2 sockets), HT is on, so Linux sees 32 cores. The storage is two SATA SSD (INTEL SSDSC2BB300H4) in RAID0 (Linux md, 1M chunks) formatted with ext4 (relatime,data=ordered).

Things that might help me:
what version of InnoDB is used by MariaDB 10.3? Is it most similar to 5.6, 5.7 or 5.8?
what version of MyRocks is used?

InnoDB in MariaDB 10.3 is from MySQL 5.7.21, MyRocks reports "rocksdb_version=5.8.0"

Comment by Mark Callaghan [ 2018-12-04 ]

RocksDB 5.8 is from December, 2017. While that is now ~1 year old, it has many of the important fixes. So I won't suggest it is too old.
https://github.com/facebook/rocksdb/tree/5.8.fb

For 16 cores and 32 HW threads I am not that interested in throughput for an in-memory workload and 128 or 256 concurrent clients running without think time. Alas, I also want to know that MyRocks doesn't fall over (QPS collapse) at that point and perhaps it does in this case. PMP stack traces might help to debug it.

Comment by Sergei Petrunia [ 2018-12-06 ]

> InnoDB in MariaDB 10.3 is from MySQL 5.7.21, MyRocks reports "rocksdb_version=5.8.0"
axel, how do you get that? I don't see any status variable named rocksdb_version ?
I do see this when I look into DATADIR/#rocksdb/LOG :

2018/12/05-21:30:18.574825 7f1c7e9c1740 RocksDB version: 5.14.0
2018/12/05-21:30:18.574829 7f1c7e9c1740 Git sha rocksdb_build_git_sha:@@

(we should look at printing a value for rocksdb_build_git_sha... But there is @@rocksdb_git_hash status variable also, which gives one the git hash of the RocksDB's revision)

Comment by Axel Schwenke [ 2018-12-06 ]

psergey how do you get that?

I looked in the OPTIONS* file in the RocksDB datadir. But I made a mistake. 5.8.0 is the RocksDB version in MariaDB 10.3.6. It's indeed 5.14.0 in 10.3.10.

I'd like to have the RocksDB version printed to the server error log at startup, just like InnoDB does. Want a MDEV for that?

Comment by Axel Schwenke [ 2019-01-22 ]

I added results from a comparison between RocksDB in MariaDB 10.3.10 and RocksDB from Facebook 5.6.35. The file is MDEV-13845-upstream.ods

Observations:

  • for read-only workload, MariaDB outperforms Facebook
  • for mixed workload, Facebook is faster; even more in the IO bound case
  • MariaDB scales better to higher thread numbers

I also tested InnoDB performance in both variants (sheet #2 in the document). In MariaDB InnoDB is mostly better when it has enough memory and uses many threads, but slower if it runs IO bound.

Generated at Thu Feb 08 08:08:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.