[MDEV-17261] sysbench oltp read only too slow for MyRocks. Created: 2018-09-21  Updated: 2018-12-26  Resolved: 2018-09-23

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - RocksDB
Affects Version/s: 10.2, 10.3
Fix Version/s: 10.2.18, 10.3.10

Type: Bug Priority: Major
Reporter: Sergei Petrunia Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: None

Attachments: File my-local-fbmysql.cnf     File my1.cnf     File run-mdev17261.sh    
Issue Links:
Relates
relates to MDEV-13845 benchmark RocksDB engine Closed
relates to MDEV-18080 Run MyRocks benchmark: MariaDB vs Per... Open

 Description   

(Background: So I was directed to this performance comparison result:
https://twitter.com/thewebscaledba/status/1002604510991142912?s=12 They report MyRocks-in-MariaDB is slower than InnoDB-in-MySQL)

This is oltp-read-only, the data is on OS cache. CPU-bound read-only workloads are not a MyRocks strong point.

Still, I got an aws ec2 c5.4xlarge (16 VCPU, 32G RAM) and ran tests on current MariaDB 10.3. Non-default settings:

log_bin=1
sync_binlog=1
binlog_format=row
max_connections=500

sysbench /usr/share/sysbench/oltp_point_select.lua --table-size=5000000 
--threads=1 --rand-type=uniform --db-driver=mysql --mysql-socket=/tmp/mysql.sock 
--mysql-user=root --mysql_storage_engine=$engine    prepare

sysbench /usr/share/sysbench/oltp_point_select.lua --table-size=5000000 --threads=$threads 
--time=180 --rand-type=uniform --db-driver=mysql --mysql-socket=/tmp/mysql.sock 
--mysql-user=root run

The result is not good:

InnoDB:
n_threads   TPS (QPS is the same)
100 181,984
150 182,268
200 178,739

RocksDB
n_threads   TPS (QPS is the same)
100 7,771
150 5,979
200 4,856

The difference is huge!



 Comments   
Comment by Sergei Petrunia [ 2018-09-21 ]

If I manually load the data into MyRocks using bulk loading (so that SST files end up on L6, and all other levels are empty, so there's no read amplification), then... the result is nearly the same.

Comment by Sergei Petrunia [ 2018-09-21 ]

Another odd observation:

  • during InnoDB run, all CPUs are busy. the disk is idle.
  • during MyRocks run, the typical picture is like this:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.44    0.00    7.95    0.45    0.00   88.17
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme0n1           0.00     0.00    0.00 6121.00     0.00     0.00     0.00     0.15    0.02    0.00    0.02   0.02  14.70

why is a read-only workload generating 6K writes per second??

Comment by Sergei Petrunia [ 2018-09-21 ]

Typical stack trace

(gdb)  thread 2
[Switching to thread 2 (Thread 0x7f870585d700 (LWP 2094))]
#0  0x00007f876ae04995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) wher
#0  0x00007f876ae04995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f87653fa02d in rocksdb::port::CondVar::Wait (this=this@entry=0x559182f793c8)
    at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/port/port_posix.cc:91
#2  0x00007f87653456a5 in rocksdb::InstrumentedCondVar::WaitInternal (this=this@entry=0x559182f793c8)
    at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/monitoring/instrumented_mutex.cc:48
#3  0x00007f8765345746 in rocksdb::InstrumentedCondVar::Wait (this=this@entry=0x559182f793c8)
    at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/monitoring/instrumented_mutex.cc:41
#4  0x00007f87652872ee in rocksdb::DBImpl::SyncWAL (this=0x559182f78de0) at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/db/db_impl.cc:797
#5  0x00007f8765280e34 in rocksdb::DBImpl::FlushWAL (this=0x559182f78de0, sync=<optimized out>)
    at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/db/db_impl.cc:780
#6  0x00007f87653e0434 in rocksdb::StackableDB::FlushWAL (this=<optimized out>, sync=<optimized out>)
    at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/include/rocksdb/utilities/stackable_db.h:286
#7  0x00007f876522433d in myrocks::rocksdb_commit (hton=<optimized out>, thd=0x7f86b40008c8, commit_tx=<optimized out>)
    at /home/centos/mariadb-10.3/storage/rocksdb/ha_rocksdb.cc:3763
#8  0x000055918002235d in commit_one_phase_2 (thd=thd@entry=0x7f86b40008c8, all=false, is_real_trans=true, trans=<optimized out>, trans=<optimized out>)
    at /home/centos/mariadb-10.3/sql/handler.cc:1614
#9  0x0000559180023dad in ha_commit_one_phase (thd=thd@entry=0x7f86b40008c8, all=all@entry=false) at /home/centos/mariadb-10.3/sql/handler.cc:1594
#10 0x0000559180024584 in ha_commit_trans (thd=thd@entry=0x7f86b40008c8, all=all@entry=false) at /home/centos/mariadb-10.3/sql/handler.cc:1456
#11 0x000055917ff3cd70 in trans_commit_stmt (thd=thd@entry=0x7f86b40008c8) at /home/centos/mariadb-10.3/sql/transaction.cc:520
#12 0x000055917fe5efc1 in mysql_execute_command (thd=<optimized out>) at /home/centos/mariadb-10.3/sql/sql_parse.cc:6359
#13 0x000055917fe7945f in Prepared_statement::execute (this=this@entry=0x7f86b407f818, expanded_query=expanded_query@entry=0x7f870585c390, 
    open_cursor=open_cursor@entry=false) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:4764
#14 0x000055917fe79576 in Prepared_statement::execute_loop (this=0x7f86b407f818, expanded_query=0x7f870585c390, open_cursor=<optimized out>, 
    packet=<optimized out>, packet_end=<optimized out>) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:4192
#15 0x000055917fe7a151 in mysql_stmt_execute_common (thd=thd@entry=0x7f86b40008c8, stmt_id=1, 
    packet=packet@entry=0x7f86b44dbc62 "def\006sbtest\asbtest1\asbtest1\001c\001c\f\b", 
    packet_end=packet_end@entry=0x7f86b44dbc6c "\asbtest1\asbtest1\001c\001c\f\b", cursor_flags=0, bulk_op=bulk_op@entry=false, 
    read_types=read_types@entry=false) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:3192
#16 0x000055917fe7a1f3 in mysqld_stmt_execute (thd=thd@entry=0x7f86b40008c8, packet_arg=packet_arg@entry=0x7f86b44dbc59 "", 
    packet_length=packet_length@entry=19) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:3090
#17 0x000055917fe6a37f in dispatch_command (command=command@entry=COM_STMT_EXECUTE, thd=thd@entry=0x7f86b40008c8, packet=packet@entry=0x7f86b44dbc59 "", 
    packet_length=packet_length@entry=19, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false)
    at /home/centos/mariadb-10.3/sql/sql_parse.cc:1792
#18 0x000055917fe6c29e in do_command (thd=0x7f86b40008c8) at /home/centos/mariadb-10.3/sql/sql_parse.cc:1395
#19 0x000055917ff3123c in do_handle_one_connection (connect=connect@entry=0x559183ad9328) at /home/centos/mariadb-10.3/sql/sql_connect.cc:1402
#20 0x000055917ff31354 in handle_one_connection (arg=0x559183ad9328) at /home/centos/mariadb-10.3/sql/sql_connect.cc:1308
#21 0x00007f876ae00e25 in start_thread () from /lib64/libpthread.so.0
#22 0x00007f8768dbabad in clone () from /lib64/libc.so.6

#13 0x000055917fe7945f in Prepared_statement::execute (this=this@entry=0x7f86b407f818, expanded_query=expanded_query@entry=0x7f870585c390, 
    open_cursor=open_cursor@entry=false) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:4764
4764          error= mysql_execute_command(thd);
(gdb) p thd
$4 = (THD *) 0x7f86b40008c8
(gdb) p thd->query_string
$5 = {string = {str = 0x7f86b427ef98 "SELECT c FROM sbtest1 WHERE id=?", length = 32}, cs = 0x559180e218e0 <my_charset_latin1>}
(gdb) detach

Comment by Sergei Petrunia [ 2018-09-21 ]

Added this to my.cnf (the second line is actually irrelevant) and restarted

rocksdb_flush_log_at_trx_commit=0
rocksdb_override_cf_options='cf1={block_based_table_factory={filter_policy=bloomfilter:10:false;whole_key_filtering=0;};prefix_extractor=capped:8};'

Running on bulk-loaded data:

RocksDB
n_threads   TPS (QPS is the same)
100  225,305
150  224,000
200  219,875

Comment by Sergei Petrunia [ 2018-09-21 ]

Trying current FB/MySQL (sync_binlog=1, binlog-format=row)

n_threads TPS (QPS is the same)
100 260,244
150 258,392

Comment by Sergei Petrunia [ 2018-09-21 ]

The reason is a bug in group-commit-with-binlog code. We are flushing the WAL, even when the transaction didn't make any changes.
(MyRocks has "if there were no changes commit should be a rollback" in myrocks::Rdb_transaction::commit. ) The problem is the FlushWal() call made from rocksdb_commit.

Comment by Sergei Petrunia [ 2018-09-23 ]

MariaDB, After the patch

 n_threads TPS (QPS is the same)
100 223,913
150 221,107
200 217,329

That is, it is better than InnoDB for me.

Generated at Thu Feb 08 08:35:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.