Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17261

sysbench oltp read only too slow for MyRocks.

Details

    Description

      (Background: So I was directed to this performance comparison result:
      https://twitter.com/thewebscaledba/status/1002604510991142912?s=12 They report MyRocks-in-MariaDB is slower than InnoDB-in-MySQL)

      This is oltp-read-only, the data is on OS cache. CPU-bound read-only workloads are not a MyRocks strong point.

      Still, I got an aws ec2 c5.4xlarge (16 VCPU, 32G RAM) and ran tests on current MariaDB 10.3. Non-default settings:

      log_bin=1
      sync_binlog=1
      binlog_format=row
      max_connections=500
      

      sysbench /usr/share/sysbench/oltp_point_select.lua --table-size=5000000 
      --threads=1 --rand-type=uniform --db-driver=mysql --mysql-socket=/tmp/mysql.sock 
      --mysql-user=root --mysql_storage_engine=$engine    prepare
      

      sysbench /usr/share/sysbench/oltp_point_select.lua --table-size=5000000 --threads=$threads 
      --time=180 --rand-type=uniform --db-driver=mysql --mysql-socket=/tmp/mysql.sock 
      --mysql-user=root run
      

      The result is not good:

      InnoDB:
      n_threads   TPS (QPS is the same)
      100 181,984
      150 182,268
      200 178,739
      

      RocksDB
      n_threads   TPS (QPS is the same)
      100 7,771
      150 5,979
      200 4,856
      

      The difference is huge!

      Attachments

        1. my1.cnf
          0.4 kB
        2. my-local-fbmysql.cnf
          0.3 kB
        3. run-mdev17261.sh
          1 kB

        Issue Links

          Activity

            If I manually load the data into MyRocks using bulk loading (so that SST files end up on L6, and all other levels are empty, so there's no read amplification), then... the result is nearly the same.

            psergei Sergei Petrunia added a comment - If I manually load the data into MyRocks using bulk loading (so that SST files end up on L6, and all other levels are empty, so there's no read amplification), then... the result is nearly the same.

            Another odd observation:

            • during InnoDB run, all CPUs are busy. the disk is idle.
            • during MyRocks run, the typical picture is like this:

            avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                       3.44    0.00    7.95    0.45    0.00   88.17
             
            Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
            nvme0n1           0.00     0.00    0.00 6121.00     0.00     0.00     0.00     0.15    0.02    0.00    0.02   0.02  14.70
            

            why is a read-only workload generating 6K writes per second??

            psergei Sergei Petrunia added a comment - Another odd observation: during InnoDB run, all CPUs are busy. the disk is idle. during MyRocks run, the typical picture is like this: avg-cpu: %user %nice %system %iowait %steal %idle 3.44 0.00 7.95 0.45 0.00 88.17   Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util nvme0n1 0.00 0.00 0.00 6121.00 0.00 0.00 0.00 0.15 0.02 0.00 0.02 0.02 14.70 why is a read-only workload generating 6K writes per second??

            Typical stack trace

            (gdb)  thread 2
            [Switching to thread 2 (Thread 0x7f870585d700 (LWP 2094))]
            #0  0x00007f876ae04995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
            (gdb) wher
            #0  0x00007f876ae04995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
            #1  0x00007f87653fa02d in rocksdb::port::CondVar::Wait (this=this@entry=0x559182f793c8)
                at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/port/port_posix.cc:91
            #2  0x00007f87653456a5 in rocksdb::InstrumentedCondVar::WaitInternal (this=this@entry=0x559182f793c8)
                at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/monitoring/instrumented_mutex.cc:48
            #3  0x00007f8765345746 in rocksdb::InstrumentedCondVar::Wait (this=this@entry=0x559182f793c8)
                at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/monitoring/instrumented_mutex.cc:41
            #4  0x00007f87652872ee in rocksdb::DBImpl::SyncWAL (this=0x559182f78de0) at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/db/db_impl.cc:797
            #5  0x00007f8765280e34 in rocksdb::DBImpl::FlushWAL (this=0x559182f78de0, sync=<optimized out>)
                at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/db/db_impl.cc:780
            #6  0x00007f87653e0434 in rocksdb::StackableDB::FlushWAL (this=<optimized out>, sync=<optimized out>)
                at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/include/rocksdb/utilities/stackable_db.h:286
            #7  0x00007f876522433d in myrocks::rocksdb_commit (hton=<optimized out>, thd=0x7f86b40008c8, commit_tx=<optimized out>)
                at /home/centos/mariadb-10.3/storage/rocksdb/ha_rocksdb.cc:3763
            #8  0x000055918002235d in commit_one_phase_2 (thd=thd@entry=0x7f86b40008c8, all=false, is_real_trans=true, trans=<optimized out>, trans=<optimized out>)
                at /home/centos/mariadb-10.3/sql/handler.cc:1614
            #9  0x0000559180023dad in ha_commit_one_phase (thd=thd@entry=0x7f86b40008c8, all=all@entry=false) at /home/centos/mariadb-10.3/sql/handler.cc:1594
            #10 0x0000559180024584 in ha_commit_trans (thd=thd@entry=0x7f86b40008c8, all=all@entry=false) at /home/centos/mariadb-10.3/sql/handler.cc:1456
            #11 0x000055917ff3cd70 in trans_commit_stmt (thd=thd@entry=0x7f86b40008c8) at /home/centos/mariadb-10.3/sql/transaction.cc:520
            #12 0x000055917fe5efc1 in mysql_execute_command (thd=<optimized out>) at /home/centos/mariadb-10.3/sql/sql_parse.cc:6359
            #13 0x000055917fe7945f in Prepared_statement::execute (this=this@entry=0x7f86b407f818, expanded_query=expanded_query@entry=0x7f870585c390, 
                open_cursor=open_cursor@entry=false) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:4764
            #14 0x000055917fe79576 in Prepared_statement::execute_loop (this=0x7f86b407f818, expanded_query=0x7f870585c390, open_cursor=<optimized out>, 
                packet=<optimized out>, packet_end=<optimized out>) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:4192
            #15 0x000055917fe7a151 in mysql_stmt_execute_common (thd=thd@entry=0x7f86b40008c8, stmt_id=1, 
                packet=packet@entry=0x7f86b44dbc62 "def\006sbtest\asbtest1\asbtest1\001c\001c\f\b", 
                packet_end=packet_end@entry=0x7f86b44dbc6c "\asbtest1\asbtest1\001c\001c\f\b", cursor_flags=0, bulk_op=bulk_op@entry=false, 
                read_types=read_types@entry=false) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:3192
            #16 0x000055917fe7a1f3 in mysqld_stmt_execute (thd=thd@entry=0x7f86b40008c8, packet_arg=packet_arg@entry=0x7f86b44dbc59 "", 
                packet_length=packet_length@entry=19) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:3090
            #17 0x000055917fe6a37f in dispatch_command (command=command@entry=COM_STMT_EXECUTE, thd=thd@entry=0x7f86b40008c8, packet=packet@entry=0x7f86b44dbc59 "", 
                packet_length=packet_length@entry=19, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false)
                at /home/centos/mariadb-10.3/sql/sql_parse.cc:1792
            #18 0x000055917fe6c29e in do_command (thd=0x7f86b40008c8) at /home/centos/mariadb-10.3/sql/sql_parse.cc:1395
            #19 0x000055917ff3123c in do_handle_one_connection (connect=connect@entry=0x559183ad9328) at /home/centos/mariadb-10.3/sql/sql_connect.cc:1402
            #20 0x000055917ff31354 in handle_one_connection (arg=0x559183ad9328) at /home/centos/mariadb-10.3/sql/sql_connect.cc:1308
            #21 0x00007f876ae00e25 in start_thread () from /lib64/libpthread.so.0
            #22 0x00007f8768dbabad in clone () from /lib64/libc.so.6
            

            #13 0x000055917fe7945f in Prepared_statement::execute (this=this@entry=0x7f86b407f818, expanded_query=expanded_query@entry=0x7f870585c390, 
                open_cursor=open_cursor@entry=false) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:4764
            4764          error= mysql_execute_command(thd);
            (gdb) p thd
            $4 = (THD *) 0x7f86b40008c8
            (gdb) p thd->query_string
            $5 = {string = {str = 0x7f86b427ef98 "SELECT c FROM sbtest1 WHERE id=?", length = 32}, cs = 0x559180e218e0 <my_charset_latin1>}
            (gdb) detach
            

            psergei Sergei Petrunia added a comment - Typical stack trace (gdb) thread 2 [Switching to thread 2 (Thread 0x7f870585d700 (LWP 2094))] #0 0x00007f876ae04995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) wher #0 0x00007f876ae04995 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f87653fa02d in rocksdb::port::CondVar::Wait (this=this@entry=0x559182f793c8) at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/port/port_posix.cc:91 #2 0x00007f87653456a5 in rocksdb::InstrumentedCondVar::WaitInternal (this=this@entry=0x559182f793c8) at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/monitoring/instrumented_mutex.cc:48 #3 0x00007f8765345746 in rocksdb::InstrumentedCondVar::Wait (this=this@entry=0x559182f793c8) at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/monitoring/instrumented_mutex.cc:41 #4 0x00007f87652872ee in rocksdb::DBImpl::SyncWAL (this=0x559182f78de0) at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/db/db_impl.cc:797 #5 0x00007f8765280e34 in rocksdb::DBImpl::FlushWAL (this=0x559182f78de0, sync=<optimized out>) at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/db/db_impl.cc:780 #6 0x00007f87653e0434 in rocksdb::StackableDB::FlushWAL (this=<optimized out>, sync=<optimized out>) at /home/centos/mariadb-10.3/storage/rocksdb/rocksdb/include/rocksdb/utilities/stackable_db.h:286 #7 0x00007f876522433d in myrocks::rocksdb_commit (hton=<optimized out>, thd=0x7f86b40008c8, commit_tx=<optimized out>) at /home/centos/mariadb-10.3/storage/rocksdb/ha_rocksdb.cc:3763 #8 0x000055918002235d in commit_one_phase_2 (thd=thd@entry=0x7f86b40008c8, all=false, is_real_trans=true, trans=<optimized out>, trans=<optimized out>) at /home/centos/mariadb-10.3/sql/handler.cc:1614 #9 0x0000559180023dad in ha_commit_one_phase (thd=thd@entry=0x7f86b40008c8, all=all@entry=false) at /home/centos/mariadb-10.3/sql/handler.cc:1594 #10 0x0000559180024584 in ha_commit_trans (thd=thd@entry=0x7f86b40008c8, all=all@entry=false) at /home/centos/mariadb-10.3/sql/handler.cc:1456 #11 0x000055917ff3cd70 in trans_commit_stmt (thd=thd@entry=0x7f86b40008c8) at /home/centos/mariadb-10.3/sql/transaction.cc:520 #12 0x000055917fe5efc1 in mysql_execute_command (thd=<optimized out>) at /home/centos/mariadb-10.3/sql/sql_parse.cc:6359 #13 0x000055917fe7945f in Prepared_statement::execute (this=this@entry=0x7f86b407f818, expanded_query=expanded_query@entry=0x7f870585c390, open_cursor=open_cursor@entry=false) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:4764 #14 0x000055917fe79576 in Prepared_statement::execute_loop (this=0x7f86b407f818, expanded_query=0x7f870585c390, open_cursor=<optimized out>, packet=<optimized out>, packet_end=<optimized out>) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:4192 #15 0x000055917fe7a151 in mysql_stmt_execute_common (thd=thd@entry=0x7f86b40008c8, stmt_id=1, packet=packet@entry=0x7f86b44dbc62 "def\006sbtest\asbtest1\asbtest1\001c\001c\f\b", packet_end=packet_end@entry=0x7f86b44dbc6c "\asbtest1\asbtest1\001c\001c\f\b", cursor_flags=0, bulk_op=bulk_op@entry=false, read_types=read_types@entry=false) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:3192 #16 0x000055917fe7a1f3 in mysqld_stmt_execute (thd=thd@entry=0x7f86b40008c8, packet_arg=packet_arg@entry=0x7f86b44dbc59 "", packet_length=packet_length@entry=19) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:3090 #17 0x000055917fe6a37f in dispatch_command (command=command@entry=COM_STMT_EXECUTE, thd=thd@entry=0x7f86b40008c8, packet=packet@entry=0x7f86b44dbc59 "", packet_length=packet_length@entry=19, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/centos/mariadb-10.3/sql/sql_parse.cc:1792 #18 0x000055917fe6c29e in do_command (thd=0x7f86b40008c8) at /home/centos/mariadb-10.3/sql/sql_parse.cc:1395 #19 0x000055917ff3123c in do_handle_one_connection (connect=connect@entry=0x559183ad9328) at /home/centos/mariadb-10.3/sql/sql_connect.cc:1402 #20 0x000055917ff31354 in handle_one_connection (arg=0x559183ad9328) at /home/centos/mariadb-10.3/sql/sql_connect.cc:1308 #21 0x00007f876ae00e25 in start_thread () from /lib64/libpthread.so.0 #22 0x00007f8768dbabad in clone () from /lib64/libc.so.6 #13 0x000055917fe7945f in Prepared_statement::execute (this=this@entry=0x7f86b407f818, expanded_query=expanded_query@entry=0x7f870585c390, open_cursor=open_cursor@entry=false) at /home/centos/mariadb-10.3/sql/sql_prepare.cc:4764 4764 error= mysql_execute_command(thd); (gdb) p thd $4 = (THD *) 0x7f86b40008c8 (gdb) p thd->query_string $5 = {string = {str = 0x7f86b427ef98 "SELECT c FROM sbtest1 WHERE id=?", length = 32}, cs = 0x559180e218e0 <my_charset_latin1>} (gdb) detach
            psergei Sergei Petrunia added a comment - - edited

            Added this to my.cnf (the second line is actually irrelevant) and restarted

            rocksdb_flush_log_at_trx_commit=0
            rocksdb_override_cf_options='cf1={block_based_table_factory={filter_policy=bloomfilter:10:false;whole_key_filtering=0;};prefix_extractor=capped:8};'
            

            Running on bulk-loaded data:

            RocksDB
            n_threads   TPS (QPS is the same)
            100  225,305
            150  224,000
            200  219,875
            

            psergei Sergei Petrunia added a comment - - edited Added this to my.cnf (the second line is actually irrelevant) and restarted rocksdb_flush_log_at_trx_commit=0 rocksdb_override_cf_options='cf1={block_based_table_factory={filter_policy=bloomfilter:10:false;whole_key_filtering=0;};prefix_extractor=capped:8};' Running on bulk-loaded data: RocksDB n_threads TPS (QPS is the same) 100 225,305 150 224,000 200 219,875

            Trying current FB/MySQL (sync_binlog=1, binlog-format=row)

            n_threads TPS (QPS is the same)
            100 260,244
            150 258,392
            

            psergei Sergei Petrunia added a comment - Trying current FB/MySQL (sync_binlog=1, binlog-format=row) n_threads TPS (QPS is the same) 100 260,244 150 258,392

            The reason is a bug in group-commit-with-binlog code. We are flushing the WAL, even when the transaction didn't make any changes.
            (MyRocks has "if there were no changes commit should be a rollback" in myrocks::Rdb_transaction::commit. ) The problem is the FlushWal() call made from rocksdb_commit.

            psergei Sergei Petrunia added a comment - The reason is a bug in group-commit-with-binlog code. We are flushing the WAL, even when the transaction didn't make any changes. (MyRocks has "if there were no changes commit should be a rollback" in myrocks::Rdb_transaction::commit . ) The problem is the FlushWal() call made from rocksdb_commit.
            psergei Sergei Petrunia added a comment - - edited

            MariaDB, After the patch

             n_threads TPS (QPS is the same)
            100 223,913
            150 221,107
            200 217,329
            

            That is, it is better than InnoDB for me.

            psergei Sergei Petrunia added a comment - - edited MariaDB, After the patch n_threads TPS (QPS is the same) 100 223,913 150 221,107 200 217,329 That is, it is better than InnoDB for me.

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.