MariaDB Server / MDEV-18080

Run MyRocks benchmark: MariaDB vs Percona Server vs FB/MySQL

Details

    Description

      I used an AWS c5.2xlarge instance with a 50 GB EBS SSD volume with 150 IOPS. The scripts to set up the servers and sysbench and to run the benchmark are attached (one only needs to edit my.cnf and start the servers).

      Servers:

      • MariaDB 10.3 current, revision 2999492c3278528ceb9f37bd6cfca5ca5295ef9a
      • Percona Server 5.7 current, revision 6604e02a4ae73a8d542ba70e71ad91f2af4514cb
      • Facebook/MySQL-5.6 current, revision 5e398eab68dbf58312ab1544f0e42084552967e1

      Settings that were added to my.cnf:
      MariaDB:

      log_bin=1
      rocksdb_block_cache_size=2G
      binlog_format=row
      sync_binlog=1
      

      Percona Server:

      rocksdb_block_cache_size=2G
      log_bin=1
      

      Facebook/MySQL 5.6

      log-bin=pslp
      binlog-format=row
      sync_binlog=1
      rocksdb_block_cache_size=2G
      

      Sysbench prepare and run commands:

      sysbench /usr/share/sysbench/oltp_update_non_index.lua \
        --table-size=1000000 \
        --threads=$threads \
        --time=60 \
        --rand-type=uniform \
        --db-driver=mysql \
        --mysql-socket=/tmp/mysql20.sock \
        --mysql-user=root \
        --mysql_storage_engine=$engine \
        prepare
      

      sysbench /usr/share/sysbench/oltp_update_non_index.lua \
        --table-size=1000000 \
        --threads=$threads \
        --time=60 \
        --rand-type=uniform \
        --db-driver=mysql \
        --mysql-socket=/tmp/mysql20.sock \
        --mysql-user=root \
        --mysql_storage_engine=$engine \
        run 
      

      Results:

      Percona 5.7

      n_threads, qps
       20,  4117.79  
       50,  9487.79 
       80, 13952.85
      100, 16852.61
      150, 21942.59
      

      MariaDB 10.3

      n_threads, qps
       20,  3125.01
       50,  7494.81
       80, 11821.79
      100, 14749.30
      150, 20313.95
      

      FB/MySQL-5.6

      n_threads, qps
       20,  3291.02
       50,  7711.92
       80, 11394.20
      100, 13300.78
      150, 18795.42
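To make the gap easier to read, the small C++ program below expresses the MariaDB 10.3 and FB/MySQL-5.6 figures as a fraction of Percona 5.7 at each thread count. The numbers are copied from the tables above; the code only does the division.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>

// qps values copied from the results above (threads: 20, 50, 80, 100, 150).
constexpr std::array<int, 5> kThreads = {20, 50, 80, 100, 150};
constexpr std::array<double, 5> kPercona57 = {4117.79, 9487.79, 13952.85, 16852.61, 21942.59};
constexpr std::array<double, 5> kMariaDB103 = {3125.01, 7494.81, 11821.79, 14749.30, 20313.95};
constexpr std::array<double, 5> kFbMySQL56 = {3291.02, 7711.92, 11394.20, 13300.78, 18795.42};

// Throughput of `server` as a fraction of Percona 5.7 at row i.
double relative_to_percona(const std::array<double, 5>& server, std::size_t i) {
    assert(i < server.size());
    return server[i] / kPercona57[i];
}

// Prints one line per thread count, e.g. " 20 threads: MariaDB 75.9%, FB/MySQL 79.9%".
void print_relative_table() {
    for (std::size_t i = 0; i < kThreads.size(); ++i) {
        std::printf("%3d threads: MariaDB %.1f%%, FB/MySQL %.1f%%\n", kThreads[i],
                    100.0 * relative_to_percona(kMariaDB103, i),
                    100.0 * relative_to_percona(kFbMySQL56, i));
    }
}
```

MariaDB 10.3 reaches about 76% of Percona's throughput at 20 threads, and the gap narrows to about 93% at 150 threads.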
      

      Attachments

        1. image-2018-12-26-12-14-32-900.png (21 kB)
        2. out-mariadb-10.3-rocksdb.log (11 kB)
        3. out-percona-5.7-rocksdb.log (11 kB)
        4. run-sysbench.sh (2 kB)
        5. screenshot-1.png (19 kB)
        6. screenshot-2.png (17 kB)
        7. screenshot-3.png (35 kB)
        8. setup-mariadb-current.sh (1 kB)
        9. setup-os-ubuntu.sh (0.5 kB)
        10. setup-percona-current.sh (1 kB)
        11. setup-sysbench-ubuntu.sh (0.1 kB)


          Activity

            psergei Sergei Petrunia added a comment (edited)

            Trying on a patched version:

            MariaDB-10.3-patch1

             20,  6483.17
             50, 14635.17
             80, 19708.48
            100, 23949.44
            150, 32377.06
            

            This is on par with other branches.

            psergei Sergei Petrunia added a comment (edited)

            The code that is causing slowdown here was introduced in MDEV-15372.

            That MDEV fixed the performance of the multi-threaded slave (its non-XA variant). The slave must commit transactions in the same order as the master did. The idea was to let the transactions run in parallel, but then commit them (call rocksdb_commit) in the master's order.

            This caused the commits to be serialized. The way to un-serialize them was to mimic InnoDB:

              tx->set_sync(false);
              tx->commit(); // this establishes the commit order. It is serialized but it does not flush

              // this notifies the SQL layer that subsequent transactions can run:
              thd_wakeup_subsequent_commits(thd, 0);

              // this makes the changes persistent:
              rocksdb::Status s = rdb->FlushWAL(true);
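To see why this ordering lets transactions overlap, the sequence above can be modelled with standard C++ threads. The sketch is a toy, not MyRocks code: OrderedCommitter and run_workers() are hypothetical names, the commit step only appends to a list, and the flush step only advances a watermark instead of doing I/O. What it shows is that the ordered part covers only the cheap commit, while the slow flush happens after the next transaction has been released.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Toy model (hypothetical, not MyRocks code). Each worker carries the sequence
// number the master assigned to its transaction.
class OrderedCommitter {
public:
    // Block until every earlier sequence number has committed.
    void wait_for_turn(int seq) {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [&] { return next_seq_ == seq; });
    }

    // In-memory commit; plays the role of set_sync(false) + commit().
    void commit_in_order(int seq) {
        std::lock_guard<std::mutex> lock(mu_);
        assert(next_seq_ == seq);
        commit_log_.push_back(seq);
    }

    // Release the next transaction; plays the role of thd_wakeup_subsequent_commits().
    void wake_up_next() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            ++next_seq_;
        }
        cv_.notify_all();
    }

    // Durability step; plays the role of FlushWAL(true). It runs outside the
    // ordered section, so a worker that is flushing no longer blocks the next
    // worker's commit. Here it only advances a watermark instead of doing I/O.
    void flush() {
        std::size_t committed;
        {
            std::lock_guard<std::mutex> lock(mu_);
            committed = commit_log_.size();
        }
        std::lock_guard<std::mutex> lock(flush_mu_);
        durable_upto_ = std::max(durable_upto_, committed);
    }

    std::vector<int> commit_log() const {
        std::lock_guard<std::mutex> lock(mu_);
        return commit_log_;
    }

    std::size_t durable_upto() const {
        std::lock_guard<std::mutex> lock(flush_mu_);
        return durable_upto_;
    }

private:
    mutable std::mutex mu_;
    mutable std::mutex flush_mu_;
    std::condition_variable cv_;
    int next_seq_ = 0;
    std::vector<int> commit_log_;
    std::size_t durable_upto_ = 0;
};

// Start n workers in reverse order; they still commit in sequence order.
std::vector<int> run_workers(int n, OrderedCommitter& oc) {
    std::vector<std::thread> workers;
    for (int seq = n - 1; seq >= 0; --seq) {
        workers.emplace_back([&oc, seq] {
            oc.wait_for_turn(seq);
            oc.commit_in_order(seq);
            oc.wake_up_next();
            oc.flush();
        });
    }
    for (auto& t : workers) t.join();
    return oc.commit_log();
}
```

Even though the workers are started in reverse order, the commit log ends up as 0, 1, 2, ... because each worker waits for its predecessor before committing.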
            

            psergei Sergei Petrunia added a comment (edited)

            I'm not sure why this change fixed the performance back then but is killing it now. Maybe something has changed inside RocksDB? (It looks like nothing has.)

            psergei Sergei Petrunia added a comment (edited)

            Disable this for non-slave threads

            An obvious thing to do is to disable the new code for non-slave threads (see THD::slave_thread).
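A minimal sketch of the proposed gating, assuming the decision is made per commit from THD::slave_thread. FakeThd, FakeTxn and the fake_* functions are hypothetical stand-ins that only record which calls would be made; this is not MyRocks code, and the non-slave branch assumes the pre-MDEV-15372 behaviour was a plain commit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins: they only record which calls were made.
struct FakeThd {
    bool slave_thread;  // mirrors THD::slave_thread
};

struct FakeTxn {
    std::vector<std::string>* trace;
    void set_sync(bool sync) { trace->push_back(sync ? "set_sync(true)" : "set_sync(false)"); }
    void commit() { trace->push_back("commit"); }
};

void fake_wakeup_subsequent_commits(std::vector<std::string>& trace) { trace.push_back("wakeup"); }
void fake_flush_wal(std::vector<std::string>& trace) { trace.push_back("FlushWAL(true)"); }

// Only replication slave threads take the MDEV-15372 path; all other threads
// fall back to a plain commit (assumed pre-MDEV-15372 behaviour).
void commit_sketch(const FakeThd& thd, FakeTxn& tx) {
    if (thd.slave_thread) {
        tx.set_sync(false);                         // ordered commit without a flush
        tx.commit();
        fake_wakeup_subsequent_commits(*tx.trace);  // let the next slave transaction commit
        fake_flush_wal(*tx.trace);                  // then make the changes durable
    } else {
        tx.commit();  // durability left to the usual sync settings
    }
}
```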

            Possible solutions for slave threads:

            Hook in RocksDB

            Add a hook somewhere inside RocksDB that calls thd_wakeup_subsequent_commits().
            Problems: there doesn't seem to be any existing hook for this, so adding one
            would require A) finding the right place and B) convincing RocksDB to accept
            a PR with the hook.
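To make the idea concrete, here is a toy write-ahead log with an optional callback that fires once a record's position in the log is fixed but before the log is synced. ToyWal and set_ordered_hook() are hypothetical names; this is not the RocksDB API, and, as noted above, RocksDB does not seem to offer such a hook.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy write-ahead log, NOT the RocksDB API. It shows where a hypothetical
// "order established" hook would fire: after a record's position in the log
// is fixed, but before the log is synced to disk.
class ToyWal {
public:
    using OrderedHook = std::function<void(std::size_t position)>;

    void set_ordered_hook(OrderedHook hook) { hook_ = std::move(hook); }

    std::size_t append(std::string record) {
        records_.push_back(std::move(record));
        const std::size_t position = records_.size() - 1;
        if (hook_) hook_(position);  // in MyRocks this could call thd_wakeup_subsequent_commits()
        return position;
    }

    void sync() { synced_ = records_.size(); }  // simulated fsync
    std::size_t synced() const { return synced_; }

private:
    std::vector<std::string> records_;
    std::size_t synced_ = 0;
    OrderedHook hook_;
};
```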

            Non-durable mode for the slave commits

            Transactions on the slave come from the binary log, so it is not an issue if
            some of them are lost in a crash. They can be replayed from the relay log.

            (We only need a guarantee that if transaction #N disappears, then all
            subsequent transactions disappear as well. I think we have this property:
            writes to RocksDB WAL are done sequentially. Failing to flush may truncate the
            WAL, but will not create "gaps" in it)
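The prefix argument can be checked with a toy model. The sketch below assumes transactions are numbered 1, 2, 3, ... in the order they are written to the WAL and that a crash keeps exactly the flushed prefix of the log; survive_crash(), no_gaps() and first_to_replay() are hypothetical names, and the model ignores RocksDB's real WAL format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: the WAL holds transaction numbers in the order they were written.
// A crash keeps only the part that was flushed, which is always a prefix.
std::vector<int> survive_crash(const std::vector<int>& wal, std::size_t flushed_records) {
    flushed_records = std::min(flushed_records, wal.size());
    return std::vector<int>(wal.begin(), wal.begin() + static_cast<std::ptrdiff_t>(flushed_records));
}

// True if the survivors are exactly transactions 1..k with nothing skipped,
// i.e. "if #N disappears, every later transaction disappears too".
bool no_gaps(const std::vector<int>& survivors) {
    for (std::size_t i = 0; i < survivors.size(); ++i)
        if (survivors[i] != static_cast<int>(i) + 1) return false;
    return true;
}

// The slave can resume replay from the relay log at the first lost transaction.
int first_to_replay(const std::vector<int>& survivors) {
    return static_cast<int>(survivors.size()) + 1;
}
```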

            One thing to check: when the slave thinks it has applied all events from
            a relay log file, it may remove that file. But what if the storage engine
            has not yet persisted the transactions from that file, on the assumption
            that they can be replayed from it? Can this situation happen, and if so, can
            it be prevented (e.g. by having MyRocks flush its WAL before a relay log
            file is removed)?
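One way to express the proposed safeguard is a check that runs before a relay log file is purged. In this sketch, EngineState, RelayLogFile and prepare_to_purge() are hypothetical names, and relay log positions are reduced to a single increasing transaction number; it only illustrates flushing the WAL before deleting a file whose transactions might not be durable yet.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping; none of these names exist in MariaDB.
struct EngineState {
    std::uint64_t committed = 0;  // last transaction committed (possibly only in memory)
    std::uint64_t durable = 0;    // last transaction known to be on disk
    int wal_flushes = 0;
    void flush_wal() { durable = committed; ++wal_flushes; }
};

struct RelayLogFile {
    std::uint64_t last_trx;  // last transaction contained in this relay log file
};

// A relay log file may be deleted only when every transaction it contains is
// durable in the engine; otherwise flush the WAL first so none can be lost.
bool prepare_to_purge(const RelayLogFile& file, EngineState& engine) {
    if (file.last_trx > engine.committed) return false;  // not fully applied yet
    if (engine.durable < file.last_trx) engine.flush_wal();
    return engine.durable >= file.last_trx;
}
```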

            Use XA-mode for slave threads.

            (TODO: this looked like a solution, but now that I'm trying to describe it,
            it's not obvious how to achieve both performance and safety.)


            psergei Sergei Petrunia added a comment

            Taking MariaDB 10.2 as the base (changeset 0623cc7c16c3280d1f81b9049e1561d1b4b6c1d0), I developed a patch that disables the MDEV-15372 code for non-slave threads. Results on a c5.2xlarge instance with log-bin off and all other settings at their defaults:

            n_threads, qps (MariaDB-10.2-cur), qps (MariaDB-10.2-patched)
             20,  648.16,  6195.19
             50,  630.57, 13620.29
             80,  594.41, 18855.85
            100,  599.48, 22311.91
            150,  670.35, 30937.94
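The size of the improvement can be read off the table with a short calculation. The figures below are copied from the table; the code just divides the patched qps by the unpatched qps at each thread count.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>

// qps values copied from the 10.2 table above (threads: 20, 50, 80, 100, 150).
constexpr std::array<int, 5> kThreads = {20, 50, 80, 100, 150};
constexpr std::array<double, 5> kCurrent = {648.16, 630.57, 594.41, 599.48, 670.35};
constexpr std::array<double, 5> kPatched = {6195.19, 13620.29, 18855.85, 22311.91, 30937.94};

// Speedup of the patched build over the unpatched one at row i.
double speedup(std::size_t i) {
    assert(i < kThreads.size());
    return kPatched[i] / kCurrent[i];
}

void print_speedups() {
    for (std::size_t i = 0; i < kThreads.size(); ++i)
        std::printf("%3d threads: %.1fx\n", kThreads[i], speedup(i));
}
```

The unpatched build stays between roughly 590 and 670 qps regardless of the thread count, which is consistent with commits being serialized; the patched build scales with threads, so the speedup grows from about 9.6x at 20 threads to about 46x at 150.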
            


            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia