Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-11900

Run benchmarks with sync-binlog before it gets enabled in 10.2 by default

Details

    Description

      We are about to enable sync-binlog by default in 10.2. Please check how it affects performance on some standard read-write load with binary log enabled. No need for accurate numbers, but we should have at least a rough picture.

      Attachments

        Activity

          elenst Elena Stepanova created issue -
          axel Axel Schwenke made changes -
          Field Original Value New Value
          Status Open [ 1 ] In Progress [ 3 ]
          axel Axel Schwenke made changes -
          Status In Progress [ 3 ] Stalled [ 10000 ]
          axel Axel Schwenke made changes -
          Attachment sync_vs_async_HDD.ods [ 43268 ]
          Attachment sync_vs_async_SSD.ods [ 43269 ]
          axel Axel Schwenke added a comment -

          I've run a series of OLTP benchmarks with varying write intensity (10, 22, 80 and 100% writes). I tested 3 different log settings:

          1. innodb_flush_log_at_trx_commit=0, sync-binlog=0
          2. innodb_flush_log_at_trx_commit=1, sync-binlog=0
          3. innodb_flush_log_at_trx_commit=1, sync-binlog=1

          Naturally there is a big difference between 1. and 2., but also between 2. and 3. Here I see a performance drop between 10% and 50%. The effect is more pronounced when the datadir resides on a hard (rotating) disk. But even on a (fast) SSD the effect is clearly visible. For details see the attached spread sheets.

          I tested MariaDB 10.1 and 10.2 and also MySQL 5.7. The numbers are in the same ballpark for all 3 of them with some slight advantage for 10.2. This however is no reason to celebrate, because in absolute numbers 10.2 comes out at the bottom.

          I also spotted an irregularity here: it seems that MariaDB 10.2 does two fsyncs per commit on the InnoDB redo log as soon as one sets innodb_flush_log_at_trx_commit=1 and enables the binlog (no matter if sync-binlog=0 or 1). This is someting to investigate in a separate task.

          I also checked the bytes written per transaction to the redo log and the binlog. There is a moderate increase of ~6 bytes per binlog event in 10.2 compared to 10.1. No numbers here for MySQL (it does not have a status variable for binlog writes).

          axel Axel Schwenke added a comment - I've run a series of OLTP benchmarks with varying write intensity (10, 22, 80 and 100% writes). I tested 3 different log settings: innodb_flush_log_at_trx_commit=0, sync-binlog=0 innodb_flush_log_at_trx_commit=1, sync-binlog=0 innodb_flush_log_at_trx_commit=1, sync-binlog=1 Naturally there is a big difference between 1. and 2., but also between 2. and 3. Here I see a performance drop between 10% and 50%. The effect is more pronounced when the datadir resides on a hard (rotating) disk. But even on a (fast) SSD the effect is clearly visible. For details see the attached spread sheets. I tested MariaDB 10.1 and 10.2 and also MySQL 5.7. The numbers are in the same ballpark for all 3 of them with some slight advantage for 10.2. This however is no reason to celebrate, because in absolute numbers 10.2 comes out at the bottom. I also spotted an irregularity here: it seems that MariaDB 10.2 does two fsyncs per commit on the InnoDB redo log as soon as one sets innodb_flush_log_at_trx_commit=1 and enables the binlog (no matter if sync-binlog=0 or 1). This is someting to investigate in a separate task. I also checked the bytes written per transaction to the redo log and the binlog. There is a moderate increase of ~6 bytes per binlog event in 10.2 compared to 10.1. No numbers here for MySQL (it does not have a status variable for binlog writes).

          I'd say 50% is not too bad, that's something that I would initially expect, my results were much worse (but they aren't reliable).

          I don't see it mentioned anywhere, did you run with the default statement binlog_format, or were there different ones? I wonder if it might make a difference.
          Our new default is expected to be mixed, which can easily switch to row.

          elenst Elena Stepanova added a comment - I'd say 50% is not too bad, that's something that I would initially expect, my results were much worse (but they aren't reliable). I don't see it mentioned anywhere, did you run with the default statement binlog_format, or were there different ones? I wonder if it might make a difference. Our new default is expected to be mixed, which can easily switch to row.
          serg Sergei Golubchik made changes -
          Fix Version/s N/A [ 14700 ]
          Fix Version/s 10.2 [ 14601 ]
          Resolution Fixed [ 1 ]
          Status Stalled [ 10000 ] Closed [ 6 ]
          psergei Sergei Petrunia made changes -
          Comment [ Looking at how "Prepare" operation for the storage engine is persisted on disk.

          In MariaDB, each Prepare operation *tries to flush the logs itself* :

          {noformat}
            #0 log_write_up_to (lsn=1908254, flush_to_disk=true) at /home/psergey/dev-git/10.2-mariarocks/storage/innobase/log/log0log.cc:1234
            #1 0x00005555562f5c08 in trx_flush_log_if_needed_low (lsn=1908254) at /home/psergey/dev-git/10.2-mariarocks/storage/innobase/trx/trx0trx.cc:1600
            #2 0x00005555562f5c51 in trx_flush_log_if_needed (lsn=1908254, trx=0x7fffe64c1878) at /home/psergey/dev-git/10.2-mariarocks/storage/innobase/trx/trx0trx.cc:1622
            #3 0x00005555562f8fd2 in trx_prepare (trx=0x7fffe64c1878) at /home/psergey/dev-git/10.2-mariarocks/storage/innobase/trx/trx0trx.cc:2774
            #4 0x00005555562f9084 in trx_prepare_for_mysql (trx=0x7fffe64c1878) at /home/psergey/dev-git/10.2-mariarocks/storage/innobase/trx/trx0trx.cc:2796
            #5 0x000055555610cc78 in innobase_xa_prepare (hton=0x5555579371c0, thd=0x7fff78000b00, prepare_trx=false) at /home/psergey/dev-git/10.2-mariarocks/storage/innobase/handler/ha_innodb.cc:17869
            #6 0x0000555555d7923c in prepare_or_error (ht=0x5555579371c0, thd=0x7fff78000b00, all=false) at /home/psergey/dev-git/10.2-mariarocks/sql/handler.cc:1146
            #7 0x0000555555d79adf in ha_commit_trans (thd=0x7fff78000b00, all=false) at /home/psergey/dev-git/10.2-mariarocks/sql/handler.cc:1425
            #8 0x0000555555c60990 in trans_commit_stmt (thd=0x7fff78000b00) at /home/psergey/dev-git/10.2-mariarocks/sql/transaction.cc:510
            #9 0x0000555555b0c3ef in mysql_execute_command (thd=0x7fff78000b00) at /home/psergey/dev-git/10.2-mariarocks/sql/sql_parse.cc:6261
            #10 0x0000555555b10aa6 in mysql_parse (thd=0x7fff78000b00, rawbuf=0x7fff780111f8 "insert into t20 values (2)", length=26, parser_state=0x7ffff4463210, is_com_multi=false, is_next_command=false) at /home/psergey/dev-git/10.2-mariarocks/sql/sql_parse.cc:7886
          {noformat}

          Concurrent Prepare operations can get grouped (one Prepare will make a flush for many), this is achieved by relying on InnoDB's native group commit mechanism (which predates XA group commit):

          See this comment in trx_prepare:
          {noformat}
          The idea in InnoDB's group prepare is that a group of
          transactions gather behind a trx doing a physical disk write
          to log files, and when that physical write has been completed,
          one of those transactions does a write which prepares the whole
          group. Note that this group prepare will only bring benefit if
          there are > 2 users in the database. Then at least 2 users can
          gather behind one doing the physical log write to disk.
          {noformat}

          In fb/mysql-5.6, it works differently.

          InnoDB's Prepare calls thd_requested_durability() which returns HA_IGNORE_DURABILITY, and flushing is not done:

          {noformat}
          (gdb) wher
            #0 thd_requested_durability (thd=0x2bb4cd0) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/storage/innobase/handler/ha_innodb.cc:1830
            #1 0x000000000150c35a in trx_prepare (trx=0x7fffe0d01e60, async=0) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/storage/innobase/trx/trx0trx.cc:2365
            #2 0x000000000150c421 in trx_prepare_for_mysql (trx=0x7fffe0d01e60, async=0) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/storage/innobase/trx/trx0trx.cc:2409
            #3 0x000000000139bea8 in innobase_xa_prepare (hton=0x2949ad0, thd=0x2bb4cd0, prepare_trx=false, async=false) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/storage/innobase/handler/ha_innodb.cc:15170
            #4 0x0000000000d85aa5 in ha_prepare_low (thd=0x2bb4cd0, all=false, async=false) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/handler.cc:2264
            #5 0x0000000001237ee0 in MYSQL_BIN_LOG::prepare (this=0x2744000 <mysql_bin_log>, thd=0x2bb4cd0, all=false, async=false) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/binlog.cc:7054
            #6 0x0000000000d83c9c in ha_commit_trans (thd=0x2bb4cd0, all=false, async=false, ignore_global_read_lock=false) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/handler.cc:1540
            #7 0x000000000106f0a3 in trans_commit_stmt (thd=0x2bb4cd0, async=false) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/transaction.cc:452
            #8 0x0000000000f8cf3f in mysql_execute_command (thd=0x2bb4cd0, statement_start_time=0x7ffff430aaa8, post_parse=0x7ffff430ac00) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/sql_parse.cc:6254
            #9 0x0000000000f90913 in mysql_parse (thd=0x2bb4cd0, rawbuf=0x7fffa4005680 "insert into t21 values (4)", length=26, parser_state=0x7ffff430b4b0, last_timer=0x7ffff430ac00, async_commit=0x7ffff430abcc "") at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/sql_parse.cc:7770

          (gdb) fini
            Run till exit from #0 thd_requested_durability (thd=0x2bb4cd0) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/storage/innobase/handler/ha_innodb.cc:1830
            0x000000000150c35a in trx_prepare (trx=0x7fffe0d01e60, async=0) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/storage/innobase/trx/trx0trx.cc:2365
            Value returned is $17 = HA_IGNORE_DURABILITY
          {noformat}

          and then the SQL layer makes a special call to make the effects of all Prepare transactions persistent:

          {noformat}
            Breakpoint 3, ha_flush_logs (db_type=0x0, engine_map=0x7fffa40086b0) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/handler.cc:2452
          (gdb) wher
            #0 ha_flush_logs (db_type=0x0, engine_map=0x7fffa40086b0) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/handler.cc:2452
            #1 0x0000000001238999 in MYSQL_BIN_LOG::process_flush_stage_queue (this=0x2744000 <mysql_bin_log>, total_bytes_var=0x7ffff4308468, rotate_var=0x7ffff430844f, out_queue_var=0x7ffff4308470, async=false) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/binlog.cc:7322
            #2 0x0000000001239b85 in MYSQL_BIN_LOG::ordered_commit (this=0x2744000 <mysql_bin_log>, thd=0x2bb4cd0, all=false, skip_commit=false, async=false) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/binlog.cc:7891
            #3 0x00000000012385fd in MYSQL_BIN_LOG::commit (this=0x2744000 <mysql_bin_log>, thd=0x2bb4cd0, all=false, async=false) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/binlog.cc:7223
            #4 0x0000000000d83cda in ha_commit_trans (thd=0x2bb4cd0, all=false, async=false, ignore_global_read_lock=false) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/handler.cc:1542
            #5 0x000000000106f0a3 in trans_commit_stmt (thd=0x2bb4cd0, async=false) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/transaction.cc:452
            #6 0x0000000000f8cf3f in mysql_execute_command (thd=0x2bb4cd0, statement_start_time=0x7ffff430aaa8, post_parse=0x7ffff430ac00) at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/sql_parse.cc:6254
            #7 0x0000000000f90913 in mysql_parse (thd=0x2bb4cd0, rawbuf=0x7fffa4005680 "insert into t21 values (4)", length=26, parser_state=0x7ffff430b4b0, last_timer=0x7ffff430ac00, async_commit=0x7ffff430abcc "") at /home/psergey/dev-git/mysql-5.6-rocksdb-look400/sql/sql_parse.cc:7770
          {noformat}
          ]
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 79351 ] MariaDB v4 [ 133104 ]

          People

            axel Axel Schwenke
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.