  MariaDB Server / MDEV-16589

default value for sync_binlog should be the safer value 1 instead of 0

Details

    • Type: New Feature
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • None
    • Component/s: Replication, Server
    • None

    Description

      The default value of sync_binlog is 0 in MariaDB, whereas Oracle changed it to 1 starting with MySQL 5.7.7. I think we should also change the default to 1, as running a master with sync_binlog=0 is risky: any crash of the server or of mysqld will create inconsistent slaves 99% of the time.

      Attachments

        1. sysbench.pdf (21 kB)
        2. group_commit_benchmark.png (75 kB)
        3. innodb_binlog.png (49 kB)
        4. innodb_binlog_on_ssd.png (45 kB)

          Activity

            rpizzi Rick Pizzi (Inactive) created issue -
            elenst Elena Stepanova made changes -
            Field Original Value New Value
            Affects Version/s 10.2.15 [ 23006 ]
            Issue Type Bug [ 1 ] Task [ 3 ]
            jeanfrancois.gagne Jean-François Gagné added a comment - IMHO, this should be a priority Major: a database without the D of ACID is not a "real" database. https://jfg-mysql.blogspot.com/2018/10/consequences-sync-binlog-neq-1-part-1.html https://fosdem.org/2019/schedule/event/sync_binlog_use_default/ https://twitter.com/jfg956/status/1081697022267351040
            serg Sergei Golubchik made changes -
            Priority Minor [ 4 ] Major [ 3 ]
            serg Sergei Golubchik made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            Elkin Andrei Elkin made changes -
            marko Marko Mäkelä made changes -
            ralf.gebhardt Ralf Gebhardt made changes -

            marko Marko Mäkelä added a comment -

            I think that we should run benchmarks to determine the performance impact of sync_binlog=1. Maybe it is insignificant enough on SSD and with group commit, so that we can enable this setting by default?
            marko Marko Mäkelä made changes -
            Assignee Axel Schwenke [ axel ]
            Elkin Andrei Elkin added a comment -

            to the benchmarking.


            jeanfrancois.gagne Jean-François Gagné added a comment -

            It is not a question of benchmarks or speed, it is a question of trust in the database. IMHO, a database should ship "safe" by default, and that is currently not the case for MariaDB, with sync_binlog = 0 in the default configuration.

            There will always be situations where running a safe configuration is slower. With HDD, the latency of a sync is ~10ms, which is slow, but this is not a reason for an unsafe configuration. With a RAID cache, sync latency is less than 1ms, and it is the same with SSD, but Cloud / Network storage is bringing this latency back to 1ms. If a DBA wants better performance, they can change the sync_binlog and trx_commit parameters, but this has consequences that I detailed in [1], [2] and [3].

            [1]: https://jfg-mysql.blogspot.com/2018/10/consequences-sync-binlog-neq-1-part-1.html

            [2]: https://archive.fosdem.org/2020/schedule/event/sync_binlog/

            [3]: https://www.slideshare.net/JeanFranoisGagn/the-consequences-of-syncbinlog-1

            Please ship MariaDB with a safe default configuration, which means sync_binlog = 1.
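
To put the latency figures in the comment above into rough numbers, here is a minimal Python sketch of the ceiling they imply on commit throughput. It assumes, purely for illustration, exactly one sync per commit group and no other bottleneck; the function name and the `commits_per_sync` parameter are hypothetical and are not MariaDB settings.

```python
def max_commits_per_second(sync_latency_ms: float, commits_per_sync: int = 1) -> float:
    """Upper bound on commits/s when every commit group waits for one sync.

    Assumption (not taken from the server code): one sync per group and
    nothing else limits throughput. commits_per_sync models group commit.
    """
    if sync_latency_ms <= 0:
        raise ValueError("sync_latency_ms must be positive")
    if commits_per_sync < 1:
        raise ValueError("commits_per_sync must be at least 1")
    syncs_per_second = 1000.0 / sync_latency_ms
    return syncs_per_second * commits_per_sync


if __name__ == "__main__":
    # Latencies quoted in the comment: ~10 ms for HDD, up to ~1 ms otherwise.
    for label, latency_ms in (("HDD", 10.0), ("RAID cache / SSD / network", 1.0)):
        print(label, max_commits_per_second(latency_ms))
```

Under these assumptions a 10 ms sync caps a single committing thread at about 100 commits per second, and grouping several commits into one sync raises that ceiling proportionally. Real servers may issue more than one sync per commit, so actual numbers can be lower.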
            Elkin Andrei Elkin added a comment - - edited

            jeanfrancois.gagne, thanks for your comments and compilation of valuable analysis! With my endorsement I actually meant MDEV-18959
            (that aims at overcoming `sync_binlog=1 && innodb_flush_log_at_trx_commit=1` as the only safe configuration).
            I'd personally agree to change to `sync_binlog=1`, but since there's some legacy involved it should
            be widely discussed in engineering and support.

            So we've actually started on that...
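
The combination named in the comment above can be checked mechanically. The following is a minimal Python sketch, assuming the variable values have already been collected into a dictionary (for example from the output of `SHOW GLOBAL VARIABLES`); the helper name and the dictionary shape are hypothetical, and the check covers only the two settings discussed in this issue.

```python
# The only combination described as safe in the comment above.
SAFE_SETTINGS = {
    "sync_binlog": 1,
    "innodb_flush_log_at_trx_commit": 1,
}


def unsafe_settings(variables: dict) -> list:
    """Return the names whose value differs from the safe combination.

    `variables` maps variable names to values (strings or ints).
    A missing or non-numeric value is reported as unsafe, because it
    cannot be confirmed.
    """
    problems = []
    for name, wanted in SAFE_SETTINGS.items():
        value = variables.get(name)
        try:
            ok = value is not None and int(value) == wanted
        except (TypeError, ValueError):
            ok = False
        if not ok:
            problems.append(name)
    return problems
```

An empty result means both settings match the combination above; it says nothing about other sources of data loss such as storage that ignores sync requests.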

            rpizzi Rick Pizzi (Inactive) added a comment -

            I insist that we should either ship with sync_binlog=1, or clearly state that we are not ACID compliant unless the setting is changed...
            serg Sergei Golubchik made changes -
            Fix Version/s 10.6 [ 24028 ]
            serg Sergei Golubchik made changes -
            Assignee Axel Schwenke [ axel ] Andrei Elkin [ elkin ]

            serg Sergei Golubchik added a comment -

            let's change it in 10.6, unless there're convincing reasons not to
            sujatha.sivakumar Sujatha Sivakumar (Inactive) made changes -
            Assignee Andrei Elkin [ elkin ] Sujatha Sivakumar [ sujatha.sivakumar ]
            sujatha.sivakumar Sujatha Sivakumar (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            sujatha.sivakumar Sujatha Sivakumar (Inactive) made changes -
            Assignee Sujatha Sivakumar [ sujatha.sivakumar ] Axel Schwenke [ axel ]
            julien.fritsch Julien Fritsch made changes -
            julien.fritsch Julien Fritsch made changes -
            Assignee Axel Schwenke [ axel ] Sujatha Sivakumar [ sujatha.sivakumar ]
            sujatha.sivakumar Sujatha Sivakumar (Inactive) made changes -
            Attachment sysbench.pdf [ 56413 ]
            danblack Daniel Black added a comment -

            Nice graphs sujatha.sivakumar. So at 8-16 threads the throughput is higher. Still suffering on latency, particularly insert.

            https://mariadb.org/fest2020/ssd/ at 10:48 offset - talking about fsync (redo, but the same applies to binlog), that each fsync can be on the same data sector. Would aligning every binlog unit to the beginning of a new 4k block on disk (discoverable via fstat - blksize) after a fsync be acceptable or show gains? And/or piggyback on the io_uring (MDEV-24883) implementation to have the kernel process both binlog and other fsyncs for a transaction at the same time.
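
The alignment idea in the comment above comes down to simple offset arithmetic. Here is a minimal Python sketch of it. It is not how MariaDB writes the binary log; the function names are hypothetical, and the 4096-byte fallback is an assumption for platforms where `st_blksize` is unavailable.

```python
import os


def align_up(offset: int, block_size: int) -> int:
    """Round offset up to the next multiple of block_size."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return (offset + block_size - 1) // block_size * block_size


def padding_to_next_block(offset: int, block_size: int) -> int:
    """Bytes of padding needed so the next write starts on a block boundary."""
    return align_up(offset, block_size) - offset


def preferred_block_size(path: str, fallback: int = 4096) -> int:
    """Block size reported by stat for path, or fallback if not reported.

    st_blksize is only available on some platforms, hence getattr.
    """
    return getattr(os.stat(path), "st_blksize", fallback) or fallback
```

The trade-off raised by the question is visible here: every sync could waste up to `block_size - 1` bytes of padding in exchange for not rewriting a block that was already synced.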
            julien.fritsch Julien Fritsch made changes -
            julien.fritsch Julien Fritsch made changes -
            clieu Christine Lieu (Inactive) made changes -
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.6 [ 24028 ]
            serg Sergei Golubchik made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.7 [ 24805 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 88072 ] MariaDB v4 [ 131826 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.8 [ 26121 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.9 [ 26905 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Assignee Sujatha Sivakumar [ sujatha.sivakumar ] Andrei Elkin [ elkin ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 10.10 [ 27530 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.12 [ 28320 ]
            Fix Version/s 10.11 [ 27614 ]
            bnestere Brandon Nesterenko made changes -
            Attachment group_commit_benchmark.png [ 67225 ]

            bnestere Brandon Nesterenko added a comment -

            On a benchmark which aims to isolate the transaction commits alone, and varying different group commit parameters (i.e., binlog_commit_wait_count, binlog_commit_wait_usec, innodb_flush_log_at_trx_commit, sync_binlog, and concurrent connection count), we have the following results:

            [attachment: group_commit_benchmark.png]
            bnestere Brandon Nesterenko made changes -
            Attachment innodb_binlog.png [ 67342 ]

            bnestere Brandon Nesterenko added a comment -

            Added a new benchmark result prototyping the use of an InnoDB table to serve as the binlog (with the normal binary log disabled). Tested against various modes of flushing the binary log. In the legend, b stands for sync_binlog, and i stands for innodb_flush_log_at_trx_commit.
            bnestere Brandon Nesterenko made changes -
            Attachment innodb_binlog_on_ssd.png [ 67361 ]

            bnestere Brandon Nesterenko added a comment -

            Running the InnoDB table prototype benchmark on an SSD (rather than ramdisk and WITH_PMEM) shows that at low concurrency, the InnoDB binary log implementation performs better than the current file implementation; however, at higher concurrency, the ranking reverses. I will do further analysis of where the time is spent, along with more benchmarks comparing different binlog group commit parameters.
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.0 [ 28320 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Component/s Replication [ 10100 ]
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 11.1 [ 28549 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.4 [ 29301 ]
            Fix Version/s 11.3 [ 28565 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 11.5 [ 29506 ]
            Fix Version/s 11.4 [ 29301 ]
            julien.fritsch Julien Fritsch made changes -
            Issue Type Task [ 3 ] New Feature [ 2 ]
            julien.fritsch Julien Fritsch made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.6 [ 29515 ]
            Fix Version/s 11.5 [ 29506 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 11.7 [ 29815 ]
            Fix Version/s 11.6 [ 29515 ]

            marko Marko Mäkelä added a comment -

            As far as I understand, implementing MDEV-34705 would significantly reduce the impact of setting sync_binlog. It should also make sync_binlog=0 (which would effectively be an alias of innodb_flush_log_at_trx_commit=0) crash-safe in a non-DDL workload. Yes, you might lose some latest committed transactions, but the binlog would be in sync with the storage engine.

            For any crash safety, DDL operations seem to require fdatasync() or fsync(), as long as a separate ddl_recovery.log file is being maintained.
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s 11.8 [ 29921 ]
            Fix Version/s 11.7 [ 29815 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 11.8 [ 29921 ]

            People

              Assignee: Andrei Elkin
              Reporter: Rick Pizzi (Inactive)
              Votes: 7
              Watchers: 19

