Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16333

Slow replication after upgrade from 10.1.33 to 10.3.7

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Cannot Reproduce
    • 10.3.7
    • 10.3.7
    • Replication
    • centos7 64bit

    Description

      We have 2 identical slaves running, same hardware and same software.
      After upgrading one of the slaves from 10.1.33 to 10.3.7 (now 48hours ago) the upgraded slave (which is set to maintenance mode, so is not receiving extra work on top of the replication) can not follow the master server. its behind and the backlog is constantly increasing.

      Both servers have the exact same hardware and work fine in 10.1.33.
      They are not cpu and not io bound.
      Those servers never ran 10.2
      99% of the tables are innodb

      attached is a config diff for a show variables.
      If anything else is needed, let me know

      Attachments

        Activity

          Maikel Punie Maikel Punie added a comment -

          this was not reproduceable after a few weeks, so not sure what happend, this report can be closed

          Maikel Punie Maikel Punie added a comment - this was not reproduceable after a few weeks, so not sure what happend, this report can be closed

          let's give it one more month, just to be sure.
          If we won't hear from you during that time, we'll close it.

          serg Sergei Golubchik added a comment - let's give it one more month, just to be sure. If we won't hear from you during that time, we'll close it.
          david_schumacher david schumacher added a comment - - edited

          hi all and @Maikel Punie ,

          i experienced a comparable behaviour when switching from MySQL 5.1 to MariaDB 10.3.17-0+deb10u1 (Debian 10 Debian 9), kernel 4.9.110-3+deb9u6.
          replication with MariaDB 10.3.17 or 10.1.37-MariaDB-0+deb9u1 was multiple times slower on SSDs than before using MySQL 5.1 on spinning discs.

          as a workaround i changed table mysql.gtid_slave_pos from engine=innodb to engine=myisam on the slave (vs-db-misc-multidb-02) and replication was fast as normal immediately, see graph of the behaviour while a hourly cronjob inserts data:

          just guessing at the moment: there is something slowing down innodb-transactions (in general or only for gtid_slave_pos?) dramatically.
          have not done more research on this yet.

          environment:
          master: vs-db-misc-multidb-01
          slave: vs-db-misc-multidb-02
          100% MyISAM (exception before the change: mysql.gtid_slave_pos)
          MariaDB 10.3.17-0+deb10u1 inside LXC (host and guest: Debian 10)
          innodb-config: debian defaults
          myisam-config: key_buffer_size = 4Gb

          parallel-replication specific settings (experiemented with different settings, changed more or less nothing regarding replication speed while gtid_slave_pos was innodb):
          {{MariaDB [(none)]> show variables like '%parallel%';
          -----------------------------------------+

          Variable_name Value

          -----------------------------------------+

          skip_parallel_replication OFF
          slave_domain_parallel_threads 20
          slave_parallel_max_queued 32001072 (increased until "Waiting for room in worker thread event queue" vanished from "Slave_SQL_Running_State")
          slave_parallel_mode aggressive
          slave_parallel_threads 20
          slave_parallel_workers 20

          -----------------------------------------+}}

          any idea which information can be helpful or what to try to get helpful insights ?

          thanks & regards,
          david

          david_schumacher david schumacher added a comment - - edited hi all and @Maikel Punie , i experienced a comparable behaviour when switching from MySQL 5.1 to MariaDB 10.3.17-0+deb10u1 ( Debian 10 Debian 9), kernel 4.9.110-3+deb9u6. replication with MariaDB 10.3.17 or 10.1.37-MariaDB-0+deb9u1 was multiple times slower on SSDs than before using MySQL 5.1 on spinning discs. as a workaround i changed table mysql.gtid_slave_pos from engine=innodb to engine=myisam on the slave (vs-db-misc-multidb-02) and replication was fast as normal immediately, see graph of the behaviour while a hourly cronjob inserts data: just guessing at the moment: there is something slowing down innodb-transactions (in general or only for gtid_slave_pos?) dramatically. have not done more research on this yet. environment: master: vs-db-misc-multidb-01 slave: vs-db-misc-multidb-02 100% MyISAM (exception before the change: mysql.gtid_slave_pos) MariaDB 10.3.17-0+deb10u1 inside LXC (host and guest: Debian 10) innodb-config: debian defaults myisam-config: key_buffer_size = 4Gb parallel-replication specific settings (experiemented with different settings, changed more or less nothing regarding replication speed while gtid_slave_pos was innodb): {{MariaDB [(none)] > show variables like '%parallel%'; ------------------------------ -----------+ Variable_name Value ------------------------------ -----------+ skip_parallel_replication OFF slave_domain_parallel_threads 20 slave_parallel_max_queued 32001072 (increased until "Waiting for room in worker thread event queue" vanished from "Slave_SQL_Running_State") slave_parallel_mode aggressive slave_parallel_threads 20 slave_parallel_workers 20 ------------------------------ -----------+}} any idea which information can be helpful or what to try to get helpful insights ? thanks & regards, david
          Elkin Andrei Elkin added a comment -

          david_schumacher: what table type the replicated inserts go into?

          Notice that aggressive may involve "internal" rollback and then the MyISAM type of gtid_slave_pos would fire back with a duplicate key error at replaying.
          If it's pure insert load no rollback can occur though.

          Elkin Andrei Elkin added a comment - david_schumacher : what table type the replicated inserts go into? Notice that aggressive may involve "internal" rollback and then the MyISAM type of gtid_slave_pos would fire back with a duplicate key error at replaying. If it's pure insert load no rollback can occur though.
          david_schumacher david schumacher added a comment - - edited

          hi,

          i tested using blackhole, myisam and innodb for the insert-table, and even with blackhole, replication is extremely delayed as long gtid_slave_pos uses innodb.

          important correction, sorry for that ... maybe this behaviour is completely unrelated to maikels findings .. or maikel upgraded his kernel in the meantime:
          i just noticed the host-kernels are different (now corrected in my first post): both slaves on different physical servers are using 4.9.110-3+deb9u6 (Debian 9), while master uses 4.19.67-2 (Debian 10). hardware of all three servers is the same (4 sata-SSDs in md-raid10, xfs with discard mount option). after moving one slave to the master-host, innodb-performance was back to normal. moving back to older kernel, performance dropped again.

          side-test: 10.1.37-MariaDB-0+deb9u1 on the same hardware using the older kernel suffers the same low innodb-performance like 10.3.17-0+deb10u1 - as long global innodb_flush_log_at_trx_commit is set to 1 (see below).

          if gtid_slave_pos is myisam and only insert-table is innodb, all inserts into this insert-table are extremely slow too (25 minutes for 140k inserts, while myisam takes 18 seconds for the same inserts). slave_parallel_mode was conservative and slave_parallel_threads/workers = 2 for other tests, changed nearly nothing.

          i tested direct inserts into the slave without using replication, the results are comparable... so i think its a pure innodb-problem and not replication-specific, and only happend on kernel 4.9.110-3+deb9u6, but with older mariadb too. myisam-performance and IO-benchmarks are showing identical results on both kernels.

          after i set innodb_flush_log_at_trx_commit to 2 or 0, inserts into innodb-table where fast as usual (for both gtid_slave_pos and the normal insert-table .. test-inserts took 41 seconds which matches what i saw in other cases comparing myisam vs. innodb peformance ).

          different values for innodb_use_native_aio and innodb_flush_method did not change much.

          for me this is solved too, but as long i run this older kernels, i can do tests if needed.

          cheers,
          david

          david_schumacher david schumacher added a comment - - edited hi, i tested using blackhole, myisam and innodb for the insert-table, and even with blackhole, replication is extremely delayed as long gtid_slave_pos uses innodb. important correction, sorry for that ... maybe this behaviour is completely unrelated to maikels findings .. or maikel upgraded his kernel in the meantime: i just noticed the host-kernels are different (now corrected in my first post): both slaves on different physical servers are using 4.9.110-3+deb9u6 (Debian 9), while master uses 4.19.67-2 (Debian 10). hardware of all three servers is the same (4 sata-SSDs in md-raid10, xfs with discard mount option). after moving one slave to the master-host, innodb-performance was back to normal. moving back to older kernel, performance dropped again. side-test: 10.1.37-MariaDB-0+deb9u1 on the same hardware using the older kernel suffers the same low innodb-performance like 10.3.17-0+deb10u1 - as long global innodb_flush_log_at_trx_commit is set to 1 (see below). if gtid_slave_pos is myisam and only insert-table is innodb, all inserts into this insert-table are extremely slow too (25 minutes for 140k inserts, while myisam takes 18 seconds for the same inserts). slave_parallel_mode was conservative and slave_parallel_threads/workers = 2 for other tests, changed nearly nothing. i tested direct inserts into the slave without using replication, the results are comparable... so i think its a pure innodb-problem and not replication-specific, and only happend on kernel 4.9.110-3+deb9u6, but with older mariadb too. myisam-performance and IO-benchmarks are showing identical results on both kernels. after i set innodb_flush_log_at_trx_commit to 2 or 0, inserts into innodb-table where fast as usual (for both gtid_slave_pos and the normal insert-table .. test-inserts took 41 seconds which matches what i saw in other cases comparing myisam vs. innodb peformance ). different values for innodb_use_native_aio and innodb_flush_method did not change much. for me this is solved too, but as long i run this older kernels, i can do tests if needed. cheers, david

          People

            Elkin Andrei Elkin
            Maikel Punie Maikel Punie
            Votes:
            2 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.