Details

    Description

      On some of my systems, the ibd file of mysql.gtid_slave_pos is very big. Those systems run parallel replication with many threads; they are the ones shown in the data below:

      file="my_mysql_data_dir/mysql/gtid_slave_pos.ibd"
      list="...list of hosts..."
       
      $ for s in $list; do echo "md5sum <<< $s; ssh $s stat -c %s $file"; done | parallel -j 10 -k
      34efe8b2ba3316f8d6d93509bed3c5c2 *-
      490733568
       
      77399f822496dce163baea27d9592eac *-
      247463936
       
      820a25cfb0957d9643be3aed7f5317fa *-
      310378496
       
      37e53cd9b5a613cee05f50a746432960 *-
      457179136
       
      df3210a24fc65c149a8480740f97e689 *-
      1027604480
       
      d756e66aded1957ebbc85d2b9f2589b7 *-
      385875968
       
      7cbd541c85e02c12db63ec33bcc2b8e1 *-
      322961408
       
      ec93fff8ac68ad85ed92100decd081f8 *-
      1073741824
       
      2a291ed23819cf8555048505383bd055 *-
      2969567232
       
      12c35669ffedaff33159ac8488019bf8 *-
      260046848
       
      12505185df893d2072b4f642e9f719b4 *-
      21210595328
       
      76bc0389781bd2421d472c513f7ae476 *-
      2084569088
       
      9f0966e9107f05083902ad7afdd6f5ba *-
      998244352
       
      1492e8ac62c10d1fcf66485257c7ac63 *-
      998244352
       
      dc4ba45e21d05d115bfdd6b97c6bfd45 *-
      272629760
       
      18cb15f21feb0944e449a741cbe1c9be *-
      234881024
       
      85183b178739ba2f09d97138351d7105 *-
      251658240
       
      b337f58234dd325df87a0249f7fd748a *-
      998244352
       
      dab32b913d40aa9b5a638876238a9294 *-
      3099590656
       
      ebf8be428023bb69c833dc8be197e8ee *-
      239075328
       
      5faf2f569c71a040b735445dd651ab47 *-
      876609536
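
      A quick cross-check of how much of that on-disk size corresponds to live rows (a sketch; it assumes nothing beyond SQL access to each slave and the standard system tables):

       -- Row count vs. the size InnoDB reports for the table; compare with
       -- the stat output above
       SELECT COUNT(*) FROM mysql.gtid_slave_pos;
       SELECT data_length, index_length, data_free
         FROM information_schema.TABLES
        WHERE table_schema = 'mysql' AND table_name = 'gtid_slave_pos';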
      

      Thanks in advance for looking into that,

      JFG

        Activity

          knielsen Kristian Nielsen added a comment -

          Actually, it turns out that there is a simple but serious problem in replication that can cause an unlimited number of old rows to accumulate in mysql.gtid_slave_pos. See this mailing list thread:

          https://lists.launchpad.net/maria-discuss/msg05253.html

          When a transaction completes, it deletes no-longer-needed rows in mysql.gtid_slave_pos as part of its own transaction. If that transaction later turns out to be in conflict (optimistic parallel replication), it is rolled back. In that case the row deletions are also rolled back, and we end up with left-over rows in mysql.gtid_slave_pos.
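
          The effect is visible directly in the table on an affected server. A minimal sketch (assuming the standard table layout with domain_id, sub_id, server_id and seq_no columns):

           -- A healthy slave keeps only a handful of rows per domain around the
           -- current position; many rows with old seq_no values are leftovers
           -- from rolled-back deletions.
           SELECT domain_id, COUNT(*) AS row_count,
                  MIN(seq_no) AS oldest, MAX(seq_no) AS newest
             FROM mysql.gtid_slave_pos
            GROUP BY domain_id;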

          Presumably the reason we do not see the many accumulated rows in the table in this report is that a server restart happened. After a server restart, unused rows in mysql.gtid_slave_pos are again considered for deletion.

          knielsen Kristian Nielsen added a comment -

          Patch:

          10.1 version: https://github.com/knielsen/server/commit/2f4a0c5be2c5d5153c4253a49ba8820ab333a9a0
          10.3 version: https://github.com/knielsen/server/commit/3eb2c46644b6ac81e7e5e79c9c120700a48d8071

          knielsen Kristian Nielsen added a comment -

          I have pushed the above-referenced patch to 10.1 and 10.3.

          This should fix the problem of millions of rows eventually accumulating in mysql.gtid_slave_pos when using optimistic parallel replication. Before this patch, approximately one row would accumulate in the table for every conflict that caused a transaction rollback and retry, until the next server restart, when the table is cleared again.
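
          On a pre-patch server, a rough sanity check is to compare the retry counter with the row count (a sketch; Slave_retried_transactions counts all retries since server start, so this is only indicative):

           -- Roughly one leftover row per retried transaction since the last restart
           SHOW GLOBAL STATUS LIKE 'Slave_retried_transactions';
           SELECT COUNT(*) FROM mysql.gtid_slave_pos;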

          I think it is likely that this is also at least part of the problem that the original reporter saw, with mysql.gtid_slave_pos sizes in the gigabytes.

          It still seems excessive that the table grows to 8 MB with just 1000 rows in it, as in my small testcase above (which does not trigger any conflicts or accumulated rows). I am not sure whether 8 MB is just some InnoDB extent allocation size, whether purge lag is the cause, whether the INSERT+DELETE pattern leaves overly sparse B-tree pages, or whether something else is going on.
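
          Two quick checks for these hypotheses (a sketch, not a diagnosis):

           -- Purge lag: look at "History list length" in the TRANSACTIONS section
           SHOW ENGINE INNODB STATUS;
           -- Sparse pages / unreclaimed space: rebuilding the table shows how small
           -- it can actually get (only do this while the slave is stopped)
           OPTIMIZE TABLE mysql.gtid_slave_pos;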

          ruochen wy added a comment -

          I found the same thing in 10.0 and 10.1 last year: the mysql.gtid_slave_pos table on a parallel replication slave grew to as much as 16 GB in the ibd file.

          https://jira.mariadb.org/browse/MDEV-12318

          knielsen Kristian Nielsen added a comment - edited

          This issue should be fixed now. There was a bug where transactions that were rolled back late during replication would leave old rows behind in the mysql.gtid_slave_pos table, eventually accumulating.

          I believe the reason that the tables were reported empty (while still taking up lots of space in InnoDB) is that the tables had a large number of rows at one point, but the server was restarted before the bug was reported. The server clears out the table at server restart.

          The bug is fixed in 10.0-10.3 so that a rollback will not leave rows in the table undeleted. From 10.4, the deletion of old rows is removed from transaction replication entirely and happens instead in a slave background thread.
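
          In 10.4 the batch size of that background deletion is exposed as a system variable; a sketch, assuming the gtid_cleanup_batch_size variable introduced with the 10.4 change (verify it exists on your build):

           SHOW GLOBAL VARIABLES LIKE 'gtid_cleanup_batch_size';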


          People

            Assignee: Elkin Andrei Elkin
            Reporter: jeanfrancois.gagne Jean-François Gagné
            Votes: 1
            Watchers: 7

