[MDEV-12147] mysql.gtid_slave_pos gets too big. Created: 2017-02-28  Updated: 2018-12-07  Resolved: 2018-12-07

Status: Closed
Project: MariaDB Server
Component/s: Replication, Storage Engine - InnoDB
Affects Version/s: 10.0
Fix Version/s: 10.3.11, 10.1.37, 10.2.19

Type: Bug Priority: Major
Reporter: Jean-François Gagné Assignee: Andrei Elkin
Resolution: Fixed Votes: 1
Labels: None


 Description   

On some of my systems, the .ibd file of mysql.gtid_slave_pos is very big. Those systems run parallel replication with many threads; the file sizes below illustrate the problem:

file="my_mysql_data_dir/mysql/gtid_slave_pos.ibd"
list="...list of hosts..."
 
$ for s in $list; do echo "md5sum <<< $s; ssh $s stat -c %s $file"; done | parallel -j 10 -k
34efe8b2ba3316f8d6d93509bed3c5c2 *-
490733568
 
77399f822496dce163baea27d9592eac *-
247463936
 
820a25cfb0957d9643be3aed7f5317fa *-
310378496
 
37e53cd9b5a613cee05f50a746432960 *-
457179136
 
df3210a24fc65c149a8480740f97e689 *-
1027604480
 
d756e66aded1957ebbc85d2b9f2589b7 *-
385875968
 
7cbd541c85e02c12db63ec33bcc2b8e1 *-
322961408
 
ec93fff8ac68ad85ed92100decd081f8 *-
1073741824
 
2a291ed23819cf8555048505383bd055 *-
2969567232
 
12c35669ffedaff33159ac8488019bf8 *-
260046848
 
12505185df893d2072b4f642e9f719b4 *-
21210595328
 
76bc0389781bd2421d472c513f7ae476 *-
2084569088
 
9f0966e9107f05083902ad7afdd6f5ba *-
998244352
 
1492e8ac62c10d1fcf66485257c7ac63 *-
998244352
 
dc4ba45e21d05d115bfdd6b97c6bfd45 *-
272629760
 
18cb15f21feb0944e449a741cbe1c9be *-
234881024
 
85183b178739ba2f09d97138351d7105 *-
251658240
 
b337f58234dd325df87a0249f7fd748a *-
998244352
 
dab32b913d40aa9b5a638876238a9294 *-
3099590656
 
ebf8be428023bb69c833dc8be197e8ee *-
239075328
 
5faf2f569c71a040b735445dd651ab47 *-
876609536

Thanks in advance for looking into that,

JFG



 Comments   
Comment by Elena Stepanova [ 2017-03-06 ]

knielsen, do you want to take this?

Comment by Jean-François Gagné [ 2017-03-06 ]

Some more interesting information:

$ md5sum <<< "$(hostname)"
df3210a24fc65c149a8480740f97e689  -
 
$ ls -lh ...
[...] 1.0K Sep  2  2014 gtid_slave_pos.frm
[...] 980M Mar  6 13:47 gtid_slave_pos.ibd
 
[mysql]> show table status like 'gtid_slave_pos'\G
*************************** 1. row ***************************
           Name: gtid_slave_pos
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 34
 Avg_row_length: 481
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 1010827264
 Auto_increment: NULL
    Create_time: 2016-05-26 17:12:55
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment: Replication slave GTID position
1 row in set (0.00 sec)
 
[(none)] > show global variables like 'gtid_slave_pos';
+----------------+----------------------------------------+
| Variable_name  | Value                                  |
+----------------+----------------------------------------+
| gtid_slave_pos | 0-xxx-7328681352,999-yyy-2 |
+----------------+----------------------------------------+
1 row in set (0.00 sec)

The domain_id 999 above was used for some tests and is not used anymore.

Comment by Kristian Nielsen [ 2017-03-07 ]

Elena, maybe this is something for the InnoDB people to look at?

The table status shows 34 rows and a single page used, the rest is just free
data. Seems something in InnoDB is not good enough at reusing freed space.

Another possibility is that a bug in replication at some point caused huge
amounts of rows to accumulate. And then later replication managed to delete
them again. It would have to be millions of rows to take that amount of
space. Does not seem likely.

I tried this (on a release build for realistic speed):

Master:

mysqld --no-defaults --language=$(pwd)/sql/share/english --basedir=$(pwd) --datadir=$X --socket=$(pwd)/$X/mysql.sock --port=3310 --innodb-buffer-pool-size=20M --innodb-log-file-size=2M --log-bin=mysql-bin --server-id=1 --sync-binlog=0 --innodb-flush-log-at-trx-commit=0

#! /usr/bin/perl
 
use strict;
use warnings;
 
use DBI;
 
my $d=DBI->connect("DBI:mysql:database=test;mysql_socket=/home/knielsen/my/mariadb/bld/data1/mysql.sock",
                   "root", undef, {RaiseError=>1, AutoCommit=>1});
 
$d->do("CREATE TABLE IF NOT EXISTS t1 (a INT primary key, b INT)");
my $res = $d->selectall_arrayref("SELECT IFNULL(MAX(a)+1, 0) FROM t1");
my $v = $res->[0][0];
for (;;) {
  $d->do("INSERT INTO t1 VALUES (?, ?)", undef, $v++, int(rand()*2000000000));
}

Slave:

sql/mysqld --no-defaults --language=$(pwd)/sql/share/english --basedir=$(pwd) --datadir=$X --socket=$(pwd)/$X/mysql.sock --port=3311 --innodb-buffer-pool-size=20M --innodb-log-file-size=2M --log-bin=mysql-bin --server-id=2 --sync-binlog=1 --innodb-flush-log-at-trx-commit=1 --skip-slave-start --slave-parallel-threads=1000 --slave-parallel-mode=aggressive --log-slave-updates

  -rw-rw---- 1 knielsen knielsen 8388608 Mar  7 14:45 gtid_slave_pos.ibd
MariaDB [mysql]> show table status like 'gtid_slave_pos'\G
*************************** 1. row ***************************
           Name: gtid_slave_pos
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 1053
 Avg_row_length: 171
    Data_length: 180224
Max_data_length: 0
   Index_length: 0
      Data_free: 4194304
 Auto_increment: NULL
    Create_time: 2017-03-07 13:54:18
    Update_time: NULL
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options: 
        Comment: Replication slave GTID position

The row count never goes above 1000, which is the expected amount. So it is
using 8 MB for 1000 rows, i.e. just 2 rows per page, which seems quite bad
(a row is 4 integers, about 24 bytes). I had this running for an hour or so on
my laptop; it is not clear whether it will continue to grow. Elena, maybe you
can try reproducing it if you have a larger machine that you can run it on for
longer (my machine gave up at 5000 threads).

Jean-François, one thing I can think of is that you at some point had a huge
purge lag on your machines - then the old rows could accumulate and prevent
the space in the table from being freed. And once purge caught up, you would
be left with a huge, almost-empty .ibd.

But so far it looks like an InnoDB limitation?

Comment by Marko Mäkelä [ 2017-08-10 ]

Elkin, what are the write patterns on mysql.gtid_slave_pos? Could we do something to improve it? Note that MariaDB uses a different GTID implementation than MySQL.
Also note that in InnoDB, data files never shrink, except with certain DDL statements that rebuild the data file.

Comment by Kristian Nielsen [ 2017-08-10 ]

The write pattern into mysql.gtid_slave_pos is single-row inserts and
single-row deletions by primary key in parallel. Rows are deleted relatively
quickly after being inserted.

It seems to me that purge lag is the most likely problem here. If my
understanding is correct, severe purge lag could leave all the deleted rows
taking up space in the tablespace. This would make the table grow by one row
per commit for as long as purge is stalled.
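The purge-lag mechanism can be sketched with a toy model (my own Python illustration, not MariaDB internals): every commit inserts a new position row and delete-marks the old one, but while purge is stalled the delete-marked rows keep occupying space, so the footprint grows by one row-slot per commit even though the live row count stays constant.

```python
# Toy model (not MariaDB internals) of how a stalled purge makes
# mysql.gtid_slave_pos grow by one row-slot per commit even though
# the number of live rows stays constant.

def run_commits(n_commits, purge_running):
    """Return (live_rows, dead_but_unpurged_rows) after n commits."""
    live, dead = 1, 0             # one initial position row
    for _ in range(n_commits):
        live += 1                 # INSERT the new GTID position row
        live -= 1; dead += 1      # DELETE the old row: only delete-marked
        if purge_running:
            dead = 0              # purge reclaims delete-marked rows
    return live, dead

print(run_commits(1_000_000, purge_running=False))  # (1, 1000000)
print(run_commits(1_000_000, purge_running=True))   # (1, 0)
```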

It is necessary to have multiple rows in the table (as opposed to a single
row that is updated by every transaction) in order to support parallel
transactions. However, instead of deleting rows when they are no longer used,
it would be possible to leave them in place and later re-use them with an
UPDATE instead of a new INSERT.

Marko, would this pattern be better for InnoDB performance?
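The reuse idea can be sketched like this (my own Python illustration of the pattern, not a patch): each worker owns a row slot and overwrites it with UPDATE, so the row count stays bounded by the worker count, whereas the INSERT-then-DELETE pattern keeps consuming fresh slots until purge reclaims them.

```python
# Contrast of the two write patterns (illustration only, not server code).

def insert_delete_pattern(n_commits):
    """Every commit INSERTs a fresh row; the matching DELETE only
    delete-marks it, so (until purge runs) each commit consumes a slot."""
    return n_commits

def update_reuse_pattern(n_commits, n_workers=16):
    """Each worker re-uses its own row with UPDATE instead of INSERT."""
    rows = {}
    for commit in range(n_commits):
        rows[commit % n_workers] = f"0-1-{commit}"  # UPDATE in place
    return len(rows)

print(insert_delete_pattern(10_000))   # 10000 slots in the worst case
print(update_reuse_pattern(10_000))    # 16, bounded by the worker count
```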

Comment by Kristian Nielsen [ 2018-10-02 ]

Actually, it turns out that there is in fact a simple but serious problem in replication that can cause an unlimited number of old rows to accumulate in mysql.gtid_slave_pos. See this mailing list thread:

https://lists.launchpad.net/maria-discuss/msg05253.html

When a transaction completes, it deletes the no-longer-used rows in mysql.gtid_slave_pos as part of its own transaction. If that transaction later turns out to be in conflict (optimistic parallel replication), it is rolled back. In this case the row deletions are also rolled back, and we end up with left-over rows in mysql.gtid_slave_pos.

Presumably the reason we do not see the large number of rows in the table in this report is that a server restart happened. After a server restart, unused rows in mysql.gtid_slave_pos are again considered for deletion.
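The mechanism can be reproduced with a toy model (my own Python sketch, not the server code): the DELETE of the obsolete row happens inside the replicated transaction, so a rollback undoes it, and the retry never reconsiders that row.

```python
import random

def replicate(n_transactions, conflict_rate, seed=1):
    """Toy model of the leftover-row bug: each transaction INSERTs its
    GTID row and DELETEs the previous one inside the same transaction.
    A conflict rolls the whole transaction back, un-doing the DELETE;
    the retry re-does only the INSERT. Returns the leftover row count."""
    random.seed(seed)
    table = set()
    for seq in range(n_transactions):
        conflicted = random.random() < conflict_rate
        if not conflicted and (seq - 1) in table:
            table.discard(seq - 1)   # delete the previous position row
        table.add(seq)               # commit (or retry) inserts its row
    return len(table) - 1            # rows beyond the single live one

print(replicate(100_000, conflict_rate=0.0))   # 0: no conflicts, no leftovers
print(replicate(100_000, conflict_rate=0.01))  # roughly one row per conflict
```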

Comment by Kristian Nielsen [ 2018-10-08 ]

Patch:

10.1 version: https://github.com/knielsen/server/commit/2f4a0c5be2c5d5153c4253a49ba8820ab333a9a0
10.3 version: https://github.com/knielsen/server/commit/3eb2c46644b6ac81e7e5e79c9c120700a48d8071

Comment by Kristian Nielsen [ 2018-10-13 ]

I have pushed the above-referenced patch to 10.1 and 10.3.

This should fix the problem with millions of rows eventually accumulating in mysql.gtid_slave_pos when using optimistic parallel replication. Before this patch, approximately one row would accumulate in the table for every conflict that caused a transaction rollback and retry, until the next server restart, when the table is cleared again.

I think it is likely that this is also at least part of the problem that the original reporter saw, with mysql.gtid_slave_pos sizes in the gigabytes.

It still seems excessive that the table grows to 8 MB with just 1000 rows in it, as in my small testcase above (which does not trigger any conflicts or accumulated rows). I am not sure whether 8 MB is just some InnoDB extent allocation size, or purge lag is the cause, or the INSERT+DELETE pattern leaves the B-tree pages too sparse, or something else is going on...
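For reference, the arithmetic behind the 2-rows-per-page estimate, assuming the default 16 KiB InnoDB page size (an extent at that page size is 1 MiB, i.e. 64 pages, so 8 MB is well beyond a single extent allocation):

```python
file_bytes = 8 * 1024 * 1024   # gtid_slave_pos.ibd size from ls above
page_bytes = 16 * 1024         # default innodb_page_size
rows = 1053                    # Rows value from SHOW TABLE STATUS above

pages = file_bytes // page_bytes
print(pages)                   # 512 pages in the file
print(round(rows / pages, 1))  # about 2.1 rows per page on average
```

Some of those 512 pages are metadata rather than leaf pages, so the real leaf-page density is somewhat higher, but the order of magnitude stands.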

Comment by wy [ 2018-12-03 ]

I found the same thing in 10.0 and 10.1 last year: on a parallel replication slave, the mysql.gtid_slave_pos table grew to as much as 16 GB in its .ibd file.

https://jira.mariadb.org/browse/MDEV-12318

Comment by Kristian Nielsen [ 2018-12-07 ]

This issue should be fixed now. There was a bug where transactions that rolled back late during replication would leave old rows behind in the mysql.gtid_slave_pos table, eventually accumulating.

I believe the reason that the tables were reported empty (while still taking up lots of space in InnoDB) is that the tables had a large number of rows at one point, but the server was restarted before the bug was reported. The server clears out the table at server restart.

The bug is fixed in 10.0-10.3 so that a rollback will no longer leave undeleted rows in the table. From 10.4 on, the deletion of old rows is removed from the replicated transactions entirely and happens instead in the slave background thread.
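The 10.4 scheme described above can be sketched as follows (my own Python illustration of the idea, not the server implementation): obsolete row keys are queued to a background thread, so the DELETE happens outside any replicated transaction and cannot be undone by a rollback.

```python
import queue
import threading

# Sketch of deleting old gtid_slave_pos rows from a background thread
# (illustration only; names and structure are my own, not the server's).

def background_deleter(table, to_delete, stop):
    """Delete obsolete rows outside the replicated transaction, so a
    transaction rollback can never resurrect them."""
    while not (stop.is_set() and to_delete.empty()):
        try:
            key = to_delete.get(timeout=0.05)
        except queue.Empty:
            continue
        table.discard(key)
        to_delete.task_done()

table = set()
to_delete = queue.Queue()
stop = threading.Event()
worker = threading.Thread(target=background_deleter,
                          args=(table, to_delete, stop))
worker.start()

for seq in range(1000):
    table.add(seq)               # the replicated commit records its row
    if seq > 0:
        to_delete.put(seq - 1)   # old row is merely handed to the worker

to_delete.join()                 # wait for the deletions to drain
stop.set()
worker.join()
print(len(table), max(table))    # 1 999: only the newest row survives
```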
