Details

    Description

      On some of my systems, the ibd file of mysql.gtid_slave_pos is very big. Those systems run parallel replication with many threads; they are the ones shown in the data below:

      file="my_mysql_data_dir/mysql/gtid_slave_pos.ibd"
      list="...list of hosts..."
       
      $ for s in $list; do echo "md5sum <<< $s; ssh $s stat -c %s $file"; done | parallel -j 10 -k
      34efe8b2ba3316f8d6d93509bed3c5c2 *-
      490733568
       
      77399f822496dce163baea27d9592eac *-
      247463936
       
      820a25cfb0957d9643be3aed7f5317fa *-
      310378496
       
      37e53cd9b5a613cee05f50a746432960 *-
      457179136
       
      df3210a24fc65c149a8480740f97e689 *-
      1027604480
       
      d756e66aded1957ebbc85d2b9f2589b7 *-
      385875968
       
      7cbd541c85e02c12db63ec33bcc2b8e1 *-
      322961408
       
      ec93fff8ac68ad85ed92100decd081f8 *-
      1073741824
       
      2a291ed23819cf8555048505383bd055 *-
      2969567232
       
      12c35669ffedaff33159ac8488019bf8 *-
      260046848
       
      12505185df893d2072b4f642e9f719b4 *-
      21210595328
       
      76bc0389781bd2421d472c513f7ae476 *-
      2084569088
       
      9f0966e9107f05083902ad7afdd6f5ba *-
      998244352
       
      1492e8ac62c10d1fcf66485257c7ac63 *-
      998244352
       
      dc4ba45e21d05d115bfdd6b97c6bfd45 *-
      272629760
       
      18cb15f21feb0944e449a741cbe1c9be *-
      234881024
       
      85183b178739ba2f09d97138351d7105 *-
      251658240
       
      b337f58234dd325df87a0249f7fd748a *-
      998244352
       
      dab32b913d40aa9b5a638876238a9294 *-
      3099590656
       
      ebf8be428023bb69c833dc8be197e8ee *-
      239075328
       
      5faf2f569c71a040b735445dd651ab47 *-
      876609536
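
      A quick cross-check of how much of that on-disk size corresponds to live rows (a sketch; it assumes nothing beyond SQL access to each slave and the standard system tables):

       -- Row count vs. the size InnoDB reports for the table; compare with
       -- the stat output above
       SELECT COUNT(*) FROM mysql.gtid_slave_pos;
       SELECT data_length, index_length, data_free
         FROM information_schema.TABLES
        WHERE table_schema = 'mysql' AND table_name = 'gtid_slave_pos';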
      

      Thanks in advance for looking into that,

      JFG

        Activity

          knielsen Kristian Nielsen added a comment -

          Actually, it turns out that there is a simple but serious problem in replication that can cause an unlimited number of old rows to accumulate in mysql.gtid_slave_pos. See this mailing list thread:

          https://lists.launchpad.net/maria-discuss/msg05253.html

          When a transaction completes, it deletes no-longer-needed rows in mysql.gtid_slave_pos as part of its own transaction. If that transaction later turns out to be in conflict (optimistic parallel replication), it is rolled back. In that case the row deletions are also rolled back, and we end up with left-over rows in mysql.gtid_slave_pos.
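
          The effect is visible directly in the table on an affected server. A minimal sketch (assuming the standard table layout with domain_id, sub_id, server_id and seq_no columns):

           -- A healthy slave keeps only a handful of rows per domain around the
           -- current position; many rows with old seq_no values are leftovers
           -- from rolled-back deletions.
           SELECT domain_id, COUNT(*) AS row_count,
                  MIN(seq_no) AS oldest, MAX(seq_no) AS newest
             FROM mysql.gtid_slave_pos
            GROUP BY domain_id;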

          Presumably the reason we do not see the many accumulated rows in the table in this report is that a server restart happened. After a server restart, unused rows in mysql.gtid_slave_pos are again considered for deletion.

          knielsen Kristian Nielsen added a comment -

          Patch:

          10.1 version: https://github.com/knielsen/server/commit/2f4a0c5be2c5d5153c4253a49ba8820ab333a9a0
          10.3 version: https://github.com/knielsen/server/commit/3eb2c46644b6ac81e7e5e79c9c120700a48d8071

          knielsen Kristian Nielsen added a comment -

          I have pushed the above-referenced patch to 10.1 and 10.3.

          This should fix the problem of millions of rows eventually accumulating in mysql.gtid_slave_pos when using optimistic parallel replication. Before this patch, approximately one row would accumulate in the table for every conflict that caused a transaction rollback and retry, until the next server restart, when the table is cleared again.
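
          On a pre-patch server, a rough sanity check is to compare the retry counter with the row count (a sketch; Slave_retried_transactions counts all retries since server start, so this is only indicative):

           -- Roughly one leftover row per retried transaction since the last restart
           SHOW GLOBAL STATUS LIKE 'Slave_retried_transactions';
           SELECT COUNT(*) FROM mysql.gtid_slave_pos;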

          I think it is likely that this is also at least part of the problem that the original reporter saw, with mysql.gtid_slave_pos sizes in the gigabytes.

          It still seems excessive that the table grows to 8 MB with just 1000 rows in it, as in my small testcase above (which does not trigger any conflicts or accumulated rows). I am not sure whether 8 MB is just some InnoDB extent allocation size, whether purge lag is the cause, whether the INSERT+DELETE pattern leaves overly sparse B-tree pages, or whether something else is going on.
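
          Two quick checks for these hypotheses (a sketch, not a diagnosis):

           -- Purge lag: look at "History list length" in the TRANSACTIONS section
           SHOW ENGINE INNODB STATUS;
           -- Sparse pages / unreclaimed space: rebuilding the table shows how small
           -- it can actually get (only do this while the slave is stopped)
           OPTIMIZE TABLE mysql.gtid_slave_pos;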

          ruochen wy added a comment -

          I found the same thing in 10.0 and 10.1 last year: the mysql.gtid_slave_pos table on a parallel replication slave grew to as much as 16 GB in the ibd file.

          https://jira.mariadb.org/browse/MDEV-12318

          knielsen Kristian Nielsen added a comment - edited

          This issue should be fixed now. There was a bug where transactions that were rolled back late during replication would leave old rows behind in the mysql.gtid_slave_pos table, eventually accumulating.

          I believe the reason that the tables were reported empty (while still taking up lots of space in InnoDB) is that the tables had a large number of rows at one point, but the server was restarted before the bug was reported. The server clears out the table at server restart.

          The bug is fixed in 10.0-10.3 so that a rollback will not leave rows in the table undeleted. From 10.4, the deletion of old rows is removed from transaction replication entirely and happens instead in a slave background thread.
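
          In 10.4 the batch size of that background deletion is exposed as a system variable; a sketch, assuming the gtid_cleanup_batch_size variable introduced with the 10.4 change (verify it exists on your build):

           SHOW GLOBAL VARIABLES LIKE 'gtid_cleanup_batch_size';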


          People

            Assignee: Elkin Andrei Elkin
            Reporter: jeanfrancois.gagne Jean-François Gagné
            Votes: 1
            Watchers: 7

