[MDEV-12147] mysql.gtid_slave_pos gets too big. Created: 2017-02-28 Updated: 2018-12-07 Resolved: 2018-12-07
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication, Storage Engine - InnoDB |
| Affects Version/s: | 10.0 |
| Fix Version/s: | 10.3.11, 10.1.37, 10.2.19 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jean-François Gagné | Assignee: | Andrei Elkin |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None |
| Description |
On some of my systems, the ibd file of mysql.gtid_slave_pos is very big. Those systems are running parallel replication with many threads. It is actually the system I am talking about in the following:

Thanks in advance for looking into that, JFG
| Comments |
| Comment by Elena Stepanova [ 2017-03-06 ] |

knielsen, do you want to take this?
| Comment by Jean-François Gagné [ 2017-03-06 ] |

Some more interesting information:

The domain_id 999 above was used for some tests and is not used anymore.
| Comment by Kristian Nielsen [ 2017-03-07 ] |

Elena, maybe this is something for the InnoDB people to look at? The table status shows 34 rows and a single page used; the rest is just free space.

Another possibility is that a bug in replication at some point caused a huge number of old rows to accumulate.

I tried this (on a release build for realistic speed):

Master:

Slave:

The row count never goes above 1000, which is the expected amount. So it does not look like rows are accumulating in this test.

Jean-François, one thing I could think of is if you at some point had a huge number of rows accumulate in the table, leaving free space that was never reclaimed afterwards.

But so far it looks like an InnoDB limitation?
| Comment by Marko Mäkelä [ 2017-08-10 ] |

Elkin, what are the write patterns on mysql.gtid_slave_pos? Could we do something to improve it? Note that MariaDB uses a different GTID implementation than MySQL.
| Comment by Kristian Nielsen [ 2017-08-10 ] |

The write pattern into mysql.gtid_slave_pos is single-row inserts and deletes: each replicated transaction inserts one row recording its GTID as part of the transaction, and obsolete rows are deleted along the way.

It seems to me that purge lag is the most likely problem here. If my understanding is correct, deleted rows that are not yet purged would keep the table fragmented and growing.

It is necessary to have multiple rows in the table (as compared to a single row) so that parallel replication workers can each record their position without conflicting on the same row.

Marko, would this pattern be better for InnoDB performance?
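The insert-plus-delete pattern described above can be illustrated with a toy model. This is a hypothetical, heavily simplified sketch, not MariaDB's actual code (the real logic lives in the server's replication layer); `GtidSlavePos` and its methods are invented names:

```python
# Toy model (hypothetical, simplified) of the mysql.gtid_slave_pos write
# pattern: each committed transaction inserts one position row and, in the
# same transaction, deletes the rows for its domain that are now obsolete.

class GtidSlavePos:
    def __init__(self):
        self.rows = set()  # (domain_id, sub_id) pairs currently in the table

    def commit(self, domain_id, sub_id):
        self.rows.add((domain_id, sub_id))        # INSERT the new position
        # DELETE older rows for the same domain, inside the same transaction
        self.rows = {(d, s) for (d, s) in self.rows
                     if d != domain_id or s >= sub_id}

table = GtidSlavePos()
for sub_id in range(1, 10001):
    table.commit(domain_id=0, sub_id=sub_id)
print(len(table.rows))  # 1 -- as long as every delete commits, the table stays small
```

As long as the deletes always commit together with the inserts, the logical row count stays bounded, which is why purge lag (rather than row accumulation) looked like the prime suspect at this point in the discussion.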
| Comment by Kristian Nielsen [ 2018-10-02 ] |

Actually, it turns out that there is in fact a simple but serious problem in replication that can cause it to accumulate an unlimited amount of old rows in mysql.gtid_slave_pos. See this mailing list thread: https://lists.launchpad.net/maria-discuss/msg05253.html

When a transaction completes, it deletes no-longer-used rows in mysql.gtid_slave_pos as part of its transaction. If that transaction later turns out to be in conflict (optimistic parallel replication), it will be rolled back. In this case the row deletions are also rolled back, and we end up with left-over rows in mysql.gtid_slave_pos.

Presumably the reason that we do not see the many rows in the table in this report is that a server restart happened. After a server restart, unused rows in mysql.gtid_slave_pos are again considered for deletion.
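The leak mechanism described above can be sketched as a toy model. This is a hypothetical simplification (the names `SlaveState`, `record_gtid`, and `deletable` are invented, and the real server tracks considerably more state); the point is only that clearing the in-memory "deletable" list before the transaction commits means a rollback leaks rows:

```python
# Toy model (hypothetical, simplified) of the bug: rows are removed from the
# in-memory deletable list *before* the transaction commits. When an
# optimistic-parallel-replication conflict rolls the transaction back, the
# table keeps the old row, but it is never again considered for deletion.

class SlaveState:
    def __init__(self):
        self.table_rows = set()   # rows actually in mysql.gtid_slave_pos
        self.deletable = set()    # in-memory list of rows eligible for deletion

    def record_gtid(self, sub_id, conflict=False):
        to_delete = set(self.deletable)
        self.deletable.clear()    # cleared up front, before commit
        if conflict:
            # Rollback: the INSERT and DELETEs are undone in the table,
            # but the in-memory list is not restored -> rows leak.
            return
        self.table_rows -= to_delete
        self.table_rows.add(sub_id)
        self.deletable.add(sub_id)

state = SlaveState()
conflicts = {10, 20, 30}      # transactions whose first attempt is rolled back
for sub_id in range(1, 101):
    if sub_id in conflicts:
        state.record_gtid(sub_id, conflict=True)  # first attempt conflicts
    state.record_gtid(sub_id)                     # retry commits
print(sorted(state.table_rows))  # [9, 19, 29, 100]
```

Each conflict leaks exactly the row that was pending deletion at the time, matching the later observation that roughly one row accumulates per rollback-and-retry until the next server restart.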
| Comment by Kristian Nielsen [ 2018-10-08 ] |

Patch (10.1 version): https://github.com/knielsen/server/commit/2f4a0c5be2c5d5153c4253a49ba8820ab333a9a0
| Comment by Kristian Nielsen [ 2018-10-13 ] |

I have pushed the above-referenced patch to 10.1 and 10.3. This should fix the problem with millions of rows eventually accumulating in mysql.gtid_slave_pos when using optimistic parallel replication. Before this patch, approximately one row would accumulate in the table for every conflict that caused a transaction rollback and retry, until the next server restart, when the table is cleared again.

I think it is likely that this is also at least part of the problem that the original reporter saw, with mysql.gtid_slave_pos size in gigabytes.

It still seems excessive that the table grows to 8 MB with just 1000 rows in it, as in my small testcase above (which does not trigger any conflicts or accumulated rows). I am not sure if 8 MB is just some InnoDB extent allocation size, or if purge lag is the cause, or if the INSERT+DELETE pattern causes too-sparse B-tree pages, or if something else is going on...
| Comment by wy [ 2018-12-03 ] |

I found the same thing in 10.0 and 10.1 last year: on a parallel replication slave, the mysql.gtid_slave_pos table grew to as much as 16 GB in the ibd file.
| Comment by Kristian Nielsen [ 2018-12-07 ] |

This issue should be fixed now. There was a bug where transactions that rolled back late during their replication would leave old rows in the mysql.gtid_slave_pos table to eventually accumulate.

I believe the reason that the tables were reported empty (while still taking up lots of space in InnoDB) is that the tables had a large number of rows at one point, but the server was restarted before the bug was reported. The server clears out the table at server restart.

The bug is fixed in 10.0-10.3 so that a rollback will not leave rows in the table undeleted. From 10.4, the deletion of old rows is completely removed from the replication of transactions, and happens instead in the slave background thread.
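The 10.4 design mentioned above (deleting old rows from a background thread instead of inside each replicated transaction) can be sketched as a toy model. This is a hypothetical illustration of the idea only, with invented names, not MariaDB's actual implementation:

```python
import queue
import threading

# Toy sketch (hypothetical): committed transactions only INSERT their
# position row; obsolete rows are handed to a background thread, which
# deletes them outside any replicated transaction. A transaction rollback
# can therefore never undo the cleanup and leak rows.

table_rows = set()
lock = threading.Lock()
cleanup_queue = queue.Queue()

def commit_transaction(sub_id, prev_sub_id):
    with lock:
        table_rows.add(sub_id)            # INSERT the new position row
    if prev_sub_id is not None:
        cleanup_queue.put(prev_sub_id)    # schedule the old row for deletion

def background_cleaner():
    while True:
        sub_id = cleanup_queue.get()
        if sub_id is None:                # shutdown marker
            return
        with lock:
            table_rows.discard(sub_id)    # DELETE happens in the background

cleaner = threading.Thread(target=background_cleaner)
cleaner.start()
prev = None
for sub_id in range(1, 1001):
    commit_transaction(sub_id, prev)
    prev = sub_id
cleanup_queue.put(None)
cleaner.join()
print(len(table_rows))  # 1 -- only the latest position row remains
```

Decoupling cleanup from the replicated transaction means a conflict rollback only undoes the insert, never a pending delete, so the table size stays bounded regardless of how many retries occur.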