[MDEV-8929] Chained replication is not working (non-transactional updates are replicated very slowly) Created: 2015-10-12  Updated: 2023-07-12  Resolved: 2023-07-12

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0.21
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Alex Assignee: Kristian Nielsen
Resolution: Done Votes: 0
Labels: None
Environment:

centos 6.6



 Description   

Hello,
I began playing with MariaDB 10 and decided to check how it works with chained replication where 1 server replicates to 2nd and 2nd to 3rd.
It's easily working with 5.5 but with 10.0.21 I've encountered a problem with 3rd server.

The initial setup was:
server 1 (5.5) -> server 2 (5.5) -> server 3 (5.5)

After upgrade:
server 1 (10.0) -> server 2 (10.0) -> server 3 (5.5)

The next and final step was server 3 upgrade:
server 1 (10.0) -> server 2 (10.0) -> server 3 (10.0)

And here I ran into trouble. Server 3 refused to replicate. The counters seemed to work (relay/master log positions kept changing).
Using_Gtid was set to No on all servers. Actually I haven't touched anything GTID-related, since I wanted to keep using old-style replication.

In the end the 3rd server ran out of disk space. It kept downloading logs from its master (server 2) without replaying them.

So now I am back to server 1 (10.0) -> server 2 (10.0) -> server 3 (5.5) topology.

In my tests (and future upgrades) I want to stay with classic/old replication style and enjoy MariaDB 10 parallel replication.
So I thought that without touching the GTID settings it would stay that way (like servers 1 and 2 do).

What is wrong with my chained replication setup that caused the 3rd server to stop replicating?
Is it something about the default domain_id=0, something else, or a bug that prevented the 3rd server from ignoring the GTID-related settings?
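For completeness, the GTID-related state I am referring to can be checked on each server with something like this (a diagnostic sketch, not actual output from my servers):

```sql
-- Run on each server in the chain:
SELECT @@gtid_domain_id, @@server_id;
SHOW VARIABLES LIKE '%gtid%';
SHOW SLAVE STATUS\G   -- check the Using_Gtid field (No / Slave_Pos / Current_Pos)
```

On all three servers Using_Gtid was No.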

Another thing I paid attention to: very high IO. The disk was at 100% usage (by the mysql process) and always writing something.
I thought it was somehow caused by the massive relay-log download from the master (server 2) and replaced the disk with a faster one.
I am using Amazon, so I upgraded the provisioned-IOPS HDD from 1000 IOPS to 2000 IOPS. It didn't help; the disk remained busy at the same 100%, it just doubled the write speed.
Can't say whether that's normal or not; when I downgraded to 5.5 everything calmed down – no IO, no replication problems.

Your help is much appreciated!

Thanks,
Alex



 Comments   
Comment by Elena Stepanova [ 2015-10-18 ]

ShivaS,

Could you please

  • give an example of an event which the 5.5 slave chokes upon;
  • quote the exact error message it produces;
  • specify the version of 5.5 slave;
  • attach your config files for all three servers?

In general this is not a recommended upgrade scenario (normally you should upgrade slaves first), but some effort was made to make it work in the general simple case, so we want to find out what went wrong.
Please note however that such a setup will never be safe – as soon as 10.0 produces anything that 5.5 does not support, replication will fail.

Comment by Alex [ 2015-10-18 ]

Hi Elena,
I think there is a misunderstanding.
5.5 has no problems replicating from 10.0 after a downgrade, as long as both mysql_upgrade and relay-log-recovery=1 are used.
That way all relay logs are reset and re-downloaded in a format 5.5 can understand.
I have a few working 10.0 -> 5.5 replications and never had problems with them, even in a chain.
I only described the process of how I did the downgrade.

The main problem is with 3x 10.0 servers replicating in a chain in the old-fashioned style (binary log/positions), not using GTID.
The 3rd server is not replicating.
It seems that the IO thread is running while the SQL thread is not really executing anything, though both report Yes of course.
Same for the counters: relay log positions and other values are not updating.

The setup/configs are the same as used for 5.5 and never changed: binary logs for all servers with a few exclusions (for the mysql database, for example), and that's it.

Comment by Elena Stepanova [ 2015-10-18 ]

ShivaS, indeed, I misread your initial description, sorry about that.

No, the problem you encountered is definitely not normal. And I just tried to reproduce it without luck – chain replication 10.0=>10.0=>10.0 with default settings works just fine for me.
One obvious reason that could cause such effect would be having the same server_id on both first and third server; but then, your replication to 5.5 should have been affected identically, since you did not change the config.
Another possibility is that the 2nd server does not have log-slave-updates – but again, 5.5 would have been affected just the same.
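For the record, both of these usual suspects can be quickly checked on each server (a diagnostic sketch):

```sql
-- Run on every server in the chain:
SELECT @@server_id;                         -- must be unique across all three servers
SHOW VARIABLES LIKE 'log_slave_updates';    -- must be ON on server 2, or server 3 gets nothing
```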

It's possible that there is a bug in 10.0 replication that we are not aware of; but apparently it's not visible on default settings. So, even though you did not change configuration, please still attach the files (or output of SHOW VARIABLES statements) for each server, so that we could try to reproduce it.

Comment by Alex [ 2015-10-18 ]

Hi Elena, I am working on reproducing the problem in a completely clean environment and will update you shortly.
If it worked for you, it could be that something is wrong on my side, though I've made a few good attempts. So I'll recheck it in a new environment.

Comment by Alex [ 2015-10-20 ]

Hi Elena,
I think I got something here...
I've been playing with upgrades only, so I have no idea whether it's upgrade-related only. Could be it's a MariaDB 10 problem.

After the 5.5 -> 10.0 upgrade replication works, but very very very slowly – the same as the 2nd server in the chain, which I blame for being unable to do any parallel replication regardless of what happens on the master (MDEV-8955). I also think that because of that extremely slow replication I didn't wait long enough last time and assumed the relay log counters never update.
That's not true. They do... like a turtle.

The strange thing is that on one of my servers in the 3rd position of the chain I've excluded many huge tables, to let it update only the small/rarely-updated (compared to the master's qps) tables.
So I knew for a fact that it would receive full-sized binary logs, but it should replay them quite fast, because the heavy things would be ignored.
No way... the slave lags, seconds behind master goes up, and I see the same 100% IO usage I previously reported in this task.
And again I ended up with disk space filled by relay logs and had to reset them with relay-log-recovery=1.

I used inotify tools to watch file activity on the DB partition. The 100% IO usage was very strange: I noticed crazy access to ib_logfile1 (not to ib_logfile0).
I had enabled InnoDB only to get MariaDB running the proposed way – keeping the gtid_slave_pos table as InnoDB (and I noticed it created 2 more tables as InnoDB: innodb_index_stats and innodb_table_stats).
I decided to disable all custom InnoDB settings and just keep the InnoDB engine enabled, to let MariaDB 10 use its default InnoDB settings. After all, I don't need to tune it yet.
The problem remained: IO at 100%, slave lagging with increasing delay.
Then I decided to completely disable InnoDB, and converted gtid_slave_pos to MyISAM (otherwise the server simply wouldn't restart).
The other 2 tables I mentioned I decided not to touch, since MariaDB 10 restarted without errors.
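In case it helps reproduce, the conversion step was roughly this (from memory):

```sql
-- Workaround described above: move the GTID position table off InnoDB.
-- Stop replication first so the table is not being written to.
STOP SLAVE;
ALTER TABLE mysql.gtid_slave_pos ENGINE=MyISAM;
START SLAVE;
```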

Then the magic began... the slave began working hard on closing the delay, IO went down and increased only on real disk writes... now I recognize my good old MariaDB 5 behaviour.
Even the 2nd master in the chain finally started closing its delay using slave parallel threads alone, without any additional manipulation of the master's binlog settings (MDEV-8955), relying solely on this amazing feature. Slowly, but closing, second after second. The other slaves started working too.

A different thing I noticed on the 2nd-chain slave, which additionally holds a few more local tables that are not replicated from its master but only get direct app access
(and then all tables – from the master and from this slave – are replicated to the 3rd-chain slave):
sometimes inserts into those local tables get stuck, and everything and everybody waits for a table lock. This started only after I upgraded to MariaDB 10, but I can't say for sure whether it happens when multiple slave threads are active or not. I think yes. I already wanted to switch those tables to InnoDB because of those locks, but now InnoDB is disabled because of all the aforementioned problems.
And those "waiting for lock" tables receive maybe 20 inserts/sec. Also, when I added binlog settings to this server (since the 2nd server in the chain is a master too), it completely died/got stuck until restarted. Tried a few times.

So for now all slaves are closing the delays behind their masters, speeds seem to be as good as in MariaDB 5, and parallel threads are working fine without binlog settings on either master.
It's not as fast as with binlog settings enabled, but at least it keeps the slaves in sync with their masters. The binlog settings are not applied anywhere yet.

That's all the information I have for now. I tried to detail everything I paid attention to and the steps I took to temporarily solve the problem.
But I would still like to have InnoDB, a reliable GTID slave table, binlog settings on masters that also act as slaves, no locks on rare inserts to MyISAM, and a coffee

Please let me know how I can assist you further!

Thanks!
Alex

p.s.
I also see no IO problems on the master with InnoDB enabled.

Comment by Alex [ 2015-10-20 ]

Finally I got a stalled slave that kept downloading relay logs without executing anything on its side.
This time I was smarter and checked the processlist in the information_schema database to see what happens to that specific thread.

What I saw is:

*************************** 2. row ***************************
           ID: 5
         USER: system user
         HOST:
           DB: NULL
      COMMAND: Connect
         TIME: 7
        STATE: Slave has read all relay log; waiting for the slave I/O thread t
         INFO: NULL
      TIME_MS: 7268.915
        STAGE: 0
    MAX_STAGE: 0
     PROGRESS: 0.000
  MEMORY_USED: 441242824
EXAMINED_ROWS: 0
     QUERY_ID: 140
 

After it reached around 512MB it suddenly started working... looks like some buffering.
Strange that the other slaves immediately started working and closing the delay. So under certain circumstances this may happen... like in my 1st upgrade.
Historically I had max_allowed_packet set to 512MB on the master, and then had to set it on the slaves too, since they all refused to replicate because of too-large packets. In MariaDB 5 that never happened though. So this small thing now seems kinda clear (I hope), but I don't understand why I had to do that in the past, and why replication sometimes buffers data and sometimes immediately starts replaying...

Comment by Alex [ 2015-10-20 ]

Seems like this has something in common with my recent replication problems:
https://mariadb.atlassian.net/browse/MDEV-6276

However, on the 2nd chained slave there are no exclusions and it still lagged behind until I disabled InnoDB. Same IO problems.

Comment by Alex [ 2015-10-20 ]

I made a quick test and see the same thing in the general log:

endless entries like these:

                    8 Query     COMMIT
                    8 Query     BEGIN
                    8 Query     COMMIT
                    8 Query     BEGIN
                    8 Query     COMMIT
                    8 Query     BEGIN
                    8 Query     COMMIT
                    8 Query     BEGIN
                    ... (and so on, endlessly)
 

Comment by Alex [ 2015-10-20 ]

One more thing though: my problematic slaves are all MyISAM, and InnoDB is used solely for the gtid_slave_pos table.
Some servers replicate everything received from the master, some only partially; but instead of replicate-do-db I use replicate-ignore. Anyway, I don't think that's the problem – it's just that table on InnoDB causing very high IO.

Besides, I've encountered a few more replication problems in MariaDB 10 that I still need your help with.

Comment by Elena Stepanova [ 2015-10-21 ]

ShivaS, thanks for performing this investigation.

Did you try re-enabling InnoDB but keeping the gtid table as MyISAM? It would show whether the table itself is the point of contention, or whether it's InnoDB being smart and doing something it was not asked to do.

Comment by Alex [ 2015-10-21 ]

Just did, though I should have done this before.
With InnoDB enabled and the gtid table as MyISAM – no IO at all. After altering it to InnoDB – high IO; altering back to MyISAM – IO calmed down.

Comment by Elena Stepanova [ 2015-10-21 ]

Okay, thank you, so it's almost certainly the contention around the InnoDB gtid table. As I understand it, your other tables are MyISAM? What is the main flow – is it DML (inserts, updates, deletes), DDL, or mixed? Which binlog format are you using? Please also list any other non-default settings that you use on the servers.

I will try to run some tests to see if I can reproduce it locally.

Comment by Alex [ 2015-10-21 ]

ROW replication, but without annotations/checksums – otherwise it slows down replication and the slaves lag (even in 5.5). Yes, all tables are MyISAM.
As for the main flow, I have a few slaves used for different operations. The master has heavy insert-delayed traffic. One slave replicates everything from the master and only has a backup running once a day plus manipulations once an hour; a different slave excludes most of the heavy tables and uses some for selects (redis), no DML; another slave also excludes most of the heavy tables and is used for some rare select tasks; and the last slave has full replication from the master plus additional local tables with fairly rare inserts/selects, like 20-30 qps.
The master does 10-20k qps, so you can imagine how big/many the binary logs are.
As for InnoDB, I also tested it with default settings. The query cache is either enabled or disabled (depending on the slave's role).
The settings below may or may not be defaults – I am used to having everything in my.cnf for quick reference.
max_allowed_packet = 512M
table_open_cache = 32768
sort_buffer_size = 2M
read_buffer_size = 128K
read_rnd_buffer_size = 256K
tmp_table_size=1000M
max_heap_table_size=1000M

skip-archive
skip-federated
default-storage-engine=MyISAM

if needed:
replicate-wild-ignore-table=db.table%

Thanks!

Comment by Elena Stepanova [ 2015-10-26 ]

mdev8929.cnf

!include suite/rpl/rpl_1slave_base.cnf
!include include/default_client.cnf
 
[mysqld.1]
log-slave-updates
innodb
 
[mysqld.2]
skip-log-slave-updates
innodb

mdev8929.test

--let $rpl_topology= 1->2
--let $rows = 500
 
--source include/rpl_init.inc
 
--enable_connect_log
 
--connection server_1
# alter table mysql.gtid_slave_pos engine=InnoDB;
 
# Toggle MyISAM / InnoDB to get different variations
create table t1 (pk int primary key, i int) engine=MyISAM;
--echo # Inserting
# begin;
--disable_query_log
while ($rows)
{
  eval insert into t1 values ($rows, $rows);
  dec $rows;
}
--enable_query_log
# commit;
drop table t1;
 
--save_master_pos
 
--connection server_2
--sync_with_master
--save_master_pos
 
--connection server_1
 
--disable_connect_log
--source include/rpl_end.inc

Results
Note: results don't depend much on binlog format, so I'm only pasting values for mixed format

Variation Milliseconds
t1 MyISAM, gtid_slave_pos InnoDB 30001
t1 MyISAM, gtid_slave_pos MyISAM 1937
t1 InnoDB, gtid_slave_pos InnoDB 67523
t1 InnoDB, gtid_slave_pos InnoDB, in one transaction 3330
t1 MyISAM, gtid_slave_pos InnoDB, innodb-flush-log-at-trx-commit=2 9122
t1 MyISAM, gtid_slave_pos InnoDB, innodb-flush-log-at-trx-commit=0 8456

ShivaS,
As I understand it, the first row is basically what you have.
It is no news if you think about it... For every update to a non-transactional table, gtid_slave_pos also gets updated, and as we know, that is not fast. In my test the situation could be improved by using a non-default innodb_flush_log_at_trx_commit. I will assign the report to Kristian for further analysis – I don't expect this is news to him, but he might have some advice. But first, if possible, could you please run the same flow (user tables MyISAM, gtid table InnoDB) with innodb_flush_log_at_trx_commit=0, to see how much effect it has in your environment? Mine is by no means suitable for performance testing, so my results might be misleading.
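For reference, the setting can be changed either in my.cnf or at runtime (it is a dynamic variable); a sketch:

```sql
-- Value 0 flushes and syncs the InnoDB log only about once per second,
-- trading some durability for commit speed:
SET GLOBAL innodb_flush_log_at_trx_commit = 0;
-- or permanently in my.cnf, under [mysqld]:
--   innodb_flush_log_at_trx_commit = 0
```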

Comment by Alex [ 2015-10-26 ]

Hi Elena,
I performed more tests and came to 2 conclusions:
1. Once MyISAM is altered to InnoDB, in most cases it works fine. Only 2-3% of IO added, and maybe a dozen IOPS, which is fine and which you describe here.
2. Once I restart mysql (after the MyISAM -> InnoDB alter, with trx=1 set), IO goes to 100% usage and reaches the disk's IOPS limit (tried with both 1000 and 2000 IOPS in Amazon). Same for write speed. In all my previous tests I usually restarted mysql when toggling InnoDB on/off.

In both cases (with/without a mysql restart after MyISAM -> InnoDB), trx_commit=0 calms the situation down and works fine with little IO compared to MyISAM. But if trx=1 comes back with the table altered back to InnoDB – 100% IO again. And again, this usually happens after a restart. From then on, all online alters (to/from InnoDB/MyISAM) give the 100% IO problem with InnoDB and trx=1.

So the problem is not a small IO increase but truly 100% IO usage – IOPS at maximum, as well as CPU resources. Something crazy inside InnoDB/XtraDB, and very little data used by that slave table. After all, this slave is all about MyISAM.

Here are the results:

*myisam*
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.13    0.00    0.05    0.03    0.00   99.80
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.10    0.00    0.20     0.00     0.00    12.00     0.00    4.00   4.00   0.08
xvdc              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00     1.70    0.00    2.00     0.00     0.01    14.80     0.00    0.90   0.35   0.07
xvdf              0.00     0.20    0.00    0.40     0.00     0.00    12.00     0.00    0.75   0.75   0.03
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.05    0.00    0.00    0.00    0.00   99.95
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00    0.00    0.60     0.00     0.00     8.00     0.00    2.33   0.50   0.03
xvdc              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00    10.20    0.00   12.30     0.00     0.09    14.63     0.05    4.22   0.11   0.13
xvdf              0.00     3.90    0.00    1.10     0.00     0.02    36.36     0.00    1.09   0.91   0.10
 
*innodb trx=1* 
(that what happens after mysql restart)
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.10    0.00    0.30     0.00     0.00    10.67     0.00    3.67   2.33   0.07
xvdc              0.00     0.00    0.00    0.50     0.00     0.00     8.00     0.00    0.00   0.00   0.00
xvdb              0.00   568.20    0.00  812.50     0.00     5.39    13.59     0.96    1.18   1.17  95.36
xvdf              0.00  2733.40    0.00   90.60     0.00    11.03   249.36     7.03   77.65   0.65   5.90
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.38    0.00    1.32   10.68    0.16   86.46
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.10    0.00    0.20     0.00     0.00    12.00     0.00    5.00   5.00   0.10
xvdc              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00   661.20    0.00  834.90     0.00     5.84    14.34     0.99    1.18   1.14  94.82
xvdf              0.00     0.40    0.00    0.50     0.00     0.00    14.40     0.00    0.60   0.60   0.03
 
 
*innodb trx=0*
(similar details received after 1st alter to innodb, without restart, and even with trx=1)
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdc              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00   987.00    0.00   42.00     0.00     4.02   196.00     0.47   11.14   0.57   2.40
xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          24.05    0.00    6.01    0.32    0.63   68.99
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdc              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00   990.00    0.00   42.00     0.00     4.03   196.57     0.47   11.21   0.55   2.30
xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

Comment by Elena Stepanova [ 2015-10-27 ]

Only 2-3% of IO added, and maybe a dozen of IOPS, which is fine and which you describe here

Actually no, that's not what I'm saying at all.
The results above show a ~15x difference between gtid_slave_pos being MyISAM and InnoDB; and mind you, that's one thread, one table and a tiny number of inserts, while for obvious reasons the problem scales awfully.

Below is an extended example. Each thread inserts into its own MyISAM table, and we are waiting till slave finishes the job (time in ms).

gtid_slave_pos MyISAM, 500 inserts per table
1 thread:  1809
2 threads: 2430
3 threads: 3067
 
gtid_slave_pos MyISAM, 1000 inserts per table
1 thread:  2431
2 threads: 3712
3 threads: 4973
 
gtid_slave_pos InnoDB, 500  inserts per table
1 thread:  32235
2 threads: 62349
3 threads: 94214
 
gtid_slave_pos InnoDB, 1000 inserts per table
1 thread:  62561
2 threads: 131776
3 threads: 202873

So, for one thread and 500 rows, it's ~17x difference. For 3 threads and 1000 rows, it's already ~40x.
And while IO stats on my VM are by no means accurate, yes, I'm getting a similar picture with IO.

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc               0.00   323.00    0.00   50.00     0.00  1442.00    57.68     1.26   25.60    0.00   25.60  19.20  96.00

One remaining mystery is why you need to do a restart – frankly I have no idea how it can make any difference; maybe Kristian will be able to explain this.

If the slave is entirely MyISAM, it's not actually that bad – you can keep gtid_slave_pos as MyISAM. The real problem arises when the flow contains both transactional and non-transactional updates.
I'm assigning it to Kristian – as I said before, this is obviously no news to him, but hopefully he can suggest how best to handle it.

Comment by Alex [ 2015-10-27 ]

Elena, I am sorry about misinterpreting your previous reply.

I only wanted to point out that 100% IO is quite strange to see here. At high speeds it simply slows the slave down. And after all, keeping it as MyISAM doesn't make it crash-safe, nor does trx=0... However, it doesn't look worse than it was before.

Anyway, thanks for your time on this one. I'll patiently await Kristian's verdict/advice and for now keep the table as MyISAM.
Now that I know the nature of this problem, I can safely upgrade small InnoDB servers to MariaDB 10, keeping trx=1 and InnoDB for the gtid_slave_pos table. As long as there is no high load on the slaves, it is crash-safe.

p.s. No idea about the restart thing... It happened to me quite a few times during the last tests. Could be I didn't give the server enough time after the alter to start showing signs of IO problems. However, everything usually happens quite fast at the query rates I have. For now I'm giving up on it and waiting for Kristian.

thanks again!

Comment by Kristian Nielsen [ 2015-10-27 ]

Not 100% sure what I am being asked here. But:

1. Every slave transaction needs to update mysql.gtid_slave_pos. Therefore, this table should preferably use the same storage engine (e.g. InnoDB vs. MyISAM) as most normal transactions, to avoid multi-engine transactions.

2. It is normal for InnoDB with default settings to be slow due to syncing to disk. E.g. a limit of 20 commits/second can easily be seen on non-SSD storage.
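A rough way to observe point 2 on a running slave is to sample the InnoDB fsync counters (a diagnostic sketch):

```sql
-- Sample these twice, a few seconds apart; with
-- innodb_flush_log_at_trx_commit=1 the per-second delta of the
-- log fsync counter tracks the commit rate, which a slow disk caps.
SHOW GLOBAL STATUS LIKE 'Innodb_os_log_fsyncs';
SHOW GLOBAL STATUS LIKE 'Innodb_data_fsyncs';
```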

Comment by Alex [ 2015-10-27 ]

Hi Kristian!

So I guess it's better to keep this table as MyISAM in high-load environments regardless of which engine is used otherwise; either trx=0 or a good SSD may also help.
And that's because the disk syncs at such a rate that trx=1 tends to "kill" slow HDDs. I initially thought that InnoDB with trx=1 could handle small but frequent updates to a single InnoDB table...
Maybe it's also because it's Amazon; on normal hardware with a raid controller and write-back cache it could behave differently...

In my tests I used 1000 and 2000 IOPS disks in Amazon EC2, which are not SSD but anyway faster than raid10 with 15k SAS, and in both cases the IOPS / CPU usage was at maximum with trx=1.
My master does around 10-15k qps, so the slaves are working quite hard, which in turn adds a lot of "headache" to the gtid_slave_pos table. Thanks to parallel replication, btw.
The other slave replicates a few rarely-updated tables from the same master, but the GTID mechanism updates the gtid_slave_pos table as if the slave were truly doing all the work rather than skipping most of the heavy tables (I guess there is no other way to make it crash-safe).

Thanks!
Alex

Comment by Kristian Nielsen [ 2023-07-12 ]

Closing; this was mostly a discussion/investigation into the performance impact of introducing updates to mysql.gtid_slave_pos for every replicated transaction, which can unfortunately be significant in some scenarios.

Parallel replication can to some extent mitigate this; even in --slave-parallel-mode=minimal, the fsyncs for the transaction logs can utilise group commit. Also, it is now possible to create multiple mysql.gtid_slave_pos tables with different engines, so that each replicated update can be recorded in the version with the same engine as the update itself, avoiding multi-engine transactions.
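A sketch of the multi-engine position tables mentioned above (the gtid_pos_auto_engines variable was added in MariaDB 10.3):

```sql
-- With this set, the server automatically creates and uses
-- mysql.gtid_slave_pos_<engine> tables matching each replicated
-- transaction's own storage engine, avoiding cross-engine commits:
SET GLOBAL gtid_pos_auto_engines = "InnoDB,MyISAM";
-- or in my.cnf under [mysqld]:
--   gtid_pos_auto_engines=InnoDB,MyISAM
```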

Generated at Thu Feb 08 07:30:51 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.