[MDEV-14502] InnoDB defragmentation periodically hangs when replicated Created: 2017-11-25  Updated: 2018-10-08

Status: Open
Project: MariaDB Server
Component/s: Replication, Storage Engine - InnoDB
Affects Version/s: 10.2.10
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Christian Rishøj Assignee: Matthias Leich
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Ubuntu 16.04


Attachments: File my_a.cnf     File my_b.cnf    

 Description   

Periodically, InnoDB defragmentation (innodb-defragment = 1) hangs on the slave in a replicated setup.

It seems to happen only when parallel replication is enabled (specifically slave_parallel_threads = 4 and slave_parallel_mode = 'optimistic').

When the hang occurs, one of the slave workers executing the OPTIMIZE TABLE statement remains in state "Waiting for table metadata lock", seemingly forever.
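
A worker stuck in this way can be picked out of the SHOW FULL PROCESSLIST output. The following is a minimal Python sketch, assuming the rows have already been fetched as dictionaries keyed by the processlist column names (Id, Time, State, Info). The function name find_stuck_optimize, the 60-second threshold, and the sample rows are illustrative assumptions, not taken from the affected servers.

```python
# State string reported for the hung worker in this ticket.
MDL_WAIT_STATE = "Waiting for table metadata lock"


def find_stuck_optimize(rows, min_seconds=60):
    """Return processlist rows that look like a hung OPTIMIZE TABLE.

    rows: iterable of dicts with the SHOW FULL PROCESSLIST columns.
    min_seconds: hypothetical threshold for how long a wait counts as a hang.
    """
    stuck = []
    for row in rows:
        state = row.get("State") or ""
        info = (row.get("Info") or "").lstrip().upper()
        waited = int(row.get("Time") or 0)
        if (
            state == MDL_WAIT_STATE
            and info.startswith("OPTIMIZE TABLE")
            and waited >= min_seconds
        ):
            stuck.append(row)
    return stuck


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"Id": 11, "Time": 900, "State": MDL_WAIT_STATE, "Info": "OPTIMIZE TABLE t1"},
    {"Id": 12, "Time": 5, "State": "Updating", "Info": "UPDATE t1 SET a = 1"},
]
stuck_ids = [row["Id"] for row in find_stuck_optimize(sample_rows)]
```

Matching on both the state and the statement text keeps ordinary short metadata lock waits from being reported as hangs.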

Stopping the slave does not appear to work, although manually killing the other slave workers does eventually allow the STOP SLAVE statement to succeed.



 Comments   
Comment by Elena Stepanova [ 2017-11-30 ]

Please provide complete configuration files from the master and slave.
Do you have a simple master->slave replication setup, or is the topology more complicated?
Do OPTIMIZE TABLE statements that the slave hangs upon come directly from a connection to slave, or from replication?

Also, would you be able to connect to the server process while it's hanging, and collect all threads' stack trace?
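
One common way to collect such a trace is to attach gdb to the running server in batch mode and dump a backtrace of every thread. The following is a minimal Python sketch, assuming gdb is installed on the host and the caller has permission to attach to the server process. The helper names gdb_backtrace_command and collect_backtraces are hypothetical.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build the gdb invocation that prints a backtrace of every thread."""
    return ["gdb", "--batch", "-p", str(int(pid)), "-ex", "thread apply all bt"]


def collect_backtraces(pid, out_path):
    """Attach gdb to the given PID and write all thread backtraces to out_path.

    Assumes gdb is on PATH; the server is paused while gdb is attached.
    """
    with open(out_path, "w") as out:
        subprocess.run(
            gdb_backtrace_command(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=True,
        )
```

Calling collect_backtraces with the server's process ID while the OPTIMIZE TABLE is hanging would produce a file suitable for attaching to this ticket. The traces are far more readable when debug symbols for the server are installed.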

Thanks.

Comment by Christian Rishøj [ 2017-12-03 ]

Please provide complete configuration files from the master and slave.

Attaching...

Do you have a simple master->slave replication setup, or is the topology more complicated?

Master-master, with writes on both masters.

Do OPTIMIZE TABLE statements that the slave hangs upon come directly from a connection to slave, or from replication?

The OPTIMIZE TABLE that hangs is replicated from the master.

Also, would you be able to connect to the server process while it's hanging, and collect all threads' stack trace?

At the moment we have suspended the periodic defragmentation, but I will try to reproduce it.

Comment by Elena Stepanova [ 2018-10-05 ]

It's been a long time, but did you ever manage to get a stack trace from a hanging process?

Comment by Christian Rishøj [ 2018-10-07 ]

I'm afraid not, sorry.

Comment by Elena Stepanova [ 2018-10-07 ]

mleich, could you maybe run some concurrent tests with a replication setup, taking into account the provided config files and the description of the problem, to see if you can reproduce it? Without a stack trace it's a blind chase, so hopes are not high, but still.

Comment by Matthias Leich [ 2018-10-08 ]

Hi Elena, I will try.

Generated at Thu Feb 08 08:14:05 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.