Details
Bug
Status: Closed
Major
Resolution: Cannot Reproduce
10.1.17, 10.6, 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5(EOL), 10.7(EOL)
Can result in hang or crash
10.0.28
Description
Hi,
I ran into a strange issue when setting a server to read_only.
Settings:
slave_parallel_mode=optimistic
slave_parallel_threads=8
The system was replicating a lot of changes (it had just been restored from a backup).
The process I executed was:
1. Restore backup
2. SET GLOBAL gtid_slave_pos='slave pos from backup'
3. CHANGE MASTER TO MASTER_USE_GTID=slave_pos
4. START SLAVE
5. SET GLOBAL read_only=1;
The system started to hang with this processlist:
MariaDB [(none)]> show processlist;
+----+-------------+--------------------+--------------+---------+------+-----------------------------------------------+------------------------+----------+
| Id | User        | Host               | db           | Command | Time | State                                         | Info                   | Progress |
+----+-------------+--------------------+--------------+---------+------+-----------------------------------------------+------------------------+----------+
|  4 | root        | a.b.c.d:53344      | NULL         | Sleep   |    0 |                                               | NULL                   |    0.000 |
|  5 | root        | localhost          | NULL         | Query   |   33 | Waiting for commit lock                       | set global read_only=1 |    0.000 |
|  6 | system user |                    | NULL         | Connect |   41 | Waiting for master to send event              | NULL                   |    0.000 |
|  7 | system user |                    | NULL         | Connect |   33 | Waiting for global read lock                  | NULL                   |    0.000 |
|  8 | system user |                    | NULL         | Connect |   33 | Waiting for global read lock                  | NULL                   |    0.000 |
|  9 | system user |                    | NULL         | Connect |   33 | Waiting for prior transaction to commit       | NULL                   |    0.000 |
| 10 | system user |                    | NULL         | Connect |   33 | Waiting for global read lock                  | NULL                   |    0.000 |
| 11 | system user |                    | NULL         | Connect |   33 | Waiting for prior transaction to commit       | NULL                   |    0.000 |
| 12 | system user |                    | NULL         | Connect |   33 | Waiting for prior transaction to commit       | NULL                   |    0.000 |
| 13 | system user |                    | NULL         | Connect |   32 | Waiting for global read lock                  | NULL                   |    0.000 |
| 14 | system user |                    | NULL         | Connect |   33 | Waiting for global read lock                  | NULL                   |    0.000 |
| 15 | system user |                    | NULL         | Connect |   40 | Waiting for room in worker thread event queue | NULL                   |    0.000 |
| 16 | root        | 10.255.10.32:38644 | regressiondb | Sleep   |    3 |                                               | NULL                   |    0.000 |
| 20 | root        | a.b.c.d:53514      | NULL         | Sleep   |   16 |                                               | NULL                   |    0.000 |
| 22 | root        | localhost          | NULL         | Query   |    0 | init                                          | show processlist       |    0.000 |
+----+-------------+--------------------+--------------+---------+------+-----------------------------------------------+------------------------+----------+
15 rows in set (0.00 sec)
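I think the parallel slave worker threads ran into the read-only transition while it was still in progress: SET GLOBAL read_only=1 is waiting for the commit lock, five worker threads are waiting for the global read lock, and three more are waiting for a prior transaction to commit. At that point the server started to hang.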
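To make the wait pattern in the processlist easier to see, here is a minimal Python sketch that tallies the State column for the replication ("system user") threads. The rows are copied from the output in this report; the helper names `parse_processlist` and `replication_wait_states` are made up for illustration and are not part of any MariaDB tooling.

```python
from collections import Counter

COLUMNS = ["Id", "User", "Host", "db", "Command", "Time", "State", "Info", "Progress"]

# Rows 5 to 15 copied from the SHOW PROCESSLIST output in this report.
PROCESSLIST = """\
| 5 | root | localhost | NULL | Query | 33 | Waiting for commit lock | set global read_only=1 | 0.000 |
| 6 | system user | | NULL | Connect | 41 | Waiting for master to send event | NULL | 0.000 |
| 7 | system user | | NULL | Connect | 33 | Waiting for global read lock | NULL | 0.000 |
| 8 | system user | | NULL | Connect | 33 | Waiting for global read lock | NULL | 0.000 |
| 9 | system user | | NULL | Connect | 33 | Waiting for prior transaction to commit | NULL | 0.000 |
| 10 | system user | | NULL | Connect | 33 | Waiting for global read lock | NULL | 0.000 |
| 11 | system user | | NULL | Connect | 33 | Waiting for prior transaction to commit | NULL | 0.000 |
| 12 | system user | | NULL | Connect | 33 | Waiting for prior transaction to commit | NULL | 0.000 |
| 13 | system user | | NULL | Connect | 32 | Waiting for global read lock | NULL | 0.000 |
| 14 | system user | | NULL | Connect | 33 | Waiting for global read lock | NULL | 0.000 |
| 15 | system user | | NULL | Connect | 40 | Waiting for room in worker thread event queue | NULL | 0.000 |
"""


def parse_processlist(text):
    """Turn pipe-delimited processlist lines into a list of dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders and the row-count footer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the header row and anything that does not have all nine columns.
        if len(cells) != len(COLUMNS) or not cells[0].isdigit():
            continue
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def replication_wait_states(rows):
    """Count how many replication threads sit in each State."""
    return Counter(row["State"] for row in rows if row["User"] == "system user")


rows = parse_processlist(PROCESSLIST)
states = replication_wait_states(rows)
for state, count in states.most_common():
    print(f"{count:2d}  {state}")
```

The tally only summarizes what the processlist already shows; it does not by itself prove which thread holds which lock.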
I tried to stop the slave with STOP SLAVE, but that hung as well.
Then I killed the SET GLOBAL read_only=1 query and the server freed up. The slave threads stopped as well.
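As a rough illustration of that workaround, the following Python sketch picks out the session that is stuck setting read_only while waiting for the commit lock and builds the statement to kill it. The sample rows mirror the processlist in this report. The function name `read_only_blocker_ids` is hypothetical, and the use of `KILL QUERY` (rather than a plain `KILL` of the connection) is an assumption, since the report does not say which form was used.

```python
def read_only_blocker_ids(rows):
    """Return Ids of sessions running SET GLOBAL read_only that wait for the commit lock."""
    ids = []
    for row in rows:
        info = (row.get("Info") or "").lower()
        if info.startswith("set global read_only") and row.get("State") == "Waiting for commit lock":
            ids.append(int(row["Id"]))
    return ids


# A few rows taken from the processlist in this report (Info is None where it showed NULL).
sample_rows = [
    {"Id": "5", "User": "root", "State": "Waiting for commit lock", "Info": "set global read_only=1"},
    {"Id": "7", "User": "system user", "State": "Waiting for global read lock", "Info": None},
    {"Id": "9", "User": "system user", "State": "Waiting for prior transaction to commit", "Info": None},
    {"Id": "22", "User": "root", "State": "init", "Info": "show processlist"},
]

# Assumption: KILL QUERY is used so that only the statement is aborted, not the connection.
kill_statements = [f"KILL QUERY {thread_id}" for thread_id in read_only_blocker_ids(sample_rows)]
print(kill_statements)
```

The sketch only generates the statement text; it would still have to be run from a separate connection that is not itself blocked.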
The server on which I experienced this had to go back into production. Hopefully we can reproduce it on another system, or you can reproduce it in a lab.