Details
Bug
Status: Closed
Critical
Resolution: Fixed
10.0(EOL)
10.0.22, 10.1.9-1, 10.1.9-2
Description
Thread 5 is blocked waiting for the global read lock, but FLUSH TABLES WITH READ LOCK (FTWRL) never completes and stays in "Waiting for commit lock". The replication threads stay blocked forever, until the FTWRL is killed.
MariaDB(db-02)[(none)]> show processlist\G
*************************** 1. row ***************************
Id: 3
User: system user
Host:
db: NULL
Command: Connect
Time: 12612
State: Waiting for prior transaction to start commit before starting next transaction
Info: NULL
Progress: 0.000
*************************** 2. row ***************************
Id: 4
User: system user
Host:
db: NULL
Command: Connect
Time: 12612
State: Waiting for prior transaction to commit
Info: NULL
Progress: 0.000
*************************** 3. row ***************************
Id: 5
User: system user
Host:
db: crypto_data
Command: Connect
Time: 12612
State: Waiting for global read lock
Info: INSERT INTO `market_orders_ok` (`label`, `marketid`, `ordertype`, `price`, `quantity`, `total`) VALU
Progress: 0.000
*************************** 4. row ***************************
Id: 6
User: system user
Host:
db: NULL
Command: Connect
Time: 12612
State: Waiting for prior transaction to start commit before starting next transaction
Info: NULL
Progress: 0.000
*************************** 5. row ***************************
Id: 7
User: system user
Host:
db: NULL
Command: Connect
Time: 1651381
State: Waiting for master to send event
Info: NULL
Progress: 0.000
*************************** 6. row ***************************
Id: 8
User: system user
Host:
db: NULL
Command: Connect
Time: 13263
State: Waiting for room in worker thread event queue
Info: NULL
Progress: 0.000
*************************** 7. row ***************************
Id: 39
User: monitoring
Host: localhost
db: NULL
Command: Sleep
Time: 59
State:
Info: NULL
Progress: 0.000
*************************** 8. row ***************************
Id: 1568378
User: tanje6acfc0c1213cfb63
Host: localhost
db: NULL
Command: Sleep
Time: 3
State:
Info: NULL
Progress: 0.000
*************************** 9. row ***************************
Id: 27871929
User: backup
Host: localhost
db: NULL
Command: Query
Time: 12612
State: Waiting for commit lock
Info: FLUSH TABLES WITH READ LOCK
Progress: 0.000
*************************** 10. row ***************************
Id: 28082858
User: tanje6acfc0c1213cfb63
Host: localhost
db: NULL
Command: Query
Time: 0
State: init
Info: show processlist
Progress: 0.000
10 rows in set (0.00 sec)
Issue Links
- relates to
  MDEV-8318 Assertion `!pool->busy' failed in pool_mark_busy(rpl_parallel_thread_pool*) on concurrent FTWRL (Closed)
Current idea:
When FTWRL starts, it first checks all parallel replication worker
threads and finds the most recent GTID started by any of them. It then sets
a flag telling the threads not to start on any newer GTIDs, and waits
for all earlier GTIDs to fully commit. It also sets a flag telling START
SLAVE, STOP SLAVE, and the SQL thread not to start any new slave activity.
Once all worker threads have reached their designated point, FTWRL continues
and takes the global read lock. Once that is obtained, it clears the flags and
signals the worker threads and other slave code that they can proceed. At this
point the lock is held, so no real activity is possible until the lock
is released with UNLOCK TABLES.
This should hopefully fix the deadlock; at least I got Elena's test case
to pass with a preliminary patch along these lines.
Some care will probably be needed to guard against other deadlocks with
concurrent START SLAVE / STOP SLAVE; hopefully I can get that solved.
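
As a rough illustration of the pause sequence described in this comment, here is a toy Python model. It is only a sketch under assumptions, not the actual patch and not MariaDB's server code: the class `ReplicationPause`, its methods, and the integer GTID sequence numbers are all hypothetical. The global read lock is reduced to a single flag, and the second flag for START SLAVE / STOP SLAVE is not modelled.

```python
import threading


class ReplicationPause:
    """Toy model of FTWRL pausing parallel replication workers (hypothetical names)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._started = set()        # GTID sequence numbers a worker has started
        self._committed = set()      # GTID sequence numbers that fully committed
        self._pause_after = None     # cutoff: do not start GTIDs newer than this
        self._lock_held = False      # stands in for the global read lock
        self.events = []             # ordered trace, appended under self._cond

    def start_gtid(self, seq_no):
        """Worker side: begin a transaction unless FTWRL asked us to hold back."""
        with self._cond:
            self._cond.wait_for(
                lambda: self._pause_after is None or seq_no <= self._pause_after)
            self._started.add(seq_no)
            self.events.append(f"start {seq_no}")

    def commit_gtid(self, seq_no):
        """Worker side: commit; blocked while the global read lock is held."""
        with self._cond:
            self._cond.wait_for(lambda: not self._lock_held)
            self._committed.add(seq_no)
            self.events.append(f"commit {seq_no}")
            self._cond.notify_all()

    def ftwrl(self):
        """FTWRL side: pick a cutoff, drain earlier GTIDs, then take the lock."""
        with self._cond:
            # Most recent GTID started by any worker becomes the cutoff.
            self._pause_after = max(self._started, default=0)
            self.events.append("pause requested")
            self._cond.notify_all()
            # Wait until every started GTID has fully committed.
            self._cond.wait_for(lambda: self._started <= self._committed)
            self._lock_held = True
            self.events.append("global read lock taken")
            # Clear the flag and let the workers proceed (they now block on the lock).
            self._pause_after = None
            self._cond.notify_all()

    def wait_for_pause_request(self):
        """Demo helper: block until ftwrl() has published its cutoff."""
        with self._cond:
            self._cond.wait_for(lambda: self._pause_after is not None)

    def unlock_tables(self):
        with self._cond:
            self._lock_held = False
            self.events.append("unlock tables")
            self._cond.notify_all()


def demo():
    coord = ReplicationPause()
    coord.start_gtid(1)
    coord.start_gtid(2)

    ftwrl = threading.Thread(target=coord.ftwrl, daemon=True)
    ftwrl.start()
    coord.wait_for_pause_request()   # cutoff is now GTID 2

    def worker3():
        coord.start_gtid(3)          # newer than the cutoff: must wait
        coord.commit_gtid(3)         # must wait for unlock_tables()

    w3 = threading.Thread(target=worker3, daemon=True)
    w3.start()

    coord.commit_gtid(1)
    coord.commit_gtid(2)             # drains the workers, FTWRL can finish
    ftwrl.join()
    coord.unlock_tables()
    w3.join()
    return coord.events


if __name__ == "__main__":
    print(demo())
```

In this model the FTWRL thread never holds the lock while an already-started transaction is still waiting to commit, which is the property the idea above relies on. The exact interleaving of GTID 3 starting and `unlock_tables()` can vary between runs; only the ordering constraints (earlier GTIDs commit before the lock is taken, GTID 3 commits only after the unlock) are fixed.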