[MDEV-31427] MariaDB replication server's SQL Thread stuck at 'Waiting for prior transaction to commit' Created: 2023-06-07  Updated: 2023-06-20  Resolved: 2023-06-08

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.6.10
Fix Version/s: 10.6.11

Type: Bug Priority: Major
Reporter: Mohamed Ismail Assignee: Kristian Nielsen
Resolution: Duplicate Votes: 1
Labels: replication
Environment:

hw.system.vendor=HP
hw.system.model=DL380 Gen10
hw.cpu.vendor=Intel
hw.cpu.model=Xeon Silver 4116 CPU
hw.cpu.clock.speed.in.ghz=2.10
hw.cpu.quantity=48
hw.ram.in.gb=256
os=RHEL 7.9


Attachments: Text File gbd_out.txt     Text File innodb.txt    
Issue Links:
Duplicate
duplicates MDEV-29843 Server hang in thd_decrement_pending_... Closed

 Description   

Replication hangs often in a one-master, one-replica setup.
In the processlist, the SQL worker threads appear to wait indefinitely in the 'Waiting for prior transaction to commit' state.
Once this happens replication is stuck and STOP SLAVE does not work; the process has to be killed manually, after which replication works well for a few days until the issue occurs again.
The queries before Relay_Log_Pos also do not seem out of the ordinary (an insert and an update by primary key).

MariaDB : v10.6.10

Processlist

Id User Host db Command Time State Info Progress
11 system user NULL Slave_IO 986857 Waiting for master to send event NULL 0.000
13 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
14 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
15 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
16 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
18 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
17 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
20 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
21 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
19 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
22 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
23 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
24 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
25 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
26 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
27 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
28 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
29 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
30 system user NULL Slave_worker 57879 closing tables NULL 0.000
31 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
32 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
34 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
33 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
35 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
36 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
37 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
38 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
39 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
40 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
41 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
42 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
43 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
44 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
45 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
46 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
47 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
48 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
49 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
50 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
51 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
52 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
53 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
54 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
55 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
56 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
57 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
58 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
59 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
60 system user NULL Slave_worker 57878 Waiting for prior transaction to commit NULL 0.000
12 system user NULL Slave_SQL 57999 Waiting for room in worker thread event queue NULL 0.000

Slave status

Slave_IO_State: Waiting for master to send event
Master_Host: a.b.c.d
Master_User: repl
Master_Port: 3307
Connect_Retry: 60
Master_Log_File: binary-log.024465
Read_Master_Log_Pos: 312810975
Relay_Log_File: mysql-1-relay-bin.004631
Relay_Log_Pos: 201089830
Relay_Master_Log_File: binary-log.024265
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: mysql
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 610809588
Relay_Log_Space: 76470879853
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 57879
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: Current_Pos
Gtid_IO_Pos: 0-2-703347858
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: optimistic
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Waiting for room in worker thread event queue
Slave_DDL_Groups: 21059
Slave_Non_Transactional_Groups: 0
Slave_Transactional_Groups: 71502590

Relay log events

show relaylog events in 'mysql-1-relay-bin.004631' from 201088966 limit 20;
-------------------------------------------------------------------------------------------------------------+

Log_name Pos Event_type Server_id End_log_pos Info

-------------------------------------------------------------------------------------------------------------+

mysql-1-relay-bin.004631 201088966 Xid 2 610806615 COMMIT /* xid=33545593056 */
mysql-1-relay-bin.004631 201088997 Gtid 2 610806659 BEGIN GTID 0-2-696762571 cid=33545631902
mysql-1-relay-bin.004631 201089041 Table_map 2 610807665 table_id: 16927 (aDatabase.table1)
mysql-1-relay-bin.004631 201089153 Write_rows_v1 2 610807864 table_id: 16927 flags: STMT_END_F
mysql-1-relay-bin.004631 201089352 Table_map 2 610809239 table_id: 101 (aDatabase.table2)
mysql-1-relay-bin.004631 201089481 Update_rows_v1 2 610809557 table_id: 101 flags: STMT_END_F
mysql-1-relay-bin.004631 201089799 Xid 2 610809588 COMMIT /* xid=33545619314 */
mysql-1-relay-bin.004631 201089830 Gtid 2 610809632 BEGIN GTID 0-2-696762572 cid=33545631902
mysql-1-relay-bin.004631 201089874 Table_map 2 610810941 table_id: 16924 (aDatabase.table3)
mysql-1-relay-bin.004631 201089988 Write_rows_v1 2 610811072 table_id: 16924 flags: STMT_END_F
mysql-1-relay-bin.004631 201090119 Table_map 2 610823353 table_id: 16928 (aDatabase.table4)
mysql-1-relay-bin.004631 201090240 Write_rows_v1 2 610829138 table_id: 16928 flags: STMT_END_F
mysql-1-relay-bin.004631 201096025 Table_map 2 610830574 table_id: 72 (aDatabase.table5)
mysql-1-relay-bin.004631 201096152 Update_rows_v1 2 610831036 table_id: 72 flags: STMT_END_F
mysql-1-relay-bin.004631 201096614 Table_map 2 610832472 table_id: 72 (aDatabase.table5)
mysql-1-relay-bin.004631 201096741 Update_rows_v1 2 610832934 table_id: 72 flags: STMT_END_F
mysql-1-relay-bin.004631 201097203 Table_map 2 610834371 table_id: 72 (aDatabase.table5)
mysql-1-relay-bin.004631 201097330 Update_rows_v1 2 610834833 table_id: 72 flags: STMT_END_F
mysql-1-relay-bin.004631 201097792 Table_map 2 610836271 table_id: 72 (aDatabase.table5)
mysql-1-relay-bin.004631 201097919 Update_rows_v1 2 610836733 table_id: 72 flags: STMT_END_F

-------------------------------------------------------------------------------------------------------------+
20 rows in set (0.078 sec)

Replica my.cnf

binlog-format=ROW
binlog-ignore-db=mysql
binlog-ignore-db=performance_schema
binlog-ignore-db=test
binlog_cache_size=4M
binlog_commit_wait_count=48
binlog_commit_wait_usec=20000
character_set_server=utf8
datadir=/data/mysql-1
expire_logs_days=3
general-log=0
general_log_file=/var/opt/na/log/mysql-1/general-query.log
gtid_strict_mode=1
innodb=FORCE
innodb_buffer_pool_chunk_size=128M
innodb_buffer_pool_size=192G
innodb_data_file_path=ibdata:100M:autoextend
innodb_data_home_dir=/data/mysql-1
innodb_fast_shutdown=1
innodb_file_per_table=1
innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT
innodb_force_recovery=0
innodb_io_capacity=3000
innodb_log_buffer_size=8M
innodb_log_file_size=2047MB
innodb_log_group_home_dir=/data/mysql-1
innodb_open_files=10000
innodb_print_all_deadlocks=1
innodb_purge_threads=1
interactive_timeout=420
join_buffer_size=256K
join_buffer_space_limit=2M
join_cache_level=2
log-bin=binary-log.
log-error=/var/opt/na/log/mysql-1/error.log
max_allowed_packet=33554432
max_connections=2300
max_heap_table_size=64M
open-files-limit=65535
optimizer_switch=default
pid-file=/var/opt/na/run/mysql-1.pid
port=3307
query_cache_type=0
replicate-ignore-db=mysql
server-id=1
skip-name-resolve
slave_net_timeout=60
slave_parallel_max_queued=1048576
slave_parallel_mode=optimistic
slave_parallel_threads=48
slow-query-log=0
slow_query_log_file=/var/opt/na/log/mysql-1/slow-query.log
socket=/data/mysql-1/mysql.sock
sql_mode=ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER
sync_binlog=0
sync_master_info=0
sysdate_is_now=TRUE
table_definition_cache=500
table_open_cache=10000
thread_handling=pool-of-threads
tmp_table_size=64M
tmpdir=/data/tmp/mysql-1
transaction-isolation=READ-COMMITTED



 Comments   
Comment by Kristian Nielsen [ 2023-06-07 ]

Thanks a lot for the excellent bug report with very good and detailed information!

I did a quick initial analysis. What we see is that one replication worker thread is stuck trying to commit, probably the transaction with GTID 0-2-696762572. The remaining 47 worker threads are waiting for the first thread, each with a transaction ready to commit in order one after the other. So from the replication side all looks fine.

What is really strange is where the first thread is stuck. It is inside InnoDB trying to write the commit record to the redo log, and it is hanging inside pthread_cond_signal(). That is not something that would normally be expected to be able to hang:

#0  0x00007f789dca054d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f789dc9e14d in pthread_cond_signal@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#2  0x000055de401c23cd in inline_mysql_cond_signal (that=0x7f4798006b78) at /home/buildbot/buildbot/build/include/mysql/psi/mysql_thread.h:1099
#3  dec_pending_ops (state=<synthetic pointer>, this=0x7f4798006b30) at /home/buildbot/buildbot/build/sql/sql_class.h:2535
#4  thd_decrement_pending_ops (thd=0x7f47980009b8) at /home/buildbot/buildbot/build/sql/sql_class.cc:5142
#5  0x000055de407b5726 in group_commit_lock::release (this=this@entry=0x55de41f0da80 <write_lock>, num=num@entry=216757233923465)
    at /home/buildbot/buildbot/build/storage/innobase/log/log0sync.cc:388
#6  0x000055de407a0a3c in log_write_up_to (lsn=<optimized out>, lsn@entry=216757233923297, flush_to_disk=flush_to_disk@entry=false, rotate_key=rotate_key@entry=false, 

The __lll_lock_wait() sounds like some internal glibc locking. I'll try to find source for pthread_cond_signal() and see what this could be.

If you have the chance, it could be useful to get another stack trace with debug info available for /lib64/libpthread.so.0 (I guess glibc debuginfo or something like that), to get the exact place where it is hanging.
Edited: Don't spend too much effort on this, the source code I found for pthread_cond_signal() seems clear enough to understand where it must be hanging.

I wonder if it is some glibc bug, some MariaDB misuse of pthreads, or something else that could cause such a hang. If the problem is an after-effect of corruption of the pthread_cond_signal() data structures, the real problem may have occurred earlier and be hard to identify from the hung state...

Comment by Kristian Nielsen [ 2023-06-07 ]

If I understand correctly, the source code to the pthread_cond_signal() used in RHEL 7.9 should be this:

https://elixir.bootlin.com/glibc/glibc-2.17.90/source/nptl/pthread_cond_signal.c

This is a relatively simple function, and there's just one internal mutex taken, presumably the one we hang on.
This suggests that the cause of the hang is not directly related to replication, but to some memory corruption that happened to hit the InnoDB lock flush condition variable in this case.

Unfortunately this may make it hard to track down the real problem. It is interesting that you are able to reproduce reliably (if infrequently). It would be interesting to get this reproduced on a slave that is running some kind of memory debugging (valgrind/address sanitiser/...), but that is probably quite hard to pull off, as you wrote it only reproduces every couple of days.

One datapoint that would be very useful as a start would be to get stack traces of a couple of other hangs as you reproduce this again. To see if they get stuck in the same place on that same condition variable; that would suggest some problem in that specific code area (InnoDB group_commit_lock::release), which may make it a little easier to track down...

Comment by Kristian Nielsen [ 2023-06-08 ]

I looked a bit deeper, and I think this is a duplicate of MDEV-29843.
In fact, there is a stack trace in that bug with a thread hung similarly to the extract above.
And the MDEV-29843 fix ships in 10.6.11, while the reporter is using 10.6.10.

I also now see that the hang is inside the async completion for the client in the thread pool. The patch for MDEV-29843 makes replication threads not use this mechanism, which sounds correct.

So I think I'll close this as a duplicate of MDEV-29843, and suggest upgrading (or cherry-picking the MDEV-29843 patch) to get rid of these hangs. But feel free to reopen or raise further comments or questions in case of doubt.

- Kristian.

Comment by Kristian Nielsen [ 2023-06-08 ]

Looks like a duplicate of MDEV-29843

Comment by Mohamed Ismail [ 2023-06-08 ]

Thank you for your efforts. We will upgrade to the latest release, 10.6.14.

Comment by Mohamed Ismail [ 2023-06-08 ]

@knielsen, I would like to know if there is any workaround for this issue until we can plan an upgrade.

Comment by Kristian Nielsen [ 2023-06-08 ]

mariadbuser, you can try setting --thread-handling=one-thread-per-connection .
If I read the code correctly, that will avoid the code path that hangs.
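The workaround above can be applied in the replica's my.cnf; a sketch, assuming the same config layout as the reporter's (the existing config uses thread_handling=pool-of-threads, which is the path that hangs):

```ini
# Workaround for MDEV-29843-style hangs on pre-10.6.11 replicas:
# avoid the thread-pool async completion path in which the hang occurs.
thread_handling=one-thread-per-connection
```

Note this disables the thread pool for all connections, not just replication threads, so it may affect behavior under high client-connection counts; the proper fix remains upgrading to 10.6.11 or later.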

Comment by Andrei Elkin [ 2023-06-09 ]

knielsen, I also looked at this case when it was presented on our mailing list, and replied with a guess of
a hang/loop inside InnoDB:

From: andrei.elkin@pp.inet.fi
Subject: Re: [Maria-discuss] Backup on the replication server getting affected
To: ragul rangarajan <ragulrangarajan@gmail.com>
Cc: MariaDB discuss <maria-discuss@lists.launchpad.net>
Date: Fri, 02 Jun 2023 10:46:37 +0300 (1 week, 1 hour, 17 minutes ago)
Organization: Home sweet home
 
Howdy Ragul,
 
> Hi Andrei,
>
> Do we have any procedures to reproduce the issue MDEV-30780?
 
Thanks for posting the gdb backtraces. They rule out MDEV-30780, but they
don't tell me enough about the reason for the hang. This is something new to
me and does deserve filing an MDEV ticket.
Still, I'd defer that until the same issue has been confirmed
on the latest 10.6. So you could run your load against the most recent
slave version; that would at least be the safest (for our time).
 
It might be that (slave worker) Thread 80 is spinning inside
 
   #6  0x000055de407a0a3c in log_write_up_to (lsn=<optimized out>, lsn@entry=216757233923297, flush_to_disk=flush_to_disk@entry=false, rotate_key=rotate_key@entry=false, 
 
in a `goto repeat` loop.
Hopefully you can confirm that the next time the hang appears.
Could you please check whether frame #6 indeed calls
`group_commit_lock::release()` iteratively? (With e.g.
  (gdb) br thd_decrement_pending_ops thread 80
of course the thread number may change :-).)

Since you must already have dug in deeply, perhaps you can confirm my guess?
Thank you for looking into it!

Andrei

Comment by Kristian Nielsen [ 2023-06-09 ]

Elkin, I agree the issue on the mailing list looks precisely like this issue. I wrote a reply on the list.

- Kristian.

Generated at Thu Feb 08 10:23:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.