[MDEV-31051] Queries show as KILLED but hang indefinitely and lock-up Galera Cluster Created: 2023-04-13  Updated: 2023-06-15  Resolved: 2023-05-24

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.12
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Rob Schwyzer Assignee: Rob Schwyzer
Resolution: Incomplete Votes: 3
Labels: None
Environment:

Azure Kubernetes containers


Issue Links:
Relates

 Description   

Id	User	Host	db	Command	Time	State	Info	Progress
...
211332	user	IP:33930	DB	Killed	998	starting	COMMIT	0.000
211371	user	IP:41030	DB	Killed	944	starting	COMMIT	0.000
211389	user	IP:35044	DB	Killed	1162	starting	COMMIT	0.000
211395	user	IP:52656	DB	Killed	1181	NULL	set autocommit=1	0.000
211401	user	IP:36886	DB	Killed	1265	starting	COMMIT	0.000
211414	user	IP:41334	DB	Query	      1191	Commit	INSERT INTO TABLE (
        SSID,
        ACTIVITY_DATETIME,
        USER_RSID,
         
  	0.000
...

The above is a small snippet from SHOW PROCESSLIST. Below is one of the KILLED transactions as reported by SHOW ENGINE INNODB STATUS, which was captured slightly earlier:

---TRANSACTION 4123705336, ACTIVE 904 sec
12 lock struct(s), heap size 1128, 6 row lock(s), undo log entries 5
MariaDB thread id 211332, OS thread handle 139648232093440, query id 202880013 IP user starting
COMMIT
Trx read view will not see trx with id >= 4123705336, sees < 4123702558
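
For a larger capture, matching KILLED processlist rows to their InnoDB transactions by hand gets tedious. The following is a minimal sketch of how that correlation could be scripted, assuming the processlist was saved as tab-separated text with the column order shown above and that the InnoDB status text uses the "---TRANSACTION ..." / "MariaDB thread id ..." layout seen in this ticket. The function names and the sample strings are illustrative only and are not part of any MariaDB tooling.

```python
import re


def killed_threads(processlist_text, min_seconds=0):
    """Return (thread_id, seconds, info) for rows whose Command is 'Killed'.

    Assumes tab-separated columns in the order:
    Id, User, Host, db, Command, Time, State, Info, Progress
    """
    rows = []
    for line in processlist_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 8 or fields[4].strip() != "Killed":
            continue
        try:
            thread_id = int(fields[0])
            seconds = int(fields[5])
        except ValueError:
            continue  # header row, "..." marker, or a wrapped statement line
        if seconds >= min_seconds:
            rows.append((thread_id, seconds, fields[7].strip()))
    return rows


def innodb_trx_by_thread(status_text):
    """Map MariaDB thread id -> (trx id, active seconds) for active transactions."""
    trx = {}
    current = None
    for line in status_text.splitlines():
        if line.startswith("---TRANSACTION"):
            # A new transaction block starts; forget the previous one.
            m = re.match(r"---TRANSACTION (\d+), ACTIVE (\d+) sec", line)
            current = (int(m.group(1)), int(m.group(2))) if m else None
            continue
        m = re.search(r"MariaDB thread id (\d+)", line)
        if m and current is not None:
            trx[int(m.group(1))] = current
            current = None
    return trx


if __name__ == "__main__":
    # Sample rows copied from the snippets in this ticket.
    processlist = (
        "211332\tuser\tIP:33930\tDB\tKilled\t998\tstarting\tCOMMIT\t0.000\n"
        "211395\tuser\tIP:52656\tDB\tKilled\t1181\tNULL\tset autocommit=1\t0.000\n"
    )
    status = (
        "---TRANSACTION 4123705336, ACTIVE 904 sec\n"
        "12 lock struct(s), heap size 1128, 6 row lock(s), undo log entries 5\n"
        "MariaDB thread id 211332, OS thread handle 139648232093440, "
        "query id 202880013 IP user starting\n"
        "COMMIT\n"
    )
    trx = innodb_trx_by_thread(status)
    for thread_id, seconds, info in killed_threads(processlist, min_seconds=60):
        print(thread_id, seconds, info, trx.get(thread_id))
```

Threads that appear as Killed but have no matching entry in the InnoDB output are printed with None, which may help separate sessions stuck inside an open transaction from those stuck elsewhere.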

The customer retested the workload on a 10.5.18 Enterprise Server Galera Cluster and did not encounter this problem.

We worked with the customer to test a case where a single Galera node received all traffic (reads and writes). The behavior persisted, and only the node handling the read/write workload was affected.



 Comments   
Comment by Valerii Kravchuk [ 2023-04-13 ]

What we really need to verify the root cause is the entire processlist output and, even more important, a full gdb backtrace of all threads.

Generated at Thu Feb 08 10:20:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.