[MDEV-25782] MariaDB all nodes stuck on shutdown Created: 2021-05-26  Updated: 2021-06-11

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.5.10
Fix Version/s: 10.5

Type: Bug Priority: Major
Reporter: Jason Logan Assignee: Julius Goryavsky
Resolution: Unresolved Votes: 1
Labels: None
Environment:

Ubuntu 20.10 minimal no gui


Attachments: PNG File chrome_qjYJA7lzWW.png     Text File mariadb.log     File mariadbd.trace    

 Description   

All of the following was done while logged in as root.

If I reboot any node, it gets stuck shutting down MariaDB. The same thing happens if I do a "systemctl stop mariadb".

I've been shutting it down, then doing a "ps -A | grep mariadb" and a "kill -9 pid" to make it stop. Then I can perform upgrades or reboots. The service must be killed before it will stop.
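The workaround described above boils down to locating the daemon's pid and sending SIGKILL. A minimal runnable sketch of that pattern follows; it uses `sleep` as a stand-in for `mariadbd` so it can run anywhere (the stand-in is an assumption, not part of the report):

```shell
# The reporter's workaround: find the stuck process id and force-kill it.
# `sleep` stands in for the daemon here; on an affected node you would
# use `pid=$(pgrep -x mariadbd)` instead.
sleep 300 &
pid=$!                            # stand-in for: pid=$(pgrep -x mariadbd)
kill -9 "$pid"                    # SIGKILL bypasses clean shutdown; last resort
wait "$pid" 2>/dev/null || true   # reap the stand-in process
if kill -0 "$pid" 2>/dev/null; then
    echo "still running"
else
    echo "stopped"
fi
```

Note that SIGKILL gives the server no chance to flush buffers or close tablespaces cleanly, which is why it should only ever be a last resort after `systemctl stop` hangs.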

I've attached a log that is the same on all my nodes when I attempt to stop the service. I did see someone mention time zones, so I set one group of nodes in a cluster to UTC, but it did not help. These are in the Central time zone.

I've included the log of the shutdown to show when they are all getting stuck.

Also, I've let it sit for days and it will not shut down.

I am happy to try anything because I have created a test cluster to see if I can get things working.



 Comments   
Comment by Sergei Golubchik [ 2021-05-26 ]

Do you have tables with indexed virtual columns?
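One way to answer this question is to query `information_schema` for generated columns. A sketch, assuming a local `mariadb` client is available on the node (the client invocation is left commented out so the snippet is runnable anywhere):

```shell
# List generated (virtual/stored) columns via information_schema.
# MariaDB exposes IS_GENERATED = 'ALWAYS' for generated columns; whether a
# given column is also indexed would need a further join against
# information_schema.statistics.
query="SELECT table_schema, table_name, column_name, extra
       FROM information_schema.columns
       WHERE is_generated = 'ALWAYS';"
printf '%s\n' "$query"
# mariadb -e "$query"   # uncomment and run on the affected node
```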

Comment by Jason Logan [ 2021-05-26 ]

I do not use virtual columns.

Comment by Sergei Golubchik [ 2021-05-26 ]

Can you run something like

gdb --batch --eval-command="thread apply all bt full" /usr/sbin/mariadbd 12345 > /tmp/mariadbd.trace 

replacing 12345 with the actual mariadbd pid, of course

you might need to install mariadb*dbgsym* packages to get a meaningful output
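The pid lookup and the gdb invocation above can be combined into one wrapper. A sketch (the paths and the process name `mariadbd` are taken from the comment above; the guard clause is an addition so the script degrades gracefully when the server is not running):

```shell
# Capture a full backtrace of every mariadbd thread with gdb.
pid=$(pgrep -x mariadbd | head -n 1)
if [ -z "$pid" ]; then
    echo "mariadbd not running" >&2
else
    gdb --batch --eval-command="thread apply all bt full" \
        /usr/sbin/mariadbd "$pid" > /tmp/mariadbd.trace
fi
```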

Comment by Jason Logan [ 2021-05-26 ]

I uploaded the trace file: mariadbd.trace here

Comment by Sergei Golubchik [ 2021-05-27 ]

sysprg, what do you take from it?

Comment by Jason Logan [ 2021-06-10 ]

Is there any movement on this? This happens on CentOS 8 and Ubuntu 20.10

Comment by Julius Goryavsky [ 2021-06-10 ]

Usually at this point (judging by the log) wsrep deinitialization has fully completed and the server transitions to InnoDB deinitialization. However, here I do not see the line "[Note] InnoDB: FTS optimize thread exiting." in the log. Question for jason1430 - do I understand correctly that "mariadb.log" is the full server log? Or is it just a snippet covering wsrep, without the other lines? Question for marko - can you tell me (based on "mariadbd.trace") whether we reached InnoDB deinitialization in this case, or not? If not, then the server probably got stuck in wsrep deinitialization, which did not complete correctly.

Comment by Jason Logan [ 2021-06-10 ]

Yes, it is the full log.

Comment by Jason Logan [ 2021-06-10 ]

I had a node successfully stop. Here is the resulting log entry:

Normal Shutdown:
2021-06-10 18:08:51 0 [Note] WSREP: Flushing memory map to disk...
2021-06-10 18:08:51 0 [Note] InnoDB: FTS optimize thread exiting.
2021-06-10 18:08:51 0 [Note] InnoDB: Starting shutdown...
2021-06-10 18:08:51 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool
2021-06-10 18:08:51 0 [Note] InnoDB: Buffer pool(s) dump completed at 210610 18:08:51
2021-06-10 18:08:51 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2021-06-10 18:08:51 0 [Note] InnoDB: Shutdown completed; log sequence number 571364; transaction id 1173
2021-06-10 18:08:51 0 [Note] /usr/sbin/mariadbd: Shutdown complete

This is where it stops when shutdown hangs. The abnormal shutdown stops at:
WSREP: Flushing memory map to disk...

Comment by Julius Goryavsky [ 2021-06-11 ]

jason1430, thanks. Now we need to study the issue further to understand whether the hang is in wsrep, or whether wsrep managed to complete deinitialization and the hang then occurs in FTS processing (in InnoDB).

Comment by Marko Mäkelä [ 2021-06-11 ]

I do not see any occurrence of storage/innobase in mariadbd.trace. But, on a closer look, Thread 3 is executing in pthread_cond_timedwait(), with two ?? functions above it in the call stack. That might be the buf_flush_page_cleaner() thread. Also Thread 5 might be that thread, ?? is calling pthread_cond_wait(). Thread 4 appears to be the io_getevents() handler. So, clearly the InnoDB I/O subsystem is still up and running.

Thread 6 doesn’t look like InnoDB: some ?? is calling signal_hand(). Other threads seem to be an idle thread pool handler, and something Galera related.

It seems that InnoDB is at a very late phase of shutdown. Possibly Thread 4 is executing a sleep in logs_empty_and_mark_files_at_shutdown(). Where is the server error log? mariadb.log appears to be something else. The function logs_empty_and_mark_files_at_shutdown() should output the reason that is blocking the shutdown?

Could the parameter innodb_disallow_writes be a culprit for this? It is used by some Galera scripts, and the implementation is in my opinion misplaced: blocking the writes at the low level, instead of blocking them at the high level (blocking any operation that would generate redo log).

Finally, are there any InnoDB tables with FULLTEXT INDEX? I suspect that they cannot work correctly with Galera.
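The FULLTEXT question can be answered with a query against `information_schema.statistics`, which records an `INDEX_TYPE` of `FULLTEXT` for such indexes. A sketch, with the actual client invocation commented out so the snippet stays runnable without a server:

```shell
# List all FULLTEXT indexes on the server via information_schema.
query="SELECT table_schema, table_name, index_name
       FROM information_schema.statistics
       WHERE index_type = 'FULLTEXT';"
printf '%s\n' "$query"
# mariadb -e "$query"   # uncomment and run on the affected node
```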

Comment by Jason Logan [ 2021-06-11 ]

I can get you any log you need.

These servers have no additional databases on them. They have the default DBs. I'm not using this cluster because I need it to be stable.

Generated at Thu Feb 08 09:40:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.