[MDEV-24433] Intentionally crash the server because it appears to be hung. Created: 2020-12-17  Updated: 2021-01-15  Resolved: 2021-01-15

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.6
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Riad Mekmouche Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: need_feedback


 Description   

After upgrading our database from 10.2.8 to 10.5.6, the server started to crash every day with the following error:
[ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.

After the investigation described below, we added a primary key to the table and also to the temporary table. Since then there has been no crash for several days; we hope it stays that way.

Nothing changed in our application.
In the log I found information relevant to the investigation and to a possible workaround:

Code involved:

--Thread 139635575420672 has waited at dict0boot.ic line 37 for xxx seconds the semaphore:
Mutex DICT_SYS created in dict0dict.cc:1032, lock var 2

After some research I found that this wait was caused by inserting a row into an InnoDB table without a primary key:
row_ins -> row_ins_alloc_row_id_step -> dict_sys_get_new_row_id

See code row0ins.cc
/***********************************************************//**
Allocates a row id for row and inits the node->index field. */
UNIV_INLINE
void
row_ins_alloc_row_id_step(
/*======================*/
	ins_node_t*	node)	/*!< in: row insert node */
{
	row_id_t	row_id;

	ut_ad(node->state == INS_NODE_ALLOC_ROW_ID);

	if (dict_index_is_unique(dict_table_get_first_index(node->table))) {

		/* No row id is stored if the clustered index is unique */

		return;
	}

	/* Fill in row id value to row */

	row_id = dict_sys_get_new_row_id();

	dict_sys_write_row_id(node->sys_buf, row_id);
}



 Comments   
Comment by Marko Mäkelä [ 2020-12-18 ]

Rmekmouche, the diagnostic output in the error log does not include enough detail. Because a deadlock or hang involves multiple threads, it is easiest to analyze hangs based on the function call stack traces of all active threads. Please follow the advice in https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ and try to produce such stack traces (for example, thread apply all backtrace in gdb) during the hang.

Because you are reporting this for 10.5.6, the hang cannot be a duplicate of MDEV-24188 (which affects the two latest available releases in the 10.2, 10.3, 10.4, 10.5 series).

If you upgraded straight from 10.2.8 to 10.3 or later, then this hang might be related to MDEV-15912. Even if you performed a slow shutdown before the upgrade (SET GLOBAL innodb_fast_shutdown=0 before the shutdown of the 10.2 server), later in the 10.2 series we fixed many bugs related to purge and shutdown. One example is MDEV-18936, which was fixed in 10.2.23. At least until then, it was possible that a slow shutdown did not empty all undo logs as it is supposed to.

Generated at Thu Feb 08 09:29:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.