Details
Type: Bug
Status: Needs Feedback
Priority: Critical
Resolution: Unresolved
Affects Version/s: 10.6.18
Fix Version/s: None
Description
After 10 years of the script operating successfully, a customer upgraded to a new version and hit a failure in the following scenario:
------------------------
We have an internal tool named live_alter. In short, it is used to perform live DDLs without blocking the database nodes and clusters; its workflow is very close to the Percona tool built for the same purpose.
The new table structure is copied from the original one, with the suffix _LIVE_ALTER appended to the table name.
Usually all necessary DDLs are executed on the _LIVE_ALTER table upfront, before the following steps. This time I missed one index and executed it while the LIVE_ALTER process was already running, as described below.
Triggers for INSERT, UPDATE and DELETE DMLs are attached to the original table to replicate the changes from the original table to the new one.
Then the script starts moving groups of rows from the original table to the _LIVE_ALTER table with INSERT ... SELECT statements.
The difference this time is that I ran a CREATE INDEX statement on the new _LIVE_ALTER table while the whole migration process was running: the triggers were applying fresh DMLs to the table and the script was moving old rows into the _LIVE_ALTER table.
I believe this is why 2 of the 3 cluster nodes hung. In fact, only the node on which the CREATE INDEX was executed remained alive.
------------------------
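For clarity, here is a minimal SQL sketch of the scenario described above. It is a reconstruction under assumptions, not the customer's actual tool: the table, column and index names are taken from the log excerpt below where possible; everything else (the trigger name, the trigger body, the chunk boundaries) is hypothetical.
------------------------
-- Hypothetical reconstruction of the live_alter workflow; names are illustrative only.
-- 1. Copy the table structure under a _LIVE_ALTER suffix.
CREATE TABLE game_sessions_LIVE_ALTER LIKE game_sessions;

-- 2. Attach triggers so that ongoing DML on the original table is replicated
--    to the copy (only a simplified UPDATE trigger is sketched here).
CREATE TRIGGER game_sessions_la_upd AFTER UPDATE ON game_sessions
FOR EACH ROW
  UPDATE game_sessions_LIVE_ALTER
     SET expire_ts = NEW.expire_ts
   WHERE recno = NEW.recno;

-- 3. Copy the existing rows over in chunks.
INSERT INTO game_sessions_LIVE_ALTER
SELECT * FROM game_sessions WHERE recno BETWEEN 1 AND 10000;

-- 4. The step that was run out of order this time: a DDL on the copy while the
--    triggers and the row-copy loop were still active.
CREATE INDEX `create_ts` ON game_sessions_LIVE_ALTER (`create_ts`);
------------------------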
Excerpt from the log:
2024-08-06 13:34:56 9 [Note] WSREP: SH-EX trx conflict for key (0,FLAT8)a4c20bad 78e73f51: source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 65 conn_id: 1414095382 trx_id: 83290372165 tstamp: 4097012309090758; state: seqnos (l: 1850571743, g: 52074153223, s: 52074153220, d: 52074153132) WS pa_range: 65535; state history: REPLICATING:0->CERTIFYING:3224 <---> source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 69 conn_id: 1409957861 trx_id: -1 tstamp: 4097012308867404; state: seqnos (l: 1850571742, g: 52074153222, s: 52074153220, d: 52074153221) WS pa_range: 1; state history: REPLICATING:0->CERTIFYING:3224->APPLYING:490
2024-08-06 13:34:56 18 [Note] WSREP: SH-EX trx conflict for key (0,FLAT8)a4c20bad 78e73f51: source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 65 conn_id: 1414095369 trx_id: 83290371134 tstamp: 4097012309316348; state: seqnos (l: 1850571744, g: 52074153224, s: 52074153220, d: 52074153132) WS pa_range: 65535; state history: REPLICATING:0->CERTIFYING:3224 <---> source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 69 conn_id: 1409957861 trx_id: -1 tstamp: 4097012308867404; state: seqnos (l: 1850571742, g: 52074153222, s: 52074153220, d: 52074153221) WS pa_range: 1; state history: REPLICATING:0->CERTIFYING:3224->APPLYING:490
2024-08-06 13:34:56 36 [Note] WSREP: SH-EX trx conflict for key (0,FLAT8)a4c20bad 78e73f51: source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 65 conn_id: 1414095390 trx_id: 83290372799 tstamp: 4097012310113431; state: seqnos (l: 1850571746, g: 52074153226, s: 52074153220, d: 52074153132) WS pa_range: 65535; state history: REPLICATING:0->CERTIFYING:3224 <---> source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 69 conn_id: 1409957861 trx_id: -1 tstamp: 4097012308867404; state: seqnos (l: 1850571742, g: 52074153222, s: 52074153220, d: 52074153221) WS pa_range: 1; state history: REPLICATING:0->CERTIFYING:3224->APPLYING:490
2024-08-06 13:34:56 33 [Note] WSREP: SH-EX trx conflict for key (0,FLAT8)a4c20bad 78e73f51: source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 65 conn_id: 1414095370 trx_id: 83290371214 tstamp: 4097012310890468; state: seqnos (l: 1850571748, g: 52074153227, s: 52074153221, d: 52074153179) WS pa_range: 65535; state history: REPLICATING:0->CERTIFYING:3224 <---> source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 69 conn_id: 1409957861 trx_id: -1 tstamp: 4097012308867404; state: seqnos (l: 1850571742, g: 52074153222, s: 52074153220, d: 52074153221) WS pa_range: 1; state history: REPLICATING:0->CERTIFYING:3224->APPLYING:490
2024-08-06 13:34:56 28 [Note] WSREP: SH-EX trx conflict for key (0,FLAT8)a4c20bad 78e73f51: source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 65 conn_id: 1414095372 trx_id: 83290371351 tstamp: 4097012311479950; state: seqnos (l: 1850571749, g: 52074153228, s: 52074153221, d: 52074153179) WS pa_range: 65535; state history: REPLICATING:0->CERTIFYING:3224 <---> source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 69 conn_id: 1409957861 trx_id: -1 tstamp: 4097012308867404; state: seqnos (l: 1850571742, g: 52074153222, s: 52074153220, d: 52074153221) WS pa_range: 1; state history: REPLICATING:0->CERTIFYING:3224->APPLYING:490
2024-08-06 13:34:56 25 [Note] WSREP: SH-EX trx conflict for key (0,FLAT8)a4c20bad 78e73f51: source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 65 conn_id: 1414095373 trx_id: 83290371492 tstamp: 4097012314550639; state: seqnos (l: 1850571751, g: 52074153230, s: 52074153221, d: 52074153179) WS pa_range: 65535; state history: REPLICATING:0->CERTIFYING:3224 <---> source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 69 conn_id: 1409957861 trx_id: -1 tstamp: 4097012308867404; state: seqnos (l: 1850571742, g: 52074153222, s: 52074153220, d: 52074153221) WS pa_range: 1; state history: REPLICATING:0->CERTIFYING:3224->APPLYING:490->COMMITTING:1301
2024-08-06 13:34:56 57 [Note] WSREP: SH-EX trx conflict for key (0,FLAT8)a4c20bad 78e73f51: source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 65 conn_id: 1414095381 trx_id: 83290371683 tstamp: 4097012322639331; state: seqnos (l: 1850571754, g: 52074153233, s: 52074153221, d: 52074153179) WS pa_range: 65535; state history: REPLICATING:0->CERTIFYING:3224 <---> source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 69 conn_id: 1409957861 trx_id: -1 tstamp: 4097012308867404; state: seqnos (l: 1850571742, g: 52074153222, s: 52074153220, d: 52074153221) WS pa_range: 1; state history: REPLICATING:0->CERTIFYING:3224->APPLYING:490->COMMITTING:1301
2024-08-06 13:38:36 27 [Note] WSREP: SH-EX trx conflict for key (0,FLAT8)244902f1 d9c0c8dd: source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 65 conn_id: 1414210759 trx_id: 83297369612 tstamp: 4097231736502658; state: seqnos (l: 1850696001, g: 52074267978, s: 52074267975, d: 52074267951) WS pa_range: 65535; state history: REPLICATING:0->CERTIFYING:3224 <---> source: 59f93a4e-2e9d-11ef-ab8a-8a4efc00aac2 version: 5 local: 0 flags: 69 conn_id: 1409957861 trx_id: -1 tstamp: 4097231734057415; state: seqnos (l: 1850695999, g: 52074267976, s: 52074267975, d: 52074267975) WS pa_range: 1; state history: REPLICATING:0->CERTIFYING:3224->APPLYING:490->COMMITTING:1301
2024-08-06 13:38:36 65 [Note] WSREP: MDL BF-BF conflict
schema: nl_game_providers
request: (65 seqno 52074267979 wsrep (high priority, exec, executing) cmd 0 161 UPDATE nl_game_providers.game_sessions SET expire_ts = ( UNIX_TIMESTAMP() + '5400' ) WHERE recno = '1923530980',ý±f^SÇ`^A)
granted: (59 seqno 52074267976 wsrep (toi, exec, committed) cmd 0 2 create index `create_ts` ON game_sessions_LIVE_ALTER(`create_ts`))
2024-08-06 13:38:36 65 [ERROR] Aborting
2024-08-06 13:48:41 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
240806 13:48:41 [ERROR] mysqld got signal 6 ;
Sorry, we probably made a mistake, and this is a bug.
Your assistance in bug reporting will enable us to fix this for the next release.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.6.18-MariaDB-log source revision: 887bb3f73555ff8a50138a580ca8308b9b5c069c
key_buffer_size=5242880
read_buffer_size=131072
max_used_connections=1285
max_threads=65537
thread_count=981
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 75555347 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
0x133c60c <my_print_stacktrace+0x3c> at /usr/local/libexec/mariadbd
0xccb3cf <handle_fatal_signal+0x27f> at /usr/local/libexec/mariadbd
0x828b4d4af <pthread_sigmask+0x53f> at /lib/libthr.so.3
0x828b4ca6b <pthread_setschedparam+0x83b> at /lib/libthr.so.3
0x7ffffffff2d3 <???> at ???
0x82d04c41a <__sys_thr_kill+0xa> at /lib/libc.so.7
0x82cfc5e64 <__raise+0x34> at /lib/libc.so.7
0x82d0766f9 <abort+0x49> at /lib/libc.so.7
0x12dcf50 <wsrep_thd_is_local_transaction+0x1d0300> at /usr/local/libexec/mariadbd
0x12b6ee5 <wsrep_thd_is_local_transaction+0x1aa295> at /usr/local/libexec/mariadbd
0x12dfb8d <_ZN5tpool19thread_pool_generic13timer_generic3runEv+0x3d> at /usr/local/libexec/mariadbd
0x12e0497 <_ZN5tpool4task7executeEv+0x27> at /usr/local/libexec/mariadbd
0x12de0d6 <_ZN5tpool19thread_pool_generic11worker_mainEPNS_11worker_dataE+0x76> at /usr/local/libexec/mariadbd
0x12dfc66 <_ZN5tpool19thread_pool_generic13timer_generic3runEv+0x116> at /usr/local/libexec/mariadbd
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
information that should help you find out what is causing the crash.
Core pattern: %N.core
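For readability, these are the two operations involved in the MDL BF-BF conflict reported above, copied out of the log excerpt (per the log, the UPDATE is the high-priority applier statement requesting the metadata lock; the CREATE INDEX, replicated under TOI, is the statement holding it):
------------------------
UPDATE nl_game_providers.game_sessions SET expire_ts = ( UNIX_TIMESTAMP() + '5400' ) WHERE recno = '1923530980';
CREATE INDEX `create_ts` ON game_sessions_LIVE_ALTER (`create_ts`);
------------------------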