|
Can you please attach GDB to the process and produce a list of stack traces?
sudo gdb -p $(pgrep mariadbd)
|
If you convert the decimal "OS thread handle" in the output to hexadecimal (for example, p/x 139854291961600 in GDB shows 0x7f325d648700), you should be able to find that thread: thread find 0x7f325d648700. The thread handles are of course specific to each process instance.
To display the stack traces of all threads, you can run something like the following in GDB:
set log file /tmp/log.txt
|
set height 0
|
set log on
|
thread apply all backtrace
|
detach
|
quit
|
|
|
Added innodb status as stacktraces.
log.txt log-debug.txt innodb_status.txt
hread 52 (Thread 0x7f1115501700 (LWP 1339999)):
|
#0 0x00007f51f418f51d in syscall () from target:/lib64/libc.so.6
|
#1 0x000055f42222a818 in srw_mutex_impl<true>::wait (lk=<optimized out>, this=0x7f51a80de8b8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:430
|
#2 srw_mutex_impl<true>::wait_and_lock (this=this@entry=0x7f51a80de8b8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:430
|
#3 0x000055f42222ade3 in srw_mutex_impl<true>::wr_lock (this=0x7f51a80de8b8) at /usr/include/c++/8/bits/atomic_base.h:522
|
#4 ssux_lock_impl<true>::rd_wait (this=this@entry=0x7f51a80de8b8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:523
|
#5 0x000055f42219fe8d in ssux_lock_impl<true>::rd_lock (this=0x7f51a80de8b8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/mem0mem.inl:393
|
#6 sux_lock<ssux_lock_impl<true> >::s_lock (this=0x7f51a80de8b8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/sux_lock.h:363
|
#7 mtr_t::page_lock (this=this@entry=0x7f11154fe940, block=block@entry=0x7f51a80de8a0, rw_latch=rw_latch@entry=1) at /usr/src/debug/MariaDB-/src_0/storage/innobase/mtr/mtr0mtr.cc:1181
|
#8 0x000055f42229086c in buf_page_get_low (page_id=..., zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=<optimized out>, mtr=<optimized out>, err=<optimized out>, allow_ibuf_merge=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0buf.cc:2957
|
#9 0x000055f4222734d7 in btr_cur_search_to_nth_level_func (index=index@entry=0x55f425ad19b0, level=level@entry=0, tuple=tuple@entry=0x7f10381e75f0, mode=mode@entry=PAGE_CUR_LE, latch_mode=<optimized out>, latch_mode@entry=2, cursor=cursor@entry=0x7f11154fdfa0, ahi_latch=<optimized out>, mtr=<optimized out>,
|
autoinc=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/dict0mem.h:1176
|
|
|
In log-debug.txt I can see that Thread 18 is waiting for a page latch. Unfortunately, I forgot to suggest the following
set print frame-arguments all
|
so the page number is obscured:
Thread 18 (Thread 0x7f1115b28700 (LWP 711075)):
|
#0 0x00007f51f418f51d in syscall () from target:/lib64/libc.so.6
|
#1 0x000055f42222ad28 in ssux_lock_impl<true>::wait (lk=<optimized out>, this=0x7f51a80de8b8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:353
|
#2 ssux_lock_impl<true>::wr_wait (this=0x7f51a80de8b8, lk=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:509
|
#3 0x000055f422274f3f in ssux_lock_impl<true>::wr_lock (this=0x7f51a80de8b8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/btr/btr0cur.cc:1969
|
#4 sux_lock<ssux_lock_impl<true> >::x_lock (for_io=false, this=0x7f51a80de8b8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/sux_lock.h:396
|
#5 mtr_t::x_latch_at_savepoint (block=0x7f51a80de8a0, savepoint=<optimized out>, this=0x7f1115b25940) at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/mtr0mtr.inl:148
|
#6 btr_cur_search_to_nth_level_func (index=index@entry=0x55f425ad19b0, level=level@entry=0, tuple=tuple@entry=0x7f0f6d5c35d0, mode=mode@entry=PAGE_CUR_LE, latch_mode=<optimized out>, latch_mode@entry=33, cursor=cursor@entry=0x7f1115b24fa0, ahi_latch=<optimized out>, mtr=<optimized out>, autoinc=<optimized out>)
|
at /usr/src/debug/MariaDB-/src_0/storage/innobase/btr/btr0cur.cc:2229
|
#7 0x000055f4221e40ea in btr_pcur_open_low (level=0, mtr=0x7f1115b25940, autoinc=<optimized out>, cursor=0x7f1115b24fa0, latch_mode=33, mode=PAGE_CUR_LE, tuple=0x7f0f6d5c35d0, index=0x55f425ad19b0) at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/btr0pcur.inl:369
|
#8 row_ins_clust_index_entry_low (flags=0, mode=33, index=0x55f425ad19b0, n_uniq=2, entry=0x7f0f6d5c35d0, n_ext=0, thr=0x7f0f6d5ad208) at /usr/src/debug/MariaDB-/src_0/storage/innobase/row/row0ins.cc:2605
|
#9 0x000055f4221e8803 in row_ins_clust_index_entry (index=0x55f425ad19b0, entry=0x7f0f6d5c35d0, thr=0x7f0f6d5ad208, n_ext=0) at /usr/src/debug/MariaDB-/src_0/storage/innobase/row/row0ins.cc:3248
|
This is executing an insert. The locks prioritize exclusive access, so all further requests on this lock (easily found by searching for 0x7f51a80de8a0) will be blocked.
What would print/x block->page.lock display for the block? Are you able to reproduce this with the rr debugger? We got one 10.6 mystery hang earlier, in MDEV-27026. I was unable to reproduce it myself, and while the fix solved that particular problem, this was strong indication that the lock implementation could be broken.
|
|
I am not able to reproduce it. It occurred 3 time in one week, and then I had to wait 2 weeks for the next occurrence to create the stack traces. Can even find nothing special happening at those time from an application perspective.
But will try to get the frame-arguments next time.
|
|
sstamm, thank you. The problem in MDEV-27026 was only repeatable on certain AMD64 based systems by running a test very many times. For that hang, the apparent scenario was that two near concurrent acquisitions of an exclusive lock would lead to a hang, and the "fix" was to remove an unnecessary exclusive lock acquisition. In this case, the lock is different, and it is very well possible that multiple threads are concurrently acquiring exclusive or shared locks on a page.
According to a core dump that I analyzed for MDEV-27026, one shared lock was apparently being held according to the data structure, but no stack trace would match that. A quick attempt to modify the unit test storage/innobase/unittest/innodb_sync-t.cc to exercise this scenario (2 exclusive lock requests concurrently with N shared lock requests) failed to reproduce the hang back then.
We have not reproduced this kind of hangs in our regular stress testing. The most recent hang was something much more obvious, at a higher level: MDEV-27985.
Maybe we should do some extra test rounds with release executables or build configurations, in addition to testing debug instrumented executables. It could also be that this bug is only reproducible with specific CPU generations, and our main test systems are ‘immune’.
|
|
Possible, running 2x Intel(R) Xeon(R) Gold 6234 + 2 Intel DC U.2 2,5 P4610 SSD via VRoC, maybe the high throughput causes some kind of race condition.
If you have any simple test I can try to run it on the server.
|
|
My workstation is 2×Intel Xeon E5-2630 v4, and I could not reproduce the hang there. My colleagues reproduced the hang on two laptops, one of which was rather new (but still older than Alder Lake).
I’d suggest to compile the unit test from the source code, and modify it. Possibly some delays or sleeps will have to to be added into the critical sections.
|
|
sstamm, would it be possible to get backtrace full of the hanging lock wait (or thread apply all backtrace full of all threads), so that we can see the value of the lock word?
|
|
Added log-full.txt hopefully it has the ' backtrace full', was I bit in a hurry.
---TRANSACTION 549553844, ACTIVE 2400 sec inserting
|
mysql tables in use 1, locked 1
|
11 lock struct(s), heap size 1128, 5 row lock(s), undo log entries 5
|
MariaDB thread id 17233104, OS thread handle 140247800051456, query id 1641081665 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 549553837, ACTIVE 2400 sec inserting
|
mysql tables in use 1, locked 1
|
11 lock struct(s), heap size 1128, 5 row lock(s), undo log entries 6
|
MariaDB thread id 17233103, OS thread handle 140525911111424, query id 1641081422 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 549553830, ACTIVE 2400 sec inserting
|
mysql tables in use 1, locked 1
|
12 lock struct(s), heap size 1128, 5 row lock(s), undo log entries 93
|
MariaDB thread id 17233102, OS thread handle 140525911418624, query id 1641081250 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
|
|
Next one:
|
=====================================
|
2022-04-08 11:24:37 0x7f3d09357700 INNODB MONITOR OUTPUT
|
=====================================
|
Per second averages calculated from the last 3 seconds
|
-----------------
|
BACKGROUND THREAD
|
-----------------
|
srv_master_thread loops: 0 srv_active, 0 srv_shutdown, 689190 srv_idle
|
srv_master_thread log flush and writes: 689180
|
----------
|
SEMAPHORES
|
----------
|
------------
|
TRANSACTIONS
|
------------
|
Trx id counter 556155684
|
Purge done for trx's n:o < 556141965 undo n:o < 0 state: running
|
History list length 6827
|
LIST OF TRANSACTIONS FOR EACH SESSION:
|
---TRANSACTION 556155654, ACTIVE 26 sec inserting
|
mysql tables in use 1, locked 1
|
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
|
MariaDB thread id 3415764, OS thread handle 139622042445568, query id 391112492 10.254.28.150 fccUser Update
|
insert into FlightRecord (`date`, deviceId, fileName, fileSize, firmwareVersion, forceReload, info, isCorrupt, jddVersion, `md5`, moveVersion, systemSerialNumber, totalBytes, type) values (null, 46348, 'RM3_2022-04-06_5310088160.CRYPT.TRC', 644305, null, 0, '{\"pid\":14656,\"time\":1203950,\"analyzer\":\"1.35.44\"}', 0, null, null, null, null, 0, 'IGNORED')
|
------- TRX HAS BEEN WAITING 25585922 ns FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 14354 page no 712865 n bits 184 index UKrukqstyowmh62hwtj79hmay72 of table `fcc_1350`.`flightrecord` trx id 556155654 lock mode S waiting
|
Record lock, heap no 178 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
|
0: len 4; hex 0000b50c; asc ;;
|
1: len 30; hex 524d335f323032322d30342d30365f353331303038383136302e43525950; asc RM3_2022-04-06_5310088160.CRYP; (total 35 bytes);
|
2: len 4; hex 02d35538; asc U8;;
|
|
------------------
|
---TRANSACTION 556155653, ACTIVE 27 sec inserting
|
mysql tables in use 1, locked 1
|
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
|
MariaDB thread id 3415763, OS thread handle 139900105090816, query id 391112487 10.254.28.150 fccUser Update
|
insert into FlightRecord (`date`, deviceId, fileName, fileSize, firmwareVersion, forceReload, info, isCorrupt, jddVersion, `md5`, moveVersion, systemSerialNumber, totalBytes, type) values (null, 45305, 'RM3_2022-04-06_5310042955.CRYPT.TRC', 2990731, null, 0, '{\"pid\":29756,\"time\":1200042,\"analyzer\":\"1.35.44\"}', 0, null, null, null, null, 0, 'IGNORED')
|
------- TRX HAS BEEN WAITING 26572893 ns FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 14354 page no 582857 n bits 272 index UKrukqstyowmh62hwtj79hmay72 of table `fcc_1350`.`flightrecord` trx id 556155653 lock mode S waiting
|
Record lock, heap no 270 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
|
0: len 4; hex 0000b0f9; asc ;;
|
1: len 30; hex 524d335f323032322d30342d30365f353331303034323935352e43525950; asc RM3_2022-04-06_5310042955.CRYP; (total 35 bytes);
|
2: len 4; hex 02d35537; asc U7;;
|
|
------------------
|
---TRANSACTION 556155652, ACTIVE 29 sec inserting
|
mysql tables in use 1, locked 1
|
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
|
MariaDB thread id 3415762, OS thread handle 139900104476416, query id 391112484 10.254.28.150 fccUser Update
|
insert into FlightRecord (`date`, deviceId, fileName, fileSize, firmwareVersion, forceReload, info, isCorrupt, jddVersion, `md5`, moveVersion, systemSerialNumber, totalBytes, type) values (null, 47425, 'RM3_2022-04-06_5300774464.CRYPT.TRC', 1260974, null, 0, '{\"pid\":33384,\"time\":1204338,\"analyzer\":\"1.35.44\"}', 0, null, null, null, null, 0, 'IGNORED')
|
------- TRX HAS BEEN WAITING 28198303 ns FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 14354 page no 604752 n bits 272 index UKrukqstyowmh62hwtj79hmay72 of table `fcc_1350`.`flightrecord` trx id 556155652 lock mode S waiting
|
Record lock, heap no 268 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
|
0: len 4; hex 0000b941; asc A;;
|
1: len 30; hex 524d335f323032322d30342d30365f353330303737343436342e43525950; asc RM3_2022-04-06_5300774464.CRYP; (total 35 bytes);
|
2: len 4; hex 02d35536; asc U6;;
|
|
------------------
|
---TRANSACTION 556155644, ACTIVE 35 sec inserting
|
mysql tables in use 1, locked 1
|
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
|
MariaDB thread id 3415759, OS thread handle 139900109793024, query id 391112440 10.254.28.150 fccUser Update
|
insert into FlightRecord (`date`, deviceId, fileName, fileSize, firmwareVersion, forceReload, info, isCorrupt, jddVersion, `md5`, moveVersion, systemSerialNumber, totalBytes, type) values (null, 50234, 'RM3_2022-04-05_5310036767.CRYPT.TRC', 1736118, null, 0, '{\"pid\":30804,\"time\":1204414,\"analyzer\":\"1.35.44\"}', 0, null, null, null, null, 0, 'IGNORED')
|
------- TRX HAS BEEN WAITING 34135832 ns FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 14354 page no 533077 n bits 320 index UKrukqstyowmh62hwtj79hmay72 of table `fcc_1350`.`flightrecord` trx id 556155644 lock mode S waiting
|
Record lock, heap no 313 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
|
0: len 4; hex 0000c43a; asc :;;
|
1: len 30; hex 524d335f323032322d30342d30355f353331303033363736372e43525950; asc RM3_2022-04-05_5310036767.CRYP; (total 35 bytes);
|
2: len 4; hex 02d35533; asc U3;;
|
|
------------------
|
---TRANSACTION 556155643, ACTIVE 38 sec inserting
|
mysql tables in use 1, locked 1
|
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
|
MariaDB thread id 3415758, OS thread handle 139900112865024, query id 391112437 10.254.28.150 fccUser Update
|
insert into FlightRecord (`date`, deviceId, fileName, fileSize, firmwareVersion, forceReload, info, isCorrupt, jddVersion, `md5`, moveVersion, systemSerialNumber, totalBytes, type) values (null, 80803, 'RM3_2022-04-05_5310118324.CRYPT.TRC', 2002305, null, 0, '{\"pid\":30360,\"time\":1203743,\"analyzer\":\"1.35.44\"}', 0, null, null, null, null, 0, 'IGNORED')
|
------- TRX HAS BEEN WAITING 37806583 ns FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 14354 page no 520760 n bits 104 index UKrukqstyowmh62hwtj79hmay72 of table `fcc_1350`.`flightrecord` trx id 556155643 lock mode S waiting
|
Record lock, heap no 97 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
|
0: len 4; hex 00013ba3; asc ; ;;
|
1: len 30; hex 524d335f323032322d30342d30355f353331303131383332342e43525950; asc RM3_2022-04-05_5310118324.CRYP; (total 35 bytes);
|
2: len 4; hex 02d35532; asc U2;;
|
|
------------------
|
---TRANSACTION 556155642, ACTIVE 39 sec inserting
|
mysql tables in use 1, locked 1
|
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
|
MariaDB thread id 3415757, OS thread handle 139900110100224, query id 391112434 10.254.28.150 fccUser Update
|
insert into FlightRecord (`date`, deviceId, fileName, fileSize, firmwareVersion, forceReload, info, isCorrupt, jddVersion, `md5`, moveVersion, systemSerialNumber, totalBytes, type) values (null, 53952, 'RM3_2022-04-05_5310057782.CRYPT.TRC', 122868, null, 0, '{\"pid\":2716,\"time\":1200037,\"analyzer\":\"1.35.44\"}', 0, null, null, null, null, 0, 'IGNORED')
|
------- TRX HAS BEEN WAITING 38325593 ns FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 14354 page no 105056 n bits 296 index UKrukqstyowmh62hwtj79hmay72 of table `fcc_1350`.`flightrecord` trx id 556155642 lock mode S waiting
|
Record lock, heap no 19 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
|
0: len 4; hex 0000d2c0; asc ;;
|
1: len 30; hex 524d335f323032322d30342d30355f353331303035373738322e43525950; asc RM3_2022-04-05_5310057782.CRYP; (total 35 bytes);
|
2: len 4; hex 02d35531; asc U1;;
|
|
------------------
|
---TRANSACTION 556155637, ACTIVE 44 sec inserting
|
mysql tables in use 1, locked 1
|
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
|
MariaDB thread id 3415756, OS thread handle 139900122081024, query id 391112428 10.254.28.150 fccUser Update
|
insert into FlightRecord (`date`, deviceId, fileName, fileSize, firmwareVersion, forceReload, info, isCorrupt, jddVersion, `md5`, moveVersion, systemSerialNumber, totalBytes, type) values (null, 108932, 'RM3_2022-04-04_7502930432.CRYPT.TRC', 147325, null, 0, '{\"pid\":41592,\"time\":1204271,\"analyzer\":\"1.35.44\"}', 0, null, null, null, null, 0, 'IGNORED')
|
------- TRX HAS BEEN WAITING 43281509 ns FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 14354 page no 869010 n bits 224 index UKrukqstyowmh62hwtj79hmay72 of table `fcc_1350`.`flightrecord` trx id 556155637 lock mode S waiting
|
Record lock, heap no 220 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
|
0: len 4; hex 0001a984; asc ;;
|
1: len 30; hex 524d335f323032322d30342d30345f373530323933303433322e43525950; asc RM3_2022-04-04_7502930432.CRYP; (total 35 bytes);
|
2: len 4; hex 02d3552f; asc U/;;
|
|
------------------
|
---TRANSACTION 556155634, ACTIVE 47 sec inserting
|
mysql tables in use 1, locked 1
|
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
|
MariaDB thread id 3415755, OS thread handle 139900110407424, query id 391112423 10.254.28.150 fccUser Update
|
insert into FlightRecord (`date`, deviceId, fileName, fileSize, firmwareVersion, forceReload, info, isCorrupt, jddVersion, `md5`, moveVersion, systemSerialNumber, totalBytes, type) values (null, 26874, 'RM3_2022-04-04_7501916388.CRYPT.TRC', 66646, null, 0, '{\"pid\":15680,\"time\":1204281,\"analyzer\":\"1.35.44\"}', 0, null, null, null, null, 0, 'IGNORED')
|
------- TRX HAS BEEN WAITING 46297236 ns FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 14354 page no 738936 n bits 248 index UKrukqstyowmh62hwtj79hmay72 of table `fcc_1350`.`flightrecord` trx id 556155634 lock mode S waiting
|
Record lock, heap no 239 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
|
0: len 4; hex 000068fa; asc h ;;
|
1: len 30; hex 524d335f323032322d30342d30345f373530313931363338382e43525950; asc RM3_2022-04-04_7501916388.CRYP; (total 35 bytes);
|
2: len 4; hex 02d3552e; asc U.;;
|
|
------------------
|
---TRANSACTION 556155627, ACTIVE 49 sec inserting
|
mysql tables in use 1, locked 1
|
LOCK WAIT 4 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
|
MariaDB thread id 3415753, OS thread handle 139900111636224, query id 391112382 10.254.28.150 fccUser Update
|
insert into FlightRecord (`date`, deviceId, fileName, fileSize, firmwareVersion, forceReload, info, isCorrupt, jddVersion, `md5`, moveVersion, systemSerialNumber, totalBytes, type) values (null, 80803, 'RM3_2022-04-04_5310118324.CRYPT.TRC', 1889040, null, 0, '{\"pid\":28248,\"time\":1200053,\"analyzer\":\"1.35.44\"}', 0, null, null, null, null, 0, 'IGNORED')
|
------- TRX HAS BEEN WAITING 48497666 ns FOR THIS LOCK TO BE GRANTED:
|
RECORD LOCKS space id 14354 page no 520760 n bits 104 index UKrukqstyowmh62hwtj79hmay72 of table `fcc_1350`.`flightrecord` trx id 556155627 lock mode S waiting
|
Record lock, heap no 96 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
|
0: len 4; hex 00013ba3; asc ; ;;
|
1: len 30; hex 524d335f323032322d30342d30345f353331303131383332342e43525950; asc RM3_2022-04-04_5310118324.CRYP; (total 35 bytes);
|
2: len 4; hex 02d3552d; asc U-;;
|
|
------------------
|
---TRANSACTION (0x7f3d1cef0430), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7f3d1ceeaf30), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION 556149935, ACTIVE 1223 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3412101, OS thread handle 139900110714624, query id 390909717 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556149892, ACTIVE 1224 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3412080, OS thread handle 139900112557824, query id 390906762 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556149847, ACTIVE 1225 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3412057, OS thread handle 139900111329024, query id 390905047 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556149636, ACTIVE 1231 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3411948, OS thread handle 139900112250624, query id 390894872 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556149576, ACTIVE 1234 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3411917, OS thread handle 139900116858624, query id 390891467 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556149500, ACTIVE 1238 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3411877, OS thread handle 139900123924224, query id 390888074 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556149369, ACTIVE 1242 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3411809, OS thread handle 139900111943424, query id 390881703 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556149172, ACTIVE 1245 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3411710, OS thread handle 139900113786624, query id 390870965 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556149092, ACTIVE 1246 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 6
|
MariaDB thread id 3411671, OS thread handle 139900113172224, query id 390866004 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556148924, ACTIVE 1250 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3411581, OS thread handle 139900119009024, query id 390852307 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556148820, ACTIVE 1253 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3411525, OS thread handle 139900115629824, query id 390842733 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556148766, ACTIVE 1254 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3411495, OS thread handle 139900118087424, query id 390839179 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556148698, ACTIVE 1256 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3411458, OS thread handle 139900123617024, query id 390835422 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556148484, ACTIVE 1260 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3411347, OS thread handle 139900114708224, query id 390820079 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556148403, ACTIVE 1263 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3411303, OS thread handle 139900118701824, query id 390813522 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556148310, ACTIVE 1264 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3411256, OS thread handle 139900116244224, query id 390806010 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556148040, ACTIVE 1269 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3411085, OS thread handle 139900124845824, query id 390785292 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556147815, ACTIVE 1272 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3410974, OS thread handle 139900116551424, query id 390768777 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556147547, ACTIVE 1275 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3410764, OS thread handle 139900117165824, query id 390753997 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556147188, ACTIVE 1279 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3410479, OS thread handle 139900118394624, query id 390734597 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556146954, ACTIVE 1281 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3410297, OS thread handle 139900123002624, query id 390720193 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556146468, ACTIVE 1287 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3409970, OS thread handle 139900124538624, query id 390679395 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556146225, ACTIVE 1290 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3409820, OS thread handle 139900122695424, query id 390659467 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556146048, ACTIVE 1292 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3409712, OS thread handle 139900444956416, query id 390644235 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556145895, ACTIVE 1294 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3409613, OS thread handle 139900445263616, query id 390629731 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556145536, ACTIVE 1298 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3409379, OS thread handle 139900121466624, query id 390601605 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556145327, ACTIVE 1302 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3409248, OS thread handle 139900122388224, query id 390584271 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556145242, ACTIVE 1303 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3409197, OS thread handle 139900126074624, query id 390577125 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556144994, ACTIVE 1307 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3409044, OS thread handle 139900125767424, query id 390555030 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556144969, ACTIVE 1307 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3409029, OS thread handle 139900121159424, query id 390552860 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values ('40000001', '00000090', 'ERROR', 536870912, 0, 0, 243000769)
|
---TRANSACTION 556144718, ACTIVE 1311 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3408870, OS thread handle 139900117780224, query id 390533685 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556144450, ACTIVE 1316 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3408706, OS thread handle 139900444342016, query id 390514021 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556144367, ACTIVE 1317 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3408656, OS thread handle 139900119316224, query id 390506658 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556144365, ACTIVE 1317 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3408657, OS thread handle 139900115015424, query id 390506617 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556144282, ACTIVE 1319 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 3408601, OS thread handle 139900120545024, query id 390500245 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 556144275, ACTIVE 1319 sec inserting
|
mysql tables in use 1, locked 1
|
10 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 23
|
MariaDB thread id 3408599, OS thread handle 139900117473024, query id 390500041 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION (0x7f3d1cec6d30), ACTIVE 1009 sec starting index read
|
mysql tables in use 8, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 3407721, OS thread handle 139900120852224, query id 390915843 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR SELECT DISTINCT HEX(objectId & 0xFFFFFFFF), objectId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN rm4flightrecord AS rm4Fr USING(flightRecordId) JOIN event AS evt USING(flightRecordId) JOIN Sequence AS seq USING(eventId) JOIN FirmwareError AS fw USING(sequenceId)
|
Trx read view will not see trx with id >= 556150149, sees < 556144275
|
---TRANSACTION (0x7f3d1cec2930), ACTIVE 1009 sec starting index read
|
mysql tables in use 8, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 3406658, OS thread handle 139900120237824, query id 390915842 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR SELECT DISTINCT classId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN rm4flightrecord AS rm4Fr USING(flightRecordId) JOIN event AS evt USING(flightRecordId) JOIN Sequence AS seq USING(eventId) JOIN FirmwareError AS fw USING(sequenceId)
|
Trx read view will not see trx with id >= 556150149, sees < 556144275
|
---TRANSACTION (0x7f3d1cec5c30), ACTIVE 1720 sec fetching rows
|
mysql tables in use 8, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 3405613, OS thread handle 139900125460224, query id 390190990 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR SELECT DISTINCT classId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN rm4flightrecord AS rm4Fr USING(flightRecordId) JOIN event AS evt USING(flightRecordId) JOIN Sequence AS seq USING(eventId) JOIN FirmwareError AS fw USING(sequenceId)
|
Trx read view will not see trx with id >= 556141964, sees < 556141964
|
---TRANSACTION (0x7f3d1cebf630), ACTIVE 1720 sec fetching rows
|
mysql tables in use 8, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 3405588, OS thread handle 139900125153024, query id 390190992 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR SELECT DISTINCT HEX(objectId & 0xFFFFFFFF), objectId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN rm4flightrecord AS rm4Fr USING(flightRecordId) JOIN event AS evt USING(flightRecordId) JOIN Sequence AS seq USING(eventId) JOIN FirmwareError AS fw USING(sequenceId)
|
Trx read view will not see trx with id >= 556141964, sees < 556141964
|
---TRANSACTION (0x7f3d1cebe530), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7f3d1cebb230), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7f3d1cebc330), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7f3d1cebd430), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7f3d1ceba130), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
--------
|
FILE I/O
|
--------
|
Pending flushes (fsync) log: 0; buffer pool: 0
|
34862639 OS file reads, 124252874 OS file writes, 1031217 OS fsyncs
|
0.00 reads/s, 0 avg bytes/read, 0.33 writes/s, 0.33 fsyncs/s
|
-------------------------------------
|
INSERT BUFFER AND ADAPTIVE HASH INDEX
|
-------------------------------------
|
Ibuf: size 1304, free list len 379796, seg size 381101, 161 merges
|
merged operations:
|
insert 2099, delete mark 907, delete 907
|
discarded operations:
|
insert 0, delete mark 0, delete 0
|
0.00 hash searches/s, 0.00 non-hash searches/s
|
---
|
LOG
|
---
|
Log sequence number 28993909794916
|
Log flushed up to 28993909794916
|
Pages flushed up to 28993611427584
|
Last checkpoint at 28993611427584
|
0 pending log flushes, 0 pending chkp writes
|
541636 log i/o's done, 0.33 log i/o's/second
|
----------------------
|
BUFFER POOL AND MEMORY
|
----------------------
|
Total large memory allocated 275012124672
|
Dictionary memory allocated 1700212936
|
Buffer pool size 16613376
|
Free buffers 1
|
Database pages 19396134
|
Old database pages 7159881
|
Modified db pages 118679
|
Percent of dirty pages(LRU & free pages): 0.612
|
Max dirty pages percent: 90.000
|
Pending reads 0
|
Pending writes: LRU 0, flush list 0
|
Pages made young 31582803, not young 806687281
|
0.00 youngs/s, 0.00 non-youngs/s
|
Pages read 34849708, created 5920383, written 123711308
|
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
|
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
|
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
|
LRU len: 19396134, unzip_LRU len: 3854418
|
I/O sum[0]:cur[0], unzip sum[38]:cur[0]
|
--------------
|
ROW OPERATIONS
|
--------------
|
4 read views open inside InnoDB
|
Process ID=0, Main thread ID=0, state: sleeping
|
Number of rows inserted 510872046, updated 5384958, deleted 231214445, read 129431674923
|
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
|
Number of system rows inserted 0, updated 0, deleted 0, read 0
|
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
|
----------------------------
|
END OF INNODB MONITOR OUTPUT
|
============================
|
|
log_2022-04-08.txt
|
|
sstamm, sorry, no, any InnoDB error log output is not helpful for diagnosing hangs. Stack traces of all threads, as produced by GDB, would be.
However, I have some good news. I can reproduce a hang on MariaDB Server 10.6 on a system that is remotely accessible to me, if I revert this change that we knew was not really the fix of MDEV-27026, but fixed an occasional hang in a test. I used GCC 9.3.0 and an unoptimized debug build:
mkdir bld
|
cmake -DCMAKE_BUILD_TYPE=Debug ..
|
make -j$(nproc)
|
cd mysql-test
|
./mtr --mem --mysqld=--innodb-fatal-semaphore-wait-threshold=3 --parallel=auto --repeat=1000 innodb_fts.concurrent_insert{,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,}
|
Then, I repeated the experiment by replacing the use of futex with pthread_mutex_t and pthread_cond_t, as follows. This depends on the fix of MDEV-26781:
diff --git a/storage/innobase/include/srw_lock.h b/storage/innobase/include/srw_lock.h
|
index 1dca0cc1054..5d5bc28a5de 100644
|
--- a/storage/innobase/include/srw_lock.h
|
+++ b/storage/innobase/include/srw_lock.h
|
@@ -30,7 +30,7 @@ this program; if not, write to the Free Software Foundation, Inc.,
|
# define SUX_LOCK_GENERIC /* fall back to generic synchronization primitives */
|
#endif
|
|
-#if !defined SUX_LOCK_GENERIC && 0 /* defined SAFE_MUTEX */
|
+#if !defined SUX_LOCK_GENERIC
|
# define SUX_LOCK_GENERIC /* Use dummy implementation for debugging purposes */
|
#endif
|
|
With that change, I am not seeing any hang, in over 100 rounds of executing the test. Without enabling the SUX_LOCK_GENERIC, it used to hang in less than 20 rounds.
I do not know yet what exactly could be wrong, but it is a step forward that I am able to reproduce the hangs, and fairly easily at that.
|
|
Does the attached file not have all stacktraces? Am I doing something wrong?
|
|
sstamm, sorry, indeed log_2022-04-08.txt includes the stack traces of all threads. I see that many threads are waiting for a futex on page id 61675730370563 (tablespace 14360, page 3, that is, the clustered index root page).
On the system where my test occasionally hangs, if I disable most CPU cores, it will stop hanging.The cause is currently unclear to me. If it is not something that I have overlooked in my implementation, it could be a bug in the Linux kernel (something similar to this one that was fixed in 2014), or an anomaly of some Intel processors that needs some specific workaround. My contact at Intel is aware of this.
|
|
I am able to repeat the hang on my local system (dual Intel Xeon E5-2630 v4) after all:
|
10.6 5a4a37076db1f3a77a40c9636b96eef5625b3d81 + git revert 48b974b267b10c7ff072494613d46afcc75b3b55
|
innodb_fts.concurrent_insert 'innodb' w30 [ 249 fail ]
|
Test ended at 2022-04-11 10:16:43
|
|
CURRENT_TEST: innodb_fts.concurrent_insert
|
mysqltest: At line 48: query 'reap' failed: <Unknown> (2013): Lost connection to server during query
|
…
|
2022-04-11 10:16:42 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
|
This was an unoptimized CMAKE_BUILD_TYPE=Debug with PLUGIN_PERFSCHEMA=NO that was built with GCC 11.2.0.
|
|
For one hang that I reproduced, there are two threads waiting for an exclusive lock and several waiting for a shared lock. The dict_sys.latch.readers suggests that one thread is holding dict_sys.latch in shared mode (which is not true and thus the reason for the hang), and an exclusive lock is being waited for.
|
10.6 5a4a37076db1f3a77a40c9636b96eef5625b3d81 + git revert 48b974b267b10c7ff072494613d46afcc75b3b55
|
(gdb) p/x dict_sys.latch
|
$1 = {writer = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000005}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000001}, <No data fields>}}
|
(gdb) thread apply all backtrace
|
…
|
Thread 44 (Thread 0x7f8cb4ff9640 (LWP 505552)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x00005604adabfc39 in srw_mutex_impl<false>::wait (this=0x5604ae785780 <dict_sys+64>, lk=2147483650) at /mariadb/10.6/storage/innobase/sync/srw_lock.cc:239
|
#2 0x00005604adabf369 in srw_mutex_impl<false>::wait_and_lock (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/sync/srw_lock.cc:326
|
#3 0x00005604ad8a86a1 in srw_mutex_impl<false>::wr_lock (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/include/srw_lock.h:134
|
#4 0x00005604adabfbd0 in ssux_lock_impl<false>::rd_wait (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/sync/srw_lock.cc:392
|
#5 0x00005604ad8a5657 in ssux_lock_impl<false>::rd_lock (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/include/srw_lock.h:247
|
#6 0x00005604ad89eb98 in dict_sys_t::freeze (this=0x5604ae785740 <dict_sys>) at /mariadb/10.6/storage/innobase/include/dict0dict.h:1594
|
…
|
#9 0x00005604adc54e67 in fts_optimize_sync_table (table=0x7f8db41c7c60, process_message=true) at /mariadb/10.6/storage/innobase/fts/fts0opt.cc:2791
|
…
|
Thread 30 (Thread 0x7f8df5ffb640 (LWP 505499)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x00005604adabfc39 in srw_mutex_impl<false>::wait (this=0x5604ae785780 <dict_sys+64>, lk=2147483651) at /mariadb/10.6/storage/innobase/sync/srw_lock.cc:239
|
#2 0x00005604adabf369 in srw_mutex_impl<false>::wait_and_lock (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/sync/srw_lock.cc:326
|
#3 0x00005604ad8a86a1 in srw_mutex_impl<false>::wr_lock (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/include/srw_lock.h:134
|
#4 0x00005604ad8a550e in ssux_lock_impl<false>::wr_lock (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/include/srw_lock.h:257
|
#5 0x00005604adbc0c05 in dict_sys_t::lock_wait (this=0x5604ae785740 <dict_sys>) at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:1006
|
#6 0x00005604ad89ea3c in dict_sys_t::lock (this=0x5604ae785740 <dict_sys>) at /mariadb/10.6/storage/innobase/include/dict0dict.h:1572
|
#7 0x00005604adbc20d5 in dict_sys_t::evict_table_LRU (this=0x5604ae785740 <dict_sys>, half=false) at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:1326
|
#8 0x00005604adab1ee0 in srv_master_do_idle_tasks (counter_time=11838497125) at /mariadb/10.6/storage/innobase/srv/srv0srv.cc:1602
|
#9 0x00005604adab232d in srv_master_callback () at /mariadb/10.6/storage/innobase/srv/srv0srv.cc:1660
|
…
|
Thread 21 (Thread 0x7f8df7fff640 (LWP 505496)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x00005604adabfc39 in srw_mutex_impl<false>::wait (this=0x5604ae785780 <dict_sys+64>, lk=2147483652) at /mariadb/10.6/storage/innobase/sync/srw_lock.cc:239
|
#2 0x00005604adabf369 in srw_mutex_impl<false>::wait_and_lock (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/sync/srw_lock.cc:326
|
#3 0x00005604ad8a86a1 in srw_mutex_impl<false>::wr_lock (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/include/srw_lock.h:134
|
#4 0x00005604adabfbd0 in ssux_lock_impl<false>::rd_wait (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/sync/srw_lock.cc:392
|
#5 0x00005604ad8a5657 in ssux_lock_impl<false>::rd_lock (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/include/srw_lock.h:247
|
#6 0x00005604ad89eb98 in dict_sys_t::freeze (this=0x5604ae785740 <dict_sys>) at /mariadb/10.6/storage/innobase/include/dict0dict.h:1594
|
#7 0x00005604adbcedc4 in dict_table_open_on_id<true> (table_id=2, dict_locked=false, table_op=DICT_TABLE_OP_NORMAL, thd=0x5604b105e7c8, mdl=0x5604b1145160) at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:837
|
…
|
#16 0x00005604adab2de4 in purge_coordinator_callback () at /mariadb/10.6/storage/innobase/srv/srv0srv.cc:1965
|
…
|
Thread 19 (Thread 0x7f8df77fe640 (LWP 505497)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x00005604adabfc39 in srw_mutex_impl<false>::wait (this=0x5604ae785780 <dict_sys+64>, lk=2147483653) at /mariadb/10.6/storage/innobase/sync/srw_lock.cc:239
|
#2 0x00005604adabf369 in srw_mutex_impl<false>::wait_and_lock (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/sync/srw_lock.cc:326
|
#3 0x00005604ad8a86a1 in srw_mutex_impl<false>::wr_lock (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/include/srw_lock.h:134
|
#4 0x00005604adabfbd0 in ssux_lock_impl<false>::rd_wait (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/sync/srw_lock.cc:392
|
#5 0x00005604ad8a5657 in ssux_lock_impl<false>::rd_lock (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/include/srw_lock.h:247
|
#6 0x00005604ad89eb98 in dict_sys_t::freeze (this=0x5604ae785740 <dict_sys>) at /mariadb/10.6/storage/innobase/include/dict0dict.h:1594
|
#7 0x00005604adbcedc4 in dict_table_open_on_id<true> (table_id=1, dict_locked=false, table_op=DICT_TABLE_OP_NORMAL, thd=0x5604b1223878, mdl=0x5604b1144eb0) at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:837
|
…
|
#15 0x00005604adab2d9b in purge_worker_callback () at /mariadb/10.6/storage/innobase/srv/srv0srv.cc:1956
|
…
|
Thread 3 (Thread 0x7f8e10059640 (LWP 337877)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x00005604adabfd4e in ssux_lock_impl<false>::wait (this=0x5604ae785780 <dict_sys+64>, lk=2147483649) at /mariadb/10.6/storage/innobase/sync/srw_lock.cc:245
|
#2 0x00005604adabfb39 in ssux_lock_impl<false>::wr_wait (this=0x5604ae785780 <dict_sys+64>, lk=2147483649) at /mariadb/10.6/storage/innobase/sync/srw_lock.cc:378
|
#3 0x00005604ad8a554e in ssux_lock_impl<false>::wr_lock (this=0x5604ae785780 <dict_sys+64>) at /mariadb/10.6/storage/innobase/include/srw_lock.h:265
|
#4 0x00005604adbc08b6 in dict_sys_t::lock_wait (this=0x5604ae785740 <dict_sys>) at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:987
|
#5 0x00005604ad89ea3c in dict_sys_t::lock (this=0x5604ae785740 <dict_sys>) at /mariadb/10.6/storage/innobase/include/dict0dict.h:1572
|
#6 0x00005604adbcb1ed in dict_index_set_merge_threshold (index=0x7f8db4263630, merge_threshold=50) at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:4288
|
#7 0x00005604ad884868 in innobase_parse_hint_from_comment (thd=0x7f8da8000db8, table=0x7f8db420aaf0, table_share=0x7f8e10055f90) at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:11914
|
…
|
#14 0x00005604ad14eb86 in mysql_parse (thd=0x7f8da8000db8, rawbuf=0x7f8da8013f60 "ALTER TABLE t2 drop index idx1", length=30, parser_state=0x7f8e100583b0) at /mariadb/10.6/sql/sql_parse.cc:8045
|
|
|
On the remote system where my test hangs easily, it also hangs when I build an optimized executable with GCC 9.3.0 -O2 -march=native -mtune=native. I counted 2 exclusive lock waiters and 5 shared lock waiters, corresponding to dict_sys.latch.writer=0x80000007 and dict_sys.latch.readers=0x80000001 (which should have been dict_sys.latch.readers=0x80000000 to avoid the hang).
|
|
On the remote system, if I rebuild with cmake -DPLUGIN_PERFSCHEMA=NO or cmake -DPLUGIN_PERFSCHEMA=STATIC, the hang seems to go away (nothing in 194 or 63 test rounds, while with the default cmake -DPLUGIN_PERFSCHEMA=DYNAMIC it would hang in less than 30 rounds).
On my local system, cmake -DPLUGIN_PERFSCHEMA=DYNAMIC might help, because I with that I got a hang in 107×80 rounds. Previously, it took way more than 200 rounds.
Here is a little deeper analysis from the remote system. A thread is waiting for an exclusive lock. The local variables are not directly available in GDB, but it looks like we do have the value of lk available in the rdx register:
(gdb) thread 40
|
[Switching to thread 40 (Thread 0x7f3264ff9700 (LWP 2942137))]
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
38 ../sysdeps/unix/sysv/linux/x86_64/syscall.S: No such file or directory.
|
(gdb) disassemble
|
Dump of assembler code for function syscall:
|
0x00007f32c87bf880 <+0>: endbr64
|
0x00007f32c87bf884 <+4>: mov %rdi,%rax
|
0x00007f32c87bf887 <+7>: mov %rsi,%rdi
|
0x00007f32c87bf88a <+10>: mov %rdx,%rsi
|
0x00007f32c87bf88d <+13>: mov %rcx,%rdx
|
0x00007f32c87bf890 <+16>: mov %r8,%r10
|
0x00007f32c87bf893 <+19>: mov %r9,%r8
|
0x00007f32c87bf896 <+22>: mov 0x8(%rsp),%r9
|
0x00007f32c87bf89b <+27>: syscall
|
=> 0x00007f32c87bf89d <+29>: cmp $0xfffffffffffff001,%rax
|
…
|
(gdb) up
|
#1 0x0000557d2a033718 in ssux_lock_impl<false>::wait (lk=<optimized out>, this=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:244
|
244 inline void ssux_lock_impl<spinloop>::wait(uint32_t lk)
|
(gdb) disassemble
|
Dump of assembler code for function ssux_lock_impl<false>::wr_wait(unsigned int):
|
0x0000557d2a0336c0 <+0>: endbr64
|
0x0000557d2a0336c4 <+4>: push %rbx
|
0x0000557d2a0336c5 <+5>: mov %esi,%ecx
|
0x0000557d2a0336c7 <+7>: mov %rdi,%rbx
|
0x0000557d2a0336ca <+10>: sub $0x10,%rsp
|
0x0000557d2a0336ce <+14>: mov (%rdi),%eax
|
0x0000557d2a0336d0 <+16>: test %eax,%eax
|
0x0000557d2a0336d2 <+18>: jns 0x557d2a03372a <ssux_lock_impl<false>::wr_wait(unsigned int)+106>
|
0x0000557d2a0336d4 <+20>: test %ecx,%ecx
|
0x0000557d2a0336d6 <+22>: je 0x557d2a033801 <ssux_lock_impl<false>::wr_wait(unsigned int)+321>
|
0x0000557d2a0336dc <+28>: js 0x557d2a0337ba <ssux_lock_impl<false>::wr_wait(unsigned int)+250>
|
0x0000557d2a0336e2 <+34>: or $0x80000000,%ecx
|
0x0000557d2a0336e8 <+40>: add $0x4,%rbx
|
0x0000557d2a0336ec <+44>: nopl 0x0(%rax)
|
0x0000557d2a0336f0 <+48>: cmp $0x80000000,%ecx
|
0x0000557d2a0336f6 <+54>: jbe 0x557d2a033773 <ssux_lock_impl<false>::wr_wait(unsigned int)+179>
|
0x0000557d2a0336f8 <+56>: sub $0x8,%rsp
|
0x0000557d2a0336fc <+60>: pushq $0x0
|
0x0000557d2a0336fe <+62>: mov $0x80,%edx
|
0x0000557d2a033703 <+67>: xor %r9d,%r9d
|
0x0000557d2a033706 <+70>: xor %r8d,%r8d
|
0x0000557d2a033709 <+73>: mov %rbx,%rsi
|
0x0000557d2a03370c <+76>: mov $0xca,%edi
|
0x0000557d2a033711 <+81>: xor %eax,%eax
|
0x0000557d2a033713 <+83>: callq 0x557d295816b0 <syscall@plt>
|
=> 0x0000557d2a033718 <+88>: mov (%rbx),%ecx
|
…
|
Before the call to FUTEX_WAIT_PRIVATE, the value of the lock word was in the ecx register, and after the call it would have been loaded to there again. The syscall wrapper function copied rcx to rdx, and we do have the value there:
(gdb) info registers
|
rax 0xfffffffffffffe00 -512
|
rbx 0x557d2aa8b184 93996074971524
|
rcx 0x7f32c87bf89d 139856088660125
|
rdx 0x80000001 2147483649
|
rsi 0x80 128
|
rdi 0x557d2aa8b184 93996074971524
|
rbp 0x557d2aa8b180 0x557d2aa8b180 <dict_sys+64>
|
So, this thread seems to be waiting for dict_sys.latch.writer to be something else than 0x80000001 somewhere inside the Linux kernel. But, in fact, the value is something else, namely 0x80000008:
(gdb) set print static-members off
|
(gdb) p/x dict_sys.latch
|
$3 = {writer = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000008}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000001}, <No data fields>}}
|
(gdb) p &dict_sys.latch.writer
|
$4 = (srw_mutex_impl<false> *) 0x557d2aa8b180 <dict_sys+64>
|
(gdb) p &dict_sys.latch.readers
|
$5 = (std::atomic<unsigned int> *) 0x557d2aa8b184 <dict_sys+68>
|
Let us check all the threads that are waiting on dict_sys.latch.writers:
| thread |
call |
lk value |
| Thread 40 |
ssux_lock_impl<false>::wr_lock |
0x80000001 |
| Thread 39 |
ssux_lock_impl<false>::wr_lock |
0x80000004 |
| Thread 36 |
ssux_lock_impl<false>::rd_wait |
0x80000005 |
| Thread 35 |
ssux_lock_impl<false>::rd_wait |
0x80000007 |
| Thread 20 |
ssux_lock_impl<false>::rd_wait |
0x80000008 |
| Thread 17 |
ssux_lock_impl<false>::rd_wait |
0x80000006 |
| Thread 9 |
ssux_lock_impl<false>::wr_lock |
0x80000003 |
| Thread 6 |
ssux_lock_impl<false>::rd_wait |
0x80000003 |
This trace differs from previous ones that I have seen, in that there are 3 threads waiting for an exclusive lock.
Only Thread 20 should actually wait inside the kernel until dict_sys.latch.writers changes from 0x80000008; all other threads should have returned from the futex wait.
Of course, this is just a single snapshot of the threads during the hang. Based on a single sample, we cannot possibly know if the other threads were actually wrongly blocked inside the kernel.
Here are the stack traces in more detail:
Thread 40 (Thread 0x7f3264ff9700 (LWP 2942137)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x0000557d2a033718 in ssux_lock_impl<false>::wait (lk=<optimized out>, this=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:244
|
#2 ssux_lock_impl<false>::wr_wait (this=this@entry=0x557d2aa8b180 <dict_sys+64>, lk=<optimized out>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:378
|
#3 0x0000557d2a0bd321 in ssux_lock_impl<false>::wr_lock (this=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:974
|
#4 dict_sys_t::lock_wait (this=this@entry=0x557d2aa8b140 <dict_sys>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:980
|
#5 0x0000557d29f23f9a in dict_sys_t::lock (this=this@entry=0x557d2aa8b140 <dict_sys>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/include/dict0dict.h:1571
|
#6 0x0000557d2a0c0bf1 in dict_table_close (table=0x7f32b800af10) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:224
|
#7 0x0000557d2a0c0cc5 in dict_table_close (table=0x7f32b800af10, dict_locked=<optimized out>, thd=0x7f31f80095f8, mdl=0x7f327c03b260) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:252
|
#8 0x0000557d2a0da1bb in dict_stats_save (table_orig=<optimized out>, only_for_index=0x0) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0stats.cc:3169
|
#9 0x0000557d2a0ddfdf in dict_stats_update (table=table@entry=0x7f327c0874c0, stats_upd_option=stats_upd_option@entry=DICT_STATS_RECALC_PERSISTENT) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0stats.cc:3858
|
#10 0x0000557d2a0df7b0 in dict_stats_process_entry_from_recalc_pool (thd=0x7f31f80095f8) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0stats_bg.cc:349
|
…
|
Thread 39 (Thread 0x7f32267fc700 (LWP 2942150)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x0000557d2a0332b0 in srw_mutex_impl<false>::wait (lk=<optimized out>, this=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:238
|
#2 srw_mutex_impl<false>::wait_and_lock (this=this@entry=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:326
|
#3 0x0000557d2a0bd32b in srw_mutex_impl<false>::wr_lock (this=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:974
|
#4 ssux_lock_impl<false>::wr_lock (this=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/include/srw_lock.h:257
|
#5 dict_sys_t::lock_wait (this=this@entry=0x557d2aa8b140 <dict_sys>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:980
|
#6 0x0000557d29f23f9a in dict_sys_t::lock (this=this@entry=0x557d2aa8b140 <dict_sys>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/include/dict0dict.h:1571
|
#7 0x0000557d2a0c2dda in dict_sys_t::evict_table_LRU (this=0x557d2aa8b140 <dict_sys>, half=half@entry=false) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:1300
|
#8 0x0000557d2a02ef5f in srv_master_do_idle_tasks (counter_time=4129411899215) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/srv/srv0srv.cc:1602
|
#9 srv_master_callback () at /home/marko/bb-10.6-MDEV-21423/storage/innobase/srv/srv0srv.cc:1660
|
…
|
Thread 36 (Thread 0x7f32677fe700 (LWP 2942126)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x0000557d2a0332b0 in srw_mutex_impl<false>::wait (lk=<optimized out>, this=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:238
|
#2 srw_mutex_impl<false>::wait_and_lock (this=this@entry=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:326
|
#3 0x0000557d2a0339cd in srw_mutex_impl<false>::wr_lock (this=0x557d2aa8b180 <dict_sys+64>) at /usr/include/c++/9/bits/atomic_base.h:549
|
#4 ssux_lock_impl<false>::rd_wait (this=this@entry=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:392
|
#5 0x0000557d29f2032d in ssux_lock_impl<false>::rd_lock (this=0x557d2aa8b180 <dict_sys+64>) at /usr/include/c++/9/bits/atomic_base.h:539
|
#6 dict_sys_t::freeze (this=0x557d2aa8b140 <dict_sys>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/include/dict0dict.h:1593
|
#7 0x0000557d2a0c1d3c in dict_table_open_on_id (table_id=4, dict_locked=dict_locked@entry=false, table_op=table_op@entry=DICT_TABLE_OP_NORMAL, thd=0x557d2be87198, mdl=mdl@entry=0x557d2bd6e9c0) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:834
|
#8 0x0000557d2a00a166 in row_purge_parse_undo_rec (updated_extern=0x7f32677fdbde, thr=0x557d2bd6e780, undo_rec=0x557d2bd81638 "", node=0x557d2bd6e820) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/row/row0purge.cc:1030
|
…
|
Thread 35 (Thread 0x7f32277fe700 (LWP 2942145)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x0000557d2a0332b0 in srw_mutex_impl<false>::wait (lk=<optimized out>, this=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:238
|
#2 srw_mutex_impl<false>::wait_and_lock (this=this@entry=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:326
|
#3 0x0000557d2a0339cd in srw_mutex_impl<false>::wr_lock (this=0x557d2aa8b180 <dict_sys+64>) at /usr/include/c++/9/bits/atomic_base.h:549
|
#4 ssux_lock_impl<false>::rd_wait (this=this@entry=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:392
|
#5 0x0000557d29f2032d in ssux_lock_impl<false>::rd_lock (this=0x557d2aa8b180 <dict_sys+64>) at /usr/include/c++/9/bits/atomic_base.h:539
|
#6 dict_sys_t::freeze (this=0x557d2aa8b140 <dict_sys>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/include/dict0dict.h:1593
|
#7 0x0000557d2a0c1d3c in dict_table_open_on_id (table_id=2, dict_locked=dict_locked@entry=false, table_op=table_op@entry=DICT_TABLE_OP_NORMAL, thd=0x557d2be8dba8, mdl=mdl@entry=0x557d2bd6e070) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:834
|
#8 0x0000557d2a00a166 in row_purge_parse_undo_rec (updated_extern=0x7f32277fdcfe, thr=0x557d2bd6de30, undo_rec=0x557d2bd81580 "", node=0x557d2bd6ded0) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/row/row0purge.cc:1030
|
…
|
Thread 20 (Thread 0x7f32b37fe700 (LWP 2942118)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x0000557d2a0332b0 in srw_mutex_impl<false>::wait (lk=<optimized out>, this=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:238
|
#2 srw_mutex_impl<false>::wait_and_lock (this=this@entry=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:326
|
#3 0x0000557d2a0339cd in srw_mutex_impl<false>::wr_lock (this=0x557d2aa8b180 <dict_sys+64>) at /usr/include/c++/9/bits/atomic_base.h:549
|
#4 ssux_lock_impl<false>::rd_wait (this=this@entry=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:392
|
#5 0x0000557d29f2032d in ssux_lock_impl<false>::rd_lock (this=0x557d2aa8b180 <dict_sys+64>) at /usr/include/c++/9/bits/atomic_base.h:539
|
#6 dict_sys_t::freeze (this=0x557d2aa8b140 <dict_sys>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/include/dict0dict.h:1593
|
#7 0x0000557d2a0c1d3c in dict_table_open_on_id (table_id=3, dict_locked=dict_locked@entry=false, table_op=table_op@entry=DICT_TABLE_OP_NORMAL, thd=0x557d2be43138, mdl=mdl@entry=0x557d2bd6e710) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:834
|
#8 0x0000557d2a00a166 in row_purge_parse_undo_rec (updated_extern=0x7f32b37fdcfe, thr=0x557d2bd6e4d0, undo_rec=0x557d2bd815b0 "", node=0x557d2bd6e570) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/row/row0purge.cc:1030
|
…
|
Thread 17 (Thread 0x7f32977fe700 (LWP 2923252)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x0000557d2a0332b0 in srw_mutex_impl<false>::wait (lk=<optimized out>, this=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:238
|
#2 srw_mutex_impl<false>::wait_and_lock (this=this@entry=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:326
|
#3 0x0000557d2a0339cd in srw_mutex_impl<false>::wr_lock (this=0x557d2aa8b180 <dict_sys+64>) at /usr/include/c++/9/bits/atomic_base.h:549
|
#4 ssux_lock_impl<false>::rd_wait (this=this@entry=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:392
|
#5 0x0000557d29f2032d in ssux_lock_impl<false>::rd_lock (this=0x557d2aa8b180 <dict_sys+64>) at /usr/include/c++/9/bits/atomic_base.h:539
|
#6 dict_sys_t::freeze (this=0x557d2aa8b140 <dict_sys>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/include/dict0dict.h:1593
|
#7 0x0000557d2a0c1d3c in dict_table_open_on_id (table_id=1, dict_locked=dict_locked@entry=false, table_op=table_op@entry=DICT_TABLE_OP_NORMAL, thd=0x557d2bc88e18, mdl=mdl@entry=0x557d2bd6ddc0) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:834
|
#8 0x0000557d2a00a166 in row_purge_parse_undo_rec (updated_extern=0x7f32977fdcfe, thr=0x557d2bd6da40, undo_rec=0x557d2bd81570 "", node=0x557d2bd6dc20) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/row/row0purge.cc:1030
|
…
|
Thread 9 (Thread 0x7f32c2c85700 (LWP 2920172)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x0000557d2a0332b0 in srw_mutex_impl<false>::wait (lk=<optimized out>, this=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:238
|
#2 srw_mutex_impl<false>::wait_and_lock (this=this@entry=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:326
|
#3 0x0000557d2a0bd281 in srw_mutex_impl<false>::wr_lock (this=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:966
|
#4 ssux_lock_impl<false>::wr_lock (this=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/include/srw_lock.h:257
|
#5 dict_sys_t::lock_wait (this=this@entry=0x557d2aa8b140 <dict_sys>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:961
|
#6 0x0000557d29f23f9a in dict_sys_t::lock (this=this@entry=0x557d2aa8b140 <dict_sys>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/include/dict0dict.h:1571
|
#7 0x0000557d2a0c177c in dict_index_set_merge_threshold (index=index@entry=0x7f327c054a50, merge_threshold=merge_threshold@entry=50) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:4262
|
#8 0x0000557d29f0c2d5 in innobase_parse_hint_from_comment (thd=<optimized out>, table=0x7f327c0cd3d0, table_share=0x7f32c2c821a0) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/handler/ha_innodb.cc:11896
|
#9 0x0000557d29f376bb in ha_innobase::commit_inplace_alter_table (this=0x7f327c10e860, altered_table=<optimized out>, ha_alter_info=<optimized out>, commit=<optimized out>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/handler/handler0alter.cc:11353
|
…
|
Thread 6 (Thread 0x7f32967fc700 (LWP 2923254)):
|
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
|
#1 0x0000557d2a0332b0 in srw_mutex_impl<false>::wait (lk=<optimized out>, this=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:238
|
#2 srw_mutex_impl<false>::wait_and_lock (this=this@entry=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:326
|
#3 0x0000557d2a0339cd in srw_mutex_impl<false>::wr_lock (this=0x557d2aa8b180 <dict_sys+64>) at /usr/include/c++/9/bits/atomic_base.h:549
|
#4 ssux_lock_impl<false>::rd_wait (this=this@entry=0x557d2aa8b180 <dict_sys+64>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/sync/srw_lock.cc:392
|
#5 0x0000557d2a0c5c45 in ssux_lock_impl<false>::rd_lock (this=0x557d2aa8b180 <dict_sys+64>) at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:34
|
#6 dict_sys_t::freeze (this=0x557d2aa8b140 <dict_sys>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/include/dict0dict.h:1593
|
#7 dict_table_t::parse_name<false> (this=this@entry=0x7f327c0874c0, db_name=..., tbl_name=..., db_name_len=db_name_len@entry=0x7f32967fb690, tbl_name_len=tbl_name_len@entry=0x7f32967fb698) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/dict/dict0dict.cc:619
|
#8 0x0000557d2a0c6980 in dict_acquire_mdl_shared<true> (table=0x7f327c0874c0, thd=<optimized out>, mdl=mdl@entry=0x7f32967fbbf8, table_op=table_op@entry=DICT_TABLE_OP_NORMAL) at /usr/include/c++/9/bits/atomic_base.h:413
|
#9 0x0000557d2a1110e4 in fts_optimize_sync_table (table=<optimized out>, process_message=<optimized out>) at /home/marko/bb-10.6-MDEV-21423/storage/innobase/fts/fts0opt.cc:2791
|
#10 0x0000557d2a114e54 in fts_optimize_callback () at /home/marko/bb-10.6-MDEV-21423/storage/innobase/fts/fts0opt.cc:2912
|
|
|
On my local system, using a single purge thread significantly improves the reproducibility of the hang:
./mtr --mem --mysqld=--innodb-fatal-semaphore-wait-threshold=3 --mysqld=--innodb-purge-threads=1 --parallel=80 --repeat=1000 innodb_fts.concurrent_insert{,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,}
|
|
10.6 7bccf3dd744b6e85b69629c1e18730e77df503a2 + git revert 48b974b267b10c7ff072494613d46afcc75b3b55
|
innodb_fts.concurrent_insert 'innodb' w66 [ 7 fail ]
|
Test ended at 2022-04-11 13:19:25
|
|
CURRENT_TEST: innodb_fts.concurrent_insert
|
mysqltest: At line 48: query 'reap' failed: <Unknown> (2013): Lost connection to server during query
|
In every hang that I have analyzed so far at the register level, all but one of the FUTEX_WAIT_PRIVATE system calls should actually not be blocked. It would seem that the threads are actually in a livelock, waiting for dict_sys.latch.readers to reach 0x80000000, that is, waiting for the exclusive lock to be granted to one of the threads. The root cause of the hang ought to be that the readers word is being assigned to a value that wrongly indicates that a thread is holding a shared lock.
|
|
sstamm, I filed MDEV-28289 for my hang, which turned out to be a simple bug in higher-level code logic and not a bug in the lock implementation itself. If you never used FULLTEXT INDEX, your hang must definitely be something else, and unfortunately I have no idea how to reproduce or fix it. Because none of the attached stack trace dumps include function name fts_optimize_sync_table, that hang must be something else than MDEV-28289.
|
|
Got the next one: stacktrace.txt
|
=====================================
|
2022-04-22 16:27:59 0x7f19ec3c8700 INNODB MONITOR OUTPUT
|
=====================================
|
Per second averages calculated from the last 8 seconds
|
-----------------
|
BACKGROUND THREAD
|
-----------------
|
srv_master_thread loops: 0 srv_active, 0 srv_shutdown, 1224941 srv_idle
|
srv_master_thread log flush and writes: 1224928
|
----------
|
SEMAPHORES
|
----------
|
|
------------
|
TRANSACTIONS
|
------------
|
Trx id counter 593574256
|
Purge done for trx's n:o < 593538627 undo n:o < 0 state: running
|
History list length 17717
|
LIST OF TRANSACTIONS FOR EACH SESSION:
|
---TRANSACTION (0x7f1a003e2a30), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7f1a003df730), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7f1a003db330), ACTIVE 2037 sec starting index read
|
mysql tables in use 5, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 26154136, OS thread handle 139471252612864, query id 2173984532 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR SELECT DISTINCT HEX(objectId & 0xFFFFFFFF), objectId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN event AS evt USING(flightRecordId) JOIN Sequence AS seq USING(eventId) JOIN FirmwareError AS fw USING(sequenceId) WHERE (fr.systemSerialNumber IN ("5310077883","5310088343","5310090156","5310107958","5310095070","5310051029","5310009325","5310089079","5310090449","5310086021","5310069029","5310088305","5310110449","5310100005","5310112156","5310051977","5310100732","5310105228","5310058836","5310038373","5300808591","5300855523","5300644791
|
Trx read view will not see trx with id >= 593564706, sees < 593538926
|
---TRANSACTION (0x7f1a003d9130), ACTIVE 2037 sec starting index read
|
mysql tables in use 5, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 26151574, OS thread handle 139471246063360, query id 2173984530 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR SELECT DISTINCT classId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN event AS evt USING(flightRecordId) JOIN Sequence AS seq USING(eventId) JOIN FirmwareError AS fw USING(sequenceId) WHERE (fr.systemSerialNumber IN ("5310077883","5310088343","5310090156","5310107958","5310095070","5310051029","5310009325","5310089079","5310090449","5310086021","5310069029","5310088305","5310110449","5310100005","5310112156","5310051977","5310100732","5310105228","5310058836","5310038373","5300808591","5300855523","5300644791","5310040162","5310034149","
|
Trx read view will not see trx with id >= 593564705, sees < 593538926
|
---TRANSACTION (0x7f1a003a7230), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7f1a003da230), ACTIVE 4269 sec starting index read
|
mysql tables in use 5, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 26146935, OS thread handle 139471245756160, query id 2173703497 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR SELECT DISTINCT HEX(objectId & 0xFFFFFFFF), objectId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN event AS evt USING(flightRecordId) JOIN Sequence AS seq USING(eventId) JOIN FirmwareError AS fw USING(sequenceId) WHERE (fr.systemSerialNumber IN ("5310077883","5310088343","5310090156","5310107958","5310095070","5310051029","5310009325","5310089079","5310090449","5310086021","5310069029","5310088305","5310110449","5310100005","5310112156","5310051977","5310100732","5310105228","5310058836","5310038373","5300808591","5300855523","5300644791
|
Trx read view will not see trx with id >= 593552617, sees < 593538926
|
---TRANSACTION (0x7f1a003a9430), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7f1a003b4f30), ACTIVE 4269 sec starting index read
|
mysql tables in use 5, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 26146878, OS thread handle 139471247697664, query id 2173703495 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR SELECT DISTINCT classId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN event AS evt USING(flightRecordId) JOIN Sequence AS seq USING(eventId) JOIN FirmwareError AS fw USING(sequenceId) WHERE (fr.systemSerialNumber IN ("5310077883","5310088343","5310090156","5310107958","5310095070","5310051029","5310009325","5310089079","5310090449","5310086021","5310069029","5310088305","5310110449","5310100005","5310112156","5310051977","5310100732","5310105228","5310058836","5310038373","5300808591","5300855523","5300644791","5310040162","5310034149","
|
Trx read view will not see trx with id >= 593552616, sees < 593538926
|
---TRANSACTION (0x7f1a003d8030), ACTIVE 5458 sec starting index read
|
mysql tables in use 5, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 26136086, OS thread handle 139471251691264, query id 2173302551 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR SELECT DISTINCT HEX(objectId & 0xFFFFFFFF), objectId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN event AS evt USING(flightRecordId) JOIN Sequence AS seq USING(eventId) JOIN FirmwareError AS fw USING(sequenceId) WHERE (fr.systemSerialNumber IN ("5310077883","5310088343","5310090156","5310107958","5310095070","5310051029","5310009325","5310089079","5310090449","5310086021","5310069029","5310088305","5310110449","5310100005","5310112156","5310051977","5310100732","5310105228","5310058836","5310038373","5300808591","5300855523","5300644791
|
Trx read view will not see trx with id >= 593539441, sees < 593538926
|
---TRANSACTION 593539177, ACTIVE 5528 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135941, OS thread handle 139471253534464, query id 2173301874 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539164, ACTIVE 5544 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135938, OS thread handle 139471253841664, query id 2173301710 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539161, ACTIVE 5545 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135936, OS thread handle 139471254148864, query id 2173301581 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539158, ACTIVE 5546 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135934, OS thread handle 139471254763264, query id 2173301455 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539152, ACTIVE 5548 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135932, OS thread handle 139471255684864, query id 2173301291 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539114, ACTIVE 5556 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135913, OS thread handle 139471255377664, query id 2173300398 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539109, ACTIVE 5558 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135908, OS thread handle 139471256299264, query id 2173300259 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539098, ACTIVE 5563 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135900, OS thread handle 139471256913664, query id 2173300060 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539095, ACTIVE 5569 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135897, OS thread handle 139471256606464, query id 2173300040 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539090, ACTIVE 5574 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135893, OS thread handle 139749310920448, query id 2173299882 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539081, ACTIVE 5578 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135889, OS thread handle 139749311842048, query id 2173299790 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values ('40000001', '00000093', 'ERROR', 788529152, 554696704, 0, 248984869)
|
---TRANSACTION 593539067, ACTIVE 5582 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135887, OS thread handle 139749312149248, query id 2173299765 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539066, ACTIVE 5583 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135886, OS thread handle 139749312763648, query id 2173299756 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539055, ACTIVE 5586 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135883, OS thread handle 139749312456448, query id 2173299624 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values ('40000001', '00000093', 'ERROR', 788529152, 554696704, 0, 248984866)
|
---TRANSACTION 593539038, ACTIVE 5592 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135877, OS thread handle 139749313378048, query id 2173299451 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539035, ACTIVE 5592 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135876, OS thread handle 139749314914048, query id 2173299440 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539034, ACTIVE 5593 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135875, OS thread handle 139749315221248, query id 2173299431 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539031, ACTIVE 5595 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135872, OS thread handle 139749315528448, query id 2173299391 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539030, ACTIVE 5595 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135871, OS thread handle 139749316142848, query id 2173299382 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539013, ACTIVE 5601 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 6
|
MariaDB thread id 26135860, OS thread handle 139749317986048, query id 2173298854 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593539000, ACTIVE 5609 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135850, OS thread handle 139749319214848, query id 2173298411 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593538999, ACTIVE 5610 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135849, OS thread handle 139749318907648, query id 2173298402 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593538998, ACTIVE 5611 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135848, OS thread handle 139471251076864, query id 2173298393 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593538989, ACTIVE 5615 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135842, OS thread handle 139749311227648, query id 2173298069 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593538988, ACTIVE 5616 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135841, OS thread handle 139471257220864, query id 2173298060 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593538979, ACTIVE 5624 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135832, OS thread handle 139749311534848, query id 2173297797 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593538976, ACTIVE 5625 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135830, OS thread handle 139471255070464, query id 2173297784 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593538975, ACTIVE 5625 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135831, OS thread handle 139749646280448, query id 2173297777 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593538972, ACTIVE 5628 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135827, OS thread handle 139749316757248, query id 2173297689 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593538961, ACTIVE 5629 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135824, OS thread handle 139749315835648, query id 2173297608 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION (0x7f1a003b8230), ACTIVE 5458 sec starting index read
|
mysql tables in use 5, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 26135822, OS thread handle 139749314606848, query id 2173302548 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR SELECT DISTINCT classId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN event AS evt USING(flightRecordId) JOIN Sequence AS seq USING(eventId) JOIN FirmwareError AS fw USING(sequenceId) WHERE (fr.systemSerialNumber IN ("5310077883","5310088343","5310090156","5310107958","5310095070","5310051029","5310009325","5310089079","5310090449","5310086021","5310069029","5310088305","5310110449","5310100005","5310112156","5310051977","5310100732","5310105228","5310058836","5310038373","5300808591","5300855523","5300644791","5310040162","5310034149","
|
Trx read view will not see trx with id >= 593539438, sees < 593538926
|
---TRANSACTION (0x7f1a003b6030), ACTIVE 5631 sec starting index read
|
mysql tables in use 5, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 26135820, OS thread handle 139749319522048, query id 2173297500 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR SELECT DISTINCT HEX(objectId & 0xFFFFFFFF), objectId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN event AS evt USING(flightRecordId) JOIN Sequence AS seq USING(eventId) JOIN FirmwareError AS fw USING(sequenceId) WHERE (fr.systemSerialNumber IN ("5310077883","5310088343","5310090156","5310107958","5310095070","5310051029","5310009325","5310089079","5310090449","5310086021","5310069029","5310088305","5310110449","5310100005","5310112156","5310051977","5310100732","5310105228","5310058836","5310038373","5300808591","5300855523","5300644791
|
Trx read view will not see trx with id >= 593538958, sees < 593538926
|
---TRANSACTION 593538951, ACTIVE 5631 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135817, OS thread handle 139749319829248, query id 2173297476 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values ('40000001', '00000081', 'ERROR', -2062548992, 0, 0, 248984849)
|
---TRANSACTION 593538950, ACTIVE 5631 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135818, OS thread handle 139749317371648, query id 2173297467 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593538942, ACTIVE 5632 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135813, OS thread handle 139749318293248, query id 2173297202 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values ('40000001', '00000090', 'ERROR', 536870912, 0, 0, 248984847)
|
---TRANSACTION 593538936, ACTIVE 5634 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135809, OS thread handle 139749313685248, query id 2173297079 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION 593538933, ACTIVE 5635 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 5
|
MariaDB thread id 26135807, OS thread handle 139471252920064, query id 2173297016 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values ('40000001', '00000080', 'WARNING', -2011492352, 0, 0, 248984845)
|
---TRANSACTION 593538926, ACTIVE 5636 sec inserting
|
mysql tables in use 1, locked 1
|
9 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 35
|
MariaDB thread id 26135801, OS thread handle 139471254456064, query id 2173296782 10.254.28.150 fccUser Update
|
insert into FirmwareError (classId, errorId, errorLevel, objectId, value, stackNr, sequenceId) values (?, ?, ?, ?, ?, ?, ?)
|
---TRANSACTION (0x7f1a003ad830), ACTIVE 5631 sec starting index read
|
mysql tables in use 5, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 26135741, OS thread handle 139749316450048, query id 2173297481 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR SELECT DISTINCT classId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN event AS evt USING(flightRecordId) JOIN Sequence AS seq USING(eventId) JOIN FirmwareError AS fw USING(sequenceId) WHERE (fr.systemSerialNumber IN ("5310077883","5310088343","5310090156","5310107958","5310095070","5310051029","5310009325","5310089079","5310090449","5310086021","5310069029","5310088305","5310110449","5310100005","5310112156","5310051977","5310100732","5310105228","5310058836","5310038373","5300808591","5300855523","5300644791","5310040162","5310034149","
|
Trx read view will not see trx with id >= 593538956, sees < 593538926
|
---TRANSACTION (0x7f1a003ab630), ACTIVE 5750 sec fetching rows
|
mysql tables in use 5, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 26135740, OS thread handle 139749317678848, query id 2173188199 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR SELECT DISTINCT classId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN event AS evt USING(flightRecordId) JOIN Sequence AS seq USING(eventId) JOIN FirmwareError AS fw USING(sequenceId) WHERE (fr.systemSerialNumber IN ("5310077883","5310088343","5310090156","5310107958","5310095070","5310051029","5310009325","5310089079","5310090449","5310086021","5310069029","5310088305","5310110449","5310100005","5310112156","5310051977","5310100732","5310105228","5310058836","5310038373","5300808591","5300855523","5300644791","5310040162","5310034149","
|
Trx read view will not see trx with id >= 593538626, sees < 593538626
|
---TRANSACTION (0x7f1a003aa530), ACTIVE 5750 sec fetching rows
|
mysql tables in use 5, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 26135731, OS thread handle 139749317064448, query id 2173188202 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR SELECT DISTINCT HEX(objectId & 0xFFFFFFFF), objectId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN event AS evt USING(flightRecordId) JOIN Sequence AS seq USING(eventId) JOIN FirmwareError AS fw USING(sequenceId) WHERE (fr.systemSerialNumber IN ("5310077883","5310088343","5310090156","5310107958","5310095070","5310051029","5310009325","5310089079","5310090449","5310086021","5310069029","5310088305","5310110449","5310100005","5310112156","5310051977","5310100732","5310105228","5310058836","5310038373","5300808591","5300855523","5300644791
|
Trx read view will not see trx with id >= 593538626, sees < 593538626
|
---TRANSACTION (0x7f1a003ac730), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7f1a003a8330), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
---TRANSACTION (0x7f1a003a6130), not started
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
--------
|
FILE I/O
|
--------
|
Pending flushes (fsync) log: 0; buffer pool: 0
|
43167392 OS file reads, 637581095 OS file writes, 3935596 OS fsyncs
|
0.00 reads/s, 0 avg bytes/read, 0.25 writes/s, 0.25 fsyncs/s
|
-------------------------------------
|
INSERT BUFFER AND ADAPTIVE HASH INDEX
|
-------------------------------------
|
Ibuf: size 1284, free list len 379692, seg size 380977, 198 merges
|
merged operations:
|
insert 3041, delete mark 435, delete 435
|
discarded operations:
|
insert 0, delete mark 0, delete 0
|
0.00 hash searches/s, 0.00 non-hash searches/s
|
---
|
LOG
|
---
|
Log sequence number 31156487913110
|
Log flushed up to 31156487913110
|
Pages flushed up to 31156272361314
|
Last checkpoint at 31156272361314
|
0 pending log flushes, 0 pending chkp writes
|
1662154 log i/o's done, 0.25 log i/o's/second
|
----------------------
|
BUFFER POOL AND MEMORY
|
----------------------
|
Total large memory allocated 275012124672
|
Dictionary memory allocated 1699972696
|
Buffer pool size 16613376
|
Free buffers 1
|
Database pages 19274684
|
Old database pages 7115048
|
Modified db pages 90923
|
Percent of dirty pages(LRU & free pages): 0.472
|
Max dirty pages percent: 90.000
|
Pending reads 0
|
Pending writes: LRU 0, flush list 0
|
Pages made young 46726112, not young 1342268161
|
0.00 youngs/s, 0.00 non-youngs/s
|
Pages read 43148083, created 62388193, written 635923533
|
0.00 reads/s, 1.25 creates/s, 0.00 writes/s
|
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
|
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
|
LRU len: 19274684, unzip_LRU len: 3874980
|
I/O sum[1]:cur[0], unzip sum[294]:cur[1]
|
--------------
|
ROW OPERATIONS
|
--------------
|
10 read views open inside InnoDB
|
Process ID=0, Main thread ID=0, state: sleeping
|
Number of rows inserted 4846189129, updated 19547377, deleted 310194726, read 189522332920
|
86.61 inserts/s, 0.12 updates/s, 0.00 deletes/s, 0.25 reads/s
|
Number of system rows inserted 0, updated 0, deleted 0, read 0
|
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
|
----------------------------
|
END OF INNODB MONITOR OUTPUT
|
============================
|
|
|
|
sstamm, next time it occurs, it would help if you could display the state of the lock in every wait() member function (2 stack frames up from syscall), for example the first few ones for stacktrace.txt :
set print static-members off
|
thread 77
|
frame 2
|
print/x *this
|
thread 76
|
frame 2
|
print/x *this
|
thread 71
|
frame 2
|
print/x *this
|
By default, this information will not be available in core dumps, because due to MDEV-10814 the buffer pool will be excluded from core dumps.
It could also help to include the output of the following. The last command is a substitute of the above, but it will unnecessarily attempt to dump *this in stack frame #2 of every thread.
set print static-members off
|
set print frame-arguments all
|
thread apply all backtrace full
|
thread apply all -s -- frame apply level 2 -s print/x *this
|
Seeing the values of the lock words would help us to narrow down the error.
Would the hangs become less likely if you use numactl to restrict the server process to a single CPU socket? We do run tests on NUMA machines (mostly on debug instrumented builds), and I am not currently aware of any hangs.
|
|
Sorry, did see your comment too late. Got another one: stacktrace-2022-04-28.txt will try to apply the adjustments next time.
|
|
mleich found a stack trace from a hang in some 10.6 based development branch.
|
1425b7ab0318ddbfeb2a8bb6e73cc3d7cf83d744 (an attempt of fixing MDEV-28399), 1 commit ahead of 10.6 03e703fd74b43ffd27e858e03e9d147029de086e
|
ssh pluto
|
gdb -c /data/results/1651072040/DEADLOCKED--TBR-655/dev/shm/rqg/1651072040/47/1/data/core /data/Server_bin/bb-10.6-thiruC_asan/bin/mysqld
|
…
|
(gdb) set print static-members off
|
(gdb) set print frame-arguments all
|
(gdb) thread apply all -s -- frame apply level 2 -s print/x *this
|
|
Thread 48 (Thread 0x7ff3e5237700 (LWP 540767)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$1 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 47 (Thread 0x7ff3e6f9c700 (LWP 540634)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$2 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 45 (Thread 0x7ff3e4a8f700 (LWP 541540)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$3 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 43 (Thread 0x7ff3e55e8700 (LWP 540755)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$4 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 42 (Thread 0x7ff3e65c4700 (LWP 540694)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$5 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 40 (Thread 0x7ff3e5ab1700 (LWP 540746)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$6 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 39 (Thread 0x7ff3e70b4700 (LWP 540628)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$7 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 38 (Thread 0x7ff3e6d26700 (LWP 540645)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$8 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 37 (Thread 0x7ff3e93cd700 (LWP 538074)):
|
#2 0x000055661237e767 in ssux_lock_impl<false>::wr_wait (this=0x61600259c478, lk=2147483649) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:378
|
$9 = {writer = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000001}, <No data fields>}}
|
|
Thread 36 (Thread 0x7ff3e5f9d700 (LWP 540727)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$10 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 35 (Thread 0x7ff3e5976700 (LWP 540748)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$11 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 34 (Thread 0x7ff3e5d04700 (LWP 540735)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$12 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 33 (Thread 0x7ff3e6489700 (LWP 540700)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x616001a59478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$13 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000007}, <No data fields>}}
|
|
Thread 32 (Thread 0x7ff3e6975700 (LWP 540667)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$14 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 31 (Thread 0x7ff3e6bc8700 (LWP 540654)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$15 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 29 (Thread 0x7ff3e534f700 (LWP 540766)):
|
#2 0x000055661237e141 in srw_mutex_impl<true>::wait_and_lock (this=0x5566144cb888 <trx_sys+31752>) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$16 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000002}, <No data fields>}}
|
|
Thread 28 (Thread 0x7ff3f12be700 (LWP 562945)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$17 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 27 (Thread 0x7ff3e5700700 (LWP 540753)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x616001a59478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$18 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000007}, <No data fields>}}
|
|
Thread 26 (Thread 0x7ff3e634e700 (LWP 540714)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x616001a59478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$19 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000007}, <No data fields>}}
|
|
Thread 25 (Thread 0x7ff3e6e61700 (LWP 540639)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$20 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 22 (Thread 0x7ff3e6ab0700 (LWP 540660)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$21 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 19 (Thread 0x7ff3e60d8700 (LWP 540725)):
|
#2 0x000055661237e141 in srw_mutex_impl<true>::wait_and_lock (this=0x7ff40891f7b8) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$22 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000002}, <No data fields>}}
|
|
Thread 18 (Thread 0x7ff3e66dc700 (LWP 540685)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$23 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 17 (Thread 0x7ff3e683a700 (LWP 540674)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$24 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 16 (Thread 0x7ff3e71e7700 (LWP 540617)):
|
#2 0x000055661237e4f8 in ssux_lock_impl<true>::wr_wait (this=0x7ff40890ff18, lk=2147483650) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:378
|
$25 = {writer = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000001}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000002}, <No data fields>}}
|
|
Thread 13 (Thread 0x7ff3e50d9700 (LWP 540769)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x616001a59478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$26 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000007}, <No data fields>}}
|
|
Thread 12 (Thread 0x7ff3e583b700 (LWP 540751)):
|
#2 0x000055661237e141 in srw_mutex_impl<true>::wait_and_lock (this=0x7ff40890f218) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$27 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000004}, <No data fields>}}
|
|
Thread 11 (Thread 0x7ff3e4f9e700 (LWP 540773)):
|
#2 0x000055661237e141 in srw_mutex_impl<true>::wait_and_lock (this=0x7ff40890f218) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$28 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000004}, <No data fields>}}
|
|
Thread 10 (Thread 0x7ff3e4bed700 (LWP 540781)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$29 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 9 (Thread 0x7ff3f0265700 (LWP 552009)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$30 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 8 (Thread 0x7ff3e4d05700 (LWP 540779)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x616001a59478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$31 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000007}, <No data fields>}}
|
|
Thread 7 (Thread 0x7ff3e4e63700 (LWP 540775)):
|
#2 0x000055661237e141 in srw_mutex_impl<true>::wait_and_lock (this=0x7ff40890f218) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$32 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000004}, <No data fields>}}
|
|
Thread 6 (Thread 0x7ff3e9e74700 (LWP 537977)):
|
#2 0x000055661237e4f8 in ssux_lock_impl<true>::wr_wait (this=0x7ff40890f218, lk=2147483649) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:378
|
$33 = {writer = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000004}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000001}, <No data fields>}}
|
|
Thread 5 (Thread 0x7ff3e5bec700 (LWP 540739)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$34 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 4 (Thread 0x7ff3e548a700 (LWP 540756)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x616001a59478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$35 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000007}, <No data fields>}}
|
|
Thread 3 (Thread 0x7ff3e6236700 (LWP 540718)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$36 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
|
Thread 2 (Thread 0x7ff3e5e85700 (LWP 540730)):
|
#2 0x000055661237de6a in srw_mutex_impl<false>::wait_and_lock (this=0x61600259c478) at /data/Server/bb-10.6-thiruC/storage/innobase/sync/srw_lock.cc:326
|
$37 = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000018}, <No data fields>}}
|
I filtered the output for unique objects:
sed -ne 's/.*this=\(0x[0-9a-f]*\).*/\1/p'|sort|uniq -c
|
1 0x5566144cb888
|
6 0x616001a59478
|
24 0x61600259c478
|
4 0x7ff40890f218
|
1 0x7ff40890ff18
|
1 0x7ff40891f7b8
|
- The first one is the latch of an rseg=0x5566144cb880. A reader is blocked because the writer Thread 7 is waiting for a latch 0x7ff40890f218 of an undo page 387, block=0x7ff40891f7a0, with 4 waiters in total.
- The second one is the latch of an index=0x616001a59308. CHECK TABLE is attempting to acquire a U-lock in Thread 8, btr_validate_index(). The conflicting lock holder is Thread 6, a purge coordinator that is waiting for the latch 0x7ff40890f218 on the undo page 387.
- The third one is the latch of an index=0x61600259c308 that according to the extended data structures of the debug build is being read-locked by Thread 16, which is waiting for an exclusive latch on page 3 (see below).
- Then, we have 4 threads waiting for the latch on the undo page 387 in the system tablespace, block=0x7ff40890f200, which is read-locked by Thread 16. An exclusive lock request will block future read lock requests until an exclusive lock has been acquired and released.
- Thread 16 is waiting for an exclusive latch on page 3 (the clustered index root page) of tablespace 11, block=0x7ff40890ff00, but Thread 11 and Thread 12 are holding shared latches. This looks wrong, and would be a regression due to
MDEV-15250. We should never acquire an index page latch while holding an undo page latch!
- Finally, Thread 19 is waiting for a shared latch on the undo page 433 in the system tablespace, block=0x7ff40891f7a0, but Thread 7 is holding it while waiting for a latch on the undo page 387.
- Both Thread 11 and Thread 12 are executing CHECK TABLE and holding a shared latch on the index page while waiting for a latch on the undo page (to reconstruct an older version).
The cause of this hang is a higher-level bug (violation of latching order: index page latches must be acquired before undo page latches). I think that MDEV-15250 has to be revised so that trx_t::apply_log() will never acquire an index page latch while holding an undo log page latch. thiru filed MDEV-28443 for fixing this. This hang is not present in any MariaDB release.
Before I realized my mistake in interpreting the core dump, I suspected that some thread would terminate while holding latches. That is not the case here. I was hoping that mleich can provide an https://rr-project.org trace that shows a similar ‘genuine’ hang, so that we can rerun the trace and check what exactly happened. Common wisdom is that rr record --chaos can create ‘fake hangs’ due to extremely unfair scheduling.
Summary: I think that we have to find and fix any ‘low-hanging fruit’ hangs in our code before we can blame the lock implementation (or the Linux kernel or the cache coherence protocol of the processor).
|
|
I realized that I made a mistake when analyzing the core dump: I was misinterpreting a std::set pointer as a thread ID. I will edit the comment to look up the threads again.
I analyzed one rr record trace that looked promising at first. It was using page_compressed to slow things down. As far as I could tell, a DROP operation of a table-rebuilding ALTER TABLE would wait for a U lock to update a system table (page 8) while holding exclusive dict_sys.latch. A write of that page had been submitted, but we know that rr does not support either libaio or liburing, and I have analyzed traces earlier where the simulated AIO threads were not given any time slices. That was the case here. It looks like we might not be able to rely on rr for analyzing this kind of hangs.
|
|
My impression from recent stress test runs on 10.6-based branches is that the MDEV-28443 hangs only occurred a few times during some hours of intensive stress tests. I checked two more core dumps of it. No such hangs occurred with MDEV-28443.
Another impression is that in our stress test environment with hundreds of concurrently running server processes, rr record is very prone to producing "fake hangs". I analyzed one case where server bootstrap (initializing a data directory) hung for 57 seconds. The process was apparently stuck in posix_fallocate() of the temporary tablespace ibtmp1. Another case that I analyzed was that the clustered index root page of the table mysql.innodb_table_stats was recovered and a write initiated, but the write never completed because rr chose an unfair scheduling, not giving any chance for the simulated asynchronous I/O thread to run. A thread that was waiting to load this table into the InnoDB dictionary table was then holding exclusive dict_sys.latch and waiting for the page latch. Finally, another thread that was waiting for a shared dict_sys.latch eventually triggered the InnoDB internal watchdog (innodb_fatal_semaphore_wait_threshold).
I hope that given enough time, our internal stress testing will eventually deliver a core dump of a hang that could match those that sstamm reported. It does not look like there are any really ‘low-hanging fruit’ hangs in our code at this moment.
sstamm, which SQL features do you use? Do you use SYSTEM VERSIONING, application-time periods, FULLTEXT INDEX, SPATIAL INDEX, or virtual columns (nonempty INFORMATION_SCHEMA.INNODB_SYS_VIRTUAL)? If there was a high-level bug that caused some code to ‘forget’ to release page latches, I would suspect those areas.
|
|
No SYSTEM VERSIONING or application-time periods, and the indices are all "INDEX .... USING BTREE".
Indeed, we are using virtual columns, but at a first look not at the tables we saw the locks.
Not sure if relevant, but some of the table effected have "ROW_FORMAT=COMPRESSED".
|
|
looks like at
thread apply all -s -- frame apply level 2 -s print/x *this
|
Is something wrong, but I am not so familiar with GDB to figure it out:
But checked some threads manually: stacktrace-2022-05-06.txt
|
|
Any updates or idea what's wrong at the GDB call? Got another lock today, and it ended up in "Too many connections" so would be tedious to geht all the stack traces manually.
|
|
Updated to 10.6.8 (and OS to Oracle Linux Server 8.6 and Kernel: Linux 5.4.17-2136.300.7.el8uek.x86_64)
Still got the Problem but slightly different. Looks like it started with an aborted sort:
2022-06-16 7:55:12 1144265 [Warning] Sort aborted, host: 10.254.28.149, user: fccUser, thread: 1144265, query: SET STATEMENT max_statement_time=180 FOR WITH confCTE AS (SELECT JSON_MERGE_PRESERVE(JSON_OBJECT('ocDeviceId', ocDeviceId), versionData, JSON_OBJECT('count', COUNT(DISTINCT flightRecordId))) AS jsonData, DATE_FORMAT(fr.date,'%x/%v') AS TimeSpan FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN rm4flightrecord AS rm4Fr USING(flightRecordId) LEFT JOIN otherComponent AS oc USING(flightRecordId) LEFT JOIN OcVersion AS ocv USING(ocId) WHERE NOT d.isInternal AND d.customer = "YKB" AND d.country = "TR_Turkey" AND fr.type = 'RM4' AND fr.calendarWeek >= 202213 AND fr.moduleType = 'RM4H' AND versionData IS NOT null GROUP BY versionData, ocDeviceId, DATE_FORMAT(fr.date,'%x/%v') ) SELECT DATE_FORMAT(fr.date,'%x/%v') AS TimeSpan, MIN(fr.date) AS MinDate, MAX(fr.date) AS MaxDate, COUNT(DISTINCT fr.deviceId) AS Devices, COUNT(DISTINCT fr.flightRecordId) AS Days, CONCAT('[', GROUP_CONCAT(DISTINCT jsonData),']') AS jsonData FROM Device AS
|
Added log-2022-06-17.7z witch hopefully contains the lock-state of all threads.
|
|
Next one: log-2022-06-20.txt
Anything else I can provide to get progress on this? Looks like it happens more often with 10.6.8
|
|
It could be that the version of GDB that you are using does not support the command that I suggested. I am currently using GDB 11.2.
It does not seem too likely that indexed virtual columns (even when they are created internally as a result of MDEV-371) would contribute to this, unless an affected index page actually includes a virtual column. Can you try to find out which index(es) of which table(s) the problematic locks belong to? In stacktrace-2022-05-06.txt , I see page_id={m_id = 61714385076230}. That number would be 0x382100000006 in hexadecimal, referring to page 6 of tablespace 0x3821 (SELECT * FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES WHERE SPACE=14369 to find the table). If you can find the .ibd file for that tablespace 14369, you should find the 8 bytes of PAGE_INDEX_ID at byte offset 6*innodb_page_size+66 from the start of the file. That should correspond to the INFORMATION_SCHEMA.INNODB_SYS_INDEXES.INDEX_ID of the index. In the same stack trace, we can also see index=0x559e0428ccb0. What would print index->name, print index->table->name, print *index->fields@index->n_fields have displayed for it? (Of course without any core dump we will never know it for this old crash, but maybe for a later hang you can get that information.)
The print/x output in stacktrace-2022-05-06.txt is always for the same object 0x7f440c001d78, and it looks like there are up to 42 (36+6=0x24+6) threads waiting for or holding it. If the stack traces does not indicate anywhere near that amount of activity on that index page, some unlock operation (likely a call to mtr_t::commit()) was missed somewhere. One lock would be explained by a pending read or write operation in the file system.
In log-2022-06-17.7z I see 0x153 (339) threads waiting for a write lock on the object 0x7f3a98091c98. That pointer value occurs 5,075 times in the log. There apparently are 1143 threads in the log, or perhaps it is actually no more than 381 if the log includes the output of several commands that would list all or most threads.
It is very time-consuming to analyze large logs. We sometimes do it for support customers. In this case, what would be needed is to search the stack traces to account for each reference to the waited-for lock. The block or lock pointers or page numbers are not always available, but the table names might be.
I think that we will depend on you to provide some analysis of the logs along these lines, because we haven’t seen any hangs like this in our internal testing.
Perhaps you could set up a replica that would run a debug instrumented executable with a similar workload, to possibly catch problems earlier.
|
|
gdb.txt gdb2.txt
Hello,
We have the same issue. We have queries on "Sending Data" and when we kill them they stay stuck on "Killed".
They were all similar SELECT queries. No deletes/updates etc... running.
I have attached gdb.txt which has most output, but also gdb2.txt which also has the "thread apply all -s – frame apply level 2 -s print/x *this" command. I could not run that with my gdb version so compiled the latest gdb and ran it again with that. But it appears to not give any output.
Can this help at all..?
|
|
nielsh, unfortunately both gdb.txt and gdb2.txt show ?? for all the interesting functions, and for those functions that are shown a name, no function parameter names or values are displayed. To fix that, you must install the debug symbol packages that match the executable. For the record, (I like living on the edge, with latest possible software on my development system), I have the following line in my /etc/apt/sources.list:
deb http://deb.debian.org/debian-debug/ unstable-debug main
|
The instructions for installing the debug symbols depend on the operating system and the packages that are being used. For some GNU/Linux distributions, some MariaDB packages are available both from the distribution and from MariaDB.
Ideally, someone who can reproduce the hang would produce an SQL-based test case. We have been unable to reproduce this hang internally.
|
|
Hi Marko,
Thanks for your response. Unfortunately we ran into this issue on a single environment that is in production. We have since downgraded from 10.6.8 to 1.6.5. At the moment I'm not able test with this server.
I think we can check with 10.6.9 when it is released and if we can get a slave where we can trigger it we may be able to leave it in a broken state for a while to debug.
Sorry, wish I could debug this sooner but the impact is too great to do this right now I am afraid.
|
|
Added log-2022-07-29.txt with all the gdb things you requested.
|
|
In log-2022-07-29.txt , I checked all threads that would be waiting for a syscall to complete:
grep -A2 -w syscall log-2022-07-29.txt |grep '^#1'|sort|uniq -c
|
1 #1 0x000055b2e3ae20a8 in srw_mutex_impl<true>::wait (lk=<optimized out>, this=0x7f4385c61c40) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:238
|
4 #1 0x000055b2e3ae20a8 in srw_mutex_impl<true>::wait (lk=<optimized out>, this=0x7f6bb006f698) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:238
|
5 #1 0x000055b2e3ae20a8 in srw_mutex_impl<true>::wait (lk=<optimized out>, this=0x7f6bd0013bb8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:238
|
257 #1 0x000055b2e3ae20a8 in srw_mutex_impl<true>::wait (lk=<optimized out>, this=0x7f85b0099a38) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:238
|
1 #1 0x000055b2e3ae2138 in ssux_lock_impl<true>::wait (lk=<optimized out>, this=0x7f6bb006f698) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:244
|
1 #1 0x000055b2e3ae2138 in ssux_lock_impl<true>::wait (lk=<optimized out>, this=0x7f6bd0013bb8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:244
|
1 #1 0x000055b2e3ae2138 in ssux_lock_impl<true>::wait (lk=<optimized out>, this=0x7f85b0099a38) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:244
|
1 #1 0x000055b2e3be4bf5 in my_getevents (min_nr=1, nr=256, ev=0x7f85f5d17c60, ctx=<optimized out>) at /usr/src/debug/MariaDB-/src_0/tpool/aio_linux.cc:104
|
The last one is io_getevents() for the old Linux native asynchronous I/O implementation. We can ignore it. We can see that all mutexes except 0x7f4385c61c40 also occur in ssux_lock_impl<true>::wait. Let me check that one first:
Thread 18 (Thread 0x7f860808e700 (LWP 1684202)):
|
#0 0x00007f86157b991d in syscall () from target:/lib64/libc.so.6
|
No symbol table info available.
|
#1 0x000055b2e3ae20a8 in srw_mutex_impl<true>::wait (lk=<optimized out>, this=0x7f4385c61c40) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:238
|
…
|
#6 sux_lock<ssux_lock_impl<true> >::s_lock (this=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/sux_lock.h:362
|
No locals.
|
#7 buf_page_get_low (page_id={m_id = 192298576637790}, zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=10, mtr=<optimized out>, err=<optimized out>, allow_ibuf_merge=<optimized out>)
|
We are waiting for a shared lock on an index page (id=192298576637790=0xaee50059ef5e). Because a conflicting (exclusive) latch exists on that block, we will wait for an exclusive latch, to release it immediately. Thread 14 is waiting for this same page. No stack traces of other threads refer to this page.
The most popular latch 0x7f85b0099a38 is associated with page_id=192298570743811=0xaee500000003, that is, the clustered index root page of the same table that Threads 14 and 18 are accessing. The root page contains some allocation information (needed when allocating or freeing pages), as well as the AUTO_INCREMENT value (MDEV-6076).
The other latches are 0x7f6bd0013bb8 (page_id=192298571577911=0xaee5000cba37) and 0x7f6bb006f698 (page_id=192298579600721=0xaee500872551). All latches are for the same table.
Let us check the lock states:
[Switching to thread 300 (Thread 0x7f44c8aed700 (LWP 1825042))]
|
#0 0x00007f86157b991d in syscall () from target:/lib64/libc.so.6
|
#2 srw_mutex_impl<true>::wait_and_lock (this=this@entry=0x7f85b0099a38) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:326
|
326 wait(lk);
|
$6 = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 0x4, _M_i = 0x80000102}, <No data fields>}, static HOLDER = 0x80000000}
|
…
|
[Switching to thread 96 (Thread 0x7f4535f4a700 (LWP 1815168))]
|
#0 0x00007f86157b991d in syscall () from target:/lib64/libc.so.6
|
#2 srw_mutex_impl<true>::wait_and_lock (this=this@entry=0x7f6bd0013bb8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:326
|
326 wait(lk);
|
$188 = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 0x4, _M_i = 0x80000006}, <No data fields>}, static HOLDER = 0x80000000}
|
…
|
[Switching to thread 59 (Thread 0x7f453685f700 (LWP 1813731))]
|
#0 0x00007f86157b991d in syscall () from target:/lib64/libc.so.6
|
#2 srw_mutex_impl<true>::wait_and_lock (this=this@entry=0x7f6bb006f698) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:326
|
326 wait(lk);
|
$225 = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 0x4, _M_i = 0x80000005}, <No data fields>}, static HOLDER = 0x80000000}
|
…
|
[Switching to thread 18 (Thread 0x7f860808e700 (LWP 1684202))]
|
#0 0x00007f86157b991d in syscall () from target:/lib64/libc.so.6
|
#2 srw_mutex_impl<true>::wait_and_lock (this=this@entry=0x7f4385c61c40) at /usr/src/debug/MariaDB-/src_0/storage/innobase/sync/srw_lock.cc:326
|
326 wait(lk);
|
$266 = {lock = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 0x4, _M_i = 0x80000002}, <No data fields>}, static HOLDER = 0x80000000}
|
[Switching to thread 17 (Thread 0x7f860816f700 (LWP 1660784))]
|
In each case, the 31 least significant bits of the lock word are 1 more than the number of waiting threads. The exclusive lock holder should decrement the word by HOLDER+1 upon releasing the latch:
void wr_unlock()
|
{
|
const uint32_t lk= lock.fetch_sub(HOLDER + 1, std::memory_order_release);
|
if (lk != HOLDER + 1)
|
{
|
DBUG_ASSERT(lk & HOLDER);
|
wake();
|
}
|
}
|
The bookkeeping suggests that each of these latches is actually being held in exclusive mode, and there is no reason to suspect a bug in the latch implementation (which was new in 10.6).
An exclusive page latch could be held by a mini-transaction that is modifying the page, typically executing at most microseconds, or maybe milliseconds when waiting to load some page that is part of the same mini-transaction, such as splitting or merging an index tree page, or allocating a page for a BLOB.
Unfortunately, the buffer page pointer is not visible in the stack traces, so I can’t search easily if any other thread is using the affected blocks. A quick search for the page identifier only turned up the futex waits.
An exclusive page latch would be held while InnoDB is waiting for the page to be read into the buffer pool. It might actually be the case here. If that is the case, then MDEV-29190 would be a duplicate of this bug.
We would need to know the contents of read_slots during the hang. Here is an example from MDEV-29190:
(gdb) p read_slots
|
$5 = std::unique_ptr<io_slots> = {get() = 0x564d23f06e80}
|
To dereference the std::unique_ptr, you can execute a command like the following (using the same pointer value):
print *(io_slots*)0x564d23f06e80
|
Note: In MDEV-29190 the "simulated AIO" code path (using a pool of threads that would invoke pread or pwrite) was used, while in this case, the pre-uring (MDEV-24883) native asynchronous interface of io_submit() and io_getevents() is being used. I would expect the problem to be in the MariaDB thread pool and not the Linux kernel (like we did in MDEV-26674, causing hung writes).
|
|
I think that in a 64-bit build, the address of buf_page_t::lock should be 3*8 bytes after the start of the buf_page_t or buf_block_t object. Indeed, in log-2022-07-29.txt , I can find 0x7f85b0099a20 for 0x7f85b0099a38, but only as an argument to the mtr_t::page_lock() function. That is, I have no indication of a thread that could be holding the page latch in exclusive mode.
The following block addresses do not occur all in the output:
| block address |
latch address |
| 0x7f6bb006f680 |
x7f6bb006f698 |
| 0x7f6bd0013ba0 |
0x7f6bd0013bb8 |
| 0x7f4385c61c28 |
0x7f4385c61c40 |
The addresses could still be buried in mtr_t::m_memo, which are not available in the output. Those as well as read_slots should be present in a core dump. The buffer pool (and the buffer page descriptors) are excluded by default (MDEV-10814).
sstamm (or anyone who reproduces this hang), next time, after you have dumped the precious part of the buffer pool, please use the GDB command gcore or generate-core-file to write out a core dump for further analysis. I suggest that you practice with it first before this type of a hang occurs again. I have not used the command myself. The following commands could be useful:
set use-coredump-filter off
|
set dump-excluded-mappings on
|
generate-core-file /path/to/core
|
As far as I understand, enabling dump-excluded-mappings should cause the buffer pool (and some other large buffers) to be included in the core dump. You might want to produce two core dumps, one with and another without that setting.
|
|
Can try but I'm a little bit curious because we have an buffer pool of 256GB and an memory footprint of > 300GB. Not sure what generate-core-file exactly does, but if it tries to dump the whole memory it could be bad.
|
|
I understand that full buffer pool is quite a lot, but
to prove that that there are pending ios, or no pending ios (that is which marko is trying to find out, one needs only to get the value of read_slots->pending_io_count() at the time of the hang, nothing else. Which is hopefully automatable in gdb (@marko, as gdb automation expert, hopefully can tell to tell how). One can also print those read_slots, marko can tell how.
|
|
sstamm, if you do not execute set dump-excluded-mappings on, then based on the coredump that was generated in MDEV-29190 using GDB, the buffer pool will be excluded from the dump. Ideally, we would need only a single full core dump to conclude what exactly is going wrong.
wlad, the read_slots was developed by you and I do not have much knowledge of it. As demonstrated in MDEV-29190, the read_slots should always be included in a core dump. The GDB commands print or call should be able to invoke functions as long as a process is active and the function is not inlined, which tends not to hold in optimized builds. One may have to substitute the final expression for the call. That is what I had to do previously when trying to debug the code, but I don’t think I fully figured it out. You defined the function io_slots::pending_io_count() inline in storage/innobase/os/os0file.cc:
size_t pending_io_count()
|
{
|
return (size_t)m_max_aio - m_cache.size();
|
}
|
Also the tpool::cache<tpool::aiocb>::size() is defined inline, possibly in tpool/tpool_structs.h.
|
|
wlad, with the following patch, I established that the io_slots::pending_io_count() is the only caller of tpool::cache::size():
diff --git a/tpool/tpool_structs.h b/tpool/tpool_structs.h
|
index 7b0fb857695..83f9d05e6a7 100644
|
--- a/tpool/tpool_structs.h
|
+++ b/tpool/tpool_structs.h
|
@@ -122,7 +122,7 @@ template<typename T> class cache
|
m_waiters--;
|
}
|
|
- TPOOL_SUPPRESS_TSAN size_t size()
|
+ TPOOL_SUPPRESS_TSAN __attribute__((deprecated)) size_t size()
|
{
|
return m_cache.size();
|
}
|
The only callers of io_slots::pending_io_count() are os_aio_wait_until_no_pending_reads() and os_aio_wait_until_no_pending_writes(). It is not obvious to me that an incorrect emptiness check would cause trouble in those functions.
In the InnoDB code base, we try to document in comments the relationship between mutexes, condition variables, and the data members that they protect. In tpool::cache I do not see any comments:
template<typename T> class cache
|
{
|
std::mutex m_mtx;
|
std::condition_variable m_cv;
|
std::vector<T> m_base;
|
std::vector<T*> m_cache;
|
cache_notification_mode m_notification_mode;
|
int m_waiters;
|
|
bool is_full()
|
{
|
return m_cache.size() == m_base.size();
|
}
|
In is_full() there is no comment that m_base.size() is expected to be constant. If it is not, then I think that it really should acquire a mutex, to protect those two unrelated data members. Invoking std::vector::size() while the collection might be concurrently modified by another thread would not seem to be thread-safe. If the size is expected to be a constant, then I think it would be better to use std:array or plain arrays.
|
|
markoI 'm glad that Innodb has established the coding guidelines, and documents every member of every struct. It is innodb's internal thing however.
The is_full member function is a private one, and in both cases it is used in the code, it is called with the the mutex protection.
|
|
I guess I found out how to reproduce the error. At our fronend we are limiting queries with "SET STATEMENT max_statement_time=180 SELECT ...." and in some cases, they were killed, but the processes does not terminate:
---TRANSACTION (0x7f6934467780), ACTIVE 1442 sec starting index read
|
mysql tables in use 10, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 7538683, OS thread handle 140089494353664, query id 921379883 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR WITH svCTE AS (SELECT flightRecordId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN rm3flightrecord AS rm3Fr USING(flightRecordId) LEFT JOIN otherComponent AS oc USING(flightRecordId) LEFT JOIN OcVersion AS ocv USING(ocId) WHERE ... GROUP BY flightRecordId ) SELECT f
|
Trx read view will not see trx with id >= 931963402, sees < 931963402
|
---TRANSACTION (0x7f693446a380), ACTIVE 1442 sec starting index read
|
mysql tables in use 11, locked 0
|
0 lock struct(s), heap size 1128, 0 row lock(s)
|
MariaDB thread id 7538674, OS thread handle 140089493432064, query id 921379884 10.254.28.149 fccUser Sending data
|
SET STATEMENT max_statement_time=180 FOR WITH svCTE AS (SELECT flightRecordId FROM Device AS d JOIN Flightrecord AS fr USING(deviceId) JOIN rm3flightrecord AS rm3Fr USING(flightRecordId) LEFT JOIN otherComponent AS oc USING(flightRecordId) LEFT JOIN OcVersion AS ocv USING(ocId) WHERE ... GROUP BY flightRecordId ) SELECT J
|
Trx read view will not see trx with id >= 931963402, sees < 931963402
|
An this looks like the cause of the hang.
Added log-2022-08-02.txt where I tried to "p read_slots", not sure if I did the right things.
Also create 2 core-dumbs, one 25G and one 281G. Question would be how to share them, I'm reserved to post such thing public to the internet.
|
|
sstamm, great work, also with the core dumps. Would it be possible to get ssh access to a system where it is available? Or something with screen sharing, so that you can monitor that no data will be leaked?
Related to MDEV-29190, danblack was suspecting something around killed user transactions or connections. One possible hypothesis would be that when a transaction is killed, somewhere we forget to invoke mtr_t::commit() and thus fail to release an exclusive page latch that the thread was operating on. We do not currently have any debug check against that in mtr_t::~mtr_t(); in fact, we do not define any destructor. A repeatable test case would be pretty much preferred, because the haystack is so large and the needle so small.
|
|
You've got mail. I'm not able to reproduce it by 100%, just hidded the db with queries which times out over 15 min and then I got two which stalled in the 'killed' state. But not directly on the first call.
|
|
sstamm, thank you for sharing where to download the core dumps. The core dumps alone are not sufficient for debugging. I would also need copies of the shared libraries that were used for generating the core dump (ldd /usr/sbin/mariadbd should list the file names) as well as a copy of the executable and its debug information.
Anything that reproduces the problem even intermittently is much better than nothing. A core dump is only a snapshot of the system state. When concurrency is involved, it is often difficult to guess what kind of interleaved execution between threads led to that situation, especially when some of the threads have already moved on.
Last night, I checked that whenever the function lock_wait() (which waits for InnoDB table or record locks to be granted) is invoked, the caller should have released any page latches, that is, there should not be any active mini-transaction. The logic is that we will create an object for the waiting lock request, then commit the mini-transaction of the operation that would have required the lock, then invoke lock_wait(). That function would finish with one of "lock granted", "our transaction was aborted", or "timeout". If anything like KILL QUERY or max_statement_time caused the connection to be aborted, the lock_wait() would be woken up, and the execution would return from InnoDB to the SQL layer. There is no obvious place where an mtr_t::commit() call could be forgotten.
That said, MDEV-27031 has something to do with aborting transactions. I will try to dig up the rr replay trace of that failure.
|
|
Using 10.6.8 from http://yum.mariadb.org/10.6/rhel8-amd64 (including the debug-symbols)
[sebastian.stamm@depb3cifc02lyp ~]$ ldd /usr/sbin/mariadbd
|
linux-vdso.so.1 (0x00007fff4c5f7000)
|
libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007fd642250000)
|
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fd642027000)
|
libaio.so.1 => /lib64/libaio.so.1 (0x00007fd641e24000)
|
liblzma.so.5 => /lib64/liblzma.so.5 (0x00007fd641bfd000)
|
libpmem.so.1 => /lib64/libpmem.so.1 (0x00007fd6419cb000)
|
libsystemd.so.0 => /lib64/libsystemd.so.0 (0x00007fd641681000)
|
libz.so.1 => /lib64/libz.so.1 (0x00007fd641469000)
|
libssl.so.1.1 => /lib64/libssl.so.1.1 (0x00007fd6411d5000)
|
libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x00007fd640cec000)
|
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd640acc000)
|
libdl.so.2 => /lib64/libdl.so.2 (0x00007fd6408c8000)
|
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fd640533000)
|
libm.so.6 => /lib64/libm.so.6 (0x00007fd6401b1000)
|
libc.so.6 => /lib64/libc.so.6 (0x00007fd63fdec000)
|
/lib64/ld-linux-x86-64.so.2 (0x00007fd644738000)
|
librt.so.1 => /lib64/librt.so.1 (0x00007fd63fbe4000)
|
liblz4.so.1 => /lib64/liblz4.so.1 (0x00007fd63f9c7000)
|
libcap.so.2 => /lib64/libcap.so.2 (0x00007fd63f7bf000)
|
libmount.so.1 => /lib64/libmount.so.1 (0x00007fd63f565000)
|
libgcrypt.so.20 => /lib64/libgcrypt.so.20 (0x00007fd63f247000)
|
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fd63f02e000)
|
libblkid.so.1 => /lib64/libblkid.so.1 (0x00007fd63eddb000)
|
libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fd63ebd3000)
|
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fd63e9a9000)
|
libgpg-error.so.0 => /lib64/libgpg-error.so.0 (0x00007fd63e788000)
|
And placed the libs on the SFTP
|
|
sstamm, thank you, I am downloading the files (ETA 2 hours). Since I am on Debian, I will ask someone to provide the mariadbd and the debug information to me.
It may take a couple of weeks before I can return to this. It is good to have the core dump, so that I can check whatever hypotheses I will come up with. If you can provide anything that would reproduce the hang, that would still be a huge help.
|
|
I had attached a spare SSD so that I had enough space for the core files. unzip -t of the archive showed no errors. I only unpacked and checked the smaller core file so far.
For future reference, here is how I could get good stack traces from the smaller core file:
- Download MariaDB-server-10.6.8-1.el8.x86_64.rpm and extract usr/sbin/mariadbd out of it.
- Download MariaDB-server-debuginfo-10.6.8-1.el8.x86_64.rpm and extract all of it (to usr). I used GNOME file-roller for that.
- Extract the shared libraries to the subdirectory ldd and the core dump to core.small.
- Invoke GDB as follows:
gdb --ex 'set debug-file-directory usr/lib/debug' --ex 'file usr/sbin/mariadbd' --ex 'set solib-search-path ldd' --ex 'core-file core.small'
|
A good stack trace contains source file names, line numbers, function parameter names and values, like this:
#0 0x00007f69416a7ab1 in poll () from /mariadb/MDEV-27983/ldd/libc.so.6
|
#1 0x00005641152bbf85 in poll (__timeout=-1, __nfds=<optimized out>,
|
__fds=<optimized out>) at /usr/include/bits/poll2.h:46
|
#2 handle_connections_sockets ()
|
at /usr/src/debug/MariaDB-/src_0/sql/mysqld.cc:6162
|
#3 0x00005641152bcb88 in mysqld_main (argc=<optimized out>,
|
argv=<optimized out>) at /usr/src/debug/MariaDB-/src_0/sql/mysqld.cc:5867
|
#4 0x00007f69415bdcf3 in __libc_start_main ()
|
from /mariadb/MDEV-27983/ldd/libc.so.6
|
#5 0x00005641152b114e in _start ()
|
If the debug information is unavailable, this would be displayed instead:
#0 0x00007f69416a7ab1 in poll () from /mariadb/MDEV-27983/ldd/libc.so.6
|
#1 0x00005641152bbf85 in handle_connections_sockets() ()
|
#2 0x00005641152bcb88 in mysqld_main(int, char**) ()
|
#3 0x00007f69415bdcf3 in __libc_start_main ()
|
from /mariadb/MDEV-27983/ldd/libc.so.6
|
#4 0x00005641152b114e in _start ()
|
I do not currently have any ideas what to check in the core dumps. I hope that we will be able to repeat the hang in not too distant future. These dumps would then be useful for checking hypotheses about the root cause, or checking if we will have repeated the same hang.
|
|
A potential explanation of this might be MDEV-29336.
Edit: Sorry, no, that is a corner case that would seem to only affect BLOB operations in tiny tables (consisting of a single index page).
|
|
sstamm, if you are using ROW_FORMAT=COMPRESSED tables, then MDEV-29190 could offer an explanation for the hang. I did not yet plug in the spare hard disk to check the core dumps that you provided.
|
|
sstamm, it looks like your hang is the same as MDEV-29190. In log-2022-08-02.txt (from MariaDB Server 10.6.8), I see the following:
|
mariadb-10.6.8
|
Thread 27 (Thread 0x7f69208c3700 (LWP 2361764)):
|
…
|
#6 sux_lock<ssux_lock_impl<true> >::s_lock (this=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/sux_lock.h:362
|
No locals.
|
#7 buf_page_get_low (page_id={m_id = 192169722320674}, zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=10, mtr=<optimized out>, err=<optimized out>, allow_ibuf_merge=<optimized out>)
|
at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0buf.cc:2698
|
The same page identifier occurs in another thread:
|
mariadb-10.6.8
|
Thread 9 (Thread 0x7f69207e2700 (LWP 1836702)):
|
#0 0x00007f6942275180 in nanosleep () from target:/lib64/libpthread.so.0
|
No symbol table info available.
|
#1 0x00005641152abcd1 in std::this_thread::sleep_for<long, std::ratio<1l, 1000000l> > (__rtime=<optimized out>) at /usr/include/c++/8/chrono:465
|
__s = <optimized out>
|
__ns = <optimized out>
|
__ts = {tv_sec = 0, tv_nsec = 83160}
|
__s = <optimized out>
|
__ns = <optimized out>
|
__ts = <optimized out>
|
#2 buf_page_get_low (page_id={m_id = 192169722320674}, zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=10, mtr=<optimized out>, err=<optimized out>, allow_ibuf_merge=<optimized out>)
|
at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0buf.cc:2812
|
These are the same line numbers and the same release (10.6.8) as in MDEV-29190.
I did not yet check the core dump that contains the buffer pool. If it contains block->page.zip.fix == 0x20000002 for this block, then it is exactly the same scenario as in MDEV-29190.
Before we can declare this bug fixed, we must attempt to reproduce this hang internally, both with and without my tentative fix. Reproducing it requires some buffer pool eviction activity combined with ROW_FORMAT=COMPRESSED.
|
|
I waited 42 minutes for the 300GiB core dump to be decompressed, and indeed the scenario is just like in MDEV-29190.
|
mariadb-10.6.8
|
Thread 9 (LWP 1836702):
|
#0 0x00007f6942275180 in nanosleep () from /mariadb/MDEV-27983/ldd/libpthread.so.0
|
#1 0x00005641152abcd1 in std::this_thread::sleep_for<long, std::ratio<1l, 1000000l> > (__rtime=<optimized out>) at /usr/include/c++/8/thread:379
|
#2 buf_page_get_low (page_id={m_id = 192169722320674}, zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=10, mtr=<optimized out>, err=<optimized out>, allow_ibuf_merge=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0buf.cc:2812
|
…
|
(gdb) thread 27
|
[Switching to thread 27 (LWP 2361764)]
|
(gdb) frame 7
|
#7 buf_page_get_low (page_id={m_id = 192169722320674}, zip_size=<optimized out>, rw_latch=<optimized out>, guess=<optimized out>, mode=10, mtr=<optimized out>, err=<optimized out>,
|
allow_ibuf_merge=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/buf/buf0buf.cc:2698
|
2698 block->page.lock.s_lock();
|
(gdb) p/x *block.page
|
value has been optimized out
|
(gdb) down
|
#6 sux_lock<ssux_lock_impl<true> >::s_lock (this=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/sux_lock.h:362
|
362 lock.rd_lock();
|
(gdb)
|
#5 0x00005641152ac515 in ssux_lock_impl<true>::rd_lock (this=0x7f230c036180) at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/srw_lock.h:247
|
(gdb) p/x *(buf_page_t*)0x7f230c036168
|
$1 = {id_ = {m_id = 0xaec700091722}, hash = 0x0, oldest_modification_ = {m = {<std::__atomic_base<unsigned long>> = {_M_i = 0x0}, <No data fields>}}, lock = {lock = {writer = {
|
lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000002}, <No data fields>}}, readers = {<std::__atomic_base<unsigned int>> = {_M_i = 0x80000000}, <No data fields>}}, recursive = 0x1,
|
writer = {<std::__atomic_base<unsigned long>> = {_M_i = 0x7f69207e2700}, <No data fields>}}, frame = 0x0, zip = {data = 0x7f370511e000, m_end = 0x0, m_nonempty = 0x0, n_blobs = 0x0, ssize = 0x4, fix = {
|
m = {<std::__atomic_base<unsigned int>> = {_M_i = 0x20000002}, <No data fields>}}}, list = {prev = 0x0, next = 0x0}, LRU = {prev = 0x7f2684fa32d8, next = 0x7f45bc10e0a0}, old = 0x1,
|
freed_page_clock = 0x0, access_time = {m_counter = {<std::__atomic_base<unsigned int>> = {_M_i = 0x0}, <No data fields>}}}
|
Just like in MDEV-29190, the zip.fix field contains the value 0x20000002, or buf_page_t::UNFIXED + 2, and the frame is nullptr, meaning that the block only exists in the buffer pool in compressed form. The buf_page_t::UNFIXED+2 accounts for the reference counts held by Thread 27 (waiting for a shared latch) and Thread 9 (holding an exclusive latch, waiting for Thread 27 to release its reference).
Curiously, thread find 0x7f69207e2700 does not find anything for me in this core dump; my GDB does not know Thread 9 by that identifier.
|
|
Thanks, I am glad you found.
|
|
I checked two core dumps for hangs that mleich gave me, and both match this scenario. They involved the change buffer bitmap page (page 1) of a ROW_FORMAT=COMPRESSED table. In both cases, the zip.fix was buf_page_t::UNFIXED + 2 as well. The stack traces were quite different in the two hangs: SELECT and INSERT, and in the other case, SELECT and a purge of an old version of a BLOB.
|
|
For the record, the hangs were reproduced with this patch, which I hope improved the reproducibility:
diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
|
index eaf0f955a1f..01b5c689bc8 100644
|
--- a/storage/innobase/buf/buf0buf.cc
|
+++ b/storage/innobase/buf/buf0buf.cc
|
@@ -2529,6 +2529,7 @@ buf_page_get_low(
|
return nullptr;
|
}
|
|
+ std::this_thread::sleep_for(std::chrono::microseconds(500));
|
/* A read-fix is released after block->page.lock
|
in buf_page_t::read_complete() or
|
buf_pool_t::corrupted_evict(), or
|
This would give the other thread a better chance to grab an exclusive latch before the current thread would acquire a shared latch. In the tentative fix, this thread would not acquire a shared latch at all, but retry a lookup in some time.
|
|
The final fix uses a simpler condition. The more complex condition in the tentative fix would not avoid all hangs.
Because 10.6.9 introduced recovery bugs such as MDEV-29374, the consequences of this hang got more serious in 10.6.9. I would recommend a downgrade to 10.6.8 until 10.6.10 is available. The fix does apply cleanly to 10.6.8.
|