Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Won't Do
-
10.5, 10.6, 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.7(EOL)
Description
Split from MDEV-26381. Logging a simplified overview here, with easy reproducibility and keeping things simple, though there are likely more aspects to these hang(s), some described in that ticket.
Execute the attached hang.sql (identical to [MDEV-26381_OTHER_1.sql] from MDEV-26381), using 10k threads, with all threads replaying in random order (against test db).
After a few minutes, even on optimized builds, partial hang issues will start to show. SHOW FULL PROCESSLIST attached as show_full_processlist.txt as a 10.7 example of such an occurrence. Issue is very easy to reproduce.
When logging errors (like ERROR 1146 (42S02) at line 1: Table 'test.t2' doesn't exist) to the screen, it's easy to see when the server starts locking up after 1-5 minutes as the error rate either abruptly stops or slows down clearly/significantly. It then stays in that semi-hang state for 30+ minutes, sometimes unlocking partially with some threads continuing to process transactions whilst others remain in hanged state.
Machine is not OOM, nor OOS, nor busy (nothing else running), not challenged by the 10k threads (low load average in htop). IOW, this is not server hardware/capability related in any way afaict.
Tested version/revision was 10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Optimized)
Attachments
Issue Links
- relates to
-
MDEV-26935 Improve MDL to be scalable with many thousands OS threads
- Open
- split from
-
MDEV-26381 InnoDB: Failing assertion: fsize != os_offset_t(-1) on CREATE TABLE with many partitions / ENOMEM (out of kernel memory) from fstat handling
- Closed