MariaDB Server / MDEV-26873

Partial server hang when using many threads



    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Do
    • Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7
    • Fix Version/s: N/A
    • Component/s: Locking


      Split from MDEV-26381. This ticket gives a simplified, easily reproducible overview; the hang(s) likely have further aspects, some of which are described in that ticket.

      Execute the attached hang.sql (identical to [MDEV-26381_OTHER_1.sql] from MDEV-26381), using 10k threads, with all threads replaying in random order (against test db).
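
      The replay setup can be sketched roughly as follows. This is only an illustration of one reading of "all threads replaying in random order" (each thread runs every statement once, in its own shuffled order); the harness actually used for this report is not shown in the ticket. The names split_statements, replay_randomly and the execute callable are hypothetical, and execute stands in for whatever client call sends a statement to the server.

```python
import random
import threading


def split_statements(sql_text):
    """Naively split a SQL script on ';'.

    Assumption: the script has no semicolons inside string literals
    or stored routine bodies.
    """
    return [s.strip() for s in sql_text.split(";") if s.strip()]


def replay_randomly(statements, execute, n_threads, seed=None):
    """Run every statement once per thread, each thread in its own random order.

    `execute(thread_id, statement)` is a hypothetical callable supplied by
    the caller. Exceptions it raises (for example "table doesn't exist")
    are counted per thread instead of aborting the replay.
    """
    errors = [0] * n_threads

    def worker(tid):
        # Per-thread RNG so the orders differ between threads.
        rng = random.Random(None if seed is None else seed + tid)
        order = list(statements)
        rng.shuffle(order)
        for stmt in order:
            try:
                execute(tid, stmt)
            except Exception:
                errors[tid] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

      With a real client, execute would open one connection per thread and run the statement against the test database; errors such as ERROR 1146 are expected, because tables are created and dropped concurrently.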

      After a few minutes, even on optimized builds, partial hangs start to appear. The attached show_full_processlist.txt contains SHOW FULL PROCESSLIST output from one such occurrence on 10.7. The issue is very easy to reproduce.

      When errors (such as ERROR 1146 (42S02) at line 1: Table 'test.t2' doesn't exist) are logged to the screen, it is easy to see when the server starts locking up: after 1-5 minutes the error rate either stops abruptly or slows down significantly. The server then stays in that semi-hung state for 30+ minutes, sometimes unlocking partially, with some threads continuing to process transactions while others remain hung.
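
      Instead of watching the screen, the drop in error rate could be detected from timestamps of the logged errors. The following is a minimal sketch, not part of the original report: find_slowdown, its window size and its drop_ratio threshold are all hypothetical choices, and it simply compares each later window against the first one.

```python
def find_slowdown(event_times, end_time, window=10.0, drop_ratio=0.2):
    """Return the start time of the first window whose event count falls
    below drop_ratio * (count in the first window), or None.

    event_times: timestamps (seconds) of logged errors.
    end_time: when observation stopped; needed so that a complete halt
              (no events at all) still shows up as empty windows.
    """
    if not event_times:
        return None
    start = event_times[0]
    n_windows = int((end_time - start) // window)
    if n_windows < 2:
        return None

    # Count events per fixed-size window.
    counts = [0] * n_windows
    for t in event_times:
        idx = int((t - start) // window)
        if 0 <= idx < n_windows:
            counts[idx] += 1

    baseline = counts[0]
    for i in range(1, n_windows):
        if counts[i] < drop_ratio * baseline:
            return start + i * window
    return None
```

      Using the first window as the baseline is a simplification; a run that warms up slowly would need a different baseline.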

      The machine is not OOM, nor OOS, nor busy (nothing else is running), and is not challenged by the 10k threads (low load average in htop). In other words, as far as I can tell, this is not related to server hardware or capacity in any way.

      Tested version/revision was 10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Optimized)





              Assignee: wlad Vladislav Vaintroub
              Reporter: Roel Roel Van de Paar