  1. MariaDB Server
  2. MDEV-29809

MariaDB: node crash and recovery - Semaphore wait has lasted > 600 seconds. We intentionally crash....

Details

    Description

      Several days ago, a node of our MM cluster crashed and then recovered by itself. A heavy load-in migration task was running at the time. The related logs are in the attachments: 11.PNG shows that the server hung and crashed, raising signal 6. 12.PNG shows that the server then raised signal 11 and aborted, which is why no backtrace file was generated. 13.PNG and 14.PNG show the information from analyzing the signal waiting.
      If you need any other information to help solve the problem, please contact us. Thank you!

      Attachments

        1. 11.PNG (9 kB)
        2. 12.PNG (49 kB)
        3. 13.PNG (11 kB)
        4. 14.PNG (6 kB)

          Activity

            danblack Daniel Black added a comment -

            Was a core file generated? Is installing debug info packages and obtaining a backtrace from the core (as text) possible?

            A 91M count in the reservation array (13.PNG) seems like a lot. What configuration are you running? Can you share the kinds of queries used in the load-in migration, and their tables?

            I removed MDEV-24294, as that was 10.5+ and Galera, so it is not your problem. This could be similar to MDEV-25048, though as with that one, much more information is needed to even approach this problem. For private uploads, push data to https://mariadb.com/kb/en/meta/mariadb-ftp-server/; it won't be public and will only be used to resolve this issue.
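
A minimal sketch of how the requested text backtrace could be produced from a core file, assuming GDB and the matching debug info packages are installed. The helper name `core_backtrace_command` and both paths are hypothetical and used only for illustration; the GDB options (`--batch`, `-ex`) and the `thread apply all backtrace full` command are standard GDB features.

```python
import shlex


def core_backtrace_command(binary_path, core_path):
    """Build a gdb argv that prints the stacks of all threads in a core file.

    binary_path and core_path are supplied by the caller; nothing here
    checks that they exist or that debug symbols are available.
    """
    return [
        "gdb",
        "--batch",                                   # run the commands, then exit
        "-ex", "set pagination off",                 # do not pause long output
        "-ex", "thread apply all backtrace full",    # every thread, with locals
        binary_path,
        core_path,
    ]


if __name__ == "__main__":
    # Hypothetical paths: adjust to the actual server binary and core file.
    cmd = core_backtrace_command("/usr/sbin/mysqld", "/var/lib/mysql/core")
    # Print a shell-ready command line; redirect its output to a text file
    # so that the result can be attached as text rather than as a bitmap.
    print(shlex.join(cmd))
```

The script only prints the command line, so it can be reviewed before being run against the real core file.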


            marko Marko Mäkelä added a comment -

            By default, due to MDEV-10814, core dumps will not include a copy of the buffer pool or the buffer page descriptor. For debugging this hang, we might need access to them. A dict_index_t::lock covers all non-leaf pages in the corresponding index B-tree. It could be good to attach GDB to the server before it hangs and let the execution continue until the fatal signal is delivered.

            It would be better to attach output as text instead of bitmaps. In the server error log, were there any reports of corrupted pages? I believe that before the fix of MDEV-13542, InnoDB could hang in some cases when it is trying to read a corrupted page.

            Is this hang reproducible with MariaDB Server 10.6.10 or a later version?
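
The suggestion to attach GDB before the hang and let the server run until the fatal signal could be scripted roughly as follows. This is a sketch under assumptions: the helper name `attach_until_signal_command` and the example process ID are hypothetical, and GDB will also stop at any other signal it is configured to stop on, so the `continue` step may end earlier than the crash.

```python
import shlex


def attach_until_signal_command(pid):
    """Build a gdb argv that attaches to a running process, resumes it,
    and dumps all thread stacks once the process stops on a signal."""
    if pid <= 0:
        raise ValueError("pid must be a positive process ID")
    return [
        "gdb",
        "--batch",                              # exit after the commands finish
        "-p", str(pid),                         # attach to the running server
        "-ex", "set pagination off",
        "-ex", "continue",                      # run until a signal stops it
        "-ex", "thread apply all backtrace",    # then print every thread stack
    ]


if __name__ == "__main__":
    # Hypothetical PID: use the actual process ID of the server.
    print(shlex.join(attach_until_signal_command(12345)))
```

Because the process stays attached to the debugger while it runs, the buffer pool and page descriptors remain inspectable at the moment of the signal, which a default core dump would not provide.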

            People

              Unassigned Unassigned
              jjs Wenwen Jing
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved:
