MariaDB Server / MDEV-29846

deadlock on dict_sys.mutex on database drop

Details

    Description

      A semaphore timeout was found by Matthias during MDEV-28709 testing (10.5).

      The timeout occurs on "mutex_enter(&dict_sys.mutex)" in "dict_table_open_on_name()".
      The mutex is held by the thread that executes the "while ((table_name = dict_get_first_table_name_in_db(name)))" loop in "row_drop_database_for_mysql()". The loop runs under that mutex for so long that the semaphore timeout is triggered.

      The loop makes no progress, because "dict_get_first_table_name_in_db(name)" keeps returning the same table_name even after "row_drop_table_for_mysql(table_name, ...)" has returned DB_SUCCESS for it. The table name is "cool_down/#sql-ib22", so it is a temporary table. However, DICT_TF2_TEMPORARY is not set for the table, which is why "row_drop_table_for_mysql(table_name, ...)" only adds the table to the background drop list and returns DB_SUCCESS.

      row_drop_tables_for_mysql_in_background() tries to open the table with dict_table_open_on_id(), which in turn requests dict_sys.mutex and cannot acquire it, because row_drop_database_for_mysql() holds it.
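The interaction described so far can be modelled with a small, self-contained C++ sketch. It is not InnoDB code: `simulate_drop_database`, `SimulationResult`, and the two sets are hypothetical stand-ins, a `std::timed_mutex` plays the role of dict_sys.mutex, and the foreground loop is capped at `max_iterations` so that the program terminates instead of hanging.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <set>
#include <string>
#include <thread>

// Hypothetical result type, used only by this sketch.
struct SimulationResult {
    int iterations = 0;             // passes made by the foreground loop
    bool table_still_listed = false; // table visible when the loop gave up
    int background_drops_done = 0;  // drops completed by the background thread
};

SimulationResult simulate_drop_database(int max_iterations) {
    std::timed_mutex dict_mutex;  // stand-in for dict_sys.mutex
    std::set<std::string> tables{"cool_down/#sql-ib22"};
    std::set<std::string> background_drop_list;
    std::atomic<bool> stop{false};
    int drops_done = 0;  // written only under dict_mutex, read after join()

    // Foreground ("DROP DATABASE") takes the mutex before the loop starts.
    std::unique_lock<std::timed_mutex> held(dict_mutex);

    // Background dropper: it can do its work only while holding the mutex.
    std::thread background([&] {
        for (;;) {
            std::unique_lock<std::timed_mutex> lock(
                dict_mutex, std::chrono::milliseconds(2));
            if (!lock.owns_lock()) continue;  // mutex busy, try again
            if (stop.load()) return;          // simulation is over
            for (const std::string& name : background_drop_list) {
                tables.erase(name);
                ++drops_done;
            }
            background_drop_list.clear();
        }
    });

    SimulationResult result;
    for (; result.iterations < max_iterations; ++result.iterations) {
        if (tables.empty()) break;               // would be real progress
        const std::string name = *tables.begin(); // "first table in db"
        background_drop_list.insert(name);        // deferred, reported as success
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    result.table_still_listed = tables.count("cool_down/#sql-ib22") > 0;

    stop.store(true);  // set before releasing, so the background thread only exits
    held.unlock();
    background.join();
    result.background_drops_done = drops_done;
    return result;
}
```

Calling `simulate_drop_database(5)` from a `main()` of your own shows the foreground loop using up all five iterations on the same table name while the background thread completes no drops. The real loop has no iteration cap, which is why it spins until the semaphore timeout fires.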

      So we have a deadlock that sync_array_detect_deadlock() cannot detect: row_drop_database_for_mysql() waits for a state change made by row_drop_tables_for_mysql_in_background(), which in turn waits for the mutex held by row_drop_database_for_mysql().
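To illustrate why such a dependency can escape a detector that only follows semaphore waits, here is a minimal wait-for-graph sketch. It is an assumption-laden illustration and not a description of how sync_array_detect_deadlock() is implemented: `ThreadId`, `has_wait_cycle`, and the thread numbering are all hypothetical, and each thread is assumed to wait for at most one other thread.

```cpp
#include <map>
#include <set>

using ThreadId = int;  // hypothetical thread identifier

// Follow "waits for" edges starting at `start`. Returns true if the walk
// revisits a thread (a cycle), false if it reaches a thread that is not
// recorded as waiting for anything.
bool has_wait_cycle(const std::map<ThreadId, ThreadId>& waits_for,
                    ThreadId start) {
    std::set<ThreadId> seen;
    ThreadId current = start;
    while (seen.insert(current).second) {
        auto it = waits_for.find(current);
        if (it == waits_for.end()) return false;  // chain ends: no cycle here
        current = it->second;
    }
    return true;  // `current` was already visited
}
```

With thread 1 standing for the DROP DATABASE thread and thread 2 for the background dropper, a graph containing only the mutex wait `{2 -> 1}` has no cycle, because thread 1 is spinning in its loop rather than blocking on a semaphore. Only when the logical dependency `1 -> 2` is added as well does `has_wait_cycle` report a cycle.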

          Activity

            vlad.lesin Vladislav Lesin added a comment (edited)

            Only 10.5 has been analysed; I don't know yet whether other versions are affected.

            marko Marko Mäkelä added a comment

            DROP DATABASE was rewritten and the "background DROP queue" was removed in MariaDB 10.6 as part of MDEV-25506 and MDEV-25691. Also, the combination of what used to be dict_sys.mutex and dict_operation_lock was ultimately replaced with dict_sys.latch in MDEV-24258.

            I do not think that 10.6 or later versions should be affected by this. It may not be feasible to spend time fixing this bug in earlier versions.

            People

              vlad.lesin Vladislav Lesin