  1. MariaDB Server
  2. MDEV-36465

MDEV-33813 Regression, Queries in 'Waiting for someone to free space' state will not automatically retry IO and hang forever

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 10.11.11
    • 10.11
    • None
    • Debian Linux 12

    Description

      Unfortunately, the fix for MDEV-33813 (commit https://github.com/MariaDB/server/commit/3655cefc42902a3b3f584d252de7af488feaff55) seems to have introduced a regression:
      Since this patch, queries in the state 'Waiting for someone to free space' no longer retry their writes automatically (previously after 60 seconds), so once a query enters this state it never leaves it unless it is killed.

      I tested this with a MariaDB 10.11.11 installation on Debian Linux 12, using a MyISAM table.
      While bulk-inserting into the table, I restricted MariaDB's disk usage with a quota, and the queries entered the 'Waiting for someone to free space' state (as expected). After I removed the quota again, the queries hung forever without retrying the I/O.

      Before the above-mentioned commit, wait_for_free_space() slept for MY_WAIT_FOR_USER_TO_FIX_PANIC (default: 60) seconds each time it was called, and it was called repeatedly, e.g. by my_write() in mysys/my_write.c.
      After the commit, this timed retry is gone: mariadb_sleep_for_space() replaces the sleep() call and enters mysql_cond_wait() for the current thread, which waits indefinitely for a wakeup signal (e.g. one issued by a KILL command). As a result, the queries keep hanging after disk space has been freed.
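
The difference between the two behaviours can be illustrated with a small standalone sketch. It does not use MariaDB code: `Waiter`, `wait_for_space()` and `write_with_retry()` are hypothetical names, and the "disk" is just an atomic flag. It assumes a caller that loops like my_write() does, and shows that a timed wait lets the loop re-check the disk on its own while a kill still wakes it immediately.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the per-thread wakeup state (not a MariaDB type).
struct Waiter {
  std::mutex lock;
  std::condition_variable wakeup;
  bool killed = false;
};

// Block until the waiter is killed or `timeout` elapses, whichever is first.
// Returns true if the waiter was killed.
bool wait_for_space(Waiter &w, std::chrono::milliseconds timeout) {
  std::unique_lock<std::mutex> guard(w.lock);
  w.wakeup.wait_for(guard, timeout, [&] { return w.killed; });
  return w.killed;
}

// Retry loop in the spirit of my_write(): try, wait, try again.
// Returns the number of attempts needed, or -1 if the waiter was killed.
int write_with_retry(Waiter &w, const std::atomic<bool> &disk_full,
                     std::chrono::milliseconds retry_interval) {
  int attempts = 0;
  for (;;) {
    ++attempts;
    if (!disk_full.load())
      return attempts;                 // simulated write succeeded
    if (wait_for_space(w, retry_interval))
      return -1;                       // killed while waiting
  }
}
```

With an untimed wait in `wait_for_space()`, the loop would never reach its second attempt unless something notified `wakeup`, which is the hang described above.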

      Proposed fix: Replace mysql_cond_wait() with mysql_cond_timedwait(), e.g.:

      --- sql/sql_class.cc.org        2025-01-30 12:01:24.000000000 +0100
      +++ sql/sql_class.cc    2025-04-02 21:02:14.120032562 +0200
      @@ -8511,11 +8511,13 @@
           sleep(seconds);
           return;
         }
      +  struct timespec abstime;
      +  set_timespec(abstime, seconds);
         mysql_mutex_lock(&thd->LOCK_wakeup_ready);
         thd->ENTER_COND(&thd->COND_wakeup_ready, &thd->LOCK_wakeup_ready,
                         &stage_waiting_for_disk_space, &old_stage);
         if (!thd->killed)
      -    mysql_cond_wait(&thd->COND_wakeup_ready, &thd->LOCK_wakeup_ready);
      +    mysql_cond_timedwait(&thd->COND_wakeup_ready, &thd->LOCK_wakeup_ready, &abstime);
         thd->EXIT_COND(&old_stage);
         return;
       }
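
For readers unfamiliar with timed condition waits: the timed variant takes an absolute deadline rather than a duration, which is why the patch computes abstime before waiting. The following plain POSIX sketch shows the same pattern outside of MariaDB. `deadline_in()` and `timed_wait_seconds()` are hypothetical helpers written for this illustration, and the sketch assumes a condition variable with default attributes, which measures its deadline against CLOCK_REALTIME.

```cpp
#include <cerrno>
#include <ctime>
#include <pthread.h>

// Hypothetical helper: absolute deadline `seconds` from now.
timespec deadline_in(long seconds) {
  timespec ts;
  clock_gettime(CLOCK_REALTIME, &ts);
  ts.tv_sec += seconds;
  return ts;
}

// Wait until *flag becomes true or the deadline passes.
// Returns true if the wait ended because of the timeout.
bool timed_wait_seconds(pthread_mutex_t *m, pthread_cond_t *c,
                        const bool *flag, long seconds) {
  timespec abstime = deadline_in(seconds);
  int rc = 0;
  pthread_mutex_lock(m);
  // Loop to tolerate spurious wakeups; stop once the deadline is reached.
  while (!*flag && rc != ETIMEDOUT)
    rc = pthread_cond_timedwait(c, m, &abstime);
  bool timed_out = !*flag;
  pthread_mutex_unlock(m);
  return timed_out;
}
```

Because the deadline is computed once, a spurious wakeup does not extend the total wait; the caller regains control after at most roughly `seconds` and can retry its I/O.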
      

      I am currently running MariaDB 10.11 with the above patch, which resolves the issue for me.

            People

              Assignee: monty Michael Widenius
              Reporter: thomas.stangner Thomas Stangner