  1. MariaDB Server
  2. MDEV-20126

Semaphore timeout due to large fulltext indexes

Details

    Description

      MySQL 5.6.45 contains a change that refers to Oracle Bug #25289359 DML/DDL ON LARGE FULLTEXT TABLES CAUSE SEMAPHORE TIMEOUTS AND ASSERTION/SUICIDE.

      There is no test case, but there is debug instrumentation. The change perpetuates the bogus assumption that time(NULL) is a monotonically increasing sequence. That assumption was demonstrably broken in MDEV-14154. The change might also assume fair scheduling among the threads, which might not hold on a heavily loaded system.

      I think that we should study what the problem is and whether it affects MariaDB, and then come up with a better fix.

          Activity

            The main idea of the Oracle fix is related to limiting the atomicity of fts_sync_write_words().
            This part of the fix wrongly assumes that the system clock is monotonic (never moves backwards):

            ulint cache_lock_time = ut_time() - sync_start_time;
            if (cache_lock_time > lock_threshold) {
            

            Similar to our MDEV-14154 changes, in particular the one that removed bogus assertions, we should use something like this:

            ulint interval = ulint(time(NULL) - start_time);
            if (lint(interval) < 0 || interval > time_limit) {
            

            That is, we will time out if the time moved backwards.
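The interval check above can be sketched as a standalone helper. This is only an illustration, not InnoDB code: the name timed_out and its parameters are hypothetical, and it assumes time_t is an integer type (as on POSIX systems). It uses a signed difference instead of the ulint/lint casts, which expresses the same rule: a negative interval means the clock moved backwards, and that is treated as a timeout.

```cpp
#include <cstdint>
#include <ctime>

// Hypothetical helper, not part of InnoDB.
// Returns true if the operation that started at start_time should be
// treated as timed out at time now. A clock that moved backwards
// (now < start_time) also counts as a timeout.
// Assumption: time_t is an integer type, as on POSIX systems.
static bool timed_out(time_t start_time, time_t now, time_t time_limit)
{
  const int64_t interval = int64_t(now) - int64_t(start_time);
  return interval < 0 || interval > int64_t(time_limit);
}
```

A caller would record start_time = time(NULL) before the long operation and periodically evaluate timed_out(start_time, time(NULL), time_limit) with one-second granularity.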

            Because time(NULL) may have a much lower overhead than my_interval_timer() or other monotonic clock sources and because the precision of one second suffices here, I think that we should stick to time(NULL).

            Anyway, the main idea of the Oracle change is to extend the innodb_fatal_semaphore_wait_threshold (srv_fatal_semaphore_wait_threshold) if a fts_sync_table() operation from outside the optimizer thread is taking too long. That is, it will prevent the InnoDB built-in watchdog from operating, instead of actually fixing the root cause of the problem.

            (Comment by marko, Marko Mäkelä)

            To fix this issue, InnoDB should have multiple fts_optimize_threads to process the messages from the queue.
            With multiple fts_optimize_threads, InnoDB could reduce the cache size significantly, which would
            shorten the time that DDL and dict_table_mem_free() spend waiting for fts_optimize_remove_table().
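The general shape of this proposal, several worker threads draining one message queue, might look like the following sketch. It is a generic illustration using the C++ standard library, not InnoDB code: MsgQueue, drain_with_workers and the use of plain integer table IDs as messages are all hypothetical, and the real work a worker would do per table is only indicated by a comment.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical message queue shared by several worker threads.
// None of these names exist in InnoDB; integer IDs stand in for messages.
class MsgQueue {
public:
  void push(int table_id) {
    {
      std::lock_guard<std::mutex> g(m_);
      q_.push(table_id);
    }
    cv_.notify_one();
  }
  // Signals that no more messages will arrive.
  void close() {
    {
      std::lock_guard<std::mutex> g(m_);
      closed_ = true;
    }
    cv_.notify_all();
  }
  // Blocks until a message is available.
  // Returns false once the queue is closed and empty.
  bool pop(int& table_id) {
    std::unique_lock<std::mutex> l(m_);
    cv_.wait(l, [this] { return closed_ || !q_.empty(); });
    if (q_.empty()) return false;
    table_id = q_.front();
    q_.pop();
    return true;
  }
private:
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<int> q_;
  bool closed_ = false;
};

// Starts n_workers threads, lets them drain the queue, and returns
// how many messages were handled in total.
std::size_t drain_with_workers(const std::vector<int>& table_ids,
                               unsigned n_workers) {
  MsgQueue queue;
  std::mutex count_mutex;
  std::size_t handled = 0;
  std::vector<std::thread> workers;
  for (unsigned i = 0; i < n_workers; i++) {
    workers.emplace_back([&] {
      int id;
      while (queue.pop(id)) {
        // A real worker would sync or optimize the table named by id here.
        std::lock_guard<std::mutex> g(count_mutex);
        handled++;
      }
    });
  }
  for (int id : table_ids) queue.push(id);
  queue.close();
  for (std::thread& t : workers) t.join();
  return handled;
}
```

With more than one worker, a slow message for one table no longer blocks the messages queued behind it, which is the effect the comment above is after. Depending on the toolchain, building this may require linking with the platform's thread library (for example -pthread).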

            (Comment by thiru, Thirunarayanan Balathandayuthapani)

            People

              thiru Thirunarayanan Balathandayuthapani
              marko Marko Mäkelä