MDEV-20621: FULLTEXT INDEX activity causes InnoDB hang

Details

    Description

      We are experiencing technical difficulties with the latest MariaDB 10.1.41-MariaDB.
      This is only happening on one server while we have more with the same system package versions.

      The database is freezing and does not accept new connections.
      The error_log shows a large number of errors, e.g.:

      InnoDB: Warning: a long semaphore wait:
      --Thread 140300680931072 has waited at dict0dict.cc line 984 for 241.00 seconds the semaphore:
      Mutex at 0x7f9e26c112e8 '&dict_sys->mutex', lock var 1
      Last time reserved by thread 140300697716480 in file not yet reserved line 0, waiters flag 1
      InnoDB: Warning: semaphore wait:
      --Thread 140300680931072 has waited at dict0dict.cc line 984 for 241.00 seconds the semaphore:
      Mutex at 0x7f9e26c112e8 '&dict_sys->mutex', lock var 1
      Last time reserved by thread 140300697716480 in file not yet reserved line 0, waiters flag 1
      

      We can provide more error log data, but not in public.

          Activity

            mleich Matthias Leich added a comment (edited)

            Results of RQG testing on bb-10.2-thiru commit 0b91f74906c8dcbcc1dac486fcc66c1e9c0c603a
            - More than 1500 RQG tests were executed.
            There was a surprisingly low fraction of failing tests.
            All asserts/crashes are already covered by open bugs in JIRA except one:
            - mysqld: sql/sql_list.h:684: void ilink::assert_linked(): Assertion `prev != 0 && next != 0' failed.
              happening during shutdown of the server
            - per Thiru: Unlikely that it is caused by the changes in bb-10.3-thiru
            - occurring only once, so attempts to replay it on current 10.2 have too low a chance of success
            https://jira.mariadb.org/browse/MDEV-20843
            


            marko Marko Mäkelä added a comment

            This is a welcome step in the right direction, but I think that it needs some more work.

            First of all, in_queue should not be stored in a bit-field that is shared with other bit-fields that are protected by a different mutex.

            I would suggest using bool, and documenting the possible state transitions carefully. We might consider using atomic memory access.

            Second, in 10.1, fts_optimize_init() is not adding tables to the queue, while in 10.2 it is doing that. I’d like to see a 10.1 patch that does this. It should also avoid the unnecessary use of std::vector.

            Third, fts_optimize_remove_table() should assert !table->fts->in_queue at the end.
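The first and third review points can be illustrated with a minimal sketch. It is not the InnoDB code: `fts_sketch_t`, `optimize_queue_sketch`, `queue_add()` and `queue_remove()` are hypothetical names, and `std::mutex` stands in for the InnoDB mutex. In C++, adjacent bit-fields share one memory location, so writing one of them under mutex A while another thread writes its neighbour under mutex B is a data race. A separate `bool` (here `std::atomic<bool>`) is its own memory location.

```cpp
#include <atomic>
#include <cassert>
#include <list>
#include <mutex>

// Hypothetical stand-in for a per-table FTS descriptor (not the InnoDB struct).
struct fts_sketch_t {
    // Adjacent bit-fields form a single memory location, so all of them
    // must be protected by the same mutex (assumed: some other mutex).
    unsigned flag_a : 1;
    unsigned flag_b : 1;

    // A separate object is a separate memory location. Making it atomic
    // allows reading it in an assertion without holding the queue mutex.
    std::atomic<bool> in_queue;

    fts_sketch_t() : flag_a(0), flag_b(0), in_queue(false) {}
};

// Hypothetical work queue; the mutex protects the list and the
// in_queue transitions.
struct optimize_queue_sketch {
    std::mutex mutex;
    std::list<fts_sketch_t*> tables;
};

// State transition: in_queue false -> true.
// Returns false if the table was already queued.
inline bool queue_add(optimize_queue_sketch& q, fts_sketch_t& t) {
    std::lock_guard<std::mutex> guard(q.mutex);
    if (t.in_queue.load(std::memory_order_relaxed)) {
        return false;
    }
    q.tables.push_back(&t);
    t.in_queue.store(true, std::memory_order_relaxed);
    return true;
}

// State transition: in_queue true -> false.
// Asserts at the end that the table is no longer queued.
inline void queue_remove(optimize_queue_sketch& q, fts_sketch_t& t) {
    {
        std::lock_guard<std::mutex> guard(q.mutex);
        if (t.in_queue.load(std::memory_order_relaxed)) {
            q.tables.remove(&t);
            t.in_queue.store(false, std::memory_order_relaxed);
        }
    }
    assert(!t.in_queue.load());
}
```

Adding the same table twice is rejected by `queue_add()`, and `queue_remove()` is safe to call whether or not the table is queued. Whether relaxed ordering is sufficient in the real code depends on what else the flag is meant to publish; that is an open assumption here.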

            marko Marko Mäkelä added a comment

            At the end of fts_optimize_remove_table(), the fts_optimize_wq->mutex acquisition and release around the debug assertion should be inside ut_d(), to avoid unnecessary operations on the release build.

            I saw a redundant sync_table = mem_heap_alloc(…) call whose result was immediately overwritten by sync_table = table;.

            In fts_optimize_new_table() the assignment slot->running = false is redundant because of a preceding memset() call.

            If fts_slots can be accessed by multiple threads, then we should extend some mutex hold time. It could be that it is only being accessed by a single thread.

            Should we call fts_init_index() already on ha_innobase::open()? Otherwise, it seems that FTS-indexed columns could be updated before any fulltext search is performed (and ha_innobase::ft_init_ext() is called). Could that lead to some updates being missed by the fulltext indexes?

            Finally, please check the following for differences in white-space or comments, and try to fix those:

            diff -I^@@ <(git show origin/bb-10.1-thiru storage/innobase) <(git show origin/bb-10.1-thiru storage/xtradb/)
            git show origin/bb-10.2-thiru|diff -I^@@ - <(git show origin/bb-10.1-thiru storage/innobase)
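The ut_d() remark can be sketched as follows. This is an illustration under stated assumptions, not the InnoDB source: `SKETCH_D`, `SKETCH_DEBUG`, `counting_mutex` and `remove_table_tail()` are hypothetical names. The macro mimics a debug-only wrapper that expands to its argument in debug builds and to nothing otherwise, so the lock and unlock calls around the assertion disappear from release builds together with the assertion itself.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical debug-only wrapper: expands to EXPR only when
// SKETCH_DEBUG is defined, and to nothing otherwise.
#ifdef SKETCH_DEBUG
# define SKETCH_D(EXPR) EXPR
#else
# define SKETCH_D(EXPR)
#endif

// A mutex that counts acquisitions, so the effect of the macro is observable.
struct counting_mutex {
    std::mutex m;
    int acquisitions = 0;

    void lock() {
        m.lock();
        ++acquisitions;
    }
    void unlock() { m.unlock(); }
};

// Tail of a hypothetical remove-table routine. The mutex is taken only to
// make the debug assertion safe, so all three statements are debug-only.
inline void remove_table_tail(counting_mutex& queue_mutex, bool in_queue) {
    (void) queue_mutex;  // unused when SKETCH_DEBUG is not defined
    (void) in_queue;     // unused when SKETCH_DEBUG is not defined
    SKETCH_D(queue_mutex.lock());
    SKETCH_D(assert(!in_queue));
    SKETCH_D(queue_mutex.unlock());
}
```

Built without `SKETCH_DEBUG`, calling `remove_table_tail()` acquires the mutex zero times; built with `-DSKETCH_DEBUG`, it acquires it once per call.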

            marko Marko Mäkelä added a comment

            Thanks, this looks OK. I made a suggestion to declare fts_optimize_wq without static scope, to avoid having to add trivial non-inline accessor functions.

            mleich Matthias Leich added a comment

            I tested the tree bb-10.2-thiru commit ce813ca178e499ab2171978bf0140537cb9ca612, which contains
            the patches for the current MDEV.
            There were no asserts/crashes that do not also occur in current
            10.2 commit 28098420317bc2efe082df799c917babde879242.
            So from my point of view the MDEV-20621 patch is OK.

            People

              thiru Thirunarayanan Balathandayuthapani
              Novkovski Stevo