MariaDB Server / MDEV-21269

Parallel merging of fts index rebuild fails

Details

    Description

      While issuing ALTER on an InnoDB table with an FTS index, MariaDB crashes with a segmentation fault:

      kernel: mysqld[812]: segfault at 0 ip 00005649f0b31ac8 sp 00007febffffd0b0 error 6 in mysqld[5649efe3e000+12c9000]

      MariaDB [test]> optimize table t;
      +--------+----------+----------+-------------------------------------------------------------------+
      | Table  | Op       | Msg_type | Msg_text                                                          |
      +--------+----------+----------+-------------------------------------------------------------------+
      | test.t | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
      | test.t | optimize | status   | OK                                                                |
      +--------+----------+----------+-------------------------------------------------------------------+
      2 rows in set (10.312 sec)

      MariaDB [test]> optimize table t;
      +--------+----------+----------+-------------------------------------------------------------------+
      | Table  | Op       | Msg_type | Msg_text                                                          |
      +--------+----------+----------+-------------------------------------------------------------------+
      | test.t | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
      | test.t | optimize | status   | OK                                                                |
      +--------+----------+----------+-------------------------------------------------------------------+
      2 rows in set (6.379 sec)

      MariaDB [test]> optimize table t;
      ERROR 2013 (HY000): Lost connection to MySQL server during query
      MariaDB [test]> optimize table t;
      ERROR 2013 (HY000): Lost connection to MySQL server during query
      MariaDB [test]> optimize table t;
      ERROR 2013 (HY000): Lost connection to MySQL server during query

      We managed to reproduce this after about three consecutive runs of ALTER TABLE t ENGINE=InnoDB following the initial data import.

      If we remove the FTS index, the issue no longer occurs.
      The crash seems to happen during the FTS index merge, when the table is recreated.

      I've attached to this ticket the backtrace, MariaDB log, configuration and sample data used to replicate the issue.

      If you require any additional info, just let me know.

      Attachments

        1. backtrace.txt
          79 kB
          Ovidiu Stanila
        2. mariadb.log
          4 kB
          Ovidiu Stanila
        3. my.cnf
          1 kB
          Ovidiu Stanila
        4. t.sql.gz
          5.16 MB
          Ovidiu Stanila

          Activity

            Relevant part of backtrace.txt:

            mariadb-10.3.20

            #2  0x00005649f05d1600 in handle_fatal_signal (sig=11) at /usr/src/debug/MariaDB-10.3.20/src_0/sql/signal_handler.cc:339
                    curr_time = 1575902970
                    tm = {tm_sec = 30, tm_min = 49, tm_hour = 0, tm_mday = 10, tm_mon = 11, tm_year = 119, tm_wday = 2, tm_yday = 343, tm_isdst = 0, tm_gmtoff = 36000, tm_zone = 0x5649f3088f90 "AEST"}
                    print_invalid_query_pointer = false
            #3  <signal handler called>
            No symbol table info available.
            #4  0x00005649f080ca86 in row_fts_merge_insert (index=<optimized out>, table=<optimized out>, psort_info=<optimized out>, id=<optimized out>)
                at /usr/src/debug/MariaDB-10.3.20/src_0/storage/innobase/row/row0ftsort.cc:1757
                    dtuple = <optimized out>
                    n_ext = 4
                    min_rec = 1684628227
                    last_doc_id = 0
                    aux_index = <optimized out>
                    new_word = {text = {f_str = 0x0, f_len = 0, f_n_char = 0}, nodes = 0x7fec0c0454c0}
                    ins_ctx = {charset = 0x5649f14b7720 <my_charset_utf8_general_ci>, heap = 0x7fec0c000c60, opt_doc_id_size = 1, btr_bulk = 0x7fec0c0473f0, tuple = 0x7fec0c0456f0}
                    count_diag = <optimized out>
                    space = <optimized out>
                    i = <optimized out>
                    aux_table = <optimized out>
                    positions = <optimized out>
                    fts_table = {type = FTS_INDEX_TABLE, table_id = 362, index_id = 561, suffix = 0x5649f0cfe65d "INDEX_4", table = 0x7fec20032bf0, charset = 0x0}
                    count = 0
                    error = DB_SUCCESS
                    start = <optimized out>
                    aux_table_name = "test/FTS_", '0' <repeats 13 times>, "16a_", '0' <repeats 13 times>, "231_INDEX_4\000\000\000\000\000\000\020\353\377\377\353\177\000\000\060\355\377\377\353\177\000\000p\375\377\377\353\177\000\000\000\354\377\377\353\177\000\000\240\201\274\214\354\177\000\000\000\367\377\377\353\177\000\000\004\035\222\211\354\177\000\000\370\354\377\377\353\177\000\000\000\355\377\377\353\177\000\000\000\000\000\000\000\000\000\000\b\355\377\377\353\177", '\000' <repeats 18 times>, "\060\355\377\377\353\177", '\000' <repeats 42 times>...
                    trx = 0x7fec84e632e8
            #5  0x00005649f080d889 in fts_parallel_merge (arg=0x7fec17c1ed08) at /usr/src/debug/MariaDB-10.3.20/src_0/storage/innobase/row/row0ftsort.cc:1111
                    psort_info = 0x7fec17c1ed08
                    id = <optimized out>
            #6  0x00007fec8cbc1e65 in start_thread () from /lib64/libpthread.so.0
            No symbol table info available.

            min_rec = sel_tree[0];

            if (min_rec == -1) {

            Could it be that we have fts_sort_pll_degree <= 2 (so that row_fts_build_sel_tree() did nothing) and sel_tree==NULL for some reason? The minimum value of innodb_ft_sort_pll_degree appears to be 1, and the default is 2.

            Comment by Marko Mäkelä (marko).

            I can repeat the issue in 10.3.20 in debug build as well as release build.

            Comment by Thirunarayanan Balathandayuthapani (thiru).

            The following patch solves the issue:

            diff --git a/storage/innobase/row/row0ftsort.cc b/storage/innobase/row/row0ftsort.cc
            index ec65f295e7f..874854c2da9 100644
            --- a/storage/innobase/row/row0ftsort.cc
            +++ b/storage/innobase/row/row0ftsort.cc
            @@ -1528,10 +1528,11 @@ row_fts_build_sel_tree(
                            sel_tree[i + start] = int(i);
                    }
             
            -       for (i = treelevel; --i; ) {
            +       i = treelevel;
            +       do {
                            row_fts_build_sel_tree_level(
            -                       sel_tree, i, mrec, offsets, index);
            -       }
            +                       sel_tree, --i, mrec, offsets, index);
            +       } while (i > 0);
             
                    return(treelevel);
             }
            

            Comment by Thirunarayanan Balathandayuthapani (thiru).
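To see what the patch changes, the two loop shapes from the diff can be run side by side. The following is a minimal sketch, not the InnoDB code: the names levels_before_patch and levels_after_patch are hypothetical, std::size_t stands in for the (assumed unsigned) counter type, and the loop body only records which level index would have been passed to row_fts_build_sel_tree_level(). It assumes treelevel is at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: levels visited by the loop shape removed by the patch.
// Assumes treelevel >= 1; with an unsigned counter, 0 would wrap around.
std::vector<std::size_t> levels_before_patch(std::size_t treelevel)
{
	assert(treelevel >= 1);
	std::vector<std::size_t> visited;
	// The pre-decrement is also the loop condition, so the loop
	// stops as soon as i reaches 0 and never runs the body for it.
	for (std::size_t i = treelevel; --i; ) {
		visited.push_back(i);
	}
	return visited;
}

// Hypothetical helper: levels visited by the loop shape added by the patch.
std::vector<std::size_t> levels_after_patch(std::size_t treelevel)
{
	assert(treelevel >= 1);
	std::vector<std::size_t> visited;
	std::size_t i = treelevel;
	// Decrement first, use the value, then test: the body also runs for 0.
	do {
		visited.push_back(--i);
	} while (i > 0);
	return visited;
}
```

Under these assumptions, treelevel = 3 gives levels 2 and 1 before the patch and levels 2, 1, 0 after it; with treelevel = 1 the pre-patch loop body never runs at all. If level 0 is the root level that sel_tree[0] belongs to, a root that was never filled in would be consistent with the garbage min_rec value in the backtrace.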

            Pushed the patch, together with an explanation of the cause, to the branch bb-10.3-MDEV-21269.

            Comment by Thirunarayanan Balathandayuthapani (thiru).

            I see that the code was broken in MariaDB 10.3.7 by me, in a preparatory effort for MDEV-21907, which was finalized in 10.5. I had failed to notice (and the regression tests failed to cover) that the original loop termination condition i >= 0 relied on the variable i being signed.

            Comment by Marko Mäkelä (marko).
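The signedness dependency can be illustrated in isolation. The sketch below is an assumption-laden illustration rather than the historical code: only the condition i >= 0 comes from the comment, while the function names, the int and std::size_t types, and the exact loop headers are hypothetical. It shows a signed countdown that ends once i drops below zero, the wrap-around that prevents an unsigned counter from ever failing i >= 0, and one common unsigned countdown idiom (not the form used in the patch) that still reaches 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Signed countdown: i >= 0 becomes false once i reaches -1,
// so level 0 is visited and the loop then terminates.
std::vector<int> levels_signed(int treelevel)
{
	std::vector<int> visited;
	for (int i = treelevel - 1; i >= 0; i--) {
		visited.push_back(i);
	}
	return visited;
}

// With an unsigned counter, decrementing 0 wraps to the maximum value,
// so a condition of the form i >= 0 can never become false.
bool unsigned_zero_wraps()
{
	std::size_t i = 0;
	--i;
	return i == static_cast<std::size_t>(-1);
}

// One common unsigned idiom (swapped in here for illustration):
// compare first, then decrement, so the body still sees 0.
std::vector<std::size_t> levels_unsigned(std::size_t treelevel)
{
	std::vector<std::size_t> visited;
	for (std::size_t i = treelevel; i-- > 0; ) {
		visited.push_back(i);
	}
	return visited;
}
```

Both countdown variants visit levels 2, 1, 0 for treelevel = 3. The point is that converting the counter to an unsigned type forces the termination test to be rewritten, and the rewritten test has to be checked against the last iteration.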

            People

              thiru Thirunarayanan Balathandayuthapani
              ovidiu.stanila Ovidiu Stanila