[MDEV-21269] Parallel merging of fts index rebuild fails Created: 2019-12-10  Updated: 2020-05-18  Resolved: 2020-05-18

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3.7, 10.4.0, 10.3.20, 10.5.0
Fix Version/s: 10.5.4, 10.3.24, 10.4.14

Type: Bug Priority: Critical
Reporter: Ovidiu Stanila Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: fulltext, innodb
Environment:

CentOS 7.7.1908 - kernel 3.10.0-1062.7.1.el7.x86_64


Attachments: backtrace.txt, mariadb.log, my.cnf, t.sql.gz
Issue Links:
Relates
relates to MDEV-21907 Enable -Wconversion for InnoDB and Ma... Closed

 Description   

While issuing ALTER on an InnoDB table with an FTS index, MariaDB crashes with a segmentation fault:

kernel: mysqld[812]: segfault at 0 ip 00005649f0b31ac8 sp 00007febffffd0b0 error 6 in mysqld[5649efe3e000+12c9000]

MariaDB [test]> optimize table t;
+--------+----------+----------+-------------------------------------------------------------------+
| Table  | Op       | Msg_type | Msg_text                                                          |
+--------+----------+----------+-------------------------------------------------------------------+
| test.t | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| test.t | optimize | status   | OK                                                                |
+--------+----------+----------+-------------------------------------------------------------------+
2 rows in set (10.312 sec)

MariaDB [test]> optimize table t;
+--------+----------+----------+-------------------------------------------------------------------+
| Table  | Op       | Msg_type | Msg_text                                                          |
+--------+----------+----------+-------------------------------------------------------------------+
| test.t | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| test.t | optimize | status   | OK                                                                |
+--------+----------+----------+-------------------------------------------------------------------+
2 rows in set (6.379 sec)

MariaDB [test]> optimize table t;
ERROR 2013 (HY000): Lost connection to MySQL server during query
MariaDB [test]> optimize table t;
ERROR 2013 (HY000): Lost connection to MySQL server during query
MariaDB [test]> optimize table t;
ERROR 2013 (HY000): Lost connection to MySQL server during query

We managed to reproduce this with around 3 consecutive calls of ALTER TABLE t ENGINE=InnoDB; following the initial data import.

If we remove the FTS index, the issue no longer occurs.
The crash seems to happen during the FTS index merge, when the table is recreated.

I've attached to this ticket the backtrace, MariaDB log, configuration and sample data used to replicate the issue.

If you require any additional info, just let me know.



 Comments   
Comment by Marko Mäkelä [ 2019-12-10 ]

Relevant part of backtrace.txt:

mariadb-10.3.20

#2  0x00005649f05d1600 in handle_fatal_signal (sig=11) at /usr/src/debug/MariaDB-10.3.20/src_0/sql/signal_handler.cc:339
        curr_time = 1575902970
        tm = {tm_sec = 30, tm_min = 49, tm_hour = 0, tm_mday = 10, tm_mon = 11, tm_year = 119, tm_wday = 2, tm_yday = 343, tm_isdst = 0, tm_gmtoff = 36000, tm_zone = 0x5649f3088f90 "AEST"}
        print_invalid_query_pointer = false
#3  <signal handler called>
No symbol table info available.
#4  0x00005649f080ca86 in row_fts_merge_insert (index=<optimized out>, table=<optimized out>, psort_info=<optimized out>, id=<optimized out>)
    at /usr/src/debug/MariaDB-10.3.20/src_0/storage/innobase/row/row0ftsort.cc:1757
        dtuple = <optimized out>
        n_ext = 4
        min_rec = 1684628227
        last_doc_id = 0
        aux_index = <optimized out>
        new_word = {text = {f_str = 0x0, f_len = 0, f_n_char = 0}, nodes = 0x7fec0c0454c0}
        ins_ctx = {charset = 0x5649f14b7720 <my_charset_utf8_general_ci>, heap = 0x7fec0c000c60, opt_doc_id_size = 1, btr_bulk = 0x7fec0c0473f0, tuple = 0x7fec0c0456f0}
        count_diag = <optimized out>
        space = <optimized out>
        i = <optimized out>
        aux_table = <optimized out>
        positions = <optimized out>
        fts_table = {type = FTS_INDEX_TABLE, table_id = 362, index_id = 561, suffix = 0x5649f0cfe65d "INDEX_4", table = 0x7fec20032bf0, charset = 0x0}
        count = 0
        error = DB_SUCCESS
        start = <optimized out>
        aux_table_name = "test/FTS_", '0' <repeats 13 times>, "16a_", '0' <repeats 13 times>, "231_INDEX_4\000\000\000\000\000\000\020\353\377\377\353\177\000\000\060\355\377\377\353\177\000\000p\375\377\377\353\177\000\000\000\354\377\377\353\177\000\000\240\201\274\214\354\177\000\000\000\367\377\377\353\177\000\000\004\035\222\211\354\177\000\000\370\354\377\377\353\177\000\000\000\355\377\377\353\177\000\000\000\000\000\000\000\000\000\000\b\355\377\377\353\177", '\000' <repeats 18 times>, "\060\355\377\377\353\177", '\000' <repeats 42 times>...
        trx = 0x7fec84e632e8
#5  0x00005649f080d889 in fts_parallel_merge (arg=0x7fec17c1ed08) at /usr/src/debug/MariaDB-10.3.20/src_0/storage/innobase/row/row0ftsort.cc:1111
        psort_info = 0x7fec17c1ed08
        id = <optimized out>
#6  0x00007fec8cbc1e65 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.

                        min_rec = sel_tree[0];
 
                        if (min_rec ==  -1) {

Could it be that we have fts_sort_pll_degree <= 2 (so that row_fts_build_sel_tree() did nothing) and sel_tree==NULL for some reason? The minimum value of innodb_ft_sort_pll_degree appears to be 1, and the default is 2.

Comment by Thirunarayanan Balathandayuthapani [ 2020-05-15 ]

I can reproduce the issue on 10.3.20 in both debug and release builds.

Comment by Thirunarayanan Balathandayuthapani [ 2020-05-16 ]

The following patch solves the issue:

diff --git a/storage/innobase/row/row0ftsort.cc b/storage/innobase/row/row0ftsort.cc
index ec65f295e7f..874854c2da9 100644
--- a/storage/innobase/row/row0ftsort.cc
+++ b/storage/innobase/row/row0ftsort.cc
@@ -1528,10 +1528,11 @@ row_fts_build_sel_tree(
                sel_tree[i + start] = int(i);
        }
 
-       for (i = treelevel; --i; ) {
+       i = treelevel;
+       do {
                row_fts_build_sel_tree_level(
-                       sel_tree, i, mrec, offsets, index);
-       }
+                       sel_tree, --i, mrec, offsets, index);
+       } while (i > 0);
 
        return(treelevel);
 }

Comment by Thirunarayanan Balathandayuthapani [ 2020-05-17 ]

Pushed the patch with the reason in bb-10.3-MDEV-21269

Comment by Marko Mäkelä [ 2020-05-18 ]

I see that the code was broken in MariaDB 10.3.7 by me, in a preparatory effort for MDEV-21907 that was finalized in 10.5. I had failed to notice (and regression tests failed to cover) that the original loop termination condition i >= 0 was relying on the variable i being signed.
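
To illustrate the termination problem described above, here is a minimal standalone sketch that records which tree levels each loop shape from the patch visits. The function names levels_before_patch and levels_after_patch are hypothetical, std::size_t is an assumed stand-in for the unsigned counter type used in row0ftsort.cc, and the call to row_fts_build_sel_tree_level() is replaced by recording the level. The sketch assumes treelevel >= 1.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helpers: only the two loop shapes are taken from the patch.
// std::size_t stands in for the unsigned counter type (an assumption).

// Loop shape before the patch: the condition `--i` turns false as soon as
// i reaches 0, so level 0 is never processed.
std::vector<std::size_t> levels_before_patch(std::size_t treelevel)
{
    std::vector<std::size_t> visited;
    for (std::size_t i = treelevel; --i; ) {
        visited.push_back(i);  // stands in for row_fts_build_sel_tree_level()
    }
    return visited;
}

// Loop shape after the patch: the body runs with --i first and the check
// `i > 0` comes afterwards, so level 0 is processed as the last iteration.
std::vector<std::size_t> levels_after_patch(std::size_t treelevel)
{
    std::vector<std::size_t> visited;
    std::size_t i = treelevel;
    do {
        visited.push_back(--i);  // stands in for row_fts_build_sel_tree_level()
    } while (i > 0);
    return visited;
}
```

For treelevel = 3 the pre-patch loop visits levels 2 and 1 only, while the patched loop visits 2, 1 and 0. For treelevel = 1 the pre-patch loop does not run at all, whereas the patched loop still processes level 0.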
