MariaDB Server / MDEV-32578

row_merge_fts_doc_tokenize() handles FTS plugin parser inconsistently

Details

    Description

      The recent release of MySQL 8.0.35 includes the following change:
      Bug#35432973 InnoDB: processing single character tokens with FTS parser plugin

      When a tokenizer plugin interface was added to MySQL 5.7, fts_tokenize_ctx::processed_len got a second meaning, which is only partly implemented in row_merge_fts_doc_tokenize().

          Activity

            marko Marko Mäkelä added a comment - I found a rather simple test case that will crash MySQL 8.0.34 but not MySQL 5.7.43. The code has been refactored between MySQL 5.7 and 8.0. I will need to investigate what exactly has changed between the two versions.

            marko Marko Mäkelä added a comment - In MySQL 5.7.44, with or without the fix, the only way I can reproduce the crash is by applying the following patch:

            diff --git a/storage/innobase/row/row0ftsort.cc b/storage/innobase/row/row0ftsort.cc
            index 36ce6eb6cca..03149b68a21 100644
            --- a/storage/innobase/row/row0ftsort.cc
            +++ b/storage/innobase/row/row0ftsort.cc
            @@ -468,7 +468,7 @@ row_merge_fts_doc_tokenize(
             	row_merge_buf_t* buf;
             	dfield_t*	field;
             	fts_string_t	t_str;
            -	ibool		buf_full = FALSE;
            +	ibool		buf_full = TRUE;
             	byte		str_buf[FTS_MAX_WORD_LEN + 1];
             	ulint		data_size[FTS_NUM_AUX_INDEX];
             	ulint		n_tuple[FTS_NUM_AUX_INDEX];
            

            But this also crashes when the fix is present:

            mysql-5.7.44 with the above patch

            2023-10-27 11:28:35 0x7f4c437fe6c0  InnoDB: Assertion failure in thread 139965526697664 in file row0ftsort.cc line 837
            InnoDB: Failing assertion: t_ctx.rows_added[t_ctx.buf_used]
            

            In MySQL 8.0, the buffer size calculation is quite different. ddl::Context::scan_buffer_size will allocate a buffer by dividing innodb_sort_buffer_size (default: 1MiB) by the number of threads (2 in this case) and index partitions (hard-coded as 6 in the file format). These 54613 bytes will then be passed on to key_buffer.m_buffer_size. It seems that the intention was to round this up to some multiple of 4096 bytes, but that did not happen.
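The rounding that the comment above suspects was intended can be illustrated with a minimal sketch. The helper names `round_up_to_multiple` and `is_multiple_of` are hypothetical and are not taken from the MySQL or MariaDB source; only the 54613-byte value and the 4096-byte granularity come from the comment. This is not the actual logic of ddl::Context::scan_buffer_size, only the arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of InnoDB): round n up to the next
// multiple of granule. Assumes granule > 0 and that n + granule - 1
// does not overflow std::size_t.
constexpr std::size_t round_up_to_multiple(std::size_t n, std::size_t granule) {
  return (n + granule - 1) / granule * granule;
}

// Hypothetical helper: true if n is an exact multiple of granule.
constexpr bool is_multiple_of(std::size_t n, std::size_t granule) {
  return n % granule == 0;
}

// The buffer size quoted in the comment is not a multiple of 4096 ...
static_assert(!is_multiple_of(54613, 4096), "54613 is not 4096-aligned");
// ... and rounding it up would yield 14 * 4096 bytes.
static_assert(round_up_to_multiple(54613, 4096) == 57344, "rounds to 57344");

// Runtime variant of the same checks, for use in a debug build.
inline void check_rounding_example() {
  assert(round_up_to_multiple(54613, 4096) == 57344);
  assert(is_multiple_of(57344, 4096));
}
```

Under these assumptions, a 54613-byte buffer would become 57344 bytes (14 × 4096) if it were rounded up, whereas the value actually passed on is the unrounded 54613.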

            Whatever I tried, I was unable to reproduce a crash in MySQL 5.7 with an SQL test case that crashes MySQL 8.0. I think that I must more or less apply the MySQL 5.7 fix without adding a test case. An additional challenge is that we do not actually have an n-gram tokenizer in MariaDB (MDEV-10267); we only have tests for a simple_parser.


            People

              marko Marko Mäkelä