[MDEV-32578] row_merge_fts_doc_tokenize() handles FTS plugin parser inconsistently Created: 2023-10-25 Updated: 2023-10-29 Resolved: 2023-10-27 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2 |
| Fix Version/s: | 10.4.32, 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3, 11.2.2 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | upstream-fix |
| Issue Links: |
|
| Description |
|
The recent release of MySQL 8.0.35 includes the following change: "When a tokenizer plugin interface was added to MySQL 5.7, fts_tokenize_ctx::processed_len got a second meaning, which is only partly implemented in row_merge_fts_doc_tokenize()." |
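The hazard described above can be illustrated with a minimal sketch (this is not the InnoDB code; all names here are illustrative): a tokenizer fills a fixed-size sort buffer and must both track its position in the document and check the remaining buffer space. If only one of those two meanings of the counter is honored, the buffer can overflow.

```cpp
#include <string>
#include <vector>

struct TokenizeResult {
    std::vector<std::string> tokens; // tokens copied into the "sort buffer"
    size_t doc_pos;                  // how far into the document we got
};

// Tokenize `doc` into whitespace-separated tokens, stopping once the next
// token would no longer fit into `buf_size` bytes of output. Returning
// `doc_pos` lets the caller resume from where tokenization stopped.
TokenizeResult tokenize_into_buffer(const std::string& doc, size_t buf_size) {
    TokenizeResult r{{}, 0};
    size_t used = 0;
    while (r.doc_pos < doc.size()) {
        // skip separators
        while (r.doc_pos < doc.size() && doc[r.doc_pos] == ' ') ++r.doc_pos;
        size_t start = r.doc_pos;
        while (r.doc_pos < doc.size() && doc[r.doc_pos] != ' ') ++r.doc_pos;
        if (start == r.doc_pos) break;
        size_t len = r.doc_pos - start;
        if (used + len > buf_size) { // buffer full: stop, do not overflow
            r.doc_pos = start;       // resume from here on the next call
            break;
        }
        r.tokens.emplace_back(doc, start, len);
        used += len;
    }
    return r;
}
```

The bug class at issue is exactly the case where the space check above is missing or incomplete for one of the parser paths.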
| Comments |
| Comment by Marko Mäkelä [ 2023-10-26 ] |
|
I found a rather simple test case that will crash MySQL 8.0.34 but not MySQL 5.7.43. The code has been refactored between MySQL 5.7 and 8.0. I will need to investigate what exactly has changed between the two versions. |
| Comment by Marko Mäkelä [ 2023-10-27 ] |
|
In MySQL 5.7.44, with or without the fix, the only way I can reproduce the crash is with the following patch:
But this will also crash when the fix is present:
In MySQL 8.0, the buffer size calculation is quite different. ddl::Context::scan_buffer_size will allocate a buffer by dividing innodb_sort_buffer_size (default: 1 MiB) by the number of threads (2 in this case) and the number of index partitions (hard-coded as 6 in the file format). These 54613 bytes will then be passed on to key_buffer.m_buffer_size. It seems that the intention was to round this up to some multiple of 4096 bytes, but that did not happen.

Whatever I tried, I was unable to reproduce a crash in MySQL 5.7 with an SQL test case that crashes MySQL 8.0. I think that I must more or less apply the MySQL 5.7 fix without adding a test case. An additional challenge is that we do not actually have an n-gram tokenizer in MariaDB (MDEV-10267); we only have tests for simple_parser. |
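The rounding that the comment suggests was intended can be sketched as follows. This is a generic round-up helper, not the actual InnoDB function; the 54613-byte figure is the per-buffer size from the comment above.

```cpp
#include <cstddef>

// Round n up to the next multiple of align (align must be non-zero).
constexpr size_t round_up(size_t n, size_t align) {
    return (n + align - 1) / align * align;
}

// The 54613-byte buffer from the comment would become 57344 (14 * 4096):
static_assert(round_up(54613, 4096) == 57344, "rounded up to a 4096 multiple");
// A size that is already aligned stays unchanged:
static_assert(round_up(57344, 4096) == 57344, "already aligned");
```

Since the rounding did not actually happen, key_buffer.m_buffer_size keeps the odd 54613-byte value, which is what the comment observes.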