[MDEV-25440] Assertion `cmp_rec_rec(rec, old_rec, offsets, old_offsets, m_index) > 0' failed in PageBulk::insert Created: 2021-04-17 Updated: 2023-11-10 Resolved: 2022-01-26 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Character Sets, Storage Engine - InnoDB |
| Affects Version/s: | 10.2.2, 10.3.0, 10.4.0, 10.5.0, 10.2, 10.3, 10.4, 10.5, 10.6 |
| Fix Version/s: | 10.4.23, 10.5.14, 10.6.6, 10.7.2, 10.8.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | corruption, regression-10.2 | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
Reproducible on 10.2-10.6. |
| Comments |
| Comment by Marko Mäkelä [ 2021-04-19 ] | ||||||||||||||||||||||||||||||||
|
The records that are submitted to comparison are: The assertion fails because the field id in old_rec is larger (2) than in rec (1). We have skip_pk_sort=true, because the table-rebuilding ALTER TABLE can use the pre-existing ordering of PRIMARY KEY(a,id). The ordering was violated already when the second row was inserted. The problem is that two empty CHAR with exactly the same contents are being treated unequal (-32 will be returned) when searching the position for the second INSERT:
Note: We are passing b_length=24 and a_length=8 here. One is in bytes, the other in characters. In ROW_FORMAT=REDUNDANT, InnoDB reserves n*mbmaxlen bytes for each CHAR column. In other ROW_FORMAT, CHAR columns of variable-length encoded character sets occupy n*mbminlen to n*mbmaxlen bytes. The ‘incorrect’ length of 8 (characters, not bytes) was set here:
The problem was introduced by bar in MariaDB Server 10.2.2 by An easy fix would be to reject the creation of indexes on NO_PAD collated CHAR columns. A much more risky and costly fix would be to rewrite large parts of InnoDB to take care of this somehow. It could cause a performance regression, and we might still miss some cases. | ||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-04-19 ] | ||||||||||||||||||||||||||||||||
|
With some advice from bar, I patched cmp_data() so that it would trim the extra padding. This is only a proof of concept of a fix, and my debug assertions would almost certainly fail for other than ROW_FORMAT=REDUNDANT tables. But, the patch made the test case pass. | ||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-04-27 ] | ||||||||||||||||||||||||||||||||
|
I am not sure how to fix this. InnoDB only stores column and index field lengths in bytes.
Unless ROW_FORMAT=REDUNDANT, a CHAR column of a variable-length encoding such as UTF-8 or UTF-16 is actually stored as variable length in InnoDB, occupying n*mbminlen to n*mbmaxlen bytes. For example, the CHAR(3) CHARACTER SET utf8mb3 column value 'äh' would be stored in exactly 3 bytes, using 2 bytes for the first character and 1 byte for the next one. Trailing spaces will be trimmed in order to reach the minimum column value of 3*1 bytes. At the low level, such as cmp_data_data(), we only have the column lengths in bytes, not in characters at all. How are we supposed to compare CHAR columns in a NO PAD collation in this case? If one of the CHAR(3) strings is 'äh' and another is 'ah ' (note the trailing space), which comparison should be invoked? | ||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2021-04-27 ] | ||||||||||||||||||||||||||||||||
|
What is 'ah '? Shouldn't it be 'ah' followed by seven spaces, to make nine bytes total, where nine is CHAR(3)*3, where the last 3 is mbmaxlen for utf8mb3? This is what the Field::ptr representation provides. | ||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2021-04-27 ] | ||||||||||||||||||||||||||||||||
|
Sorry, I reread your comment and understood. It's an InnoDB specific representation which is different from Field::ptr representation. | ||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2021-04-27 ] | ||||||||||||||||||||||||||||||||
|
The function added by In the currently | ||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-04-27 ] | ||||||||||||||||||||||||||||||||
|
bar, thank you, and I am sorry for not making this clear in the beginning.
At that point, we do know the length of the full column in bytes:
But, in rem0cmp.cc we would currently not have that information available. We would have to add an additional parameter to many functions, which would not only be a large change but could also hurt performance. At that level, we basically only know the length of both fields in bytes, the collation code, and that the column is CHAR (mtype==DATA_MYSQL for CHAR encodings and collations other than latin1_swedish_ci). For InnoDB, it would be easier if we had something that would override the NO PAD attribute of the collation, and just compare the two CHAR index fields as they would be compared without NO PAD. As far as I understand, CHAR should always be logically space-padded (hence NO PAD should make no difference), and we would never compare VARCHAR to CHAR inside InnoDB. | ||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2022-01-20 ] | ||||||||||||||||||||||||||||||||
|
server part is ok as per | ||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-01-20 ] | ||||||||||||||||||||||||||||||||
|
I think that the InnoDB part can be best covered by Random Query Generator, possibly tweaked to increase the use of nopad collations. | ||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2022-01-21 ] | ||||||||||||||||||||||||||||||||
|
Testing bb-10.4- | ||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2022-01-24 ] | ||||||||||||||||||||||||||||||||
|
The patches in https://github.com/mariadb/server/tree/bb-10.4-MDEV-25440 are OK to push. | ||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2022-01-26 ] | ||||||||||||||||||||||||||||||||
|
OK to push, no directly related issues observed during testing. | ||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-01-26 ] | ||||||||||||||||||||||||||||||||
|
This was not fixed in the 10.2 or 10.3 series, because the fix depends on | ||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2022-02-08 ] | ||||||||||||||||||||||||||||||||
|
Additional testing was done on the second iteration of the patch as requested by Marko. | ||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2022-02-08 ] | ||||||||||||||||||||||||||||||||
|
Logged |