[MDEV-24797] Column Compression - ERROR 1265 (01000): Data truncated for column
Created: 2021-02-06 | Updated: 2023-08-15 | Resolved: 2023-08-15
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Data Definition - Alter Table, Data Manipulation - Insert |
| Affects Version/s: | 10.3, 10.4, 10.5, 10.6, 10.7 |
| Fix Version/s: | 10.4.32, 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Nuno | Assignee: | Alexander Barkov |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
| Description |
Hello,

Testing MariaDB 10.5, and first time implementing column compression as part of an upgrade. Converting normal columns to COMPRESSED columns seems to work OK. However, when I want to do an ALTER TABLE later, it appears that I have problems with the compressed columns, even if I'm not changing them. Example:

MariaDB [mydb]> ALTER TABLE `mytable`
Query OK, 3708285 rows affected (1 min 8.067 sec)
MariaDB [mydb]> ALTER TABLE `mytable`
ERROR 1265 (01000): Data truncated for column 'question' at row 8362
MariaDB [mydb]> ALTER TABLE `mytable`
ERROR 1265 (01000): Data truncated for column 'question' at row 8362

Any ideas why?
| Comments |
| Comment by Nuno [ 2021-02-06 ] |
marko greenman serg GeoffMontee
| Comment by Rick James [ 2021-08-02 ] |
What was the text that was truncated? In experiments (with other compression techniques), I have found that text strings shorter than 10 bytes expanded instead of shrinking. Would any "expansion" cause the error? If so, I suggest there is a flaw in the overall design.
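Rick's observation about short strings expanding is easy to reproduce with zlib itself; a minimal Python sketch (illustrative only — the server calls zlib through its own wrapper):

```python
import zlib

# zlib's stream framing (2-byte header + 4-byte Adler-32 checksum, plus
# deflate block overhead) means tiny inputs always come out larger.
for s in (b"", b"a", b"hello", b"abcdefghi"):
    compressed = zlib.compress(s)
    print(len(s), "->", len(compressed))  # compressed is longer every time
```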
| Comment by Nuno [ 2021-08-02 ] |
No idea... what is the "row 8362" anyway? Also, I didn't change the column length. It was a VARCHAR(500), and I kept it as a VARCHAR(500), but with the COMPRESSED option. As I didn't want to risk my data, I gave up on using this feature.
| Comment by Marko Mäkelä [ 2021-08-17 ] |
nunop, given that the ALTER TABLE operation ought to have been executed with ALGORITHM=COPY, I think that you could have extracted the data for "row 8362" simply with a SELECT that skips to that row using LIMIT and OFFSET.

I would expect this bug to be repeatable with a table containing only 1 row (and possibly only 1 column), either on INSERT or on ALTER TABLE. Can you try to produce a repeatable test case?
| Comment by Marko Mäkelä [ 2021-08-17 ] |
rjasdfiii, I investigated a little to check if your suggestion about expansion is plausible. In zlib, there is the function compressBound() that returns an upper bound for data expansion. I do not see it being invoked in the column compression code. There is both a constant overhead (I remember that zlib can store a minimal or a larger header) and a variable overhead.

Information theory says that n bits of arbitrary input have n bits of entropy, and thus can represent 2^n different values. If a compression algorithm can encode some of those 2^n values in less than n bits each, it must necessarily expand the representation of some values. If some input is "uncompressible", the compression algorithm would have to encode something like "n bits of literal data follows", followed by the n bits of uncompressed payload.

I had a quick look at the implementation and I cannot see anything obviously wrong in the function compress_zlib(). The stream.avail_out seems to correctly set the maximum output length. If the data does not fit, the function should return 0, which I suppose means that the column will be stored uncompressed. I think that we definitely need a test case before we can fix this.
| Comment by Nuno [ 2021-08-18 ] |
I just tried on my snapshot database. Altered the "question" and "col2", then "col2" back to uncompressed, just like the initial example.

Note that "question" is a VARCHAR(500).
| Comment by Marko Mäkelä [ 2021-08-19 ] |
nunop, thank you. Can you try to produce a self-contained test case? My hypothesis is that this should be repeatable with only that "row 693" present in the table. I hope that my use of LIMIT and OFFSET is correct. For example, it could be off by one, depending on whether the first row is counted as row number 0 or 1 in each context. I would expect the self-contained test case to be something like this (with all the … replaced):
| Comment by Nuno [ 2021-09-23 ] |
Just tried this again and got the error on the same line (693).

> ERROR 1265 (01000): Data truncated for column 'question' at row 693

I tried this too:

So I can see more rows around 693.

> ERROR 1265 (01000): Data truncated for column 'question' at row 3

I'll come back to you soon.
| Comment by Nuno [ 2021-09-23 ] |
Ok, so this gives me (one of) the trouble rows:

It has to be N-1. It has 129 characters, so there's no reason for it to be "truncated". So I did the test in another way:

And I get the same:

> #1265 - Data truncated for column 'question' at row 1

Even converting all to "xxx" does it:

> #1265 - Data truncated for column 'question' at row 1

(You can use the last one as a test to try to reproduce on your side.) MariaDB 10.5.12 on CentOS 7.
| Comment by Marko Mäkelä [ 2021-10-25 ] |
Here is a simpler test case:

The warning is being issued in:

The problem appears to be an incorrect condition in do_varstring2_mb() that does not take column compression into account:

Note: copy->to_length is 1503 and from_length (apparently, the length of the compressed column) is 8 in this case. It was 40 for nunop's example.
| Comment by Rick James [ 2022-01-20 ] |
Another theory: could it be that zlib (or the wrapper around it) is returning 0 when the string cannot be compressed? That is, the (500) is not really what it is testing against.

Meanwhile, `REPEAT('x',127)` is highly compressible, hence not useful for borderline tests. Note that .pdf and .jpg files are already compressed, thereby making them good candidates for "large" test cases that are likely to expand when compressed.
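The difference Rick points out is easy to see with zlib directly; random bytes are a convenient stand-in for already-compressed .pdf/.jpg content (a Python illustration, not the server's code path):

```python
import zlib
import os

# Highly repetitive input, like REPEAT('x',127), shrinks dramatically.
repetitive = b"x" * 127
# Random bytes behave like already-compressed data: zlib cannot shrink
# them, so its framing overhead makes the output larger than the input.
incompressible = os.urandom(127)

print(len(zlib.compress(repetitive)))      # far below 127
print(len(zlib.compress(incompressible)))  # above 127
```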
| Comment by Marko Mäkelä [ 2022-09-02 ] |
For the record, zlib provides the function compressBound(sourceLen) that gives the worst-case compressed size (the expanded size) corresponding to an input data length sourceLen.
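That worst-case bound can be checked empirically. The formula below mirrors the one in zlib 1.2.x's compress.c; treat the exact constants as an assumption to verify against your zlib version:

```python
import zlib
import os

def compress_bound(source_len: int) -> int:
    # Mirrors zlib's compressBound(): source length plus small per-block
    # and per-stream overheads (constants assumed from zlib 1.2.x).
    return source_len + (source_len >> 12) + (source_len >> 14) + (source_len >> 25) + 13

# The actual compressed size never exceeds the bound, even for
# incompressible (random) input that zlib must expand.
for n in (0, 1, 100, 10_000):
    data = os.urandom(n)
    assert len(zlib.compress(data)) <= compress_bound(n)
```

An insert path that sized its output buffer with such a bound (or fell back to literal storage) would have headroom for any input, which is presumably why Marko raises it here.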