[MCOL-4966] 2-argument CRC32 call upon Columnstore table returns a wrong value Created: 2022-01-16 Updated: 2022-02-25 Resolved: 2022-02-15 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | PrimProc |
| Affects Version/s: | N/A |
| Fix Version/s: | 6.3.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Roman |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
So, whichever argument of the function is read from the table, the result ends up being wrong. The one-argument version seems to work all right:
With a two-argument function and a more conventional engine it also works all right:
The CRC32C function is not a subject of this report, because Columnstore rejects it as unsupported right away:
|
| Comments |
| Comment by Marko Mäkelä [ 2022-01-17 ] | ||||||||
|
ColumnStore reimplements a unary crc32() function in utils/funcexp/func_crc32.cpp. Internally, it uses an implementation in utils/winport/crc.c that would be compatible with a binary (two-argument) function but is not accelerated with any SIMD extensions (
| ||||||||
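To illustrate what "compatible with a binary function" can mean, here is a minimal sketch of a zlib-style CRC-32 routine whose first parameter is the checksum of the data processed so far. It assumes that the two-argument SQL form treats its first argument as such a running checksum. The name `crc32_update` is hypothetical; this is not the code in utils/winport/crc.c.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, for illustration only (not the ColumnStore code).
// Bitwise CRC-32 with the reflected polynomial 0xEDB88320.
// `crc` is the checksum of everything processed so far; pass 0 to start.
std::uint32_t crc32_update(std::uint32_t crc, const char* buf, std::size_t len)
{
    crc = ~crc;  // undo the final inversion of the previous call
    for (std::size_t i = 0; i < len; ++i)
    {
        crc ^= static_cast<unsigned char>(buf[i]);
        for (int bit = 0; bit < 8; ++bit)
        {
            // Shift right; XOR in the polynomial if the dropped bit was 1.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;  // final inversion
}

// Convenience wrapper: continue `crc` over the full contents of `s`.
std::uint32_t crc32_update(std::uint32_t crc, const std::string& s)
{
    return crc32_update(crc, s.data(), s.size());
}
```

With this interface, `crc32_update(crc32_update(0, "Maria"), "DB")` equals `crc32_update(0, "MariaDB")`. A unary wrapper that always starts from 0 discards that first argument, which is one way a two-argument call could produce a wrong value.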
| Comment by Marko Mäkelä [ 2022-01-17 ] | ||||||||
|
The ColumnStore implementation of crc32() appears to assume that the top-level SQL parser only accepts a unary variant of that function:
Furthermore, instead of invoking the std::string member functions data() and size(), the implementation forces the buffer to be NUL-terminated (by invoking c_str()) and then, by invoking strlen(), assumes that the data ends at the first NUL byte. This should mean that if you invoke crc32() on a column that contains embedded NUL bytes (such as UTF-16 encoded or binary data), ColumnStore returns a different result than other storage engines. |
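The truncation described above can be reproduced outside ColumnStore. The following sketch uses a self-contained bitwise CRC-32 and two hypothetical wrappers (`crc32_nul_terminated` and `crc32_full`, neither taken from the ColumnStore source) to contrast the `c_str()`/`strlen()` pattern with `data()`/`size()`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Illustrative bitwise CRC-32 (reflected polynomial 0xEDB88320).
std::uint32_t crc32_bytes(const char* buf, std::size_t len)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i)
    {
        crc ^= static_cast<unsigned char>(buf[i]);
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Pattern criticized in the comment: the length comes from strlen(),
// so hashing stops at the first embedded NUL byte.
std::uint32_t crc32_nul_terminated(const std::string& s)
{
    const char* p = s.c_str();
    return crc32_bytes(p, std::strlen(p));
}

// Alternative: hash exactly size() bytes, embedded NULs included.
std::uint32_t crc32_full(const std::string& s)
{
    return crc32_bytes(s.data(), s.size());
}
```

For a four-byte value such as `std::string("A\0B\0", 4)` (the UTF-16LE encoding of "AB"), `crc32_nul_terminated` hashes only the first byte and therefore returns the checksum of "A", while `crc32_full` covers all four bytes.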