[MCOL-4966] 2-argument CRC32 call upon Columnstore table returns a wrong value
Created: 2022-01-16  Updated: 2022-02-25  Resolved: 2022-02-15

Status: Closed
Project: MariaDB ColumnStore
Component/s: PrimProc
Affects Version/s: N/A
Fix Version/s: 6.3.1

Type: Bug
Priority: Major
Reporter: Elena Stepanova
Assignee: Roman
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-27208 Implement 2-ary CRC32() and the CRC32... Closed

 Description   

preview-10.8-MDEV-27265-misc 6e615c62b8

MariaDB [test]> create table tc (a int unsigned, b varchar(8)) engine=Columnstore;
Query OK, 0 rows affected (0.374 sec)
 
MariaDB [test]> insert into tc values (1,'foo');
Query OK, 1 row affected (0.224 sec)
 
MariaDB [test]> select a, b, crc32(a,b), crc32(a,'foo'), crc32(1,b), crc32(1,'foo') from tc;
+------+------+------------+----------------+------------+----------------+
| a    | b    | crc32(a,b) | crc32(a,'foo') | crc32(1,b) | crc32(1,'foo') |
+------+------+------------+----------------+------------+----------------+
|    1 | foo  | 2212294583 |     2212294583 | 2212294583 |     2377191190 |
+------+------+------------+----------------+------------+----------------+
1 row in set (0.076 sec)

So, whichever argument of the function is read from the table, the result ends up being wrong.

The one-argument version seems to work correctly:

MariaDB [test]> select a, b, crc32(b), crc32('foo') from tc;
+------+------+------------+--------------+
| a    | b    | crc32(b)   | crc32('foo') |
+------+------+------------+--------------+
|    1 | foo  | 2356372769 |   2356372769 |
+------+------+------------+--------------+
1 row in set (0.011 sec)

The two-argument function also works correctly with a more conventional engine:

MariaDB [test]> create table ti (a int unsigned, b varchar(8)) engine=InnoDB;
Query OK, 0 rows affected (0.012 sec)
 
MariaDB [test]> insert into ti values (1,'foo');
Query OK, 1 row affected (0.002 sec)
 
MariaDB [test]> select a, b, crc32(a,b), crc32(a,'foo'), crc32(1,b), crc32(1,'foo') from ti;
+------+------+------------+----------------+------------+----------------+
| a    | b    | crc32(a,b) | crc32(a,'foo') | crc32(1,b) | crc32(1,'foo') |
+------+------+------------+----------------+------------+----------------+
|    1 | foo  | 2377191190 |     2377191190 | 2377191190 |     2377191190 |
+------+------+------------+----------------+------------+----------------+
1 row in set (0.000 sec)

The CRC32C function is not the subject of this report, because ColumnStore rejects it outright as unsupported:

MariaDB [test]> select crc32c(a,b) from tc;
ERROR 1178 (42000): The storage engine for the table doesn't support MCS-1001: Function 'crc32c' isn't supported.



 Comments   
Comment by Marko Mäkelä [ 2022-01-17 ]

ColumnStore reimplements a unary crc32() function in utils/funcexp/func_crc32.cpp. Internally, it uses an implementation in utils/winport/crc.c whose signature would be compatible with a binary function, but which is not accelerated with any SIMD extensions (MDEV-22749):

unsigned int idb_crc32(const unsigned int crc, const unsigned char* buf, const size_t len)
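
To illustrate why a function of this shape would fit the binary SQL form, here is a minimal sketch of a seeded CRC-32. crc32_seeded is a hypothetical bitwise stand-in written for this example, not the code in utils/winport/crc.c. It assumes zlib-style semantics: reflected polynomial 0xEDB88320, with the running value inverted on entry and on exit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a zlib-style crc32(crc, buf, len).
// Assumption: reflected polynomial 0xEDB88320 with pre/post inversion.
std::uint32_t crc32_seeded(std::uint32_t crc, const unsigned char* buf, std::size_t len)
{
    assert(buf != nullptr || len == 0);
    crc = ~crc;  // a seed of 0 becomes the usual all-ones start value
    for (std::size_t i = 0; i < len; ++i)
    {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; ++bit)
        {
            // Shift out one bit; apply the polynomial only if that bit was set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

With a seed of 0 this behaves like the unary function. Passing a previous result as the seed continues the checksum over further data, which is the role the first argument would play in a two-argument call under this assumption.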

Comment by Marko Mäkelä [ 2022-01-17 ]

The ColumnStore implementation of crc32() appears to assume that the top-level SQL parser only accepts a unary variant of that function:

int64_t Func_crc32::getIntVal(rowgroup::Row& row,
                              FunctionParm& parm,
                              bool& isNull,
                              CalpontSystemCatalog::ColType& ct)
{
    const string& val = parm[0]->data()->getStrVal(row, isNull);
    return (int64_t) crc32(0L, (unsigned char*)val.c_str(), strlen(val.c_str()));
}
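
The quoted function reads only parm[0] and always starts from a seed of 0. The following is a minimal sketch of that behaviour, not ColumnStore code: crc32_zlib_style and crc32_first_param_only are hypothetical names, the parameters are modelled as plain strings, and zlib-style CRC-32 semantics are assumed. Under those assumptions, hashing the string form of the first argument, "1", gives 2212294583, the value shown in the Description wherever a column is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bitwise CRC-32 (reflected polynomial 0xEDB88320, zlib-style inversion).
std::uint32_t crc32_zlib_style(std::uint32_t crc, const unsigned char* buf, std::size_t len)
{
    assert(buf != nullptr || len == 0);
    crc = ~crc;
    for (std::size_t i = 0; i < len; ++i)
    {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Hypothetical model of the quoted logic: only the first parameter is hashed,
// with a fixed seed of 0; any second parameter is ignored.
std::uint32_t crc32_first_param_only(const std::vector<std::string>& parm)
{
    assert(!parm.empty());
    const std::string& val = parm[0];
    return crc32_zlib_style(0, reinterpret_cast<const unsigned char*>(val.data()), val.size());
}
```

Called with the parameters ("1", "foo"), this model ignores "foo" entirely, which would explain why crc32(a,b), crc32(a,'foo') and crc32(1,b) all return the same number on the ColumnStore table.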

Furthermore, instead of using the std::string member functions data() and size(), the implementation forces the buffer to be NUL-terminated (by invoking c_str()) and then takes the length from strlen(), which assumes that the data ends at the first NUL byte. This should mean that if you invoke crc32() on a column that contains embedded NUL bytes (such as UTF-16 encoded or binary data), ColumnStore should return a different result than other storage engines.
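
The difference between the two ways of measuring the buffer can be sketched as follows. This is an illustration under assumptions, not ColumnStore code: crc32_bytes, crc32_until_nul and crc32_full are hypothetical names, and the CRC-32 is a plain bitwise zlib-style version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical bitwise CRC-32 (reflected polynomial 0xEDB88320, zlib-style inversion).
std::uint32_t crc32_bytes(std::uint32_t crc, const unsigned char* buf, std::size_t len)
{
    assert(buf != nullptr || len == 0);
    crc = ~crc;
    for (std::size_t i = 0; i < len; ++i)
    {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Mirrors the quoted pattern: the length comes from strlen(),
// so hashing stops at the first NUL byte.
std::uint32_t crc32_until_nul(const std::string& val)
{
    return crc32_bytes(0, reinterpret_cast<const unsigned char*>(val.c_str()),
                       std::strlen(val.c_str()));
}

// Length-aware alternative: hashes all val.size() bytes, embedded NULs included.
std::uint32_t crc32_full(const std::string& val)
{
    return crc32_bytes(0, reinterpret_cast<const unsigned char*>(val.data()), val.size());
}
```

For two values that share a prefix up to an embedded NUL and differ only after it, crc32_until_nul returns the same checksum for both, while crc32_full distinguishes them. For data without NUL bytes the two functions agree.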
