[MDEV-27208] Implement 2-ary CRC32() and the CRC32C() function Created: 2021-12-09 Updated: 2022-01-22 Resolved: 2022-01-21 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Server |
| Fix Version/s: | 10.8.1 |
| Type: | Task | Priority: | Blocker |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | checksum | ||
| Issue Links: |
|
| Description |
|
The SQL parser defines the unary function crc32() that computes the CRC-32 of a string using the ISO 3309 polynomial that is used by zlib and many others. Often, a CRC is computed in pieces. To facilitate this, we introduce an optional second parameter: crc32('MariaDB') is equal to crc32(crc32('Maria'),'DB'). InnoDB files use a different polynomial, which is used by the special instructions that the Intel Nehalem microarchitecture introduced in SSE4.2. This is commonly called CRC-32C. It would be very convenient to introduce an SQL function crc32c() that would compute CRC-32C checksums. Then we could define a simple SQL function that would generate a logically empty InnoDB redo log corresponding to a particular checkpoint LSN. Starting with |
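For illustration only, the "computed in pieces" property can be sketched outside the server with Python's zlib module, which implements the same ISO 3309 CRC-32. The helper name crc32_chunks is hypothetical and not part of this task; the sketch only shows that feeding the previous checksum into the next call gives the same result as checksumming the concatenated string.

```python
import zlib

def crc32_chunks(chunks, crc=0):
    """Fold zlib's CRC-32 over an iterable of byte strings.

    Hypothetical helper: each call passes the running checksum
    as the starting value for the next piece.
    """
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    return crc

whole = zlib.crc32(b"MariaDB")
pieces = crc32_chunks([b"Maria", b"DB"])
# Both approaches yield the same 32-bit value.
print(whole == pieces)
```

The same folding works for any split of the input, which is what makes a checksum-continuation argument useful for data that arrives in several parts.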
| Comments |
| Comment by Marko Mäkelä [ 2021-12-09 ] | ||||||||||||||||||||
|
Please review bb-10.8-MDEV-27208. | ||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-12-10 ] | ||||||||||||||||||||
|
I forgot to mention that MyRocks also uses CRC-32C. In a table of popular CRC polynomials, the column "Maximum bits of payload by Hamming distance" looks better for CRC-32C than for the ISO 3309 CRC-32 polynomial. That might be why Intel chose to implement an "incompatible" polynomial back in SSE4.2. In modern processors, we calculate both types of CRC using more generic SIMD instructions, such as carry-less multiplication. Both CRC polynomials need to exist because of file format compatibility. | ||||||||||||||||||||
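To make the difference between the two polynomials concrete, here is a minimal bit-at-a-time sketch in Python. It is a plain reference loop, not the SSE4.2 or carry-less multiplication code path mentioned above, and the names crc_reflected and crc32c are hypothetical. The two constants are the commonly published bit-reversed forms of the ISO 3309 and Castagnoli (CRC-32C) polynomials.

```python
CRC32_POLY_REFLECTED = 0xEDB88320   # ISO 3309 CRC-32 (0x04C11DB7 bit-reversed)
CRC32C_POLY_REFLECTED = 0x82F63B78  # CRC-32C, Castagnoli (0x1EDC6F41 bit-reversed)

def crc_reflected(data: bytes, poly: int, crc: int = 0) -> int:
    """Bitwise reflected CRC with initial and final XOR of 0xFFFFFFFF.

    `crc` is the checksum of any preceding data (0 if there is none),
    so the function can be called piece by piece.
    """
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; apply the polynomial if that bit was set.
            crc = (crc >> 1) ^ (poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def crc32c(data: bytes, crc: int = 0) -> int:
    """CRC-32C of `data`, optionally continuing from a previous checksum."""
    return crc_reflected(data, CRC32C_POLY_REFLECTED, crc)

# The same input gives different checksums under the two polynomials.
print(hex(crc_reflected(b"123456789", CRC32_POLY_REFLECTED)))
print(hex(crc32c(b"123456789")))
```

Only the polynomial constant differs between the two variants, which is why a file format written with one of them cannot be verified with the other.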
| Comment by Sergei Golubchik [ 2021-12-12 ] | ||||||||||||||||||||
|
How do you plan to use this function to generate a logically empty InnoDB redo log? | ||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-12-13 ] | ||||||||||||||||||||
|
At the minimum, the ib_logfile0 must consist of 3 sections:
Each of these sections must carry a valid CRC-32C checksum. I was expecting that a binary file could be generated in the following style:
I now see that it is not so simple. In the output file, I see that each NUL octet has been replaced with the sequence \0. Also, I was unable to figure out how to convert a numeric expression (other than a numeric hexadecimal literal) into binary. Sure, the data could be output in hexadecimal format, something like this, but then something like the Perl print pack("H*",…) would be needed to convert it to binary, and in that case, we might also compute the checksum in Perl:
| ||||||||||||||||||||
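As a rough sketch of the hex-to-binary step described above, Python's bytes.fromhex plays the role of the Perl pack("H*",…) call and preserves NUL octets. Everything else here is an assumption made for illustration: the helper name seal_block is hypothetical, the layout (payload followed by a 4-byte big-endian checksum) is not the actual ib_logfile0 format, and zlib.crc32 is only a stand-in for the CRC-32C that InnoDB requires.

```python
import struct
import zlib

def seal_block(payload_hex: str, checksum=zlib.crc32) -> bytes:
    """Convert a hex string to bytes and append a 4-byte checksum.

    Hypothetical layout for illustration only. Pass a CRC-32C
    function as `checksum` to get closer to what InnoDB expects.
    """
    payload = bytes.fromhex(payload_hex)          # NUL octets stay NUL
    return payload + struct.pack(">I", checksum(payload))

block = seal_block("00000000" "0000000a")
# The zero bytes survive the conversion unchanged.
print(block[:8] == b"\x00\x00\x00\x00\x00\x00\x00\x0a", len(block))
```

Writing the result with open(path, "wb") would then avoid the \0 escaping that the text-oriented output applied.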
| Comment by Marko Mäkelä [ 2021-12-14 ] | ||||||||||||||||||||
|
While our regression test suite is written in Perl, I was hoping that it would be possible to generate binary data files using SQL, because not every user is using a LAMP stack with P=Perl. In
It would be ideal if the same could be achieved by SELECT…INTO OUTFILE 'ib_logfile0'. | ||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-12-14 ] | ||||||||||||||||||||
|
Later, I will try to create an SQL version of the above Perl using appropriate constructs:
| ||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-12-15 ] | ||||||||||||||||||||
|
Here is the SQL to create a logically empty log file in the
| ||||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-01-05 ] | ||||||||||||||||||||
|
serg, based on your review feedback I reversed the arguments of the 2-ary functions. When a previous checksum is specified, it must be the first and not the second argument:
This would return the same value 809606978 three times. | ||||||||||||||||||||
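The argument order can be mimicked in a few lines of Python, purely as an illustration of the calling convention. The wrapper below is hypothetical and uses zlib's ISO 3309 CRC-32 as a stand-in; it is not the server implementation. With one argument it checksums the data, and with two arguments the first is the previous checksum and the second is the data to append.

```python
import zlib

def crc32(*args):
    """Mimic the call shapes crc32(data) and crc32(previous_crc, data)."""
    if len(args) == 1:
        (data,) = args
        return zlib.crc32(data)
    if len(args) == 2:
        previous, data = args          # previous checksum comes first
        return zlib.crc32(data, previous)
    raise TypeError("crc32() takes 1 or 2 arguments")

# Three ways to checksum the same seven bytes.
values = {
    crc32(b"MariaDB"),
    crc32(crc32(b"Maria"), b"DB"),
    crc32(crc32(b"Ma"), b"riaDB"),
}
print(len(values) == 1)
```

Putting the previous checksum first lets nested calls read left to right in the same order as the data.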
| Comment by Elena Stepanova [ 2022-01-21 ] | ||||||||||||||||||||
|
According to GCOV, there are a few missing lines in the coverage:
These are minor omissions, but maybe it makes sense to add queries for them, for completeness.
should do it. For the other two errors, I can't see at first glance how to get there, but I suppose whoever wrote it would know right away. I have no objection to merging the new variations of the functions into 10.8 main and releasing them with 10.8.1. The above note about MTR tests is not a mandatory requirement. The functions don't work with Columnstore ( | ||||||||||||||||||||
| Comment by Marko Mäkelä [ 2022-01-21 ] | ||||||||||||||||||||
|
The handling for the error ER_WRONG_PARAMETERS_TO_NATIVE_FCT is redundant and unreachable:
I will replace those redundant error checks with assertions and include test cases to exercise this. |