Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27208

Implement 2-ary CRC32() and the CRC32C() function

Details

    Description

      The SQL parser defines the unary function crc32() that computes the CRC-32 of a string using the ISO 3309 polynomial that is being used by zlib and many others.

      Often, CRC is computed in pieces. To faciliate this, we introduce an optional second parameter: crc32('MariaDB') is equal to crc32(crc32('Maria','DB')).

      InnoDB files use a different polynomial, which is used by the special instructions that the Intel Nehalem microarchitecture introduced in SSE4.2. This is commonly called CRC-32C.

      It would be very convenient to introduce an SQL function crc32c() that would compute CRC-32C checksums. Then we could could define simple SQL function that would generate a logically empty InnoDB redo log corresponding to a particular checkpoint LSN. Starting with MDEV-14425 and MDEV-27199, InnoDB would refuse normal startup if the redo log file was deleted.

      Attachments

        Issue Links

          Activity

            Please review bb-10.8-MDEV-27208.

            marko Marko Mäkelä added a comment - Please review bb-10.8-MDEV-27208 .

            I forgot to mention that also MyRocks uses CRC-32C.

            In a table of popular CRC polynomials, the column "Maximum bits of payload by Hamming distance" looks better for CRC-32C than the ISO 3309 CRC-32 polynomial. That might be the motivation why Intel chose to implement an "incompatible" polynomial back in SSE 4.2.

            In modern processors, we calculate both types of CRC using more generic SIMD instructions, such as carry-less multiplication. Both CRC polynomials need to exist because of file format compatibility.

            marko Marko Mäkelä added a comment - I forgot to mention that also MyRocks uses CRC-32C. In a table of popular CRC polynomials , the column "Maximum bits of payload by Hamming distance" looks better for CRC-32C than the ISO 3309 CRC-32 polynomial. That might be the motivation why Intel chose to implement an "incompatible" polynomial back in SSE 4.2. In modern processors, we calculate both types of CRC using more generic SIMD instructions, such as carry-less multiplication. Both CRC polynomials need to exist because of file format compatibility.

            How do you plan to use this function to generate a logically empty InnoDB redo log?

            serg Sergei Golubchik added a comment - How do you plan to use this function to generate a logically empty InnoDB redo log?

            At the minimum, the ib_logfile0 must consist of 3 sections:

            • 512-byte header that identifies the log format
            • a checkpoint block that includes the checkpoint LSN and points to the start of the log
            • a log block (or with the format change in MDEV-14425, a log record) that is consistent with the checkpoint block

            Each of these sections must carry a valid CRC-32C checksum.

            I was expecting that a binary file could be generated in the following style:

            SET @lsn:=_binary 0x0000000000123456;
            SELECT
            CONCAT(@a:=CONCAT(_binary 0xfa0000,@lsn),_binary 0x01,HEX(CRC32C(@a)))
            INTO OUTFILE 'ib_logfile0';
            

            I now see that it is not so simple. In the output file, I see that each NUL octet has been replaced with the sequence \0. Also, I was unable to figure out how to convert a numeric expression (other than a numeric hexadecimal literal) into binary.

            Sure, the data could be output in hexadecimal format, something like this, but then something like the Perl print pack("H*",…) would be needed to convert it to binary, and in that case, we might also compute the checksum in Perl:

            SELECT
            CONCAT(HEX(@a:=CONCAT(_binary 0xfa0000,@lsn)),'01',HEX(CRC32C(@a)));
            

            marko Marko Mäkelä added a comment - At the minimum, the ib_logfile0 must consist of 3 sections: 512-byte header that identifies the log format a checkpoint block that includes the checkpoint LSN and points to the start of the log a log block (or with the format change in MDEV-14425 , a log record) that is consistent with the checkpoint block Each of these sections must carry a valid CRC-32C checksum. I was expecting that a binary file could be generated in the following style: SET @lsn:=_binary 0x0000000000123456; SELECT CONCAT(@a:=CONCAT(_binary 0xfa0000,@lsn),_binary 0x01,HEX(CRC32C(@a))) INTO OUTFILE 'ib_logfile0' ; I now see that it is not so simple. In the output file, I see that each NUL octet has been replaced with the sequence \0. Also, I was unable to figure out how to convert a numeric expression (other than a numeric hexadecimal literal) into binary. Sure, the data could be output in hexadecimal format, something like this, but then something like the Perl print pack("H*",…) would be needed to convert it to binary, and in that case, we might also compute the checksum in Perl: SELECT CONCAT(HEX(@a:=CONCAT(_binary 0xfa0000,@lsn)), '01' ,HEX(CRC32C(@a)));

            While our regression test suite is written in Perl, I was hoping that it would be possible to generate binary data files using SQL, because not every user is using a LAMP stack with P=Perl. In MDEV-14425, I am rewriting the test mariabackup.huge_lsn to generate a minimal log file using Perl. This is how it currently looks like:

            do "$ENV{MTR_SUITE_DIR}/../innodb/include/crc32.pl";
            my $file= "$ENV{MYSQLD_DATADIR}/ib_logfile0";
            open(FILE, ">", $file) or die "Unable to open $file\n";
            binmode FILE;
            # the desired log sequence number, plus 16
            my $lsn_hi=4096,$lsn_lo=0xfffffe00;
            my $polynomial = 0x82f63b78; # CRC-32C
            my ($header, $checkpoint, $log);
            $header = "Phys" . pack("x[4]NN", $lsn_hi, $lsn_lo) .
                      "some Perl code" . pack("x[478]");
            $header .= pack("Nx[3584]", mycrc32($header, 0, $polynomial));
            $checkpoint = pack("NNNNx[44]", $lsn_hi, $lsn_lo, $lsn_hi, $lsn_lo);
            $checkpoint .= pack("Nx[8128]", mycrc32($checkpoint, 0, $polynomial));
            $log = pack("CxxNN", 0xfa, $lsn_hi, $lsn_lo);
            $log .= pack("xN", mycrc32($log, 0, $polynomial));
            print FILE $header, $checkpoint, $log;
            close(FILE) or die "Unable to close $file\n";
            

            It would be ideal if the same could be achieved by SELECT…INTO OUTFILE 'ib_logfile0'.

            marko Marko Mäkelä added a comment - While our regression test suite is written in Perl, I was hoping that it would be possible to generate binary data files using SQL, because not every user is using a LAMP stack with P=Perl. In MDEV-14425 , I am rewriting the test mariabackup.huge_lsn to generate a minimal log file using Perl. This is how it currently looks like: do "$ENV{MTR_SUITE_DIR}/../innodb/include/crc32.pl" ; my $file = "$ENV{MYSQLD_DATADIR}/ib_logfile0" ; open (FILE, ">" , $file ) or die "Unable to open $file\n" ; binmode FILE; # the desired log sequence number, plus 16 my $lsn_hi =4096, $lsn_lo =0xfffffe00; my $polynomial = 0x82f63b78; # CRC-32C my ( $header , $checkpoint , $log ); $header = "Phys" . pack ( "x[4]NN" , $lsn_hi , $lsn_lo ) . "some Perl code" . pack ( "x[478]" ); $header .= pack ( "Nx[3584]" , mycrc32( $header , 0, $polynomial )); $checkpoint = pack ( "NNNNx[44]" , $lsn_hi , $lsn_lo , $lsn_hi , $lsn_lo ); $checkpoint .= pack ( "Nx[8128]" , mycrc32( $checkpoint , 0, $polynomial )); $log = pack ( "CxxNN" , 0xfa, $lsn_hi , $lsn_lo ); $log .= pack ( "xN" , mycrc32( $log , 0, $polynomial )); print FILE $header , $checkpoint , $log ; close (FILE) or die "Unable to close $file\n" ; It would be ideal if the same could be achieved by SELECT…INTO OUTFILE 'ib_logfile0' .

            Later, I will try to create an SQL version of the above Perl using appropriate constructs:

            SELECT …unhex(hex(crc32c(…)))… INTO DUMPFILE 'ib_logfile0';
            

            marko Marko Mäkelä added a comment - Later, I will try to create an SQL version of the above Perl using appropriate constructs: SELECT …unhex(hex(crc32c(…)))… INTO DUMPFILE 'ib_logfile0' ;

            Here is the SQL to create a logically empty log file in the MDEV-14425 format corresponding to the 64-bit log sequence number specified in the first line:

            set @lsn=x'000000000000c96b';
            set @header=concat('Phys',x'00000000',@lsn,repeat(x'00',492));
            set @header=concat(@header,unhex(hex(crc32c(@header))),repeat(x'00',3584));
             
            set @checkpoint=concat(@lsn,@lsn,repeat(x'00',44));
            set @checkpoint=concat(@checkpoint,unhex(hex(crc32c(@checkpoint))),
                                   repeat(x'00',8128));
            set @payload=concat(x'fa0000',@lsn);
            set @payload=concat(@payload,x'01',unhex(hex(crc32c(@payload))));
             
            select concat(@header,@checkpoint,@payload) into dumpfile 'ib_logfile0';
            

            marko Marko Mäkelä added a comment - Here is the SQL to create a logically empty log file in the MDEV-14425 format corresponding to the 64-bit log sequence number specified in the first line: set @lsn=x '000000000000c96b' ; set @header=concat( 'Phys' ,x '00000000' ,@lsn,repeat(x '00' ,492)); set @header=concat(@header,unhex(hex(crc32c(@header))),repeat(x '00' ,3584));   set @ checkpoint =concat(@lsn,@lsn,repeat(x '00' ,44)); set @ checkpoint =concat(@ checkpoint ,unhex(hex(crc32c(@ checkpoint ))), repeat(x '00' ,8128)); set @payload=concat(x 'fa0000' ,@lsn); set @payload=concat(@payload,x '01' ,unhex(hex(crc32c(@payload))));   select concat(@header,@ checkpoint ,@payload) into dumpfile 'ib_logfile0' ;

            serg, based on your review feedback I reversed the arguments of the 2-ary functions. When a previous checksum is specified, it must be the first and not the second argument:

            SELECT CRC32C('MariaDB'),CRC32C(CRC32C('Maria'),'DB'),CRC32C(0,'MariaDB');
            

            This would return the same value 809606978 three times.

            marko Marko Mäkelä added a comment - serg , based on your review feedback I reversed the arguments of the 2-ary functions. When a previous checksum is specified, it must be the first and not the second argument: SELECT CRC32C( 'MariaDB' ),CRC32C(CRC32C( 'Maria' ), 'DB' ),CRC32C(0, 'MariaDB' ); This would return the same value 809606978 three times.
            elenst Elena Stepanova added a comment - - edited

            According to GCOV, there are a few missing lines in the coverage:

            line numbers as of preview-10.8-MDEV-27265-misc 6e615c62b

            ===File sql/item_create.cc:
               3141 : +    my_error(ER_WRONG_PARAMCOUNT_TO_NATIVE_FCT, MYF(0), name->str);
               3142 : +    return nullptr;
               3150 : +    my_error(ER_WRONG_PARAMETERS_TO_NATIVE_FCT, MYF(0), name->str);
               3151 : +    return nullptr;
               3170 : +    my_error(ER_WRONG_PARAMCOUNT_TO_NATIVE_FCT, MYF(0), name->str);
               3171 : +    return nullptr;
               3179 : +    my_error(ER_WRONG_PARAMETERS_TO_NATIVE_FCT, MYF(0), name->str);
               3180 : +    return nullptr;
            

            These are minor omissions, but maybe it makes sense to add queries for them, for completeness.
            3141/3142 and 3170/3171 seem obvious, something like

            --error ER_WRONG_PARAMCOUNT_TO_NATIVE_FCT
            select crc32(1,'foo','bar');
            --error ER_WRONG_PARAMCOUNT_TO_NATIVE_FCT
            select crc32c(1,'foo','bar');
            

            should do it. For the other two errors, I can't see at the first glance how to get there, but I suppose whoever wrote it would know right away.

            I have no objections against merging the new variations of functions into 10.8 main and releasing with 10.8.1. The above note about MTR tests is not a mandatory requirement.
            It concerns the functions themselves, the way they appear to work as built-in functions. How InnoDB uses them to create its log is out of the scope of this task.

            The functions don't work with Columnstore (MCOL-4966 to track), but anyway Columnstore doesn't support everything that the server does, so I don't see it as a blocker. Also the functions lack proper parameter validation, but as a legacy issue concerning other existing functions (MDEV-27480), it cannot block the feature either.

            elenst Elena Stepanova added a comment - - edited According to GCOV, there are a few missing lines in the coverage: line numbers as of preview-10.8-MDEV-27265-misc 6e615c62b ===File sql/item_create.cc: 3141 : + my_error(ER_WRONG_PARAMCOUNT_TO_NATIVE_FCT, MYF(0), name->str); 3142 : + return nullptr; 3150 : + my_error(ER_WRONG_PARAMETERS_TO_NATIVE_FCT, MYF(0), name->str); 3151 : + return nullptr; 3170 : + my_error(ER_WRONG_PARAMCOUNT_TO_NATIVE_FCT, MYF(0), name->str); 3171 : + return nullptr; 3179 : + my_error(ER_WRONG_PARAMETERS_TO_NATIVE_FCT, MYF(0), name->str); 3180 : + return nullptr; These are minor omissions, but maybe it makes sense to add queries for them, for completeness. 3141/3142 and 3170/3171 seem obvious, something like --error ER_WRONG_PARAMCOUNT_TO_NATIVE_FCT select crc32(1, 'foo' , 'bar' ); --error ER_WRONG_PARAMCOUNT_TO_NATIVE_FCT select crc32c(1, 'foo' , 'bar' ); should do it. For the other two errors, I can't see at the first glance how to get there, but I suppose whoever wrote it would know right away. I have no objections against merging the new variations of functions into 10.8 main and releasing with 10.8.1. The above note about MTR tests is not a mandatory requirement. It concerns the functions themselves, the way they appear to work as built-in functions. How InnoDB uses them to create its log is out of the scope of this task. The functions don't work with Columnstore ( MCOL-4966 to track), but anyway Columnstore doesn't support everything that the server does, so I don't see it as a blocker. Also the functions lack proper parameter validation, but as a legacy issue concerning other existing functions ( MDEV-27480 ), it cannot block the feature either.

            The handling for the error ER_WRONG_PARAMETERS_TO_NATIVE_FCT is redundant and unreachable:

            --error ER_WRONG_PARAMETERS_TO_NATIVE_FCT
            select crc32c('' as empty);
            

            #0  my_error (nr=1583, MyFlags=0) at /mariadb/10.8/mysys/my_error.c:109
            #1  0x00005635f0b30506 in Create_native_func::create_func (
                this=0x5635f1ceb478 <Create_func_crc32c::s_singleton>, thd=0x7f8b34000d48, 
                name=0x7f8b4a75e6d0, item_list=0x7f8b34016f20)
                at /mariadb/10.8/sql/item_create.cc:2653
            #2  0x00005635f0a321e0 in MYSQLparse (thd=thd@entry=0x7f8b34000d48)
                at /mariadb/10.8/sql/sql_yacc.yy:10441
            

            Item*
            Create_native_func::create_func(THD *thd, LEX_CSTRING *name, List<Item> *item_list)
            {
              if (unlikely(has_named_parameters(item_list)))
              {
                my_error(ER_WRONG_PARAMETERS_TO_NATIVE_FCT, MYF(0), name->str);
                return NULL;
              }
             
              return create_native(thd, name, item_list);
            }
            

            I will replace those redundant error checks with assertions and include test cases to exercise this.

            marko Marko Mäkelä added a comment - The handling for the error ER_WRONG_PARAMETERS_TO_NATIVE_FCT is redundant and unreachable: --error ER_WRONG_PARAMETERS_TO_NATIVE_FCT select crc32c( '' as empty); #0 my_error (nr=1583, MyFlags=0) at /mariadb/10.8/mysys/my_error.c:109 #1 0x00005635f0b30506 in Create_native_func::create_func ( this=0x5635f1ceb478 <Create_func_crc32c::s_singleton>, thd=0x7f8b34000d48, name=0x7f8b4a75e6d0, item_list=0x7f8b34016f20) at /mariadb/10.8/sql/item_create.cc:2653 #2 0x00005635f0a321e0 in MYSQLparse (thd=thd@entry=0x7f8b34000d48) at /mariadb/10.8/sql/sql_yacc.yy:10441 Item* Create_native_func::create_func(THD *thd, LEX_CSTRING *name, List<Item> *item_list) { if (unlikely(has_named_parameters(item_list))) { my_error(ER_WRONG_PARAMETERS_TO_NATIVE_FCT, MYF(0), name->str); return NULL; }   return create_native(thd, name, item_list); } I will replace those redundant error checks with assertions and include test cases to exercise this.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.