  1. MariaDB Server
  2. MDEV-21165

Server gets crash on checksum CRC32 is 0

Details

    Description

      We recently hit an issue where the server crashes when a query reads a specific row. We believe it is a MariaDB bug.

      What we see

      2019-11-26  5:11:45 1636 [Note] InnoDB: Uncompressed page, stored checksum in field1 0, calculated checksums for field1: crc32 0, innodb 2221428917,  none 3735928559, page type 17855 == INDEX. stored checksum in field2 0, calculated checksums for field2: crc32 0, innodb 1038010250, none 3735928559,  page LSN 224 1636471998, low 4 bytes of LSN at page end 1636471998, page number (if stored to page already) 7072818, space id (if created with >= MySQL-4.1.1 and stored already) 0
      

      Using a hexdump-like tool, we confirmed that the page is not empty and contains the correct data.

      Analysis
      1. In buf0checksum.cc,

      uint32_t buf_calc_page_crc32(const byte* page) {
      	return ut_crc32(page + FIL_PAGE_OFFSET,
      			FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION
      			- FIL_PAGE_OFFSET)
      		^ ut_crc32(page + FIL_PAGE_DATA,
      			   srv_page_size
      			   - (FIL_PAGE_DATA + FIL_PAGE_END_LSN_OLD_CHKSUM));
      }
      

      That is, the page checksum is header_crc32 XOR content_crc32, so it is 0 whenever the two CRC values happen to be equal. In our scenario, the calculated checksum of the page is exactly 0.
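To illustrate how an XOR of two CRCs can be 0 for nonzero data, here is a minimal sketch. It assumes a plain bitwise CRC-32C as a stand-in for ut_crc32(); the names crc32c(), xor_of_two_crcs() and demo_zero_checksum() are hypothetical and are not part of the MariaDB source. For simplicity it feeds the same bytes to both CRCs. On a real page the two ranges differ in length and content, so a zero result needs an actual collision between the two CRC values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
// Illustrative stand-in for ut_crc32(); not the MariaDB implementation.
inline uint32_t crc32c(const uint8_t* data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++) {
			// Shift right; apply the polynomial if the low bit was set.
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
		}
	}
	return crc ^ 0xFFFFFFFFu;
}

// Hypothetical helper with the same shape as buf_calc_page_crc32():
// the result is the XOR of two independently computed CRCs.
inline uint32_t xor_of_two_crcs(const std::vector<uint8_t>& a,
				const std::vector<uint8_t>& b)
{
	return crc32c(a.data(), a.size()) ^ crc32c(b.data(), b.size());
}

// Usage sketch: equal CRC values cancel out, even for nonzero data.
inline void demo_zero_checksum()
{
	const std::vector<uint8_t> part = {1, 2, 3, 4};
	assert(xor_of_two_crcs(part, part) == 0);
}
```

The point of the sketch is only that a combined checksum of 0 says nothing about whether the page is empty.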

      2. In buf0buf.cc, in the function buf_page_is_corrupted(),

      	/* A page filled with NUL bytes is considered not corrupted.
      	The FIL_PAGE_FILE_FLUSH_LSN field may be written nonzero for
      	the first page of each file of the system tablespace.
      	Ignore it for the system tablespace. */
      	if (!checksum_field1 && !checksum_field2) {
      		ulint i = 0;
      		do {
      			if (read_buf[i]) {
      				return true;
      			}
      		} while (++i < FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION);
      		do {
      			if (read_buf[i]) {
      				return true;
      			}
      		} while (++i < srv_page_size);
      		return false;
      	}
      

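      So when both stored checksum fields are 0, this code treats any page that contains a nonzero byte as corrupted, even if 0 is the correct checksum for that page.
      This leads to the following stack: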

      15 0000002a`44afa8f0 00000024`805fd984 mysqld!ib::fatal::~fatal+0x41 [storage\innobase\ut\ut0ut.cc @ 792] 
      16 0000002a`44afa950 00000024`80732bc0 mysqld!buf_page_io_complete+0x804 [storage\innobase\buf\buf0buf.cc @ 6065] 
      17 0000002a`44afb070 00000024`80732426 mysqld!buf_read_page_low+0x250 [storage\innobase\buf\buf0rea.cc @ 215] 
      18 0000002a`44afb1f0 00000024`805fb3a3 mysqld!buf_read_page+0x46 [storage\innobase\buf\buf0rea.cc @ 439] 
      19 0000002a`44afb330 00000024`806237e2 mysqld!buf_page_get_gen+0xe73 [storage\innobase\buf\buf0buf.cc @ 4352] 
      1a 0000002a`44afb690 00000024`8070163d mysqld!btr_cur_search_to_nth_level+0x882 [storage\innobase\btr\btr0cur.cc @ 1249] 
      1b (Inline Function) --------`-------- mysqld!btr_pcur_open_with_no_init_func+0x60 [storage\innobase\include\btr0pcur.ic @ 525] 
      1c 0000002a`44afc740 00000024`806fe57f mysqld!row_sel_get_clust_rec_for_mysql+0x10d [storage\innobase\row\row0sel.cc @ 3360] 
      1d 0000002a`44afc8d0 00000024`805d9a2d mysqld!row_search_mvcc+0x180f [storage\innobase\row\row0sel.cc @ 5339] 
      1e 0000002a`44afd710 00000024`80526280 mysqld!ha_innobase::general_fetch+0x6d [storage\innobase\handler\ha_innodb.cc @ 9793] 
      1f 0000002a`44afd760 00000024`80361deb mysqld!handler::ha_index_next_same+0x90 [sql\handler.cc @ 2843] 
      20 0000002a`44afd7f0 00000024`803745b5 mysqld!join_read_next_same+0x2b [sql\sql_select.cc @ 19639] 
      21 0000002a`44afd820 00000024`80359f35 mysqld!sub_select+0x1f5 [sql\sql_select.cc @ 18841] 
      22 0000002a`44afd870 00000024`8035c78b mysqld!do_select+0x255 [sql\sql_select.cc @ 18366] 
      23 0000002a`44afd8d0 00000024`8035c13f mysqld!JOIN::exec_inner+0x5fb [sql\sql_select.cc @ 3628] 
      24 0000002a`44afd940 00000024`80367d0f mysqld!JOIN::exec+0x3f [sql\sql_select.cc @ 3422] 
      25 0000002a`44afd970 00000024`8035ff6d mysqld!mysql_select+0x2df [sql\sql_select.cc @ 3823] 
      26 0000002a`44afda10 00000024`804a294d mysqld!handle_select+0x10d [sql\sql_select.cc @ 365] 
      27 0000002a`44afdaa0 00000024`804a5ccc mysqld!execute_sqlcom_select+0x2fd [sql\sql_parse.cc @ 6347] 
      28 0000002a`44afdf60 00000024`804a9cb0 mysqld!mysql_execute_command+0xe6c [sql\sql_parse.cc @ 3571] 
      29 0000002a`44afea80 00000024`804a0a45 mysqld!mysql_parse+0x190 [sql\sql_parse.cc @ 7940] 
      2a 0000002a`44afeae0 00000024`804a1dc0 mysqld!dispatch_command+0xb55 [sql\sql_parse.cc @ 1855] 
      2b 0000002a`44affad0 00000024`805a8bc3 mysqld!do_command+0x1b0 [sql\sql_parse.cc @ 1395] 
      2c 0000002a`44affb40 00000024`805a8d60 mysqld!threadpool_process_request+0x53 [sql\threadpool_common.cc @ 382] 
      2d 0000002a`44affb70 00000024`83f39574 mysqld!tp_callback+0x70 [sql\threadpool_common.cc @ 195] 
      

      The server then crashes.
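The difference between the quoted zero-checksum branch and a more tolerant check can be sketched as follows. This is a simplified model, not the MariaDB code and not necessarily the fix that was applied: it assumes a single stored checksum and a precomputed calculated checksum, and the function names are hypothetical. The real buf_page_is_corrupted() handles two checksum fields and several checksum algorithms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of the quoted logic: with a stored checksum of 0, any nonzero
// byte makes the page count as corrupted.
inline bool corrupted_as_quoted(const uint8_t* page, size_t size,
				uint32_t stored, uint32_t calculated)
{
	if (stored == 0) {
		for (size_t i = 0; i < size; i++) {
			if (page[i]) {
				return true;
			}
		}
		return false;
	}
	return stored != calculated;
}

// Alternative sketch: only an all-zero page is accepted without a
// checksum comparison; otherwise 0 is compared like any other value.
inline bool corrupted_alternative(const uint8_t* page, size_t size,
				  uint32_t stored, uint32_t calculated)
{
	if (stored == 0) {
		bool all_zero = true;
		for (size_t i = 0; i < size; i++) {
			if (page[i]) {
				all_zero = false;
				break;
			}
		}
		if (all_zero) {
			return false;
		}
	}
	return stored != calculated;
}

// Usage sketch: a nonzero page whose correct checksum is 0.
inline void demo_zero_checksum_page()
{
	const uint8_t page[4] = {0, 0, 7, 0};
	assert(corrupted_as_quoted(page, 4, 0, 0));
	assert(!corrupted_alternative(page, 4, 0, 0));
}
```

In this model, the two functions differ only for a non-empty page whose stored checksum is 0, which is the case described in this report.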

          Activity

            A similar bug was handled in MDEV-19978 and the fix applied to 10.2.26 and 10.3.17 already.
            shuodl, would it be possible for you to share the page dump? If not, I think that we should be able to construct one where the two CRC-32C checksums are identical.

            Side note: MDEV-12026 (MariaDB 10.4) introduced a better file format (innodb_checksum_algorithm=full_crc32) that computes checksums over the entire page contents.

            Comment by marko (Marko Mäkelä).

            shuodl, I appreciate your observation that the reason why innodb_checksum_algorithm=crc32 can result in a checksum value of 0 is that it is computing an exclusive or of two CRC-32C checksums.

            The value of a CRC (using any primitive polynomial in a binary Galois field) can only be 0 if the input is all 0. That is, unless the buffer is all zero, its CRC should not be 0.

            While fixing this bug, I think that we should check whether the MDEV-19978 fix could be skipped for innodb_checksum_algorithm=full_crc32 in MariaDB Server 10.4 and later.

            Comment by marko (Marko Mäkelä).

            I performed further analysis of this report. The intentionally crashing ib::fatal() call ought to be from the following code in buf_page_io_complete():

            				if (bpage->id.space() == TRX_SYS_SPACE) {
            					ib::fatal() << "Aborting because of"
            						" a corrupt database page.";
            				}
            

            That was reported as line number 6065. But the line number of the ib::fatal() call is 6110 in the reported affected version 10.2.29!

            The quoted code in buf_page_is_corrupted() was corrected in the fix of MDEV-19978 (10.1.41, 10.2.26, 10.3.17, 10.4.7) and broken earlier in MDEV-12711 (10.1.39, 10.2.24, 10.3.14, 10.4.4).

            Let me try to find the correct affectedVersion. It ought to be one of 10.2.24, 10.2.25, 10.3.14, 10.3.15, 10.3.16.
            Of these, only 10.2.25 matches (the line 6065 is the next statement after the above code). So, this report actually looks like a duplicate of MDEV-19978. I have corrected the affected version.

            Comment by marko (Marko Mäkelä).

            People

              marko Marko Mäkelä
              shuodl Shuode Li