Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13512

buf_flush_update_zip_checksum() corrupts SPATIAL INDEX in ROW_FORMAT=COMPRESSED tables

Details

    • 10.2.10

    Description

      10.1 1ff65c271ff3edef5ccba72141047fda19170163

      2017-08-13 20:28:43 7f970a88fb00  InnoDB: Assertion failure in thread 140286693538560 in file page0zip.cc line 3035
      InnoDB: Failing assertion: !memcmp(FIL_PAGE_TYPE + page, FIL_PAGE_TYPE + page_zip->data, PAGE_HEADER - FIL_PAGE_TYPE)
       
      #5  0x00007f97085b53fa in abort () from /lib/x86_64-linux-gnu/libc.so.6
      #6  0x00007f9701baa253 in page_zip_decompress (page_zip=0x7f96ff3c0eb0, page=0x7f96ff6f4000 "\f%", all=0) at /data/src/10.1/storage/innobase/page/page0zip.cc:3033
      #7  0x00007f9701b945de in page_cur_insert_rec_zip (cursor=0x7f970a88ca68, index=0x7f96f24dc878, rec=0x7f96f240fa00 "\200", offsets=0x7f96f240fa38, mtr=0x7f970a88cae0) at /data/src/10.1/storage/innobase/page/page0cur.cc:1427
      #8  0x00007f97019f88a3 in page_cur_tuple_insert (cursor=0x7f970a88ca68, tuple=0x7f96f24dc5f8, index=0x7f96f24dc878, offsets=0x7f970a88ca20, heap=0x7f970a88ca30, n_ext=0, mtr=0x7f970a88cae0) at /data/src/10.1/storage/innobase/include/page0cur.ic:288
      #9  0x00007f97019fc7e0 in btr_cur_optimistic_insert (flags=0, cursor=0x7f970a88ca60, offsets=0x7f970a88ca20, heap=0x7f970a88ca30, entry=0x7f96f24dc5f8, rec=0x7f970a88ca38, big_rec=0x7f970a88ca28, n_ext=0, thr=0x7f96f2457b68, mtr=0x7f970a88cae0) at /data/src/10.1/storage/innobase/btr/btr0cur.cc:1478
      #10 0x00007f9701be9a52 in row_ins_clust_index_entry_low (flags=0, mode=2, index=0x7f96f24dc878, n_uniq=1, entry=0x7f96f24dc5f8, n_ext=0, thr=0x7f96f2457b68) at /data/src/10.1/storage/innobase/row/row0ins.cc:2518
      #11 0x00007f9701beaac7 in row_ins_clust_index_entry (index=0x7f96f24dc878, entry=0x7f96f24dc5f8, thr=0x7f96f2457b68, n_ext=0) at /data/src/10.1/storage/innobase/row/row0ins.cc:2929
      #12 0x00007f9701beadd5 in row_ins_index_entry (index=0x7f96f24dc878, entry=0x7f96f24dc5f8, thr=0x7f96f2457b68) at /data/src/10.1/storage/innobase/row/row0ins.cc:3027
      #13 0x00007f9701beb098 in row_ins_index_entry_step (node=0x7f96f2457780, thr=0x7f96f2457b68) at /data/src/10.1/storage/innobase/row/row0ins.cc:3104
      #14 0x00007f9701beb3ab in row_ins (node=0x7f96f2457780, thr=0x7f96f2457b68) at /data/src/10.1/storage/innobase/row/row0ins.cc:3244
      #15 0x00007f9701beb712 in row_ins_step (thr=0x7f96f2457b68) at /data/src/10.1/storage/innobase/row/row0ins.cc:3369
      #16 0x00007f9701c02b99 in row_insert_for_mysql (mysql_rec=0x7f96f2459688 "\242", prebuilt=0x7f96f2457078) at /data/src/10.1/storage/innobase/row/row0mysql.cc:1424
      #17 0x00007f9701af9ad1 in ha_innodb::write_row (this=0x7f96f2499088, record=0x7f96f2459688 "\242") at /data/src/10.1/storage/innobase/handler/ha_innodb.cc:8257
      #18 0x000055aea6b69831 in handler::ha_write_row (this=0x7f96f2499088, buf=0x7f96f2459688 "\242") at /data/src/10.1/sql/handler.cc:5945
      #19 0x000055aea6902690 in write_record (thd=0x7f96fefc9070, table=0x7f96f24b9e70, info=0x7f970a88d910) at /data/src/10.1/sql/sql_insert.cc:1914
      #20 0x000055aea6900244 in mysql_insert (thd=0x7f96fefc9070, table_list=0x7f96f24431a0, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false) at /data/src/10.1/sql/sql_insert.cc:1039
      #21 0x000055aea6924579 in mysql_execute_command (thd=0x7f96fefc9070) at /data/src/10.1/sql/sql_parse.cc:3926
      #22 0x000055aea692ef2b in mysql_parse (thd=0x7f96fefc9070, rawbuf=0x7f96f2443088 "INSERT INTO t1 (pk, i2) VALUES (NULL, -1013317632)", length=50, parser_state=0x7f970a88e5e0) at /data/src/10.1/sql/sql_parse.cc:7344
      #23 0x000055aea691de8a in dispatch_command (command=COM_QUERY, thd=0x7f96fefc9070, packet=0x7f9700f1c071 "INSERT INTO t1 (pk, i2) VALUES (NULL, -1013317632)", packet_length=50) at /data/src/10.1/sql/sql_parse.cc:1484
      #24 0x000055aea691cc27 in do_command (thd=0x7f96fefc9070) at /data/src/10.1/sql/sql_parse.cc:1106
      #25 0x000055aea6a54fa1 in do_handle_one_connection (thd_arg=0x7f96fefc9070) at /data/src/10.1/sql/sql_connect.cc:1349
      #26 0x000055aea6a54d05 in handle_one_connection (arg=0x7f96fefc9070) at /data/src/10.1/sql/sql_connect.cc:1261
      #27 0x000055aea6e0ccf2 in pfs_spawn_thread (arg=0x7f9700ee36f0) at /data/src/10.1/storage/perfschema/pfs.cc:1860
      #28 0x00007f970a51e494 in start_thread (arg=0x7f970a88fb00) at pthread_create.c:333
      #29 0x00007f970866993f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      

      The test case is attached. If it doesn't fail right away, try running with --repeat=N.

      The test is still rather long, but reducing it further seems to decrease the probability of failure.
      Please note there is a server restart in the middle of the test, after initial creation/population of the table. I couldn't reproduce the failure without it.

      Also reproducible on 10.3.

      Attachments

        Issue Links

          Activity

            jplindst Jan Lindström (Inactive) added a comment - https://github.com/MariaDB/server/commit/9b998c18283c8acd4ed3d8e6df503fe861af04f0

            I do not understand what the fix is supposed to achieve, apart from skipping the comparison that is failing.

            A design constraint of ROW_FORMAT=COMPRESSED which I designed and implemented in the InnoDB Plugin for MySQL 5.1 is that the compressed and uncompressed index page headers must match at all times. That is, if one of the headers is updated, then the other one must be update as well, while holding X-latch or SX-latch on block->lock. Once the latch is released, the two copies must be equivalent. Something is apparently breaking this constraint, and we must find and fix the cause of this. I do not think that the check should be relaxed without fully understanding the problem.

            I cannot repeat any problem with the included test case (the commit features an slightly extended version of t3.test), either in 10.1 or 10.2. I did verify that page_zip_decompress() is being successfully called after restart and that the data file is encrypted.

            If I could repeat the problem, I could better suggest how to fix this.

            marko Marko Mäkelä added a comment - I do not understand what the fix is supposed to achieve, apart from skipping the comparison that is failing. A design constraint of ROW_FORMAT=COMPRESSED which I designed and implemented in the InnoDB Plugin for MySQL 5.1 is that the compressed and uncompressed index page headers must match at all times. That is, if one of the headers is updated, then the other one must be update as well, while holding X-latch or SX-latch on block->lock. Once the latch is released, the two copies must be equivalent. Something is apparently breaking this constraint, and we must find and fix the cause of this. I do not think that the check should be relaxed without fully understanding the problem. I cannot repeat any problem with the included test case (the commit features an slightly extended version of t3.test ), either in 10.1 or 10.2. I did verify that page_zip_decompress() is being successfully called after restart and that the data file is encrypted. If I could repeat the problem, I could better suggest how to fix this.

            If the mismatch is caused by encryption updating only one of block->frame or block->page.zip.data for the buf_block_t* block of a ROW_FORMAT=COMPRESSED table, then the root cause should be fixed by updating both copies at the same time.

            Note that during buffer pool flushing, both the uncompressed and compressed page must exist in the buffer pool. This is because flushing is protected by buf_block_t::lock. At other times, the page might exist in compressed format only. If it contains changes that must be written to the *.ibd file, then the buf_page_t::state will be BUF_BLOCK_ZIP_DIRTY.

            marko Marko Mäkelä added a comment - If the mismatch is caused by encryption updating only one of block->frame or block->page.zip.data for the buf_block_t* block of a ROW_FORMAT=COMPRESSED table, then the root cause should be fixed by updating both copies at the same time. Note that during buffer pool flushing, both the uncompressed and compressed page must exist in the buffer pool. This is because flushing is protected by buf_block_t::lock. At other times, the page might exist in compressed format only. If it contains changes that must be written to the *.ibd file, then the buf_page_t::state will be BUF_BLOCK_ZIP_DIRTY.

            I found the problem by running the test on perro. On two subsequent runs, the problem occurred on page 4:10, giving some hope of determinism.
            So, I decided to run it inside --manual-gdb. Until the shutdown, there was no problem.
            Before the restart, I set a breakpoint:

            break buf_page_get_gen
            cond 1 space==4&&offset==10
            run
            finish
            disable
            display/x $1->page[24]@14
            display/x $1->page.zip.data[24]@14
            

            At this point, I noticed that both the uncompressed frame and the compressed page contain non-zero bytes. The mismatch that I had observed in the core dumps was that the block->page.zip.data contained zeroes while block->frame contained nonzero bytes.

            So, I started to suspect that the problem is that block->page.zip.data is zeroed out, and not that block->frame is being written with nonzero bytes without updating block->page.zip.data accordingly.

            I set a watchpoint:

            p (unsigned*)&$1->page.zip.data[28]
            watch *$2
            continue
            

            This watchpoint was triggered in a page flush:

            #1  0x000055555616dda7 in buf_flush_update_zip_checksum (
                page=0x7fffed4f8000 "\f%", zip_size=8192, lsn=2195866)
                at /home/mariadb/git/10.1-mdev13512/storage/xtradb/buf/buf0flu.cc:760
            #2  0x000055555616df0c in buf_flush_init_for_writing (
                page=0x7fffed4f4000 "\f%", page_zip_=0x7fffecfbfaf8, newest_lsn=2195866)
                at /home/mariadb/git/10.1-mdev13512/storage/xtradb/buf/buf0flu.cc:804
            #3  0x000055555616e3e3 in buf_flush_write_block_low (bpage=0x7fffecfbfae8, 
                flush_type=BUF_FLUSH_LIST, sync=false)
                at /home/mariadb/git/10.1-mdev13512/storage/xtradb/buf/buf0flu.cc:950
            #4  0x000055555616ea0e in buf_flush_page (buf_pool=0x7ffff0fb0478, 
                bpage=0x7fffecfbfae8, flush_type=BUF_FLUSH_LIST, sync=false)
                at /home/mariadb/git/10.1-mdev13512/storage/xtradb/buf/buf0flu.cc:1130
            #5  0x000055555616efe8 in buf_flush_try_neighbors (space=4, offset=10, 
                flush_type=BUF_FLUSH_LIST, n_flushed=1, n_to_flush=100)
                at /home/mariadb/git/10.1-mdev13512/storage/xtradb/buf/buf0flu.cc:1344
            #6  0x000055555616f308 in buf_flush_page_and_try_neighbors (
                bpage=0x7fffecfbfae8, flush_type=BUF_FLUSH_LIST, n_to_flush=100, 
                count=0x7fffe33feae8)
                at /home/mariadb/git/10.1-mdev13512/storage/xtradb/buf/buf0flu.cc:1433
            #7  0x000055555616fe48 in buf_do_flush_list_batch (buf_pool=0x7ffff0fb0478, 
                min_n=100, lsn_limit=18446744073709551615)
            

            buf_flush_update_zip_checksum() zeroes out the FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION field, which it perhaps should not do?

            	ib_uint32_t	checksum = static_cast<ib_uint32_t>(
            		page_zip_calc_checksum(
            			page, zip_size,
            			static_cast<srv_checksum_algorithm_t>(
            				srv_checksum_algorithm)));
             
            	mach_write_to_8(page + FIL_PAGE_LSN, lsn);
            	memset(page + FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION, 0, 8);
            	mach_write_to_4(page + FIL_PAGE_SPACE_OR_CHKSUM, checksum);
            

            Note that the page here is the compressed page frame (in this case, zip_size is half the srv_page_size aka innodb_page_size).

            Without encryption, this field would always be zero in both the compressed and uncompressed page header.
            Perhaps we should remove this memset() altogether. The field is excluded from the checksum calculation.
            This memset() became redundant with my MySQL Bug #53306 fix which added memset() to fsp_init_file_page_low(). So, it is safe to remove the memset().

            It looks like the memset() must be breaking SPATIAL INDEX in 10.2 for ROW_FORMAT=COMPRESSED, by zeroing out the SSN (split sequence number) field. In MySQL 5.7.5, two memset() were removed from buf0flu.cc in WL#6968 InnoDB GIS: R-tree index support.

            The proper fix is to remove the 2 offending memset() from both InnoDB and XtraDB, rather than to shoot or cripple the messenger (the consistency check).

            diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
            index 31feb322826..dacddfca385 100644
            --- a/storage/innobase/buf/buf0flu.cc
            +++ b/storage/innobase/buf/buf0flu.cc
            @@ -715,7 +715,6 @@ buf_flush_update_zip_checksum(
             				srv_checksum_algorithm)));
             
             	mach_write_to_8(page + FIL_PAGE_LSN, lsn);
            -	memset(page + FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION, 0, 8);
             	mach_write_to_4(page + FIL_PAGE_SPACE_OR_CHKSUM, checksum);
             }
             
            @@ -894,8 +893,6 @@ buf_flush_write_block_low(
             				bpage->newest_modification);
             
             		ut_a(page_zip_verify_checksum(frame, zip_size));
            -
            -		memset(frame + FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION, 0, 8);
             		break;
             	case BUF_BLOCK_FILE_PAGE:
             		frame = bpage->zip.data;
            

            I believe that the reason why I was unable to repeat the bug on my laptop is that there was not enough concurrency. On perro, there are more processor cores and more true concurrency between the page flush and the SQL execution thread.

            I tried to prove my claim that SPATIAL INDEX is broken on ROW_FORMAT=COMPRESSED tables, but we do not have the larger --suite=innodb_gis tests from 5.7 (until MDEV-13626 addresses that), and a modification of innodb.instant_alter in 10.3 to use ROW_FORMAT=COMPRESSED did not trigger any assertion failure. I suppose that the SSN only matters in certain SPATIAL INDEX operations.

            marko Marko Mäkelä added a comment - I found the problem by running the test on perro. On two subsequent runs, the problem occurred on page 4:10, giving some hope of determinism. So, I decided to run it inside --manual-gdb. Until the shutdown, there was no problem. Before the restart, I set a breakpoint: break buf_page_get_gen cond 1 space==4&&offset==10 run finish disable display/x $1->page[24]@14 display/x $1->page.zip.data[24]@14 At this point, I noticed that both the uncompressed frame and the compressed page contain non-zero bytes. The mismatch that I had observed in the core dumps was that the block->page.zip.data contained zeroes while block->frame contained nonzero bytes. So, I started to suspect that the problem is that block->page.zip.data is zeroed out, and not that block->frame is being written with nonzero bytes without updating block->page.zip.data accordingly. I set a watchpoint: p (unsigned*)&$1->page.zip.data[28] watch *$2 continue This watchpoint was triggered in a page flush: #1 0x000055555616dda7 in buf_flush_update_zip_checksum ( page=0x7fffed4f8000 "\f%", zip_size=8192, lsn=2195866) at /home/mariadb/git/10.1-mdev13512/storage/xtradb/buf/buf0flu.cc:760 #2 0x000055555616df0c in buf_flush_init_for_writing ( page=0x7fffed4f4000 "\f%", page_zip_=0x7fffecfbfaf8, newest_lsn=2195866) at /home/mariadb/git/10.1-mdev13512/storage/xtradb/buf/buf0flu.cc:804 #3 0x000055555616e3e3 in buf_flush_write_block_low (bpage=0x7fffecfbfae8, flush_type=BUF_FLUSH_LIST, sync=false) at /home/mariadb/git/10.1-mdev13512/storage/xtradb/buf/buf0flu.cc:950 #4 0x000055555616ea0e in buf_flush_page (buf_pool=0x7ffff0fb0478, bpage=0x7fffecfbfae8, flush_type=BUF_FLUSH_LIST, sync=false) at /home/mariadb/git/10.1-mdev13512/storage/xtradb/buf/buf0flu.cc:1130 #5 0x000055555616efe8 in buf_flush_try_neighbors (space=4, offset=10, flush_type=BUF_FLUSH_LIST, n_flushed=1, n_to_flush=100) at /home/mariadb/git/10.1-mdev13512/storage/xtradb/buf/buf0flu.cc:1344 #6 0x000055555616f308 in buf_flush_page_and_try_neighbors ( bpage=0x7fffecfbfae8, flush_type=BUF_FLUSH_LIST, n_to_flush=100, count=0x7fffe33feae8) at /home/mariadb/git/10.1-mdev13512/storage/xtradb/buf/buf0flu.cc:1433 #7 0x000055555616fe48 in buf_do_flush_list_batch (buf_pool=0x7ffff0fb0478, min_n=100, lsn_limit=18446744073709551615) buf_flush_update_zip_checksum() zeroes out the FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION field, which it perhaps should not do? ib_uint32_t checksum = static_cast <ib_uint32_t>( page_zip_calc_checksum( page, zip_size, static_cast <srv_checksum_algorithm_t>( srv_checksum_algorithm)));   mach_write_to_8(page + FIL_PAGE_LSN, lsn); memset (page + FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION, 0, 8); mach_write_to_4(page + FIL_PAGE_SPACE_OR_CHKSUM, checksum); Note that the page here is the compressed page frame (in this case, zip_size is half the srv_page_size aka innodb_page_size). Without encryption, this field would always be zero in both the compressed and uncompressed page header. Perhaps we should remove this memset() altogether. The field is excluded from the checksum calculation. This memset() became redundant with my MySQL Bug #53306 fix which added memset() to fsp_init_file_page_low() . So, it is safe to remove the memset(). It looks like the memset() must be breaking SPATIAL INDEX in 10.2 for ROW_FORMAT=COMPRESSED, by zeroing out the SSN (split sequence number) field. In MySQL 5.7.5, two memset() were removed from buf0flu.cc in WL#6968 InnoDB GIS: R-tree index support . The proper fix is to remove the 2 offending memset() from both InnoDB and XtraDB, rather than to shoot or cripple the messenger (the consistency check). diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc index 31feb322826..dacddfca385 100644 --- a/storage/innobase/buf/buf0flu.cc +++ b/storage/innobase/buf/buf0flu.cc @@ -715,7 +715,6 @@ buf_flush_update_zip_checksum( srv_checksum_algorithm))); mach_write_to_8(page + FIL_PAGE_LSN, lsn); - memset(page + FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION, 0, 8); mach_write_to_4(page + FIL_PAGE_SPACE_OR_CHKSUM, checksum); } @@ -894,8 +893,6 @@ buf_flush_write_block_low( bpage->newest_modification); ut_a(page_zip_verify_checksum(frame, zip_size)); - - memset(frame + FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION, 0, 8); break; case BUF_BLOCK_FILE_PAGE: frame = bpage->zip.data; I believe that the reason why I was unable to repeat the bug on my laptop is that there was not enough concurrency. On perro, there are more processor cores and more true concurrency between the page flush and the SQL execution thread. I tried to prove my claim that SPATIAL INDEX is broken on ROW_FORMAT=COMPRESSED tables, but we do not have the larger --suite=innodb_gis tests from 5.7 (until MDEV-13626 addresses that), and a modification of innodb.instant_alter in 10.3 to use ROW_FORMAT=COMPRESSED did not trigger any assertion failure. I suppose that the SSN only matters in certain SPATIAL INDEX operations.
            marko Marko Mäkelä added a comment - I removed the offending memset() .

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.