MariaDB Server / MDEV-13228

Assertion `n < rec_offs_n_fields(offsets)' failed in rec_get_nth_field_offs upon crash recovery with compressed table

Details

    Description

      Note: This might be just a bad assertion, as nothing obviously bad seems to happen on a non-debug build, but it needs to be checked at least.

      --source include/have_innodb.inc
       
      CREATE TABLE t1 (pk INT PRIMARY KEY, i INT) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;
      INSERT INTO t1 VALUES  (1, 10);
      UPDATE t1 SET pk = 2 WHERE pk = 1;
       
      --let $shutdown_timeout= 0
      --source include/restart_mysqld.inc
      

      10.2 92f1837a27f4b78a3e6c74ca33c3052211069af5

      2017-07-01 19:39:53 139914949132352 [Note] InnoDB: Highest supported file format is Barracuda.
      2017-07-01 19:39:53 139914949132352 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1633883
      2017-07-01 19:39:53 139914949132352 [Note] InnoDB: Starting final batch to recover 12 pages from redo log.
      mysqld: /data/src/10.2/storage/innobase/include/rem0rec.ic:1078: ulint rec_get_nth_field_offs(const ulint*, ulint, ulint*): Assertion `n < rec_offs_n_fields(offsets)' failed.
      170701 19:39:53 [ERROR] mysqld got signal 6 ;
       
      #7  0x00007f407aa19ee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
      #8  0x0000556a9a6f2666 in rec_get_nth_field_offs (offsets=0x7f406effbfb0, n=2, len=0x7f406effbeb0) at /data/src/10.2/storage/innobase/include/rem0rec.ic:1078
      #9  0x0000556a9a6fcb02 in page_zip_write_trx_id_and_roll_ptr (page_zip=0x7f4074143148, rec=0x7f407442007e "\200", offsets=0x7f406effbfb0, trx_id_col=1, trx_id=1286, roll_ptr=1125899928207632) at /data/src/10.2/storage/innobase/page/page0zip.cc:4185
      #10 0x0000556a9a7b0f08 in row_upd_rec_sys_fields_in_recovery (rec=0x7f407442007e "\200", page_zip=0x7f4074143148, offsets=0x7f406effbfb0, pos=1, trx_id=1286, roll_ptr=1125899928207632) at /data/src/10.2/storage/innobase/row/row0upd.cc:467
      #11 0x0000556a9a86243a in btr_cur_parse_del_mark_set_clust_rec (ptr=0x7f407418ceb5 "", end_ptr=0x7f407418ceb5 "", page=0x7f4074420000 "", page_zip=0x7f4074143148, index=0x7f4058001768) at /data/src/10.2/storage/innobase/btr/btr0cur.cc:4613
      #12 0x0000556a9a6b60d4 in recv_parse_or_apply_log_rec_body (type=MLOG_COMP_REC_CLUST_DELETE_MARK, ptr=0x7f407418cea4 "", end_ptr=0x7f407418ceb5 "", space_id=4, page_no=3, apply=true, block=0x7f4074143120, mtr=0x7f406effc590) at /data/src/10.2/storage/innobase/log/log0recv.cc:1272
      #13 0x0000556a9a6b7b36 in recv_recover_page (just_read_in=true, block=0x7f4074143120) at /data/src/10.2/storage/innobase/log/log0recv.cc:1814
      #14 0x0000556a9a89656e in buf_page_io_complete (bpage=0x7f4074143120, evict=false) at /data/src/10.2/storage/innobase/buf/buf0buf.cc:6021
      #15 0x0000556a9a922b58 in fil_aio_wait (segment=2) at /data/src/10.2/storage/innobase/fil/fil0fil.cc:5481
      #16 0x0000556a9a7cc90b in io_handler_thread (arg=0x556a9be53b90 <n+16>) at /data/src/10.2/storage/innobase/srv/srv0start.cc:343
      #17 0x00007f407c95e494 in start_thread (arg=0x7f406effd700) at pthread_create.c:333
      #18 0x00007f407aad693f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      

      The problem appeared in 10.2 tree with this revision:

      commit c436338d9d535aac7692d27ec1dc068e9ce6c9a9 53235cbb1f0d41654935563633a5c61a2caa9495
      Author: Marko Mäkelä <marko.makela@mariadb.com>
      Date:   Fri Jun 30 18:51:51 2017 +0300
       
          Assert that DB_TRX_ID must be set on delete-marked records
      

        Activity

          elenst Elena Stepanova added a comment - ATTN jplindst

          Sorry, this was a simple off-by-one error. I think that it should affect debug builds only, because the out-of-bounds access occurred inside a debug assertion:

          	ut_ad(field + DATA_TRX_ID_LEN
          	      == rec_get_nth_field(rec, offsets, trx_id_col + 1, &len));
          

          The fix is simple:

          diff --git a/storage/innobase/btr/btr0cur.cc b/storage/innobase/btr/btr0cur.cc
          index a672f451ea9..14b24bcd9fd 100644
          --- a/storage/innobase/btr/btr0cur.cc
          +++ b/storage/innobase/btr/btr0cur.cc
          @@ -4601,7 +4601,7 @@ btr_cur_parse_del_mark_set_clust_rec(
           		btr_rec_set_deleted_flag(rec, page_zip, val);
           		ut_ad(pos <= MAX_REF_PARTS);
           
          -		ulint offsets[REC_OFFS_HEADER_SIZE + MAX_REF_PARTS + 2];
          +		ulint offsets[REC_OFFS_HEADER_SIZE + MAX_REF_PARTS + 3];
           		rec_offs_init(offsets);
           		mem_heap_t*	heap	= NULL;
           
          @@ -4609,7 +4609,7 @@ btr_cur_parse_del_mark_set_clust_rec(
           			row_upd_rec_sys_fields_in_recovery(
           				rec, page_zip,
           				rec_get_offsets(rec, index, offsets,
          -						pos + 1, &heap),
          +						pos + 2, &heap),
           				pos, trx_id, roll_ptr);
           		} else {
           			/* In delete-marked records, DB_TRX_ID must
          

          marko Marko Mäkelä added the comment above.
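The claim that only debug builds are affected rests on how debug-only assertion macros behave: in a non-debug build the macro expands to nothing, so the expression inside it (here, the `rec_get_nth_field()` lookup) is never evaluated. The following is a minimal sketch of that mechanism, not InnoDB's actual `ut_ad` definition; the names `TOY_UT_AD_DEBUG`, `TOY_UT_AD_RELEASE`, and `toy_lookup_ok` are hypothetical and exist only for illustration.

```cpp
#include <cstdlib>

// Hypothetical debug-build flavour: evaluate the expression, abort if false.
#define TOY_UT_AD_DEBUG(expr) do { if (!(expr)) std::abort(); } while (0)
// Hypothetical release-build flavour: the expression is dropped entirely.
#define TOY_UT_AD_RELEASE(expr) do { } while (0)

// Counts how often the "lookup" inside the assertion actually runs.
static int evaluations = 0;

// Stand-in for a field lookup that would itself assert on a bad index.
bool toy_lookup_ok() {
    ++evaluations;
    return true;
}

// Returns the number of lookups performed when the assertion is compiled out.
int toy_run_release() {
    evaluations = 0;
    TOY_UT_AD_RELEASE(toy_lookup_ok());
    return evaluations;
}

// Returns the number of lookups performed when the assertion is active.
int toy_run_debug() {
    evaluations = 0;
    TOY_UT_AD_DEBUG(toy_lookup_ok());
    return evaluations;
}
```

In this sketch the release flavour performs no lookup at all, which is why a lookup that only appears inside such a macro cannot fail outside debug builds.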

          It would be good if you could add a comment explaining why +3 is needed (e.g. rem/rem0rec.cc uses ulint offsets_[REC_OFFS_HEADER_SIZE + MAX_REF_PARTS + 2]) and why pos + 2 is needed.

          jplindst Jan Lindström (Inactive) added the comment above.

          Sorry, I already pushed the patch. Initializing the first pos+2 elements of the offsets allows us to access the clustered index fields DB_TRX_ID (pos) and DB_ROLL_PTR (pos+1).
          On second thought, the original array size was already large enough for MAX_REF_PARTS+2 index fields.

          marko Marko Mäkelä added the comment above.
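The reasoning about pos+1 versus pos+2 can be illustrated with a toy model. This is a simplified sketch, not InnoDB code: `ToyOffsets`, `toy_get_offsets`, and `toy_nth_field_start` are hypothetical stand-ins for the offsets array, `rec_get_offsets()`, and `rec_get_nth_field_offs()`, and the record layout (a 4-byte pk, 6-byte DB_TRX_ID, 7-byte DB_ROLL_PTR, 4-byte i) is an assumption modelled on the test case, where pos = 1.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy offsets array: holds end offsets only for the fields that were computed.
struct ToyOffsets {
    std::vector<std::size_t> end_offset;
    std::size_t n_fields() const { return end_offset.size(); }
};

// Compute end offsets for the first n_fields fields of a record.
ToyOffsets toy_get_offsets(const std::vector<std::size_t>& field_lens,
                           std::size_t n_fields) {
    assert(n_fields <= field_lens.size());
    ToyOffsets o;
    std::size_t end = 0;
    for (std::size_t i = 0; i < n_fields; ++i) {
        end += field_lens[i];
        o.end_offset.push_back(end);
    }
    return o;
}

// Start offset of field n. Throws where the real code would fail the
// assertion n < rec_offs_n_fields(offsets).
std::size_t toy_nth_field_start(const ToyOffsets& o, std::size_t n) {
    if (n >= o.n_fields()) {
        throw std::out_of_range("n < n_fields violated");
    }
    return n == 0 ? 0 : o.end_offset[n - 1];
}
```

With pos = 1, computing only pos + 1 = 2 fields makes a lookup of field 2 (DB_ROLL_PTR) fail, which mirrors n=2 in the backtrace. Computing pos + 2 = 3 fields makes both DB_TRX_ID (field 1) and DB_ROLL_PTR (field 2) addressable.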

          People

            marko Marko Mäkelä
            elenst Elena Stepanova