Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21347

innodb_log_optimize_ddl=OFF is not crash safe

    XMLWordPrintable

Details

    Description

      In MDEV-16809, MariaDB introduced a feature to enable redo log recording for bulk load index creation, which disabled redo log recording originally.

      This is a nice feature, it solves the issue for PXB backup with concurrent DDL, we ported this feature to our MySQL branch and report a feature request to MySQL upstream by https://bugs.mysql.com/bug.php?id=92099.

      But recently, we encountered several data corruption case in our pro env, and after some investigation, we found that this feature is not crash safe. If mysqld restart abnormally during DDL, crash happened during crash recovery. And I managed repro this bug using the latest mariadb (10.5, fetch source from github).

      Thread 7 "mysqld" received signal SIGSEGV, Segmentation fault.
       
      >>> bt
      #0  page_rec_find_owner_rec (rec=0x7fff7830c09d "\200") at /home/fungo/Projects/mariadb-server/storage/innobase/include/page0page.ic:770
      #1  page_cur_insert_rec_low (cur=cur@entry=0x7fff33ffd6f0, index=index@entry=0x7fff2c0016f0, rec=rec@entry=0x7fff33ffdab7 "\200", offsets=<optimized out>, mtr=mtr@entry=0x7fff33ffe420) at /home/fungo/Projects/mariadb-server/storage/innobase/page/page0cur.cc:1532
      #2  0x0000555556173aaa in page_cur_rec_insert (mtr=0x7fff33ffe420, offsets=<optimized out>, index=0x7fff2c0016f0, rec=0x7fff33ffdab7 "\200", cursor=0x7fff33ffd6f0) at /home/fungo/Projects/mariadb-server/storage/innobase/include/page0cur.ic:319
      #3  page_cur_parse_insert_rec (is_short=is_short@entry=false, ptr=<optimized out>, end_ptr=end_ptr@entry=0x55555ab30b51 "", block=block@entry=0x7fff78001590, index=0x7fff2c0016f0, mtr=mtr@entry=0x7fff33ffe420) at /home/fungo/Projects/mariadb-server/storage/innobase/page/page0cur.cc:1146
      #4  0x0000555556159559 in recv_parse_or_apply_log_rec_body (type=MLOG_COMP_REC_INSERT, ptr=<optimized out>, end_ptr=0x55555ab30b51 "", space_id=6, page_no=15, apply=apply@entry=true, block=block@entry=0x7fff78001590, mtr=mtr@entry=0x7fff33ffe420) at /home/fungo/Projects/mariadb-server/storage/innobase/log/log0recv.cc:1497
      #5  0x0000555556159d7b in recv_recover_page (block=block@entry=0x7fff78001590, mtr=..., p=..., init=init@entry=0x0) at /home/fungo/Projects/mariadb-server/storage/innobase/log/log0recv.cc:1951
      #6  0x0000555555b4fb25 in recv_recover_page (bpage=bpage@entry=0x7fff78001590) at /home/fungo/Projects/mariadb-server/storage/innobase/log/log0recv.cc:2048
      #7  0x000055555627945a in buf_page_io_complete (bpage=bpage@entry=0x7fff78001590, dblwr=dblwr@entry=true, evict=evict@entry=false) at /home/fungo/Projects/mariadb-server/storage/innobase/buf/buf0buf.cc:5993
      #8  0x00005555562d8098 in fil_aio_callback (cb=cb@entry=0x555557a60e60) at /home/fungo/Projects/mariadb-server/storage/innobase/fil/fil0fil.cc:4375
      #9  0x000055555616e363 in io_callback (cb=0x555557a60e60) at /home/fungo/Projects/mariadb-server/storage/innobase/os/os0file.cc:3880
      #10 0x0000555556350707 in tpool::task_group::execute (this=0x555557a3d480, t=0x555557a60ea8) at /home/fungo/Projects/mariadb-server/tpool/task_group.cc:55
      #11 0x000055555634efc1 in tpool::thread_pool_generic::worker_main (this=0x5555579fe090, thread_var=0x555557a0d8a0) at /home/fungo/Projects/mariadb-server/tpool/tpool_generic.cc:509
      #12 0x00007ffff6c36360 in ?? () from /lib64/libstdc++.so.6
      #13 0x00007ffff7bc6e25 in start_thread () from /lib64/libpthread.so.0
      #14 0x00007ffff6398f1d in clone () from /lib64/libc.so.6
      

      >>> p rec
      $1 = (rec_t *) 0x0
      
      

      Analysis:

      The reason is that
      1. redo log is recorded(page_cur_insert_rec_write_log())along with every record insertion in PageBulk::insert()
      2. the index page header info is fixed at BtrBulk::pageCommit() by invoking PageBulk::finishPage(), then we use PageBulk::commit() to commit the mtr (relese buf_fix_count, rwlock, and write local redo to global buffer).
      3. But we may commit the mtr too early by PageBulk::release()

      If the mtr is comimtted by PageBulk::release(), the rwlock is released, and the dirty page can be flush to disk. But the index page header info is not fixed.

      So if
      1. a checkpoint happened after PageBulk::release() and before PageBulk::commit().
      2. mysqld is killed (OOM or somehow) before InnoDB do a new checkpoint
      3. the crash recover after 2 will crash as showed in the previous stack

      Bellow is how I manually reproed this (need gdb asssitance):

      0. in my.cnf set innodb_sort_buffer_size=65536, make sure PageBulk::release() will be invoked.

      1. prepare data into t1
      create table t1(id int auto_increment, name varchar(30), primary key(id)) engine=innodb;

      insert into t1 values (1, "MySQL"), (2, "MariaDB"), (3, "AlisQL"), (4, "PolarDB"), (5, "hahaha");

      insert into t1(name) select a.name from t1 a, t1 b limit 5000;
      insert into t1(name) select a.name from t1 a, t1 b limit 5000;
      insert into t1(name) select a.name from t1 a, t1 b limit 5000;
      insert into t1(name) select a.name from t1 a, t1 b limit 5000;
      insert into t1(name) select a.name from t1 a, t1 b limit 5000;
      insert into t1(name) select a.name from t1 a, t1 b limit 5000;

      2. set gdb breakpoint at PageBulk::release()

      3. run
      optimize table t1;

      4. gdb will break at PageBulk::release()
      make PageBulk::release() finish, by 2 or more finish commands

      then using gdb maually call os_thread_sleep(30*1000000) to block current bulk load thread, and this will give page cleaner enougth time to flush all dirty pages and advance checkpoint

      5. after os_thread_sleep() return, using same gdb trick in step 4 to block page cleaner thread forever
      , such as call os_thread_sleep(3000*1000000)

      6. optimize query in step 3 will finish

      7. using `show engine innodb status`, we can see there is some un-checkpointed redo

      8. then kill -9 mysqld, and the crash recovery will crash

      Attachments

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              fungo Fungo Wang
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.