Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-23232

mysqld: storage/innobase/buf/buf0flu.cc:786: void buf_flush_init_for_writing(const buf_block_t*, byte*, void*, lsn_t, bool): Assertion `!newest_lsn || fil_page_get_type(page)' failed.

Details

    Description

      Assert hit during RQG testing
      mysqld: storage/innobase/buf/buf0flu.cc:786: void buf_flush_init_for_writing(const buf_block_t*, byte*, void*, lsn_t, bool): Assertion `!newest_lsn || fil_page_get_type(page)' failed.
      [ERROR] mysqld got signal 6 ;
       
      (rr) bt
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
      #1  0x00007ff322a288b1 in __GI_abort () at abort.c:79
      #2  0x00007ff322a1842a in __assert_fail_base (fmt=0x7ff322b9fa38 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x562dae4e99e0 "!newest_lsn || fil_page_get_type(page)", 
          file=file@entry=0x562dae4e9898 "/home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc", line=line@entry=786, 
          function=function@entry=0x562dae4ebac0 <buf_flush_init_for_writing(buf_block_t const*, unsigned char*, void*, unsigned long, bool)::__PRETTY_FUNCTION__> "void buf_flush_init_for_writing(const buf_block_t*, byte*, void*, lsn_t, bool)") at assert.c:92
      #3  0x00007ff322a184a2 in __GI___assert_fail (assertion=assertion@entry=0x562dae4e99e0 "!newest_lsn || fil_page_get_type(page)", file=file@entry=0x562dae4e9898 "/home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc", line=line@entry=786, 
          function=function@entry=0x562dae4ebac0 <buf_flush_init_for_writing(buf_block_t const*, unsigned char*, void*, unsigned long, bool)::__PRETTY_FUNCTION__> "void buf_flush_init_for_writing(const buf_block_t*, byte*, void*, lsn_t, bool)")
          at assert.c:101
      #4  0x0000562dade0a493 in buf_flush_init_for_writing (block=block@entry=0x7ff323a877c0, page=0x7ff323bd8000 "", page_zip_=0x0, newest_lsn=900817, use_full_checksum=<optimized out>) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:786
      #5  0x0000562dade0dbf0 in buf_flush_write_block_low (bpage=bpage@entry=0x7ff323a877c0, flush_type=flush_type@entry=BUF_FLUSH_LIST, sync=sync@entry=false) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1019
      #6  0x0000562dade10ace in buf_flush_page (buf_pool=buf_pool@entry=0x562db0ac68d0, bpage=bpage@entry=0x7ff323a877c0, flush_type=flush_type@entry=BUF_FLUSH_LIST, sync=sync@entry=false) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1197
      #7  0x0000562dade12877 in buf_flush_try_neighbors (page_id=page_id@entry=..., flush_type=flush_type@entry=BUF_FLUSH_LIST, n_flushed=4, n_to_flush=n_to_flush@entry=8) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1423
      #8  0x0000562dade12b2b in buf_flush_page_and_try_neighbors (bpage=bpage@entry=0x7ff323a877c0, flush_type=flush_type@entry=BUF_FLUSH_LIST, n_to_flush=n_to_flush@entry=8, count=count@entry=0x7ff303bfea38)
          at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1497
      #9  0x0000562dade12ddf in buf_do_flush_list_batch (buf_pool=buf_pool@entry=0x562db0ac68d0, min_n=min_n@entry=8, lsn_limit=lsn_limit@entry=18446744073709551615) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1756
      #10 0x0000562dade13a09 in buf_flush_batch (buf_pool=buf_pool@entry=0x562db0ac68d0, flush_type=flush_type@entry=BUF_FLUSH_LIST, min_n=min_n@entry=8, lsn_limit=lsn_limit@entry=18446744073709551615, n=n@entry=0x7ff303bfeca0)
          at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1826
      #11 0x0000562dade13d44 in buf_flush_do_batch (buf_pool=buf_pool@entry=0x562db0ac68d0, type=type@entry=BUF_FLUSH_LIST, min_n=8, lsn_limit=18446744073709551615, n=n@entry=0x7ff303bfeca0) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:1984
      #12 0x0000562dade1442a in pc_flush_slot () at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:2783
      #13 0x0000562dade159e3 in buf_flush_page_cleaner_coordinator () at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:3196
      #14 0x00007ff322de06db in start_thread (arg=0x7ff303bff700) at pthread_create.c:463
      #15 0x00007ff322b09a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      (rr)
       
      origin/bb-10.4-thiru a13662dac1df224c607983815b2fe85ee72749c2 2020-07-17T22:49:03+05:30
       
      RQG
      git clone https://github.com/mleich1/rqg --branch experimental RQG
      origin/experimental f3d72ce51b150ee1f52a93d18f2024af293d925e 2020-07-13T20:46:53+02:00
       
      perl rqg.pl \                        
      --grammar=conf/mariadb/table_stress_innodb_nocopy.yy \
      --gendata=conf/mariadb/table_stress.zz \
      --gendata_sql=conf/mariadb/table_stress.sql \
      --mysqld=--innodb_use_native_aio=0 \
      --mysqld=--innodb_lock_schedule_algorithm=fcfs \
      --mysqld=--loose-idle_write_transaction_timeout=0 \
      --mysqld=--loose-idle_transaction_timeout=0 \
      --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      --mysqld=--connect_timeout=60 \
      --mysqld=--interactive_timeout=28800 \
      --mysqld=--slave_net_timeout=60 \
      --mysqld=--net_read_timeout=30 \
      --mysqld=--net_write_timeout=60 \
      --mysqld=--loose-table_lock_wait_timeout=50 \
      --mysqld=--wait_timeout=28800 \
      --mysqld=--lock-wait-timeout=86400 \
      --mysqld=--innodb-lock-wait-timeout=50 \
      --no-mask \
      --queries=10000000 \
      --seed=random \
      --reporters=Backtrace \
      --reporters=ErrorLog \
      --reporters=Deadlock1 \
      --reporters=CrashRecovery1 \
      --validators=None \
      --mysqld=--log_output=none \
      --mysqld=--log-bin \
      --mysqld=--log_bin_trust_function_creators=1 \
      --mysqld=--loose-max-statement-time=30 \
      --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      --engine=InnoDB \
      --restart_timeout=120 \
      --duration=300 \
      --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      --threads=9 \
      --mysqld=--innodb_page_size=4K \
      --mysqld=--innodb-buffer-pool-size=5M \
      --duration=300 \
      --no_mask \
      --workdir=<local settings> \
      --vardir=<local settings> \
      --mtr-build-thread=<local settings> \
      --basedir1=<local settings> \
      --script_debug=_nix_ \
      --rr=Server \
      --rr_options=--chaos
      

      Attachments

        Issue Links

          Activity

            sdp:/RQG/storage/1595007801/TBR-617/dev/shm/vardir/1595007801/93/rr_trace
            _RR_TRACE_DIR="." rr replay --mark-stdio mysqld-0
            

            mleich Matthias Leich added a comment - sdp:/RQG/storage/1595007801/TBR-617/dev/shm/vardir/1595007801/93/rr_trace _RR_TRACE_DIR="." rr replay --mark-stdio mysqld-0

            My error tag for RQG
            [ 'TBR-617' , 'mysqld: .{1,150}buf0flu.cc:.{1,30}void buf_flush_init_for_writing.{1,200}: Assertion .\!newest_lsn \|\| fil_page_get_type\(page\). failed' ],
            

            mleich Matthias Leich added a comment - My error tag for RQG [ 'TBR-617' , 'mysqld: .{1,150}buf0flu.cc:.{1,30}void buf_flush_init_for_writing.{1,200}: Assertion .\!newest_lsn \|\| fil_page_get_type\(page\). failed' ],
            marko Marko Mäkelä added a comment - - edited

            It seems that the native ALTER TABLE in MariaDB 10.2.2 or later (probably due to inheriting this change in MySQL 5.7.8) is not entirely crash-safe, and the block allocation data structures in the .ibd file could be corrupted, or at the very least, allocated pages could be leaked (fail to be freed).

            At the time of the assertion failure, all innodb_page_size=4k bytes of the page are 0. InnoDB should never write such pages via the buffer pool page flushing interface. There is a block descriptor that is pointing to the all-zero page frame:

            bb-10.4-thiru a13662dac1df224c607983815b2fe85ee72749c2 (one commit ahead of 10.4 956f21c3b0beba89835e8d24fde4b67b589ab142)

            $1 = {id = {m_space = 6, m_page_no = 22}, hash = 0x0, buf_fix_count = {
                m_counter = {<std::__atomic_base<unsigned int>> = {
                    static _S_alignment = 4, _M_i = 0}, <No data fields>}}, 
              io_fix = BUF_IO_WRITE, state = BUF_BLOCK_FILE_PAGE, flush_type = 1, 
              buf_pool_index = 0, zip = {data = 0x0, m_start = 0, m_external = false, 
                m_end = 0, m_nonempty = 0, n_blobs = 0, ssize = 0}, real_size = 0, 
              slot = 0x0, in_page_hash = 1, in_zip_hash = 0, list = {
                prev = 0x7ff323a94a30, next = 0x7ff323a71840}, in_flush_list = 1, 
              in_free_list = 0, flush_observer = 0x0, newest_modification = 900817, 
              oldest_modification = 900809, LRU = {prev = 0x7ff323a963a0, 
                next = 0x7ff323a77270}, in_LRU_list = 1, old = 0, freed_page_clock = 1645, 
              access_time = 804496324, file_page_was_freed = 0}
            

            The all-zero page was added to the flush-list as part of a mini-transaction commit:

            (rr) when
            Current event: 120326
            (rr) bt
            …
            #6  0x0000562dadba0b9a in mtr_t::commit (this=this@entry=0x7ff3261c3070)
                at /home/mleich/bb-10.4-thiru/storage/innobase/mtr/mtr0mtr.cc:438
            #7  0x0000562dadcccdd8 in row_upd_clust_step (node=node@entry=0x7ff2d8066690, 
                thr=thr@entry=0x7ff2d8066e90)
                at /home/mleich/bb-10.4-thiru/storage/innobase/row/row0upd.cc:3131
            #8  0x0000562dadcce9ab in row_upd (node=node@entry=0x7ff2d8066690, 
                thr=thr@entry=0x7ff2d8066e90)
                at /home/mleich/bb-10.4-thiru/storage/innobase/row/row0upd.cc:3299
            #9  0x0000562dadccef3b in row_upd_step (thr=thr@entry=0x7ff2d8066e90)
                at /home/mleich/bb-10.4-thiru/storage/innobase/row/row0upd.cc:3443
            #10 0x0000562dadbf680f in que_thr_step (thr=0x7ff2d8066e90)
                at /home/mleich/bb-10.4-thiru/storage/innobase/que/que0que.cc:1021
            #11 que_run_threads_low (thr=thr@entry=0x7ff2d8066e90)
                at /home/mleich/bb-10.4-thiru/storage/innobase/que/que0que.cc:1101
            #12 0x0000562dadbf73f7 in que_run_threads (thr=0x7ff2d8066e90)
                at /home/mleich/bb-10.4-thiru/storage/innobase/que/que0que.cc:1141
            #13 0x0000562dadbf877a in que_eval_sql (info=info@entry=0x562da004b620, 
                sql=sql@entry=0x562dae4972c0 <row_merge_drop_index_dict(trx_t*, unsigned long)::sql> "PROCEDURE DROP_INDEX_PROC () IS\nBEGIN\nDELETE FROM SYS_FIELDS WHERE INDEX_ID=:indexid;\nDELETE FROM SYS_INDEXES WHERE ID=:indexid;\nEND;\n", 
            

            Earlier, the block had been subjected to a read:

            (rr) when
            Current event: 120322
            (rr) bt
            …
            #2  0x0000562daddedb61 in buf_page_init_for_read (
                err=err@entry=0x7ff3261c287c, mode=mode@entry=132, page_id=..., 
                zip_size=zip_size@entry=0, unzip=unzip@entry=false)
                at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0buf.cc:5422
            #3  0x0000562dade292af in buf_read_page_low (err=err@entry=0x7ff3261c287c, 
                sync=sync@entry=true, type=type@entry=0, mode=mode@entry=132, page_id=..., 
                zip_size=zip_size@entry=0, unzip=false, ignore_missing_space=false)
                at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0rea.cc:152
            …
            #6  0x0000562daddf2d21 in buf_page_get_gen (page_id=..., page_id@entry=..., 
                zip_size=zip_size@entry=0, rw_latch=rw_latch@entry=2, 
                guess=guess@entry=0x0, mode=mode@entry=10, 
                file=file@entry=0x562dae4cc368 "/home/mleich/bb-10.4-thiru/storage/innobase/btr/btr0btr.cc", line=1040, mtr=0x7ff3261c3070, err=0x0)
                at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0buf.cc:4942
            #7  0x0000562dadd650fe in btr_free_root_check (page_id=..., 
                zip_size=zip_size@entry=0, index_id=index_id@entry=55, 
                mtr=mtr@entry=0x7ff3261c3070)
                at /home/mleich/bb-10.4-thiru/storage/innobase/btr/btr0btr.cc:1039
            #8  0x0000562dadd699f6 in btr_free_if_exists (page_id=page_id@entry=..., 
                zip_size=zip_size@entry=0, index_id=index_id@entry=55, 
                mtr=mtr@entry=0x7ff3261c3070)
                at /home/mleich/bb-10.4-thiru/storage/innobase/btr/btr0btr.cc:1278
            #9  0x0000562dade414be in dict_drop_index_tree (rec=<optimized out>, 
                pcur=pcur@entry=0x7ff2d8066ae0, trx=trx@entry=0x6efc5ec4a830, 
                mtr=mtr@entry=0x7ff3261c3070)
            

            Yes, we are requesting the page 6:22 above. It is not coming from a read-ahead.

            As it was already reported in MDEV-22110, we would be unnecessarily adding the unmodified page to the flush list at mtr_t::commit().

            The index tree had been created earlier. Here is the write of the SYS_INDEXES.PAGE_NO:

            (rr) when
            Current event: 119607
            (rr) bt
            #0  mach_write_to_4 (n=22, b=0x7ff323c4acfd "")
                at /home/mleich/bb-10.4-thiru/storage/innobase/include/mach0data.ic:166
            #1  mlog_write_ulint (ptr=ptr@entry=0x7ff323c4acfd "", val=22, 
                type=type@entry=MLOG_4BYTES, mtr=mtr@entry=0x7ff3261c3850)
                at /home/mleich/bb-10.4-thiru/storage/innobase/mtr/mtr0log.cc:266
            #2  0x0000562dade43dbe in dict_create_index_tree_step (
                node=node@entry=0x562da004b620)
                at /home/mleich/bb-10.4-thiru/storage/innobase/dict/dict0crea.cc:863
            #3  0x0000562dade44382 in dict_create_index_step (thr=thr@entry=0x7ff2d8053e58)
                at /home/mleich/bb-10.4-thiru/storage/innobase/dict/dict0crea.cc:1272
            

            At that point of time, the root page is nonzero, and the block was in the flush list. It was later removed, apparently without actually writing the page, because the operation had been interrupted:

            (rr) when
            Current event: 120010
            (rr) bt
            #0  buf_flush_remove (bpage=bpage@entry=0x7ff323a76dd0)
                at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:597
            #1  0x0000562dade1b2b0 in buf_flush_or_remove_page (
                buf_pool=buf_pool@entry=0x562db0ac68d0, bpage=0x7ff323a76dd0, 
                flush=<optimized out>)
                at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0lru.cc:360
            #2  0x0000562dade1b97c in buf_flush_or_remove_pages (
                buf_pool=buf_pool@entry=0x562db0ac68d0, id=id@entry=6, 
                observer=observer@entry=0x7ff2d80199f0, first=first@entry=0)
                at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0lru.cc:436
            #3  0x0000562dade1ba8e in buf_flush_dirty_pages (buf_pool=0x562db0ac68d0, 
                id=id@entry=6, observer=observer@entry=0x7ff2d80199f0, first=first@entry=0)
                at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0lru.cc:530
            #4  0x0000562dade1c153 in buf_LRU_flush_or_remove_pages (id=6, 
                observer=observer@entry=0x7ff2d80199f0, first=first@entry=0)
                at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0lru.cc:561
            #5  0x0000562dade10249 in FlushObserver::flush (this=this@entry=0x7ff2d80199f0)
                at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:3767
            #6  0x0000562dadc5df55 in row_merge_build_indexes (trx=<optimized out>, 
                old_table=<optimized out>, new_table=0x7ff2cc092f70, 
                online=<optimized out>, indexes=<optimized out>, 
                key_numbers=<optimized out>, n_indexes=<optimized out>, 
                table=<optimized out>, defaults=<optimized out>, col_map=<optimized out>, 
            …
            (rr) frame 5
            #5  0x0000562dade10249 in FlushObserver::flush (this=this@entry=0x7ff2d80199f0)
                at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:3767
            3767		buf_LRU_flush_or_remove_pages(m_space->id, this);
            (rr) p *this
            $19 = {m_space = 0x7ff2d801e3b0, m_trx = 0x6efc5ec4dba8, 
              m_stage = 0x7ff2d805f400, m_flushed = 0x7ff2d404aee0, 
              m_removed = 0x7ff2d4040160, m_interrupted = true}
            

            Fixing MDEV-22110 would likely make this assertion failure go away.

            However, it seems very risky to discard the changes to the index root page in FlushObserver::flush(). In the best case, it could lead to pages never being freed in the tablespace. In the worst case, if we are also discarding changes to the page allocation data structures (which the BTR_SEG_TOP and BTR_SEG_LEAF are pointing to), this could lead to corrupting the .ibd file, especially if the server is killed soon after that, or mariabackup --backup is executing concurrently.

            marko Marko Mäkelä added a comment - - edited It seems that the native ALTER TABLE in MariaDB 10.2.2 or later (probably due to inheriting this change in MySQL 5.7.8 ) is not entirely crash-safe, and the block allocation data structures in the .ibd file could be corrupted, or at the very least, allocated pages could be leaked (fail to be freed). At the time of the assertion failure, all innodb_page_size=4k bytes of the page are 0. InnoDB should never write such pages via the buffer pool page flushing interface. There is a block descriptor that is pointing to the all-zero page frame: bb-10.4-thiru a13662dac1df224c607983815b2fe85ee72749c2 (one commit ahead of 10.4 956f21c3b0beba89835e8d24fde4b67b589ab142) $1 = {id = {m_space = 6, m_page_no = 22}, hash = 0x0, buf_fix_count = { m_counter = {<std::__atomic_base<unsigned int>> = { static _S_alignment = 4, _M_i = 0}, <No data fields>}}, io_fix = BUF_IO_WRITE, state = BUF_BLOCK_FILE_PAGE, flush_type = 1, buf_pool_index = 0, zip = {data = 0x0, m_start = 0, m_external = false, m_end = 0, m_nonempty = 0, n_blobs = 0, ssize = 0}, real_size = 0, slot = 0x0, in_page_hash = 1, in_zip_hash = 0, list = { prev = 0x7ff323a94a30, next = 0x7ff323a71840}, in_flush_list = 1, in_free_list = 0, flush_observer = 0x0, newest_modification = 900817, oldest_modification = 900809, LRU = {prev = 0x7ff323a963a0, next = 0x7ff323a77270}, in_LRU_list = 1, old = 0, freed_page_clock = 1645, access_time = 804496324, file_page_was_freed = 0} The all-zero page was added to the flush-list as part of a mini-transaction commit: (rr) when Current event: 120326 (rr) bt … #6 0x0000562dadba0b9a in mtr_t::commit (this=this@entry=0x7ff3261c3070) at /home/mleich/bb-10.4-thiru/storage/innobase/mtr/mtr0mtr.cc:438 #7 0x0000562dadcccdd8 in row_upd_clust_step (node=node@entry=0x7ff2d8066690, thr=thr@entry=0x7ff2d8066e90) at /home/mleich/bb-10.4-thiru/storage/innobase/row/row0upd.cc:3131 #8 0x0000562dadcce9ab in row_upd (node=node@entry=0x7ff2d8066690, thr=thr@entry=0x7ff2d8066e90) at /home/mleich/bb-10.4-thiru/storage/innobase/row/row0upd.cc:3299 #9 0x0000562dadccef3b in row_upd_step (thr=thr@entry=0x7ff2d8066e90) at /home/mleich/bb-10.4-thiru/storage/innobase/row/row0upd.cc:3443 #10 0x0000562dadbf680f in que_thr_step (thr=0x7ff2d8066e90) at /home/mleich/bb-10.4-thiru/storage/innobase/que/que0que.cc:1021 #11 que_run_threads_low (thr=thr@entry=0x7ff2d8066e90) at /home/mleich/bb-10.4-thiru/storage/innobase/que/que0que.cc:1101 #12 0x0000562dadbf73f7 in que_run_threads (thr=0x7ff2d8066e90) at /home/mleich/bb-10.4-thiru/storage/innobase/que/que0que.cc:1141 #13 0x0000562dadbf877a in que_eval_sql (info=info@entry=0x562da004b620, sql=sql@entry=0x562dae4972c0 <row_merge_drop_index_dict(trx_t*, unsigned long)::sql> "PROCEDURE DROP_INDEX_PROC () IS\nBEGIN\nDELETE FROM SYS_FIELDS WHERE INDEX_ID=:indexid;\nDELETE FROM SYS_INDEXES WHERE ID=:indexid;\nEND;\n", Earlier, the block had been subjected to a read: (rr) when Current event: 120322 (rr) bt … #2 0x0000562daddedb61 in buf_page_init_for_read ( err=err@entry=0x7ff3261c287c, mode=mode@entry=132, page_id=..., zip_size=zip_size@entry=0, unzip=unzip@entry=false) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0buf.cc:5422 #3 0x0000562dade292af in buf_read_page_low (err=err@entry=0x7ff3261c287c, sync=sync@entry=true, type=type@entry=0, mode=mode@entry=132, page_id=..., zip_size=zip_size@entry=0, unzip=false, ignore_missing_space=false) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0rea.cc:152 … #6 0x0000562daddf2d21 in buf_page_get_gen (page_id=..., page_id@entry=..., zip_size=zip_size@entry=0, rw_latch=rw_latch@entry=2, guess=guess@entry=0x0, mode=mode@entry=10, file=file@entry=0x562dae4cc368 "/home/mleich/bb-10.4-thiru/storage/innobase/btr/btr0btr.cc", line=1040, mtr=0x7ff3261c3070, err=0x0) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0buf.cc:4942 #7 0x0000562dadd650fe in btr_free_root_check (page_id=..., zip_size=zip_size@entry=0, index_id=index_id@entry=55, mtr=mtr@entry=0x7ff3261c3070) at /home/mleich/bb-10.4-thiru/storage/innobase/btr/btr0btr.cc:1039 #8 0x0000562dadd699f6 in btr_free_if_exists (page_id=page_id@entry=..., zip_size=zip_size@entry=0, index_id=index_id@entry=55, mtr=mtr@entry=0x7ff3261c3070) at /home/mleich/bb-10.4-thiru/storage/innobase/btr/btr0btr.cc:1278 #9 0x0000562dade414be in dict_drop_index_tree (rec=<optimized out>, pcur=pcur@entry=0x7ff2d8066ae0, trx=trx@entry=0x6efc5ec4a830, mtr=mtr@entry=0x7ff3261c3070) Yes, we are requesting the page 6:22 above. It is not coming from a read-ahead. As it was already reported in MDEV-22110 , we would be unnecessarily adding the unmodified page to the flush list at mtr_t::commit() . The index tree had been created earlier. Here is the write of the SYS_INDEXES.PAGE_NO : (rr) when Current event: 119607 (rr) bt #0 mach_write_to_4 (n=22, b=0x7ff323c4acfd "") at /home/mleich/bb-10.4-thiru/storage/innobase/include/mach0data.ic:166 #1 mlog_write_ulint (ptr=ptr@entry=0x7ff323c4acfd "", val=22, type=type@entry=MLOG_4BYTES, mtr=mtr@entry=0x7ff3261c3850) at /home/mleich/bb-10.4-thiru/storage/innobase/mtr/mtr0log.cc:266 #2 0x0000562dade43dbe in dict_create_index_tree_step ( node=node@entry=0x562da004b620) at /home/mleich/bb-10.4-thiru/storage/innobase/dict/dict0crea.cc:863 #3 0x0000562dade44382 in dict_create_index_step (thr=thr@entry=0x7ff2d8053e58) at /home/mleich/bb-10.4-thiru/storage/innobase/dict/dict0crea.cc:1272 At that point of time, the root page is nonzero, and the block was in the flush list. It was later removed, apparently without actually writing the page, because the operation had been interrupted: (rr) when Current event: 120010 (rr) bt #0 buf_flush_remove (bpage=bpage@entry=0x7ff323a76dd0) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:597 #1 0x0000562dade1b2b0 in buf_flush_or_remove_page ( buf_pool=buf_pool@entry=0x562db0ac68d0, bpage=0x7ff323a76dd0, flush=<optimized out>) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0lru.cc:360 #2 0x0000562dade1b97c in buf_flush_or_remove_pages ( buf_pool=buf_pool@entry=0x562db0ac68d0, id=id@entry=6, observer=observer@entry=0x7ff2d80199f0, first=first@entry=0) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0lru.cc:436 #3 0x0000562dade1ba8e in buf_flush_dirty_pages (buf_pool=0x562db0ac68d0, id=id@entry=6, observer=observer@entry=0x7ff2d80199f0, first=first@entry=0) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0lru.cc:530 #4 0x0000562dade1c153 in buf_LRU_flush_or_remove_pages (id=6, observer=observer@entry=0x7ff2d80199f0, first=first@entry=0) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0lru.cc:561 #5 0x0000562dade10249 in FlushObserver::flush (this=this@entry=0x7ff2d80199f0) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:3767 #6 0x0000562dadc5df55 in row_merge_build_indexes (trx=<optimized out>, old_table=<optimized out>, new_table=0x7ff2cc092f70, online=<optimized out>, indexes=<optimized out>, key_numbers=<optimized out>, n_indexes=<optimized out>, table=<optimized out>, defaults=<optimized out>, col_map=<optimized out>, … (rr) frame 5 #5 0x0000562dade10249 in FlushObserver::flush (this=this@entry=0x7ff2d80199f0) at /home/mleich/bb-10.4-thiru/storage/innobase/buf/buf0flu.cc:3767 3767 buf_LRU_flush_or_remove_pages(m_space->id, this); (rr) p *this $19 = {m_space = 0x7ff2d801e3b0, m_trx = 0x6efc5ec4dba8, m_stage = 0x7ff2d805f400, m_flushed = 0x7ff2d404aee0, m_removed = 0x7ff2d4040160, m_interrupted = true} Fixing MDEV-22110 would likely make this assertion failure go away. However, it seems very risky to discard the changes to the index root page in FlushObserver::flush() . In the best case, it could lead to pages never being freed in the tablespace. In the worst case, if we are also discarding changes to the page allocation data structures (which the BTR_SEG_TOP and BTR_SEG_LEAF are pointing to), this could lead to corrupting the .ibd file, especially if the server is killed soon after that, or mariabackup --backup is executing concurrently.

            mleich, did you ever repeat this on 10.5?

            Can you repeat this bug on any of 10.2,10.3,10.4 with innodb_log_optimize_ddl=OFF? I am thinking of whether we should change that parameter to that safe value by default. My microbenchmarks in MDEV-19747 suggest that the parameter should not yield any performance benefit on a properly configured system. Note: also MDEV-20755 seems to be related to that unsafe default value.

            marko Marko Mäkelä added a comment - mleich , did you ever repeat this on 10.5? Can you repeat this bug on any of 10.2,10.3,10.4 with innodb_log_optimize_ddl=OFF ? I am thinking of whether we should change that parameter to that safe value by default. My microbenchmarks in MDEV-19747 suggest that the parameter should not yield any performance benefit on a properly configured system. Note: also MDEV-20755 seems to be related to that unsafe default value.

            I cannot tell if I ever repeated this on 10.5.
            The files with aggregated testing results were lost when the access to some testing box stopped.
            There was no replay of that problem in the last three weeks.

            mleich Matthias Leich added a comment - I cannot tell if I ever repeated this on 10.5. The files with aggregated testing results were lost when the access to some testing box stopped. There was no replay of that problem in the last three weeks.

            The attempt to replay the problem and simplify the test resulted in 2 replays per > 6100 RQG runs.
            So roughly solid revealing that innodb_log_optimize_ddl=OFF fixes the problem would require 15000 till 20000 RQG runs.
            Therefore I give up except Marko insists in making that huge testing campaign.

            mleich Matthias Leich added a comment - The attempt to replay the problem and simplify the test resulted in 2 replays per > 6100 RQG runs. So roughly solid revealing that innodb_log_optimize_ddl=OFF fixes the problem would require 15000 till 20000 RQG runs. Therefore I give up except Marko insists in making that huge testing campaign.

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.