Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33819

The purge of committed history is mis-parsing some log

Details

    Description

      While the testing of MDEV-33588, mleich encountered an rr replay trace where the purge_coordinator_task is trying to parse arbitrary data in the middle of an undo log record. I narrowed it down to the following code in purge_sys_t::choose_next_log():

          const trx_undo_rec_t *undo_rec=
            trx_undo_page_get_first_rec(b, hdr_page_no, hdr_offset);
      

      Here, b is undo log page 13, and hdr_page_no and hdr_offset are related to that. I think that we would benefit from some testing with the following patch to catch the root cause of the corruption sooner:

      @@ -134,8 +134,9 @@ trx_undo_page_get_first_rec(const buf_block_t *block, uint32_t page_no,
                                   uint16_t offset)
       {
         uint16_t start= trx_undo_page_get_start(block, page_no, offset);
      -  return start == trx_undo_page_get_end(block, page_no, offset)
      -    ? nullptr : block->page.frame + start;
      +  uint16_t end= trx_undo_page_get_end(block, page_no, offset);
      +  ut_ad(start <= end);
      +  return start >= end ? nullptr : block->page.frame + start;
       }
       
       /** Get the last undo log record on a page.
      

      In the trace that I analyzed, we would be reading an invalid start value that points outside the page frame.

      Attachments

        Issue Links

          Activity

            ssh pluto
            rr replay /data/results/1711541897/TBR-2112/1/rr/latest-trace/
            

            At when 343065 the undo log header in page 13 was initialized in trx_undo_header_create() inside trx_undo_reuse_cached(). Slightly after that, at the same event counter, there is a call to buf_page_create() to create the undo log page 14. At when 348171 this undo log page 14 will be evicted from the buffer pool. At when 348202 it is being loaded back intact, with FIL_PAGE_LSN being 0x029c5eb8. The assertion fails, because trx_undo_page_get_start() is reading the value 0x2a98 from undo page 13. The bytes around that had been written by the following:

              mtr->memcpy<mtr_t::MAYBE_NOP>(*undo_page, free + TRX_UNDO_NEEDS_PURGE +
                                            undo_page->page.frame, buf, 4);
            

            The contents of buf is 0, 1, 0x2a, 0x98, so TRX_UNDO_LOG_START is really intended to contain the value 0x2a98, which is later being read. I think that I must check what happens to this undo log page 13 between the time it was written and the parsing fails. The analysis is complicated by the fact that also this page is being evicted from the buffer pool and reloaded again.

            marko Marko Mäkelä added a comment - ssh pluto rr replay /data/results/1711541897/TBR-2112/1/rr/latest-trace/ At when 343065 the undo log header in page 13 was initialized in trx_undo_header_create() inside trx_undo_reuse_cached() . Slightly after that, at the same event counter, there is a call to buf_page_create() to create the undo log page 14. At when 348171 this undo log page 14 will be evicted from the buffer pool. At when 348202 it is being loaded back intact, with FIL_PAGE_LSN being 0x029c5eb8. The assertion fails, because trx_undo_page_get_start() is reading the value 0x2a98 from undo page 13. The bytes around that had been written by the following: mtr-> memcpy <mtr_t::MAYBE_NOP>(*undo_page, free + TRX_UNDO_NEEDS_PURGE + undo_page->page.frame, buf, 4); The contents of buf is 0, 1, 0x2a, 0x98, so TRX_UNDO_LOG_START is really intended to contain the value 0x2a98, which is later being read. I think that I must check what happens to this undo log page 13 between the time it was written and the parsing fails. The analysis is complicated by the fact that also this page is being evicted from the buffer pool and reloaded again.

            The undo page 13 was evicted from the buffer pool at when 348199 and read back in identical state at when 348201. The contents of the page remained unchanged until the assertion failure at when 348216. The problematic offset 0x2a98 was read from 10718+TRX_UNDO_LOG_START = 0x29f0. That part of the page 13 did not change while some surroundings did:

             0029b0 04 01 00 00 00 48 01 6c 49 04 5b fe 00 00 4a 05
             0029c0 80 00 00 00 03 4e 01 75 52 01 73 53 01 05 54 f0
             0029d0 ff ff ff ff 00 06 00 00 00 00 02 25 1e b5 00 00
            -0029e0 00 00 00 00 02 0e 00 00 00 00 00 00 00 00 00 01
            +0029e0 00 00 00 00 02 0e 00 00 00 00 00 00 02 15 00 01
             0029f0 2a 98 00 00 00 00 00 00 00 00 00 00 00 00 00 56
            -002a00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
            +002a00 ff ff ff ff 00 00 00 00 00 0d 00 78 00 00 00 00
            +002a10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             *
            

            Finally, I seem to have found the problem: We are actually passing the offset that is valid for undo page 13, to undo page 14:

            Thread 13 hit Breakpoint 6, trx_undo_page_get_start (block=block@entry=0x546f108f0af0, page_no=page_no@entry=13, offset=10718) at /data/Server/10.6-MDEV-33588/storage/innobase/include/buf0types.h:135
            135	  constexpr uint32_t page_no() const { return static_cast<uint32_t>(m_id); }
            (rr) when
            Current event: 348201
            (rr) c
            Continuing.
             
            Thread 13 hit Breakpoint 6, trx_undo_page_get_start (block=block@entry=0x546f108dc6c0, page_no=page_no@entry=14, offset=10718) at /data/Server/10.6-MDEV-33588/storage/innobase/include/buf0types.h:135
            135	  constexpr uint32_t page_no() const { return static_cast<uint32_t>(m_id); }
            (rr) when
            Current event: 348202
            (rr) bt
            #0  trx_undo_page_get_start (block=block@entry=0x546f108dc6c0, page_no=page_no@entry=14, offset=10718) at /data/Server/10.6-MDEV-33588/storage/innobase/include/buf0types.h:135
            #1  0x000055c52cac2aa0 in trx_undo_page_get_first_rec (block=0x546f108dc6c0, page_no=14, offset=<optimized out>) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0undo.cc:136
            #2  0x000055c52ca947a8 in purge_sys_t::choose_next_log (this=this@entry=0x55c52e064dc0 <purge_sys>) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0purge.cc:967
            #3  0x000055c52ca9b63c in purge_sys_t::get_next_rec (this=this@entry=0x55c52e064dc0 <purge_sys>, roll_ptr=roll_ptr@entry=30962247439089936) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0purge.cc:1051
            #4  0x000055c52ca94cf9 in purge_sys_t::fetch_next_rec (this=0x55c52e064dc0 <purge_sys>) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0purge.cc:1091
            #5  trx_purge_attach_undo_recs (n_purge_threads=n_purge_threads@entry=4, thd=thd@entry=0x55c52ed28668) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0purge.cc:1325
            #6  0x000055c52ca9554d in trx_purge (n_tasks=n_tasks@entry=4, history_size=62) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0purge.cc:1449
            #7  0x000055c52ca83e11 in purge_coordinator_state::do_purge (this=this@entry=0x55c52e063f20 <purge_state>) at /data/Server/10.6-MDEV-33588/storage/innobase/srv/srv0srv.cc:1638
            #8  0x000055c52ca82e73 in purge_coordinator_callback () at /data/Server/10.6-MDEV-33588/storage/innobase/srv/srv0srv.cc:1716
            

            marko Marko Mäkelä added a comment - The undo page 13 was evicted from the buffer pool at when 348199 and read back in identical state at when 348201. The contents of the page remained unchanged until the assertion failure at when 348216. The problematic offset 0x2a98 was read from 10718+ TRX_UNDO_LOG_START = 0x29f0. That part of the page 13 did not change while some surroundings did: 0029b0 04 01 00 00 00 48 01 6c 49 04 5b fe 00 00 4a 05 0029c0 80 00 00 00 03 4e 01 75 52 01 73 53 01 05 54 f0 0029d0 ff ff ff ff 00 06 00 00 00 00 02 25 1e b5 00 00 -0029e0 00 00 00 00 02 0e 00 00 00 00 00 00 00 00 00 01 +0029e0 00 00 00 00 02 0e 00 00 00 00 00 00 02 15 00 01 0029f0 2a 98 00 00 00 00 00 00 00 00 00 00 00 00 00 56 -002a00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 +002a00 ff ff ff ff 00 00 00 00 00 0d 00 78 00 00 00 00 +002a10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * Finally, I seem to have found the problem: We are actually passing the offset that is valid for undo page 13, to undo page 14: Thread 13 hit Breakpoint 6, trx_undo_page_get_start (block=block@entry=0x546f108f0af0, page_no=page_no@entry=13, offset=10718) at /data/Server/10.6-MDEV-33588/storage/innobase/include/buf0types.h:135 135 constexpr uint32_t page_no() const { return static_cast<uint32_t>(m_id); } (rr) when Current event: 348201 (rr) c Continuing.   Thread 13 hit Breakpoint 6, trx_undo_page_get_start (block=block@entry=0x546f108dc6c0, page_no=page_no@entry=14, offset=10718) at /data/Server/10.6-MDEV-33588/storage/innobase/include/buf0types.h:135 135 constexpr uint32_t page_no() const { return static_cast<uint32_t>(m_id); } (rr) when Current event: 348202 (rr) bt #0 trx_undo_page_get_start (block=block@entry=0x546f108dc6c0, page_no=page_no@entry=14, offset=10718) at /data/Server/10.6-MDEV-33588/storage/innobase/include/buf0types.h:135 #1 0x000055c52cac2aa0 in trx_undo_page_get_first_rec (block=0x546f108dc6c0, page_no=14, offset=<optimized out>) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0undo.cc:136 #2 0x000055c52ca947a8 in purge_sys_t::choose_next_log (this=this@entry=0x55c52e064dc0 <purge_sys>) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0purge.cc:967 #3 0x000055c52ca9b63c in purge_sys_t::get_next_rec (this=this@entry=0x55c52e064dc0 <purge_sys>, roll_ptr=roll_ptr@entry=30962247439089936) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0purge.cc:1051 #4 0x000055c52ca94cf9 in purge_sys_t::fetch_next_rec (this=0x55c52e064dc0 <purge_sys>) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0purge.cc:1091 #5 trx_purge_attach_undo_recs (n_purge_threads=n_purge_threads@entry=4, thd=thd@entry=0x55c52ed28668) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0purge.cc:1325 #6 0x000055c52ca9554d in trx_purge (n_tasks=n_tasks@entry=4, history_size=62) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0purge.cc:1449 #7 0x000055c52ca83e11 in purge_coordinator_state::do_purge (this=this@entry=0x55c52e063f20 <purge_state>) at /data/Server/10.6-MDEV-33588/storage/innobase/srv/srv0srv.cc:1638 #8 0x000055c52ca82e73 in purge_coordinator_callback () at /data/Server/10.6-MDEV-33588/storage/innobase/srv/srv0srv.cc:1716

            The penultimate trx_undo_page_get_first_rec() call on page 13 returns nullptr, because the pointer at offset+TRX_UNDO_LOG_START is equal to the pointer at TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_FREE and the pointer at offset+TRX_UNDO_NEXT_LOG is 0. This logic seems to be the same in older versions; I checked MariaDB Server 10.4.

            The problem turns out to be isolated to the following part of purge_sys_t::choose_next_log():

                const trx_undo_rec_t *undo_rec=
                  trx_undo_page_get_first_rec(b, hdr_page_no, hdr_offset);
                if (!undo_rec)
                {
                  if (mach_read_from_2(b->page.frame + hdr_offset + TRX_UNDO_NEXT_LOG))
                    goto purge_nothing;
                  const uint32_t next=
                    mach_read_from_4(TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_NODE +
                                     FLST_NEXT + FIL_ADDR_PAGE + b->page.frame);
                  if (next == FIL_NULL)
                    goto purge_nothing;
                  id.set_page_no(next);
                  b= get_page(id);
                  if (!b)
                    goto purge_nothing;
                  undo_rec=
                    trx_undo_page_get_first_rec(b, page_no, hdr_offset);
            

            The last quoted line is passing the incorrect parameter purge_sys.page_no (14) instead of purge_sys.hdr_page_no, which would be 13. This regression was introduced in MDEV-32050. Before that change, there were 2 calls to trx_undo_page_get_first_rec() in the code base: One in trx_purge_read_undo_rec(), which was merged to the above code snippet, and another in trx_undo_truncate_start(), which would be invoked on the header page number in trx_purge_truncate_rseg_history(). The latter was renamed to purge_sys_t::iterator::free_history_rseg() in MDEV-33213.

            As far as I can tell, only the above code fails to pass the undo log header page number.

            I would have expected this to be caught earlier, for example, by MDEV-24402 CHECK TABLE…EXTENDED returning warnings about records that should have been purged. The reason could be that such warnings have been suppressed due to the open bug MDEV-29823 in this area.

            marko Marko Mäkelä added a comment - The penultimate trx_undo_page_get_first_rec() call on page 13 returns nullptr , because the pointer at offset+ TRX_UNDO_LOG_START is equal to the pointer at TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_FREE and the pointer at offset+ TRX_UNDO_NEXT_LOG is 0. This logic seems to be the same in older versions; I checked MariaDB Server 10.4. The problem turns out to be isolated to the following part of purge_sys_t::choose_next_log() : const trx_undo_rec_t *undo_rec= trx_undo_page_get_first_rec(b, hdr_page_no, hdr_offset); if (!undo_rec) { if (mach_read_from_2(b->page.frame + hdr_offset + TRX_UNDO_NEXT_LOG)) goto purge_nothing; const uint32_t next= mach_read_from_4(TRX_UNDO_PAGE_HDR + TRX_UNDO_PAGE_NODE + FLST_NEXT + FIL_ADDR_PAGE + b->page.frame); if (next == FIL_NULL) goto purge_nothing; id.set_page_no(next); b= get_page(id); if (!b) goto purge_nothing; undo_rec= trx_undo_page_get_first_rec(b, page_no, hdr_offset); The last quoted line is passing the incorrect parameter purge_sys.page_no (14) instead of purge_sys.hdr_page_no , which would be 13. This regression was introduced in MDEV-32050 . Before that change, there were 2 calls to trx_undo_page_get_first_rec() in the code base: One in trx_purge_read_undo_rec() , which was merged to the above code snippet, and another in trx_undo_truncate_start() , which would be invoked on the header page number in trx_purge_truncate_rseg_history() . The latter was renamed to purge_sys_t::iterator::free_history_rseg() in MDEV-33213 . As far as I can tell, only the above code fails to pass the undo log header page number. I would have expected this to be caught earlier, for example, by MDEV-24402 CHECK TABLE…EXTENDED returning warnings about records that should have been purged. The reason could be that such warnings have been suppressed due to the open bug MDEV-29823 in this area.

            For the record, here is the assertion failure that caught this bug:

            mariadbd: /data/Server/10.6-MDEV-33588/storage/innobase/include/mach0data.inl:539: ib_uint64_t mach_read_next_much_compressed(const byte**): Assertion `val == 0xFF' failed.
            …
            #7  0x000055c52ca902dd in mach_read_next_much_compressed (b=b@entry=0x7d4d2bc7da50) at /data/Server/10.6-MDEV-33588/storage/innobase/include/mach0data.inl:539
            #8  0x000055c52ca9509a in trx_undo_rec_get_table_id (rec=<optimized out>) at /data/Server/10.6-MDEV-33588/storage/innobase/include/trx0rec.h:298
            #9  trx_purge_attach_undo_recs (n_purge_threads=n_purge_threads@entry=4, thd=thd@entry=0x55c52ed28668) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0purge.cc:1335
            #10 0x000055c52ca9554d in trx_purge (n_tasks=n_tasks@entry=4, history_size=62) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0purge.cc:1449
            #11 0x000055c52ca83e11 in purge_coordinator_state::do_purge (this=this@entry=0x55c52e063f20 <purge_state>) at /data/Server/10.6-MDEV-33588/storage/innobase/srv/srv0srv.cc:1638
            #12 0x000055c52ca82e73 in purge_coordinator_callback () at /data/Server/10.6-MDEV-33588/storage/innobase/srv/srv0srv.cc:1716
            

            marko Marko Mäkelä added a comment - For the record, here is the assertion failure that caught this bug: mariadbd: /data/Server/10.6-MDEV-33588/storage/innobase/include/mach0data.inl:539: ib_uint64_t mach_read_next_much_compressed(const byte**): Assertion `val == 0xFF' failed. … #7 0x000055c52ca902dd in mach_read_next_much_compressed (b=b@entry=0x7d4d2bc7da50) at /data/Server/10.6-MDEV-33588/storage/innobase/include/mach0data.inl:539 #8 0x000055c52ca9509a in trx_undo_rec_get_table_id (rec=<optimized out>) at /data/Server/10.6-MDEV-33588/storage/innobase/include/trx0rec.h:298 #9 trx_purge_attach_undo_recs (n_purge_threads=n_purge_threads@entry=4, thd=thd@entry=0x55c52ed28668) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0purge.cc:1335 #10 0x000055c52ca9554d in trx_purge (n_tasks=n_tasks@entry=4, history_size=62) at /data/Server/10.6-MDEV-33588/storage/innobase/trx/trx0purge.cc:1449 #11 0x000055c52ca83e11 in purge_coordinator_state::do_purge (this=this@entry=0x55c52e063f20 <purge_state>) at /data/Server/10.6-MDEV-33588/storage/innobase/srv/srv0srv.cc:1638 #12 0x000055c52ca82e73 in purge_coordinator_callback () at /data/Server/10.6-MDEV-33588/storage/innobase/srv/srv0srv.cc:1716

            origin/st-10.6-MDEV-33588-MDEV-33819 ff6d6cca9542387ed1099006d93afbc7858b9fce 2024-04-05T14:19:45+03:00
            performed well in RQG testing.

            mleich Matthias Leich added a comment - origin/st-10.6- MDEV-33588 - MDEV-33819 ff6d6cca9542387ed1099006d93afbc7858b9fce 2024-04-05T14:19:45+03:00 performed well in RQG testing.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.