  1. MariaDB Server
  2. MDEV-35227

Executing CHECK TABLE...EXTENDED right after server startup may attempt to access too old history

Details

    Description

      While testing MDEV-34466, mleich produced an rr replay trace of a failure.

      ssh pluto
      rr replay /data/results/1728920463/MB-1782A/1_clone/rr/latest-trace
      

      A running server had been backed up and the backup restored. There is exactly one incomplete transaction that needs to be rolled back:

      2024-10-14 17:46:03 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 953 row operations to undo
      2024-10-14 17:46:03 0 [Note] InnoDB: Trx id counter is 885
      ...
      mariadbd: /data/Server/10.6-MDEV-34466/storage/innobase/trx/trx0rec.cc:510: const byte* trx_undo_rec_get_pars(const trx_undo_rec_t*, byte*, byte*, bool*, undo_no_t*, table_id_t*): Assertion `*table_id' failed.
      

      The assertion fails during the execution of CHECK TABLE...EXTENDED, because trx_undo_prev_version_build() keeps fetching previous record versions that are not safe to access. The newest version of that record belongs to the recovered transaction 779 (0x30b), which is being rolled back. None of the older versions should exist in purge_sys.view or purge_sys.end_view.

      The reason for this failure is that purge_sys.end_view.m_low_limit_id had been reset from 885 to 0, because both purge_sys.head and purge_sys.tail are 0,0 at that point in time:

      #0  0x000055c1808858da in purge_sys_t::clone_oldest_view<true> (this=<optimized out>) at /data/Server/10.6-MDEV-34466/storage/innobase/include/trx0purge.h:427
      #1  trx_lists_init_at_db_start () at /data/Server/10.6-MDEV-34466/storage/innobase/trx/trx0trx.cc:794
      #2  0x000055c1808400a0 in srv_start (create_new_db=<optimized out>) at /data/Server/10.6-MDEV-34466/storage/innobase/srv/srv0start.cc:1514
      #3  0x000055c1805ed787 in innodb_init (p=<optimized out>) at /data/Server/10.6-MDEV-34466/storage/innobase/handler/ha_innodb.cc:4317
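
To make the effect of the reset from 885 to 0 concrete, here is a deliberately simplified model. It is not the InnoDB source. It assumes that a record version's history may be followed only while the change is not visible in the purge end view, and it ignores the list of active transaction ids that a real read view carries. ToyView, changes_visible, may_follow_history and the transaction id 700 are hypothetical, introduced only for this illustration.

```cpp
#include <cstdint>

// Toy stand-in for a read view. Only the upper bound is modelled;
// a real view also tracks the ids of active transactions.
struct ToyView {
  std::uint64_t low_limit_id;  // changes by ids >= this are not visible

  constexpr bool changes_visible(std::uint64_t trx_id) const {
    return trx_id < low_limit_id;
  }
};

// Assumption: a change that is visible in the purge end view may
// already have been purged, so its history must not be followed.
constexpr bool may_follow_history(const ToyView& end_view,
                                  std::uint64_t trx_id) {
  return !end_view.changes_visible(trx_id);
}

constexpr ToyView expected_view{885};  // the trx id counter from the log
constexpr ToyView reset_view{0};       // the value observed in the trace

// 700 is a made-up id older than the recovered transaction 779.
static_assert(!may_follow_history(expected_view, 700),
              "with limit 885, old history is treated as unsafe");
static_assert(may_follow_history(reset_view, 700),
              "with limit 0, no change is visible, so nothing is refused");
```

Under this model, a limit of 0 makes every transaction id look newer than the view, which would be consistent with trx_undo_prev_version_build() continuing to fetch versions that it should have refused.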
      

      During the execution, trx_purge() was invoked in the purge_coordinator_callback exactly once, and that invocation is still in progress at the time the assertion fails. The rollback of the incomplete transaction 779 is also in progress.

      I think that at the end of the purge batch, purge_sys.end_view should be adjusted to something better. Because the purge_sys.end_view is only used by CHECK TABLE...EXTENDED, this bug should be limited to executing that statement soon after starting the server (which is what our stress tests do).

      To fix this, I think that we must resolve the mystery of MDEV-22718.

            People

              vlad.lesin Vladislav Lesin
              marko Marko Mäkelä