Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24578

MariaDB 10.5 fails to join Galera cluster of MariaDB 10.1, root cause SST failure, symptom is InnoDB: Assertion failure

    XMLWordPrintable

Details

    Description

      (This was originally filed as https://jira.mariadb.org/browse/MDEV-24162, but mistakenly marked as duplicate of MDEV-23672, which it is not)

      While trying to join an existing Galera cluster of two MariaDB 10.1.48 nodes with a fresh install of MariaDB 10.5.8, the maria-backup part runs fine but once all files have been transferred and the mariadbd tries to start it fails on:

      tammi 12 20:20:51 mariadbd[63031]: WSREP_SST: [INFO] Evaluating /usr//bin/mariabackup  --move-back --no-version-check   --move-back --force-non-empty-directories --target-dir=${DATA} &> /var/lib/mysql//mariabackup.move.log (20210112 20:20:51.467)
      tammi 12 20:20:51 mariadbd[63031]: WSREP_SST: [INFO] Move successful, removing /var/lib/mysql//.sst (20210112 20:20:51.742)
      tammi 12 20:20:51 mariadbd[63031]: WSREP_SST: [INFO] Galera co-ords from recovery: 7f74fdea-cda3-11e9-ae2f-ab188b9a01f2:339777552 (20210112 20:20:51.752)
      tammi 12 20:20:51 mariadbd[63031]: WSREP_SST: [INFO] Total time on joiner: 0 seconds (20210112 20:20:51.756)
      tammi 12 20:20:51 mariadbd[63031]: WSREP_SST: [INFO] Removing the sst_in_progress file (20210112 20:20:51.759)
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 3 [Note] WSREP: SST received
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 3 [Note] WSREP: Server status change joiner -> initializing
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 0 [Note] InnoDB: Uses event mutexes
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 0 [Note] InnoDB: Number of pools: 1
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 0 [Note] InnoDB: Using Linux native AIO
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 0 [Note] InnoDB: Initializing buffer pool, total size = 2147483648, chunk size = 134217728
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 0 [Note] InnoDB: Completed initialization of buffer pool
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 100663296 bytes
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
      tammi 12 20:20:51 mariadbd[63031]: 2021-01-12 20:20:51 0 [Note] InnoDB: New log file created, LSN=1586283788623
      tammi 12 20:20:52 mariadbd[63031]: 2021-01-12 20:20:52 0 [Note] InnoDB: 128 rollback segments are active.
      tammi 12 20:20:52 mariadbd[63031]: 2021-01-12 20:20:52 0 [Note] InnoDB: Creating sys_virtual system tables.
      tammi 12 20:20:52 mariadbd[63031]: 2021-01-12 20:20:52 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      tammi 12 20:20:52 mariadbd[63031]: 2021-01-12 20:20:52 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      tammi 12 20:20:52 mariadbd[63031]: 2021-01-12 20:20:52 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
      tammi 12 20:20:52 mariadbd[63031]: 2021-01-12 20:20:52 0 [Note] InnoDB: 10.5.9 started; log sequence number 0; transaction id 8247886884634323262
      tammi 12 20:20:52 mariadbd[63031]: 2021-01-12 20:20:52 0x7fd1737fe700  InnoDB: Assertion failure in file /build/mariadb-10.5-onlwrq/mariadb-10.5-10.5.9+maria~bionic~ubuntu18.04.1~1610445096.8de233af815.10.5/storage/innobase/include/fut0lst.h line 127
      tammi 12 20:20:52 mariadbd[63031]: InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA
      tammi 12 20:20:52 mariadbd[63031]: InnoDB: We intentionally generate a memory trap.
      tammi 12 20:20:52 mariadbd[63031]: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      tammi 12 20:20:52 mariadbd[63031]: InnoDB: If you get repeated assertion failures or crashes, even
      tammi 12 20:20:52 mariadbd[63031]: InnoDB: immediately after the mysqld startup, there may be
      tammi 12 20:20:52 mariadbd[63031]: InnoDB: corruption in the InnoDB tablespace. Please refer to
      tammi 12 20:20:52 mariadbd[63031]: InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
      tammi 12 20:20:52 mariadbd[63031]: InnoDB: about forcing recovery.
      tammi 12 20:20:52 mariadbd[63031]: 210112 20:20:52 [ERROR] mysqld got signal 6 ;
      

      Because the daemon is managed by systemd, it automatically tries to restart. The second start fails on a slightly different error:

      tammi 12 20:21:03 sh[6248]: 2021-01-12 20:21:00 0 [ERROR] InnoDB: Page [page id: space=0, page number=975] log sequence number 1587610736226 is in the future! Current system log sequence number 1586283836448.
      tammi 12 20:21:03 sh[6248]: 2021-01-12 20:21:00 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
      tammi 12 20:21:03 sh[6248]: 2021-01-12 20:21:00 0 [ERROR] InnoDB: Page [page id: space=0, page number=977] log sequence number 1586526112239 is in the future! Current system log sequence number 1586283836448.
      tammi 12 20:21:03 sh[6248]: 2021-01-12 20:21:00 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
      tammi 12 20:21:03 sh[6248]: 2021-01-12 20:21:00 0x7f3eeaf0d800  InnoDB: Assertion failure in file /build/mariadb-10.5-onlwrq/mariadb-10.5-10.5.9+maria~bionic~ubuntu18.04.1~1610445096.8de233af815.10.5/storage/innobase/dict/dict0load.cc line 1816
      tammi 12 20:21:03 sh[6248]: InnoDB: Failing assertion: btr_pcur_is_on_user_rec(&pcur)
      

      Any further restarts repeat this latter error.

      Full journald output from both runs attached.

      I have also produced core dumps of both crashes and they are attached.

      In first process 63031 there is:

      #10 0x0000559d3cbdb3fe in ut_dbg_assertion_failed (expr=expr@entry=0x559d3d712a10 "addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA", file=file@entry=0x559d3d712980 "/build/mariadb-10.5-onlwrq/mariadb-10.5-10.5.9+maria~bionic~ubuntu18.04.1~1610445096.8de233af815.10.5/storage/innobase/include/fut0lst.h", line=line@entry=127) at ./storage/innobase/ut/ut0dbg.cc:60
      No locals.
      #11 0x0000559d3d2f7756 in flst_read_addr (faddr=<optimized out>) at ./storage/innobase/include/fut0lst.h:127
              addr = <optimized out>
              addr = <optimized out>
      #12 flst_get_prev_addr (node=<optimized out>) at ./storage/innobase/include/fut0lst.h:153
      No locals.
      #13 trx_purge_rseg_get_next_history_log (n_pages_handled=n_pages_handled@entry=0x7fd1737fdc78) at ./storage/innobase/trx/trx0purge.cc:856
              prev_log_addr = <optimized out>
              trx_no = <optimized out>
              mtr = {m_last = 0x0, m_last_offset = 0, m_log_mode = 0, m_modifications = 0, m_made_dirty = 0, m_inside_ibuf = 0, m_trim_pages = 0, m_memo = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7fd1737fd278, prev = 0x7fd1737fd278}}, size_ = 1}, m_size = 16, m_first_block = {<ilist_node<void>> = {next = 0x7fd1737fd258, prev = 0x7fd1737fd258}, m_data = "\000l\000\004\322\177\000\000\001", '\000' <repeats 490 times>, m_used = 16}}, m_log = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7fd1737fd4a8, prev = 0x7fd1737fd4a8}}, size_ = 1}, m_size = 0, m_first_block = {<ilist_node<void>> = {next = 0x7fd1737fd488, prev = 0x7fd1737fd488}, m_data = '\000' <repeats 496 times>..., m_used = 0}}, m_user_space = 0x0, m_commit_lsn = 0, m_freed_space = 0x0, m_freed_pages = 0x0}
              undo_page = <optimized out>
              log_hdr = <optimized out>
              empty = <optimized out>
              purge = <optimized out>
      

      The second process has:

      #6  0x000055a9f78373fe in ut_dbg_assertion_failed (expr=expr@entry=0x55a9f83a20e8 "btr_pcur_is_on_user_rec(&pcur)", file=file@entry=0x55a9f83a1848 "/build/mariadb-10.5-onlwrq/mariadb-10.5-10.5.9+maria~bionic~ubuntu18.04.1~1610445096.8de233af815.10.5/storage/innobase/dict/dict0load.cc", line=line@entry=1816) at ./storage/innobase/ut/ut0dbg.cc:60
      No locals.
      #7  0x000055a9f800d623 in dict_load_columns (heap=0x55a9fc21f518, table=0x55a9fc20f480) at ./storage/innobase/dict/dict0load.cc:1816
              err_msg = <optimized out>
              name = 0x0
              nth_v_col = 18446744073709551615
              sys_columns = <optimized out>
      

      The failure happened originally on 10.5.8. I created new builds based on latest 10.5 HEAD with debug symbols etc to ensure that this bug is in fact not fixed in git yet, but exists and if 10.5.9 was released today, it would still fail like this. The attached README.txt includes notes on how I created the debug build with symbols and installed it etc procedure.

      Attachments

        1. gdb-63031.log
          167 kB
          Otto Kekäläinen
        2. gdb-6520.log
          41 kB
          Otto Kekäläinen
        3. journald-mariadb.txt
          176 kB
          Otto Kekäläinen
        4. README.txt
          3 kB
          Otto Kekäläinen

        Issue Links

          Activity

            People

              jplindst Jan Lindström (Inactive)
              otto Otto Kekäläinen
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.