Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-29321

Percona XtraDB 5.7 can't be upgrade to MariaDB 10.6 or above

Details

    Description

      If one first installs Percona XtraDB 5.7 and after that tries to upgrade to MariaDB 10.6 server SigSEGV with signal 11. Backtrace for the crash is:

      #0  trx_rseg_t::get (err=0x7ffeecd0122c, mtr=0x7ffeecd01840, this=0x5576091a1100 <trx_sys+16768>) at ./storage/innobase/trx/trx0rseg.cc:293
      #1  trx_rseg_update_wsrep_checkpoint (xid=<optimized out>, mtr=0x7ffeecd01840) at ./storage/innobase/trx/trx0rseg.cc:155
      #2  0x0000557608821946 in trx_rseg_array_init () at ./storage/innobase/trx/trx0rseg.cc:638
      #3  0x0000557608827d29 in trx_lists_init_at_db_start () at ./storage/innobase/trx/trx0trx.cc:718
      #4  0x00005576080f7cff in srv_start (create_new_db=<optimized out>) at ./storage/innobase/srv/srv0start.cc:1470
      #5  0x0000557608718611 in innodb_init (p=<optimized out>) at ./storage/innobase/handler/ha_innodb.cc:4317
      #6  0x000055760845a8b4 in ha_initialize_handlerton (plugin=0x55760b43cb20) at ./sql/handler.cc:659
      #7  0x000055760824712b in plugin_initialize (tmp_root=tmp_root@entry=0x7ffeecd072e0, plugin=plugin@entry=0x55760b43cb20, argc=argc@entry=0x55760922c7a0 <remaining_argc>, argv=argv@entry=0x55760b404930, 
          options_only=options_only@entry=false) at ./sql/sql_plugin.cc:1465
      #8  0x0000557608248423 in plugin_init (argc=argc@entry=0x55760922c7a0 <remaining_argc>, argv=<optimized out>, flags=0) at ./sql/sql_plugin.cc:1758
      #9  0x00005576081711cd in init_server_components () at ./sql/mysqld.cc:5091
      #10 0x0000557608176d27 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at ./sql/mysqld.cc:5708
      #11 0x00007fa7d6ddc09b in __libc_start_main (main=0x55760811dd90 <main(int, char**)>, argc=2, argv=0x7ffeecd09d48, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
          stack_end=0x7ffeecd09d38) at ../csu/libc-start.c:308
      #12 0x000055760816b21a in _start () at ./sql/mysqld.cc:4521
      

      This is probably something to WSRep (as in backtrace) but when runnign Percona XtraDB there is some additional WSRep plugin which is not available when running plain MariaDB.

      /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep-provider=/usr/lib/galera3/libgalera_smm.so --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
      

      This bug has happen between current release and release before that. To test this it's easiest to use Podman or Docker Debian 10 (Buster) image and run script

      sed -i "s/101/0/g" -i /usr/sbin/policy-rc.d
      echo -e '#!/bin/sh\necho "N 5"' > /sbin/runlevel
      chmod +x /sbin/runlevel
      echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
      apt-get update
      apt-get install --no-install-recommends --yes gpg gpg-agent dirmngr ca-certificates
      apt-key adv --recv-keys --keyserver hkps://keyserver.ubuntu.com:443 9334A25F8507EFA5
      echo "deb https://repo.percona.com/apt/ buster main" > /etc/apt/sources.list.d/mysql.list
      apt-get update
      apt-get install -y percona-xtradb-cluster-full-57 percona-xtrabackup-24 percona-toolkit pmm2-client
       
      apt-get install apt-transport-https curl
      curl -o /etc/apt/trusted.gpg.d/mariadb_release_signing_key.asc 'https://mariadb.org/mariadb_release_signing_key.asc'
      sh -c "echo 'deb https://mirrors.xtom.ee/mariadb/repo/10.6/debian buster main' >>/etc/apt/sources.list"
       
      apt-get update
      apt-get install mariadb-server
      

      Server crashes event with `--help`

      root@0912b3e1389f:/build# mariadbd --user=root --help --verbose
      2022-08-18  7:48:44 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2022-08-18  7:48:44 0 [Note] InnoDB: Number of pools: 1
      2022-08-18  7:48:44 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2022-08-18  7:48:44 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)
      2022-08-18  7:48:44 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk size = 134217728
      2022-08-18  7:48:44 0 [Note] InnoDB: Completed initialization of buffer pool
      220818  7:48:44 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.6.10-MariaDB-1:10.6.10+maria~deb10
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=0
      max_threads=153
      thread_count=0
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467959 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x49000
      Printing to addr2line failed
      mariadbd(my_print_stacktrace+0x2e)[0x562d6424bc0e]
      mariadbd(handle_fatal_signal+0x485)[0x562d63d31215]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7fef84314730]
      mariadbd(+0xd724e3)[0x562d640f44e3]
      mariadbd(+0xd793c6)[0x562d640fb3c6]
      mariadbd(+0xd7f7a9)[0x562d641017a9]
      mariadbd(+0x64f95f)[0x562d639d195f]
      mariadbd(+0xc70371)[0x562d63ff2371]
      mariadbd(_Z24ha_initialize_handlertonP13st_plugin_int+0x84)[0x562d63d345b4]
      mariadbd(+0x79ee2b)[0x562d63b20e2b]
      mariadbd(_Z11plugin_initPiPPci+0xaac)[0x562d63b2218c]
      mariadbd(+0x6c8ecd)[0x562d63a4aecd]
      mariadbd(_Z11mysqld_mainiPPc+0x3f7)[0x562d63a50a27]
      /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fef83e5809b]
      mariadbd(_start+0x2a)[0x562d63a44f1a]
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /var/lib/mysql
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units     
      Max cpu time              unlimited            unlimited            seconds   
      Max file size             unlimited            unlimited            bytes     
      Max data size             unlimited            unlimited            bytes     
      Max stack size            8388608              unlimited            bytes     
      Max core file size        unlimited            unlimited            bytes     
      Max resident set          unlimited            unlimited            bytes     
      Max processes             unlimited            unlimited            processes 
      Max open files            1048576              1048576              files     
      Max locked memory         65536                65536                bytes     
      Max address space         unlimited            unlimited            bytes     
      Max file locks            unlimited            unlimited            locks     
      Max pending signals       11539                11539                signals   
      Max msgqueue size         819200               819200               bytes     
      Max nice priority         0                    0                    
      Max realtime priority     0                    0                    
      Max realtime timeout      unlimited            unlimited            us        
      Core pattern: |/bin/false
       
      Kernel version: Linux version 5.14.21-150400.22-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.37.20211103-150100.7.29) #1 SMP PREEMPT_DYNAMIC Wed May 11 06:57:18 UTC 2022 (49db222)
       
      Segmentation fault (core dumped)
      

      Attachments

        Issue Links

          Activity

            The crash occurs in trx_rseg_format_upgrade(). I don’t think that this code has substantially changed since 10.5 (MDEV-12353), or 10.4 (MDEV-17138 introduced a MLOG_MEMSET record), or 10.3 (MDEV-12288).

            illuusio, what do you mean by "between current release and release before that"? What would happen with the recently released 10.6.9? Does the server start up if Galera is not being used? (Are you sure that you are using the correct version of Galera? Starting with MariaDB Server 10.4.2, libgalera4 is supposed to be used.

            marko Marko Mäkelä added a comment - The crash occurs in trx_rseg_format_upgrade() . I don’t think that this code has substantially changed since 10.5 ( MDEV-12353 ), or 10.4 ( MDEV-17138 introduced a MLOG_MEMSET record), or 10.3 ( MDEV-12288 ). illuusio , what do you mean by "between current release and release before that"? What would happen with the recently released 10.6.9? Does the server start up if Galera is not being used? (Are you sure that you are using the correct version of Galera? Starting with MariaDB Server 10.4.2 , libgalera4 is supposed to be used.

            This test if for Debian 10 (Buster) and Percona XtraDB 5.7 requires Galera 3.x and MariaDB 10.6 requires Galera 4.x (which is available on Backports).

            I'll do test build with https://salsa.debian.org/mariadb-team/mariadb-server/-/commit/f4d06db1293c3ec340ae2be6347971274ba38160 (Debian 10.6.8 commit) in Salsa-CI when it was working. It could be also something that Percona has made during this time.

            This run is 2 months ago https://salsa.debian.org/mariadb-team/mariadb-server/-/jobs/2798656 and it works as expected.

            illuusio Tuukka Pasanen added a comment - This test if for Debian 10 (Buster) and Percona XtraDB 5.7 requires Galera 3.x and MariaDB 10.6 requires Galera 4.x (which is available on Backports). I'll do test build with https://salsa.debian.org/mariadb-team/mariadb-server/-/commit/f4d06db1293c3ec340ae2be6347971274ba38160 (Debian 10.6.8 commit) in Salsa-CI when it was working. It could be also something that Percona has made during this time. This run is 2 months ago https://salsa.debian.org/mariadb-team/mariadb-server/-/jobs/2798656 and it works as expected.

            I do not see any message like InnoDB: Upgrading redo log:. There should have been one, unless the ib_logfile* were deleted before MariaDB was started.

            Please make sure that the log files are not deleted. Potentially, the XtraDB server hung on shutdown and was forcibly killed. If the log files are deleted (or innodb_force_recovery=6 is used), basically all bets are off.

            marko Marko Mäkelä added a comment - I do not see any message like InnoDB: Upgrading redo log: . There should have been one, unless the ib_logfile* were deleted before MariaDB was started. Please make sure that the log files are not deleted. Potentially, the XtraDB server hung on shutdown and was forcibly killed. If the log files are deleted (or innodb_force_recovery=6 is used), basically all bets are off.

            I attached mysql dir before MariaDB is upgraded and `mysql.log` fom Percona XtraDB launch

            illuusio Tuukka Pasanen added a comment - I attached mysql dir before MariaDB is upgraded and `mysql.log` fom Percona XtraDB launch

            I can repeat this on the varlibmysql-perconaxtradb57.tar.gz (which is actually in xz format, tar xJf). A message about the redo log upgrade was not written yet, but recv_log_recover_10_4() did recognize the logs as being logically empty (clean).

            This failure is related to an incompatible change that was made in MySQL 5.7 but not in MariaDB (see MDEV-12288).

            10.6 af552f2903b9764e3bc0634a0037c39d1a837f4f

            #2  trx_rseg_update_wsrep_checkpoint (xid=xid@entry=0x7ffd16bf1948, mtr=mtr@entry=0x7ffd16bf1490) at /mariadb/10.6/storage/innobase/trx/trx0rseg.cc:155
            155	      if (buf_block_t* block= trx_sys.rseg_array[rseg_id].get(mtr, &err))
            (rr) p rseg_id
            $2 = 1
            (rr) p trx_sys.rseg_array[1]
            $3 = {space = 0x0, latch = {pfs_psi = 0x0, lock = {writer = {lock = std::atomic<unsigned int> = { 0 }}, readers = std::atomic<unsigned int> = { 0 }}}, page_no = 0, history_size = 0, 
              ref = std::atomic<unsigned int> = { 0 }, curr_size = 0, undo_list = {count = 0, start = 0x0, end = 0x0, node = NULL, init = 0}, undo_cached = {count = 0, start = 0x0, end = 0x0, node = NULL, init = 0}, 
              last_page_no = 0, last_commit_and_offset = 0}
            

            We skipped this rollback segment in trx_rseg_array_init() because the page number was FIL_NULL.

            Some logic was changed in MDEV-13542. The fix is simple:

            diff --git a/storage/innobase/trx/trx0rseg.cc b/storage/innobase/trx/trx0rseg.cc
            index 34ec21d6013..760c4e707ce 100644
            --- a/storage/innobase/trx/trx0rseg.cc
            +++ b/storage/innobase/trx/trx0rseg.cc
            @@ -290,6 +290,11 @@ bool trx_rseg_read_wsrep_checkpoint(XID& xid)
             
             buf_block_t *trx_rseg_t::get(mtr_t *mtr, dberr_t *err) const
             {
            +  if (!space)
            +  {
            +    if (err) *err= DB_TABLESPACE_NOT_FOUND;
            +    return nullptr;
            +  }
               return buf_page_get_gen(page_id(), 0, RW_X_LATCH, nullptr,
                                       BUF_GET, mtr, err);
             }
            @@ -435,6 +440,8 @@ static dberr_t trx_undo_lists_init(trx_rseg_t *rseg, trx_id_t &max_trx_id,
             static dberr_t trx_rseg_mem_restore(trx_rseg_t *rseg, trx_id_t &max_trx_id,
                                                 mtr_t *mtr)
             {
            +  if (!rseg->space)
            +    return DB_TABLESPACE_NOT_FOUND;
               dberr_t err;
               const buf_block_t *rseg_hdr=
                 buf_page_get_gen(rseg->page_id(), 0, RW_S_LATCH, nullptr, BUF_GET, mtr,
            

            With that, the server starts up:

            10.6 af552f2903b9764e3bc0634a0037c39d1a837f4f with patch

            2022-08-18 15:58:01 0 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
            2022-08-18 15:58:01 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
            2022-08-18 15:58:01 0 [Note] InnoDB: Using transactional memory
            2022-08-18 15:58:01 0 [Note] InnoDB: Number of pools: 1
            2022-08-18 15:58:01 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
            2022-08-18 15:58:01 0 [Note] InnoDB: Using liburing
            2022-08-18 15:58:01 0 [Note] InnoDB: Initializing buffer pool, total size = 3221225472, chunk size = 134217728
            2022-08-18 15:58:01 0 [Note] InnoDB: Completed initialization of buffer pool
            2022-08-18 15:58:01 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
            2022-08-18 15:58:01 0 [Note] InnoDB: Upgrading redo log: 4294967296 bytes; LSN=2757622
            2022-08-18 15:58:01 0 [Note] InnoDB: Starting to delete and rewrite log file.
            

            marko Marko Mäkelä added a comment - I can repeat this on the varlibmysql-perconaxtradb57.tar.gz (which is actually in xz format, tar xJf ). A message about the redo log upgrade was not written yet, but recv_log_recover_10_4() did recognize the logs as being logically empty (clean). This failure is related to an incompatible change that was made in MySQL 5.7 but not in MariaDB (see MDEV-12288 ). 10.6 af552f2903b9764e3bc0634a0037c39d1a837f4f #2 trx_rseg_update_wsrep_checkpoint (xid=xid@entry=0x7ffd16bf1948, mtr=mtr@entry=0x7ffd16bf1490) at /mariadb/10.6/storage/innobase/trx/trx0rseg.cc:155 155 if (buf_block_t* block= trx_sys.rseg_array[rseg_id].get(mtr, &err)) (rr) p rseg_id $2 = 1 (rr) p trx_sys.rseg_array[1] $3 = {space = 0x0, latch = {pfs_psi = 0x0, lock = {writer = {lock = std::atomic<unsigned int> = { 0 }}, readers = std::atomic<unsigned int> = { 0 }}}, page_no = 0, history_size = 0, ref = std::atomic<unsigned int> = { 0 }, curr_size = 0, undo_list = {count = 0, start = 0x0, end = 0x0, node = NULL, init = 0}, undo_cached = {count = 0, start = 0x0, end = 0x0, node = NULL, init = 0}, last_page_no = 0, last_commit_and_offset = 0} We skipped this rollback segment in trx_rseg_array_init() because the page number was FIL_NULL . Some logic was changed in MDEV-13542 . The fix is simple: diff --git a/storage/innobase/trx/trx0rseg.cc b/storage/innobase/trx/trx0rseg.cc index 34ec21d6013..760c4e707ce 100644 --- a/storage/innobase/trx/trx0rseg.cc +++ b/storage/innobase/trx/trx0rseg.cc @@ -290,6 +290,11 @@ bool trx_rseg_read_wsrep_checkpoint(XID& xid) buf_block_t *trx_rseg_t::get(mtr_t *mtr, dberr_t *err) const { + if (!space) + { + if (err) *err= DB_TABLESPACE_NOT_FOUND; + return nullptr; + } return buf_page_get_gen(page_id(), 0, RW_X_LATCH, nullptr, BUF_GET, mtr, err); } @@ -435,6 +440,8 @@ static dberr_t trx_undo_lists_init(trx_rseg_t *rseg, trx_id_t &max_trx_id, static dberr_t trx_rseg_mem_restore(trx_rseg_t *rseg, trx_id_t &max_trx_id, mtr_t *mtr) { + if (!rseg->space) + return DB_TABLESPACE_NOT_FOUND; dberr_t err; const buf_block_t *rseg_hdr= buf_page_get_gen(rseg->page_id(), 0, RW_S_LATCH, nullptr, BUF_GET, mtr, With that, the server starts up: 10.6 af552f2903b9764e3bc0634a0037c39d1a837f4f with patch 2022-08-18 15:58:01 0 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!! 2022-08-18 15:58:01 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 2022-08-18 15:58:01 0 [Note] InnoDB: Using transactional memory 2022-08-18 15:58:01 0 [Note] InnoDB: Number of pools: 1 2022-08-18 15:58:01 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions 2022-08-18 15:58:01 0 [Note] InnoDB: Using liburing 2022-08-18 15:58:01 0 [Note] InnoDB: Initializing buffer pool, total size = 3221225472, chunk size = 134217728 2022-08-18 15:58:01 0 [Note] InnoDB: Completed initialization of buffer pool 2022-08-18 15:58:01 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed 2022-08-18 15:58:01 0 [Note] InnoDB: Upgrading redo log: 4294967296 bytes; LSN=2757622 2022-08-18 15:58:01 0 [Note] InnoDB: Starting to delete and rewrite log file.
            illuusio Tuukka Pasanen added a comment - This is fixed now: https://salsa.debian.org/illuusio/mariadb-server/-/jobs/3122466

            People

              marko Marko Mäkelä
              illuusio Tuukka Pasanen
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.