[MDEV-29321] Percona XtraDB 5.7 can't be upgrade to MariaDB 10.6 or above Created: 2022-08-18  Updated: 2022-09-05  Resolved: 2022-08-23

Status: Closed
Project: MariaDB Server
Component/s: Galera, Storage Engine - InnoDB
Affects Version/s: 10.6.8, 10.6
Fix Version/s: 10.6.10, 10.7.6, 10.8.5, 10.9.3, 10.10.2

Type: Bug Priority: Major
Reporter: Tuukka Pasanen Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: Buster, Debian, innodb

Attachments: File mysqld-perconaxtradb57.log.xz     File varlibmysql-perconaxtradb57.tar.gz    
Issue Links:
Problem/Incident
is caused by MDEV-13542 Crashing on a corrupted page is unhel... Closed

 Description   

If one first installs Percona XtraDB 5.7 and after that tries to upgrade to MariaDB 10.6 server SigSEGV with signal 11. Backtrace for the crash is:

#0  trx_rseg_t::get (err=0x7ffeecd0122c, mtr=0x7ffeecd01840, this=0x5576091a1100 <trx_sys+16768>) at ./storage/innobase/trx/trx0rseg.cc:293
#1  trx_rseg_update_wsrep_checkpoint (xid=<optimized out>, mtr=0x7ffeecd01840) at ./storage/innobase/trx/trx0rseg.cc:155
#2  0x0000557608821946 in trx_rseg_array_init () at ./storage/innobase/trx/trx0rseg.cc:638
#3  0x0000557608827d29 in trx_lists_init_at_db_start () at ./storage/innobase/trx/trx0trx.cc:718
#4  0x00005576080f7cff in srv_start (create_new_db=<optimized out>) at ./storage/innobase/srv/srv0start.cc:1470
#5  0x0000557608718611 in innodb_init (p=<optimized out>) at ./storage/innobase/handler/ha_innodb.cc:4317
#6  0x000055760845a8b4 in ha_initialize_handlerton (plugin=0x55760b43cb20) at ./sql/handler.cc:659
#7  0x000055760824712b in plugin_initialize (tmp_root=tmp_root@entry=0x7ffeecd072e0, plugin=plugin@entry=0x55760b43cb20, argc=argc@entry=0x55760922c7a0 <remaining_argc>, argv=argv@entry=0x55760b404930, 
    options_only=options_only@entry=false) at ./sql/sql_plugin.cc:1465
#8  0x0000557608248423 in plugin_init (argc=argc@entry=0x55760922c7a0 <remaining_argc>, argv=<optimized out>, flags=0) at ./sql/sql_plugin.cc:1758
#9  0x00005576081711cd in init_server_components () at ./sql/mysqld.cc:5091
#10 0x0000557608176d27 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at ./sql/mysqld.cc:5708
#11 0x00007fa7d6ddc09b in __libc_start_main (main=0x55760811dd90 <main(int, char**)>, argc=2, argv=0x7ffeecd09d48, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7ffeecd09d38) at ../csu/libc-start.c:308
#12 0x000055760816b21a in _start () at ./sql/mysqld.cc:4521

This is probably something to WSRep (as in backtrace) but when runnign Percona XtraDB there is some additional WSRep plugin which is not available when running plain MariaDB.

/usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep-provider=/usr/lib/galera3/libgalera_smm.so --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1

This bug has happen between current release and release before that. To test this it's easiest to use Podman or Docker Debian 10 (Buster) image and run script

sed -i "s/101/0/g" -i /usr/sbin/policy-rc.d
echo -e '#!/bin/sh\necho "N 5"' > /sbin/runlevel
chmod +x /sbin/runlevel
echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
apt-get update
apt-get install --no-install-recommends --yes gpg gpg-agent dirmngr ca-certificates
apt-key adv --recv-keys --keyserver hkps://keyserver.ubuntu.com:443 9334A25F8507EFA5
echo "deb https://repo.percona.com/apt/ buster main" > /etc/apt/sources.list.d/mysql.list
apt-get update
apt-get install -y percona-xtradb-cluster-full-57 percona-xtrabackup-24 percona-toolkit pmm2-client
 
apt-get install apt-transport-https curl
curl -o /etc/apt/trusted.gpg.d/mariadb_release_signing_key.asc 'https://mariadb.org/mariadb_release_signing_key.asc'
sh -c "echo 'deb https://mirrors.xtom.ee/mariadb/repo/10.6/debian buster main' >>/etc/apt/sources.list"
 
apt-get update
apt-get install mariadb-server

Server crashes event with `--help`

root@0912b3e1389f:/build# mariadbd --user=root --help --verbose
2022-08-18  7:48:44 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-08-18  7:48:44 0 [Note] InnoDB: Number of pools: 1
2022-08-18  7:48:44 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-08-18  7:48:44 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)
2022-08-18  7:48:44 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk size = 134217728
2022-08-18  7:48:44 0 [Note] InnoDB: Completed initialization of buffer pool
220818  7:48:44 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.6.10-MariaDB-1:10.6.10+maria~deb10
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467959 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
Printing to addr2line failed
mariadbd(my_print_stacktrace+0x2e)[0x562d6424bc0e]
mariadbd(handle_fatal_signal+0x485)[0x562d63d31215]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7fef84314730]
mariadbd(+0xd724e3)[0x562d640f44e3]
mariadbd(+0xd793c6)[0x562d640fb3c6]
mariadbd(+0xd7f7a9)[0x562d641017a9]
mariadbd(+0x64f95f)[0x562d639d195f]
mariadbd(+0xc70371)[0x562d63ff2371]
mariadbd(_Z24ha_initialize_handlertonP13st_plugin_int+0x84)[0x562d63d345b4]
mariadbd(+0x79ee2b)[0x562d63b20e2b]
mariadbd(_Z11plugin_initPiPPci+0xaac)[0x562d63b2218c]
mariadbd(+0x6c8ecd)[0x562d63a4aecd]
mariadbd(_Z11mysqld_mainiPPc+0x3f7)[0x562d63a50a27]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fef83e5809b]
mariadbd(_start+0x2a)[0x562d63a44f1a]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             unlimited            unlimited            processes 
Max open files            1048576              1048576              files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       11539                11539                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: |/bin/false
 
Kernel version: Linux version 5.14.21-150400.22-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.37.20211103-150100.7.29) #1 SMP PREEMPT_DYNAMIC Wed May 11 06:57:18 UTC 2022 (49db222)
 
Segmentation fault (core dumped)



 Comments   
Comment by Marko Mäkelä [ 2022-08-18 ]

The crash occurs in trx_rseg_format_upgrade(). I don’t think that this code has substantially changed since 10.5 (MDEV-12353), or 10.4 (MDEV-17138 introduced a MLOG_MEMSET record), or 10.3 (MDEV-12288).

illuusio, what do you mean by "between current release and release before that"? What would happen with the recently released 10.6.9? Does the server start up if Galera is not being used? (Are you sure that you are using the correct version of Galera? Starting with MariaDB Server 10.4.2, libgalera4 is supposed to be used.

Comment by Tuukka Pasanen [ 2022-08-18 ]

This test if for Debian 10 (Buster) and Percona XtraDB 5.7 requires Galera 3.x and MariaDB 10.6 requires Galera 4.x (which is available on Backports).

I'll do test build with https://salsa.debian.org/mariadb-team/mariadb-server/-/commit/f4d06db1293c3ec340ae2be6347971274ba38160 (Debian 10.6.8 commit) in Salsa-CI when it was working. It could be also something that Percona has made during this time.

This run is 2 months ago https://salsa.debian.org/mariadb-team/mariadb-server/-/jobs/2798656 and it works as expected.

Comment by Marko Mäkelä [ 2022-08-18 ]

I do not see any message like InnoDB: Upgrading redo log:. There should have been one, unless the ib_logfile* were deleted before MariaDB was started.

Please make sure that the log files are not deleted. Potentially, the XtraDB server hung on shutdown and was forcibly killed. If the log files are deleted (or innodb_force_recovery=6 is used), basically all bets are off.

Comment by Tuukka Pasanen [ 2022-08-18 ]

I attached mysql dir before MariaDB is upgraded and `mysql.log` fom Percona XtraDB launch

Comment by Marko Mäkelä [ 2022-08-18 ]

I can repeat this on the varlibmysql-perconaxtradb57.tar.gz (which is actually in xz format, tar xJf). A message about the redo log upgrade was not written yet, but recv_log_recover_10_4() did recognize the logs as being logically empty (clean).

This failure is related to an incompatible change that was made in MySQL 5.7 but not in MariaDB (see MDEV-12288).

10.6 af552f2903b9764e3bc0634a0037c39d1a837f4f

#2  trx_rseg_update_wsrep_checkpoint (xid=xid@entry=0x7ffd16bf1948, mtr=mtr@entry=0x7ffd16bf1490) at /mariadb/10.6/storage/innobase/trx/trx0rseg.cc:155
155	      if (buf_block_t* block= trx_sys.rseg_array[rseg_id].get(mtr, &err))
(rr) p rseg_id
$2 = 1
(rr) p trx_sys.rseg_array[1]
$3 = {space = 0x0, latch = {pfs_psi = 0x0, lock = {writer = {lock = std::atomic<unsigned int> = { 0 }}, readers = std::atomic<unsigned int> = { 0 }}}, page_no = 0, history_size = 0, 
  ref = std::atomic<unsigned int> = { 0 }, curr_size = 0, undo_list = {count = 0, start = 0x0, end = 0x0, node = NULL, init = 0}, undo_cached = {count = 0, start = 0x0, end = 0x0, node = NULL, init = 0}, 
  last_page_no = 0, last_commit_and_offset = 0}

We skipped this rollback segment in trx_rseg_array_init() because the page number was FIL_NULL.

Some logic was changed in MDEV-13542. The fix is simple:

diff --git a/storage/innobase/trx/trx0rseg.cc b/storage/innobase/trx/trx0rseg.cc
index 34ec21d6013..760c4e707ce 100644
--- a/storage/innobase/trx/trx0rseg.cc
+++ b/storage/innobase/trx/trx0rseg.cc
@@ -290,6 +290,11 @@ bool trx_rseg_read_wsrep_checkpoint(XID& xid)
 
 buf_block_t *trx_rseg_t::get(mtr_t *mtr, dberr_t *err) const
 {
+  if (!space)
+  {
+    if (err) *err= DB_TABLESPACE_NOT_FOUND;
+    return nullptr;
+  }
   return buf_page_get_gen(page_id(), 0, RW_X_LATCH, nullptr,
                           BUF_GET, mtr, err);
 }
@@ -435,6 +440,8 @@ static dberr_t trx_undo_lists_init(trx_rseg_t *rseg, trx_id_t &max_trx_id,
 static dberr_t trx_rseg_mem_restore(trx_rseg_t *rseg, trx_id_t &max_trx_id,
                                     mtr_t *mtr)
 {
+  if (!rseg->space)
+    return DB_TABLESPACE_NOT_FOUND;
   dberr_t err;
   const buf_block_t *rseg_hdr=
     buf_page_get_gen(rseg->page_id(), 0, RW_S_LATCH, nullptr, BUF_GET, mtr,

With that, the server starts up:

10.6 af552f2903b9764e3bc0634a0037c39d1a837f4f with patch

2022-08-18 15:58:01 0 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
2022-08-18 15:58:01 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-08-18 15:58:01 0 [Note] InnoDB: Using transactional memory
2022-08-18 15:58:01 0 [Note] InnoDB: Number of pools: 1
2022-08-18 15:58:01 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-08-18 15:58:01 0 [Note] InnoDB: Using liburing
2022-08-18 15:58:01 0 [Note] InnoDB: Initializing buffer pool, total size = 3221225472, chunk size = 134217728
2022-08-18 15:58:01 0 [Note] InnoDB: Completed initialization of buffer pool
2022-08-18 15:58:01 0 [Note] InnoDB: Setting O_DIRECT on file ./ibdata1 failed
2022-08-18 15:58:01 0 [Note] InnoDB: Upgrading redo log: 4294967296 bytes; LSN=2757622
2022-08-18 15:58:01 0 [Note] InnoDB: Starting to delete and rewrite log file.

Comment by Tuukka Pasanen [ 2022-08-19 ]

This is fixed now: https://salsa.debian.org/illuusio/mariadb-server/-/jobs/3122466

Generated at Thu Feb 08 10:07:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.