[MDEV-17947] mariabackup SST encounters InnoDB tablespace errors during prepare, ignores them, server crashes Created: 2018-12-09  Updated: 2020-08-25  Resolved: 2019-07-06

Status: Closed
Project: MariaDB Server
Component/s: Galera, Galera SST, mariabackup, Storage Engine - InnoDB, wsrep
Affects Version/s: 10.1.35
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Geoff Montee (Inactive) Assignee: Jan Lindström (Inactive)
Resolution: Won't Fix Votes: 1
Labels: galera, innodb, mariabackup, sst, wsrep

Issue Links:
Relates
relates to MDEV-13542 Crashing on a corrupted page is unhel... Closed
relates to MDEV-16791 mariabackup : allow consistent backup... Closed

 Description   

A user ran into an issue where during the prepare stage of a mariabackup SST, mariabackup showed errors like the following:

181207 07:48:32 innobackupex: Starting the apply-log operation
 
IMPORTANT: Please check that the apply-log run completes successfully.
           At the end of a successful apply-log run innobackupex
           prints "completed OK!".
 
--innobackupex based on MariaDB server 10.1.35-MariaDB Linux (x86_64) 
mariabackup: cd to /mariadb_data/data/.sst/
Loading encryption plugin
	 Encryption plugin parameter :  '--file_key_management_encryption_algorithm=aes_ctr'
	 Encryption plugin parameter :  '--file_key_management_filekey=FILE:/mariadb_data/conf/certs/.key'
	 Encryption plugin parameter :  '--file_key_management_filename=/mariadb_data/conf/certs/private.enc'
	 Encryption plugin parameter :  '--apply-log'
	 Encryption plugin parameter :  '/mariadb_data/data//.sst'
mariabackup: This target seems to be not prepared yet.
mariabackup: xtrabackup_logfile detected: size=1008009216, start_lsn=(22040803312182)
mariabackup: using the following InnoDB configuration for recovery:
mariabackup:   innodb_data_home_dir = ./
mariabackup:   innodb_data_file_path = ibdata1:10M:autoextend
mariabackup:   innodb_log_group_home_dir = ./
mariabackup:   innodb_log_files_in_group = 1
mariabackup:   innodb_log_file_size = 1008009216
mariabackup: using the following InnoDB configuration for recovery:
mariabackup:   innodb_data_home_dir = ./
mariabackup:   innodb_data_file_path = ibdata1:10M:autoextend
mariabackup:   innodb_log_group_home_dir = ./
mariabackup:   innodb_log_files_in_group = 1
mariabackup:   innodb_log_file_size = 1008009216
mariabackup: Starting InnoDB instance for recovery.
mariabackup: Using 104857600 bytes for buffer pool (set by --use-memory parameter)
2018-12-07  7:48:32 139988873791520 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2018-12-07  7:48:32 139988873791520 [Note] InnoDB: The InnoDB memory heap is disabled
2018-12-07  7:48:32 139988873791520 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-12-07  7:48:32 139988873791520 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2018-12-07  7:48:32 139988873791520 [Note] InnoDB: Compressed tables use zlib 1.2.3
2018-12-07  7:48:32 139988873791520 [Note] InnoDB: Using SSE crc32 instructions
2018-12-07  7:48:32 139988873791520 [Note] InnoDB: Initializing buffer pool, size = 100.0M
2018-12-07  7:48:32 139988873791520 [Note] InnoDB: Completed initialization of buffer pool
2018-12-07  7:48:32 139988873791520 [Note] InnoDB: Highest supported file format is Barracuda.
2018-12-07  7:48:32 139988873791520 [Note] InnoDB: Starting crash recovery from checkpoint LSN=22040803312182
2018-12-07  7:48:38 139988873791520 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
2018-12-07  7:48:39 139988873791520 [Note] InnoDB: Starting a batch to recover 4147 pages from redo log
2018-12-07  7:48:42 139988873791520 [Note] InnoDB: Starting a batch to recover 2722 pages from redo log
2018-12-07  7:48:44 139988873791520 [Note] InnoDB: Starting a batch to recover 1954 pages from redo log
2018-12-07  7:48:46 139988873791520 [Note] InnoDB: Starting a batch to recover 1783 pages from redo log
2018-12-07  7:48:47 139988661688064 [Note] InnoDB: To recover: 594 pages from log
2018-12-07  7:48:48 139988873791520 [Note] InnoDB: Starting a batch to recover 1738 pages from redo log
2018-12-07  7:48:50 139988873791520 [Note] InnoDB: Starting a batch to recover 1817 pages from redo log
2018-12-07  7:48:51 139988873791520 [Note] InnoDB: Starting a batch to recover 1814 pages from redo log
2018-12-07  7:48:53 139988873791520 [Note] InnoDB: Starting a batch to recover 2903 pages from redo log
2018-12-07  7:48:56 139988873791520 [Note] InnoDB: Starting a batch to recover 1831 pages from redo log
2018-12-07  7:48:57 139988873791520 [Note] InnoDB: Starting a batch to recover 1784 pages from redo log
2018-12-07  7:49:00 139988873791520 [Note] InnoDB: Starting a batch to recover 1811 pages from redo log
2018-12-07  7:49:01 139988873791520 [Note] InnoDB: Starting a batch to recover 1957 pages from redo log
2018-12-07  7:49:02 139988651198208 [Note] InnoDB: To recover: 1794 pages from log
2018-12-07  7:49:03 139988873791520 [Note] InnoDB: Starting a batch to recover 1809 pages from redo log
2018-12-07  7:49:07 139988873791520 [Note] InnoDB: Starting a batch to recover 5132 pages from redo log
2018-12-07  7:49:11 139988873791520 [Note] InnoDB: Starting a batch to recover 1372 pages from redo log
2018-12-07  7:49:14 139988873791520 [Note] InnoDB: Starting a batch to recover 1211 pages from redo log
2018-12-07  7:49:17 139988873791520 [Note] InnoDB: Read redo log up to LSN=22041569886720
2018-12-07  7:49:17 139988873791520 [Note] InnoDB: Starting a batch to recover 1343 pages from redo log
2018-12-07  7:49:19 139988873791520 [Note] InnoDB: Starting a batch to recover 6122 pages from redo log
2018-12-07  7:49:22 139988873791520 [Note] InnoDB: Starting a batch to recover 6245 pages from redo log
2018-12-07  7:49:25 139988873791520 [Note] InnoDB: Starting a batch to recover 4655 pages from redo log
2018-12-07  7:49:28 139988873791520 [Note] InnoDB: Starting a batch to recover 6059 pages from redo log
2018-12-07  7:49:32 139988661688064 [Note] InnoDB: To recover: 145 pages from log
InnoDB: Last MySQL binlog file position 0 27018, file name /mariadb_data/binlogs/mysql-bin.444201
2018-12-07 07:49:32 7f51b3184820  InnoDB: Error: table 'DB1/TAB_TEMP'
InnoDB: in InnoDB data dictionary has tablespace id 1355633,
InnoDB: but a tablespace with that id does not exist. There is
InnoDB: a tablespace of name DB1/TAB_TEMP and id 1354713, though. Have
InnoDB: you deleted or moved .ibd files?
InnoDB: Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/innodb-troubleshooting-datadict.html
InnoDB: for how to resolve the issue.

However, these errors were ignored, and the prepare completed successfully:

2018-12-07  7:49:33 139988873791520 [Note] InnoDB: 128 rollback segment(s) are active.
2018-12-07  7:49:33 139988873791520 [Note] InnoDB: Waiting for purge to start
2018-12-07  7:49:33 139988873791520 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.39-83.1 started; log sequence number 22041699296409
InnoDB: Last MySQL binlog file position 0 27018, file name /mariadb_data/binlogs/mysql-bin.444201
 
mariabackup: starting shutdown with innodb_fast_shutdown = 1
2018-12-07  7:49:33 139988406630144 [Note] InnoDB: Dumping buffer pool(s) not yet started
2018-12-07  7:49:33 139988475823872 [Note] InnoDB: FTS optimize thread exiting.
2018-12-07  7:49:33 139988873791520 [Note] InnoDB: Starting shutdown...
2018-12-07  7:49:34 139988873791520 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
2018-12-07  7:49:35 139988873791520 [Warning] InnoDB: Some resources were not cleaned up in shutdown: threads 0, events 17, os_mutexes 16, os_fast_mutexes 33
2018-12-07  7:49:35 139988873791520 [Note] InnoDB: Shutdown completed; log sequence number 22041699296419
mariabackup: using the following InnoDB configuration for recovery:
mariabackup:   innodb_data_home_dir = ./
mariabackup:   innodb_data_file_path = ibdata1:10M:autoextend
mariabackup:   innodb_log_group_home_dir = ./
mariabackup:   innodb_log_files_in_group = 2
mariabackup:   innodb_log_file_size = 2147483648
2018-12-07  7:49:35 139988873791520 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2018-12-07  7:49:35 139988873791520 [Note] InnoDB: The InnoDB memory heap is disabled
2018-12-07  7:49:35 139988873791520 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-12-07  7:49:35 139988873791520 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2018-12-07  7:49:35 139988873791520 [Note] InnoDB: Compressed tables use zlib 1.2.3
2018-12-07  7:49:35 139988873791520 [Note] InnoDB: Using SSE crc32 instructions
2018-12-07  7:49:35 139988873791520 [Note] InnoDB: Initializing buffer pool, size = 100.0M
2018-12-07  7:49:35 139988873791520 [Note] InnoDB: Completed initialization of buffer pool
2018-12-07  7:49:35 139988873791520 [Note] InnoDB: Setting log file ./ib_logfile101 size to 2048 MB
2018-12-07  7:49:38 139988873791520 [Note] InnoDB: Setting log file ./ib_logfile1 size to 2048 MB
2018-12-07  7:49:40 139988873791520 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
2018-12-07  7:49:40 139988873791520 [Warning] InnoDB: New log files created, LSN=22041699296419
2018-12-07  7:49:40 139988873791520 [Note] InnoDB: Highest supported file format is Barracuda.
2018-12-07  7:49:41 139988873791520 [Note] InnoDB: 128 rollback segment(s) are active.
2018-12-07  7:49:41 139988873791520 [Note] InnoDB: Waiting for purge to start
2018-12-07  7:49:41 139988873791520 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.39-83.1 started; log sequence number 22041699296780
mariabackup: starting shutdown with innodb_fast_shutdown = 1
2018-12-07  7:49:41 139988406630144 [Note] InnoDB: Dumping buffer pool(s) not yet started
2018-12-07  7:49:41 139988475823872 [Note] InnoDB: FTS optimize thread exiting.
2018-12-07  7:49:41 139988873791520 [Note] InnoDB: Starting shutdown...
2018-12-07  7:49:42 139988873791520 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
2018-12-07  7:49:43 139988873791520 [Warning] InnoDB: Some resources were not cleaned up in shutdown: threads 0, events 33, os_mutexes 32, os_fast_mutexes 65
2018-12-07  7:49:43 139988873791520 [Note] InnoDB: Shutdown completed; log sequence number 22041699296790
181207 07:49:43 completed OK!

The error logs showed that the SST completes successfully.

joiner:

2018-12-07  5:37:49 139957175679744 [Note] WSREP: State transfer required: 
	Group state: 718503a8-3c6d-11e6-843e-9e8fbaf5936d:935552482
	Local state: 00000000-0000-0000-0000-000000000000:-1
2018-12-07  5:37:49 139957175679744 [Note] WSREP: New cluster view: global state: 718503a8-3c6d-11e6-843e-9e8fbaf5936d:935552482, view# 143: Primary, number of nodes: 3, my index: 0, protocol version 3
2018-12-07  5:37:49 139957175679744 [Warning] WSREP: Gap in state sequence. Need state transfer.
2018-12-07  5:37:49 139955380680448 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '10.1.1.66' --datadir '/mariadb_data/data/'   --parent '20391' --binlog '/mariadb_data/binlogs/mysql-bin' '
WSREP_SST: [INFO] Streaming with xbstream (20181207 05:37:49.935)
WSREP_SST: [INFO] Using socat as streamer (20181207 05:37:49.939)
WSREP_SST: [INFO] Using openssl based encryption with socat: with key and crt (20181207 05:37:49.947)
WSREP_SST: [INFO] Decrypting with cert=/mariadb_data/conf/mariadbSST.pem, key=/mariadb_data/conf/mariadbSST.pem, cafile=/mariadb_data/source/dbautils/templates//etc/ca.pem (20181207 05:37:49.950)
WSREP_SST: [INFO] Evaluating timeout -s9 100 socat -u openssl-listen:4444,reuseaddr,cert=/mariadb_data/conf/mariadbSST.pem,key=/mariadb_data/conf/mariadbSST.pem,cafile=/mariadb_data/source/dbautils/templates//etc/ca.pem stdio | mbstream -x; RC=( ${PIPESTATUS[@]} ) (20181207 05:37:49.999)
2018-12-07  5:37:50 139957175679744 [Note] WSREP: Prepared SST request: mariabackup|10.1.1.66:4444/xtrabackup_sst//1
2018-12-07  5:37:50 139957175679744 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-12-07  5:37:50 139957175679744 [Note] WSREP: REPL Protocols: 8 (3, 2)
2018-12-07  5:37:50 139957175679744 [Note] WSREP: Assign initial position for certification: 935552482, protocol version: 3
2018-12-07  5:37:50 139955472008960 [Note] WSREP: Service thread queue flushed.
2018-12-07  5:37:50 139957175679744 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (718503a8-3c6d-11e6-843e-9e8fbaf5936d): 1 (Operation not permitted)
	 at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2018-12-07  5:37:50 139955409929984 [Note] WSREP: Member 0.0 (server1.intranet.domain.com) requested state transfer from 'server2.intranet.domain.com'. Selected 1.1 (server2.intranet.domain.com)(SYNCED) as donor.
2018-12-07  5:37:50 139955409929984 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 935552529)
2018-12-07  5:37:50 139957175679744 [Note] WSREP: Requesting state transfer: success, donor: 1
2018-12-07  5:37:50 139957175679744 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 718503a8-3c6d-11e6-843e-9e8fbaf5936d:935552482
WSREP_SST: [INFO] Proceeding with SST (20181207 05:37:51.487)
WSREP_SST: [INFO] Evaluating socat -u openssl-listen:4444,reuseaddr,cert=/mariadb_data/conf/mariadbSST.pem,key=/mariadb_data/conf/mariadbSST.pem,cafile=/mariadb_data/source/dbautils/templates//etc/ca.pem stdio | mbstream -x; RC=( ${PIPESTATUS[@]} ) (20181207 05:37:51.488)
WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20181207 05:37:51.491)
removed `/mariadb_data/data/aria_log.00000001'
removed `/mariadb_data/data/aria_log_control'
removed `/mariadb_data/data/ib_logfile1'
removed `/mariadb_data/data/ibdata1'
removed `/mariadb_data/data/ib_logfile0'
WSREP_SST: [INFO] Cleaning the binlog directory /mariadb_data/binlogs as well (20181207 05:37:52.255)
removed `/mariadb_data/binlogs/mysql-bin.444210'
removed `/mariadb_data/binlogs/mysql-bin.444208'
removed `/mariadb_data/binlogs/mysql-bin.444209'
WSREP_SST: [INFO] Waiting for SST streaming to complete! (20181207 05:37:52.263)
2018-12-07  5:37:52 139955422619392 [Note] WSREP: (3b523391, 'tcp://0.0.0.0:4567') connection to peer 3b523391 with addr tcp://10.1.1.66:4567 timed out, no messages seen in PT3S
2018-12-07  5:37:52 139955422619392 [Note] WSREP: (3b523391, 'tcp://0.0.0.0:4567') turning message relay requesting off
2018-12-07  7:48:32 139955409929984 [Note] WSREP: 1.1 (server2.intranet.domain.com): State transfer to 0.0 (server1.intranet.domain.com) complete.
2018-12-07  7:48:32 139955409929984 [Note] WSREP: Member 1.1 (server2.intranet.domain.com) synced with group.
WSREP_SST: [INFO] Preparing the backup at /mariadb_data/data//.sst (20181207 07:48:32.348)
WSREP_SST: [INFO] Evaluating mariabackup --innobackupex   --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20181207 07:48:32.355)
rm: cannot remove `/mariadb_data/data//innobackup.prepare.log': No such file or directory
rm: cannot remove `/mariadb_data/data//innobackup.move.log': No such file or directory
WSREP_SST: [INFO] Moving the backup to /mariadb_data/data/ (20181207 07:49:43.315)
WSREP_SST: [INFO] Evaluating mariabackup --innobackupex       --move-back --force-non-empty-directories ${DATA} &>${DATA}/innobackup.move.log (20181207 07:49:43.318)
WSREP_SST: [INFO] Move successful, removing /mariadb_data/data//.sst (20181207 07:49:43.715)
WSREP_SST: [INFO] Galera co-ords from recovery: 718503a8-3c6d-11e6-843e-9e8fbaf5936d:935794098 (20181207 07:49:43.899)
WSREP_SST: [INFO] Total time on joiner: 0 seconds (20181207 07:49:43.904)
WSREP_SST: [INFO] Removing the sst_in_progress file (20181207 07:49:43.907)
2018-12-07  7:49:43 139957175994400 [Note] WSREP: SST complete, seqno: 935794098
2018-12-07  7:49:43 139957175994400 [ERROR] Plugin 'pam' already installed
2018-12-07  7:49:43 139957175994400 [ERROR] Plugin 'gssapi' already installed
2018-12-07  7:49:44 139957175994400 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2018-12-07  7:49:44 139957175994400 [Note] InnoDB: The InnoDB memory heap is disabled
2018-12-07  7:49:44 139957175994400 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-12-07  7:49:44 139957175994400 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2018-12-07  7:49:44 139957175994400 [Note] InnoDB: Compressed tables use zlib 1.2.3
2018-12-07  7:49:44 139957175994400 [Note] InnoDB: Using Linux native AIO
2018-12-07  7:49:44 139957175994400 [Note] InnoDB: Using SSE crc32 instructions
2018-12-07  7:49:44 139957175994400 [Note] InnoDB: Initializing buffer pool, size = 30.0G
2018-12-07  7:49:44 139957175994400 [Note] InnoDB: Completed initialization of buffer pool
2018-12-07  7:49:45 139957175994400 [Note] InnoDB: Highest supported file format is Barracuda.
2018-12-07  7:49:45 139957175994400 [Note] InnoDB: 128 rollback segment(s) are active.
2018-12-07  7:49:45 139957175994400 [Note] InnoDB: Waiting for purge to start
2018-12-07  7:49:45 139957175994400 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.39-83.1 started; log sequence number 22041699296790
2018-12-07  7:49:45 139957175994400 [Note] Plugin 'FEEDBACK' is disabled.
2018-12-07  7:49:45 139957175994400 [Note] mysqld: GSSAPI plugin : using principal name 'mariadb/server1.intranet.domain.com@INTRANET.domain.com'
2018-12-07  7:49:45 139919859083008 [Note] InnoDB: Dumping buffer pool(s) not yet started
181207  7:49:45 server_audit: MariaDB Audit Plugin version 1.4.4 STARTED.
181207  7:49:45 server_audit: Query cache is enabled with the TABLE events. Some table reads can be veiled.181207  7:49:45 server_audit: logging started to the syslog.
2018-12-07  7:49:45 139957175994400 [Note] Recovering after a crash using /mariadb_data/binlogs/mysql-bin
2018-12-07  7:49:45 139957175994400 [Note] Starting crash recovery...
2018-12-07  7:49:45 139957175994400 [Note] Crash recovery finished.
2018-12-07  7:49:45 139957175994400 [Note] Server socket created on IP: '0.0.0.0'.
2018-12-07  7:49:45 139957175994400 [Warning] 'proxies_priv' entry '@% root@gbrpsr000002512' ignored in --skip-name-resolve mode.
2018-12-07  7:49:45 139957175994400 [Note] WSREP: Signalling provider to continue.
2018-12-07  7:49:45 139957175994400 [Note] WSREP: SST received: 718503a8-3c6d-11e6-843e-9e8fbaf5936d:935794098
2018-12-07  7:49:45 139957175994400 [Note] /mariadb_data/bin/mysqld: ready for connections.
Version: '10.1.35-MariaDB'  socket: '/mariadb_data/mysql.sock'  port: 3306  MariaDB Server
2018-12-07  7:49:45 139955409929984 [Note] WSREP: 0.0 (server1.intranet.domain.com): State transfer from 1.1 (server2.intranet.domain.com) complete.
2018-12-07  7:49:45 139955409929984 [Note] WSREP: Shifting JOINER -> JOINED (TO: 935794284)
2018-12-07  7:49:47 139955409929984 [Note] WSREP: Member 0.0 (server1.intranet.domain.com) synced with group.
2018-12-07  7:49:47 139955409929984 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 935794290)

donor:

2018-12-07  5:37:50 140539684869888 [Note] WSREP: Member 0.0 (server1.intranet.domain.com) requested state transfer from 'server2.intranet.domain.com'. Selected 1.1 (server2.intranet.domain.com)(SYNCED) as donor.
2018-12-07  5:37:50 140539684869888 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 935552529)
2018-12-07  5:37:50 140541449579264 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2018-12-07  5:37:50 140465202214656 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'donor' --address '10.1.1.66:4444/xtrabackup_sst//1' --socket '/mariadb_data/mysql.sock' --datadir '/mariadb_data/data/'    --binlog '/mariadb_data/binlogs/mysql-bin' --gtid '718503a8-3c6d-11e6-843e-9e8fbaf5936d:935552529' --gtid-domain-id '0''
2018-12-07  5:37:50 140541449579264 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20181207 05:37:51.276)
WSREP_SST: [INFO] Using socat as streamer (20181207 05:37:51.280)
WSREP_SST: [INFO] Using openssl based encryption with socat: with key and crt (20181207 05:37:51.312)
WSREP_SST: [INFO] Encrypting with cert=/mariadb_data/conf/mariadbSST.pem, key=/mariadb_data/conf/mariadbSST.pem, cafile=/mariadb_data/source/dbautils/templates//etc/ca.pem (20181207 05:37:51.315)
WSREP_SST: [INFO] Using /tmp/tmp.MEZv59YEA0 as innobackupex temporary directory (20181207 05:37:51.328)
WSREP_SST: [INFO] Streaming GTID file before SST (20181207 05:37:51.331)
WSREP_SST: [INFO] Evaluating mbstream -c ${INFO_FILE} | socat -u stdio openssl-connect:10.1.1.66:4444,cert=/mariadb_data/conf/mariadbSST.pem,key=/mariadb_data/conf/mariadbSST.pem,cafile=/mariadb_data/source/dbautils/templates//etc/ca.pem; RC=( ${PIPESTATUS[@]} ) (20181207 05:37:51.335)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20181207 05:37:51.419)
2018-12-07  5:37:52 140539697555200 [Note] WSREP: (5b5327f9, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] Streaming the backup to joiner at 10.1.1.66 4444 (20181207 05:38:01.424)
WSREP_SST: [INFO] Evaluating mariabackup --innobackupex      $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | socat -u stdio openssl-connect:10.1.1.66:4444,cert=/mariadb_data/conf/mariadbSST.pem,key=/mariadb_data/conf/mariadbSST.pem,cafile=/mariadb_data/source/dbautils/templates//etc/ca.pem; RC=( ${PIPESTATUS[@]} ) (20181207 05:38:01.428)
2018-12-07  7:48:25 140541392681728 [Note] WSREP: Provider paused at 718503a8-3c6d-11e6-843e-9e8fbaf5936d:935794098 (7426503)
2018-12-07  7:48:32 140541392681728 [Note] WSREP: resuming provider at 7426503
2018-12-07  7:48:32 140541392681728 [Note] WSREP: Provider resumed.
2018-12-07  7:48:32 140539684869888 [Note] WSREP: 1.1 (server2.intranet.domain.com): State transfer to 0.0 (server1.intranet.domain.com) complete.
2018-12-07  7:48:32 140539684869888 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 935794116)
WSREP_SST: [INFO] Total time on donor: 0 seconds (20181207 07:48:32.266)
2018-12-07  7:48:32 140539684869888 [Note] WSREP: Member 1.1 (server2.intranet.domain.com) synced with group.
2018-12-07  7:48:32 140539684869888 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 935794117)

But the joiner eventually crashes when the table with no tablespace file is accessed:

2018-12-08  2:19:46 139957174704896 [ERROR] InnoDB: Trying to do i/o to a tablespace which exists without .ibd data file. i/o type 10, space id 1355633, page no 0, i/o length 16384 bytes
2018-12-08  2:19:46 139957174704896 [ERROR] InnoDB: Trying to access tablespace [space=1355633: page=0] but the tablespace does not exist or is just being dropped.
2018-12-08  2:19:46 139957174704896 [ERROR] InnoDB: tablespace id is 1355633 in the data dictionary but in file ./DB1/TAB_TEMP.ibd it is 1354713!
 
2018-12-08  2:19:46 139957174704896 [ERROR] InnoDB: Trying to do i/o to a tablespace which exists without .ibd data file. i/o type 10, space id 1355633, page no 0, i/o length 16384 bytes
2018-12-08  2:19:46 139957174704896 [ERROR] InnoDB: Trying to access tablespace [space=1355633: page=0] but the tablespace does not exist or is just being dropped.
2018-12-08  2:19:46 139957174704896 [ERROR] InnoDB: Unable to read tablespace 1355633 page no 0 into the buffer pool after 100 attempts. The most probable cause of this error may be that the table has been corrupted. You can try to fix this problem by using innodb_force_recovery. Please see http://dev.mysql.com/doc/refman/5.6/en/ for more details. Aborting...
2018-12-08 02:19:46 7f4a51ae9b00  InnoDB: Assertion failure in thread 139957174704896 in file ha_innodb.cc line 22061
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
181208  2:19:46 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.35-MariaDB
key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=5
max_threads=2002
thread_count=4
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1595581 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f41af9ed008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f4a51ae9140 thread_stack 0x48400
mysys/stacktrace.c:268(my_print_stacktrace)[0x55a87a7089bb]
sql/signal_handler.cc:168(handle_fatal_signal)[0x55a87a264d85]
/lib64/libpthread.so.0(+0x36aca0f7e0)[0x7f4a518167e0]
/lib64/libc.so.6(gsignal+0x35)[0x7f4a50086495]
/lib64/libc.so.6(abort+0x175)[0x7f4a50087c75]
handler/ha_innodb.cc:22034(ib_logf(ib_log_level_t, char const*, ...))[0x55a87a48fe53]
buf/buf0buf.cc:3268(buf_page_get_gen(unsigned long, unsigned long, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*, dberr_t*))[0x55a87a5d90eb]
fil/fil0fil.cc:3921(fsp_flags_try_adjust(unsigned long, unsigned long))[0x55a87a6352b3]
fil/fil0fil.cc:5492(fil_space_for_table_exists_in_mem(unsigned long, char const*, bool, bool, bool, mem_block_info_t*, unsigned long, unsigned long))[0x55a87a635bc5]
dict/dict0load.cc:2409(dict_load_table(char const*, unsigned long, dict_err_ignore_t))[0x55a87a618fb4]
dict/dict0dict.cc:1155(dict_table_open_on_name(char const*, unsigned long, unsigned long, dict_err_ignore_t))[0x55a87a606e30]
handler/ha_innodb.cc:6364(ha_innobase::open(char const*, int, unsigned int))[0x55a87a494753]
sql/handler.cc:2533(handler::ha_open(TABLE*, char const*, int, unsigned int))[0x55a87a26928e]
sql/table.cc:2975(open_table_from_share(THD*, TABLE_SHARE*, char const*, unsigned int, unsigned int, unsigned int, TABLE*, bool))[0x55a87a190023]
sql/sql_base.cc:2565(open_table(THD*, TABLE_LIST*, Open_table_context*))[0x55a87a0a61dd]
sql/sql_base.cc:4091(open_and_process_table)[0x55a87a0aa732]
sql/sql_show.cc:1166(mysqld_show_create_get_fields(THD*, TABLE_LIST*, List<Item>*, String*))[0x55a87a14b762]
sql/sql_show.cc:1260(mysqld_show_create(THD*, TABLE_LIST*))[0x55a87a14c0a7]
sql/sql_parse.cc:3740(mysql_execute_command(THD*))[0x55a87a0ed485]
sql/sql_parse.cc:7463(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x55a87a0f22e5]
sql/sql_parse.cc:7268(wsrep_mysql_parse)[0x55a87a0f23f4]
sql/sql_parse.cc:1493(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x55a87a0f4807]
sql/sql_parse.cc:1126(do_command(THD*))[0x55a87a0f52b7]
sql/sql_connect.cc:1330(do_handle_one_connection(THD*))[0x55a87a1b699f]
sql/sql_connect.cc:1244(handle_one_connection)[0x55a87a1b6ad7]
/lib64/libpthread.so.0(+0x36aca07aa1)[0x7f4a5180eaa1]
/lib64/libc.so.6(clone+0x6d)[0x7f4a5013cbdd]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f41b68d0020): show create table `TAB_TEMP`
Connection ID (thread ID): 108197
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off

Shouldn't the wsrep_sst_mariabackup script have failed when it encountered the tablespace errors rather than allowing the node to start with a bad backup? That seems better than allowing the server to use an inconsistent backup.

We suspect that the tablespace errors occur when a table is truncated when the backup is being taken on the donor for the SST.



 Comments   
Comment by Geoff Montee (Inactive) [ 2018-12-10 ]

Comments from wlad,

Yes, there was probably some “recreate” of the tablespace in question while backup was running (DROP/CREATE), as indicated by changed tablespace id, and the backup has the copy with the “old” tablespace id.

It is 10.1, and the story of DDL running in parallel with backup is not good there, i.e there exist absolutely no protection against DDL.

For 10.2+ there was a lot of work, especially https://jira.mariadb.org/browse/MDEV-16791

Why does mariabackup 10.1 allow inconsistency wrt tablespace ids?
“mariabackup –backup” allows them because it is unaware that DDL was running in parallel, in 10.1
“mariabackup –prepare” allows it is because innodb recovery does so. Mariabackup “prepare” is mostly innodb startup and shutdown, and startup does recovery, and if this does not finish with errors, mariabackup assumes all was OK.

On the application side, if those tables can be created with CREATE TEMPORARY TABLE, they should be created so. Mariabackup skips temp. tables, and would not run into these problems.

Comment by Geoff Montee (Inactive) [ 2019-02-15 ]

Should we go ahead and close this one as "Won't Fix"? I know that the DDL handling in 10.2 is unlikely to be backported to 10.1.

Comment by Julien Fritsch [ 2019-07-05 ]

jplindst so, you mean that this is not a bug that can be fixed? Shall we decide quickly now, what to do with this one?

Comment by Geoff Montee (Inactive) [ 2019-07-06 ]

The proper fix for this would be to make Mariabackup handle DDL during backups better. This has already been implemented in MariaDB 10.2+, but the changes are too large to backport to MariaDB 10.1. Users should upgrade if they need to be able to execute DDL during backups.

Closing this as "Won't Fix".

Generated at Thu Feb 08 08:40:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.