[MDEV-6660] missing grastate.dat + broken innodb prevents node startup Created: 2014-08-29  Updated: 2022-02-11  Resolved: 2021-12-23

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 5.5.39-galera
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Kolbe Kegel (Inactive)
Assignee: Jan Lindström (Inactive)
Resolution: Won't Fix
Votes: 0
Labels: galera


 Description   

After a crash in which grastate.dat was not written and the InnoDB tablespace was damaged in some way, starting the server with mysqld_safe fails.

mysqld_safe starts the server with the --wsrep-recover option. In that case, when grastate.dat is missing, InnoDB is initialized. If InnoDB can't start, the only resolution is to manually empty the datadir to force an SST.
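
The first of the two conditions above (a missing grastate.dat) can be checked before attempting a start. A minimal sketch, assuming the datadir is /var/lib/mysql unless another path is passed; the function name grastate_present is hypothetical and not part of mysqld_safe:

```shell
#!/bin/sh
# Hypothetical pre-start check: is grastate.dat present in the datadir?
grastate_present() {
    # $1: path to the datadir (assumed layout: grastate.dat at its top level)
    [ -f "$1/grastate.dat" ]
}

# Default path is an assumption; pass the real datadir as the first argument.
datadir=${1:-/var/lib/mysql}
if grastate_present "$datadir"; then
    echo "grastate.dat found in $datadir"
else
    echo "grastate.dat missing in $datadir"
fi
```

This only reports the state of the file. It does not say whether InnoDB itself is able to start.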

Example error log from such a failed start:

140828 13:31:29 [Warning] No argument was provided to --log-bin and neither --log-basename or --log-bin-index where used;  This may cause repliction to break when this server acts as a master and has its hostname changed! Please use '--log-basename=db1' or '--log-bin=db1-recover-bin' to avoid this problem.
140828 13:31:29 InnoDB: The InnoDB memory heap is disabled
140828 13:31:29 InnoDB: Mutexes and rw_locks use GCC atomic builtins
140828 13:31:29 InnoDB: Compressed tables use zlib 1.2.3
140828 13:31:29 InnoDB: Using Linux native AIO
140828 13:31:29 InnoDB: Initializing buffer pool, size = 128.0M
140828 13:31:29 InnoDB: Completed initialization of buffer pool
140828 13:31:29 InnoDB: highest supported file format is Barracuda.
InnoDB: Transaction 149EE8 was in the XA prepared state.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 0 row operations to undo
InnoDB: Trx id counter is 14A000
140828 13:31:29  InnoDB: Waiting for the background threads to start
InnoDB: Starting in background the rollback of uncommitted transactions
140828 13:31:29  InnoDB: Rollback of non-prepared transactions completed
140828 13:31:30 Percona XtraDB (http://www.percona.com) 5.5.38-MariaDB-35.2 started; log sequence number 640493839
140828 13:31:30 [Note] Plugin 'FEEDBACK' is disabled.
140828 13:31:30  InnoDB: Starting recovery for XA transactions...
140828 13:31:30  InnoDB: Transaction 149EE8 in prepared state after recovery
140828 13:31:30  InnoDB: Transaction contains changes to 1 rows
140828 13:31:30  InnoDB: 1 transactions in prepared state after recovery
140828 13:31:30 [Note] Found 1 prepared transaction(s) in InnoDB
140828 13:31:30 [ERROR] Found 1 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
140828 13:31:30 [ERROR] Aborting
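
The abort in the log above can be detected mechanically, for example by a wrapper script that inspects the error log after a failed start. A minimal sketch; the function name has_prepared_trx_abort and the sample log file are hypothetical, and the pattern is taken from the [ERROR] line shown above:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given error log contains the
# "Found N prepared transactions!" abort message.
has_prepared_trx_abort() {
    # $1: path to the error log
    grep -q 'Found [0-9][0-9]* prepared transactions!' "$1"
}

# Usage with a throwaway log holding the start of the line quoted above.
log=$(mktemp)
printf '%s\n' '140828 13:31:30 [ERROR] Found 1 prepared transactions!' > "$log"
if has_prepared_trx_abort "$log"; then
    echo "prepared-transaction abort detected"
fi
rm -f "$log"
```

The log message itself points to --tc-heuristic-recover, which the MariaDB documentation describes as taking COMMIT or ROLLBACK. Whether using it is appropriate on a Galera node is not established in this report.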



 Comments   
Comment by Michaël de groot [ 2020-11-25 ]

I experience this same issue when starting an instance with a restored backup that was created on MariaDB 10.1.22 using mariabackup 10.1.35.

Comment by Michaël de groot [ 2021-03-24 ]

I experience this same issue when starting an instance that was backed up using mariadb-backup 10.3.27 on a 10.3.27 node and restored using mariadb-backup 10.3.28 on a MariaDB 10.3.28 node. This issue is NOT Galera-related.

Comment by Michaël de groot [ 2021-08-20 ]

I experience the same issue when a backup created on 10.3.28 is restored on 10.3.30, and also when a backup created on 10.3.28 is restored on 10.3.31.

It seems to me that this issue always occurs. Perhaps it is worth mentioning that:

  • I use socat to stream the backup to the restore host (like Galera's SST process, but, as said, this is not related to Galera: the source machine does not run Galera)
  • I use --no-lock

The backup switches I use are:
mariabackup --innobackupex --no-lock --stream=xbstream --parallel=4 --user=root --port=3306 --host=localhost --socket=/var/lib/mysql/mysql.sock --password=password --skip-rocksdb-backup /tmp | socat TCP:target_machine:4444 -

The restore switches I use are:
cd /var/lib/mysql && mariabackup --prepare --use-memory=2000M --target-dir=/var/lib/mysql

Comment by Michaël de groot [ 2022-01-23 ]

Why was this issue closed as "Won't Fix" without a comment? The issue is still relevant: I experience it on more than 75% of the instances that I initialize from a backup created with those switches.

Removing --no-lock (assuming that this causes it) is not the solution, as it would cause an outage on the source node.

Comment by Michaël de groot [ 2022-02-11 ]

I still experience this on the newest 10.3.
