Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Incomplete
-
10.6.4, 10.6
-
None
Description
We upgraded from 10.5.11 to 10.6.4 recently and after cluster run for few hours it crashed after a while and we are unable to bring it up again. It always fails with following
2021/08/11 13:29:20 Peer list updated |
was []
|
now [mysql-0.mysql.default.svc.cluster.local] |
2021/08/11 13:29:20 execing: /opt/galera/on-start.sh with stdin: mysql-0.mysql.default.svc.cluster.local |
2021/08/11 13:29:20 *** [Galera] Starting new cluster! |
2021/08/11 13:29:21 Peer finder exiting |
Galera - Determining recovery position...
|
galera-recovery.sh: Attempting to recover GTID positon...
|
2021-08-11 13:29:21 0 [Note] mysqld (server 10.6.4-MariaDB-1:10.6.4+maria~focal) starting as process 48 ... |
galera-recovery.sh: Failed to start mysqld for wsrep recovery: '2021-08-11 13:29:21 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 |
2021-08-11 13:29:21 0 [Note] InnoDB: Number of pools: 1 |
2021-08-11 13:29:21 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions |
2021-08-11 13:29:21 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts) |
2021-08-11 13:29:21 0 [Note] InnoDB: Using Linux native AIO |
2021-08-11 13:29:21 0 [Note] InnoDB: Initializing buffer pool, total size = 8858370048, chunk size = 134217728 |
2021-08-11 13:29:21 0 [Note] InnoDB: Completed initialization of buffer pool |
2021-08-11 13:29:21 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1533931390429,1533931390429 |
2021-08-11 13:29:22 0 [Note] InnoDB: Starting final batch to recover 4330 pages from redo log. |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Not applying DELETE_ROW_FORMAT_DYNAMIC due to corruption on [page id: space=14079, page number=78] |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption. |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Not applying INSERT_REUSE_DYNAMIC due to corruption on [page id: space=14079, page number=78] |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption. |
2021-08-11 13:29:22 0 [Warning] InnoDB: The last skipped log record LSN 1533931681138 is not equal to page LSN 1533931712960 |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Not applying DELETE_ROW_FORMAT_DYNAMIC due to corruption on [page id: space=14079, page number=79] |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption. |
2021-08-11 13:29:22 0 [Warning] InnoDB: The last skipped log record LSN 1533931662627 is not equal to page LSN 1533931682005 |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Not applying INSERT_HEAP_DYNAMIC due to corruption on [page id: space=14079, page number=79] |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption. |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Not applying DELETE_ROW_FORMAT_DYNAMIC due to corruption on [page id: space=14079, page number=82] |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption. |
2021-08-11 13:29:22 0 [Warning] InnoDB: The last skipped log record LSN 1533931658204 is not equal to page LSN 1533931773729 |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Not applying INSERT_REUSE_DYNAMIC due to corruption on [page id: space=14079, page number=82] |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption. |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Not applying DELETE_ROW_FORMAT_DYNAMIC due to corruption on [page id: space=14079, page number=94] |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption. |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Not applying INSERT_HEAP_DYNAMIC due to corruption on [page id: space=14079, page number=94] |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption. |
2021-08-11 13:29:22 0 [Warning] InnoDB: The last skipped log record LSN 1533931662825 is not equal to page LSN 1533931799000 |
2021-08-11 13:29:22 0 [Warning] InnoDB: The last skipped log record LSN 1533931640049 is not equal to page LSN 1533931730786 |
2021-08-11 13:29:22 0 [Warning] InnoDB: The last skipped log record LSN 1533931565000 is not equal to page LSN 1533931729834 |
2021-08-11 13:29:22 0 [Warning] InnoDB: The last skipped log record LSN 1533931482936 is not equal to page LSN 1533931483333 |
2021-08-11 13:29:22 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption |
2021-08-11 13:29:22 0 [Note] InnoDB: Starting shutdown... |
2021-08-11 13:29:23 0 [ERROR] Plugin 'InnoDB' init function returned error. |
2021-08-11 13:29:23 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. |
2021-08-11 13:29:23 0 [Note] Plugin 'FEEDBACK' is disabled. |
2021-08-11 13:29:23 0 [Note] Plugin 'unix_socket' is disabled. |
2021-08-11 13:29:23 0 [ERROR] Unknown/unsupported storage engine: InnoDB |
2021-08-11 13:29:23 0 [ERROR] Aborting' |
Galera recovery position:
|
2021-08-11 13:29:23 0 [Note] mysqld (server 10.6.4-MariaDB-1:10.6.4+maria~focal) starting as process 1 ... |
2021-08-11 13:29:23 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1 |
2021-08-11 13:29:23 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so' |
2021-08-11 13:29:23 0 [Note] WSREP: wsrep_load(): Galera 26.4.9(r819f29cb) by Codership Oy <info@codership.com> loaded successfully. |
2021-08-11 13:29:23 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration. |
2021-08-11 13:29:23 0 [Note] WSREP: Found saved state: a5925485-fa9e-11eb-b6f8-3ec6336b487c:-1, safe_to_bootstrap: 0 |
2021-08-11 13:29:23 0 [Note] WSREP: GCache DEBUG: opened preamble: |
Version: 2 |
UUID: a5925485-fa9e-11eb-b6f8-3ec6336b487c
|
Seqno: 1 - 50408 |
Offset: 1280 |
Synced: 1 |
2021-08-11 13:29:23 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: a5925485-fa9e-11eb-b6f8-3ec6336b487c, offset: 1280 |
2021-08-11 13:29:23 0 [Note] WSREP: GCache::RingBuffer initial scan... 0.0% ( 0/134217752 bytes) complete. |
2021-08-11 13:29:23 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete. |
2021-08-11 13:29:23 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 1-50408 |
2021-08-11 13:29:23 0 [Note] WSREP: GCache::RingBuffer unused buffers scan... 0.0% ( 0/104510296 bytes) complete. |
2021-08-11 13:29:23 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (104510296/104510296 bytes) complete. |
2021-08-11 13:29:23 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): found 13/50421 locked buffers |
2021-08-11 13:29:23 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): free space: 29718624/134217728 |
2021-08-11 13:29:23 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = mysql-0.mysql.default.svc.cluster.local; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_tim |
2021-08-11 13:29:23 0 [Note] WSREP: Start replication |
2021-08-11 13:29:23 0 [Note] WSREP: Connecting with bootstrap option: 1 |
2021-08-11 13:29:23 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1 |
2021-08-11 13:29:23 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 . |
2021-08-11 13:29:23 0 [ERROR] WSREP: wsrep::connect(gcomm://) failed: 7 |
2021-08-11 13:29:23 0 [ERROR] Aborting |
is there some workaround how to force this node to go up?
Before it got to this state it was throwing these errors on all 3 mariadb nodes
2021/08/11 11:54:49 Peer list updated |
was []
|
now [mysql-0.mysql.default.svc.cluster.local mysql-1.mysql.default.svc.cluster.local mysql-2.mysql.default.svc.cluster.local] |
2021/08/11 11:54:49 execing: /opt/galera/on-start.sh with stdin: mysql-0.mysql.default.svc.cluster.local |
mysql-1.mysql.default.svc.cluster.local |
mysql-2.mysql.default.svc.cluster.local |
2021/08/11 11:54:49 *** [Galera] Joining cluster: mysql-0.mysql.default.svc.cluster.local,mysql-1.mysql.default.svc.cluster.local |
2021/08/11 11:54:50 Peer finder exiting |
Galera - Determining recovery position...
|
galera-recovery.sh: Attempting to recover GTID positon...
|
2021-08-11 11:54:50 0 [Note] mysqld (server 10.6.4-MariaDB-1:10.6.4+maria~focal) starting as process 48 ... |
galera-recovery.sh: Failed to start mysqld for wsrep recovery: '2021-08-11 11:54:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.11 |
2021-08-11 11:54:50 0 [Note] InnoDB: Number of pools: 1 |
2021-08-11 11:54:50 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions |
2021-08-11 11:54:50 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts) |
2021-08-11 11:54:50 0 [Note] InnoDB: Using Linux native AIO |
2021-08-11 11:54:50 0 [Note] InnoDB: Initializing buffer pool, total size = 8858370048, chunk size = 134217728 |
2021-08-11 11:54:50 0 [Note] InnoDB: Completed initialization of buffer pool |
2021-08-11 11:54:50 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1533806354857,1533806354857 |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Page [page id: space=63, page number=6] log sequence number 1533966178233 is in the future! Current system log sequence number 1533923300737. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Refusing to load './storage/active_directory_job.ibd' (id=14344, flags=0x15); dictionary contains id=14343, flags=0x15 |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Operating system error number 2 in a file operation. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: The error means the system cannot find the path specified. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Could not find a valid tablespace file for storage/active_directory_job. Please refer to https://mariadb.com/kb/en/innodb-data-dictionary-troubleshooting/ for how to resolve the issue. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Refusing to load './storage/snapshot_restore_files_job.ibd' (id=14381, flags=0x15); dictionary contains id=14380, flags=0x15 |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Operating system error number 2 in a file operation. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: The error means the system cannot find the path specified. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Could not find a valid tablespace file for storage/snapshot_restore_files_job. Please refer to https://mariadb.com/kb/en/innodb-data-dictionary-troubleshooting/ for how to resolve the issue. |
2021-08-11 11:54:51 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo |
2021-08-11 11:54:51 0 [Note] InnoDB: Trx id counter is 367821890 |
2021-08-11 11:54:51 0 [Note] InnoDB: Starting final batch to recover 23492 pages from redo log. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Page [page id: space=44, page number=1] log sequence number 1533966146666 is in the future! Current system log sequence number 1533923300737. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Page [page id: space=44, page number=0] log sequence number 1533965880965 is in the future! Current system log sequence number 1533923300737. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Page [page id: space=44, page number=3] log sequence number 1533966175785 is in the future! Current system log sequence number 1533923300737. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Page [page id: space=44, page number=31] log sequence number 1533965763157 is in the future! Current system log sequence number 1533923300737. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Page [page id: space=44, page number=2] log sequence number 1533966173074 is in the future! Current system log sequence number 1533923300737. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Page [page id: space=44, page number=32] log sequence number 1533939541945 is in the future! Current system log sequence number 1533923300737. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Page [page id: space=44, page number=30] log sequence number 1533953268642 is in the future! Current system log sequence number 1533923300737. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Page [page id: space=44, page number=50] log sequence number 1533965931411 is in the future! Current system log sequence number 1533923300737. |
2021-08-11 11:54:51 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. |
The messages log sequence number … is in the future are already a sign of serious corruption. How exactly were the replicas set up? Was any kind of Galera snapshot transfer used?