Details
Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Description
When a DDL statement fails, the entire Galera cluster effectively 'crashes', because all slaves terminate themselves. For statements that fail due to system reasons, this is somewhat expected. However, DDL statements can also fail due to errors in userspace, such as running a faulty `ALTER TABLE`, in which case this behaviour is very impractical, especially in multi-tenant environments.
This is expected and known behaviour (reported in, for example, https://jira.mariadb.org/browse/MDEV-8323), and I raised it on the mailing list in 2021 (https://lists.launchpad.net/maria-discuss/msg06168.html). Even so, I would very much like to see it prevented (without jeopardising data integrity, obviously).
Below is the transcript of a crash that just occurred.
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [ERROR] Slave SQL: Column 3 of table 'redirectpizza_0478_prod.teams' cannot be converted from type 'varchar(272 octets)' to type 'timestamp', Internal MariaDB error code: 1677
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Warning] WSREP: Event 3 Update_rows_v1 apply failed: 3, seqno 887337789
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Member 0(http-msh02.efw.ha.cyberfusion.cloud) initiates vote on 5d32c403-53c3-11ec-aa31-227f73623d32:887337789,ea893734039cc445:
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Votes over 5d32c403-53c3-11ec-aa31-227f73623d32:887337789:
Feb 25 14:20:28 http-msh02 mariadbd[1453]: ea893734039cc445: 1/3
Feb 25 14:20:28 http-msh02 mariadbd[1453]: Waiting for more votes.
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Member 1(http-msh03.efw.ha.cyberfusion.cloud) initiates vote on 5d32c403-53c3-11ec-aa31-227f73623d32:887337789,ea893734039cc445:
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Votes over 5d32c403-53c3-11ec-aa31-227f73623d32:887337789:
Feb 25 14:20:28 http-msh02 mariadbd[1453]: ea893734039cc445: 2/3
Feb 25 14:20:28 http-msh02 mariadbd[1453]: Winner: ea893734039cc445
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [ERROR] WSREP: Failed to apply write set: gtid: 5d32c403-53c3-11ec-aa31-227f73623d32:887337789 server_id: a1dfc631-c460-11ee-83ba-87587c55364f client_id: 7762049 trx_id: 326470056 flags: 3 (start_transaction | commit)
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Closing send monitor...
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Closed send monitor.
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: gcomm: terminating thread
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: gcomm: joining thread
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: gcomm: closing backend
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: view(view_id(NON_PRIM,179859b4-8c4e,691) memb {
Feb 25 14:20:28 http-msh02 mariadbd[1453]: #011179859b4-8c4e,0
Feb 25 14:20:28 http-msh02 mariadbd[1453]: } joined {
Feb 25 14:20:28 http-msh02 mariadbd[1453]: } left {
Feb 25 14:20:28 http-msh02 mariadbd[1453]: } partitioned {
Feb 25 14:20:28 http-msh02 mariadbd[1453]: #0115cc1a215-ad56,0
Feb 25 14:20:28 http-msh02 mariadbd[1453]: #011a1dfc631-83ba,0
Feb 25 14:20:28 http-msh02 mariadbd[1453]: })
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: PC protocol downgrade 1 -> 0
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: view((empty))
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: gcomm: closed
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Flow-control interval: [16, 16]
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Received NON-PRIMARY.
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 887337790)
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: New SELF-LEAVE.
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Flow-control interval: [0, 0]
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 887337790)
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: RECV thread exiting 0: Success
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: recv_thread() joined.
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Closing replication queue.
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Closing slave action queue.
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: ================================================
Feb 25 14:20:28 http-msh02 mariadbd[1453]: View:
Feb 25 14:20:28 http-msh02 mariadbd[1453]: id: 5d32c403-53c3-11ec-aa31-227f73623d32:887337790
Feb 25 14:20:28 http-msh02 mariadbd[1453]: status: non-primary
Feb 25 14:20:28 http-msh02 mariadbd[1453]: protocol_version: 4
Feb 25 14:20:28 http-msh02 mariadbd[1453]: capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
Feb 25 14:20:28 http-msh02 mariadbd[1453]: final: no
Feb 25 14:20:28 http-msh02 mariadbd[1453]: own_index: 0
Feb 25 14:20:28 http-msh02 mariadbd[1453]: members(1):
Feb 25 14:20:28 http-msh02 mariadbd[1453]: #0110: 179859b4-baec-11ee-8c4e-8a01e9e422b8, http-msh02.efw.ha.cyberfusion.c
Feb 25 14:20:28 http-msh02 mariadbd[1453]: =================================================
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Non-primary view
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Server status change synced -> connected
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: ================================================
Feb 25 14:20:28 http-msh02 mariadbd[1453]: View:
Feb 25 14:20:28 http-msh02 mariadbd[1453]: id: 5d32c403-53c3-11ec-aa31-227f73623d32:887337790
Feb 25 14:20:28 http-msh02 mariadbd[1453]: status: non-primary
Feb 25 14:20:28 http-msh02 mariadbd[1453]: protocol_version: 4
Feb 25 14:20:28 http-msh02 mariadbd[1453]: capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
Feb 25 14:20:28 http-msh02 mariadbd[1453]: final: yes
Feb 25 14:20:28 http-msh02 mariadbd[1453]: own_index: -1
Feb 25 14:20:28 http-msh02 mariadbd[1453]: members(0):
Feb 25 14:20:28 http-msh02 mariadbd[1453]: =================================================
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Non-primary view
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Server status change connected -> disconnected
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Service thread queue flushed.
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 5
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Applier thread exiting ret: 0 thd: 2
Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Warning] Aborted connection 2 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)
... and the log after the manual shutdown (probably not relevant, but better to provide too much information than too little):
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] /usr/sbin/mariadbd (initiated by: unknown): Normal shutdown
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: Shutdown replication
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 1 [Note] WSREP: rollbacker thread exiting 1
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 1 [Warning] Aborted connection 1 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: dtor state: CLOSED
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: MemPool(TrxHandleSlave): hit ratio: 0.999248, misses: 440, in use: 0, in pool: 440
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: mon: entered 585064 oooe fraction 0 oool fraction 0
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: mon: entered 585064 oooe fraction 5.64041e-05 oool fraction 1.70921e-06
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: mon: entered 597396 oooe fraction 0 oool fraction 1.67393e-06
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: cert index usage at exit 0
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: cert trx map usage at exit 0
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: deps set usage at exit 0
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: avg deps dist 65.1557
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: avg cert interval 625897
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: cert index size 791
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: Service thread queue flushed.
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: wsdb trx map usage 0 conn query map usage 0
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: MemPool(LocalTrxHandle): hit ratio: 0.998554, misses: 2, in use: 0, in pool: 2
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: Flushing memory map to disk...
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: Deinitializing allowlist service v1
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] InnoDB: FTS optimize thread exiting.
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] InnoDB: Starting shutdown...
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] InnoDB: Restricted to 64880 pages due to innodb_buf_pool_dump_pct=25
Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] InnoDB: Buffer pool(s) dump completed at 240225 14:28:57
Feb 25 14:29:00 http-msh02 mariadbd[1453]: 2024-02-25 14:29:00 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
Feb 25 14:29:00 http-msh02 mariadbd[1453]: 2024-02-25 14:29:00 0 [Note] InnoDB: Shutdown completed; log sequence number 1194448333678; transaction id 1784414859
Feb 25 14:29:00 http-msh02 mariadbd[1453]: 2024-02-25 14:29:00 0 [Note] /usr/sbin/mariadbd: Shutdown complete
Feb 25 14:29:01 http-msh02 systemd[1]: mariadb.service: Succeeded.
Feb 25 14:29:01 http-msh02 systemd[1]: Stopped MariaDB 10.11.6 database server.
Feb 25 14:29:01 http-msh02 systemd[1]: mariadb.service: Consumed 1h 45min 15.039s CPU time.
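For triage, the handful of lines that matter in a transcript like the first one above (the apply error, the failed event with its seqno, and the vote winner) can be picked out with a short script. What follows is only a minimal Python sketch and not part of MariaDB or Galera: the function name `summarize_apply_failure` and the regular expressions are assumptions modelled on the log lines in this report, and other versions may word these messages differently.

```python
import re

# Patterns modelled on the journal lines quoted in this report (assumption:
# other MariaDB/Galera versions may format these messages differently).
APPLY_ERROR = re.compile(
    r"\[ERROR\] Slave SQL: (?P<message>.*), Internal MariaDB error code: (?P<code>\d+)"
)
APPLY_FAILED = re.compile(
    r"\[Warning\] WSREP: Event \d+ (?P<event>\S+) apply failed: \d+, seqno (?P<seqno>\d+)"
)
VOTE_WINNER = re.compile(r"Winner: (?P<winner>[0-9a-f]+)")


def summarize_apply_failure(lines):
    """Return the first apply error, failed event, and vote winner found in `lines`."""
    summary = {}
    for line in lines:
        for pattern in (APPLY_ERROR, APPLY_FAILED, VOTE_WINNER):
            match = pattern.search(line)
            if match:
                # Keep the first occurrence of each field; later repeats are ignored.
                for key, value in match.groupdict().items():
                    summary.setdefault(key, value)
    return summary


# Three lines copied from the transcript in this report.
sample = [
    "Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [ERROR] Slave SQL: "
    "Column 3 of table 'redirectpizza_0478_prod.teams' cannot be converted from type "
    "'varchar(272 octets)' to type 'timestamp', Internal MariaDB error code: 1677",
    "Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Warning] WSREP: "
    "Event 3 Update_rows_v1 apply failed: 3, seqno 887337789",
    "Feb 25 14:20:28 http-msh02 mariadbd[1453]: Winner: ea893734039cc445",
]

for field, value in summarize_apply_failure(sample).items():
    print(f"{field}: {value}")
```

In practice the input would be the lines of the journal for the `mariadb` unit rather than a hard-coded list; the sketch deliberately stops at extraction and makes no attempt to interpret the vote.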