MariaDB Server / MDEV-33535

Don't let Galera crash on failed DDL


Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Component: Galera

    Description

      When a DDL statement fails, the entire Galera cluster effectively 'crashes', because all slaves terminate themselves. For statements that fail for system reasons, this is somewhat expected. However, DDL statements can also fail because of errors in userspace, such as a faulty `ALTER TABLE`, in which case this behaviour is very impractical, especially in multi-tenant environments.

      Although this is expected and known behaviour (reported in, for example, https://jira.mariadb.org/browse/MDEV-8323), and I raised it on the mailing list in 2021 (https://lists.launchpad.net/maria-discuss/msg06168.html), I would strongly favour preventing it (without jeopardising data integrity, obviously).

      Below is the transcript of a crash that just occurred.

      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [ERROR] Slave SQL: Column 3 of table 'redirectpizza_0478_prod.teams' cannot be converted from type 'varchar(272 octets)' to type 'timestamp', Internal MariaDB error code: 1677
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Warning] WSREP: Event 3 Update_rows_v1 apply failed: 3, seqno 887337789
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Member 0(http-msh02.efw.ha.cyberfusion.cloud) initiates vote on 5d32c403-53c3-11ec-aa31-227f73623d32:887337789,ea893734039cc445:
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Votes over 5d32c403-53c3-11ec-aa31-227f73623d32:887337789:
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:    ea893734039cc445:   1/3
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: Waiting for more votes.
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Member 1(http-msh03.efw.ha.cyberfusion.cloud) initiates vote on 5d32c403-53c3-11ec-aa31-227f73623d32:887337789,ea893734039cc445:
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Votes over 5d32c403-53c3-11ec-aa31-227f73623d32:887337789:
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:    ea893734039cc445:   2/3
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: Winner: ea893734039cc445
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [ERROR] WSREP: Failed to apply write set: gtid: 5d32c403-53c3-11ec-aa31-227f73623d32:887337789 server_id: a1dfc631-c460-11ee-83ba-87587c55364f client_id: 7762049 trx_id: 326470056 flags: 3 (start_transaction | commit)
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Closing send monitor...
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Closed send monitor.
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: gcomm: terminating thread
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: gcomm: joining thread
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: gcomm: closing backend
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: view(view_id(NON_PRIM,179859b4-8c4e,691) memb {
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:     179859b4-8c4e,0
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: } joined {
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: } left {
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: } partitioned {
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:     5cc1a215-ad56,0
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:     a1dfc631-83ba,0
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: })
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: PC protocol downgrade 1 -> 0
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: view((empty))
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: gcomm: closed
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Flow-control interval: [16, 16]
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Received NON-PRIMARY.
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 887337790)
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: New SELF-LEAVE.
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Flow-control interval: [0, 0]
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 887337790)
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: RECV thread exiting 0: Success
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: recv_thread() joined.
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Closing replication queue.
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Closing slave action queue.
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: ================================================
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: View:
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:   id: 5d32c403-53c3-11ec-aa31-227f73623d32:887337790
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:   status: non-primary
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:   protocol_version: 4
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:   capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:   final: no
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:   own_index: 0
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:   members(1):
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:     0: 179859b4-baec-11ee-8c4e-8a01e9e422b8, http-msh02.efw.ha.cyberfusion.c
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: =================================================
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Non-primary view
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Server status change synced -> connected
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: ================================================
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: View:
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:   id: 5d32c403-53c3-11ec-aa31-227f73623d32:887337790
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:   status: non-primary
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:   protocol_version: 4
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:   capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:   final: yes
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:   own_index: -1
      Feb 25 14:20:28 http-msh02 mariadbd[1453]:   members(0):
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: =================================================
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Non-primary view
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Server status change connected -> disconnected
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 0 [Note] WSREP: Service thread queue flushed.
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 5
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Note] WSREP: Applier thread exiting ret: 0 thd: 2
      Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [Warning] Aborted connection 2 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)
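
For anyone triaging a similar incident, the key facts in the transcript above (the SQL error code, the vote tallies, and the sequence number of the write set that failed to apply) can be pulled out mechanically. The following is only a minimal Python sketch, assuming journal lines in exactly the format shown above. The function name `summarise` and the regular expressions are my own illustration and are not part of MariaDB or Galera.

```python
import re

# Patterns written against the log format in this report (an assumption:
# other MariaDB/Galera versions may word these messages differently).
SQL_ERROR = re.compile(
    r"\[ERROR\] Slave SQL: (?P<msg>.*), Internal MariaDB error code: (?P<code>\d+)"
)
VOTE_TALLY = re.compile(
    r"mariadbd\[\d+\]:\s+(?P<vote>[0-9a-f]{16}):\s+(?P<count>\d+)/(?P<total>\d+)\s*$"
)
APPLY_FAILED = re.compile(
    r"\[ERROR\] WSREP: Failed to apply write set: gtid: (?P<uuid>[0-9a-f-]+):(?P<seqno>\d+)"
)


def summarise(lines):
    """Hypothetical helper: collect error code, vote tallies and failed seqno."""
    summary = {"error_code": None, "tallies": [], "failed_seqno": None}
    for line in lines:
        if (m := SQL_ERROR.search(line)):
            summary["error_code"] = int(m["code"])
        elif (m := VOTE_TALLY.search(line)):
            summary["tallies"].append((m["vote"], int(m["count"]), int(m["total"])))
        elif (m := APPLY_FAILED.search(line)):
            summary["failed_seqno"] = int(m["seqno"])
    return summary


# Four lines copied from the transcript above.
SAMPLE = [
    "Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [ERROR] Slave SQL: Column 3 of table 'redirectpizza_0478_prod.teams' cannot be converted from type 'varchar(272 octets)' to type 'timestamp', Internal MariaDB error code: 1677",
    "Feb 25 14:20:28 http-msh02 mariadbd[1453]:    ea893734039cc445:   1/3",
    "Feb 25 14:20:28 http-msh02 mariadbd[1453]:    ea893734039cc445:   2/3",
    "Feb 25 14:20:28 http-msh02 mariadbd[1453]: 2024-02-25 14:20:28 2 [ERROR] WSREP: Failed to apply write set: gtid: 5d32c403-53c3-11ec-aa31-227f73623d32:887337789 server_id: a1dfc631-c460-11ee-83ba-87587c55364f client_id: 7762049 trx_id: 326470056 flags: 3 (start_transaction | commit)",
]

result = summarise(SAMPLE)
print(result)
```

Running this over the journal of each node makes it easier to compare which error and which sequence number each one reported before leaving the cluster.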
      

      ... and after a manual shutdown (not relevant, but better to provide too much information than too little):

      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] /usr/sbin/mariadbd (initiated by: unknown): Normal shutdown
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: Shutdown replication
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 1 [Note] WSREP: rollbacker thread exiting 1
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 1 [Warning] Aborted connection 1 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: dtor state: CLOSED
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: MemPool(TrxHandleSlave): hit ratio: 0.999248, misses: 440, in use: 0, in pool: 440
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: mon: entered 585064 oooe fraction 0 oool fraction 0
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: mon: entered 585064 oooe fraction 5.64041e-05 oool fraction 1.70921e-06
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: mon: entered 597396 oooe fraction 0 oool fraction 1.67393e-06
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: cert index usage at exit 0
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: cert trx map usage at exit 0
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: deps set usage at exit 0
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: avg deps dist 65.1557
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: avg cert interval 625897
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: cert index size 791
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: Service thread queue flushed.
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: wsdb trx map usage 0 conn query map usage 0
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: MemPool(LocalTrxHandle): hit ratio: 0.998554, misses: 2, in use: 0, in pool: 2
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: Flushing memory map to disk...
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] WSREP: Deinitializing allowlist service v1
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] InnoDB: FTS optimize thread exiting.
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] InnoDB: Starting shutdown...
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] InnoDB: Restricted to 64880 pages due to innodb_buf_pool_dump_pct=25
      Feb 25 14:28:57 http-msh02 mariadbd[1453]: 2024-02-25 14:28:57 0 [Note] InnoDB: Buffer pool(s) dump completed at 240225 14:28:57
      Feb 25 14:29:00 http-msh02 mariadbd[1453]: 2024-02-25 14:29:00 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
      Feb 25 14:29:00 http-msh02 mariadbd[1453]: 2024-02-25 14:29:00 0 [Note] InnoDB: Shutdown completed; log sequence number 1194448333678; transaction id 1784414859
      Feb 25 14:29:00 http-msh02 mariadbd[1453]: 2024-02-25 14:29:00 0 [Note] /usr/sbin/mariadbd: Shutdown complete
      Feb 25 14:29:01 http-msh02 systemd[1]: mariadb.service: Succeeded.
      Feb 25 14:29:01 http-msh02 systemd[1]: Stopped MariaDB 10.11.6 database server.
      Feb 25 14:29:01 http-msh02 systemd[1]: mariadb.service: Consumed 1h 45min 15.039s CPU time.
      

          People

            Assignee: Unassigned
            Reporter: William Edwards (wedwards)
            Votes: 2
            Watchers: 3
