MariaDB Server / MDEV-34606

Node never joins back - WSREP: gcs_caused() returned -103 (Software caused connection abort) after WSREP: MDL BF-BF conflict

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.6.15
    • Fix Version/s: None
    • Component/s: Galera
    • Labels: None
    • Environment: DISTRIB_ID=Ubuntu
      DISTRIB_RELEASE=18.04
      DISTRIB_CODENAME=bionic
      DISTRIB_DESCRIPTION="Ubuntu 18.04.6 LTS"

    Description

      We observed the warning WSREP: gcs_caused() returned -103 (Software caused connection abort) on one node of a 3-node Galera cluster. After that, the node was not able to rejoin the cluster, and we had to restart it manually to bring it back.

      Logs:

      Node - 1 (flx11)

      2024-07-16  6:55:13 14 [Note] WSREP: MDL conflict db=db_01 table=GD_Change ticket=3 solved by abort
      2024-07-16  6:55:14 2 [Note] WSREP: MDL BF-BF conflict
      schema:  db_01
      request: (2 	seqno 15519555 	wsrep (high priority, exec, executing) cmd 0 161 	UPDATE `GD_Change` SET updated_at='2024-07-16 06:55:14.764867', gd_change_id=20188 WHERE `GD_Change`.id = 1R?f)
      granted: (9 	seqno 15519554 	wsrep (toi, exec, committed) cmd 0 9 	DROP TABLE `GD_Config`)
      2024-07-16  6:55:14 2 [ERROR] Aborting
      2024-07-16  6:55:58 116470 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)
      2024-07-16  6:56:01 116472 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)
      2024-07-16  6:56:04 116474 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)
      2024-07-16  6:56:17 116479 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)
      2024-07-16  6:56:19 116480 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)
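
To make the ordering in the BF-BF conflict above easier to read, here is a minimal sketch that pulls the thread id, seqno and wsrep mode out of the `request:` and `granted:` lines. The regular expression and the `parse_bf_conflict` helper are assumptions made for illustration; they are not part of MariaDB or Galera tooling, and the statement text in the sample is shortened.

```python
import re

# Hypothetical helper: parse the "request:" / "granted:" lines that follow a
# "WSREP: MDL BF-BF conflict" log entry. The pattern is inferred from the
# log excerpt above, not from any documented log format.
LINE_RE = re.compile(
    r"^\s*(?P<role>request|granted):\s*\((?P<thread>\d+)\s+seqno\s+(?P<seqno>\d+)"
    r"\s+wsrep\s+\((?P<mode>[^,]+),"
)


def parse_bf_conflict(log_text):
    """Return a dict keyed by 'request' / 'granted' for the lines found."""
    found = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            found[m.group("role")] = {
                "thread": int(m.group("thread")),
                "seqno": int(m.group("seqno")),
                "mode": m.group("mode").strip(),
            }
    return found


# Sample taken from the Node 1 log; the UPDATE statement is shortened here.
SAMPLE = (
    "request: (2 \tseqno 15519555 \twsrep (high priority, exec, executing) "
    "cmd 0 161 \tUPDATE `GD_Change` SET ...)\n"
    "granted: (9 \tseqno 15519554 \twsrep (toi, exec, committed) "
    "cmd 0 9 \tDROP TABLE `GD_Config`)\n"
)

info = parse_bf_conflict(SAMPLE)
for role in ("granted", "request"):
    print(role, info[role]["thread"], info[role]["seqno"], info[role]["mode"])
```

On the logged lines this reports the granted lock as thread 9, seqno 15519554 in `toi` mode, and the requester as thread 2, seqno 15519555 in `high priority` mode, so the applier's UPDATE was ordered directly after the DROP TABLE.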
      

      Node - 2 (flx01)

      2024-07-16  6:55:14 127263 [Note] WSREP: MDL conflict db=db_01 table=GD_Config ticket=3 solved by abort
      2024-07-16  6:55:14 127263 [Note] WSREP: MDL conflict db=db_01 table=GD_Config ticket=3 solved by abort
      2024-07-16  6:55:14 0 [Note] WSREP: declaring ad6be48b-9120 at ssl://172.27.97.135:4567 stable
      2024-07-16  6:55:14 0 [Note] WSREP: forgetting 8fbd9334-8c5b (ssl://172.27.164.171:4567)
      2024-07-16  6:55:14 0 [Note] WSREP: Node 42b71455-b540 state prim
      2024-07-16  6:55:14 0 [Note] WSREP: view(view_id(PRIM,42b71455-b540,1488) memb {
      	42b71455-b540,0
      	ad6be48b-9120,0
      } joined {
      } left {
      } partitioned {
      	8fbd9334-8c5b,0
      })
      2024-07-16  6:55:14 0 [Note] WSREP: save pc into disk
      2024-07-16  6:55:14 0 [Note] WSREP: forgetting 8fbd9334-8c5b (ssl://172.27.164.171:4567)
      2024-07-16  6:55:14 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
      2024-07-16  6:55:14 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 5aa71c0c-4340-11ef-ad1c-9ed910f5095e
      2024-07-16  6:55:14 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 5aa71c0c-4340-11ef-ad1c-9ed910f5095e
      2024-07-16  6:55:14 0 [Note] WSREP: STATE EXCHANGE: got state msg: 5aa71c0c-4340-11ef-ad1c-9ed910f5095e from 0 (flx01)
      2024-07-16  6:55:14 0 [Note] WSREP: STATE EXCHANGE: got state msg: 5aa71c0c-4340-11ef-ad1c-9ed910f5095e from 1 (garb)
      2024-07-16  6:55:14 0 [Note] WSREP: 'garb' demoted SYNCED->PRIMARY due to gap in history: 15398793 - 15519556
      2024-07-16  6:55:14 0 [Note] WSREP: Quorum results:
      	version    = 6,
      	component  = PRIMARY,
      	conf_id    = 18,
      	members    = 1/2 (joined/total),
      	act_id     = 15519556,
      	last_appl. = 15519482,
      	protocols  = 2/10/4 (gcs/repl/appl),
      	vote policy= 0,
      	group UUID = 401d046b-ebcd-11ec-9284-2e763fda7f1a
      2024-07-16  6:55:14 0 [Note] WSREP: Flow-control interval: [424, 424]
      2024-07-16  6:55:14 16 [Note] WSREP: ####### processing CC 15519557, local, ordered
      2024-07-16  6:55:14 0 [Note] WSREP: Member 1.0 (garb) requested state transfer from '*any*'. Selected 0.0 (flx01)(SYNCED) as donor.
      2024-07-16  6:55:14 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 15519557)
      2024-07-16  6:55:14 0 [Note] WSREP: 1.0 (garb): State transfer from 0.0 (flx01) complete.
      2024-07-16  6:55:14 0 [Note] WSREP: Member 1.0 (garb) synced with group.
      2024-07-16  6:55:14 16 [Note] WSREP: ####### My UUID: 42b71455-3906-11ef-b540-2aac76f44406
      2024-07-16  6:55:14 16 [Note] WSREP: Skipping cert index reset
      2024-07-16  6:55:14 16 [Note] WSREP: REPL Protocols: 10 (5)
      2024-07-16  6:55:14 16 [Note] WSREP: ####### Adjusting cert position: 15519556 -> 15519557
      2024-07-16  6:55:14 0 [Note] WSREP: Service thread queue flushed.
      2024-07-16  6:55:15 16 [Note] WSREP: ================================================
      View:
        id: 401d046b-ebcd-11ec-9284-2e763fda7f1a:15519557
        status: primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(2):
      	0: 42b71455-3906-11ef-b540-2aac76f44406, flx01.bos01.corp.akama
      	1: ad6be48b-388e-11ef-9120-ff0c187ecadb, garb
      =================================================
      2024-07-16  6:55:15 16 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2024-07-16  6:55:15 16 [Note] WSREP: Lowest cert index boundary for CC from group: 15519483
      2024-07-16  6:55:15 16 [Note] WSREP: Min available from gcache for CC from group: 15282316
      2024-07-16  6:55:15 16 [Note] WSREP: Detected STR version: 0, req_len: 9, req: trivial
      2024-07-16  6:55:15 0 [Note] WSREP: 0.0 (flx01): State transfer to 1.0 (garb) complete.
      2024-07-16  6:55:15 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 15519558)
      2024-07-16  6:55:15 0 [Note] WSREP: Processing event queue:... -nan% (0/0 events) complete.
      2024-07-16  6:55:15 0 [Note] WSREP: Member 0.0 (flx01) synced with group.
      2024-07-16  6:55:15 0 [Note] WSREP: Processing event queue:...100.0% (1/1 events) complete.
      2024-07-16  6:55:15 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 15519558)
      2024-07-16  6:55:15 13 [Note] WSREP: Server flx01 synced with group
      2024-07-16  6:55:20 0 [Note] WSREP:  cleaning up 8fbd9334-8c5b (ssl://172.27.164.171:4567)
      2024-07-16  6:56:03 127271 [Note] WSREP: MDL conflict db=db_01 table=GD_Config ticket=3 solved by abort
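
As a reading aid for the view change logged on Node 2, the following sketch groups the short node ids of a `view(...)` block by section (`memb`, `joined`, `left`, `partitioned`). The `parse_view` function and its regular expressions are hypothetical and based only on the layout of the excerpt above.

```python
import re

# Section headers such as "... memb {" or "} partitioned {" (layout assumed
# from the log excerpt above).
SECTION_RE = re.compile(r"(memb|joined|left|partitioned)\s*\{\s*$")
# Member lines such as "42b71455-b540,0".
NODE_RE = re.compile(r"^([0-9a-f]{8}-[0-9a-f]{4}),\d+$")


def parse_view(log_text):
    """Return {section_name: [short node ids]} for one view(...) block."""
    sections = {}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        header = SECTION_RE.search(line)
        if header:
            current = header.group(1)
            sections.setdefault(current, [])
        elif line.startswith("})"):
            current = None  # end of the view block
        elif current is not None:
            node = NODE_RE.match(line)
            if node:
                sections[current].append(node.group(1))
    return sections


SAMPLE = """\
2024-07-16  6:55:14 0 [Note] WSREP: view(view_id(PRIM,42b71455-b540,1488) memb {
\t42b71455-b540,0
\tad6be48b-9120,0
} joined {
} left {
} partitioned {
\t8fbd9334-8c5b,0
})
"""

view = parse_view(SAMPLE)
print("members:", view["memb"])
print("partitioned:", view["partitioned"])
```

For the view above, this lists `42b71455-b540` (flx01) and `ad6be48b-9120` (garb) as members and `8fbd9334-8c5b` as partitioned, which matches the `forgetting 8fbd9334-8c5b` lines.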
      

      Some more observations from Node 1:

      root@flx11:~# ps -ef | grep mysql
      root      3075  2583  0 09:36 pts/0    00:00:00 grep --color=auto mysql
      mysql     9511     1  0 Jul03 ?        00:21:26 /usr/sbin/mariadbd --wsrep_start_position=401d046b-ebcd-11ec-9284-2e763fda7f1a:15262135,0-1-12403509
      root@flx11:~# 
       
      root@flx11:~# mysql -uroot -p'XXXX' -A --protocol=TCP -P3308 -e "show processlist; " 
      +--------+------------------+--------------------------------------------+------------------+---------+---------+-------------------------+------------------------------------------------------------------------------------------------------+----------+
      | Id     | User             | Host                                       | db               | Command | Time    | State                   | Info                                                                                                 | Progress |
      +--------+------------------+--------------------------------------------+------------------+---------+---------+-------------------------+------------------------------------------------------------------------------------------------------+----------+
      |      1 | system user      |                                            | NULL             | Sleep   | 1132171 | wsrep aborter idle      | NULL                                                                                                 |    0.000 |
      |      2 | system user      |                                            | NULL             | Sleep   |    9557 | Opening tables          | UPDATE `GD_Change` SET updated_at='2024-07-16 06:55:14.764867', gd_change_id=20188 WHERE `           |    0.000 |
      |      6 | system user      |                                            | NULL             | Sleep   |    9557 | After apply log event   | NULL                                                                                                 |    0.000 |
      |      7 | system user      |                                            | NULL             | Sleep   |    9557 | wsrep applier committed | NULL                                                                                                 |    0.000 |
      |      8 | system user      |                                            | NULL             | Sleep   |    9557 | After apply log event   | NULL                                                                                                 |    0.000 |
      |     10 | system user      |                                            | NULL             | Sleep   |    9557 | wsrep applier committed | NULL                                                                                                 |    0.000 |
      |      9 | system user      |                                            | db_01            | Sleep   |    9557 | Commit implicit         | DROP TABLE `GD_Config`                                                                               |    0.000 |
      |     13 | system user      |                                            | NULL             | Sleep   |    9557 | After apply log event   | NULL                                                                                                 |    0.000 |
      |     14 | system user      |                                            | NULL             | Sleep   |    9557 | After apply log event   | NULL                                                                                                 |    0.000 |
      | 116468 | _sentinel        | flx11:34148                                | NULL             | Sleep   |      20 |                         | NULL                                                                                                 |    0.000 |
      | 117153 | db_02            | 198.19.18.88:43795                         | db_02            | Sleep   |     202 |                         | NULL                                                                                                 |    0.000 |
      | 117400 | root             | localhost:36090                            | NULL             | Query   |       0 | starting                | show processlist                                                                                     |    0.000 |
      +--------+------------------+--------------------------------------------+------------------+---------+---------+-------------------------+------------------------------------------------------------------------------------------------------+----------+
      root@flx11:~# 
       
       
      root@flx11:~# mysql -uroot -p'XXXX' -A --protocol=TCP -P3308 -e "show global status like 'wsrep%'" 
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
      | Variable_name                 | Value                                                                                                                                          |
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
      | wsrep_local_state_uuid        | 401d046b-ebcd-11ec-9284-2e763fda7f1a                                                                                                           |
      | wsrep_protocol_version        | 10                                                                                                                                             |
      | wsrep_last_committed          | 15519553                                                                                                                                       |
      | wsrep_replicated              | 120763                                                                                                                                         |
      | wsrep_replicated_bytes        | 139136808                                                                                                                                      |
      | wsrep_repl_keys               | 1413208                                                                                                                                        |
      | wsrep_repl_keys_bytes         | 14204600                                                                                                                                       |
      | wsrep_repl_data_bytes         | 116989796                                                                                                                                      |
      | wsrep_repl_other_bytes        | 0                                                                                                                                              |
      | wsrep_received                | 139487                                                                                                                                         |
      | wsrep_received_bytes          | 150254984                                                                                                                                      |
      | wsrep_local_commits           | 101989                                                                                                                                         |
      | wsrep_local_cert_failures     | 2                                                                                                                                              |
      | wsrep_local_replays           | 9                                                                                                                                              |
      | wsrep_local_send_queue        | 0                                                                                                                                              |
      | wsrep_local_send_queue_max    | 2                                                                                                                                              |
      | wsrep_local_send_queue_min    | 0                                                                                                                                              |
      | wsrep_local_send_queue_avg    | 0.000675643                                                                                                                                    |
      | wsrep_local_recv_queue        | 1                                                                                                                                              |
      | wsrep_local_recv_queue_max    | 5                                                                                                                                              |
      | wsrep_local_recv_queue_min    | 0                                                                                                                                              |
      | wsrep_local_recv_queue_avg    | 0.00232995                                                                                                                                     |
      | wsrep_local_cached_downto     | 15283108                                                                                                                                       |
      | wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
      | wsrep_flow_control_paused     | 0                                                                                                                                              |
      | wsrep_flow_control_sent       | 0                                                                                                                                              |
      | wsrep_flow_control_recv       | 0                                                                                                                                              |
      | wsrep_flow_control_active     | false                                                                                                                                          |
      | wsrep_flow_control_requested  | false                                                                                                                                          |
      | wsrep_cert_deps_distance      | 12.5772                                                                                                                                        |
      | wsrep_apply_oooe              | 0.122729                                                                                                                                       |
      | wsrep_apply_oool              | 0.00144511                                                                                                                                     |
      | wsrep_apply_window            | 1.12939                                                                                                                                        |
      | wsrep_apply_waits             | 384                                                                                                                                            |
      | wsrep_commit_oooe             | 0                                                                                                                                              |
      | wsrep_commit_oool             | 0                                                                                                                                              |
      | wsrep_commit_window           | 1.03898                                                                                                                                        |
      | wsrep_local_state             | 4                                                                                                                                              |
      | wsrep_local_state_comment     | Synced                                                                                                                                         |
      | wsrep_cert_index_size         | 46                                                                                                                                             |
      | wsrep_causal_reads            | 194551                                                                                                                                         |
      | wsrep_cert_interval           | 771.058                                                                                                                                        |
      | wsrep_open_transactions       | 0                                                                                                                                              |
      | wsrep_open_connections        | 0                                                                                                                                              |
      | wsrep_incoming_addresses      | 172.27.97.134:3308,172.27.164.171:3308,                                                                                                        |
      | wsrep_applier_thread_count    | 8                                                                                                                                              |
      | wsrep_cluster_capabilities    |                                                                                                                                                |
      | wsrep_cluster_conf_id         | 18                                                                                                                                             |
      | wsrep_cluster_size            | 3                                                                                                                                              |
      | wsrep_cluster_state_uuid      | 401d046b-ebcd-11ec-9284-2e763fda7f1a                                                                                                           |
      | wsrep_cluster_status          | non-Primary                                                                                                                                    |
      | wsrep_connected               | ON                                                                                                                                             |
      | wsrep_local_bf_aborts         | 54                                                                                                                                             |
      | wsrep_local_index             | 1                                                                                                                                              |
      | wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
      | wsrep_provider_name           | Galera                                                                                                                                         |
      | wsrep_provider_vendor         | Codership Oy info@codership.com                                                                                                              |
      | wsrep_provider_version        | 26.4.14(r06a0c285)                                                                                                                             |
      | wsrep_ready                   | OFF                                                                                                                                            |
      | wsrep_rollbacker_thread_count | 1                                                                                                                                              |
      | wsrep_thread_count            | 9                                                                                                                                              |
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
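
The status output above is internally inconsistent: `wsrep_local_state_comment` still reports `Synced`, while `wsrep_cluster_status` is `non-Primary` and `wsrep_ready` is `OFF`. A minimal sketch of a check that would flag such a node from the same table output is shown below. The function names and the set of checked variables are assumptions chosen for illustration, not an official health check.

```python
def parse_status_table(text):
    """Parse the ASCII table printed by the mysql client into a dict."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and prompts
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) == 2 and cells[0] != "Variable_name":
            status[cells[0]] = cells[1]
    return status


def node_problems(status):
    """Return human-readable findings for a node that is not usable."""
    # Expected values assumed for a healthy cluster member.
    expected = {
        "wsrep_cluster_status": "Primary",
        "wsrep_ready": "ON",
        "wsrep_connected": "ON",
    }
    problems = []
    for name, want in expected.items():
        got = status.get(name)
        if got != want:
            problems.append(f"{name} is {got!r}, expected {want!r}")
    return problems


# Rows copied from the Node 1 output above.
SAMPLE = """\
| Variable_name                 | Value       |
| wsrep_local_state_comment     | Synced      |
| wsrep_cluster_status          | non-Primary |
| wsrep_connected               | ON          |
| wsrep_ready                   | OFF         |
"""

status = parse_status_table(SAMPLE)
problems = node_problems(status)
for p in problems:
    print(p)
```

Checking `wsrep_cluster_status` and `wsrep_ready` alongside `wsrep_local_state_comment` matters here because the last one alone would have made Node 1 look healthy.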
      

      Can you please help us understand what happened here?
      Thank you.

          People

            Assignee: Unassigned
            Reporter: Nilesh (ngavali)