MariaDB Server / MDEV-29571

WSREP donor stuck in "donor/desynced" state

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 10.5.9
    • Fix Version/s: 10.5.14
    • Component/s: wsrep
    • Labels: None
    • Environment: OS: Debian GNU/Linux 10 (buster)
      MariaDB: 10.5.9 (wrapped in Docker Image: bitnami-docker-mariadb-galera (release: 10.5.9-debian-10-r57))
      Galera Cluster: 3 nodes

    Description

      After donation, the donor node stays in wsrep_local_state 2 (Donor/Desynced) and never returns to state 4 (Synced).

      Related logs

      [Warning] WSREP: Could not find key from index
      [ERROR] WSREP: Failed to process action COMMIT_CUT, g: -1, l: 2243588112, ptr: 0x7f7020038140, size: 8: Access beyond record set end.: 1 (Operation not permitted)
      [Note ] WSREP: Applier thread exiting ret: 6 thd:6
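
For readers unfamiliar with the numeric codes in the description, the sketch below maps the documented wsrep_local_state values to their names. It is only an illustration: the helper name describe_state is hypothetical and is not part of MariaDB or Galera.

```python
# Node states as documented for the wsrep_local_state status variable.
WSREP_LOCAL_STATES = {
    1: "Joining",
    2: "Donor/Desynced",
    3: "Joined",
    4: "Synced",
}


def describe_state(code: int) -> str:
    """Return the state name for a wsrep_local_state code (hypothetical helper)."""
    return WSREP_LOCAL_STATES.get(code, f"Unknown ({code})")


print(describe_state(2))  # the state the donor is stuck in
print(describe_state(4))  # the state it is expected to return to
```

A healthy donor is expected to move from state 2 back to state 4 once the transfer is finished; in this report it stays at 2.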

      Attachments

        Activity

          cclin cc lin created issue -
          cclin cc lin made changes -
          Field: Summary
          Original Value: WSREP donor stuck in "joined" state
          New Value: WSREP donor stuck in "donor/desynced" state
          cclin cc lin made changes -
          Field: Description
          Original Value: After donation, the donor node keeps its wsrep local state as 2 (joined) and never comes back to state 4 (synced)
          New Value: After donation, the donor node keeps its wsrep local state as 2 (donor/desynced) and never comes back to state 4 (synced)
          Related logs: unchanged
          cclin cc lin added a comment - - edited

          wsrep-related server variables and status

          MariaDB [(none)]> show variables like '%wsrep%'\G

          Variable_name: wsrep_osu_method
          Value: TOI

          Variable_name: wsrep_sr_store
          Value: table

          Variable_name: wsrep_auto_increment_control
          Value: ON

          Variable_name: wsrep_causal_reads
          Value: OFF

          Variable_name: wsrep_certification_rules
          Value: strict

          Variable_name: wsrep_certify_nonpk
          Value: ON

          Variable_name: wsrep_cluster_address
          Value: gcomm://conductor-mariadb-server-a-headless.icmmon-a.svc.cluster.local

          Variable_name: wsrep_cluster_name
          Value: galera

          Variable_name: wsrep_convert_lock_to_trx
          Value: OFF

          Variable_name: wsrep_data_home_dir
          Value: /bitnami/mariadb/data/

          Variable_name: wsrep_dbug_option
          Value:

          Variable_name: wsrep_debug
          Value: NONE

          Variable_name: wsrep_desync
          Value: OFF

          Variable_name: wsrep_dirty_reads
          Value: OFF

          Variable_name: wsrep_drupal_282555_workaround
          Value: OFF

          Variable_name: wsrep_forced_binlog_format
          Value: NONE

          Variable_name: wsrep_gtid_domain_id
          Value: 1

          Variable_name: wsrep_gtid_mode
          Value: ON

          Variable_name: wsrep_gtid_seq_no
          Value: 0

          Variable_name: wsrep_ignore_apply_errors
          Value: 7

          Variable_name: wsrep_load_data_splitting
          Value: OFF

          Variable_name: wsrep_log_conflicts
          Value: OFF

          Variable_name: wsrep_max_ws_rows
          Value: 0

          Variable_name: wsrep_max_ws_size
          Value: 2147483647

          Variable_name: wsrep_mysql_replication_bundle
          Value: 0

          Variable_name: wsrep_node_address
          Value: 172.24.63.171

          Variable_name: wsrep_node_incoming_address
          Value: AUTO

          Variable_name: wsrep_node_name
          Value: conductor-mariadb-server-a-2

          Variable_name: wsrep_notify_cmd
          Value:

          Variable_name: wsrep_on
          Value: ON

          Variable_name: wsrep_patch_version
          Value: wsrep_26.22

          Variable_name: wsrep_provider
          Value: /opt/bitnami/mariadb/lib/libgalera_smm.so

          Variable_name: wsrep_provider_options
          Value: base_dir = /bitnami/mariadb/data/; base_host = 172.24.63.171; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.causal_keepalive_period = PT1S; evs.debug_log_mask = 0x1; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT7.5S; evs.join_retrans_period = PT1S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 1; evs.view_forget_timeout = P1D; gcache.dir = /bitnami/mariadb/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.segment = 0; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 172.24.63.171; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT20S; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 10; socket.checksum = 2; socket.recv_buf_size = auto; socket.send_buf_size = auto;

          Variable_name: wsrep_recover
          Value: OFF

          Variable_name: wsrep_reject_queries
          Value: NONE

          Variable_name: wsrep_replicate_myisam
          Value: ON

          Variable_name: wsrep_restart_slave
          Value: OFF

          Variable_name: wsrep_retry_autocommit
          Value: 1

          Variable_name: wsrep_slave_fk_checks
          Value: ON

          Variable_name: wsrep_slave_uk_checks
          Value: OFF

          Variable_name: wsrep_slave_threads
          Value: 4

          Variable_name: wsrep_sst_auth
          Value: ********

          Variable_name: wsrep_sst_donor
          Value:

          Variable_name: wsrep_sst_donor_rejects_queries
          Value: OFF

          Variable_name: wsrep_sst_method
          Value: rsync

          Variable_name: wsrep_sst_receive_address
          Value: AUTO

          Variable_name: wsrep_start_position
          Value: 00000000-0000-0000-0000-000000000000:-1

          Variable_name: wsrep_strict_ddl
          Value: OFF

          Variable_name: wsrep_sync_wait
          Value: 0

          Variable_name: wsrep_trx_fragment_size
          Value: 0

          Variable_name: wsrep_trx_fragment_unit
          Value: bytes
          51 rows in set (0.001 sec)

          MariaDB [(none)]> show status like '%wsrep%'\G

          Variable_name: wsrep_local_state_uuid
          Value: 345477eb-ae2e-11ec-8ba4-47b62b1ca1be

          Variable_name: wsrep_protocol_version
          Value: 10

          Variable_name: wsrep_last_committed
          Value: 4488690372

          Variable_name: wsrep_replicated
          Value: 940415

          Variable_name: wsrep_replicated_bytes
          Value: 20797693992

          Variable_name: wsrep_repl_keys
          Value: 2914115

          Variable_name: wsrep_repl_keys_bytes
          Value: 45882880

          Variable_name: wsrep_repl_data_bytes
          Value: 20688051030

          Variable_name: wsrep_repl_other_bytes
          Value: 0

          Variable_name: wsrep_received
          Value: 2242647698

          Variable_name: wsrep_received_bytes
          Value: 5470028916326

          Variable_name: wsrep_local_commits
          Value: 940415

          Variable_name: wsrep_local_cert_failures
          Value: 0

          Variable_name: wsrep_local_replays
          Value: 0

          Variable_name: wsrep_local_send_queue
          Value: 0

          Variable_name: wsrep_local_send_queue_max
          Value: 29

          Variable_name: wsrep_local_send_queue_min
          Value: 0

          Variable_name: wsrep_local_send_queue_avg
          Value: 0.00824793

          Variable_name: wsrep_local_recv_queue
          Value: 793801

          Variable_name: wsrep_local_recv_queue_max
          Value: 800397

          Variable_name: wsrep_local_recv_queue_min
          Value: 0

          Variable_name: wsrep_local_recv_queue_avg
          Value: 143.115

          Variable_name: wsrep_local_cached_downto
          Value: -1

          Variable_name: wsrep_flow_control_paused_ns
          Value: 140694363524876

          Variable_name: wsrep_flow_control_paused
          Value: 0.0337937

          Variable_name: wsrep_flow_control_sent
          Value: 254842

          Variable_name: wsrep_flow_control_recv
          Value: 392110

          Variable_name: wsrep_flow_control_active
          Value: true

          Variable_name: wsrep_flow_control_requested
          Value: true

          Variable_name: wsrep_cert_deps_distance
          Value: 19.0341

          Variable_name: wsrep_apply_oooe
          Value: 0.173456

          Variable_name: wsrep_apply_oool
          Value: 0.00571454

          Variable_name: wsrep_apply_window
          Value: 1.24731

          Variable_name: wsrep_commit_oooe
          Value: 0

          Variable_name: wsrep_commit_oool
          Value: 0

          Variable_name: wsrep_commit_window
          Value: 1.04421

          Variable_name: wsrep_local_state
          Value: 2

          Variable_name: wsrep_local_state_comment
          Value: Donor/Desynced

          Variable_name: wsrep_cert_index_size
          Value: 492

          Variable_name: wsrep_causal_reads
          Value: 0

          Variable_name: wsrep_cert_interval
          Value: 72.9897

          Variable_name: wsrep_open_transactions
          Value: 0

          Variable_name: wsrep_open_connections
          Value: 0

          Variable_name: wsrep_incoming_addresses
          Value: AUTO,AUTO,AUTO

          Variable_name: wsrep_cluster_weight
          Value: 3

          Variable_name: wsrep_desync_count
          Value: 0

          Variable_name: wsrep_evs_delayed
          Value:

          Variable_name: wsrep_evs_evict_list
          Value:

          Variable_name: wsrep_evs_repl_latency
          Value: 0/0/0/0/0

          Variable_name: wsrep_evs_state
          Value: OPERATIONAL

          Variable_name: wsrep_gcomm_uuid
          Value: b0d5ab73-12d1-11ed-ba3c-de66cd0ff288

          Variable_name: wsrep_gmcast_segment
          Value: 0

          Variable_name: wsrep_applier_thread_count
          Value: 3

          Variable_name: wsrep_cluster_capabilities
          Value:

          Variable_name: wsrep_cluster_conf_id
          Value: 17

          Variable_name: wsrep_cluster_size
          Value: 3

          Variable_name: wsrep_cluster_state_uuid
          Value: 345477eb-ae2e-11ec-8ba4-47b62b1ca1be

          Variable_name: wsrep_cluster_status
          Value: Primary

          Variable_name: wsrep_connected
          Value: ON

          Variable_name: wsrep_local_bf_aborts
          Value: 0

          Variable_name: wsrep_local_index
          Value: 2

          Variable_name: wsrep_provider_capabilities
          Value: :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO:

          Variable_name: wsrep_provider_name
          Value: Galera

          Variable_name: wsrep_provider_vendor
          Value: Codership Oy <info@codership.com>

          Variable_name: wsrep_provider_version
          Value: 4.7(rXXXX)

          Variable_name: wsrep_ready
          Value: ON

          Variable_name: wsrep_rollbacker_thread_count
          Value: 1

          Variable_name: wsrep_thread_count
          Value: 4
          68 rows in set (0.001 sec)
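
The values that matter most for this report are spread across the long status listing above. As a rough sketch, assuming the output has the "Variable_name:" / "Value:" layout shown in this comment, the pairs can be parsed into a dictionary and the donor-related fields pulled out. The function names parse_vertical_output and summarize_donor are hypothetical, and the sample text is a short excerpt of the listing above.

```python
def parse_vertical_output(text: str) -> dict:
    """Parse 'Variable_name: X' / 'Value: Y' line pairs into a dict of strings."""
    result = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Variable_name:"):
            name = line[len("Variable_name:"):].strip()
        elif line.startswith("Value:") and name is not None:
            result[name] = line[len("Value:"):].strip()
            name = None
    return result


def summarize_donor(status: dict) -> str:
    """Collect the fields that show whether the donor has returned to Synced."""
    state = status.get("wsrep_local_state_comment", "?")
    queue = int(status.get("wsrep_local_recv_queue", "0"))
    flow_control = status.get("wsrep_flow_control_active", "?")
    return f"state={state} recv_queue={queue} flow_control_active={flow_control}"


# Excerpt copied from the status output in this comment.
sample = """
Variable_name: wsrep_local_recv_queue
Value: 793801

Variable_name: wsrep_flow_control_active
Value: true

Variable_name: wsrep_local_state
Value: 2

Variable_name: wsrep_local_state_comment
Value: Donor/Desynced
"""

status = parse_vertical_output(sample)
print(summarize_donor(status))
```

Lines that are neither a "Variable_name:" nor a "Value:" line, such as row separators or the row count, are skipped, so the same function can be fed the full listing.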

          elenst Elena Stepanova made changes -
          Fix Version/s 10.5 [ 23123 ]
          Assignee Jan Lindström [ jplindst ]

          jplindst Jan Lindström (Inactive) added a comment -

          cclin Can you provide the full error log?
          jplindst Jan Lindström (Inactive) made changes -
          Status Open [ 1 ] Needs Feedback [ 10501 ]
          juan.vera Juan added a comment -

          Hi jplindst, here are the details from a cluster running MariaDB 10.5.12 on Rocky. I've attached logs for two nodes: MDB-10-5-12-G-104, the joiner, and MDB-10-5-12-G-101, the donor. At Wed Jan 4 02:42:13 UTC 2023 I restarted MDB-10-5-12-G-104 after deleting the datadir, forcing an SST. MDB-10-5-12-G-101 desynced to provide the SST and stayed desynced until I restarted MDB-10-5-12-G-104 once again at Wed Jan 4 02:42:57 UTC 2023.

          Attached are MDB-10-5-12-G-101-202301040240-state.log, a log watching the node state on the donor during this time period, along with the complete logs for the donor, MDB-10-5-12-G-101-202301040245-mariadb-error-logs.tgz, and for the joiner, MDB-10-5-12-G-104-202301040245-mariadb-error-logs.tgz.
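
The attached state log is not reproduced in this issue, so its format is unknown. As a rough sketch of how such a log could be evaluated, assume it has already been reduced to (timestamp in seconds, wsrep_local_state) pairs in time order. The function name longest_donor_span and the sample values are made up for illustration.

```python
from typing import Iterable, Tuple

DONOR_DESYNCED = 2  # wsrep_local_state value for Donor/Desynced


def longest_donor_span(samples: Iterable[Tuple[float, int]]) -> float:
    """Longest continuous time, in seconds, spent in state 2.

    samples: (timestamp_seconds, wsrep_local_state) pairs in time order.
    """
    longest = 0.0
    start = None
    for ts, state in samples:
        if state == DONOR_DESYNCED:
            if start is None:
                start = ts  # first sample of a new Donor/Desynced run
            longest = max(longest, float(ts - start))
        else:
            start = None  # the run ended; the node left state 2
    return longest


# Made-up samples taken every 10 seconds: Synced, a Donor/Desynced run, Synced again.
samples = [(0, 4), (10, 2), (20, 2), (30, 2), (40, 2), (50, 4)]
print(longest_donor_span(samples))
```

Comparing the result with the time the SST itself took would show whether the donor stayed desynced longer than the transfer required.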

          stephanvos Stephan Vos added a comment -

          I can confirm that the same thing happened to us recently on MariaDB 10.5.17 on Ubuntu 18.04.
          Node 2 was the full SST donor to Node 1. After the SST completed, Node 2 was stuck in the donor/desynced state, and I had to kill the process and restart it, after which it managed to sync up with an IST.
          The SST method was rsync, and Node 2 did have a huge queue after the 2.5 hours.

          elenst Elena Stepanova made changes -
          Assignee Jan Lindström [ jplindst ] Julius Goryavsky [ sysprg ]
          Status Needs Feedback [ 10501 ] Open [ 1 ]
          sysprg Julius Goryavsky made changes -
          Assignee Julius Goryavsky [ sysprg ] Jan Lindström [ JIRAUSER53125 ]
          janlindstrom Jan Lindström made changes -
          Assignee Jan Lindström [ JIRAUSER53125 ] Alexey [ yurchenko ]
          Yurchenko Alexey added a comment -

          Appears to be a duplicate of https://jira.mariadb.org/browse/MDEV-27459

          Yurchenko Alexey made changes -
          Fix Version/s 10.5.14 [ 26809 ]
          Fix Version/s 10.5 [ 23123 ]
          Resolution Duplicate [ 3 ]
          Status Open [ 1 ] Closed [ 6 ]

          People

            Assignee: Yurchenko Alexey
            Reporter: cclin cc lin