  MariaDB Server / MDEV-25880

rsync may be mistakenly killed when overlapping SST

Details

    Description

      This bug was originally observed in the galera_nbo_sst_slave mtr test on 10.6, but it affects all versions. It can cause intermittent crashes during rsync SST when a server is restarted very quickly: a new SST process (for example, one started by a freshly launched server) overlaps the old SST process left over from the previous, already terminated server. This overlap can result in the new rsync being killed instead of the old one, or in the pid file of the new rsync being deleted, which in turn breaks the new state transfer.
      For example:

      2021-06-09  3:28:56 0 [Warning] WSREP: 0.0 (panda): State transfer to 1.0 (panda) failed: -11 (Resource temporarily unavailable)
      2021-06-09  3:28:56 0 [ERROR] WSREP: /home/panda/galera-es-4.x/gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1205: Will never receive state. Need to abort.
      2021-06-09  3:28:56 0 [Note] WSREP: gcomm: terminating thread
      2021-06-09  3:28:56 0 [Note] WSREP: gcomm: joining thread
      2021-06-09  3:28:56 0 [Note] WSREP: gcomm: closing backend
      2021-06-09  3:28:56 2 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): discarded 24 bytes
      2021-06-09  3:28:56 2 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): found 1/2 locked buffers
      2021-06-09  3:28:57 0 [Note] WSREP: PC protocol downgrade 1 -> 0
      2021-06-09  3:28:57 0 [Note] WSREP: view((empty))
      2021-06-09  3:28:57 0 [Note] WSREP: gcomm: closed
      2021-06-09  3:28:57 0 [Note] WSREP: /home/panda/maria-10.6/build/sql/mariadbd: Terminated.
      2021-06-09  3:28:58 0 [Warning] WSREP: option --wsrep-causal-reads is deprecated
      2021-06-09  3:28:58 0 [Note] /home/panda/maria-10.6/build/sql/mariadbd (mysqld 10.6.1-1-MariaDB-debug-log) starting as process 410627 ...
      .................
      .................
      2021-06-09  3:28:58 0 [Note] WSREP: save pc into disk
      WSREP_SST: [ERROR] Parent mysqld process (PID: 410497) terminated unexpectedly. (20210609 03:28:58.800)
      /home/panda/maria-10.6/build/scripts/wsrep_sst_rsync: line 681: kill: (-410497) - No such process
      WSREP_SST: [INFO] Joiner cleanup: rsync PID=0, stunnel PID=410592 (20210609 03:28:58.803)
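One way to read the overlap problem is that cleanup code trusts a shared pid file: whichever SST process wrote it last owns it, so the old SST's cleanup can end up acting on a PID or a file that now belongs to the new SST. Below is a minimal bash sketch of a defensive pattern, assuming the cleanup code remembers the PID it launched itself (for example from $!) instead of re-reading the pid file, and checks that the process is still an rsync before signalling it. The helpers proc_name, pid_matches, remove_own_pidfile and stop_own_rsync are hypothetical; this is not the fix that actually went into wsrep_sst_rsync. The sketch also refuses PID 0 (compare "rsync PID=0" in the joiner cleanup line above), because kill 0 signals the caller's entire process group.

```bash
#!/usr/bin/env bash
# Hypothetical cleanup helpers sketching safer pid handling when SST runs
# overlap. Not the code shipped in wsrep_sst_rsync.

# proc_name PID: print the command name of PID (Linux /proc, else ps).
proc_name() {
    local pid="$1"
    if [ -r "/proc/$pid/comm" ]; then
        cat "/proc/$pid/comm"
    else
        ps -p "$pid" -o comm= 2>/dev/null
    fi
}

# pid_matches PID NAME: succeed only if PID is a positive integer, the
# process is alive and signalable by us, and its command name is NAME.
pid_matches() {
    local pid="$1" name="$2"
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;              # empty or non-numeric
    esac
    [ "$pid" -gt 0 ] 2>/dev/null || return 1  # never let "kill 0" hit our group
    kill -0 "$pid" 2>/dev/null || return 1    # not running (or not ours)
    [ "$(proc_name "$pid")" = "$name" ]
}

# remove_own_pidfile FILE PID: delete FILE only if its first line is still
# PID, so a pid file rewritten by a newer SST process is left alone.
remove_own_pidfile() {
    local file="$1" pid="$2" recorded
    [ -r "$file" ] || return 0
    recorded=$(head -n 1 "$file")
    if [ "$recorded" = "$pid" ]; then
        rm -f "$file"
    fi
}

# stop_own_rsync FILE PID: PID is the rsync this SST started itself
# (captured from $! at launch), not whatever FILE currently contains.
stop_own_rsync() {
    local file="$1" pid="$2"
    if pid_matches "$pid" rsync; then
        kill "$pid" 2>/dev/null
    fi
    remove_own_pidfile "$file" "$pid"
}
```

For example, if a newer SST has already overwritten the pid file with its own PID, remove_own_pidfile "$pidfile" "$old_pid" leaves the file in place. A window still remains between the check and the kill (the PID could in principle be reused in between), so this pattern narrows the race rather than eliminating it.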
      


          Activity

            sysprg Julius Goryavsky created issue -
            sysprg Julius Goryavsky made changes -
            Field Original Value New Value
            Summary: "rsync may be mistakenly killed when overlaying SST" → "rsync may be mistakenly killed when overlapping SST"
            sysprg Julius Goryavsky made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            sysprg Julius Goryavsky made changes -
            Assignee Julius Goryavsky [ sysprg ]
            sysprg Julius Goryavsky made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]
            sysprg Julius Goryavsky made changes -
            Assignee Julius Goryavsky [ sysprg ] Jan Lindström [ jplindst ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            sysprg Julius Goryavsky made changes -
            sysprg Julius Goryavsky made changes -
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Julius Goryavsky [ sysprg ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            sysprg Julius Goryavsky made changes -
            sysprg Julius Goryavsky made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            sysprg Julius Goryavsky made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            sysprg Julius Goryavsky made changes -
            Assignee Julius Goryavsky [ sysprg ] Jan Lindström [ jplindst ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Julius Goryavsky [ sysprg ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            sysprg Julius Goryavsky made changes -
            Fix Version/s 10.6.2 [ 25800 ]
            Fix Version/s 10.2.39 [ 25731 ]
            Fix Version/s 10.3.30 [ 25732 ]
            Fix Version/s 10.4.20 [ 25733 ]
            Fix Version/s 10.5.11 [ 25734 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            marko Marko Mäkelä made changes -
            Fix Version/s 10.2.40 [ 26027 ]
            Fix Version/s 10.3.31 [ 26028 ]
            Fix Version/s 10.4.21 [ 26030 ]
            Fix Version/s 10.5.12 [ 26025 ]
            Fix Version/s 10.6.3 [ 25904 ]
            Fix Version/s 10.2.39 [ 25731 ]
            Fix Version/s 10.3.30 [ 25732 ]
            Fix Version/s 10.4.20 [ 25733 ]
            Fix Version/s 10.5.11 [ 25734 ]
            Fix Version/s 10.6.2 [ 25800 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 122547 ] MariaDB v4 [ 159378 ]

            People

              Assignee: sysprg Julius Goryavsky
              Reporter: sysprg Julius Goryavsky
              Votes: 0
              Watchers: 2
