MariaDB Server / MDEV-25880

rsync may be mistakenly killed when overlapping SST



      Description

      This bug was originally seen in the galera_nbo_sst_slave mtr test for 10.6; however, it affects all versions and can cause intermittent SST crashes via rsync on very fast server restarts, when a new SST process (for example, one started by a freshly restarted server) overlaps the old SST process left over from the previous, already terminated server. This overlap can result in the new rsync being killed instead of the old one, or in the new rsync's pid file being removed, which then causes the SST to fail.
      For example:

      2021-06-09  3:28:56 0 [Warning] WSREP: 0.0 (panda): State transfer to 1.0 (panda) failed: -11 (Resource temporarily unavailable)
      2021-06-09  3:28:56 0 [ERROR] WSREP: /home/panda/galera-es-4.x/gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1205: Will never receive state. Need to abort.
      2021-06-09  3:28:56 0 [Note] WSREP: gcomm: terminating thread
      2021-06-09  3:28:56 0 [Note] WSREP: gcomm: joining thread
      2021-06-09  3:28:56 0 [Note] WSREP: gcomm: closing backend
      2021-06-09  3:28:56 2 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): discarded 24 bytes
      2021-06-09  3:28:56 2 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): found 1/2 locked buffers
      2021-06-09  3:28:57 0 [Note] WSREP: PC protocol downgrade 1 -> 0
      2021-06-09  3:28:57 0 [Note] WSREP: view((empty))
      2021-06-09  3:28:57 0 [Note] WSREP: gcomm: closed
      2021-06-09  3:28:57 0 [Note] WSREP: /home/panda/maria-10.6/build/sql/mariadbd: Terminated.
      2021-06-09  3:28:58 0 [Warning] WSREP: option --wsrep-causal-reads is deprecated
      2021-06-09  3:28:58 0 [Note] /home/panda/maria-10.6/build/sql/mariadbd (mysqld 10.6.1-1-MariaDB-debug-log) starting as process 410627 ...
      .................
      .................
      2021-06-09  3:28:58 0 [Note] WSREP: save pc into disk
      WSREP_SST: [ERROR] Parent mysqld process (PID: 410497) terminated unexpectedly. (20210609 03:28:58.800)
      /home/panda/maria-10.6/build/scripts/wsrep_sst_rsync: line 681: kill: (-410497) - No such process
      WSREP_SST: [INFO] Joiner cleanup: rsync PID=0, stunnel PID=410592 (20210609 03:28:58.803)
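The ownership problem described above can be illustrated with a small Python sketch. This is not the actual wsrep_sst_rsync fix: it assumes a hypothetical pid file that stores the rsync PID together with the PID of the mysqld that started it, so that a late cleanup from the old SST can recognize that a newer SST has taken over the file and leave both the file and the new rsync alone. The function names, the file format, and the rsync PIDs 1111/2222 are invented for illustration; 410497 and 410627 are the old and new mysqld PIDs from the log above. The kill action is passed in as a callback so the sketch runs without real processes.

```python
import os
import tempfile


def write_pid_file(path, rsync_pid, owner_pid):
    """Record the rsync PID together with the PID of the mysqld that owns it (hypothetical format)."""
    tmp = f"{path}.{owner_pid}.tmp"
    with open(tmp, "w") as f:
        f.write(f"{rsync_pid} {owner_pid}\n")
    os.replace(tmp, path)  # rename over the old file so readers never see a half-written one


def read_pid_file(path):
    """Return (rsync_pid, owner_pid), or None if the file is missing or malformed."""
    try:
        with open(path) as f:
            fields = f.read().split()
        if len(fields) != 2:
            return None
        return int(fields[0]), int(fields[1])
    except (FileNotFoundError, ValueError):
        return None


def cleanup(path, my_owner_pid, kill):
    """Kill rsync and remove the pid file only if this SST instance still owns them."""
    entry = read_pid_file(path)
    if entry is None:
        return False
    rsync_pid, owner_pid = entry
    if owner_pid != my_owner_pid:
        # A newer SST (started by the restarted server) wrote this file: do not touch it.
        return False
    kill(rsync_pid)
    os.remove(path)
    return True


if __name__ == "__main__":
    pid_file = os.path.join(tempfile.mkdtemp(), "rsync.pid")
    killed = []
    # The old SST (mysqld 410497) wrote the file, then the new SST (mysqld 410627) replaced it.
    write_pid_file(pid_file, 1111, 410497)
    write_pid_file(pid_file, 2222, 410627)
    # The late cleanup from the old SST must not kill the new rsync or delete its pid file.
    print(cleanup(pid_file, 410497, killed.append), killed)  # False []
    # The new SST's own cleanup still works.
    print(cleanup(pid_file, 410627, killed.append), killed)  # True [2222]
```

In a real script the callback could be something like `lambda pid: os.kill(pid, signal.SIGTERM)`. Note that the read-then-remove step is still not atomic, so an ownership check like this only narrows the race window; giving each SST instance its own pid file name would avoid sharing the file at all.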
      


              People

              Assignee:
              sysprg Julius Goryavsky
              Reporter:
              sysprg Julius Goryavsky
              Votes:
              0
              Watchers:
              2

