Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21493

Galera test sporadic failure on galera.galera_shutdown_nonprim

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Open (View Workflow)
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.3.22, 10.4.12
    • Fix Version/s: 10.3, 10.4
    • Component/s: Galera, Tests
    • Labels:
    • Environment:
      kvm-rpm-centos8-amd64

      Description

      galera.galera_shutdown_nonprim fails on bb.

      stdout.log:

      10.3.22 d531b4ee3a9bcd89a2fa6b49a2207eaf966f53e3

      galera.galera_shutdown_nonprim 'innodb'  w2 [ fail ]
              Test ended at 2020-01-14 17:17:09
       
      CURRENT_TEST: galera.galera_shutdown_nonprim
       
       
      Server [mysqld.2 - pid: 13652, winpid: 13652, exit: 256] failed during test run
      Server log from this test:
      ----------SERVER LOG START-----------
      2020-01-14 17:16:21 55 [Note] WSREP: turning isolation on
      2020-01-14 17:16:21 0 [Note] WSREP: (83cdfcd2, 'tcp://0.0.0.0:16005') turning message relay requesting on, nonlive peers: tcp://127.0.0.1:16002 
      2020-01-14 17:16:31 0 [Note] WSREP: evs::proto(83cdfcd2, OPERATIONAL, view_id(REG,347d190e,18)) suspecting node: 347d190e
      2020-01-14 17:16:31 0 [Note] WSREP: evs::proto(83cdfcd2, OPERATIONAL, view_id(REG,347d190e,18)) suspected node without join message, declaring inactive
      2020-01-14 17:16:32 0 [Note] WSREP: view(view_id(NON_PRIM,347d190e,18) memb {
      	83cdfcd2,0
      } joined {
      } left {
      } partitioned {
      	347d190e,0
      })
      2020-01-14 17:16:32 0 [Note] WSREP: 83cdfcd2 changing node 347d190e weight (trans) 1 -> 2
      2020-01-14 17:16:32 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2020-01-14 17:16:32 0 [Note] WSREP: view(view_id(NON_PRIM,83cdfcd2,19) memb {
      	83cdfcd2,0
      } joined {
      } left {
      } partitioned {
      	347d190e,0
      })
      2020-01-14 17:16:32 0 [Note] WSREP: Flow-control interval: [16, 16]
      2020-01-14 17:16:32 0 [Note] WSREP: Trying to continue unpaused monitor
      2020-01-14 17:16:32 0 [Note] WSREP: Received NON-PRIMARY.
      2020-01-14 17:16:32 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 280)
      2020-01-14 17:16:32 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2020-01-14 17:16:32 0 [Note] WSREP: Flow-control interval: [16, 16]
      2020-01-14 17:16:32 0 [Note] WSREP: Trying to continue unpaused monitor
      2020-01-14 17:16:32 0 [Note] WSREP: Received NON-PRIMARY.
      2020-01-14 17:16:32 2 [Note] WSREP: New cluster view: global state: ebfb42d8-36f0-11ea-9b4d-c6be5a807bef:280, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
      2020-01-14 17:16:32 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-01-14 17:16:32 2 [Note] WSREP: New cluster view: global state: ebfb42d8-36f0-11ea-9b4d-c6be5a807bef:280, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
      2020-01-14 17:16:32 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-01-14 17:16:32 0 [Note] /usr/sbin/mysqld (initiated by: root[root] @ localhost [127.0.0.1]): Normal shutdown
      2020-01-14 17:16:32 0 [Note] WSREP: Stop replication
      2020-01-14 17:16:32 0 [Note] WSREP: Closing send monitor...
      2020-01-14 17:16:32 0 [Note] WSREP: Closed send monitor.
      2020-01-14 17:16:32 0 [Note] WSREP: gcomm: terminating thread
      2020-01-14 17:16:32 0 [Note] WSREP: gcomm: joining thread
      2020-01-14 17:16:32 0 [Note] WSREP: gcomm: closing backend
      2020-01-14 17:16:32 0 [Note] WSREP: view((empty))
      2020-01-14 17:16:32 0 [Note] WSREP: gcomm: closed
      2020-01-14 17:16:32 0 [Note] WSREP: Received self-leave message.
      2020-01-14 17:16:32 0 [Note] WSREP: Flow-control interval: [0, 0]
      2020-01-14 17:16:32 0 [Note] WSREP: Trying to continue unpaused monitor
      2020-01-14 17:16:32 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      2020-01-14 17:16:32 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 280)
      2020-01-14 17:16:32 0 [Note] WSREP: RECV thread exiting 0: Success
      2020-01-14 17:16:32 2 [Note] WSREP: New cluster view: global state: ebfb42d8-36f0-11ea-9b4d-c6be5a807bef:280, view# -1: non-Primary, number of nodes: 0, my index: -1, protocol version 3
      2020-01-14 17:16:32 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2020-01-14 17:16:32 2 [Note] WSREP: applier thread exiting (code:0)
      2020-01-14 17:16:32 0 [Note] WSREP: recv_thread() joined.
      2020-01-14 17:16:32 0 [Note] WSREP: Closing replication queue.
      2020-01-14 17:16:32 0 [Note] WSREP: Closing slave action queue.
      2020-01-14 17:16:34 1 [Note] WSREP: rollbacker thread exiting
      2020-01-14 17:16:34 0 [Note] Event Scheduler: Purging the queue. 0 events
      2020-01-14 17:16:34 0 [Note] InnoDB: FTS optimize thread exiting.
      2020-01-14 17:16:35 0 [Note] WSREP: dtor state: CLOSED
      2020-01-14 17:16:35 0 [Note] WSREP: mon: entered 57 oooe fraction 0 oool fraction 0
      2020-01-14 17:16:35 0 [Note] WSREP: mon: entered 57 oooe fraction 0 oool fraction 0
      2020-01-14 17:16:35 0 [Note] WSREP: mon: entered 71 oooe fraction 0 oool fraction 0.0140845
      2020-01-14 17:16:35 0 [Note] WSREP: cert index usage at exit 0
      2020-01-14 17:16:35 0 [Note] WSREP: cert trx map usage at exit 57
      2020-01-14 17:16:35 0 [Note] WSREP: deps set usage at exit 0
      2020-01-14 17:16:35 0 [Note] WSREP: avg deps dist 1.05357
      2020-01-14 17:16:35 0 [Note] WSREP: avg cert interval 0
      2020-01-14 17:16:35 0 [Note] WSREP: cert index size 13
      2020-01-14 17:16:35 0 [Note] WSREP: Service thread queue flushed.
      2020-01-14 17:16:35 0 [Note] WSREP: wsdb trx map usage 0 conn query map usage 0
      2020-01-14 17:16:35 0 [Note] WSREP: MemPool(LocalTrxHandle): hit ratio: 0, misses: 15, in use: 0, in pool: 15
      2020-01-14 17:16:35 0 [Note] WSREP: MemPool(SlaveTrxHandle): hit ratio: 0.0232558, misses: 42, in use: 0, in pool: 42
      2020-01-14 17:16:35 0 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 280)
      2020-01-14 17:16:35 0 [Note] WSREP: Flushing memory map to disk...
      2020-01-14 17:16:35 0 [Note] InnoDB: Starting shutdown...
      2020-01-14 17:16:35 0 [Note] InnoDB: Dumping buffer pool(s) to /dev/shm/var/2/mysqld.2/data/ib_buffer_pool
      2020-01-14 17:16:35 0 [Note] InnoDB: Instance 0, restricted to 128 pages due to innodb_buf_pool_dump_pct=25
      2020-01-14 17:16:35 0 [Note] InnoDB: Buffer pool(s) dump completed at 200114 17:16:35
      2020-01-14 17:16:36 0 [Note] InnoDB: Shutdown completed; log sequence number 2485038; transaction id 810
      2020-01-14 17:16:36 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
      2020-01-14 17:16:36 0 [Note] /usr/sbin/mysqld: Shutdown complete
       
      2020-01-14 17:16:37 0 [Warning] WSREP: option --wsrep-causal-reads is deprecated
      2020-01-14 17:16:37 0 [Note] /usr/sbin/mysqld (mysqld 10.3.22-MariaDB-log) starting as process 13653 ...
      2020-01-14 17:16:37 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4186)
      2020-01-14 17:16:37 0 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)
      2020-01-14 17:16:37 0 [Warning] WSREP: wsrep_sst_receive_address is set to '127.0.0.1:16007' which makes it impossible for another host to reach this one. Please set it to the address which this node can be connected at by other cluster members.
      2020-01-14 17:16:37 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
      2020-01-14 17:16:37 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
      2020-01-14 17:16:37 0 [Note] WSREP: wsrep_load(): Galera 3.23(rXXXX) by Codership Oy <info@codership.com> loaded successfully.
      2020-01-14 17:16:37 0 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
      2020-01-14 17:16:37 0 [Note] WSREP: Found saved state: ebfb42d8-36f0-11ea-9b4d-c6be5a807bef:280, safe_to_bootstrap: 0
      2020-01-14 17:16:37 0 [Note] WSREP: Passing config to GCS: base_dir = /dev/shm/var/2/mysqld.2/data/; base_host = 127.0.0.1; base_port = 16005; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT30S; evs.install_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT10S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /dev/shm/var/2/mysqld.2/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /dev/shm/var/2/mysqld.2/data//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmc
      2020-01-14 17:16:37 0 [Note] WSREP: GCache history reset: ebfb42d8-36f0-11ea-9b4d-c6be5a807bef:0 -> ebfb42d8-36f0-11ea-9b4d-c6be5a807bef:280
      2020-01-14 17:16:37 0 [Note] WSREP: Assign initial position for certification: 280, protocol version: -1
      2020-01-14 17:16:37 0 [Note] WSREP: wsrep_sst_grab()
      2020-01-14 17:16:37 0 [Note] WSREP: Start replication
      2020-01-14 17:16:37 0 [Note] WSREP: Setting initial position to ebfb42d8-36f0-11ea-9b4d-c6be5a807bef:280
      2020-01-14 17:16:37 0 [Note] WSREP: protonet asio version 0
      2020-01-14 17:16:37 0 [Note] WSREP: Using CRC-32C for message checksums.
      2020-01-14 17:16:37 0 [Note] WSREP: backend: asio
      2020-01-14 17:16:37 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 
      2020-01-14 17:16:37 0 [Warning] WSREP: access file(/dev/shm/var/2/mysqld.2/data//gvwstate.dat) failed(No such file or directory)
      2020-01-14 17:16:37 0 [Note] WSREP: restore pc from disk failed
      2020-01-14 17:16:37 0 [Note] WSREP: GMCast version 0
      2020-01-14 17:16:37 0 [Note] WSREP: (9f267396, 'tcp://0.0.0.0:16005') listening at tcp://0.0.0.0:16005
      2020-01-14 17:16:37 0 [Note] WSREP: (9f267396, 'tcp://0.0.0.0:16005') multicast: , ttl: 1
      2020-01-14 17:16:37 0 [Note] WSREP: EVS version 0
      2020-01-14 17:16:37 0 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '127.0.0.1:16002'
      2020-01-14 17:16:37 0 [Note] WSREP: (9f267396, 'tcp://0.0.0.0:16005') connection established to 347d190e tcp://127.0.0.1:16002
      2020-01-14 17:16:37 0 [Note] WSREP: (9f267396, 'tcp://0.0.0.0:16005') turning message relay requesting on, nonlive peers: 
      2020-01-14 17:16:37 0 [Note] WSREP: declaring 347d190e at tcp://127.0.0.1:16002 stable
      2020-01-14 17:16:37 0 [Note] WSREP: view(view_id(NON_PRIM,347d190e,20) memb {
      	347d190e,0
      	9f267396,0
      } joined {
      } left {
      } partitioned {
      	83cdfcd2,0
      })
      2020-01-14 17:16:40 0 [Note] WSREP: (9f267396, 'tcp://0.0.0.0:16005') turning message relay requesting off
      2020-01-14 17:17:08 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
      	 at gcomm/src/pc.cpp:connect():158
      2020-01-14 17:17:08 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -110 (Connection timed out)
      2020-01-14 17:17:08 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'my_wsrep_cluster' at 'gcomm://127.0.0.1:16002': -110 (Connection timed out)
      2020-01-14 17:17:08 0 [ERROR] WSREP: gcs connect failed: Connection timed out
      2020-01-14 17:17:08 0 [ERROR] WSREP: wsrep::connect(gcomm://127.0.0.1:16002) failed: 7
      2020-01-14 17:17:08 0 [ERROR] Aborting
       
      ----------SERVER LOG END-------------
      mysqltest failed but provided no output
      The result from queries just before the failure was:
      < snip >
      connection node_1;
      connection node_2;
      connection node_1;
      SET GLOBAL wsrep_provider_options = 'pc.weight=2';
      connection node_2;
      SET GLOBAL wsrep_provider_options = 'gmcast.isolate = 1';
      SET SESSION wsrep_sync_wait = 0;
       
       
       
       - saving '/dev/shm/var/2/log/galera.galera_shutdown_nonprim-innodb/' to '/dev/shm/var/log/galera.galera_shutdown_nonprim-innodb/'
       
      Retrying test galera.galera_shutdown_nonprim, attempt(2/3)...
       
      worker[2] > Restart  - not started
      worker[2] > Restart  - not started
      ***Warnings generated in error logs during shutdown after running tests: galera.galera_shutdown_nonprim
       
      2020-01-14 17:17:08 0 [ERROR] Aborting
       
      galera.galera_gtid 'innodb'              w1 [ pass ]   2023
      worker[1] > Restart [mysqld.1 - pid: 14423, winpid: 14423] - using different config file
      worker[1] > Restart [mysqld.2 - pid: 14460, winpid: 14460] - using different config file
      galera.galera_gtid_slave 'innodb'        w1 [ pass ]   2767
      worker[1] > Restart [mysqld.1 - pid: 14743, winpid: 14743] - using different config file
      worker[1] > Restart [mysqld.2 - pid: 14745, winpid: 14745] - using different config file
      worker[1] > Restart [mysqld.3 - pid: 14809, winpid: 14809] - using different config file
      galera.galera_shutdown_nonprim 'innodb'  w2 [ retry-pass ]  18962
       
      MTR's internal check of the test case 'galera.galera_shutdown_nonprim' failed.
      This means that the test case does not preserve the state that existed
      before the test case was executed.  Most likely the test case did not
      do a proper clean-up. It could also be caused by the previous test run
      by this thread, if the server wasn't restarted.
      This is the diff of the states of the servers before and after the
      test case was executed:
      mysqltest: Logging to '/dev/shm/var/2/tmp/check-mysqld_1.log'.
      mysqltest: Results saved in '/dev/shm/var/2/tmp/check-mysqld_1.result'.
      mysqltest: Connecting to server localhost:16000 (socket /dev/shm/var/tmp/2/mysqld.1.sock) as 'root', connection 'default', attempt 0 ...
      mysqltest: ... Connected.
      mysqltest: Start processing test commands from './include/check-testcase.test' ...
      mysqltest: ... Done processing test commands.
      --- /dev/shm/var/2/tmp/check-mysqld_1.result	2020-01-14 17:17:12.956163470 +0000
      +++ /dev/shm/var/2/tmp/check-mysqld_1.reject	2020-01-14 17:17:32.030163563 +0000
      @@ -653,7 +653,7 @@
       WSREP_OSU_METHOD	TOI
       WSREP_PATCH_VERSION	wsrep_25.24
       WSREP_PROVIDER	/usr/lib64/galera/libgalera_smm.so
      -WSREP_PROVIDER_OPTIONS	base_dir = /dev/shm/var/2/mysqld.1/data/; base_host = 127.0.0.1; base_port = 16002; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.causal_keepalive_period = PT1S; evs.debug_log_mask = 0x1; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT30S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT1S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT10S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = P1D; gcache.dir = /dev/shm/var/2/mysqld.1/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /dev/shm/var/2/mysqld.1/data//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 10M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = tcp://0.0.0.0:16002; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.segment = 0; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 127.0.0.1; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT20S; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT90S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 8; socket.checksum = 2; socket.recv_buf_size = 212992; 
      +WSREP_PROVIDER_OPTIONS	base_dir = /dev/shm/var/2/mysqld.1/data/; base_host = 127.0.0.1; base_port = 16002; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.causal_keepalive_period = PT1S; evs.debug_log_mask = 0x1; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT30S; evs.info_log_mask = 0; evs.install_timeout = PT15S; evs.join_retrans_period = PT1S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT10S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 0; evs.view_forget_timeout = P1D; gcache.dir = /dev/shm/var/2/mysqld.1/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /dev/shm/var/2/mysqld.1/data//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 10M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = tcp://0.0.0.0:16002; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.segment = 0; gmcast.time_wait = PT5S; gmcast.version = 0; ist.recv_addr = 127.0.0.1; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT20S; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 2; protonet.backend = asio; protonet.version = 0; repl.causal_read_timeout = PT90S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 8; socket.checksum = 2; socket.recv_buf_size = 212992; 
       WSREP_RECOVER	OFF
       WSREP_REJECT_QUERIES	NONE
       WSREP_REPLICATE_MYISAM	OFF
       
      mysqltest: Result content mismatch
       
      not ok
       
       
      Retrying test galera.galera_shutdown_nonprim, attempt(3/3)...
       
      worker[2] > Restart  - not started
      worker[2] > Restart  - not started
      

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              jplindst Jan Lindström
              Reporter:
              stepan.patryshev Stepan Patryshev
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Dates

                Created:
                Updated: