[MDEV-23578] galera_3nodes.galera_ist_gcache_rollover MTR failed: WSREP: exception caused by message: {v=0,t=1,ut=255,o=4,s=0,sr=0,as=1,f=6,src=50524cfe,srcvid=view_id(REG,50524cfe,4),insvid=view_id(UNKNOWN,00000000,0),ru=00000000,r=[-1,-1],fs=75,nl=(} Created: 2020-08-25  Updated: 2024-01-16  Resolved: 2022-10-28

Status: Closed
Project: MariaDB Server
Component/s: Galera, Tests
Affects Version/s: 10.2.34, 10.3.25, 10.4.15, 10.5.6
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Stepan Patryshev (Inactive) Assignee: Jan Lindström (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: Test_disabled
Environment:

kvm-deb-jessie-x86



 Description   

galera_3nodes.galera_ist_gcache_rollover failed on BB10.2: WSREP: exception caused by message:

{v=0,t=1,ut=255,o=4,s=0,sr=0,as=1,f=6,src=50524cfe,srcvid=view_id(REG,50524cfe,4),insvid=view_id(UNKNOWN,00000000,0),ru=00000000,r=[-1,-1],fs=75,nl=(}

stdio.log:

10.2.34 8cf8ad86d4b6f3479d80f3d8e8c2bcf463966924

galera_3nodes.galera_ist_gcache_rollover 'innodb' w1 [ fail ]
        Test ended at 2020-08-25 11:48:06
 
CURRENT_TEST: galera_3nodes.galera_ipv6_xtrabackup-v2
CURRENT_TEST: galera_3nodes.galera_ipv6_xtrabackup-v2
CURRENT_TEST: galera_3nodes.galera_ist_gcache_rollover
 
 
Failed to start mysqld.3
 
 
 - skipping '/dev/shm/var/1/log/galera_3nodes.galera_ist_gcache_rollover-innodb/'
 
Retrying test galera_3nodes.galera_ist_gcache_rollover, attempt(2/3)...
 
***Warnings generated in error logs during shutdown after running tests: galera_3nodes.galera_ist_gcache_rollover
 
2020-08-25 11:48:04 3054057280 [ERROR] WSREP: caught exception in PC, state dump to stderr follows:
2020-08-25 11:48:04 3054057280 [ERROR] WSREP: exception caused by message: {v=0,t=1,ut=255,o=4,s=0,sr=0,as=1,f=6,src=50524cfe,srcvid=view_id(REG,50524cfe,4),insvid=view_id(UNKNOWN,00000000,0),ru=00000000,r=[-1,-1],fs=75,nl=(
 }2020-08-25 11:48:04 3054057280 [ERROR] WSREP: failed to open gcomm backend connection: 131: 505b83f7 last prims not consistent (FATAL)
2020-08-25 11:48:04 3054057280 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -131 (State not recoverable)
2020-08-25 11:48:04 3054057280 [ERROR] WSREP: gcs connect failed: State not recoverable
2020-08-25 11:48:04 3054057280 [ERROR] Aborting

On BB 10.3, BB 10.4 and BB 10.5 it failed with another output:

stdio.log:

10.3.25 95831888e8a89a4e141e76d51dbfc0701552c824

galera_3nodes.galera_ist_gcache_rollover 'innodb' w2 [ fail ]
        Test ended at 2020-08-25 17:51:09
 
CURRENT_TEST: galera_3nodes.galera_ipv6_mariabackup_section
CURRENT_TEST: galera_3nodes.galera_ipv6_rsync_section
CURRENT_TEST: galera_3nodes.galera_ist_gcache_rollover
 
 
Failed to start mysqld.3
 
 
worker[2] > Restart  - using different config file
worker[2] > Restart  - using different config file
worker[2] > Restart  - using different config file
 - skipping '/dev/shm/var/2/log/galera_3nodes.galera_ist_gcache_rollover-innodb/'
 
Retrying test galera_3nodes.galera_ist_gcache_rollover, attempt(2/3)...
 
***Warnings generated in error logs during shutdown after running tests: galera_3nodes.galera_ist_gcache_rollover
 
WSREP_SST: [ERROR] rsync daemon port '16031' has been taken (20200825 17:50:25.449)
2020-08-25 17:50:25 0 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '127.0.0.1:16031' --datadir '/dev/shm/var/2/mysqld.3/data/' --defaults-file '/dev/shm/var/2/my.cnf' --defaults-group-suffix '.3' --parent '32484' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.3 --defaults-file=/dev/shm/var/2/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
2020-08-25 17:50:25 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '127.0.0.1:16031' --datadir '/dev/shm/var/2/mysqld.3/data/' --defaults-file '/dev/shm/var/2/my.cnf' --defaults-group-suffix '.3' --parent '32484' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.3 --defaults-file=/dev/shm/var/2/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300: 16 (Device or resource busy)
2020-08-25 17:50:25 2 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
2020-08-25 17:50:25 2 [ERROR] Aborting
Warning: Memory not freed: 1381136
 
worker[2] mysql-test-run: WARNING: Waited 60 seconds for /dev/shm/var/2/run/mysqld.3.pid to be created, still waiting for 120 seconds...
worker[2] mysql-test-run: WARNING: Waited 120 seconds for /dev/shm/var/2/run/mysqld.3.pid to be created, still waiting for 60 seconds...
worker[2] mysql-test-run: WARNING: Timeout after mysql-test-run waited 180 seconds for the process [mysqld.3 - pid: 979, winpid: 979] to create a pid file.
galera_3nodes.galera_ipv6_rsync_section 'innodb' w2 [ retry-fail ]
        Test ended at 2020-08-25 17:54:14
 
CURRENT_TEST: galera_3nodes.galera_ipv6_mariabackup_section
CURRENT_TEST: galera_3nodes.galera_ipv6_rsync_section
CURRENT_TEST: galera_3nodes.galera_ist_gcache_rollover
CURRENT_TEST: galera_3nodes.galera_ipv6_rsync_section
 
 
Failed to start mysqld.3
 
 
worker[2] > Restart  - using different config file
worker[2] > Restart  - using different config file
worker[2] > Restart  - using different config file
 - skipping '/dev/shm/var/2/log/galera_3nodes.galera_ipv6_rsync_section-innodb/'
 
Test galera_3nodes.galera_ipv6_rsync_section has failed 2 times, no more retries!
 
***Warnings generated in error logs during shutdown after running tests: galera_3nodes.galera_ipv6_rsync_section
 
WSREP_SST: [ERROR] rsync daemon port '16031' has been taken (20200825 17:51:11.334)
2020-08-25 17:51:11 0 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address '[::1]:16031' --datadir '/dev/shm/var/2/mysqld.3/data/' --defaults-file '/dev/shm/var/2/my.cnf' --defaults-group-suffix '.3' --parent '980' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.3 --defaults-file=/dev/shm/var/2/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --bind-address=:: --core-file --loose-debug-sync-timeout=300
2020-08-25 17:51:11 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address '[::1]:16031' --datadir '/dev/shm/var/2/mysqld.3/data/' --defaults-file '/dev/shm/var/2/my.cnf' --defaults-group-suffix '.3' --parent '980' --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.3 --defaults-file=/dev/shm/var/2/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --bind-address=:: --core-file --loose-debug-sync-timeout=300: 16 (Device or resource busy)
2020-08-25 17:51:11 1 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
2020-08-25 17:51:11 1 [ERROR] Aborting
 
wsrep.mysql_tzinfo_to_sql_symlink 'innodb' w2 [ pass ]     34
worker[2] > Restart [mysqld.1 - pid: 1307, winpid: 1307] - using different config file
galera_3nodes.galera_ist_gcache_rollover 'innodb' w2 [ skipped ]  debug_sync is not available
 
Retrying test galera_3nodes.galera_ist_gcache_rollover, attempt(3/3)...



 Comments   
Comment by Runzi [ 2024-01-16 ]

// Some comments here
public String getFoo()
{
    2024-01-03  0:10:32 0 [ERROR] WSREP: caught exception in PC, state dump to stderr follows:
pc::Proto{uuid=acbe31e8-9673,start_prim=0,npvo=0,ignore_sb=0,ignore_quorum=0,state=1,last_sent_seq=466,checksum=0,instances=
  3c167a24-b8f0,prim=1,un=0,last_seq=103,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
  4ebe7f59-8874,prim=1,un=0,last_seq=372,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
  acbe31e8-9673,prim=1,un=0,last_seq=466,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
,state_msgs=
  3c167a24-b8f0,pcmsg{ type=STATE, seq=0, flags= 0, node_map {  3c167a24-b8f0,prim=1,un=0,last_seq=103,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
  4ebe7f59-8874,prim=1,un=0,last_seq=372,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
  acbe31e8-9673,prim=1,un=0,last_seq=466,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
}}
  4ebe7f59-8874,pcmsg{ type=STATE, seq=0, flags= 0, node_map {  3c167a24-b8f0,prim=1,un=0,last_seq=103,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
  4ebe7f59-8874,prim=1,un=0,last_seq=372,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
  acbe31e8-9673,prim=1,un=0,last_seq=466,last_prim=view_id(PRIM,3c167a24-b8f0,64),to_seq=39886,weight=1,segment=0
}}
,current_view=view(view_id(REG,3c167a24-b8f0,75) memb {
  3c167a24-b8f0,0
  4ebe7f59-8874,0
  774206ed-985d,0
  91f27cc6-a5ee,0
  acbe31e8-9673,0
  be92f86c-94d1,0
} joined {
  774206ed-985d,0
  91f27cc6-a5ee,0
  be92f86c-94d1,0
} left {
} partitioned {
}),pc_view=view(view_id(PRIM,3c167a24-b8f0,64) memb {
  3c167a24-b8f0,0
  4ebe7f59-8874,0
  acbe31e8-9673,0
} joined {
} left {
} partitioned {
}),mtu=32636}
2024-01-03  0:10:32 0 [Note] WSREP: {v=1,t=1,ut=255,o=4,s=0,sr=0,as=-1,f=4,src=774206ed-985d,srcvid=view_id(REG,3c167a24-b8f0,75),insvid=view_id(UNKNOWN,00000000-0000,0),ru=00000000-0000,r=[-1,-1],fs=5786559,nl=(
)
} 168
2024-01-03  0:10:32 0 [ERROR] WSREP: exception caused by message: {v=1,t=3,ut=255,o=1,s=4,sr=-1,as=3,f=4,src=774206ed-985d,srcvid=view_id(REG,3c167a24-b8f0,75),insvid=view_id(UNKNOWN,00000000-0000,0),ru=00000000-0000,r=[-1,-1],fs=5786566,nl=(
)
}
 state after handling message: evs::proto(evs::proto(acbe31e8-9673, OPERATIONAL, view_id(REG,3c167a24-b8f0,75)), OPERATIONAL) {
current_view=view(view_id(REG,3c167a24-b8f0,75) memb {
  3c167a24-b8f0,0
  4ebe7f59-8874,0
  774206ed-985d,0
  91f27cc6-a5ee,0
  acbe31e8-9673,0
  be92f86c-94d1,0
} joined {
} left {
} partitioned {
}),
input_map=evs::input_map: {aru_seq=4,safe_seq=3,node_index=node: {idx=0,range=[5,4],safe_seq=4} node: {idx=1,range=[5,4],safe_seq=3} node: {idx=2,range=[5,4],safe_seq=3} node: {idx=3,range=[5,4],safe_seq=4} node: {idx=4,range=[5,4],safe_seq=4} node: {idx=5,range=[5,4],safe_seq=4} },
fifo_seq=3248089,
last_sent=4,
known:
3c167a24-b8f0 at tcp://10.*.*.*:4567
{o=1,s=0,i=1,fs=3061648,}
4ebe7f59-8874 at tcp://10.*.*.*:4567
{o=1,s=0,i=1,fs=3065890,}
774206ed-985d at tcp://10.*.*.*:35200
{o=1,s=0,i=1,fs=5786566,}
91f27cc6-a5ee at tcp://10.*.*.*:35200
{o=1,s=0,i=1,fs=5780462,}
acbe31e8-9673 at
{o=1,s=0,i=1,fs=-1,}
be92f86c-94d1 at tcp://10.*.*.*:4567
{o=1,s=0,i=1,fs=5740130,}
 }2024-01-03  0:10:32 0 [ERROR] WSREP: exception from gcomm, backend must be restarted: acbe31e8-9673 aborting due to conflicting prims: older overrides (FATAL)
   at ./gcomm/src/pc_proto.cpp:handle_state():1052
2024-01-03  0:10:32 0 [Note] WSREP: gcomm: terminating thread
2024-01-03  0:10:32 0 [Note] WSREP: gcomm: joining thread
2024-01-03  0:10:32 0 [Note] WSREP: gcomm: closing backend
2024-01-03  0:10:32 0 [Note] WSREP: Forced PC close
2024-01-03  0:10:32 0 [Warning] WSREP: discarding 28 messages from message index
2024-01-03  0:10:32 0 [Note] WSREP: gcomm: closed
2024-01-03  0:10:32 0 [Note] WSREP: New SELF-LEAVE.
2024-01-03  0:10:32 0 [Note] WSREP: Closing send monitor...
2024-01-03  0:10:32 0 [Note] WSREP: Closed send monitor.
2024-01-03  0:10:32 0 [Note] WSREP: Closing replication queue.
2024-01-03  0:10:32 0 [Note] WSREP: Closing slave action queue.
2024-01-03  0:10:32 1 [Note] WSREP: Applier thread exiting ret: 6 thd: 1
2024-01-03  0:10:32 9 [Note] WSREP: Applier thread exiting ret: 6 thd: 9
2024-01-03  0:10:32 6 [Note] WSREP: Applier thread exiting ret: 6 thd: 6
2024-01-03  0:10:32 0 [Note] WSREP: Shifting SYNCED -> CLOSED (TO: 35659)
2024-01-03  0:10:32 0 [Note] WSREP: RECV thread exiting -103: Software caused connection abort
2024-01-03  0:10:32 8 [Note] WSREP: recv_thread() already closing, joining thread.
2024-01-03  0:10:32 8 [Note] WSREP: recv_thread() joined.
2024-01-03  0:10:32 8 [Note] WSREP: ================================================
View:
  id: 00000000-0000-0000-0000-000000000000:-1
  status: non-primary
  protocol_version: -1
  capabilities:
  final: yes
  own_index: -1
  members(0):
=================================================
2024-01-03  0:10:32 8 [Note] WSREP: Non-primary view
2024-01-03  0:10:32 8 [Note] WSREP: Server status change synced -> disconnected
2024-01-03  0:10:32 8 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2024-01-03  0:10:32 8 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2024-01-03  0:10:32 8 [Note] WSREP: Applier thread exiting ret: 6 thd: 8
}

I have a three node galera cluster, one day suddenly appeared`WSREP has not yet prepared node for application use` error can not read and write data, this is the log content, this is a bug?

Generated at Thu Feb 08 09:23:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.