@Mark Reibert, we are experiencing the same issue from time to time and would like to set cert.optimistic_pa = no. However, I'm not sure how to apply this safely as a permanent configuration.
Do I need to apply the setting on all nodes, or can it be set on a single node for testing?
Is adding a line like this to /etc/mysql/mariadb.conf.d/50-server.cnf OK:
wsrep_provider_options = "cert.optimistic_pa = no"
Or should I copy/paste all the existing values from an already running node, which look like the following, with a lot of options set:
wsrep_provider_options =
base_dir = /var/lib/mysql/; base_host = 192.168.0.6; base_port = 4567;
cert.log_conflicts = no; cert.optimistic_pa = yes;
debug = no;
evs.auto_evict = 0; evs.causal_keepalive_period = PT1S; evs.debug_log_mask = 0x1; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.info_log_mask = 0; evs.install_timeout = PT7.5S; evs.join_retrans_period = PT1S; evs.keepalive_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.use_aggregate = true; evs.user_send_window = 2; evs.version = 1; evs.view_forget_timeout = P1D;
gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.keep_plaintext_size = 128M; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M;
gcomm.thread_prio = ;
gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no;
gmcast.listen_addr = tcp://0.0.0.0:4567; gmcast.mcast_addr = ; gmcast.mcast_ttl = 1; gmcast.peer_timeout = PT3S; gmcast.segment = 0; gmcast.time_wait = PT5S; gmcast.version = 0;
ist.recv_addr = 192.168.0.6;
pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.linger = PT20S; pc.npvo = false; pc.recovery = true; pc.version = 0; pc.wait_prim = true; pc.wait_prim_timeout = PT30S; pc.weight = 1;
protonet.version = 0;
repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.max_ws_size = 2147483647; repl.proto_max = 10;
socket.checksum = 2; socket.recv_buf_size = auto; socket.send_buf_size = auto;
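For testing on one node first, I was thinking of changing the option at runtime before making anything permanent. This is only a sketch, and it assumes cert.optimistic_pa is dynamic in our Galera version:

-- Check the current provider options on this node
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';

-- Change just this one option at runtime; as far as I understand,
-- options not listed here keep their current values
SET GLOBAL wsrep_provider_options = 'cert.optimistic_pa=no';

My understanding is that the same applies in the config file, i.e. listing only cert.optimistic_pa = no leaves the other provider options at their defaults, but I'd appreciate confirmation.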
Thanks for any tips.
mreibert, your first attached stacktrace (reibert_mysqld-abort-stacktrace.txt) has one replication applier thread (Thread 73) in the middle of processing a foreign key constraint's cascade operation (row_ins_foreign_check_on_constraint). This suggests that this crash is related to issue MDEV-26803, where the reason for the crashes was that cascading executions did not reliably record all rows manipulated through the cascade operation in the replication write set, which resulted in unsafe parallel applying on replica nodes. The second stacktrace also shows ongoing foreign key checking, but at a different stage.
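Schematically, the problematic pattern is a cascading foreign key operation like the following (table and column names here are just illustrative):

CREATE TABLE parent (
    id INT PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE child (
    id INT PRIMARY KEY,
    parent_id INT,
    FOREIGN KEY (parent_id) REFERENCES parent (id) ON DELETE CASCADE
) ENGINE=InnoDB;

-- Deleting a parent row implicitly deletes the matching child rows
-- through the cascade. If those cascaded child rows are not reliably
-- recorded in the replication write set, parallel appliers on replica
-- nodes can apply conflicting transactions in an unsafe order.
DELETE FROM parent WHERE id = 1;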