Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34856

Galera Cluster upgrade from 11.0 to 11.1 fails, eventually leads to crash

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Blocker
    • Resolution: Fixed
    • 11.1(EOL), 11.2, 11.3(EOL), 11.4
    • 11.2.6, 11.4.4
    • Galera, Galera SST
    • None

    Description

      Description below was for original 10.11 to 11.4 failure, further testing showed that the actual same problem occurs betweein 11.0 and 11.1. Rolling upgrades from 10.11 to 11.0 and from 11.1 to 11.4 work as expected.
      So it is only the 11.0 to 11.1 step that is critical.

      I'm installing a 10.11 two node cluster with rather basic configuration, wait for a minute, then try a rolling upgrade to 11.4, starting with the 2nd node.

      The upgraded node at first fails to join the cluster with

      Sep 02 12:58:49 node-2 mariadbd[3973]: 2024-09-02 12:58:49 0 [Note] WSREP: Member 1.0 (node-1) synced with group.
      Sep 02 12:58:49 node-2 -wsrep-sst-joiner[4188]: receiving process ended without creating magic file (/tmp/tmp.IJSXAIA9T4/mariadb_backup_galera_info)
      Sep 02 12:58:50 node-2 -wsrep-sst-joiner[4146]: logger: invalid option -- 'r'
      Sep 02 12:58:50 node-2 -wsrep-sst-joiner[4146]: Try 'logger --help' for more information.
      Sep 02 12:58:50 node-2 -wsrep-sst-joiner[4195]: Removing the sst_in_progress file
      Sep 02 12:58:50 node-2 mariadbd[3973]: 2024-09-02 12:58:50 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
      Sep 02 12:58:50 node-2 mariadbd[3973]: 2024-09-02 12:58:50 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address 'node-2' --datadir '/var/lib/mysql/' --parent 3973 --progress 0 [...]
      

      while the donor side does not report anything unusual, showing the state transfer as successful IST, not needding a ST:

      Sep 02 12:58:49 node-1 mariadbd[1878]: 2024-09-02 12:58:49 0 [Note] WSREP: Member 0.0 (node-2) requested state transfer from '*any*'. Selected 1.0 (node-1)(SYNCED) as donor.
      Sep 02 12:58:49 node-1 mariadbd[1878]: 2024-09-02 12:58:49 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 6)
      Sep 02 12:58:49 node-1 mariadbd[1878]: 2024-09-02 12:58:49 2 [Note] WSREP: Detected STR version: 1, req_len: 115, req: STRv1
      Sep 02 12:58:49 node-1 mariadbd[1878]: 2024-09-02 12:58:49 2 [Note] WSREP: IST request: b6d18f4d-692a-11ef-abf2-8bd2c43da2fb:4-6|tcp://node-2:4568
      Sep 02 12:58:49 node-1 mariadbd[1878]: 2024-09-02 12:58:49 2 [Note] WSREP: Server status change synced -> donor
      Sep 02 12:58:49 node-1 mariadbd[1878]: 2024-09-02 12:58:49 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Sep 02 12:58:49 node-1 mariadbd[1878]: 2024-09-02 12:58:49 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'donor' --address 'node-2:4444/xtrabackup_sst//1' --local-por>
      Sep 02 12:58:49 node-1 mariadbd[1878]: 2024-09-02 12:58:49 0 [Note] WSREP: Donor monitor thread started to monitor
      Sep 02 12:58:49 node-1 mariadbd[1878]: 2024-09-02 12:58:49 2 [Note] WSREP: sst_donor_thread signaled with 0
      Sep 02 12:58:49 node-1 mariadbd[1878]: 2024-09-02 12:58:49 0 [Note] WSREP: async IST sender starting to serve tcp://node-2:4568 sending 2-6, preload starts from 2
      Sep 02 12:58:49 node-1 mariadbd[1878]: 2024-09-02 12:58:49 0 [Note] WSREP: IST sender 2 -> 6
      Sep 02 12:58:49 node-1 mariadbd[2359]: WSREP_SST: [INFO] mariabackup IST started on donor (20240902 12:58:49.919)
      Sep 02 12:58:49 node-1 mariadbd[2359]: WSREP_SST: [INFO] SSL configuration: CA='', CAPATH='', CERT='', KEY='', MODE='DISABLED', encrypt='0' (20240902 12:58:49.959)
      Sep 02 12:58:50 node-1 mariadbd[2359]: WSREP_SST: [INFO] Logging all stderr of SST/mariadb-backup to syslog (20240902 12:58:50.054)
      Sep 02 12:58:50 node-1 -wsrep-sst-donor[2543]: Streaming with mbstream
      Sep 02 12:58:50 node-1 -wsrep-sst-donor[2546]: Bypassing the SST for IST
      Sep 02 12:58:50 node-1 mariadbd[1878]: 2024-09-02 12:58:50 0 [Note] WSREP: SST sent: b6d18f4d-692a-11ef-abf2-8bd2c43da2fb:4
      Sep 02 12:58:50 node-1 mariadbd[1878]: 2024-09-02 12:58:50 0 [Note] WSREP: Server status change donor -> joined
      Sep 02 12:58:50 node-1 mariadbd[1878]: 2024-09-02 12:58:50 0 [Note] WSREP: 1.0 (node-1): State transfer to 0.0 (node-2) complete.
      Sep 02 12:58:50 node-1 mariadbd[1878]: 2024-09-02 12:58:50 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 6)
      Sep 02 12:58:50 node-1 mariadbd[1878]: 2024-09-02 12:58:50 0 [Note] WSREP: Processing event queue:... -nan% (0/0 events) complete.
      Sep 02 12:58:50 node-1 mariadbd[1878]: 2024-09-02 12:58:50 0 [Note] WSREP: Member 1.0 (node-1) synced with group.
      Sep 02 12:58:50 node-1 mariadbd[1878]: 2024-09-02 12:58:50 0 [Note] WSREP: Processing event queue:... 100.0% (1/1 events) complete.
      Sep 02 12:58:50 node-1 mariadbd[1878]: 2024-09-02 12:58:50 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 6)
      Sep 02 12:58:50 node-1 mariadbd[1878]: 2024-09-02 12:58:50 2 [Note] WSREP: Server node-1 synced with group
      Sep 02 12:58:50 node-1 mariadbd[1878]: 2024-09-02 12:58:50 2 [Note] WSREP: Server status change joined -> synced
      Sep 02 12:58:50 node-1 mariadbd[1878]: 2024-09-02 12:58:50 2 [Note] WSREP: Synchronized with group, ready for connections
      Sep 02 12:58:50 node-1 mariadbd[1878]: 2024-09-02 12:58:50 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Sep 02 12:58:50 node-1 -wsrep-sst-donor[2550]: Total time on donor: 0 seconds
      Sep 02 12:58:50 node-1 -wsrep-sst-donor[2551]: mariabackup IST completed on donor
      Sep 02 12:58:50 node-1 mariadbd[1878]: 2024-09-02 12:58:50 0 [Note] WSREP: Donor monitor thread ended with total time 1 sec
      

      SystemD restarts the server process on node-2 and it runs into the same problem two more times before the restart starts to fail due to a crash instead:

      Sep 02 12:59:19 node-2 mariadbd[5558]: 2024-09-02 12:59:19 2 [Note] WSREP: Prepared IST receiver for 0-12, listening at: tcp://10.30.36.12:4568
      Sep 02 12:59:19 node-2 mariadbd[5558]: 2024-09-02 12:59:19 0 [Note] WSREP: Member 0.0 (node-2) requested state transfer from '*any*'. Selected 1.0 (node-1)(SYNCED) as donor.
      Sep 02 12:59:19 node-2 mariadbd[5558]: 2024-09-02 12:59:19 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 12)
      Sep 02 12:59:19 node-2 mariadbd[5558]: 2024-09-02 12:59:19 2 [Note] WSREP: Requesting state transfer: success, donor: 1
      Sep 02 12:59:19 node-2 mariadbd[5558]: 2024-09-02 12:59:19 2 [Note] WSREP: Resetting GCache seqno map due to different histories.
      Sep 02 12:59:19 node-2 mariadbd[5558]: munmap_chunk(): invalid pointer
      Sep 02 12:59:19 node-2 mariadbd[5558]: 240902 12:59:19 [ERROR] mysqld got signal 6 ;
      Sep 02 12:59:19 node-2 mariadbd[5558]: Sorry, we probably made a mistake, and this is a bug.
      Sep 02 12:59:19 node-2 mariadbd[5558]: Your assistance in bug reporting will enable us to fix this for the next release.
      Sep 02 12:59:19 node-2 mariadbd[5558]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
      Sep 02 12:59:19 node-2 mariadbd[5558]: We will try our best to scrape up some info that will hopefully help
      Sep 02 12:59:19 node-2 mariadbd[5558]: diagnose the problem, but since we have already crashed,
      Sep 02 12:59:19 node-2 mariadbd[5558]: something is definitely wrong and this may fail.
      Sep 02 12:59:19 node-2 mariadbd[5558]: Server version: 11.4.3-MariaDB-ubu2004-log source revision: 5ab81ffe0097a22a774957df28c5223cf0201de3
      Sep 02 12:59:19 node-2 mariadbd[5558]: key_buffer_size=0
      Sep 02 12:59:19 node-2 mariadbd[5558]: read_buffer_size=131072
      Sep 02 12:59:19 node-2 mariadbd[5558]: max_used_connections=0
      Sep 02 12:59:19 node-2 mariadbd[5558]: max_threads=153
      Sep 02 12:59:19 node-2 mariadbd[5558]: thread_count=3
      Sep 02 12:59:19 node-2 mariadbd[5558]: It is possible that mysqld could use up to
      Sep 02 12:59:19 node-2 mariadbd[5558]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 336996 K  bytes of memory
      Sep 02 12:59:19 node-2 mariadbd[5558]: Hope that's ok; if not, decrease some variables in the equation.
      Sep 02 12:59:19 node-2 mariadbd[5558]: Thread pointer: 0x7f2954000c58
      Sep 02 12:59:19 node-2 mariadbd[5558]: Attempting backtrace. You can use the following information to find out
      Sep 02 12:59:19 node-2 mariadbd[5558]: where mysqld died. If you see no messages after this, something went
      Sep 02 12:59:19 node-2 mariadbd[5558]: terribly wrong...
      Sep 02 12:59:19 node-2 mariadbd[5558]: stack_bottom = 0x7f297875cd88 thread_stack 0x49000
      Sep 02 12:59:19 node-2 mariadbd[5558]: /usr/sbin/mariadbd(my_print_stacktrace+0x32)[0x5615938a01c2]
      Sep 02 12:59:19 node-2 mariadbd[5558]: /usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x56159335a875]
      Sep 02 12:59:19 node-2 mariadbd[5558]: sigaction.c:0(__restore_rt)[0x7f298056a420]
      Sep 02 12:59:19 node-2 mariadbd[5774]: addr2line: DWARF error: section .debug_info is larger than its filesize! (0x93f189 vs 0x531098)
      Sep 02 12:59:19 node-2 mariadbd[5558]: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f298007000b]
      Sep 02 12:59:19 node-2 mariadbd[5558]: /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f298004f859]
      Sep 02 12:59:19 node-2 mariadbd[5558]: /lib/x86_64-linux-gnu/libc.so.6(+0x8d26e)[0x7f29800ba26e]
      Sep 02 12:59:19 node-2 mariadbd[5558]: /lib/x86_64-linux-gnu/libc.so.6(+0x952fc)[0x7f29800c22fc]
      Sep 02 12:59:19 node-2 mariadbd[5558]: /lib/x86_64-linux-gnu/libc.so.6(+0x9554c)[0x7f29800c254c]
      Sep 02 12:59:19 node-2 mariadbd[5558]: src/gu_deqmap.hpp:117(gu::DeqMap<long, void const*, std::allocator<void const*> >::index_back() const)[0x7f297a22083e]
      Sep 02 12:59:19 node-2 mariadbd[5558]: src/GCache_seqno.cpp:32(gcache::GCache::seqno_reset(gu::GTID const&))[0x7f297a2055da]
      Sep 02 12:59:19 node-2 mariadbd[5558]: src/replicator_str.cpp:917(galera::ReplicatorSMM::request_state_transfer(void*, int, wsrep_uuid const&, long, void const*, long))[0x7f297a0e6a10]
      Sep 02 12:59:20 node-2 mariadbd[5558]: src/replicator_smm.cpp:2722(galera::ReplicatorSMM::process_st_required(void*, int, wsrep_view_info const*))[0x7f297a0cfcc7]
      Sep 02 12:59:20 node-2 mariadbd[5558]: src/replicator_smm.cpp:3012(galera::ReplicatorSMM::process_prim_conf_change(void*, gcs_act_cchange const&, int, void*))[0x7f297a0d06ff]
      Sep 02 12:59:20 node-2 mariadbd[5558]: src/galera_gcs.hpp:186(galera::Gcs::resume_recv())[0x7f297a0d0d4d]
      Sep 02 12:59:20 node-2 mariadbd[5558]: bits/stl_vector.h:677(std::vector<gcs_act_cchange::member, std::allocator<gcs_act_cchange::member> >::~vector())[0x7f297a10236b]
      Sep 02 12:59:20 node-2 mariadbd[5558]: src/gcs_action_source.cpp:29(Release::~Release())[0x7f297a10285f]
      Sep 02 12:59:20 node-2 mariadbd[5558]: src/replicator_smm.cpp:404(galera::ReplicatorSMM::async_recv(void*))[0x7f297a0cef10]
      Sep 02 12:59:20 node-2 mariadbd[5558]: src/wsrep_provider.cpp:292(galera_recv)[0x7f297a0a2581]
      Sep 02 12:59:20 node-2 mariadbd[5558]: /usr/sbin/mariadbd(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x56159396cbf2]
      Sep 02 12:59:20 node-2 mariadbd[5558]: /usr/sbin/mariadbd(+0xdcc9a1)[0x56159362e9a1]
      Sep 02 12:59:20 node-2 mariadbd[5558]: /usr/sbin/mariadbd(_Z15start_wsrep_THDPv+0x26b)[0x56159361c3bb]
      Sep 02 12:59:20 node-2 mariadbd[5558]: /usr/sbin/mariadbd(+0xd413a6)[0x5615935a33a6]
      Sep 02 12:59:20 node-2 mariadbd[5558]: nptl/pthread_create.c:478(start_thread)[0x7f298055e609]
      Sep 02 12:59:20 node-2 mariadbd[5778]: addr2line: DWARF error: section .debug_info is larger than its filesize! (0x93f189 vs 0x531098)
      Sep 02 12:59:20 node-2 mariadbd[5558]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f298014c353]
      Sep 02 12:59:20 node-2 mariadbd[5558]: Trying to get some variables.
      Sep 02 12:59:20 node-2 mariadbd[5558]: Some pointers may be invalid and cause the dump to abort.
      Sep 02 12:59:20 node-2 mariadbd[5558]: Query (0x0): (null)
      Sep 02 12:59:20 node-2 mariadbd[5558]: Connection ID (thread ID): 2
      Sep 02 12:59:20 node-2 mariadbd[5558]: Status: NOT_KILLED
      Sep 02 12:59:20 node-2 mariadbd[5558]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on>
      Sep 02 12:59:20 node-2 mariadbd[5558]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
      Sep 02 12:59:20 node-2 mariadbd[5558]: information that should help you find out what is causing the crash.
      Sep 02 12:59:20 node-2 mariadbd[5558]: We think the query pointer is invalid, but we will try to print it anyway.
      Sep 02 12:59:20 node-2 mariadbd[5558]: Query:
      Sep 02 12:59:20 node-2 mariadbd[5558]: Writing a core file...
      

      Attachments

        Issue Links

          Activity

            People

              janlindstrom Jan Lindström
              hholzgra Hartmut Holzgraefe
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.