[MDEV-30515] stop slave causes the master server crash with semi-sync replication Created: 2023-01-31  Updated: 2023-02-13

Status: Open
Project: MariaDB Server
Component/s: Replication, Server
Affects Version/s: None
Fix Version/s: 10.5, 10.6

Type: Bug Priority: Major
Reporter: Allen Lee (Inactive) Assignee: Oleksandr Byelkin
Resolution: Unresolved Votes: 1
Labels: None
Environment:

centos 7


Attachments: File master.7z.001     File master.7z.002     File maxscale.cnf     File server.cnf     Zip Archive slave.zip    

 Description   

The customer reported that stop slave caused the master down/crash. During my repro, I was able to see that master got crashed once slave stopped manually.
The followings are the error log from master and slave when slave was stopped manually.
For testing purpose, sysbech was run against backend server via 2 maxscale servers.

  • 200 threads with 90:10 (write/read workload) - port : 2582
  • 200 threads with 1000 (read only workload) - port : 2482
  • on master

    2023-01-31  8:55:59 4481 [ERROR] Semi-sync master failed on net_flush() before waiting for slave reply
    2023-01-31  8:55:59 4481 [Note] Stop semi-sync binlog_dump to slave (server_id: 2)
    2023-01-31  8:56:00 4481 [Warning] Aborted connection 4481 to db: 'unconnected' user: 'repl' host: '192.168.254.53' (Failed to run hook 'after_send_event')
    2023-01-31  8:56:09 4204 [Warning] Timeout waiting for reply of binlog (file: mariadb-bin.000001, pos: 214503559), semi-sync up to file mariadb-bin.000001, position 214503085.
    2023-01-31  8:56:09 4204 [Note] Semi-sync replication switched OFF.
    2023-01-31  8:56:09 4901 [Warning] Aborted connection 4901 to db: 'unconnected' user: 'unauthenticated' host: '192.168.254.53' (Got an error reading communication packets)
    2023-01-31  8:56:09 4901 [Warning] Aborted connection 4901 to db: 'unconnected' user: 'unauthenticated' host: '192.168.254.53' (This connection closed normally without authentication)
    2023-01-31  8:56:09 4346 [Warning] Aborted connection 4346 to db: 'unconnected' user: 'mxs' host: '192.168.254.54' (Got an error reading communication packets)
    2023-01-31  8:56:09 4903 [Warning] Aborted connection 4903 to db: 'unconnected' user: 'unauthenticated' host: '192.168.254.55' (Got an error reading communication packets)
    2023-01-31  8:56:09 4903 [Warning] Aborted connection 4903 to db: 'unconnected' user: 'unauthenticated' host: '192.168.254.55' (This connection closed normally without authentication)
    2023-01-31  8:56:09 4904 [Warning] Aborted connection 4904 to db: 'unconnected' user: 'unauthenticated' host: '192.168.254.54' (Got an error reading communication packets)
    2023-01-31  8:56:09 4904 [Warning] Aborted connection 4904 to db: 'unconnected' user: 'unauthenticated' host: '192.168.254.54' (This connection closed normally without authentication)
    2023-01-31  8:56:09 4347 [Warning] Aborted connection 4347 to db: 'unconnected' user: 'mxs' host: '192.168.254.55' (Got an error reading communication packets)
    230131  8:56:10 [ERROR] mysqld got signal 11 ;
    This could be because you hit a bug. It is also possible that this binary
    or one of the libraries it was linked against is corrupt, improperly built,
    or misconfigured. This error can also be caused by malfunctioning hardware.
     
    To report this bug, see https://mariadb.com/kb/en/reporting-bugs
     
    We will try our best to scrape up some info that will hopefully help
    diagnose the problem, but since we have already crashed,
    something is definitely wrong and this may fail.
     
    Server version: 10.5.16-11-MariaDB-enterprise-log
    key_buffer_size=33554432
    read_buffer_size=131072
    max_used_connections=408
    max_threads=65537
    thread_count=301
    It is possible that mysqld could use up to
    key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 143409586 K  bytes of memory
    Hope that's ok; if not, decrease some variables in the equation.
     
    Thread pointer: 0x7f4388437248
    Attempting backtrace. You can use the following information to find out
    where mysqld died. If you see no messages after this, something went
    terribly wrong...
    2023-01-31  8:56:15 0 [Note] mariadbd: Aria engine: starting recovery
    recovered pages: 0% 25% 59% 92% 100% (0.0 seconds); tables to flush: 1 0
     (0.0 seconds);
    2023-01-31  8:56:15 0 [Note] mariadbd: Aria engine: recovery done
    

  • on slave

    2023-01-31  8:55:59 6387 [Note] Error reading relay log event: slave SQL thread was killed
    2023-01-31  8:55:59 6387 [Note] Slave SQL thread exiting, replication stopped in log 'mariadb-bin.000001' at position 214503085; GTID position '0-1-370185'
    2023-01-31  8:55:59 6387 [Note] master was 192.168.254.52:3306
    2023-01-31  8:55:59 6386 [Note] Slave I/O thread exiting, read up to log 'mariadb-bin.000001', position 214503085; GTID position 0-1-370185
    2023-01-31  8:55:59 6386 [Note] master was 192.168.254.52:3306
    2023-01-31  8:56:01 6386 [Note] cannot connect to master to kill slave io_thread's connection
    2023-01-31  9:01:12 6340 [Warning] Aborted connection 6340 to db: 'unconnected' user: 'root' host: 'localhost' (Got timeout reading communication packets)
    

This can be reproducible in 10.5.x and 10.6.x latest versions.



 Comments   
Comment by Andrei Elkin [ 2023-02-07 ]

sanja, here is the crashed stack (which should've been in the description) that does not have anything to do with replication

Thread 1 (Thread 0x7fb1242c7700 (LWP 74620)):
#0  0x00007fb32a311aa1 in pthread_kill () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x0000555c91447fae in handle_fatal_signal (sig=11) at /usr/src/debug/MariaDB-/src_0/sql/signal_handler.cc:344
        curr_time = 1675228633
        tm = {tm_sec = 13, tm_min = 17, tm_hour = 14, tm_mday = 1, tm_mon = 1, tm_year = 123, tm_wday = 3, tm_yday = 31, tm_isdst = 0, tm_gmtoff = 32400, tm_zone = 0x555c9447cc70 "KST"}
        thd = 0x7fb0e43dab58
        print_invalid_query_pointer = true
#2  <signal handler called>
No symbol table info available.
#3  append_query_string (csinfo=0x79f12, to=to@entry=0x7fb1242c5e70, str=0x7fb0ec384198 <incomplete sequence \345>, len=8, no_backslash=<optimized out>) at /usr/src/debug/MariaDB-/src_0/sql/log_event_server.cc:527
        beg = 0x7fb1242c5e8e ""
        ptr = 0x7fb1242c5e8e ""
        orig_len = <optimized out>
#4  0x0000555c9146dbc6 in Item_param::value_query_val_str (this=0x7fb0ec4de030, thd=0x7fb0e43dab58, str=0x7fb1242c5e70) at /usr/src/debug/MariaDB-/src_0/sql/item.cc:4963
No locals.
#5  0x0000555c9146dd96 in Item_param::query_val_str (this=<optimized out>, thd=<optimized out>, str=str@entry=0x7fb1242c5e70) at /usr/src/debug/MariaDB-/src_0/sql/item.cc:4980
No locals.
#6  0x0000555c9146e06b in Item_param::append_for_log (this=<optimized out>, thd=<optimized out>, str=0x7fb1242c60a0) at /usr/src/debug/MariaDB-/src_0/sql/item.cc:5279
        buf = {<String> = {<Charset> = {m_charset = 0x555c9251f300 <my_charset_bin>}, <Binary_string> = {<Static_binary_string> = {<Sql_alloc> = {<No data fields>}, Ptr = 0x7fb1242c5e8e "", str_length = 0}, Alloced_length = 80, extra_alloc = 0, alloced = false, thread_specific = false}, <No data fields>}, buff = "\000\000\240\000\000\000\000\000\000\000\370\320M\354\260\177\000\000\037\000\000\000\000\000\000\000\060\340M\354\260\177\000\000\320^,$\261\177\000\000U\v,\221\\U\000\000\240`,$\261\177\000\000\037\000\000\000\000\000\000\000 _,$\261\177\000\000\302\025,\221\\U"}
        val = <optimized out>
#7  0x0000555c9124bc38 in append (p=0x7fb0ec4de0c8, this=<synthetic pointer>) at /usr/src/debug/MariaDB-/src_0/sql/item.h:595
No locals.
#8  insert_params_with_log (stmt=<optimized out>, null_array=0x7fb0e414cac2 "", read_pos=0x7fb0e414cacd "def", data_end=0x7fb0e414cacc "", query=0x7fb1242c60a0) at /usr/src/debug/MariaDB-/src_0/sql/sql_prepare.cc:802
        param = 0x7fb0ec4de030
        it = 0x7fb0ec4dec78
        thd = 0x7fb0e43dab58
        begin = 0x7fb0ec4dec78
        end = 0x7fb0ec4dec80
        acc = {thd = 0x7fb0e43dab58, src = 0x7fb0ec4dd0f8 "SELECT c FROM sbtest1 WHERE id=?", src_len = 32, from = 0, dst = <optimized out>}
#9  0x0000555c912503ab in Prepared_statement::set_parameters (this=this@entry=0x7fb0ec29f928, expanded_query=expanded_query@entry=0x7fb1242c60a0, packet=0x7fb0e414cac2 "", packet_end=<optimized out>) at /usr/src/debug/MariaDB-/src_0/sql/sql_prepare.cc:4444
        null_array = 0x7fb0e414cac2 ""
        is_sql_ps = false
        res = false
#10 0x0000555c912537c6 in Prepared_statement::execute_loop (this=0x7fb0ec29f928, expanded_query=0x7fb1242c60a0, open_cursor=<optimized out>, packet=<optimized out>, packet_end=<optimized out>) at /usr/src/debug/MariaDB-/src_0/sql/sql_prepare.cc:4520
No locals.
#11 0x0000555c91254e60 in mysql_stmt_execute_common (thd=0x7fb0e43dab58, stmt_id=<optimized out>, packet=0x7fb0e414cac2 "", packet_end=0x7fb0e414cacc "", cursor_flags=0, bulk_op=bulk_op@entry=false, read_types=read_types@entry=false) at /usr/src/debug/MariaDB-/src_0/sql/sql_prepare.cc:3487
        expanded_query = {<Charset> = {m_charset = 0x555c9251f300 <my_charset_bin>}, <Binary_string> = {<Static_binary_string> = {<Sql_alloc> = {<No data fields>}, Ptr = 0x7fb0ec377498 "SELECT c FROM sbtest1 WHERE id=", str_length = 31}, Alloced_length = 160, extra_alloc = 128, alloced = true, thread_specific = false}, <No data fields>}
        stmt = 0x7fb0ec29f928
        save_protocol = 0x7fb0e43db0c8
        open_cursor = false
#12 0x0000555c91254fed in mysqld_stmt_execute (thd=<optimized out>, packet_arg=<optimized out>, packet_length=<optimized out>) at /usr/src/debug/MariaDB-/src_0/sql/sql_prepare.cc:3262
        packet = <optimized out>
        flags = <optimized out>
        packet_end = <optimized out>
        stmt_id = <optimized out>
#13 0x0000555c9123fd63 in dispatch_command (command=command@entry=COM_STMT_EXECUTE, 

As noted elsewhere
> The failure looks to be related to csinfo=0x79f12 passed to #3 from #4 of Item_param::value_query_val_str (). csinfo is an address so the value is apparently wrong.

Generated at Thu Feb 08 10:16:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.