[MDEV-14662] crash when packets out of order Created: 2017-12-15  Updated: 2020-08-25  Resolved: 2018-02-14

Status: Closed
Project: MariaDB Server
Component/s: Server, Storage Engine - Spider
Affects Version/s: 10.3
Fix Version/s: 10.0.35, 10.1.32, 10.2.14, 10.3.5

Type: Bug Priority: Major
Reporter: Simon Mudd Assignee: Sergei Golubchik
Resolution: Fixed Votes: 1
Labels: None


 Description   

I noticed a mysqld crash after the server had been running for some time.
The Spider engine is in use, but I don't think this crash looks Spider-related.

Version running was 10.3 branch commit 159a6c2e608d04732cb6

The error log shows:

mysqld: /home/myuser/src/server/sql/net_serv.cc:1163: ulong my_real_read(NET*, size_t*, my_bool): Assertion `0' failed.
171215 10:36:07 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.3.3-MariaDB-debug-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=3
max_threads=302
thread_count=30
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 485739 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7fa7ac005030
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fa8b4249e68 thread_stack 0x49000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x3d)[0x1259b0b]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x3a3)[0x9cba3a]
/lib64/libpthread.so.0(+0xf5e0)[0x7fa8d04675e0]
/lib64/libc.so.6(gsignal+0x37)[0x7fa8cecc91f7]
/lib64/libc.so.6(abort+0x148)[0x7fa8cecca8e8]
/lib64/libc.so.6(+0x2e266)[0x7fa8cecc2266]
/lib64/libc.so.6(+0x2e312)[0x7fa8cecc2312]
/usr/local/mysql/bin/mysqld[0x6351d8]
/usr/local/mysql/bin/mysqld(my_net_read_packet_reallen+0x4a)[0x63527c]
/usr/local/mysql/bin/mysqld(my_net_read_packet+0x2f)[0x635230]
/usr/local/mysql/bin/mysqld[0x688288]
mysys/stacktrace.c:269(my_print_stacktrace)[0x689aa0]
sql/signal_handler.cc:168(handle_fatal_signal)[0x68893b]
sql/sql_acl.cc:12979(server_mpvio_read_packet(st_plugin_vio*, unsigned char**))[0x688cc3]
sql/sql_connect.cc:1088(check_connection(THD*))[0x8598ab]
sql/sql_connect.cc:1157(login_connection(THD*))[0x859a1b]
sql/sql_connect.cc:1334(thd_prepare_connection(THD*))[0x85a086]
sql/sql_connect.cc:1411(do_handle_one_connection(CONNECT*))[0x85a2a7]
sql/sql_connect.cc:1327(handle_one_connection)[0x85a05b]
perfschema/pfs.cc:1865(pfs_spawn_thread)[0x11e48d0]
/lib64/libpthread.so.0(+0x7e25)[0x7fa8d045fe25]
/lib64/libc.so.6(clone+0x6d)[0x7fa8ced8c34d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): is an invalid pointer

Reporting in case this is not already known.



 Comments   
Comment by Simon Mudd [ 2017-12-15 ]

Note: on startup the server reports its version as 10.3.3-MariaDB-debug-log.

Comment by Elena Stepanova [ 2017-12-15 ]

Since you're already running a debug build, could you please get it to produce a core dump (if it doesn't already) and collect the stack traces of all threads from it?
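One common way to get the stack traces of all threads from a core dump is to run gdb in batch mode with `thread apply all bt full`. The sketch below only assembles and runs that command line. It assumes gdb is installed, takes the binary path from the backtrace above, and uses a hypothetical core file path (`/tmp/core.mysqld`). `gdb_all_threads_cmd` and `collect_backtraces` are illustrative names, not part of any MariaDB tooling.

```python
import shlex
import subprocess


def gdb_all_threads_cmd(binary: str, core: str) -> list[str]:
    """Build a gdb command line that prints full backtraces for every thread."""
    return [
        "gdb",
        "--batch",                          # run the -ex commands, then exit
        "-ex", "thread apply all bt full",  # backtrace (with locals) of all threads
        binary,                             # the executable that produced the core
        core,                               # the core dump itself
    ]


def collect_backtraces(binary: str, core: str) -> str:
    """Run gdb on the core dump and return its output as text."""
    result = subprocess.run(
        gdb_all_threads_cmd(binary, core),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Binary path from the backtrace in the report; the core path is hypothetical.
    cmd = gdb_all_threads_cmd("/usr/local/mysql/bin/mysqld", "/tmp/core.mysqld")
    print(shlex.join(cmd))
```

The printed command can be pasted into a shell on the machine that holds the core file; `collect_backtraces` does the same from Python and returns the text to attach to the issue.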

Comment by Daniel Black [ 2018-01-04 ]

This looks like an authentication protocol error with the MySQL client you are using.

To debug this as a man-in-the-middle, something like the following can show the bytes sent and received. UNIX-to-TCP variants also exist:

socat -x -v UNIX-LISTEN:/tmp/db.sock,fork UNIX:/var/lib/mysql/mysql.sock
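In the MySQL client/server protocol every packet starts with a 4-byte header: a 3-byte little-endian payload length followed by a 1-byte sequence id, which is expected to go up by one for each packet of an exchange. Since the issue title points at packets arriving out of order, bytes captured with socat could be checked for sequence gaps. This is a minimal sketch, assuming the raw bytes have already been extracted from the hex dump and the sequence ids are listed in conversation order (both directions interleaved); it also assumes an id of 0 marks the start of a new command exchange. `parse_header`, `split_packets` and `find_out_of_order` are hypothetical helpers, not MariaDB functions.

```python
def parse_header(header: bytes) -> tuple[int, int]:
    """Decode a 4-byte MySQL packet header into (payload_length, sequence_id)."""
    if len(header) != 4:
        raise ValueError("a MySQL packet header is exactly 4 bytes")
    length = int.from_bytes(header[:3], "little")  # 3-byte little-endian length
    return length, header[3]                       # 1-byte sequence id


def split_packets(stream: bytes) -> list[tuple[int, bytes]]:
    """Split raw bytes into (sequence_id, payload) pairs.

    A trailing incomplete packet (truncated capture) is ignored."""
    packets = []
    pos = 0
    while pos + 4 <= len(stream):
        length, seq = parse_header(stream[pos:pos + 4])
        end = pos + 4 + length
        if end > len(stream):
            break
        packets.append((seq, stream[pos + 4:end]))
        pos = end
    return packets


def find_out_of_order(seq_ids: list[int]) -> list[tuple[int, int, int]]:
    """Return (position, found, expected) wherever an id does not follow its predecessor.

    Assumption: an id of 0 is accepted as the start of a new command exchange."""
    problems = []
    expected = 0
    for pos, seq in enumerate(seq_ids):
        if seq != expected and seq != 0:
            problems.append((pos, seq, expected))
        expected = (seq + 1) % 256  # resynchronise on the id actually seen
    return problems
```

For example, `find_out_of_order([0, 1, 3])` reports the third packet, because id 2 was expected. Resynchronising on the observed id keeps a single gap from flagging every later packet.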

Comment by Simon Mudd [ 2018-01-12 ]

Provided the core dump file, with some comments, via the support site.

Comment by Simon Mudd [ 2018-01-30 ]

Not sure that using the suggested socat options would be practical. The server was doing a "parallel dump" of a 300+ GB database into the Spider node, and the failure happened after about an hour. So, to me, capturing ad hoc traffic between the server and the client(s) is not going to be useful here.
