[MDEV-13340] mysqld (or its thread) crashed during analyze table Created: 2017-07-18  Updated: 2018-02-28  Resolved: 2018-02-28

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 10.1.25
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Pavel Kaidalov Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

Amazon AWS, CentOS 6 x86_64, Galera 3 nodes cluster


Issue Links:
Relates
relates to MDEV-12989 Crash with "an invalid pointer" error Closed

 Description   

This is the first occurrence of this bug, and so far a one-time event. I have "mysqlcheck --skip-write-binlog -a -A" running nightly by cron on a Galera cluster; each node runs it at a different time. Last time, mysqld crashed during this analyze run; here is the log from mysqld.log:

2017-07-14 13:24:50 7f649b1fa700 InnoDB: FTS Optimize Removing table doc_live_wi/#sql2-5c22-112eeb4
170716  5:40:19 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.1.25-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=77
max_threads=3002
thread_count=50
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6725132 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f62d7e8a008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f61bdf14140 thread_stack 0x48400
2017-07-16 10:25:28 140094283446016 [Note] WSREP: (01c439e4, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.10.1.133:4567
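The memory bound printed by the crash handler can be sanity-checked from the variables it lists. Note that sort_buffer_size does not appear in this log, so the value below is a hypothetical placeholder, which is why the result differs slightly from the 6725132 K reported above; a minimal sketch:

```python
# Sanity-check of the crash handler's estimate:
#   key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads
# key_buffer_size, read_buffer_size and max_threads come from the log above;
# sort_buffer_size is NOT shown in the log, so 2 MiB is a made-up placeholder.
key_buffer_size = 134217728          # bytes (from the log)
read_buffer_size = 131072            # bytes (from the log)
sort_buffer_size = 2 * 1024 * 1024   # bytes (hypothetical placeholder)
max_threads = 3002                   # from the log

upper_bound_kib = (key_buffer_size
                   + (read_buffer_size + sort_buffer_size) * max_threads) // 1024
print(upper_bound_kib)  # prints 6663424 with these placeholder values
```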

This Galera node did not fail, did not perform an SST, nothing; it just continued running. However, I had a stalled "ANALYZE NO_WRITE_TO_BINLOG TABLE `workflow_state`" statement in the "Unlocking tables" state. The next day, when a "DROP TRIGGER" on the same table was executed, it stalled in the same "Unlocking tables" state and locked the whole Galera cluster. I tried to kill the ANALYZE thread with no luck, then I tried to shut down this Galera node, also with no luck; it stalled, and the last message was

2017-07-17  3:26:22 140030532045568 [Note] WSREP: killing local connection: 28899415

I waited almost 30 minutes, then sent kill -9 to mysqld; after a restart, this node successfully rejoined Galera using IST.
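For reference, the nightly job described above could be staggered per node with crontab entries along these lines (a sketch; the run times and log path are hypothetical, not taken from this report):

```
# Hypothetical crontab entries, one per Galera node, staggered so the
# nodes do not run mysqlcheck concurrently; hours and log path are made up.
# node1:
0 1 * * * mysqlcheck --skip-write-binlog -a -A >> /var/log/mysqlcheck.log 2>&1
# node2:
0 3 * * * mysqlcheck --skip-write-binlog -a -A >> /var/log/mysqlcheck.log 2>&1
# node3:
0 5 * * * mysqlcheck --skip-write-binlog -a -A >> /var/log/mysqlcheck.log 2>&1
```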



 Comments   
Comment by Pavel Kaidalov [ 2017-07-21 ]

Today I hit this bug again, in the same situation: "mysqlcheck --skip-write-binlog -a -A" crashed, and in mysqld I have a query stalled in the "Unlocking tables" state:

8233725 root    localhost       live_le       Query   12953     Unlocking tables        ANALYZE NO_WRITE_TO_BINLOG TABLE `test` 0.000

and a bug report in the log, but this time with a backtrace:

170721  5:41:38 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.25-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=90
max_threads=3002
thread_count=42
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6725132 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f75136b6008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f7b91bb4140 thread_stack 0x48400
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7f849d3cf33b]
/usr/sbin/mysqld(handle_fatal_signal+0x4d5)[0x7f849cf2e6e5]
/lib64/libpthread.so.0(+0x3e0f20f7e0)[0x7f849c52d7e0]
/usr/sbin/mysqld(_ZN11Query_arena10free_itemsEv+0x2a)[0x7f849cd8b2ba]
/usr/sbin/mysqld(_ZN20sp_instr_jump_if_notD0Ev+0x4e)[0x7f849d031f7e]
/usr/sbin/mysqld(_ZN7sp_headD1Ev+0x166)[0x7f849d02b816]
/usr/sbin/mysqld(_ZN7sp_headD0Ev+0x11)[0x7f849d02c791]
/usr/sbin/mysqld(_ZN19Table_triggers_listD1Ev+0x47)[0x7f849ce459f7]
/usr/sbin/mysqld(_Z18intern_close_tableP5TABLE+0x22)[0x7f849cd73162]

When I tried to stop this Galera node, it stalled while killing a connection:

2017-07-21  9:19:16 140172447185664 [Note] WSREP: applier thread exiting (code:6)
2017-07-21  9:19:16 140172446276352 [Note] WSREP: applier thread exiting (code:6)
2017-07-21  9:19:18 140135580560128 [Note] WSREP: killing local connection: 8829986

I checked how it looks in the system process list:

root      65111  0.0  0.0 108300  1516 ?        S    Jul19   0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/var/lib/my
mysql     67519  148 71.1 70175128 47045672 ?   Sl   Jul19 4682:37  \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/
mysql    109020  0.0  0.1 114324 104036 ?       S    05:41   0:00      \_ addr2line -C -f -e /usr/sbin/mysqld

and tried to run a basic strace to see what was going on while it was stalled; here is the output, maybe it is helpful:

[root@db1 ~]# strace -p 109020
Process 109020 attached
read(0, ^CProcess 109020 detached
 <detached ...>
[root@db1 ~]# strace -p 67519
Process 67519 attached
futex(0x7f849dc459c4, FUTEX_WAIT_PRIVATE, 35, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f849dc459c4, FUTEX_WAIT_PRIVATE, 36, NULL^CProcess 67519 detached
 <detached ...>
[root@db1 ~]# strace -p 65111
Process 65111 attached
wait4(-1, ^CProcess 65111 detached
 <detached ...>
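When mysqld hangs like this, an all-thread backtrace is usually more informative than strace; a sketch, assuming gdb is installed (67519 is the mysqld pid from the ps output above; this is a suggested diagnostic step, not something from the report):

```shell
# Capture backtraces of every mysqld thread without killing the process.
# 67519 is the mysqld pid from the ps output above; adjust as needed.
pid=67519
gdb -p "$pid" -batch \
    -ex "set pagination off" \
    -ex "thread apply all bt" > "/tmp/mysqld-backtrace-$pid.txt" 2>&1 || true
# '|| true': gdb exits non-zero if the pid is gone or attach is denied;
# the (possibly partial) output still lands in the file for inspection.
```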

and then I sent kill -9 and ran "service mysqld start"; the Galera node again synchronized using IST.

Comment by Andrii Nikitin (Inactive) [ 2017-07-23 ]

kpvadm, is the affected server the same as, or identical to, the server in MDEV-12989? Was this server recently under the same or similar 'weekly database re-import' jobs mentioned in MDEV-12989?
If yes, could you provide the info requested in MDEV-12989?
Do you know the reason for running the mysqlcheck command on a regular basis? E.g. is it part of some more complex operation, or 'just in case'?

Also, for proper troubleshooting we need:
1. Configuration files
2. Full error log
3. Output of this shell command under normal load:
top -cbn1 | grep mysql
4. Output of these SQL commands:
show global variables;
show global status;
show processlist;
5. Details about both problem tables:

show create table test\G
show table status like 'test';
show index from test;
show triggers like 'test';

show create table workflow_state\G
show table status like 'workflow_state';
show index from workflow_state;
show triggers like 'workflow_state';

6. Check for any errors / warnings in syslog during the time of the failures:
grep -Ei "err|warn|mysql|fail" /var/log/*
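Note that grep needs -E (or escaped \|) for the | alternation in the command above; a plain BRE grep would look for the literal string "err|warn|mysql|fail" and match nothing. A quick check on a made-up log line:

```shell
# BRE grep treats '|' literally; ERE grep (-E) treats it as alternation.
# The sample log line below is made up for the demonstration.
printf 'Jul 21 05:41:38 db1 kernel: mysqld[67519]: segfault\n' > /tmp/sample.log
grep -Ei "err|warn|mysql|fail" /tmp/sample.log   # matches via "mysql"
```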

Generated at Thu Feb 08 08:04:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.