[MDEV-8108] MariaDB 5.5.43 segfaults on startup Created: 2015-05-06  Updated: 2015-05-08  Resolved: 2015-05-08

Status: Closed
Project: MariaDB Server
Component/s: Admin statements, Views
Affects Version/s: 5.5.43
Fix Version/s: 5.5.44, 10.0.19

Type: Bug Priority: Major
Reporter: Arjen Lentz Assignee: Sergei Golubchik
Resolution: Duplicate Votes: 1
Labels: None
Environment:

Ubuntu 14.04 x64_64


Issue Links:
Duplicate
is duplicated by MDEV-8115 mysql_upgrade crashes the server with... Closed

 Description   

mysqld segfaults on startup.
Stack trace below.
It then restarts through mysqld_safe, and after that appears to be ok-ish with clients able to connect. But it's not in good shape. There may be InnoDB tables with corruption in this installation, which could be a possible trigger for the crash - however, a segfault is always a bug. An specific assert from InnoDB would be (although still annoying) more useful as then more context is shown.

150506 13:41:19 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
150506 13:41:19 [Note] /usr/sbin/mysqld (mysqld 5.5.43-MariaDB-1~trusty) starting as process 19351 ...
150506 13:41:19 InnoDB: The InnoDB memory heap is disabled
150506 13:41:19 InnoDB: Mutexes and rw_locks use GCC atomic builtins
150506 13:41:19 InnoDB: Compressed tables use zlib 1.2.8
150506 13:41:19 InnoDB: Using Linux native AIO
150506 13:41:19 InnoDB: Initializing buffer pool, size = 16.0G
150506 13:41:19 InnoDB: Completed initialization of buffer pool
150506 13:41:19 InnoDB: highest supported file format is Barracuda.
150506 13:41:19  InnoDB: Waiting for the background threads to start
150506 13:41:20 Percona XtraDB (http://www.percona.com) 5.5.42-MariaDB-37.1 started; log sequence number 74624717726974
150506 13:41:20 [Note] Plugin 'FEEDBACK' is disabled.
150506 13:41:20 [Note] Server socket created on IP: '0.0.0.0'.
150506 13:41:20 [Note] Event Scheduler: Loaded 0 events
150506 13:41:20 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.43-MariaDB-1~trusty'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
150506 13:41:20 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 5.5.43-MariaDB-1~trusty
key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 352025 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7f0661bce000
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f0ab8ec9e40 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f0ab981dcee]
/usr/sbin/mysqld(handle_fatal_signal+0x457)[0x7f0ab940ed77]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f0ab84bc340]
/usr/sbin/mysqld(+0x66888d)[0x7f0ab95ad88d]
/usr/sbin/mysqld(+0x675c6d)[0x7f0ab95bac6d]
/usr/sbin/mysqld(+0x4560df)[0x7f0ab939b0df]
/usr/sbin/mysqld(_ZN22Repair_table_statement7executeEP3THD+0xcc)[0x7f0ab939b9fc]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1970)[0x7f0ab92c85e0]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x11a)[0x7f0ab92cd8ba]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1896)[0x7f0ab92cf9e6]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x1c3)[0x7f0ab938dd13]
/usr/sbin/mysqld(handle_one_connection+0x4a)[0x7f0ab938de0a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f0ab84b4182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f0ab759e47d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f064841d018): is an invalid pointer
Connection ID (thread ID): 6
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
150506 13:41:20 mysqld_safe Number of processes running now: 0
150506 13:41:20 mysqld_safe mysqld restarted



 Comments   
Comment by Elena Stepanova [ 2015-05-06 ]

Hi Arjen,

Do you have a datadir backup on which the crash can be reproduced?
If you do, can you share it with us or, alternatively, run a debug server on it to produce a decent stack trace and all that?

Comment by Arjen Lentz [ 2015-05-07 ]

Hi Elena

No sorry. Unfortunately, time constraints and available infra at that time didn't allow us to create extra snapshots.
I thought the stacktrace was pretty decent - one problem being of course that InnoDB is not being informative at all, we only deduce indirectly that that's where it happened.
InnoDB has long a very bad habit of blowing up on corruption issues, the best cases being asserts but otherwise just segfaulting. It's not shiny.

Comment by Daniel Black [ 2015-05-07 ]

If debug packages where provided it would be quite easy to reverse this back to code (MDEV-658).

Pulling the mysqld out of the deb package, doing an objdump -d mysqld, finding +0xcc from the start of _ZN22Repair_table_statement7executeEP3THD (
Repair_table_statement::execute(THD*) from (echo _ZN22Repair_table_statement7executeEP3THD | demangle ) ). leaves this bit of code.

The segfault therefore happened at 0x46083C

0000000000460770 <_ZN22Repair_table_statement7executeEP3THD>:
...
  460825: 4c 89 e6              mov    %r12,%rsi
  460828: 48 89 4c 24 10        mov    %rcx,0x10(%rsp)
  46082d: 48 8d 0d 95 53 59 00  lea    0x595395(%rip),%rcx        # 9f5bc9 <_ZN18Interruptible_wait20m_interrupt_intervalE+0x3f829>
  460834: 41 0f 95 85 a7 2f 00  setne  0x2fa7(%r13)
  46083b: 00
  46083c: 48 8b 43 08           mov    0x8(%rbx),%rax
  460840: 48 8d 90 a0 0e 00 00  lea    0xea0(%rax),%rdx
  460847: 8b 80 a4 0e 00 00     mov    0xea4(%rax),%eax
  46084d: 83 e0 01              and    $0x1,%eax
  460850: 89 04 24              mov    %eax,(%rsp)
  460853: e8 d8 e3 ff ff        callq  45ec30 <_ZN15MDL_object_lockD1Ev+0x970>
  460858: 84 c0                 test   %al,%al
  46085a: 75 2a                 jne    460886 <_ZN22Repair_table_statement7executeEP3THD+0x116>
  46085c: 48 8b 53 08           mov    0x8(%rbx),%rdx
  460860: 80 ba 6c 11 00 00 00  cmpb   $0x0,0x116c(%rdx)
...

So it happened just before the call to MDL_object_lock::~MDL_object_lock +0x970 which looks really odd to me because the destructor of MDL_object_lock is pretty empty. I suspect since mysql_admin_table doesn't show up in the disassemble this might be the call (its a static method) and it appears quite a few other times.

What is clear is its after the call to check_table_access and before if (!res && !m_lex->no_write_to_binlog) in Repair_table_statement::execute which would make it before the any innodb code at all.

Looking at {{ 46083c: 48 8b 43 08 mov 0x8(%rbx),%rax}} we see earlier in the code 4607d0: 48 8b 05 79 27 b5 00 mov 0xb52779(%rip),%rax # fb2f50 <my_charset_utf8mb4_handler+0x5670>

Looking at the fb2f50

$ readelf -a usr/sbin/mysqld | grep fb2f50
000000fb2f50  000000000008 R_X86_64_RELATIVE                    ff6f80
 
$ readelf -a usr/sbin/mysqld | grep ff6f80
000000fb2f50  000000000008 R_X86_64_RELATIVE                    ff6f80
000000fb6198  03f100000001 R_X86_64_64       0000000000ff6f80 opt_log_slow_admin_sta + 0
000000fb61a0  03f100000001 R_X86_64_64       0000000000ff6f80 opt_log_slow_admin_sta + 0
  1009: 0000000000ff6f80     1 OBJECT  GLOBAL DEFAULT   29 opt_log_slow_admin_state

Which kind of puts the blame on:

thd->enable_slow_log= opt_log_slow_admin_statements;

We also see rbx is loaded from the stack which would point to the argument THD which kind of confirms the like of code.

So thd not right? seem odd if its survived a check_table_access call. Anyway that's my few minutes worth of thoughts.

Comment by Sergei Golubchik [ 2015-05-07 ]

danblack, I've just downloaded mariadb-server-core-5.5_5.5.43+maria-1~trusty_amd64.deb, but disassembly there look different. where did you get yours from?

Comment by Daniel Black [ 2015-05-07 ]

lucid my bad:

This what you have?

0x456930 + 0xcc = 0x4569FC is mov %eax,%edx. Odd. So taking the result of the mysql_admin_table before it puts it into res.

456930 <_ZN22Repair_table_statement7executeEP3THD>:
..
  4569d4: 48 8b 43 08           mov    0x8(%rbx),%rax
  4569d8: 48 89 4c 24 18        mov    %rcx,0x18(%rsp)
  4569dd: 48 8d 0d 0f 2d 59 00  lea    0x592d0f(%rip),%rcx        # 9e96f3 <_ZN18Interruptible_wait20m_interrupt_intervalE+0x3e89b>
  4569e4: 48 8d 90 a0 0e 00 00  lea    0xea0(%rax),%rdx
  4569eb: 8b 80 a4 0e 00 00     mov    0xea4(%rax),%eax
  4569f1: 83 e0 01              and    $0x1,%eax
  4569f4: 89 04 24              mov    %eax,(%rsp)
  4569f7: e8 d4 e1 ff ff        callq  454bd0 <_ZN15MDL_object_lockD0Ev+0xa50>
  4569fc: 89 c2                 mov    %eax,%edx
  4569fe: 48 8b 43 08           mov    0x8(%rbx),%rax
  456a02: 84 d2                 test   %dl,%dl
  456a04: 75 2a                 jne    456a30 <_ZN22Repair_table_statement7executeEP3THD+0x100>
  456a06: 80 b8 6c 11 00 00 00  cmpb   $0x0,0x116c(%rax)
  456a0d: 75 21                 jne    456a30 <_ZN22Repair_table_statement7executeEP3THD+0x100>
  456a0f: 41 8b 4d 68           mov    0x68(%r13),%ecx
  456a13: 49 8b 55 60           mov    0x60(%r13),%rdx
  456a17: 45 31 c0              xor    %r8d,%r8d
  456a1a: be 01 00 00 00        mov    $0x1,%esi
  456a1f: 4c 89 ef              mov    %r13,%rdi
  456a22: e8 69 31 fa ff        callq  3f9b90 <_Z13write_bin_logP3THDbPKcmb>

Comment by Sergei Golubchik [ 2015-05-07 ]

That's probably not "segfaults on startup" after all. There's a client connection that executes REPAIR. So that's probably mysql_upgrade (run automatically by Ubuntu?). And then it's MDEV-8115

Comment by Sergei Golubchik [ 2015-05-08 ]

I'll assume it's MDEV-8115

Generated at Thu Feb 08 07:24:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.