[MDEV-7276] Mariadb 10.0.14 Crash Signal 11 Created: 2014-12-06  Updated: 2015-01-13  Resolved: 2015-01-13

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.0.14
Fix Version/s: 10.0.15

Type: Bug Priority: Major
Reporter: Peter McLarty Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: None
Environment:

Redhat Linux 2.6.32-431.29.2.el6.x86_64
8 vcpu with 16GB RAM on Xen Host


Attachments: PNG File innodb_buffer_pool_free.png     File server.cnf    
Issue Links:
Relates
relates to MDEV-6808 MariaDB 10.0.13 crash with optimizer_... Closed

 Description   

Server version: 10.0.14-MariaDB-log
key_buffer_size=44040192
read_buffer_size=1048576
max_used_connections=1850
max_threads=503
thread_count=203
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1671067 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7f737b4df008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f7395c94c48 thread_stack 0x48000
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xb6b85b]
/usr/sbin/mysqld(handle_fatal_signal+0x398)[0x723f08]
/lib64/libpthread.so.0(+0xf710)[0x7f7669c97710]
/usr/sbin/mysqld[0x5f3588]
/usr/sbin/mysqld[0x601364]
/usr/sbin/mysqld[0x60198b]
/usr/sbin/mysqld(_Z11choose_planP4JOINy+0xf5)[0x601ef5]
/usr/sbin/mysqld[0x52a437]
/usr/sbin/mysqld(_ZN4JOIN14optimize_innerEv+0x6de)[0x61d3de]
/usr/sbin/mysqld(_ZN4JOIN8optimizeEv+0x28)[0x61f898]
/usr/sbin/mysqld(_ZN18st_select_lex_unit8optimizeEv+0x1e6)[0x65eac6]
/usr/sbin/mysqld(_ZN18st_select_lex_unit4execEv+0x82)[0x65ec32]
/usr/sbin/mysqld(_Z11mysql_unionP3THDP3LEXP13select_resultP18st_select_lex_unitm+0x2e)[0x66057e]
/usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x85)[0x623145]
/usr/sbin/mysqld[0x5cac06]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x4c97)[0x5d58d7]
/usr/sbin/mysqld[0x5d7402]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1b20)[0x5d95c0]
/usr/sbin/mysqld(_Z26threadpool_process_requestP3THD+0xa7)[0x6ca037]
/usr/sbin/mysqld[0x6f94fd]
/usr/sbin/mysqld[0xa6137d]
/lib64/libpthread.so.0(+0x79d1)[0x7f7669c8f9d1]
/lib64/libc.so.6(clone+0x6d)[0x7f76683ac86d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f73b07f4020): is an invalid pointer
Connection ID (thread ID): 132701
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=off,mrr_sort_keys=on,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
141206 00:04:58 mysqld_safe Number of processes running now: 0
141206 00:04:58 mysqld_safe mysqld restarted
141206  0:05:00 [Warning] option 'table_cache': unsigned value 2097152 adjusted to 524288
141206  0:05:00 [Warning] option 'table_definition_cache': unsigned value 2097152 adjusted to 524288
141206  0:05:00 [Note] InnoDB: Using mutexes to ref count buffer pool pages
141206  0:05:00 [Note] InnoDB: The InnoDB memory heap is disabled
141206  0:05:00 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
141206  0:05:00 [Note] InnoDB: Memory barrier is not used
141206  0:05:00 [Note] InnoDB: Compressed tables use zlib 1.2.3
141206  0:05:00 [Note] InnoDB: Using Linux native AIO
141206  0:05:00 [Note] InnoDB: Using CPU crc32 instructions
141206  0:05:00 [Note] InnoDB: Initializing buffer pool, size = 8.0G
141206  0:05:01 [Note] InnoDB: Completed initialization of buffer pool
141206  0:05:01 [Note] InnoDB: Highest supported file format is Barracuda.
141206  0:05:01 [Note] InnoDB: Log scan progressed past the checkpoint lsn 484491799379
141206  0:05:01 [Note] InnoDB: Database was not shutdown normally!
141206  0:05:01 [Note] InnoDB: Starting crash recovery.
141206  0:05:01 [Note] InnoDB: Reading tablespace information from the .ibd files...
141206  0:05:02 [Note] InnoDB: Restoring possible half-written data pages
141206  0:05:02 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Doing recovery: scanned up to log sequence number 484497041920
InnoDB: Doing recovery: scanned up to log sequence number 484502284800
InnoDB: Doing recovery: scanned up to log sequence number 484507527680
InnoDB: Doing recovery: scanned up to log sequence number 484512770560
InnoDB: Doing recovery: scanned up to log sequence number 484518013440
InnoDB: Doing recovery: scanned up to log sequence number 484523256320
InnoDB: Doing recovery: scanned up to log sequence number 484528499200
InnoDB: Doing recovery: scanned up to log sequence number 484533742080
InnoDB: Doing recovery: scanned up to log sequence number 484538984960
InnoDB: Doing recovery: scanned up to log sequence number 484540297680
InnoDB: Transaction 479796262 was in the XA prepared state.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 0 row operations to undo
InnoDB: Trx id counter is 479796736
141206  0:05:03 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: In a MySQL replication slave the last master binlog file
InnoDB: position 0 998653724, file name mysql-bin.000064
InnoDB: Last MySQL binlog file position 0 666279708, file name ./mysql-bin.000004
141206  0:05:05 [Note] InnoDB: 128 rollback segment(s) are active.
InnoDB: Starting in background the rollback of uncommitted transactions
2014-12-06 00:05:05 7fe33b7ff700  InnoDB: Rollback of non-prepared transactions completed
141206  0:05:05 [Note] InnoDB: Waiting for purge to start
141206  0:05:05 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.20-68.0 started; log sequence number 484540297680
141206  0:05:05 [Note] Plugin 'FEEDBACK' is disabled.
141206  0:05:06 [Note] CONNECT: Version 1.03.0003 Sep 25 2014 11:08:21
141206  0:05:06 [Note] Recovering after a crash using mysql-bin
141206  0:05:12 [Note] Starting crash recovery...
2014-12-06 00:05:12 7fe5bfa70820  InnoDB: Starting recovery for XA transactions...
2014-12-06 00:05:12 7fe5bfa70820  InnoDB: Transaction 479796262 in prepared state after recovery
2014-12-06 00:05:12 7fe5bfa70820  InnoDB: Transaction contains changes to 1 rows
2014-12-06 00:05:12 7fe5bfa70820  InnoDB: 1 transactions in prepared state after recovery
141206  0:05:12 [Note] Found 1 prepared transaction(s) in InnoDB
141206  0:05:12 [Note] Crash recovery finished.
141206  0:05:12 [ERROR] Failed to create a socket for IPv6 '::': errno: 97.
141206  0:05:12 [Note] Server socket created on IP: '0.0.0.0'.
141206  0:05:12 [ERROR] Failed to create a socket for IPv6 '::': errno: 97.
141206  0:05:12 [Note] Server socket created on IP: '0.0.0.0'.
141206  0:05:12 [Warning] 'user' entry 'itroot@rhel-mysql-test1' ignored in --skip-name-resolve mode.
141206  0:05:12 [ERROR] mysqld: Table './mysql/event' is marked as crashed and should be repaired
141206  0:05:12 [Warning] Checking table:   './mysql/event'
141206  0:05:12 [ERROR] mysql.event: 1 client is using or hasn't closed the table properly
141206  0:05:12 [Note] Event Scheduler: Loaded 1 event
141206  0:05:12 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.0.14-MariaDB-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
141206  0:05:34 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

Couldn't restart with optimizer_switch parameters kept failing on that
It then recycled in restarting the database until the optimizer_switch and optimizer_use_condition_selectivity parameters were removed from server.cnf



 Comments   
Comment by Peter McLarty [ 2014-12-06 ]

Interesting event - the buffer pool shows all pages free from around midnight before the crash

Comment by Peter McLarty [ 2014-12-08 ]

I have conducted some testing today to see if i can better isolate the cause of this problem
I have a smaller test server to conducting recovery tests of databases. It has a copy of very recent production data.

I was using Jmeter to run a series of real queries from our production database which don't always perform well and would likely benefit from histograms. I was only running 2 test threads in Jmeter

Setting the optimizer_use_condition_selectivity to either 4 or 5 will crash this database, below is the optimizer switch settings from the crash entry in that log,

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on

Comment by Elena Stepanova [ 2014-12-08 ]

Hi Peter,

Would you be able to provide the query and involved data structures (or better datadump from these tables, if possible)?
If it includes sensitive information, you can upload it to our ftp.askmonty.org/private, so that only MariaDB developers will have access to it.

Also, in the initial description you quoted the error log, but the quote ends 3 lines after 'signal 11'. Is it all yo uhad in the log? Wasn't there at least an attempt of printing a stack trace, followed by the connection id (and possibly the query) where the crash happened?

Comment by Peter McLarty [ 2014-12-09 ]

I have added the additional information to the description, missed it originally copying and pasting buffer contents didn't completely copy. Not sure what tables and query I am looking for? I will have to speak to my manager about what i can provide in the way of schema and data dump objects. Need to be aware of Australian privacy laws if the table contains sensitive data

Comment by Elena Stepanova [ 2014-12-09 ]

Hi Peter,

Thank you.

Even if you can't provide the data, please paste the table structures and of course the query. You can obfuscate the table and column names (call them col1, col2 etc. if you wish, just do it consistently through the tables and the query). I hope this way there will be no problem with the law.

As I understand from your previous query, you are able to reproduce the crash. In this case, locating the exact query and involved tables is not difficult.
Before the next time you reproduce it, please turn on general log (SET GLOBAL general_log=1 or put general_log to your cnf file). Optionally, also set general_log_file=<file location> to make sure it's written where you want.
After the server crashes, you'll see in your error log the stack trace similar to the one that you pasted. Please note the following lines after the stack trace:

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f73b07f4020): is an invalid pointer
Connection ID (thread ID): 132701

In the general log you created, you will see numerous records of the kind

     123344 Query SELECT ...
     133454 Query UPDATE ...

etc.

The first number is the connection ID, which corresponds the Connection ID from the error log above.
Find the LAST query printed in the general log which has the same ID as the one printed in the error log. In all likelihood it will be the guilty query.
Then just look at it to see which tables it uses, their structures along with the query is what we want to see.

Don't forget to disable general log again, as it can affect performance.

Thanks.

Comment by Peter McLarty [ 2015-01-13 ]

I have uploaded to ftp some information on this. I can now reproduce the crash. It seems the problem won't occur when there is no new data being added to the database as I test instance, restored copy would not break.
I started to test against our devs main instance and as that routinely has test cases that generate data, and I can get the failure there.

I have uploaded some gdb info, the schema of the tables which seem to have been involved and the query that was the last in the general.log at the time of the failure

Comment by Elena Stepanova [ 2015-01-13 ]

Thanks, I could now reproduce the crash on 10.0.14.
It was fixed in 10.0.15 by the following revision:

revno: 4420
revision-id: psergey@askmonty.org-20140930171103-29ti2smqbo9wydj1
parent: psergey@askmonty.org-20140926144840-cd94pd4bvdia8t19
committer: Sergey Petrunya <psergey@askmonty.org>
branch nick: 10.0-cp
timestamp: Tue 2014-09-30 21:11:03 +0400
message:
  MDEV-6808: MariaDB 10.0.13 crash with optimizer_use_condition_selectivity > 1
  
  Fix a trivial typo: in table_multi_eq_cond_selectivity(), reset the loop variable
  between loops.

Generated at Thu Feb 08 07:18:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.