[MDEV-4158] Crash on applying updates in MariaDB-Galera Created: 2013-02-09  Updated: 2013-03-08  Resolved: 2013-03-08

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Aleksey Sanin (Inactive)
Assignee: Elena Stepanova
Resolution: Duplicate
Votes: 0
Labels: None
Environment: CentOS 5.8

 Description   

130209 5:48:19 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 5.5.28a-MariaDB-log
key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=501
thread_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 194588 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x14c383b0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x426bd0c8 thread_stack 0x80000
??:0(my_print_stacktrace)[0xa9a3ae]
??:0(handle_fatal_signal)[0x6e383b]
:0()[0x3b8a40ebe0]
??:0(plugin_lock(THD*, st_plugin_int*))[0x5a020c]
??:0(ha_checktype(THD*, legacy_db_type, bool, bool))[0x6e925f]
??:0(open_table_def(THD*, TABLE_SHARE*, unsigned int))[0x62c420]
??:0(_Z15get_table_shareP3THDP10TABLE_LISTPcjjPij.clone.7)[0x546ffc]
??:0(open_table(THD*, TABLE_LIST*, st_mem_root*, Open_table_context*))[0x550036]
??:0(open_tables(THD*, TABLE_LIST*, unsigned int, unsigned int, Prelocking_strategy*))[0x5514c1]
??:0(open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*))[0x552294]
??:0(Rows_log_event::do_apply_event(Relay_log_info const*))[0x7ae815]
??:0(_ZL15wsrep_apply_rbrP3THDPKhm)[0x58fff6]
??:0(wsrep_apply_cb(void*, void const*, unsigned long, long))[0x5905f6]
:0()[0x2aaaab57f94a]
:0()[0x2aaaab588372]
:0()[0x2aaaab588f05]
:0()[0x2aaaab560f94]
:0()[0x2aaaab5617d8]
:0()[0x2aaaab57ee4d]
:0()[0x2aaaab599023]
??:0(wsrep_replication_process(THD*))[0x58fbb3]
??:0(start_wsrep_THD)[0x50af2c]
:0()[0x3b8a40677d]
:0()[0x3b89cd3c1d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): is an invalid pointer
Connection ID (thread ID): 2
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=on,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
130209 05:48:19 mysqld_safe mysqld from pid file /var/lib/mysql/devdb02.corp.wepay-inc.com.pid ended
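
To compare this backtrace with later occurrences of the crash, or with the trace in a related report, the frames can be extracted from the error log mechanically. The following is a minimal sketch and not part of the original report. The helpers `parse_frames` and `signature` and the regular expression are hypothetical; they assume the `??:0(symbol)[0xaddr]` frame layout seen in the log above and nothing else about mysqld's output.

```python
import re

# Assumed frame layout, taken from the log above:
#   ??:0(plugin_lock(THD*, st_plugin_int*))[0x5a020c]   resolved frame
#   :0()[0x3b8a40ebe0]                                  unresolved frame
FRAME_RE = re.compile(
    r"^(?:\?\?)?:0\((?P<symbol>.*)\)\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def parse_frames(log_text):
    """Return a list of (symbol, address) pairs, one per backtrace frame."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("symbol"), int(match.group("addr"), 16)))
    return frames


def signature(frames, depth=5):
    """Return the first `depth` resolved symbols as a crash signature.

    Signal-handling frames and unresolved frames (empty symbol) are skipped,
    because they are the same for every crash or carry no name to compare.
    """
    skip = {"my_print_stacktrace", "handle_fatal_signal"}
    names = [symbol for symbol, _ in frames if symbol and symbol not in skip]
    return names[:depth]


# The first five frames of the backtrace in this report.
sample = """\
??:0(my_print_stacktrace)[0xa9a3ae]
??:0(handle_fatal_signal)[0x6e383b]
:0()[0x3b8a40ebe0]
??:0(plugin_lock(THD*, st_plugin_int*))[0x5a020c]
??:0(ha_checktype(THD*, legacy_db_type, bool, bool))[0x6e925f]
"""

print(signature(parse_frames(sample)))
```

Two crashes whose signatures match at the top frames (here `plugin_lock` called from `ha_checktype`) are candidates for being the same bug. Addresses are kept in the parsed output but left out of the signature, since they differ between builds.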



 Comments   
Comment by Elena Stepanova [ 2013-02-09 ]

Hi Aleksey,

There isn't really much to go on for the investigation.
Is the problem persistent?
Was there anything in the error log prior to the signal?
Do you still have the log of the other node(s) this one was replicating from? Would it be possible to upload the datadir of the crashed node, along with the binlog it was replicating from at the time of the crash, and the cnf file from this node?

Thanks

Comment by Aleksey Sanin (Inactive) [ 2013-02-09 ]

I've seen the crash a couple of times in 3 days, with the same stack trace in plugin_lock() and the same NULL pointer access each time. There was nothing interesting in the logs on this or the other nodes. I am continuing testing, and if I see it again I will definitely collect all the data you've asked for.

Comment by Elena Stepanova [ 2013-02-19 ]

Moving the discussion of this bug here from a recent comment on MDEV-4179:

>> Lastly, I've actually remembered that we've seen a similar issue on the dev environment, though the stack trace was different:

>> https://mariadb.atlassian.net/browse/MDEV-4158

>> It was the same upgrade process, though it didn't crash 100% of the time. Maybe it is a timing issue somewhere?

If it was also happening on slave restart, I guess it might be the same issue. To make sure, I need confirmation that during wsrep-recovery not only is the slave started and the IO thread running, but the SQL thread can also start applying events. If that's the case, it could explain both the stack trace in this report and the database corruption in MDEV-4179, provided the event was an ALTER TABLE or something equally crash-unsafe.

A side note:
How did you get key_buffer_size=0? MDEV-4179 shows a real value. Is it a different build, a different package, or a different config?

Comment by Aleksey Sanin (Inactive) [ 2013-02-19 ]

MDEV-4158 was also happening on server restart with the slave enabled, and there is a good chance there was an ALTER TABLE involved.

For key_buffer_size, I think that MySQL sets it to the default 32K value if it is set to 0 in the config.

Comment by Elena Stepanova [ 2013-03-08 ]

I suppose for now we can assume it's the same issue as MDEV-4179. If it continues to happen, or if there is any new information, we can always re-open it.
