[MDEV-16856] Crash Created: 2018-07-30  Updated: 2020-07-20  Resolved: 2018-09-30

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Virtual Columns
Affects Version/s: 10.3.8
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Antonio Fernandes Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None
Environment:

CentOS 7 x86-64


Issue Links:
Relates
relates to MDEV-17005 ASAN heap-use-after-free in innobase_... Closed
relates to MDEV-23033 All slaves crash once in ~24 hours an... Closed

 Description   

The server crashed and through out:

Jul 30 17:10:56 maridb.server mysqld[12566]: 180730 17:10:56 [ERROR] mysqld got signal 11 ;
Jul 30 17:10:56 maridb.server mysqld[12566]: This could be because you hit a bug. It is also possible that this binary
Jul 30 17:10:56 maridb.server mysqld[12566]: or one of the libraries it was linked against is corrupt, improperly built,
Jul 30 17:10:56 maridb.server mysqld[12566]: or misconfigured. This error can also be caused by malfunctioning hardware.
Jul 30 17:10:56 maridb.server mysqld[12566]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Jul 30 17:10:56 maridb.server mysqld[12566]: We will try our best to scrape up some info that will hopefully help
Jul 30 17:10:56 maridb.server mysqld[12566]: diagnose the problem, but since we have already crashed,
Jul 30 17:10:56 maridb.server mysqld[12566]: something is definitely wrong and this may fail.
Jul 30 17:10:56 maridb.server mysqld[12566]: Server version: 10.3.8-MariaDB
Jul 30 17:10:56 maridb.server mysqld[12566]: key_buffer_size=1073741824
Jul 30 17:10:56 maridb.server mysqld[12566]: read_buffer_size=4194304
Jul 30 17:10:56 maridb.server mysqld[12566]: max_used_connections=5
Jul 30 17:10:56 maridb.server mysqld[12566]: max_threads=83
Jul 30 17:10:56 maridb.server mysqld[12566]: thread_count=14
Jul 30 17:10:56 maridb.server mysqld[12566]: It is possible that mysqld could use up to
Jul 30 17:10:56 maridb.server mysqld[12566]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2070312 K  bytes of memory
Jul 30 17:10:56 maridb.server mysqld[12566]: Hope that's ok; if not, decrease some variables in the equation.
Jul 30 17:10:56 maridb.server mysqld[12566]: Thread pointer: 0x7fd998001298
Jul 30 17:10:56 maridb.server mysqld[12566]: Attempting backtrace. You can use the following information to find out
Jul 30 17:10:56 maridb.server mysqld[12566]: where mysqld died. If you see no messages after this, something went
Jul 30 17:10:56 maridb.server mysqld[12566]: terribly wrong...
Jul 30 17:10:56 maridb.server mysqld[12566]: stack_bottom = 0x7fdc64089580 thread_stack 0x49000
Jul 30 17:10:56 maridb.server mysqld[12566]: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x56550ecbfece]
Jul 30 17:10:56 maridb.server mysqld[12566]: /usr/sbin/mysqld(handle_fatal_signal+0x357)[0x56550e75c577]
Jul 30 17:10:56 maridb.server mysqld[12566]: sigaction.c:0(__restore_rt)[0x7fdc718946d0]
Jul 30 17:10:56 maridb.server mysqld[12566]: /usr/sbin/mysqld(+0x95e529)[0x56550e93d529]
Jul 30 17:10:56 maridb.server mysqld[12566]: /usr/sbin/mysqld(+0xa4b4f5)[0x56550ea2a4f5]
Jul 30 17:10:56 maridb.server mysqld[12566]: /usr/sbin/mysqld(+0xa4bfd7)[0x56550ea2afd7]
Jul 30 17:10:56 maridb.server mysqld[12566]: handler/ha_innodb.cc:20822(innobase_allocate_row_for_vcol(THD*, dict_index_t*, mem_block_info_t**, TABLE**, unsigned char**, VCOL_STORAGE**))[0x56550ea2c0bb]
Jul 30 17:10:56 maridb.server mysqld[12566]: row/row0upd.cc:2140(row_upd_store_v_row)[0x56550e9f9311]
Jul 30 17:10:56 maridb.server mysqld[12566]: row/row0upd.cc:2991(row_upd_del_mark_clust_rec)[0x56550e9e3fe3]
Jul 30 17:10:56 maridb.server mysqld[12566]: row/row0upd.cc:3296(row_upd)[0x56550ea2598b]
Jul 30 17:10:56 maridb.server mysqld[12566]: row/row0mysql.cc:2250(row_update_cascade_for_mysql(que_thr_t*, upd_node_t*, dict_table_t*))[0x56550ea2b4f8]
Jul 30 17:10:56 maridb.server mysqld[12566]: row/row0ins.cc:1444(row_ins_foreign_check_on_constraint)[0x56550ea2c0bb]
Jul 30 17:10:56 maridb.server mysqld[12566]: row/row0upd.cc:321(row_upd_check_references_constraints(upd_node_t*, btr_pcur_t*, dict_table_t*, dict_index_t*, unsigned long*, que_thr_t*, mtr_t*) [clone .part.44])[0x56550e9f8e0b]
Jul 30 17:10:56 maridb.server mysqld[12566]: row/row0upd.cc:234(row_upd_check_references_constraints)[0x56550e938114]
Jul 30 17:10:56 maridb.server mysqld[12566]: row/row0upd.cc:3296(row_upd)[0x56550e767c1f]
Jul 30 17:10:56 maridb.server mysqld[12566]: sql/handler.cc:6284(handler::ha_delete_row(unsigned char const*))[0x56550e864201]
Jul 30 17:10:56 maridb.server mysqld[12566]: sql/log_event.cc:14080(Delete_rows_log_event::do_exec_row(rpl_group_info*))[0x56550e857cec]
Jul 30 17:10:56 maridb.server mysqld[12566]: sql/log_event.cc:11519(Rows_log_event::do_apply_event(rpl_group_info*))[0x56550e4e7593]
Jul 30 17:10:57 maridb.server mysqld[12566]: sql/log_event.h:1482(Log_event::apply_event(rpl_group_info*))[0x56550e4f0c5c]
Jul 30 17:10:57 maridb.server mysqld[12566]: pthread_create.c:0(start_thread)[0x7fdc7188ce25]
Jul 30 17:10:57 maridb.server mysqld[12566]: /lib64/libc.so.6(clone+0x6d)[0x7fdc6fc30bad]
Jul 30 17:10:57 maridb.server mysqld[12566]: Trying to get some variables.
Jul 30 17:10:57 maridb.server mysqld[12566]: Some pointers may be invalid and cause the dump to abort.
Jul 30 17:10:57 maridb.server kernel: mysqld[12604]: segfault at 0 ip 000056550e93d529 sp 00007fdc64085dc0 error 4 in mysqld[56550dfdf000+12aa000]
Jul 30 17:10:57 maridb.server mysqld[12566]: Query (0x0):
Jul 30 17:10:57 maridb.server mysqld[12566]: Connection ID (thread ID): 11
Jul 30 17:10:57 maridb.server mysqld[12566]: Status: NOT_KILLED
Jul 30 17:10:57 maridb.server mysqld[12566]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on
Jul 30 17:10:57 maridb.server mysqld[12566]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Jul 30 17:10:57 maridb.server mysqld[12566]: information that should help you find out what is causing the crash.
Jul 30 17:10:57 maridb.server systemd[1]: mariadb.service: main process exited, code=killed, status=11/SEGV
Jul 30 17:10:57 maridb.server systemd[1]: Unit mariadb.service entered failed state.
Jul 30 17:10:57 maridb.server systemd[1]: mariadb.service failed.

What can it be?

The server is a slave from 10.2 with some users reading data.



 Comments   
Comment by Elena Stepanova [ 2018-08-01 ]

thiru, marko, do you think it could b a variation of MDEV-15855 / MDEV-16779 (complicated by foreign keys and RBR)?
If it is, do you expect it to be fixed along with the above-mentioned bugs, or do we need a test case which reproduces it precisely?

Comment by Thirunarayanan Balathandayuthapani [ 2018-08-01 ]

MDEV-15855 / MDEV-16779 are related to purge thread. But the given issue looks like delete table on table with FK (cascade relation).

Comment by Marko Mäkelä [ 2018-08-02 ]

Right, it looks like a DELETE that is being applied on a replication slave. I see two possibilities:

  1. An ON DELETE SET NULL causes a column to be updated to NULL in the parent table, and some indexed virtual column depends on this column.
  2. An ON DELETE CASCADE causes a row to be deleted in the parent table, and there exist indexed virtual columns in the parent table.

ampf, can you try to get the table schema for us? Also, if you can repeat this easily, more details could be obtained from a debugger. We can give instructions for that.

Comment by Antonio Fernandes [ 2018-08-02 ]

Hello,

I've since tested downgrading the slave to 10.2 but somewhere in the replication flow it crashed the slave again.

Jul 31 19:20:28 mariadb.server kernel: mysqld[13765]: segfault at 0 ip 000055fdfb979db9 sp 00007f2c20053070 error 4 in mysqld[55fdfb11e000+111d000]
Jul 31 19:20:40 mariadb.server kernel: mysqld[13818]: segfault at 0 ip 0000559ba9bf2db9 sp 00007fa1a81fd070 error 4 in mysqld[559ba9397000+111d000]
Jul 31 19:20:51 mariadb.server kernel: mysqld[13871]: segfault at 0 ip 000055e9ef91edb9 sp 00007f2c001c3070 error 4 in mysqld[55e9ef0c3000+111d000]
Jul 31 19:21:02 mariadb.server kernel: mysqld[13946]: segfault at 0 ip 00005608affc4db9 sp 00007fb120524070 error 4 in mysqld[5608af769000+111d000]
Jul 31 19:21:13 mariadb.server kernel: mysqld[14009]: segfault at 0 ip 000055a9a1ba4db9 sp 00007fb7801e5070 error 4 in mysqld[55a9a1349000+111d000]

I've then copied all the files from master (xtrabackup) but did not start the replication. The slave is currently idling in the replication with 10.2 version on it.

Since I don't know where the crash is in the replication flow (what event is processing), I don't have "a schema" besides the 20 or so database (and their tables) schemas.

What do you have me do? Should I upgrade to 10.3 and with the debugger?

Best regards,
Antonio

Comment by Elena Stepanova [ 2018-08-03 ]

The table in question (and the row event that caused the crash) should be fairly easy to locate.
After your slave crashed and restarted, it wrote in the error log which binary log position (and/or GTID position) it starts replication from. This would be the last event that it attempted to execute but failed. If you are using parallel replication, it can be slighly more complicated, but not by much.

Can you find the event in the log and provide structures of the tables that it affects, as well as tables which either refer to or referenced by them through foreign keys?
Please also attach your cnf file(s).

Thanks!

Comment by Elena Stepanova [ 2018-09-02 ]

ampf, any luck locating the event?

Comment by Elena Stepanova [ 2018-10-07 ]

Another possibility is that it's related to MDEV-17005

Comment by Antonio Fernandes [ 2018-10-15 ]

Hi,

I've since upgraded to 10.3 (and maintained InnoDB).

No problems since.

Don't know if you want to close it...

Best regards,
António

Generated at Thu Feb 08 08:32:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.