[MDEV-13747] MariaDB 10.1.21 server sudden crash - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
Affects Version/s: 10.1.21
Fix Version/s: 10.1.30
Component/s: Galera
Labels: None
Environment: CentOS 7.3.1611

Description

2017-09-06 14:50:21 140465624627968 [Note] WSREP: cluster conflict due to high priority abort for threads:

2017-09-06 14:50:21 140465624627968 [Note] WSREP: Winning thread:

   THD: 25160306, mode: total order, state: executing, conflict: no conflict, seqno: 1598878419

   SQL: ALTER TABLE `catalog_product_index_price` ENABLE KEYS

2017-09-06 14:50:21 140465624627968 [Note] WSREP: Victim thread:

   THD: 25160307, mode: local, state: executing, conflict: no conflict, seqno: -1

   SQL: SELECT GET_LOCK('magentodev19.index_process_18', '5')

2017-09-06 14:50:21 140465624627968 [Note] WSREP: MDL conflict db=magentodev19 table=catalog_product_index_price ticket=4 solved by abort

170906 14:50:21 [ERROR] mysqld got signal 11 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.1.21-MariaDB

key_buffer_size=33554432

read_buffer_size=131072

max_used_connections=469

max_threads=1026

thread_count=409

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2286396 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7faa805c5008

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x7faa76b20080 thread_stack 0x48400

(my_addr_resolve failure: fork)

/usr/sbin/mysqld(my_print_stacktrace+0x2e) [0x7fc8e12309ce]

/usr/sbin/mysqld(handle_fatal_signal+0x305) [0x7fc8e0d56355]

/lib64/libpthread.so.0(+0xf370) [0x7fc8e0372370]

/usr/sbin/mysqld(MDL_lock::Ticket_list::remove_ticket(MDL_ticket*)+0x11) [0x7fc8e0cac3c1]

/usr/sbin/mysqld(MDL_lock::remove_ticket(LF_PINS*, MDL_lock::Ticket_list MDL_lock::*, MDL_ticket*)+0x8c) [0x7fc8e0cacedc]

/usr/sbin/mysqld(MDL_context::release_lock(enum_mdl_duration, MDL_ticket*)+0x24) [0x7fc8e0cadf34]

/usr/sbin/mysqld(MDL_context::release_locks_stored_before(enum_mdl_duration, MDL_ticket*)+0x35) [0x7fc8e0cadfb5]

/usr/sbin/mysqld(mysql_execute_command(THD*)+0xaed) [0x7fc8e0bd10fd]

/usr/sbin/mysqld(mysql_parse(THD*, char*, unsigned int, Parser_state*)+0x332) [0x7fc8e0bd9a02]

/usr/sbin/mysqld(+0x439229) [0x7fc8e0bda229]

/usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x1f5a) [0x7fc8e0bdc83a]

/usr/sbin/mysqld(do_command(THD*)+0x14a) [0x7fc8e0bdd6aa]

/usr/sbin/mysqld(do_handle_one_connection(THD*)+0x18a) [0x7fc8e0ca485a]

/usr/sbin/mysqld(handle_one_connection+0x40) [0x7fc8e0ca4a00]

/usr/sbin/mysqld(+0x96d05d) [0x7fc8e110e05d]

/lib64/libpthread.so.0(+0x7dc5) [0x7fc8e036adc5]

/lib64/libc.so.6(clone+0x6d) [0x7fc8de78973d]

Trying to get some variables.

Some pointers may be invalid and cause the dump to abort.

Query (0x7faa0b3bc020): INSERT INTO MCHECKPOINT (CP_MCHAIN_ID,CP_SHOP_ID,CP_SYSTEM,CP_ACTION,CP_SUBACTION,CP_LASTCHECKED,CP_LASTSUCCESS,CP_LASTALERTED,CP_LASTSWEEPED,ETS,UTS) VALUES (53,'Chain','DT','Export','CustArtSystemLst','20170906145021',' ',' ',' ', '20170906145021648', '20170906145021648')

Connection ID (thread ID): 25160372

Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off

After this, the WSREP recovery process starts.
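
The conflict report at the top of this log has a regular shape, so the winning and victim threads can be extracted mechanically when many such reports need to be compared. A minimal Python sketch, assuming the log format shown in this report; the function name `parse_wsrep_conflict` and the layout of the returned dictionary are illustrative and not part of any MariaDB tool:

```python
import re

# Per-thread detail line printed under "Winning thread:" / "Victim thread:".
THD_RE = re.compile(
    r"THD:\s*(?P<thd>\d+),\s*mode:\s*(?P<mode>[^,]+),\s*state:\s*(?P<state>[^,]+),"
    r"\s*conflict:\s*(?P<conflict>[^,]+),\s*seqno:\s*(?P<seqno>-?\d+)"
)
ROLE_RE = re.compile(r"WSREP: (Winning|Victim) thread:")
SQL_RE = re.compile(r"^\s*SQL:\s*(?P<sql>.*)$")

# Excerpt copied from the log in this report.
SAMPLE = """\
2017-09-06 14:50:21 140465624627968 [Note] WSREP: Winning thread:
   THD: 25160306, mode: total order, state: executing, conflict: no conflict, seqno: 1598878419
   SQL: ALTER TABLE `catalog_product_index_price` ENABLE KEYS
2017-09-06 14:50:21 140465624627968 [Note] WSREP: Victim thread:
   THD: 25160307, mode: local, state: executing, conflict: no conflict, seqno: -1
   SQL: SELECT GET_LOCK('magentodev19.index_process_18', '5')
"""


def parse_wsrep_conflict(log_text):
    """Return {'winning': {...}, 'victim': {...}} for one conflict report."""
    result = {}
    role = None  # which thread block we are currently inside
    for line in log_text.splitlines():
        m = ROLE_RE.search(line)
        if m:
            role = m.group(1).lower()
            result[role] = {}
            continue
        if role is None:
            continue  # ignore everything before the first thread block
        m = THD_RE.search(line)
        if m:
            info = m.groupdict()
            info["thd"] = int(info["thd"])
            info["seqno"] = int(info["seqno"])
            result[role].update(info)
            continue
        m = SQL_RE.match(line)
        if m:
            result[role]["sql"] = m.group("sql").strip()
    return result


if __name__ == "__main__":
    for role, info in parse_wsrep_conflict(SAMPLE).items():
        print(role, info["thd"], info["mode"], info["sql"])
```

The sketch handles a single report per call; a log containing several conflicts would need to be split on the "cluster conflict" line first.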

Issue Links

relates to:
MDEV-9510 Segmentation fault in binlog thread causes crash (Closed)
MDEV-10501 MariaDB crash (Closed)

Activity

Andrii Nikitin (Inactive) added a comment - 2017-09-11 22:06

I can only say that the crash is related to binary logging; it is confirmed that if binary logging does not happen, the crash does not occur.
It is quite possible that commenting out only expire-logs-days (and perhaps somehow making sure that binary logs are not purged at 'unlucky' moments) will be enough.
I personally spent several days on MDEV-9510 (and related issues) and did not come up with a solution. The ticket is now assigned to another qualified engineer, so I hope he will find one. I will keep analyzing new data in this ticket to see whether new ideas come up.
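
To check whether a node is still configured with the options discussed here, the binary-log settings can be read out of its option file. A minimal Python sketch, assuming a simple `my.cnf`-style file; the helper name `binlog_options` is hypothetical, `!include` directives are skipped rather than followed, and inline comments are not handled. It relies on MariaDB accepting dashes and underscores interchangeably in option names.

```python
import configparser

# Options relevant to the suggestion above (normalised to underscores).
BINLOG_KEYS = {"log_bin", "expire_logs_days"}


def binlog_options(cnf_text):
    """Return binary-log related options found in any section of an option file."""
    # Drop '!include' / '!includedir' directives, which configparser cannot parse.
    lines = [ln for ln in cnf_text.splitlines() if not ln.lstrip().startswith("!")]
    parser = configparser.ConfigParser(
        allow_no_value=True,  # a bare "log-bin" line carries no value
        strict=False,         # tolerate repeated sections and options
        interpolation=None,   # keep '%' characters literal
    )
    parser.read_string("\n".join(lines))
    found = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            name = key.replace("-", "_")
            if name in BINLOG_KEYS:
                found[name] = value
    return found


if __name__ == "__main__":
    example = "[mysqld]\nlog-bin\n#expire-logs-days=10\n"
    # expire-logs-days is commented out here, so only log_bin is reported.
    print(binlog_options(example))
```

This only inspects the file contents; a value changed at runtime would not show up here.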

Brendan P added a comment - 2017-10-08 01:45

Some further information:

On our cluster we observed that the node we send most of the writes to eventually stops purging binlogs; expire-logs-days ceases to function.
No number of purge binlog commands will purge anything either. When the server is in this state, we find it is extremely easy to trigger the crash.

Running a reset master deletes all the binary logs as expected, but the server has crashed several times at random afterwards; this could be 10 or 30 minutes after the fact, or never at all. However, expire-logs-days does work again if the server hasn't crashed. We think that pt-online-schema-change could be a factor: it has crashed the server many times, with the same random delay, after being run during alters. It also seems to cause the binary log purging failure some time after a few concurrent successful alters.

Artur Čuvašov added a comment - 2017-10-08 11:21

Check this out if any of your apps use the "ENABLE/DISABLE KEYS" construct:

https://binary-data.github.io/2017/04/05/magento-mysql-crash-deadlock-when-index-under-highload/
https://github.com/OpenMage/magento-lts/pull/188

Br,
Arthur

Daniel Black added a comment - 2017-10-09 06:51

Also worth noting: GET_LOCK is a known limitation and is unsupported in Galera. See https://mariadb.com/kb/en/library/mariadb-galera-cluster-known-limitations/
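
Both constructs flagged in this thread, `GET_LOCK` and `ENABLE/DISABLE KEYS`, appear in the conflict report in the description, so it may help to search application SQL for them. A rough Python sketch using regular expressions; the function name `flag_statements` and the pattern table are illustrative and cover only what is mentioned in this issue, not the full list of Galera limitations:

```python
import re

# Constructs mentioned in this issue; labels are for reporting only.
PATTERNS = {
    "GET_LOCK": re.compile(r"\bGET_LOCK\s*\(", re.IGNORECASE),
    "ENABLE/DISABLE KEYS": re.compile(
        r"\bALTER\s+TABLE\b.*?\b(?:ENABLE|DISABLE)\s+KEYS\b",
        re.IGNORECASE | re.DOTALL,
    ),
}


def flag_statements(statements):
    """Yield (label, statement) for each statement matching a flagged construct."""
    for stmt in statements:
        for label, pattern in PATTERNS.items():
            if pattern.search(stmt):
                yield label, stmt


if __name__ == "__main__":
    # Statements copied from the conflict report in this issue, plus a harmless one.
    statements = [
        "SELECT GET_LOCK('magentodev19.index_process_18', '5')",
        "ALTER TABLE `catalog_product_index_price` ENABLE KEYS",
        "SELECT 1",
    ]
    for label, stmt in flag_statements(statements):
        print(f"{label}: {stmt}")
```

A textual match is only a hint: it will also flag the keywords inside string literals or comments, so hits should be reviewed by hand.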

Andrei Elkin added a comment - 2017-12-13 13:06

Fixed by MDEV-9510.
