Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Duplicate
    • 10.1.21
    • 10.1.30
    • Galera
    • None
    • CentOS 7.3.1611

    Description

      2017-09-06 14:50:21 140465624627968 [Note] WSREP: cluster conflict due to high priority abort for threads:
      2017-09-06 14:50:21 140465624627968 [Note] WSREP: Winning thread: 
         THD: 25160306, mode: total order, state: executing, conflict: no conflict, seqno: 1598878419
         SQL: ALTER TABLE `catalog_product_index_price` ENABLE KEYS
      2017-09-06 14:50:21 140465624627968 [Note] WSREP: Victim thread: 
         THD: 25160307, mode: local, state: executing, conflict: no conflict, seqno: -1
         SQL: SELECT GET_LOCK('magentodev19.index_process_18', '5')
      2017-09-06 14:50:21 140465624627968 [Note] WSREP: MDL conflict db=magentodev19 table=catalog_product_index_price ticket=4 solved by abort
      170906 14:50:21 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.1.21-MariaDB
      key_buffer_size=33554432
      read_buffer_size=131072
      max_used_connections=469
      max_threads=1026
      thread_count=409
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2286396 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7faa805c5008
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7faa76b20080 thread_stack 0x48400
      (my_addr_resolve failure: fork)
      /usr/sbin/mysqld(my_print_stacktrace+0x2e) [0x7fc8e12309ce]
      /usr/sbin/mysqld(handle_fatal_signal+0x305) [0x7fc8e0d56355]
      /lib64/libpthread.so.0(+0xf370) [0x7fc8e0372370]
      /usr/sbin/mysqld(MDL_lock::Ticket_list::remove_ticket(MDL_ticket*)+0x11) [0x7fc8e0cac3c1]
      /usr/sbin/mysqld(MDL_lock::remove_ticket(LF_PINS*, MDL_lock::Ticket_list MDL_lock::*, MDL_ticket*)+0x8c) [0x7fc8e0cacedc]
      /usr/sbin/mysqld(MDL_context::release_lock(enum_mdl_duration, MDL_ticket*)+0x24) [0x7fc8e0cadf34]
      /usr/sbin/mysqld(MDL_context::release_locks_stored_before(enum_mdl_duration, MDL_ticket*)+0x35) [0x7fc8e0cadfb5]
      /usr/sbin/mysqld(mysql_execute_command(THD*)+0xaed) [0x7fc8e0bd10fd]
      /usr/sbin/mysqld(mysql_parse(THD*, char*, unsigned int, Parser_state*)+0x332) [0x7fc8e0bd9a02]
      /usr/sbin/mysqld(+0x439229) [0x7fc8e0bda229]
      /usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x1f5a) [0x7fc8e0bdc83a]
      /usr/sbin/mysqld(do_command(THD*)+0x14a) [0x7fc8e0bdd6aa]
      /usr/sbin/mysqld(do_handle_one_connection(THD*)+0x18a) [0x7fc8e0ca485a]
      /usr/sbin/mysqld(handle_one_connection+0x40) [0x7fc8e0ca4a00]
      /usr/sbin/mysqld(+0x96d05d) [0x7fc8e110e05d]
      /lib64/libpthread.so.0(+0x7dc5) [0x7fc8e036adc5]
      /lib64/libc.so.6(clone+0x6d) [0x7fc8de78973d]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7faa0b3bc020): INSERT INTO MCHECKPOINT (CP_MCHAIN_ID,CP_SHOP_ID,CP_SYSTEM,CP_ACTION,CP_SUBACTION,CP_LASTCHECKED,CP_LASTSUCCESS,CP_LASTALERTED,CP_LASTSWEEPED,ETS,UTS) VALUES (53,'Chain','DT','Export','CustArtSystemLst','20170906145021',' ',' ',' ', '20170906145021648', '20170906145021648')
      Connection ID (thread ID): 25160372
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
      
      

      After this, the WSREP recovery process starts.
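
To make the conflict in the log excerpt above easier to read, here is a minimal Python sketch that pulls the winning and victim threads out of a `WSREP: cluster conflict` block. The function name `parse_conflict` and the regular expressions are illustrative assumptions based only on the lines shown in this ticket; other server versions may format these messages differently.

```python
import re

# Sample copied from the log excerpt in this ticket.
SAMPLE_LOG = """\
2017-09-06 14:50:21 140465624627968 [Note] WSREP: cluster conflict due to high priority abort for threads:
2017-09-06 14:50:21 140465624627968 [Note] WSREP: Winning thread: 
   THD: 25160306, mode: total order, state: executing, conflict: no conflict, seqno: 1598878419
   SQL: ALTER TABLE `catalog_product_index_price` ENABLE KEYS
2017-09-06 14:50:21 140465624627968 [Note] WSREP: Victim thread: 
   THD: 25160307, mode: local, state: executing, conflict: no conflict, seqno: -1
   SQL: SELECT GET_LOCK('magentodev19.index_process_18', '5')
2017-09-06 14:50:21 140465624627968 [Note] WSREP: MDL conflict db=magentodev19 table=catalog_product_index_price ticket=4 solved by abort
"""

# Patterns are assumptions derived from the excerpt above, not from server source.
ROLE_RE = re.compile(r"WSREP: (Winning|Victim) thread:")
THD_RE = re.compile(r"THD: (\d+), mode: ([^,]+), state: ([^,]+)")
SQL_RE = re.compile(r"SQL: (.*)")
MDL_RE = re.compile(r"MDL conflict db=(\S+) table=(\S+)")


def parse_conflict(log_text):
    """Return a dict describing the winning/victim threads and the MDL target."""
    result = {}
    role = None  # "winning" or "victim" while inside that thread's detail lines
    for line in log_text.splitlines():
        m = ROLE_RE.search(line)
        if m:
            role = m.group(1).lower()
            result[role] = {}
            continue
        m = MDL_RE.search(line)
        if m:
            result["db"], result["table"] = m.group(1), m.group(2)
            role = None
            continue
        if role is None:
            continue
        m = THD_RE.search(line)
        if m:
            result[role].update(thd=int(m.group(1)), mode=m.group(2), state=m.group(3))
            continue
        m = SQL_RE.search(line)
        if m:
            result[role]["sql"] = m.group(1).strip()
    return result


conflict = parse_conflict(SAMPLE_LOG)
```

In this excerpt the winner is the replicated (total order) `ALTER TABLE ... ENABLE KEYS` and the victim is a local `GET_LOCK` call, both tied to the table `catalog_product_index_price`.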

          Activity

            anikitin Andrii Nikitin (Inactive) added a comment -

            I can only say that the crash is related to binary logging; it is confirmed that if binary logging does not happen, the crash does not occur.
            It is quite possible that commenting out only expire-logs-days (and somehow making sure that the purge of binary logs does not happen at an 'unlucky' moment) will be enough.
            I personally spent several days on MDEV-9510 (and related issues) and did not come up with a solution. The ticket is now assigned to another qualified engineer, so I hope he will find one. I will still analyze new data in this ticket and see whether new ideas come up.

            spikestabber Brendan P added a comment -

            Some further information:

            On our cluster we observed that the node we do most of the writes to eventually stops purging binlogs; expire-logs-days ceases to function.
            No number of purge binlog commands will purge anything either. When the server is in this state, we find it extremely easy to trigger the crash.

            Running a reset master deletes all the binary logs as expected, but the server has crashed several times at random afterwards: this could be 10 or 30 minutes after the fact, or never at all. However, expire-logs-days does work again if the server hasn't crashed. We think that pt-online-schema-change could be a factor: it has crashed the server many times, with the same random delay, after being run during alters. It also seems to cause the binary log purging failure some time after a few concurrent successful alters.

            mtxd Artur Čuvašov added a comment -

            Check this out if any of your apps use "ENABLE/DISABLE KEYS" construction:
            https://binary-data.github.io/2017/04/05/magento-mysql-crash-deadlock-when-index-under-highload/
            https://github.com/OpenMage/magento-lts/pull/188

            Br, Arthur

            danblack Daniel Black added a comment -

            Also worth noting: GET_LOCK is a known limitation and is unsupported in Galera. See https://mariadb.com/kb/en/library/mariadb-galera-cluster-known-limitations/

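
The two comments above point at statement types worth auditing in an application that runs against a Galera cluster: `ENABLE/DISABLE KEYS` and `GET_LOCK`. As a rough sketch (not an official tool), the following Python snippet flags such statements in a list of SQL strings. The names `PATTERNS` and `flag_statement` are hypothetical, and the regular expressions are a simple heuristic that does not parse SQL properly; `RELEASE_LOCK` is included on the assumption that it falls under the same explicit-locking limitation as `GET_LOCK`.

```python
import re

# Heuristic patterns; labels and regexes are illustrative assumptions.
PATTERNS = {
    "explicit_lock_function": re.compile(
        r"\b(?:GET_LOCK|RELEASE_LOCK)\s*\(", re.IGNORECASE
    ),
    "enable_disable_keys": re.compile(
        r"\bALTER\s+TABLE\b.*\b(?:ENABLE|DISABLE)\s+KEYS\b",
        re.IGNORECASE | re.DOTALL,
    ),
}


def flag_statement(sql):
    """Return the labels of all patterns that match the given SQL text."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(sql)]


# Statements taken from the log in this ticket (the INSERT is shortened).
statements = [
    "ALTER TABLE `catalog_product_index_price` ENABLE KEYS",
    "SELECT GET_LOCK('magentodev19.index_process_18', '5')",
    "INSERT INTO MCHECKPOINT (CP_MCHAIN_ID) VALUES (53)",
]
flags = {sql: flag_statement(sql) for sql in statements}
```

A non-empty list for a statement is only a hint to review it by hand; an empty list does not prove the statement is safe on Galera.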
            Elkin Andrei Elkin added a comment -

            Fixed by MDEV-9510.


            People

              anikitin Andrii Nikitin (Inactive)
              mtxd Artur Čuvašov
              Votes: 0
              Watchers: 7
