Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-18455

Crash On Table structure Change on very large table - With filesystem not having enough space, rendering the table (and server) unusable

    XMLWordPrintable

Details

    Description

      While trying to ALTER a very large table (23M records), The server , and all galera linked servers crash. The peer server can never recover and and I have to do a manual rsync to get them to restart.

      2019-02-02 22:52:09 0x7f4960194700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.12/storage/innobase/btr/btr0pcur.cc line 461
      InnoDB: Failing assertion: btr_page_get_prev(next_page, mtr) == btr_pcur_get_block(cursor)->page.id.page_no()
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mysqld startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
      InnoDB: about forcing recovery.
      190202 22:52:09 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.3.12-MariaDB-log
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=2
      max_threads=153
      thread_count=10
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467389 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7f491c000ac8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f4960193d30 thread_stack 0x49000
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55d09923dd1e]
      /usr/sbin/mysqld(handle_fatal_signal+0x357)[0x55d098cded97]
      sigaction.c:0(__restore_rt)[0x7f4975ac55d0]
      :0(__GI_raise)[0x7f4973d99207]
      :0(__GI_abort)[0x7f4973d9a8f8]
      ut/ut0rbt.cc:218(rbt_tree_add_child(ib_rbt_t const*, ib_rbt_bound_t*, ib_rbt_node_t*) [clone .isra.4] [clone .part.5])[0x55d098a29946]
      btr/btr0pcur.cc:459(btr_pcur_move_to_next_page(btr_pcur_t*, mtr_t*))[0x55d09902b152]
      include/btr0pcur.ic:340(btr_pcur_move_to_next(btr_pcur_t*, mtr_t*))[0x55d098a27c6b]
      row/row0sel.cc:5589(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long))[0x55d098f94201]
      handler/ha_innodb.cc:9537(ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int))[0x55d098ebced8]
      sql/handler.cc:2765(handler::ha_rnd_next(unsigned char*))[0x55d098ce3bc7]
      sql/records.cc:481(rr_sequential(READ_RECORD*))[0x55d098e0785c]
      sql/records.h:73(READ_RECORD::read_record())[0x55d098a1cdd0]
      sql/sql_table.cc:9881(mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool))[0x55d098b9aaac]
      sql/sql_alter.cc:497(Sql_cmd_alter_table::execute(THD*))[0x55d098be7db3]
      sql/sql_parse.cc:6285(mysql_execute_command(THD*))[0x55d098b09b05]
      sql/sql_parse.cc:8093(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55d098b1182b]
      sql/sql_parse.cc:7887(wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55d098b121c1]
      sql/sql_parse.cc:1915(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x55d098b13a43]
      sql/sql_parse.cc:1398(do_command(THD*))[0x55d098b1532e]
      sql/sql_connect.cc:1402(do_handle_one_connection(CONNECT*))[0x55d098be512a]
      sql/sql_connect.cc:1310(handle_one_connection)[0x55d098be524d]
      pthread_create.c:0(start_thread)[0x7f4975abddd5]
      /lib64/libc.so.6(clone+0x6d)[0x7f4973e60ead]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7f491c011730): ALTER TABLE contents ADD publisher_id               INT UNSIGNED, ADD cr_holder_id               INT UNSIGNED, ADD INDEX IDX_content_publishers(publisher_id), ADD INDEX IDX_content_copyright(cr_holder_id), ADD CONSTRAINT Refpublishers1691 FOREIGN KEY (publisher_id)     REFERENCES publishers(publisher_id), ADD CONSTRAINT Refcopyright_holders1701 FOREIGN KEY (cr_holder_id)     REFERENCES copyright_holders(cr_holder_id)
      Connection ID (thread ID): 11
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on
       
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      2019-02-02 22:52:22 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
      2019-02-02 22:52:22 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
      2019-02-02 22:52:22 0 [Note] WSREP: wsrep_load(): Galera 25.3.25(r3836) by Codership Oy <info@codership.com> loaded successfully.
      2019-02-02 22:52:22 0 [Note] WSREP: CRC-32C: using hardware acceleration.
      2019-02-02 22:52:22 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 0
      2019-02-02 22:52:22 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 6e8ceebd-277d-11e9-9b83-13112382c60b, offset: -1
      2019-02-02 22:52:22 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (        0/134217768 bytes) complete.
      2019-02-02 22:52:22 0 [Note] WSREP: Attempt to reuse the same seqno: 1. New ptr = 0x7f1087734ce8, seqno_g: 1, seqno_d: 0, size: 920, ctx: 0x56110d49a548, flags: 1. store: 1, cs: 00000000 00000000 00000000 00000000, previous ptr = 0x7f107fb99528, seqno_g: 1, seqno_d: 0, size: 936, ctx: 0x56110d49a548, flags: 1. store: 1, cs: 00000000 00000000 00000000 00000000
      2019-02-02 22:52:22 0 [Note] WSREP: Contents differ. Discarding both.
      2019-02-02 22:52:22 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217768/134217768 bytes) complete.
      2019-02-02 22:52:22 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 2-2
      2019-02-02 22:52:22 0 [Note] WSREP: Recovering GCache ring buffer: discarding seqnos 1-1
      2019-02-02 22:52:22 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...  0.0% (  0/936 bytes) complete.
      2019-02-02 22:52:22 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (936/936 bytes) complete.
      2019-02-02 22:52:22 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): found 0/1 locked buffers
      2019-02-02 22:52:22 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): used space: 936/134217728
      2019-02-02 22:52:22 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.50.216; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT
      2019-02-02 22:52:22 0 [Note] WSREP: GCache history reset: 6e8ceebd-277d-11e9-9b83-13112382c60b:2 -> 00000000-0000-0000-0000-000000000000:-1
      2019-02-02 22:52:22 0 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): full reset
      2019-02-02 22:52:22 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
      2019-02-02 22:52:22 0 [Note] WSREP: wsrep_sst_grab()
      2019-02-02 22:52:22 0 [Note] WSREP: Start replication
      2019-02-02 22:52:22 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
      2019-02-02 22:52:22 0 [Note] WSREP: protonet asio version 0
      2019-02-02 22:52:22 0 [Note] WSREP: Using CRC-32C for message checksums.
      2019-02-02 22:52:22 0 [Note] WSREP: backend: asio
      2019-02-02 22:52:22 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
      2019-02-02 22:52:22 0 [Note] WSREP: restore pc from disk successfully
      2019-02-02 22:52:22 0 [Note] WSREP: GMCast version 0
      2019-02-02 22:52:22 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
      2019-02-02 22:52:22 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
      2019-02-02 22:52:22 0 [Note] WSREP: EVS version 0
      2019-02-02 22:52:22 0 [Note] WSREP: gcomm: connecting to group '9jr DB Cluster', peer '192.168.50.30:,192.168.50.65:,192.168.50.213:,192.168.50.214:,192.168.50.225:,192.168.50.55:'
      2019-02-02 22:52:25 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.213:4567 timed out, no messages seen in PT3S
      2019-02-02 22:52:25 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.225:4567 timed out, no messages seen in PT3S
      2019-02-02 22:52:25 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.55:4567 timed out, no messages seen in PT3S
      2019-02-02 22:52:25 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.65:4567 timed out, no messages seen in PT3S
      2019-02-02 22:52:25 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
      2019-02-02 22:52:25 0 [Note] WSREP: view(view_id(NON_PRIM,f766f12e,4) memb {
              f766f12e,0
      } joined {
      } left {
      } partitioned {
      })
      2019-02-02 22:52:25 0 [Note] WSREP: gcomm: connected
      2019-02-02 22:52:25 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
      2019-02-02 22:52:25 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
      2019-02-02 22:52:25 0 [Note] WSREP: Opened channel '9jr DB Cluster'
      2019-02-02 22:52:25 0 [Note] WSREP: Waiting for SST to complete.
      2019-02-02 22:52:25 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2019-02-02 22:52:25 0 [Note] WSREP: Flow-control interval: [16, 16]
      2019-02-02 22:52:25 0 [Note] WSREP: Trying to continue unpaused monitor
      2019-02-02 22:52:25 0 [Note] WSREP: Received NON-PRIMARY.
      2019-02-02 22:52:25 2 [Note] WSREP: New cluster view: global state: :-1, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version -1
      2019-02-02 22:52:25 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-02-02 22:52:26 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50278S), skipping check
      2019-02-02 22:52:29 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.213:4567 timed out, no messages seen in PT3S
      2019-02-02 22:52:32 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection established to 119c0d79 tcp://192.168.50.30:4567
      2019-02-02 22:52:32 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
      2019-02-02 22:52:32 0 [Note] WSREP: declaring 119c0d79 at tcp://192.168.50.30:4567 stable
      2019-02-02 22:52:32 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
      2019-02-02 22:52:32 0 [Note] WSREP: view(view_id(NON_PRIM,119c0d79,5) memb {
              119c0d79,0
              f766f12e,0
      } joined {
      } left {
      } partitioned {
      })
      2019-02-02 22:52:32 0 [Note] WSREP: promote to primary component
      2019-02-02 22:52:32 0 [Note] WSREP: view(view_id(PRIM,119c0d79,5) memb {
              119c0d79,0
              f766f12e,0
      } joined {
      } left {
      } partitioned {
      })
      2019-02-02 22:52:32 0 [Note] WSREP: save pc into disk
      2019-02-02 22:52:32 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
      2019-02-02 22:52:32 0 [Note] WSREP: Flow-control interval: [23, 23]
      2019-02-02 22:52:32 0 [Note] WSREP: Trying to continue unpaused monitor
      2019-02-02 22:52:32 0 [Note] WSREP: Received NON-PRIMARY.
      2019-02-02 22:52:32 2 [Note] WSREP: New cluster view: global state: :-1, view# -1: non-Primary, number of nodes: 2, my index: 1, protocol version -1
      2019-02-02 22:52:32 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.213:4567
      2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.214:4567
      2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.225:4567
      2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr proto entry 0x7f1070001240
      2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.55:4567
      2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.65:4567
      2019-02-02 22:52:32 0 [Note] WSREP: clear restored view
      2019-02-02 22:52:32 0 [Warning] WSREP: non weight changing install in S_PRIM: pcmsg{ type=INSTALL, seq=0, flags= 2, node_map {  119c0d79,prim=0,un=0,last_seq=4294967295,last_prim=view_id(NON_PRIM,119c0d79,2),to_seq=-1,weight=1,segment=0
              f766f12e,prim=0,un=0,last_seq=4294967295,last_prim=view_id(NON_PRIM,119c0d79,2),to_seq=-1,weight=1,segment=0
      }}
      2019-02-02 22:52:32 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = yes, my_idx = 1, memb_num = 2
      2019-02-02 22:52:32 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
      2019-02-02 22:52:32 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 47e64989-2780-11e9-a986-ef5ad4566067
      2019-02-02 22:52:32 0 [Note] WSREP: STATE EXCHANGE: got state msg: 47e64989-2780-11e9-a986-ef5ad4566067 from 0 (kvm.amar.com)
      2019-02-02 22:52:32 0 [Note] WSREP: STATE EXCHANGE: got state msg: 47e64989-2780-11e9-a986-ef5ad4566067 from 1 (db.9jr.com)
      2019-02-02 22:52:32 0 [Warning] WSREP: Quorum: No node with complete state:
       
              Version      : 4
              Flags        : 0x5
              Protocols    : 0 / 9 / 3
              State        : NON-PRIMARY
              Desync count : 0
              Prim state   : NON-PRIMARY
              Prim UUID    : 00000000-0000-0000-0000-000000000000
              Prim  seqno  : -1
              First seqno  : -1
              Last  seqno  : -1
              Prim JOINED  : 0
              State UUID   : 47e64989-2780-11e9-a986-ef5ad4566067
              Group UUID   : 00000000-0000-0000-0000-000000000000
              Name         : 'kvm.amar.com'
              Incoming addr: '192.168.50.30:3306'
       
              Version      : 4
              Flags        : 0x4
              Protocols    : 0 / 9 / 3
              State        : NON-PRIMARY
              Desync count : 0
              Prim state   : NON-PRIMARY
              Prim UUID    : 00000000-0000-0000-0000-000000000000
              Prim  seqno  : -1
              First seqno  : -1
              Last  seqno  : -1
              Prim JOINED  : 0
              State UUID   : 47e64989-2780-11e9-a986-ef5ad4566067
              Group UUID   : 00000000-0000-0000-0000-000000000000
              Name         : 'db.9jr.com'
              Incoming addr: '192.168.50.216:3306'
       
      2019-02-02 22:52:32 0 [Warning] WSREP: No re-merged primary component found.
      2019-02-02 22:52:32 0 [Note] WSREP: Bootstrapped primary 00000000-0000-0000-0000-000000000000 found: 2.
      2019-02-02 22:52:32 0 [Note] WSREP: Quorum results:
              version    = 4,
              component  = PRIMARY,
              conf_id    = -1,
              members    = 2/2 (joined/total),
              act_id     = -1,
              last_appl. = -1,
              protocols  = 0/9/3 (gcs/repl/appl),
              group UUID = 00000000-0000-0000-0000-000000000000
      2019-02-02 22:52:32 0 [Note] WSREP: Flow-control interval: [23, 23]
      2019-02-02 22:52:32 0 [Note] WSREP: Trying to continue unpaused monitor
      2019-02-02 22:52:32 0 [Note] WSREP: Restored state OPEN -> JOINED (-1)
      2019-02-02 22:52:32 2 [Note] WSREP: New cluster view: global state: :-1, view# 0: Primary, number of nodes: 2, my index: 1, protocol version 3
      2019-02-02 22:52:32 0 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
      2019-02-02 22:52:32 0 [ERROR] Aborting
       
      Error in my_thread_global_end(): 1 threads didn't exit
      

      Attachments

        Activity

          People

            marko Marko Mäkelä
            laxamar Jacques Amar
            Votes:
            1 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.