[MDEV-18455] Crash On Table structure Change on very large table - With filesystem not having enough space, rendering the table (and server) unusable - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Duplicate
Affects Version/s: 10.3.12
Fix Version/s: N/A
Component/s: Data Definition - Alter Table, Storage Engine - InnoDB
Labels:
None
Environment:
CentOS Linux 7

Description

While trying to ALTER a very large table (23M records), The server , and all galera linked servers crash. The peer server can never recover and and I have to do a manual rsync to get them to restart.

2019-02-02 22:52:09 0x7f4960194700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.12/storage/innobase/btr/btr0pcur.cc line 461

InnoDB: Failing assertion: btr_page_get_prev(next_page, mtr) == btr_pcur_get_block(cursor)->page.id.page_no()

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mysqld startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/

InnoDB: about forcing recovery.

190202 22:52:09 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.3.12-MariaDB-log

key_buffer_size=134217728

read_buffer_size=131072

max_used_connections=2

max_threads=153

thread_count=10

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467389 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f491c000ac8

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x7f4960193d30 thread_stack 0x49000

/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55d09923dd1e]

/usr/sbin/mysqld(handle_fatal_signal+0x357)[0x55d098cded97]

sigaction.c:0(__restore_rt)[0x7f4975ac55d0]

:0(__GI_raise)[0x7f4973d99207]

:0(__GI_abort)[0x7f4973d9a8f8]

ut/ut0rbt.cc:218(rbt_tree_add_child(ib_rbt_t const*, ib_rbt_bound_t*, ib_rbt_node_t*) [clone .isra.4] [clone .part.5])[0x55d098a29946]

btr/btr0pcur.cc:459(btr_pcur_move_to_next_page(btr_pcur_t*, mtr_t*))[0x55d09902b152]

include/btr0pcur.ic:340(btr_pcur_move_to_next(btr_pcur_t*, mtr_t*))[0x55d098a27c6b]

row/row0sel.cc:5589(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long))[0x55d098f94201]

handler/ha_innodb.cc:9537(ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int))[0x55d098ebced8]

sql/handler.cc:2765(handler::ha_rnd_next(unsigned char*))[0x55d098ce3bc7]

sql/records.cc:481(rr_sequential(READ_RECORD*))[0x55d098e0785c]

sql/records.h:73(READ_RECORD::read_record())[0x55d098a1cdd0]

sql/sql_table.cc:9881(mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool))[0x55d098b9aaac]

sql/sql_alter.cc:497(Sql_cmd_alter_table::execute(THD*))[0x55d098be7db3]

sql/sql_parse.cc:6285(mysql_execute_command(THD*))[0x55d098b09b05]

sql/sql_parse.cc:8093(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55d098b1182b]

sql/sql_parse.cc:7887(wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55d098b121c1]

sql/sql_parse.cc:1915(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x55d098b13a43]

sql/sql_parse.cc:1398(do_command(THD*))[0x55d098b1532e]

sql/sql_connect.cc:1402(do_handle_one_connection(CONNECT*))[0x55d098be512a]

sql/sql_connect.cc:1310(handle_one_connection)[0x55d098be524d]

pthread_create.c:0(start_thread)[0x7f4975abddd5]

/lib64/libc.so.6(clone+0x6d)[0x7f4973e60ead]

Trying to get some variables.

Some pointers may be invalid and cause the dump to abort.

Query (0x7f491c011730): ALTER TABLE contents ADD publisher_id               INT UNSIGNED, ADD cr_holder_id               INT UNSIGNED, ADD INDEX IDX_content_publishers(publisher_id), ADD INDEX IDX_content_copyright(cr_holder_id), ADD CONSTRAINT Refpublishers1691 FOREIGN KEY (publisher_id)     REFERENCES publishers(publisher_id), ADD CONSTRAINT Refcopyright_holders1701 FOREIGN KEY (cr_holder_id)     REFERENCES copyright_holders(cr_holder_id)

Connection ID (thread ID): 11

Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains

information that should help you find out what is causing the crash.

2019-02-02 22:52:22 0 [Note] WSREP: Read nil XID from storage engines, skipping position init

2019-02-02 22:52:22 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'

2019-02-02 22:52:22 0 [Note] WSREP: wsrep_load(): Galera 25.3.25(r3836) by Codership Oy <info@codership.com> loaded successfully.

2019-02-02 22:52:22 0 [Note] WSREP: CRC-32C: using hardware acceleration.

2019-02-02 22:52:22 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 0

2019-02-02 22:52:22 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 6e8ceebd-277d-11e9-9b83-13112382c60b, offset: -1

2019-02-02 22:52:22 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (        0/134217768 bytes) complete.

2019-02-02 22:52:22 0 [Note] WSREP: Attempt to reuse the same seqno: 1. New ptr = 0x7f1087734ce8, seqno_g: 1, seqno_d: 0, size: 920, ctx: 0x56110d49a548, flags: 1. store: 1, cs: 00000000 00000000 00000000 00000000, previous ptr = 0x7f107fb99528, seqno_g: 1, seqno_d: 0, size: 936, ctx: 0x56110d49a548, flags: 1. store: 1, cs: 00000000 00000000 00000000 00000000

2019-02-02 22:52:22 0 [Note] WSREP: Contents differ. Discarding both.

2019-02-02 22:52:22 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217768/134217768 bytes) complete.

2019-02-02 22:52:22 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 2-2

2019-02-02 22:52:22 0 [Note] WSREP: Recovering GCache ring buffer: discarding seqnos 1-1

2019-02-02 22:52:22 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...  0.0% (  0/936 bytes) complete.

2019-02-02 22:52:22 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (936/936 bytes) complete.

2019-02-02 22:52:22 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): found 0/1 locked buffers

2019-02-02 22:52:22 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): used space: 936/134217728

2019-02-02 22:52:22 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.50.216; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT

2019-02-02 22:52:22 0 [Note] WSREP: GCache history reset: 6e8ceebd-277d-11e9-9b83-13112382c60b:2 -> 00000000-0000-0000-0000-000000000000:-1

2019-02-02 22:52:22 0 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): full reset

2019-02-02 22:52:22 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1

2019-02-02 22:52:22 0 [Note] WSREP: wsrep_sst_grab()

2019-02-02 22:52:22 0 [Note] WSREP: Start replication

2019-02-02 22:52:22 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1

2019-02-02 22:52:22 0 [Note] WSREP: protonet asio version 0

2019-02-02 22:52:22 0 [Note] WSREP: Using CRC-32C for message checksums.

2019-02-02 22:52:22 0 [Note] WSREP: backend: asio

2019-02-02 22:52:22 0 [Note] WSREP: gcomm thread scheduling priority set to other:0

2019-02-02 22:52:22 0 [Note] WSREP: restore pc from disk successfully

2019-02-02 22:52:22 0 [Note] WSREP: GMCast version 0

2019-02-02 22:52:22 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567

2019-02-02 22:52:22 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') multicast: , ttl: 1

2019-02-02 22:52:22 0 [Note] WSREP: EVS version 0

2019-02-02 22:52:22 0 [Note] WSREP: gcomm: connecting to group '9jr DB Cluster', peer '192.168.50.30:,192.168.50.65:,192.168.50.213:,192.168.50.214:,192.168.50.225:,192.168.50.55:'

2019-02-02 22:52:25 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.213:4567 timed out, no messages seen in PT3S

2019-02-02 22:52:25 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.225:4567 timed out, no messages seen in PT3S

2019-02-02 22:52:25 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.55:4567 timed out, no messages seen in PT3S

2019-02-02 22:52:25 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.65:4567 timed out, no messages seen in PT3S

2019-02-02 22:52:25 0 [Warning] WSREP: no nodes coming from prim view, prim not possible

2019-02-02 22:52:25 0 [Note] WSREP: view(view_id(NON_PRIM,f766f12e,4) memb {

        f766f12e,0

} joined {

} left {

} partitioned {

})

2019-02-02 22:52:25 0 [Note] WSREP: gcomm: connected

2019-02-02 22:52:25 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636

2019-02-02 22:52:25 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)

2019-02-02 22:52:25 0 [Note] WSREP: Opened channel '9jr DB Cluster'

2019-02-02 22:52:25 0 [Note] WSREP: Waiting for SST to complete.

2019-02-02 22:52:25 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1

2019-02-02 22:52:25 0 [Note] WSREP: Flow-control interval: [16, 16]

2019-02-02 22:52:25 0 [Note] WSREP: Trying to continue unpaused monitor

2019-02-02 22:52:25 0 [Note] WSREP: Received NON-PRIMARY.

2019-02-02 22:52:25 2 [Note] WSREP: New cluster view: global state: :-1, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version -1

2019-02-02 22:52:25 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

2019-02-02 22:52:26 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50278S), skipping check

2019-02-02 22:52:29 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.213:4567 timed out, no messages seen in PT3S

2019-02-02 22:52:32 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection established to 119c0d79 tcp://192.168.50.30:4567

2019-02-02 22:52:32 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:

2019-02-02 22:52:32 0 [Note] WSREP: declaring 119c0d79 at tcp://192.168.50.30:4567 stable

2019-02-02 22:52:32 0 [Warning] WSREP: no nodes coming from prim view, prim not possible

2019-02-02 22:52:32 0 [Note] WSREP: view(view_id(NON_PRIM,119c0d79,5) memb {

        119c0d79,0

        f766f12e,0

} joined {

} left {

} partitioned {

})

2019-02-02 22:52:32 0 [Note] WSREP: promote to primary component

2019-02-02 22:52:32 0 [Note] WSREP: view(view_id(PRIM,119c0d79,5) memb {

        119c0d79,0

        f766f12e,0

} joined {

} left {

} partitioned {

})

2019-02-02 22:52:32 0 [Note] WSREP: save pc into disk

2019-02-02 22:52:32 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2

2019-02-02 22:52:32 0 [Note] WSREP: Flow-control interval: [23, 23]

2019-02-02 22:52:32 0 [Note] WSREP: Trying to continue unpaused monitor

2019-02-02 22:52:32 0 [Note] WSREP: Received NON-PRIMARY.

2019-02-02 22:52:32 2 [Note] WSREP: New cluster view: global state: :-1, view# -1: non-Primary, number of nodes: 2, my index: 1, protocol version -1

2019-02-02 22:52:32 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.213:4567

2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.214:4567

2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.225:4567

2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr proto entry 0x7f1070001240

2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.55:4567

2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.65:4567

2019-02-02 22:52:32 0 [Note] WSREP: clear restored view

2019-02-02 22:52:32 0 [Warning] WSREP: non weight changing install in S_PRIM: pcmsg{ type=INSTALL, seq=0, flags= 2, node_map {  119c0d79,prim=0,un=0,last_seq=4294967295,last_prim=view_id(NON_PRIM,119c0d79,2),to_seq=-1,weight=1,segment=0

        f766f12e,prim=0,un=0,last_seq=4294967295,last_prim=view_id(NON_PRIM,119c0d79,2),to_seq=-1,weight=1,segment=0

}}

2019-02-02 22:52:32 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = yes, my_idx = 1, memb_num = 2

2019-02-02 22:52:32 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.

2019-02-02 22:52:32 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 47e64989-2780-11e9-a986-ef5ad4566067

2019-02-02 22:52:32 0 [Note] WSREP: STATE EXCHANGE: got state msg: 47e64989-2780-11e9-a986-ef5ad4566067 from 0 (kvm.amar.com)

2019-02-02 22:52:32 0 [Note] WSREP: STATE EXCHANGE: got state msg: 47e64989-2780-11e9-a986-ef5ad4566067 from 1 (db.9jr.com)

2019-02-02 22:52:32 0 [Warning] WSREP: Quorum: No node with complete state:

        Version      : 4

        Flags        : 0x5

        Protocols    : 0 / 9 / 3

        State        : NON-PRIMARY

        Desync count : 0

        Prim state   : NON-PRIMARY

        Prim UUID    : 00000000-0000-0000-0000-000000000000

        Prim  seqno  : -1

        First seqno  : -1

        Last  seqno  : -1

        Prim JOINED  : 0

        State UUID   : 47e64989-2780-11e9-a986-ef5ad4566067

        Group UUID   : 00000000-0000-0000-0000-000000000000

        Name         : 'kvm.amar.com'

        Incoming addr: '192.168.50.30:3306'

        Version      : 4

        Flags        : 0x4

        Protocols    : 0 / 9 / 3

        State        : NON-PRIMARY

        Desync count : 0

        Prim state   : NON-PRIMARY

        Prim UUID    : 00000000-0000-0000-0000-000000000000

        Prim  seqno  : -1

        First seqno  : -1

        Last  seqno  : -1

        Prim JOINED  : 0

        State UUID   : 47e64989-2780-11e9-a986-ef5ad4566067

        Group UUID   : 00000000-0000-0000-0000-000000000000

        Name         : 'db.9jr.com'

        Incoming addr: '192.168.50.216:3306'

2019-02-02 22:52:32 0 [Warning] WSREP: No re-merged primary component found.

2019-02-02 22:52:32 0 [Note] WSREP: Bootstrapped primary 00000000-0000-0000-0000-000000000000 found: 2.

2019-02-02 22:52:32 0 [Note] WSREP: Quorum results:

        version    = 4,

        component  = PRIMARY,

        conf_id    = -1,

        members    = 2/2 (joined/total),

        act_id     = -1,

        last_appl. = -1,

        protocols  = 0/9/3 (gcs/repl/appl),

        group UUID = 00000000-0000-0000-0000-000000000000

2019-02-02 22:52:32 0 [Note] WSREP: Flow-control interval: [23, 23]

2019-02-02 22:52:32 0 [Note] WSREP: Trying to continue unpaused monitor

2019-02-02 22:52:32 0 [Note] WSREP: Restored state OPEN -> JOINED (-1)

2019-02-02 22:52:32 2 [Note] WSREP: New cluster view: global state: :-1, view# 0: Primary, number of nodes: 2, my index: 1, protocol version 3

2019-02-02 22:52:32 0 [ERROR] WSREP: SST failed: 1 (Operation not permitted)

2019-02-02 22:52:32 0 [ERROR] Aborting

Error in my_thread_global_end(): 1 threads didn't exit

Attachments

Issue Links

relates to

MDEV-13542 Crashing on a corrupted page is unhelpful

Closed

MDEV-34118 fsp_alloc_free_extent() fails to flag DB_OUT_OF_FILE_SPACE

Closed

Activity

Transition	Time In Source Status	Execution Times

Thirunarayanan Balathandayuthapani made transition - 2025-03-21 06:57

Open

Closed

2237d 23h 52m

People

Assignee:: Thirunarayanan Balathandayuthapani

Reporter:: Jacques Amar

Votes:: 1 Vote for this issue

Watchers:: 9 Start watching this issue

Dates

Created:: 2019-02-03 07:05

Updated:: 2025-03-21 13:17

Resolved:: 2025-03-21 06:57

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server