[MDEV-18455] Crash On Table structure Change on very large table - With filesystem not having enough space, rendering the table (and server) unusable Created: 2019-02-03  Updated: 2023-04-27

Status: Open
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Storage Engine - InnoDB
Affects Version/s: 10.3.12
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Jacques Amar Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 1
Labels: None
Environment:

CentOS Linux 7



 Description   

While trying to ALTER a very large table (23M records), The server , and all galera linked servers crash. The peer server can never recover and and I have to do a manual rsync to get them to restart.

2019-02-02 22:52:09 0x7f4960194700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.12/storage/innobase/btr/btr0pcur.cc line 461
InnoDB: Failing assertion: btr_page_get_prev(next_page, mtr) == btr_pcur_get_block(cursor)->page.id.page_no()
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
190202 22:52:09 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.3.12-MariaDB-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=2
max_threads=153
thread_count=10
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467389 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f491c000ac8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f4960193d30 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55d09923dd1e]
/usr/sbin/mysqld(handle_fatal_signal+0x357)[0x55d098cded97]
sigaction.c:0(__restore_rt)[0x7f4975ac55d0]
:0(__GI_raise)[0x7f4973d99207]
:0(__GI_abort)[0x7f4973d9a8f8]
ut/ut0rbt.cc:218(rbt_tree_add_child(ib_rbt_t const*, ib_rbt_bound_t*, ib_rbt_node_t*) [clone .isra.4] [clone .part.5])[0x55d098a29946]
btr/btr0pcur.cc:459(btr_pcur_move_to_next_page(btr_pcur_t*, mtr_t*))[0x55d09902b152]
include/btr0pcur.ic:340(btr_pcur_move_to_next(btr_pcur_t*, mtr_t*))[0x55d098a27c6b]
row/row0sel.cc:5589(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long))[0x55d098f94201]
handler/ha_innodb.cc:9537(ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int))[0x55d098ebced8]
sql/handler.cc:2765(handler::ha_rnd_next(unsigned char*))[0x55d098ce3bc7]
sql/records.cc:481(rr_sequential(READ_RECORD*))[0x55d098e0785c]
sql/records.h:73(READ_RECORD::read_record())[0x55d098a1cdd0]
sql/sql_table.cc:9881(mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool))[0x55d098b9aaac]
sql/sql_alter.cc:497(Sql_cmd_alter_table::execute(THD*))[0x55d098be7db3]
sql/sql_parse.cc:6285(mysql_execute_command(THD*))[0x55d098b09b05]
sql/sql_parse.cc:8093(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55d098b1182b]
sql/sql_parse.cc:7887(wsrep_mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55d098b121c1]
sql/sql_parse.cc:1915(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x55d098b13a43]
sql/sql_parse.cc:1398(do_command(THD*))[0x55d098b1532e]
sql/sql_connect.cc:1402(do_handle_one_connection(CONNECT*))[0x55d098be512a]
sql/sql_connect.cc:1310(handle_one_connection)[0x55d098be524d]
pthread_create.c:0(start_thread)[0x7f4975abddd5]
/lib64/libc.so.6(clone+0x6d)[0x7f4973e60ead]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f491c011730): ALTER TABLE contents ADD publisher_id               INT UNSIGNED, ADD cr_holder_id               INT UNSIGNED, ADD INDEX IDX_content_publishers(publisher_id), ADD INDEX IDX_content_copyright(cr_holder_id), ADD CONSTRAINT Refpublishers1691 FOREIGN KEY (publisher_id)     REFERENCES publishers(publisher_id), ADD CONSTRAINT Refcopyright_holders1701 FOREIGN KEY (cr_holder_id)     REFERENCES copyright_holders(cr_holder_id)
Connection ID (thread ID): 11
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
2019-02-02 22:52:22 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
2019-02-02 22:52:22 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2019-02-02 22:52:22 0 [Note] WSREP: wsrep_load(): Galera 25.3.25(r3836) by Codership Oy <info@codership.com> loaded successfully.
2019-02-02 22:52:22 0 [Note] WSREP: CRC-32C: using hardware acceleration.
2019-02-02 22:52:22 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 0
2019-02-02 22:52:22 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 6e8ceebd-277d-11e9-9b83-13112382c60b, offset: -1
2019-02-02 22:52:22 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (        0/134217768 bytes) complete.
2019-02-02 22:52:22 0 [Note] WSREP: Attempt to reuse the same seqno: 1. New ptr = 0x7f1087734ce8, seqno_g: 1, seqno_d: 0, size: 920, ctx: 0x56110d49a548, flags: 1. store: 1, cs: 00000000 00000000 00000000 00000000, previous ptr = 0x7f107fb99528, seqno_g: 1, seqno_d: 0, size: 936, ctx: 0x56110d49a548, flags: 1. store: 1, cs: 00000000 00000000 00000000 00000000
2019-02-02 22:52:22 0 [Note] WSREP: Contents differ. Discarding both.
2019-02-02 22:52:22 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217768/134217768 bytes) complete.
2019-02-02 22:52:22 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 2-2
2019-02-02 22:52:22 0 [Note] WSREP: Recovering GCache ring buffer: discarding seqnos 1-1
2019-02-02 22:52:22 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...  0.0% (  0/936 bytes) complete.
2019-02-02 22:52:22 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (936/936 bytes) complete.
2019-02-02 22:52:22 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): found 0/1 locked buffers
2019-02-02 22:52:22 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): used space: 936/134217728
2019-02-02 22:52:22 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.50.216; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT
2019-02-02 22:52:22 0 [Note] WSREP: GCache history reset: 6e8ceebd-277d-11e9-9b83-13112382c60b:2 -> 00000000-0000-0000-0000-000000000000:-1
2019-02-02 22:52:22 0 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): full reset
2019-02-02 22:52:22 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2019-02-02 22:52:22 0 [Note] WSREP: wsrep_sst_grab()
2019-02-02 22:52:22 0 [Note] WSREP: Start replication
2019-02-02 22:52:22 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2019-02-02 22:52:22 0 [Note] WSREP: protonet asio version 0
2019-02-02 22:52:22 0 [Note] WSREP: Using CRC-32C for message checksums.
2019-02-02 22:52:22 0 [Note] WSREP: backend: asio
2019-02-02 22:52:22 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2019-02-02 22:52:22 0 [Note] WSREP: restore pc from disk successfully
2019-02-02 22:52:22 0 [Note] WSREP: GMCast version 0
2019-02-02 22:52:22 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2019-02-02 22:52:22 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2019-02-02 22:52:22 0 [Note] WSREP: EVS version 0
2019-02-02 22:52:22 0 [Note] WSREP: gcomm: connecting to group '9jr DB Cluster', peer '192.168.50.30:,192.168.50.65:,192.168.50.213:,192.168.50.214:,192.168.50.225:,192.168.50.55:'
2019-02-02 22:52:25 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.213:4567 timed out, no messages seen in PT3S
2019-02-02 22:52:25 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.225:4567 timed out, no messages seen in PT3S
2019-02-02 22:52:25 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.55:4567 timed out, no messages seen in PT3S
2019-02-02 22:52:25 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.65:4567 timed out, no messages seen in PT3S
2019-02-02 22:52:25 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2019-02-02 22:52:25 0 [Note] WSREP: view(view_id(NON_PRIM,f766f12e,4) memb {
        f766f12e,0
} joined {
} left {
} partitioned {
})
2019-02-02 22:52:25 0 [Note] WSREP: gcomm: connected
2019-02-02 22:52:25 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2019-02-02 22:52:25 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2019-02-02 22:52:25 0 [Note] WSREP: Opened channel '9jr DB Cluster'
2019-02-02 22:52:25 0 [Note] WSREP: Waiting for SST to complete.
2019-02-02 22:52:25 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2019-02-02 22:52:25 0 [Note] WSREP: Flow-control interval: [16, 16]
2019-02-02 22:52:25 0 [Note] WSREP: Trying to continue unpaused monitor
2019-02-02 22:52:25 0 [Note] WSREP: Received NON-PRIMARY.
2019-02-02 22:52:25 2 [Note] WSREP: New cluster view: global state: :-1, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version -1
2019-02-02 22:52:25 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2019-02-02 22:52:26 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50278S), skipping check
2019-02-02 22:52:29 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.50.213:4567 timed out, no messages seen in PT3S
2019-02-02 22:52:32 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') connection established to 119c0d79 tcp://192.168.50.30:4567
2019-02-02 22:52:32 0 [Note] WSREP: (f766f12e, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2019-02-02 22:52:32 0 [Note] WSREP: declaring 119c0d79 at tcp://192.168.50.30:4567 stable
2019-02-02 22:52:32 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2019-02-02 22:52:32 0 [Note] WSREP: view(view_id(NON_PRIM,119c0d79,5) memb {
        119c0d79,0
        f766f12e,0
} joined {
} left {
} partitioned {
})
2019-02-02 22:52:32 0 [Note] WSREP: promote to primary component
2019-02-02 22:52:32 0 [Note] WSREP: view(view_id(PRIM,119c0d79,5) memb {
        119c0d79,0
        f766f12e,0
} joined {
} left {
} partitioned {
})
2019-02-02 22:52:32 0 [Note] WSREP: save pc into disk
2019-02-02 22:52:32 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 1, memb_num = 2
2019-02-02 22:52:32 0 [Note] WSREP: Flow-control interval: [23, 23]
2019-02-02 22:52:32 0 [Note] WSREP: Trying to continue unpaused monitor
2019-02-02 22:52:32 0 [Note] WSREP: Received NON-PRIMARY.
2019-02-02 22:52:32 2 [Note] WSREP: New cluster view: global state: :-1, view# -1: non-Primary, number of nodes: 2, my index: 1, protocol version -1
2019-02-02 22:52:32 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.213:4567
2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.214:4567
2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.225:4567
2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr proto entry 0x7f1070001240
2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.55:4567
2019-02-02 22:52:32 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.50.65:4567
2019-02-02 22:52:32 0 [Note] WSREP: clear restored view
2019-02-02 22:52:32 0 [Warning] WSREP: non weight changing install in S_PRIM: pcmsg{ type=INSTALL, seq=0, flags= 2, node_map {  119c0d79,prim=0,un=0,last_seq=4294967295,last_prim=view_id(NON_PRIM,119c0d79,2),to_seq=-1,weight=1,segment=0
        f766f12e,prim=0,un=0,last_seq=4294967295,last_prim=view_id(NON_PRIM,119c0d79,2),to_seq=-1,weight=1,segment=0
}}
2019-02-02 22:52:32 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = yes, my_idx = 1, memb_num = 2
2019-02-02 22:52:32 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2019-02-02 22:52:32 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 47e64989-2780-11e9-a986-ef5ad4566067
2019-02-02 22:52:32 0 [Note] WSREP: STATE EXCHANGE: got state msg: 47e64989-2780-11e9-a986-ef5ad4566067 from 0 (kvm.amar.com)
2019-02-02 22:52:32 0 [Note] WSREP: STATE EXCHANGE: got state msg: 47e64989-2780-11e9-a986-ef5ad4566067 from 1 (db.9jr.com)
2019-02-02 22:52:32 0 [Warning] WSREP: Quorum: No node with complete state:
 
        Version      : 4
        Flags        : 0x5
        Protocols    : 0 / 9 / 3
        State        : NON-PRIMARY
        Desync count : 0
        Prim state   : NON-PRIMARY
        Prim UUID    : 00000000-0000-0000-0000-000000000000
        Prim  seqno  : -1
        First seqno  : -1
        Last  seqno  : -1
        Prim JOINED  : 0
        State UUID   : 47e64989-2780-11e9-a986-ef5ad4566067
        Group UUID   : 00000000-0000-0000-0000-000000000000
        Name         : 'kvm.amar.com'
        Incoming addr: '192.168.50.30:3306'
 
        Version      : 4
        Flags        : 0x4
        Protocols    : 0 / 9 / 3
        State        : NON-PRIMARY
        Desync count : 0
        Prim state   : NON-PRIMARY
        Prim UUID    : 00000000-0000-0000-0000-000000000000
        Prim  seqno  : -1
        First seqno  : -1
        Last  seqno  : -1
        Prim JOINED  : 0
        State UUID   : 47e64989-2780-11e9-a986-ef5ad4566067
        Group UUID   : 00000000-0000-0000-0000-000000000000
        Name         : 'db.9jr.com'
        Incoming addr: '192.168.50.216:3306'
 
2019-02-02 22:52:32 0 [Warning] WSREP: No re-merged primary component found.
2019-02-02 22:52:32 0 [Note] WSREP: Bootstrapped primary 00000000-0000-0000-0000-000000000000 found: 2.
2019-02-02 22:52:32 0 [Note] WSREP: Quorum results:
        version    = 4,
        component  = PRIMARY,
        conf_id    = -1,
        members    = 2/2 (joined/total),
        act_id     = -1,
        last_appl. = -1,
        protocols  = 0/9/3 (gcs/repl/appl),
        group UUID = 00000000-0000-0000-0000-000000000000
2019-02-02 22:52:32 0 [Note] WSREP: Flow-control interval: [23, 23]
2019-02-02 22:52:32 0 [Note] WSREP: Trying to continue unpaused monitor
2019-02-02 22:52:32 0 [Note] WSREP: Restored state OPEN -> JOINED (-1)
2019-02-02 22:52:32 2 [Note] WSREP: New cluster view: global state: :-1, view# 0: Primary, number of nodes: 2, my index: 1, protocol version 3
2019-02-02 22:52:32 0 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2019-02-02 22:52:32 0 [ERROR] Aborting
 
Error in my_thread_global_end(): 1 threads didn't exit



 Comments   
Comment by Jacques Amar [ 2019-02-03 ]

The command that crashes and corrupts the full cluster every time:

ALTER TABLE contents
ADD publisher_id               INT UNSIGNED,
ADD cr_holder_id               INT UNSIGNED,
ADD INDEX IDX_content_publishers(publisher_id),
ADD INDEX IDX_content_copyright(cr_holder_id),
ADD CONSTRAINT Refpublishers1691 FOREIGN KEY (publisher_id)
    REFERENCES publishers(publisher_id),
ADD CONSTRAINT Refcopyright_holders1701 FOREIGN KEY (cr_holder_id)
    REFERENCES copyright_holders(cr_holder_id);

Comment by Elena Stepanova [ 2019-02-05 ]

Are there any errors in the error log upon server startup before you run the ALTER statement which crashes the server/cluster?
Did you try to run CHECK TABLE and innochecksum on the table?

Please provide the output of

SHOW CREATE TABLE contents;
SHOW CREATE TABLE publishers;
SHOW CREATE TABLE copyright_holders;
SHOW INDEX IN contents;
SHOW INDEX IN publishers;
SHOW INDEX IN copyright_holders;

and the same for all tables which they reference?

Please also paste or attach your cnf file(s).

Comment by Jacques Amar [ 2019-02-05 ]

I am sorry, this was a bad blocker and couldn't wait on a debug. Had to wipe the whole installation and restore all DBs from a backup. No other errors existed in the logs before touching this.

The issue was not just on an ALTER TABLE. **ANYTHING** touching the table contents was crashing the server. It was that single table. I could not recover from this in any meaningful way. Couldn't delete, truncate, drop. Tried moving to another DB (worked) but couldn't touch it in that other DB either and it was taking a lot of unnecessary space.

I will try to change the meta of this bug so that it's not related to Data Definition.

If it's still relevant to you, I can post the structure and configs.

Let me know

Comment by Jacques Amar [ 2019-02-05 ]

 CREATE TABLE `contents` (
  `content_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `content_slug` varchar(128) COLLATE utf8mb4_bin DEFAULT NULL,
  `song_id` int(10) unsigned DEFAULT NULL,
  `media_cat_id` char(3) COLLATE utf8mb4_bin NOT NULL,
  `content_title` varchar(255) COLLATE utf8mb4_bin NOT NULL,
  `content_rating` varchar(32) COLLATE utf8mb4_bin DEFAULT '1',
  `content_original_date` date DEFAULT NULL,
  `content_play_time` decimal(18,0) unsigned DEFAULT NULL,
  `content_file_size` decimal(18,0) unsigned DEFAULT NULL,
  `content_cms_location` varchar(255) COLLATE utf8mb4_bin DEFAULT NULL,
  `is_content_instrumental` tinyint(3) unsigned NOT NULL DEFAULT 0,
  `is_content_active` tinyint(3) unsigned NOT NULL DEFAULT 1,
  `default_image_id` int(10) unsigned DEFAULT NULL,
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  `mn_id` varchar(32) COLLATE utf8mb4_bin DEFAULT NULL,
  `label_id` int(10) unsigned DEFAULT NULL,
  `mn_label_id` varchar(32) COLLATE utf8mb4_bin DEFAULT NULL,
  `publisher_id` int(10) unsigned DEFAULT NULL,
  `cr_holder_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`content_id`),
  UNIQUE KEY `AK_Content_Slug` (`content_slug`,`media_cat_id`),
  UNIQUE KEY `AK_media_slug` (`media_cat_id`,`content_slug`),
  KEY `IDX_Media_Song_Cat` (`song_id`,`media_cat_id`),
  KEY `IDX_Media_Cat_Song` (`media_cat_id`,`song_id`),
  KEY `Ref43` (`song_id`),
  KEY `Ref95` (`media_cat_id`),
  KEY `IDX_Content_Title` (`content_title`),
  KEY `RefImages126` (`default_image_id`),
  KEY `IDX_contents_mn_id` (`mn_id`),
  KEY `Reflabels161` (`label_id`),
  KEY `IDX_content_publishers` (`publisher_id`),
  KEY `IDX_content_copyright` (`cr_holder_id`),
  CONSTRAINT `RefImages126` FOREIGN KEY (`default_image_id`) REFERENCES `images` (`image_id`),
  CONSTRAINT `RefMedia_Type5` FOREIGN KEY (`media_cat_id`) REFERENCES `media_types` (`media_cat_id`),
  CONSTRAINT `RefSongs3` FOREIGN KEY (`song_id`) REFERENCES `songs` (`song_id`),
  CONSTRAINT `Refcopyright_holders1701` FOREIGN KEY (`cr_holder_id`) REFERENCES `copyright_holders` (`cr_holder_id`),
  CONSTRAINT `Reflabels161` FOREIGN KEY (`label_id`) REFERENCES `labels` (`label_id`),
  CONSTRAINT `Refpublishers1691` FOREIGN KEY (`publisher_id`) REFERENCES `publishers` (`publisher_id`)
) ENGINE=InnoDB AUTO_INCREMENT=8428 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin 

MariaDB [caltex_cms]> show index in contents;
+----------+------------+------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name               | Seq_in_index | Column_name      | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| contents |          0 | PRIMARY                |            1 | content_id       | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| contents |          0 | AK_Content_Slug        |            1 | content_slug     | A         |           0 |     NULL | NULL   | YES  | BTREE      |         |               |
| contents |          0 | AK_Content_Slug        |            2 | media_cat_id     | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| contents |          0 | AK_media_slug          |            1 | media_cat_id     | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| contents |          0 | AK_media_slug          |            2 | content_slug     | A         |           0 |     NULL | NULL   | YES  | BTREE      |         |               |
| contents |          1 | IDX_Media_Song_Cat     |            1 | song_id          | A         |           0 |     NULL | NULL   | YES  | BTREE      |         |               |
| contents |          1 | IDX_Media_Song_Cat     |            2 | media_cat_id     | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| contents |          1 | IDX_Media_Cat_Song     |            1 | media_cat_id     | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| contents |          1 | IDX_Media_Cat_Song     |            2 | song_id          | A         |           0 |     NULL | NULL   | YES  | BTREE      |         |               |
| contents |          1 | Ref43                  |            1 | song_id          | A         |           0 |     NULL | NULL   | YES  | BTREE      |         |               |
| contents |          1 | Ref95                  |            1 | media_cat_id     | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| contents |          1 | IDX_Content_Title      |            1 | content_title    | A         |           0 |     NULL | NULL   |      | BTREE      |         |               |
| contents |          1 | RefImages126           |            1 | default_image_id | A         |           0 |     NULL | NULL   | YES  | BTREE      |         |               |
| contents |          1 | IDX_contents_mn_id     |            1 | mn_id            | A         |           0 |     NULL | NULL   | YES  | BTREE      |         |               |
| contents |          1 | Reflabels161           |            1 | label_id         | A         |           0 |     NULL | NULL   | YES  | BTREE      |         |               |
| contents |          1 | IDX_content_publishers |            1 | publisher_id     | A         |           0 |     NULL | NULL   | YES  | BTREE      |         |               |
| contents |          1 | IDX_content_copyright  |            1 | cr_holder_id     | A         |           0 |     NULL | NULL   | YES  | BTREE      |         |               |
+----------+------------+------------------------+--------------+------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Comment by Elena Stepanova [ 2019-02-05 ]

Thanks.
If it was crashing on everything touching the table, and if it stopped crashing after re-creating the table (presumably with the same structure and similar contents), it's a strong indication of the bad table's files being physically corrupted.
It could happen due to some previous problems in MariaDB server, or disk issues, or an unexpected outage at the moment when it's not crash-safe.
I don't think we can do anything without having those corrupt files present, but I'll assign it to marko in case he wants to take a look at the assertion failure / stack trace before closing it, maybe he'll recognize a known problem.

Comment by Jacques Amar [ 2019-02-12 ]

Hi,

I've tracked down this issue and it's repeatable!

It happens when the under-laying filesystem doens't have enough space for a duplicate of a table being modified. Either with an ALTER or a simple OPTIMIZE. The system tries to create a duplicate of the InnoDB table, the OS runs out of space and MariaDb crashes and corrupts the system irrevocably! Only solution I found was to restore from backup.

We need a clean recovery from out of space on things that duplicate a table. Specifically a very large table that could starve the filesystem.

Let me know if you need more info.

Comment by Marko Mäkelä [ 2019-03-13 ]

ADD FOREIGN KEY would normally use ALGORITHM=COPY. Only with SET foreign_key_checks=OFF it could be an instantaneous operation.

The rbt_tree_add_child at the top of the stack trace is definitely wrong, but the rest of the stack trace looks plausible. I see that Galera is enabled; I wonder if that is playing a role for the error handling.

I believe that we should try to repeat this with ALGORITHM=COPY on a nearly full file system.

Comment by Jacques Amar [ 2019-03-13 ]

" I see that Galera is enabled;"

Interestingly (or not!) both replicated galera servers had the same exact issue with this particular table (they were corrupted). The servers didn't crash, but the table was unusable. I tried an rsync over to my crashed server and it did start, but the table never recovered. Had to resort to a full wipe and restore from backup ... I'm guessing the InnoDB table was damaged internally.

Comment by Eugene Z. Zheganin [ 2020-04-28 ]

Got identical crash today (Oracle Linux 7.7, MariaDB 10.4.12), only that I have plently of space and wasnt altering any metadata. Furthermore, this corruption has propagated to all of the galera members except one, triggering cluster crash (except one node). (Should I report this as a separate bug ? Assertion looks identically).

Apr 27 23:24:33 sso-vrn-db02 mysqld: 2020-04-27 23:24:33 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notif
ication.
Apr 27 23:24:34 sso-vrn-db02 mysqld: 2020-04-27 23:24:34 0 [Note] WSREP: (e85c1d07, 'tcp://0.0.0.0:4567') turning messag
e relay requesting off
Apr 27 23:27:08 sso-vrn-db02 mysqld: 2020-04-27 23:27:08 995 [Warning] Aborted connection 995 to db: 'unconnected' user:
 'monitor' host: '188.186.154.181' (Got timeout reading communication packets)
Apr 28 00:40:20 sso-vrn-db02 mysqld: 2020-04-28 00:40:20 0x7f5ec8153700  InnoDB: Assertion failure in file /home/buildbo
t/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.12/storage/innobase/btr/btr0pcur.cc line 495
Apr 28 00:40:20 sso-vrn-db02 mysqld: InnoDB: Failing assertion: btr_page_get_prev(next_page) == btr_pcur_get_block(curso
r)->page.id.page_no()
Apr 28 00:40:20 sso-vrn-db02 mysqld: InnoDB: We intentionally generate a memory trap.
Apr 28 00:40:20 sso-vrn-db02 mysqld: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
Apr 28 00:40:20 sso-vrn-db02 mysqld: InnoDB: If you get repeated assertion failures or crashes, even
Apr 28 00:40:20 sso-vrn-db02 mysqld: InnoDB: immediately after the mysqld startup, there may be
Apr 28 00:40:20 sso-vrn-db02 mysqld: InnoDB: corruption in the InnoDB tablespace. Please refer to
Apr 28 00:40:20 sso-vrn-db02 mysqld: InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
Apr 28 00:40:20 sso-vrn-db02 mysqld: InnoDB: about forcing recovery.
Apr 28 00:40:20 sso-vrn-db02 mysqld: 200428  0:40:20 [ERROR] mysqld got signal 6 ;
Apr 28 00:40:20 sso-vrn-db02 mysqld: This could be because you hit a bug. It is also possible that this binary
Apr 28 00:40:20 sso-vrn-db02 mysqld: or one of the libraries it was linked against is corrupt, improperly built,
Apr 28 00:40:20 sso-vrn-db02 mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
Apr 28 00:40:20 sso-vrn-db02 mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Apr 28 00:40:20 sso-vrn-db02 mysqld: We will try our best to scrape up some info that will hopefully help
Apr 28 00:40:20 sso-vrn-db02 mysqld: diagnose the problem, but since we have already crashed,
Apr 28 00:40:20 sso-vrn-db02 mysqld: something is definitely wrong and this may fail.
Apr 28 00:40:20 sso-vrn-db02 mysqld: Server version: 10.4.12-MariaDB
Apr 28 00:40:20 sso-vrn-db02 mysqld: key_buffer_size=16777216
Apr 28 00:40:20 sso-vrn-db02 mysqld: read_buffer_size=8388608
Apr 28 00:40:20 sso-vrn-db02 mysqld: max_used_connections=143
Apr 28 00:40:20 sso-vrn-db02 mysqld: max_threads=1026
Apr 28 00:40:20 sso-vrn-db02 mysqld: thread_count=151
Apr 28 00:40:20 sso-vrn-db02 mysqld: It is possible that mysqld could use up to
Apr 28 00:40:20 sso-vrn-db02 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 8684712 K  by
tes of memory
Apr 28 00:40:20 sso-vrn-db02 mysqld: Hope that's ok; if not, decrease some variables in the equation.
Apr 28 00:40:20 sso-vrn-db02 mysqld: Thread pointer: 0x7f5eb4111188
Apr 28 00:40:20 sso-vrn-db02 mysqld: Attempting backtrace. You can use the following information to find out
Apr 28 00:40:20 sso-vrn-db02 mysqld: where mysqld died. If you see no messages after this, something went
Apr 28 00:40:20 sso-vrn-db02 mysqld: terribly wrong...
Apr 28 00:40:20 sso-vrn-db02 mysqld: stack_bottom = 0x7f5ec8152cf0 thread_stack 0x49000
Apr 28 00:40:20 sso-vrn-db02 mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x56469e7fda1e]
Apr 28 00:40:20 sso-vrn-db02 mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x56469e291d8f]
Apr 28 00:40:20 sso-vrn-db02 mysqld: /lib64/libpthread.so.0(+0xf630)[0x7f626e72e630]
Apr 28 00:40:20 sso-vrn-db02 mysqld: /lib64/libc.so.6(gsignal+0x37)[0x7f626c9fe377]
Apr 28 00:40:20 sso-vrn-db02 mysqld: /lib64/libc.so.6(abort+0x148)[0x7f626c9ffa68]
Apr 28 00:40:20 sso-vrn-db02 mysqld: /usr/sbin/mysqld(+0x5b18be)[0x56469df858be]
Apr 28 00:40:20 sso-vrn-db02 mysqld: /usr/sbin/mysqld(+0xbdbc9a)[0x56469e5afc9a]
Apr 28 00:40:20 sso-vrn-db02 mysqld: /usr/sbin/mysqld(+0xb4ecbc)[0x56469e522cbc]
Apr 28 00:40:20 sso-vrn-db02 mysqld: /usr/sbin/mysqld(+0xa6cd80)[0x56469e440d80]
Apr 28 00:40:20 sso-vrn-db02 mysqld: /usr/sbin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x10f)[0x56469e29688f]
Apr 28 00:40:20 sso-vrn-db02 mysqld: /usr/sbin/mysqld(_Z13rr_sequentialP11READ_RECORD+0x1c)[0x56469e3c584c]
Apr 28 00:40:20 sso-vrn-db02 mysqld: /usr/sbin/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x24f)[0x56469e0b843f]
Apr 28 00:40:20 sso-vrn-db02 mysqld: /usr/sbin/mysqld(_ZN4JOIN10exec_innerEv+0xb9c)[0x56469e0da7fc]
Apr 28 00:40:21 sso-vrn-db02 mysqld: /usr/sbin/mysqld(_ZN4JOIN4execEv+0x33)[0x56469e0daa53]
Apr 28 00:40:21 sso-vrn-db02 mysqld: /usr/sbin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_jP8st_orderS9_
S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x186)[0x56469e0d8d56]
Apr 28 00:40:21 sso-vrn-db02 mysqld: /usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x1d7)[0x56469e0d98c7
]
Apr 28 00:40:21 sso-vrn-db02 mysqld: /usr/sbin/mysqld(+0x59f3d4)[0x56469df733d4]
Apr 28 00:40:21 sso-vrn-db02 mysqld: /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x467d)[0x56469e07f4fd]
Apr 28 00:40:21 sso-vrn-db02 mysqld: /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x24b)[0x56469e0844eb]
Apr 28 00:40:21 sso-vrn-db02 mysqld: /usr/sbin/mysqld(+0x6b0d99)[0x56469e084d99]
Apr 28 00:40:21 sso-vrn-db02 mysqld: /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x2b06)[0x5646
9e088296]
Apr 28 00:40:21 sso-vrn-db02 mysqld: /usr/sbin/mysqld(_Z10do_commandP3THD+0x11c)[0x56469e088a4c]
Apr 28 00:40:21 sso-vrn-db02 mysqld: /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1fa)[0x56469e16732a]
Apr 28 00:40:21 sso-vrn-db02 mysqld: /usr/sbin/mysqld(handle_one_connection+0x3d)[0x56469e16740d]
Apr 28 00:40:21 sso-vrn-db02 mysqld: /lib64/libpthread.so.0(+0x7ea5)[0x7f626e726ea5]
Apr 28 00:40:21 sso-vrn-db02 mysqld: /lib64/libc.so.6(clone+0x6d)[0x7f626cac68cd]
Apr 28 00:40:21 sso-vrn-db02 mysqld: Trying to get some variables.
Apr 28 00:40:21 sso-vrn-db02 mysqld: Some pointers may be invalid and cause the dump to abort.
Apr 28 00:40:21 sso-vrn-db02 mysqld: Query (0x7f5eb4217eb0): select autolockno0_.id as id1_7_, autolockno0_.sended_at as
 sended_a2_7_, autolockno0_.status as status3_7_, autolockno0_.type as type4_7_, autolockno0_.user_id as user_id5_7_ fro
m AUTO_LOCK_NOTIFICATION autolockno0_ where autolockno0_.status='PREPARE' for update
Apr 28 00:40:21 sso-vrn-db02 mysqld: Connection ID (thread ID): 1211
Apr 28 00:40:21 sso-vrn-db02 mysqld: Status: NOT_KILLED
Apr 28 00:40:21 sso-vrn-db02 mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,ind
ex_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,der
ived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_mat
ch_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_
with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buff
er_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_deriv
ed=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on
Apr 28 00:40:21 sso-vrn-db02 mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Apr 28 00:40:21 sso-vrn-db02 mysqld: information that should help you find out what is causing the crash.
Apr 28 00:40:21 sso-vrn-db02 mysqld: Writing a core file...
Apr 28 00:40:21 sso-vrn-db02 mysqld: Working directory at /var/lib/mysql
Apr 28 00:40:21 sso-vrn-db02 mysqld: Resource Limits:
Apr 28 00:40:21 sso-vrn-db02 mysqld: Limit                     Soft Limit           Hard Limit           Units
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max cpu time              unlimited            unlimited            seconds
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max file size             unlimited            unlimited            bytes
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max data size             unlimited            unlimited            bytes
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max stack size            8388608              unlimited            bytes
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max core file size        0                    unlimited            bytes
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max resident set          unlimited            unlimited            bytes
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max processes             63374                63374                processes
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max open files            16364                16364                files
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max locked memory         65536                65536                bytes
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max address space         unlimited            unlimited            bytes
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max file locks            unlimited            unlimited            locks
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max pending signals       63374                63374                signals
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max msgqueue size         819200               819200               bytes
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max nice priority         0                    0
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max realtime priority     0                    0
Apr 28 00:40:21 sso-vrn-db02 mysqld: Max realtime timeout      unlimited            unlimited            us
Apr 28 00:40:21 sso-vrn-db02 mysqld: Core pattern: core

Comment by Marko Mäkelä [ 2020-10-23 ]

emz, an ALTER TABLE with ALGORITHM=COPY is similar to a INSERT…SELECT operation that copies an entire table. What you are experiencing might share the same root cause.

For the out-of-space handling, we have an open bug MDEV-13680, but I do not think that we are hitting it here.

This corruption is occurring in the B-trees, which are below the transaction layer, which Galera is interfacing with. I would think that Galera is an unlikely culprit.

I do not remember our support customers encountering this problem. I am happy to fix this, provided that someone creates a reasonably repeatable test case.

I have a couple of culprit candidates: InnoDB adaptive hash index and buffer pool resizing. We disabled the adaptive hash index by default in 10.5 (MDEV-20487). We have found several race conditions related to buffer pool resizing, one of them being MDEV-23693. Could these have been involved?

Generated at Thu Feb 08 08:44:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.