[MCOL-3383] mysqld crashing on replication drop table - local query PMs Created: 2019-06-14  Updated: 2020-08-25  Resolved: 2019-06-14

Status: Closed
Project: MariaDB ColumnStore
Component/s: MariaDB Server
Affects Version/s: 1.2.3
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: David Hill (Inactive) Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Environment:

2um 2pm with local query


Issue Links:
Duplicate
is duplicated by MCOL-2061 MariaDB shows warnings and could cras... Closed

 Description   

The customer reported that the system was down and that mysqld on pm1 and pm2 was crashing.

This might be the same issue as https://jira.mariadb.org/browse/MCOL-2212, which is tied to 1.2.4.

However, the customer is asking whether the actual crash is fixed. They request that the mysqld server be fixed so that it does not crash when a DROP TABLE fails.

Server version: 10.3.13-MariaDB-log
key_buffer_size=536870912
read_buffer_size=4194304
max_used_connections=0
max_threads=153
thread_count=9
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1781053 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7efcf40012a8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7efd54405588 thread_stack 0x80000
mysys/stacktrace.c:270(my_print_stacktrace)[0x55d934421ac9]
sql/signal_handler.cc:168(handle_fatal_signal)[0x55d933f5a6ff]
sigaction.c:0(__restore_rt)[0x7efd9d1095d0]
sql/ha_sequence.cc:53(ha_sequence)[0x55d934404e17]
sql/handler.cc:268(get_new_handler(TABLE_SHARE*, st_mem_root*, handlerton*))[0x55d933f5c770]
sql/handler.cc:2588(ha_delete_table(THD*, handlerton*, char const*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, bool))[0x55d933f62587]
sql/sql_table.cc:2511(mysql_rm_table_no_locks(THD*, TABLE_LIST*, bool, bool, bool, bool, bool, bool))[0x55d933e2939f]
sql/sql_table.cc:2125(mysql_rm_table(THD*, TABLE_LIST*, bool, bool, bool))[0x55d933e2a433]
sql/sql_parse.cc:5108(mysql_execute_command(THD*))[0x55d933dafdb6]
sql/sql_parse.cc:8143(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55d933db3fa2]
sql/log_event.cc:5681(Query_log_event::do_apply_event(rpl_group_info*, char const*, unsigned int))[0x55d93404534e]
sql/log_event.h:1483(Log_event::apply_event(rpl_group_info*))[0x55d933d17728]
sql/slave.cc:4371(exec_relay_log_event)[0x55d933d206a2]
pthread_create.c:0(start_thread)[0x7efd9d101dd5]
/lib64/libc.so.6(clone+0x6d)[0x7efd9b0acead]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7efcf4014d84): is an invalid pointer
Connection ID (thread ID): 11
Status: NOT_KILLED
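
Reading the backtrace from the bottom up, the replication SQL thread (exec_relay_log_event) applies a Query_log_event, the statement goes through mysql_rm_table and ha_delete_table, and the fatal signal is raised in the ha_sequence constructor called from get_new_handler. As a reading aid, the following is a minimal Python sketch that splits such frames into file, line, function, and address. The regular expression and the helper names (FRAME_RE, parse_frame, parse_backtrace) are illustrative assumptions, not part of any MariaDB tool, and the frame layout is inferred only from the lines in this report.

```python
import re

# A subset of frames copied verbatim from the backtrace in this ticket,
# innermost frame first (same order as the crash report).
FRAMES = """\
sigaction.c:0(__restore_rt)[0x7efd9d1095d0]
sql/ha_sequence.cc:53(ha_sequence)[0x55d934404e17]
sql/handler.cc:268(get_new_handler(TABLE_SHARE*, st_mem_root*, handlerton*))[0x55d933f5c770]
sql/handler.cc:2588(ha_delete_table(THD*, handlerton*, char const*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, bool))[0x55d933f62587]
sql/log_event.cc:5681(Query_log_event::do_apply_event(rpl_group_info*, char const*, unsigned int))[0x55d93404534e]
sql/slave.cc:4371(exec_relay_log_event)[0x55d933d206a2]
/lib64/libc.so.6(clone+0x6d)[0x7efd9b0acead]
"""

# Assumed layout: file:line(symbol)[address]. The symbol may contain its own
# parentheses (the argument list), so it is matched greedily up to ")[".
FRAME_RE = re.compile(
    r"^(?P<file>[^:()\s]+):(?P<line>\d+)\((?P<symbol>.*)\)\[(?P<addr>0x[0-9a-f]+)\]$"
)


def parse_frame(text):
    """Return a dict for one resolved frame, or None if the line has no file:line."""
    match = FRAME_RE.match(text.strip())
    if match is None:
        return None
    symbol = match.group("symbol")
    return {
        "file": match.group("file"),
        "line": int(match.group("line")),
        # Drop the argument list, keeping only the (possibly qualified) name.
        "function": symbol.split("(", 1)[0],
        "address": match.group("addr"),
    }


def parse_backtrace(text):
    """Parse every line and keep only the frames that carry source information."""
    parsed = (parse_frame(line) for line in text.splitlines())
    return [frame for frame in parsed if frame is not None]


if __name__ == "__main__":
    # Print the call chain from the outermost caller down to the crash site.
    for frame in reversed(parse_backtrace(FRAMES)):
        print(f'{frame["file"]}:{frame["line"]}  {frame["function"]}')
```

Frames without a file:line prefix, such as the libc clone entry, are skipped rather than guessed at.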



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2019-06-14 ]

Duplicate of MCOL-2061. The user needs to either upgrade to 1.2.4 and follow the upgrade procedure, or use the "ALTER TABLE ... CHANGE" workaround in the notes for MCOL-2061.

Comment by David Hill (Inactive) [ 2019-06-14 ]

From customer:

1) It seems that the upgrade procedure erases the existing comments, and we have tables with autoincrement in comments. So this won't really work and we still have to do this manually. That 'workaround' needs to work in all cases.

2) My comment is still valid: replication issues should not crash mysqld, and a mysqld crash should not bring down the ColumnStore cluster. The patch for MCOL-2061 does not address this.

Comment by Andrew Hutchings (Inactive) [ 2019-06-17 ]

1) Autoincrement getting removed from the table comments is fine; it is only used at the CREATE TABLE phase to give ColumnStore the initial information. Many alternative workarounds have been tried, and they are more risky or complicated than this one. Also, if they use the "ALTER TABLE ... CHANGE" workaround mentioned in my first comment on this ticket, the table comment would remain, since that alters a column comment (any column will work).

2) The workaround does address this because it stops the crashing. But if that isn't acceptable then MDEV-19120 needs to be fixed. That is not a bug in ColumnStore but a bug in MariaDB Server and affects all external engines. All we can do is provide the workaround until it gets fixed.

Comment by David Hill (Inactive) [ 2019-06-18 ]

From customer:

I understand that the engine part cannot do anything about the server side crashing. Do you want me to open a different ticket for the MariaDB server side, then?

Also, the point I am making is separate from who is responsible for the crash or whether we can work around it.

1) The mysqld server side should not crash on a replication issue; maybe it could log an error and stop the slave instead of segfaulting.

2) The ColumnStore stack went into a degraded state even though the crashing mysqld was not on a UM and not all UM access points were down. The system should still have been queryable. The system then stays in a failed state upon restart. This has to do with management of the stack and what is crucial to run or not, and with robustness and recovery from failure, whether or not the crash originated from ColumnStore.

This is viewed as a unified MariaDB product (MariaDB Platform).

Let me know if I need to open a different ticket to escalate this on the server side.
