[MDEV-17172] ALTER TABLE on the base table of a OQGRAPH table leads to crash Created: 2018-09-11  Updated: 2023-04-27

Status: Confirmed
Project: MariaDB Server
Component/s: Storage Engine - OQGRAPH
Affects Version/s: 10.3.9, 10.2, 10.3
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Oli Sennhauser Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: columns, crash, drop, oqgraph
Environment:

Linux, generic binaries


Issue Links:
Relates
relates to MDEV-13918 Race condition between INFORMATION_SC... Closed
relates to MDEV-26227 Convert Character Set of backing tabl... Open

 Description   

CREATE TABLE rivers (
  origid INT UNSIGNED NOT NULL
, destid INT UNSIGNED NOT NULL
, name VARCHAR(32)
, weight DOUBLE NOT NULL
, PRIMARY KEY (origid, destid)
, KEY (destid)
);
 
INSERT INTO rivers (origid, destid, name, weight) VALUES
  (1,2,'Rhein',1), (2,3,'Rhein',1), (3,4,'Rhein',1), (4,5,'Rhein',1), (2,6,'Rhein',1), (5,6,'Rhein',1)
, (6,2,'Thur',2), (7,3,'Glatt',2)
, (8,9,'Aare',2), (9,10,'Aare',2), (10,4,'Aare',2)
, (11,12,'Reuss',3), (12,10,'Reuss',3), (13,12,'Lorze',4)
, (14,15,'Limmat',3), (15,9,'Limmat',3), (16,15,'Reppisch',4)
;
 
 
CREATE TABLE river_graph (
  latch VARCHAR(32) NULL
, origid BIGINT UNSIGNED NULL
, destid BIGINT UNSIGNED NULL
, weight DOUBLE NULL
, seq BIGINT UNSIGNED NULL
, linkid BIGINT UNSIGNED NULL
, KEY (latch, origid, destid) USING HASH
, KEY (latch, destid, origid) USING HASH
) ENGINE=OQGRAPH data_table='rivers' origid='origid' destid='destid' weight='weight'
;
 
SELECT * FROM river_graph;
 
 
SELECT * FROM river_graph WHERE latch="leaves" AND origid=16;
alter table rivers drop column name;

 
Server version: 10.3.9-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=3
max_threads=153
thread_count=7
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467396 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f63640009a8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f63ddbf9e90 thread_stack 0x49000
/home/mysql/product/mariadb-10.3/bin/mysqld(my_print_stacktrace+0x2b)[0x55e9829c913b]
/home/mysql/product/mariadb-10.3/bin/mysqld(handle_fatal_signal+0x537)[0x55e9824ae837]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f63fb1fb390]
linux/raise.c:54(__GI_raise)[0x7f63fa7ca428]
stdlib/abort.c:91(__GI_abort)[0x7f63fa7cc02a]
ut/ut0rbt.cc:461(rbt_eject_node(ib_rbt_node_t*, ib_rbt_node_t*) [clone .part.4])[0x55e98221e6dd]
row/row0merge.cc:4527(row_merge_drop_table(trx_t*, dict_table_t*))[0x55e982703f6c]
handler/handler0alter.cc:9980(ha_innobase::commit_inplace_alter_table(TABLE*, Alter_inplace_info*, bool))[0x55e98266784b]
sql/sql_table.cc:7618(mysql_inplace_alter_table(THD*, TABLE_LIST*, TABLE*, TABLE*, Alter_inplace_info*, enum_alter_inplace_result, MDL_request*, Alter_table_ctx*) [clone .isra.227])[0x55e982212e01]
sql/sql_table.cc:9715(mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool))[0x55e98237e194]
sql/sql_alter.cc:495(Sql_cmd_alter_table::execute(THD*))[0x55e9823c82fe]
sql/sql_parse.cc:6284(mysql_execute_command(THD*))[0x55e9822f4eb9]
sql/sql_class.h:1809(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55e9822fc239]
sql/sql_parse.cc:1850(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x55e9822fe2b3]
sql/sql_parse.cc:1397(do_command(THD*))[0x55e9822fea4e]
sql/sql_connect.cc:1402(do_handle_one_connection(CONNECT*))[0x55e9823c4f8f]
sql/sql_connect.cc:1310(handle_one_connection)[0x55e9823c50b4]
perfschema/pfs.cc:1865(pfs_spawn_thread)[0x55e98262644d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f63fb1f16ba]
x86_64/clone.S:111(clone)[0x7f63fa89c41d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f63640115f0): alter table rivers drop column name
Connection ID (thread ID): 14
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on



 Comments   
Comment by Alice Sherepa [ 2018-09-12 ]

Thanks for the report!
Reproducible with InnoDB on 10.2 ( e76c4c06f18c0d09a2)- 10.3 (8dda6d797a22b11ad8afa374a154) - Assertion `user_table->get_ref_count() == 1' failed in commit_try_rebuild

10.2/storage/innobase/handler/handler0alter.cc:7877: bool commit_try_rebuild(Alter_inplace_info*, ha_innobase_inplace_ctx*, TABLE*, const TABLE*, trx_t*, const char*): Assertion `user_table->get_ref_count() == 1' failed.
 
#4  0x00007f925f9cf428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#5  0x00007f925f9d102a in __GI_abort () at abort.c:89
#6  0x00007f925f9c7bd7 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55f4241f0f40 "user_table->get_ref_count() == 1", file=file@entry=0x55f4241ef468 "/10.2/storage/innobase/handler/handler0alter.cc", line=line@entry=7877, function=function@entry=0x55f4241f5a20 <commit_try_rebuild(Alter_inplace_info*, ha_innobase_inplace_ctx*, TABLE*, TABLE const*, trx_t*, char const*)::__PRETTY_FUNCTION__> "bool commit_try_rebuild(Alter_inplace_info*, ha_innobase_inplace_ctx*, TABLE*, const TABLE*, trx_t*, const char*)") at assert.c:92
#7  0x00007f925f9c7c82 in __GI___assert_fail (assertion=0x55f4241f0f40 "user_table->get_ref_count() == 1", file=0x55f4241ef468 "/10.2/storage/innobase/handler/handler0alter.cc", line=7877, function=0x55f4241f5a20 <commit_try_rebuild(Alter_inplace_info*, ha_innobase_inplace_ctx*, TABLE*, TABLE const*, trx_t*, char const*)::__PRETTY_FUNCTION__> "bool commit_try_rebuild(Alter_inplace_info*, ha_innobase_inplace_ctx*, TABLE*, const TABLE*, trx_t*, const char*)") at assert.c:101
#8  0x000055f423c0e41f in commit_try_rebuild (ha_alter_info=0x7f9258369490, ctx=0x7f920c0138e8, altered_table=0x7f920c0508e0, old_table=0x7f920c040800, trx=0x7f9258609278, table_name=0x7f920c03c8ed "rivers") at /10.2/storage/innobase/handler/handler0alter.cc:7877
#9  0x000055f423c0aad0 in ha_innobase::commit_inplace_alter_table (this=0x7f920c041408, altered_table=0x7f920c0508e0, ha_alter_info=0x7f9258369490, commit=true) at /10.2/storage/innobase/handler/handler0alter.cc:8656
#10 0x000055f423892e83 in handler::ha_commit_inplace_alter_table (this=0x7f920c041408, altered_table=0x7f920c0508e0, ha_alter_info=0x7f9258369490, commit=true) at /10.2/sql/handler.cc:4230
#11 0x000055f4236e46bf in mysql_inplace_alter_table (thd=0x7f920c000b00, table_list=0x7f920c012580, table=0x7f920c040800, altered_table=0x7f920c0508e0, ha_alter_info=0x7f9258369490, inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, target_mdl_request=0x7f9258369500, alter_ctx=0x7f925836a0c0) at /10.2/sql/sql_table.cc:7395
#12 0x000055f4236e9bd8 in mysql_alter_table (thd=0x7f920c000b00, new_db=0x7f920c012b90 "test", new_name=0x0, create_info=0x7f925836acd0, table_list=0x7f920c012580, alter_info=0x7f925836ac20, order_num=0, order=0x0, ignore=false) at /10.2/sql/sql_table.cc:9394
#13 0x000055f4237643ae in Sql_cmd_alter_table::execute (this=0x7f920c012bc0, thd=0x7f920c000b00) at /10.2/sql/sql_alter.cc:324
#14 0x000055f42361c9f3 in mysql_execute_command (thd=0x7f920c000b00) at /10.2/sql/sql_parse.cc:6225
#15 0x000055f423621a8a in mysql_parse (thd=0x7f920c000b00, rawbuf=0x7f920c012488 "alter table rivers drop column origid", length=37, parser_state=0x7f925836c080, is_com_multi=false, is_next_command=false) at /10.2/sql/sql_parse.cc:8009
#16 0x000055f42360f223 in dispatch_command (command=COM_QUERY, thd=0x7f920c000b00, packet=0x7f920c0963f1 "alter table rivers drop column origid", packet_length=37, is_com_multi=false, is_next_command=false) at /10.2/sql/sql_parse.cc:1824
#17 0x000055f42360db7f in do_command (thd=0x7f920c000b00) at /10.2/sql/sql_parse.cc:1378
#18 0x000055f42375f0dd in do_handle_one_connection (connect=0x55f42605a800) at /10.2/sql/sql_connect.cc:1335
#19 0x000055f42375ee5d in handle_one_connection (arg=0x55f42605a800) at /10.2/sql/sql_connect.cc:1241
#20 0x000055f423f8b9a4 in pfs_spawn_thread (arg=0x55f425fa6200) at /10.2/storage/perfschema/pfs.cc:1862
#21 0x00007f926060c6ba in start_thread (arg=0x7f925836d700) at pthread_create.c:333
#22 0x00007f925faa141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

on 10.0-10.1

mysqltest: At line 39: query 'alter table rivers drop column name' failed: 1205: Lock wait timeout exceeded; try restarting transaction

Comment by Marko Mäkelä [ 2018-09-24 ]

Executing table-rebuilding DDL statements (such as certain ALTER TABLE or OPTIMIZE TABLE, or starting with MDEV-13564, also TRUNCATE TABLE) requires that there exist no open handles on the table.

Given that the implementation of DROP COLUMN inside InnoDB has not changed between MariaDB 10.0 and 10.3, it seems to me that the locking must have changed. Without debugging, it seems to me that no metadata lock (MDL) on the base table is being held in 10.2, while it is being held (and blocking the ALTER TABLE) in 10.0 and 10.1.

pprkut, how does the locking of the base table work in OQGRAPH? Has it changed in MariaDB Server 10.2?

Comment by Sergey Vojtovich [ 2018-09-24 ]

This is oqgraph bug. It opens a table, bypassing table cache and keeps TABLE instance open.

At some point ALTER TABLE issues tdc_remove_table() to make sure there're no stale TABLE instances left. However it won't notice such unregistered TABLE instance. Eventually we get this assertion failure.

Comment by Marko Mäkelä [ 2018-09-24 ]

pprkut, I hope you can provide a fix for OQGRAPH.

Comment by Heinz Wiesinger [ 2018-09-24 ]

This is most likely related to MDEV-11342. I discussed this a bit with Sergey at the meetup, but didn't get to actually work on a fix yet.
However, I did have this on my plate for this week, so I'll see how far I'll get.

Comment by richardeaxon [ 2019-01-11 ]

Any update on this and MDEV-11342?

Comment by Heinz Wiesinger [ 2019-01-11 ]

I made a little progress a couple months ago in prototyping a solution, but it's rather complicated to fix correctly and I haven't had time to continue since. It's regularly on my mind though, so I hope I can revisit this soon.

Generated at Thu Feb 08 08:34:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.