[MDEV-17877] Hitting Bug while adding new partitions to the table on MariaDB Galera Cluster 10.2 Created: 2018-11-29  Updated: 2019-04-15  Resolved: 2019-04-11

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table
Affects Version/s: 10.2.16
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Marko Assignee: Varun Gupta (Inactive)
Resolution: Duplicate Votes: 0
Labels: crash, galera, innodb
Environment:

10.2.16-MariaDB-10.2.16+maria~trusty-log
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty


Issue Links:
Relates
relates to MDEV-17839 Crash (pure virtual method called) in... Closed

 Description   

Galera Cluster node crashes while adding new partitions to the table with more than 250 partitions. We have scheduled event which is running procedure for automatically adding new partitions.

Problem is experienced on Mariadb Galera Cluster 10.0.34 and 10.2.16 versions.

We have three more tables with more than 150 partitions and same procedure for adding new partitions and everything works without any problems.

Trace:

2018-11-25 18:00:52 139643177436928 [Warning] Sort aborted, host: 10.4.0.8, user: my_user, thread: 166381, query: select missioneve0_.id as id2_23_, missioneve0_.channel as channel3_23_, missioneve0_.created_at as created_4_23_, missioneve0_.event_type as event_ty5_23_, missioneve0_.extra_data as extra_da6_23_, missioneve0_.id_message as id_messa7_23_, missioneve0_.tracking_id as tracking8_23_, missioneve0_.id_user as id_user9_23_, missioneve0_.id_mission as id_miss10_23_, missioneve0_.uuid_user as uuid_us11_23_, missioneve0_.discriminator as discrimi1_23_ from touchpoint_event_log missioneve0_ left outer join missions missionent1_ on missioneve0_.id_mission=missionent1_.id where missioneve0_.discriminator in (2, 3) and missioneve0_.event_type='testMission' and missionent1_.id=137734 and missioneve0_.created_at>'2018-10-25 18:00:52' order by missioneve0_.created_at desc
pure virtual method called
terminate called without an active exception
181125 18:00:52 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.2.16-MariaDB-10.2.16+maria~trusty-log
key_buffer_size=16777216
read_buffer_size=2097152
max_used_connections=91
max_threads=8002
thread_count=150
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 17599632 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f00b8000a88
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f0135fc7e70 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55f9b92e050e]
/usr/sbin/mysqld(handle_fatal_signal+0x355)[0x55f9b8d6c175]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f0e70c7f330]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f0e702d2c37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f0e702d6028]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(ZN9gnu_cxx27_verbose_terminate_handlerEv+0x155)[0x7f0e709cb535]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e6d6)[0x7f0e709c96d6]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5e703)[0x7f0e709c9703]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5f1bf)[0x7f0e709ca1bf]
/usr/sbin/mysqld(_ZN7handler16ha_external_lockEP3THDi+0xc3)[0x55f9b8d75e63]
/usr/sbin/mysqld(+0x75c5ab)[0x55f9b8e565ab]
/usr/sbin/mysqld(_Z24mysql_unlock_read_tablesP3THDP13st_mysql_lock+0x68)[0x55f9b8e56df8]
/usr/sbin/mysqld(_ZN4JOIN9join_freeEv+0x16b)[0x55f9b8c190bb]
/usr/sbin/mysqld(_ZN4JOIN10exec_innerEv+0x9b5)[0x55f9b8c2e2b5]
/usr/sbin/mysqld(_ZN4JOIN4execEv+0x33)[0x55f9b8c2e553]
/usr/sbin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x11a)[0x55f9b8c2e6aa]
/usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x24c)[0x55f9b8c2f1fc]
/usr/sbin/mysqld(+0x419f0a)[0x55f9b8b13f0a]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x2aa4)[0x55f9b8bdc6e4]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x2bd)[0x55f9b8be2fad]
/usr/sbin/mysqld(+0x4e987f)[0x55f9b8be387f]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1aef)[0x55f9b8be59ff]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x165)[0x55f9b8be6b65]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1aa)[0x55f9b8ca9bfa]
/usr/sbin/mysqld(handle_one_connection+0x3d)[0x55f9b8ca9d1d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f0e70c77184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f0e70399ffd]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f00b800f230): is an invalid pointer
Connection ID (thread ID): 166381
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

We think the query pointer is invalid, but we will try to print it anyway.
Query: select missioneve0_.id as id2_23_, missioneve0_.channel as channel3_23_, missioneve0_.created_at as created_4_23_, missioneve0_.event_type as event_ty5_23_, missioneve0_.extra_data as extra_da6_23_, missioneve0_.id_message as id_messa7_23_, missioneve0_.tracking_id as tracking8_23_, missioneve0_.id_user as id_user9_23_, missioneve0_.id_mission as id_miss10_23_, missioneve0_.uuid_user as uuid_us11_23_, missioneve0_.discriminator as discrimi1_23_ from touchpoint_event_log missioneve0_ left outer join missions missionent1_ on missioneve0_.id_mission=missionent1_.id where missioneve0_.discriminator in (2, 3) and missioneve0_.event_type='testMission' and missionent1_.id=137734 and missioneve0_.created_at>'2018-10-25 18:00:52' order by missioneve0_.created_at desc



 Comments   
Comment by Elena Stepanova [ 2018-12-07 ]

The crash appears to happen upon SELECT which is run separately, not from inside a stored procedure.
Do you mean that a partition is being added in parallel by another connection? Which table is it added to, is it one of those participating in the crashing SELECT? Can you provide the complete SHOW CREATE TABLE for the table in question?

Comment by Marko [ 2018-12-10 ]

Hello Elena,

Yes, crash happens upon SELECT which is run separately.
Partition is being added automatically - scheduled event is calling procedure for adding partitions.
Table which is added partition to is the one mentioned in the crashing SELECT.

Table definition (I have obfuscated some fields):

CREATE TABLE `tel` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`created_at` timestamp NOT NULL DEFAULT current_timestamp(),
`msg_id` varchar(255) NOT NULL,
`tkg_id` varchar(255) DEFAULT NULL,
`user_id` int(10) unsigned DEFAULT NULL,
`user_uuid` char(36) DEFAULT NULL,
`mss_id` int(10) unsigned DEFAULT NULL,
`channel` varchar(100) NOT NULL,
`event_type` varchar(255) NOT NULL,
`extra_data` blob DEFAULT NULL,
`discriminator` int(10) unsigned NOT NULL,
PRIMARY KEY (`created_at`,`id`) USING BTREE,
KEY `tel_id_IDX` (`id`),
KEY `tel_users_IDX` (`user_id`),
KEY `tel_mss_IDX` (`mss_id`),
KEY `tel_event_type_IDX` (`event_type`),
KEY `tel_msg_IDX` (`msg_id`),
KEY `tel_tkg_IDX` (`tkg_id`)
) ENGINE=InnoDB AUTO_INCREMENT=11437264 DEFAULT CHARSET=utf8
PARTITION BY RANGE (unix_timestamp(`created_at`))
(PARTITION `pmin` VALUES LESS THAN (0) ENGINE = InnoDB,
PARTITION `part20180401` VALUES LESS THAN (1522627200) ENGINE = InnoDB,
PARTITION `part20180402` VALUES LESS THAN (1522713600) ENGINE = InnoDB,
PARTITION `part20180403` VALUES LESS THAN (1522800000) ENGINE = InnoDB,
PARTITION `part20180404` VALUES LESS THAN (1522886400) ENGINE = InnoDB,
PARTITION `part20180405` VALUES LESS THAN (1522972800) ENGINE = InnoDB,
...
... cut... there is around 260 partitions...
...
PARTITION `part20181206` VALUES LESS THAN (1544140800) ENGINE = InnoDB,
PARTITION `part20181207` VALUES LESS THAN (1544227200) ENGINE = InnoDB,
PARTITION `part20181208` VALUES LESS THAN (1544313600) ENGINE = InnoDB,
PARTITION `pmax` VALUES LESS THAN MAXVALUE ENGINE = InnoDB)

Regards,
Marko

Comment by Marko [ 2018-12-10 ]

Hello,

I don't know if it helps...
Event is scheduled on node 1 of the cluster, but nodes are crashing randomly.
It is not the case that we have crashes only for node 1.

Also, when I call procedure manually it is executed most of the times without any problem.
When scheduled automatically one node crashes almost always.

Regards,
Marko

Comment by Elena Stepanova [ 2018-12-29 ]

I think it might be the same problem as MDEV-17839. The stack trace (visible parts of it) is similar, pure virtual method called is thrown, the problem occurs on concurrent ALTER TABLE and DML. The difference is that the test case in MDEV-17839 doesn't use partitioning and the concurrent query is UPDATE rather than SELECT, but I suppose it can be tweaked to increase similarity further.

Comment by Elena Stepanova [ 2018-12-29 ]

varun, after MDEV-17839 is fixed, we will need to ask Sutic to check whether the same patch fixes his variation of the problem as well.

Comment by Marko [ 2019-01-28 ]

Hello Elena,

I have added specified optimizer switch to the procedure which is used for adding partitions.
Procedure is executed automatically using events and two times everything worked fine, but third time Node3 crashed.
Event for adding partitions is enabled on Node1.

Below is the error... I have excluded mentined query but it is similar to the query which i sent in initial post of this thread.

Error:

2019-01-28 18:00:04 140095848867584 [Warning] Sort aborted, host: 10.4.0.6, user: my_user thread: 2006148, query: <<< QUERY >>>
190128 18:00:04 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.2.16-MariaDB-1:10.2.16+maria~bionic-log
key_buffer_size=16777216
read_buffer_size=2097152
max_used_connections=44
max_threads=8002
thread_count=105
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 17599632 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f6a2804a288
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f6a9b4e5dd8 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5593763b03ee]
/usr/sbin/mysqld(handle_fatal_signal+0x5a3)[0x559375e37343]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f77f0484890]
/usr/sbin/mysqld(bitmap_get_first_set+0xc)[0x5593763a5e1c]
/usr/sbin/mysqld(+0xb97faf)[0x559376378faf]
/usr/sbin/mysqld(_ZN7handler16ha_external_lockEP3THDi+0x204)[0x559375e40bd4]
/usr/sbin/mysqld(_Z24mysql_unlock_read_tablesP3THDP13st_mysql_lock+0xbc)[0x559375f0b1bc]
/usr/sbin/mysqld(_ZN4JOIN9join_freeEv+0x1a0)[0x559375cdea10]
/usr/sbin/mysqld(_ZN4JOIN10exec_innerEv+0x9c1)[0x559375cf38a1]
/usr/sbin/mysqld(_ZN4JOIN4execEv+0x33)[0x559375cf3b43]
/usr/sbin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x1ec)[0x559375cf3d8c]
/usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x197)[0x559375cf4617]
/usr/sbin/mysqld(+0x4b82d1)[0x559375c992d1]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x64dc)[0x559375ca5dfc]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x2d2)[0x559375ca85f2]
/usr/sbin/mysqld(+0x4c7dc2)[0x559375ca8dc2]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1b34)[0x559375caaf34]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x180)[0x559375cabf30]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x20a)[0x559375d70a1a]
/usr/sbin/mysqld(handle_one_connection+0x3d)[0x559375d70bed]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f77f04796db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f77efa7b88f]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
<<< QUERY >>>
 
Connection ID (thread ID): 2006148
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Regards,
Marko

Comment by Varun Gupta (Inactive) [ 2019-03-14 ]

Sutic can you check if your issue is fixed in MariaDB 10.2.22

Comment by Marko [ 2019-03-14 ]

Hello,

We have patched MariaDB to 10.2.22 and will let you know if we expirience error while automatically adding new partitions.

Regards,
Marko

Comment by Marko [ 2019-03-29 ]

Hello Varun,

it looks like the issue is fixed on 10.2.22.
Since patching we din't have any problems with nodes crashing.

I think this ticket should be closed.

Regards,
Marko

Comment by Varun Gupta (Inactive) [ 2019-04-11 ]

This is fixed by MDEV-14440

Comment by Marko [ 2019-04-15 ]

Hello Varun,

we are now occasionally receiving error "[ERROR] Got error 128 when reading table..." on one of the nodes.
This error happens at the time when we add new partitions.

Can you please explain how can we trace this issue?
As I can see partitions are normally created and I don't see any problems beside this errors in logfile.

Thanks,
Marko

Generated at Thu Feb 08 08:39:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.