[MDEV-5167] Complex DELETE caused mariadb-galera-cluster node to abend with Signal 11
Created: 2013-10-22  Updated: 2015-03-12  Resolved: 2015-03-12

Status: Closed
Project: MariaDB Server
Component/s: wsrep
Affects Version/s: 5.5.33a-galera
Fix Version/s: 10.0.17-galera

Type: Bug
Priority: Major
Reporter: Jeff Armstrong
Assignee: Nirbhay Choubey (Inactive)
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Environment: Debian Wheezy

Description

The following DELETE causes mariadb-galera-server to crash with signal 11 (SIGSEGV) and sometimes signal 6 (SIGABRT). The server crash log follows the statement. The same DELETE succeeds when it is broken into smaller statements.

DELETE FROM period WHERE process IN ('reconcile','statistics') AND (
  (item_period < '20130830' AND rec_id IN ('10002607', '300032001')) OR
  (item_period < '20130902' AND rec_id IN ('10002530', '10002598', '10003238', '290032001')) OR
  (item_period < '20130903' AND rec_id IN ('10000343', '10828288', '260032001')) OR
  (item_period < '20130905' AND rec_id IN ('10000854', '10002447', '10002472', '10002550', '10002561', '10003120', '100032001', '10003409', '10004172', '10004555', '10004858', '10004861', '10023903', '10032001', '10085903', '110032001', '110132001', '11579596', '11579769', '120032001', '130032001', '150032001', '160032001', '170032001', '190032001', '190132001', '20032001', '250032001', '320051001', '320080001', '320081001', '320082001', '40032001', '60032001', '70032001', '90032001')) OR
  (item_period < '20130905' AND item_period NOT IN ('20130801', '20130802', '20130805', '20130806', '20130807', '20130808', '20130809', '20130812', '20130813', '20130814', '20130815', '20130816', '20130819', '20130820', '20130821', '20130822', '20130823', '20130826', '20130827') AND
    rec_id IN ('11301288')) OR
  (item_period < '20130906' AND rec_id IN ('10001410')) OR
  (item_period < '20130926' AND rec_id IN ('10000346', '10000347', '10002716', '10003510')) OR
  (item_period < '20130930' AND rec_id IN ('10000374', '10000375', '10001810', '10001811', '10750288', '10836288', '10842288', '10966288', '11140288', '11141288', '11142288', '11144288', '11145288', '11146288', '11147288', '11148288', '11149288', '11150288', '11151288', '11152288', '11153288', '11169288', '11170288', '11231288', '11232288', '11237288', '11238288', '11239288', '11483288', '11484288', '11579609', '11579650', '11579700', '11579701', '11579739', '250032004', '250032005', '70032004', '70032005', '90032002', '90032004')) OR
  (item_period < '20131004' AND rec_id IN ('10001418', '10003519', '11579328')) OR
  (item_period < '20131007' AND rec_id IN ('11172288', '300032002')) OR
  (item_period < '20131008' AND rec_id IN ('10001839', '10002531', '10002600', '10003239', '10105903', '10523172', '10749288', '10784288', '10835288', '10837288', '11049288', '11426288', '11485288', '11486288', '11579302', '11579303', '11579387', '11579389', '11579604', '11579638', '11579740', '11579849', '290032002')) OR
  (item_period < '20131009' AND rec_id IN ('10000344', '10829288', '260032002')) OR
  (item_period < '20131010' AND rec_id IN ('10000855', '10002448', '10002474', '10002549', '10002560', '10003119', '10003408', '10004171', '10004556', '10004859', '10004862', '10022903', '10032002', '10084903', '110032002', '110132002', '11579625', '11579770', '120032002', '130032002', '150032002', '160032002', '170032002', '190032002', '190132002', '20032002', '250032002', '320051002', '320080002', '320081002', '320082002', '40032002', '60032002', '70032002', '90032005')) OR
  (item_period < '20131010' AND item_period NOT IN ('20130801', '20130802', '20130805', '20130806', '20130807', '20130808', '20130809', '20130812', '20130813', '20130814', '20130815', '20130816', '20130819', '20130820', '20130821', '20130822', '20130823', '20130826', '20130827') AND rec_id IN ('11300288'))
)

The server log at the time of the crash:

mysqld: 131021 21:53:14 [ERROR] mysqld got signal 11 ;
mysqld: This could be because you hit a bug. It is also possible that this binary
mysqld: or one of the libraries it was linked against is corrupt, improperly built,
mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
mysqld:
mysqld: To report this bug, see http://kb.askmonty.org/en/reporting-bugs
mysqld:
mysqld: We will try our best to scrape up some info that will hopefully help
mysqld: diagnose the problem, but since we have already crashed,
mysqld: something is definitely wrong and this may fail.
mysqld:
mysqld: Server version: 5.5.33a-MariaDB-1~wheezy-log
mysqld: key_buffer_size=536870912
mysqld: read_buffer_size=2097152
mysqld: max_used_connections=501
mysqld: max_threads=10002
mysqld: thread_count=143
mysqld: It is possible that mysqld could use up to
mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 62159406 K  bytes of memory
mysqld: Hope that's ok; if not, decrease some variables in the equation.
mysqld:
mysqld: Thread pointer: 0x0x7f0c92b45000
mysqld: Attempting backtrace. You can use the following information to find out
mysqld: where mysqld died. If you see no messages after this, something went
mysqld: terribly wrong...
mysqld: stack_bottom = 0x7f27c7cccdb0 thread_stack 0x48000
mysqld: ??:0(??)[0x7f8bcd8a3e5b]
mysqld: ??:0(??)[0x7f8bcd4d58d2]
mysqld: ??:0(??)[0x7f8bccbb3030]
mysqld: ??:0(??)[0x7f8bcd5aef34]
mysqld: ??:0(??)[0x7f8bcd5b8720]
mysqld: ??:0(??)[0x7f8bcd5dfaa8]
mysqld: ??:0(??)[0x7f8bcd390e78]
mysqld: ??:0(??)[0x7f8bcd3943d7]
mysqld: ??:0(??)[0x7f8bcd3947cf]
mysqld: ??:0(??)[0x7f8bcd39652a]
mysqld: ??:0(??)[0x7f8bcd396c59]
mysqld: ??:0(??)[0x7f8bcd44994b]
mysqld: ??:0(??)[0x7f8bcd4499f1]
mysqld: ??:0(??)[0x7f8bccbaab50]
mysqld: ??:0(??)[0x7f8bcb4cea7d]
mysqld:
mysqld: Trying to get some variables.
mysqld: Some pointers may be invalid and cause the dump to abort.
mysqld: Query (0x7f00bddb2018): is an invalid pointer
mysqld: Connection ID (thread ID): 1044736
mysqld: Status: NOT_KILLED
mysqld:
mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engin
mysqld:
mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
mysqld: information that should help you find out what is causing the crash.



Comments
Comment by Elena Stepanova [ 2013-10-22 ]

Could you please paste SHOW CREATE and SHOW INDEX for the table `period`?

Thanks.

Comment by Jeff Armstrong [ 2013-10-24 ]

MariaDB [sbld_testeda]> show create table period;
CREATE TABLE `period` (
  `client_id` int(10) unsigned NOT NULL,
  `type` char(10) NOT NULL,
  `subtype` varchar(20) NOT NULL,
  `item_period` int(8) unsigned NOT NULL,
  `relation_id` int(10) unsigned NOT NULL,
  `keep_until` int(8) unsigned NOT NULL,
  `rec_id` int(10) unsigned NOT NULL,
  `process` varchar(30) NOT NULL,
  `started` datetime NOT NULL,
  `deleted_ind` char(1) NOT NULL,
  `updated` datetime NOT NULL,
  PRIMARY KEY (`client_id`,`type`,`item_period`,`relation_id`,`rec_id`,`process`,`subtype`),
  KEY `rec_id` (`rec_id`,`item_period`),
  KEY `process` (`process`,`item_period`,`type`,`subtype`),
  KEY `type` (`type`,`process`,`item_period`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 PACK_KEYS=1

MariaDB [sbld_testeda]> show index in period;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| period |          0 | PRIMARY  |            1 | client_id   | A         |           8 |     NULL | NULL   |      | BTREE      |         |               |
| period |          0 | PRIMARY  |            2 | type        | A         |          48 |     NULL | NULL   |      | BTREE      |         |               |
| period |          0 | PRIMARY  |            3 | item_period | A         |         293 |     NULL | NULL   |      | BTREE      |         |               |
| period |          0 | PRIMARY  |            4 | relation_id | A         |         293 |     NULL | NULL   |      | BTREE      |         |               |
| period |          0 | PRIMARY  |            5 | rec_id      | A         |         293 |     NULL | NULL   |      | BTREE      |         |               |
| period |          0 | PRIMARY  |            6 | process     | A         |         293 |     NULL | NULL   |      | BTREE      |         |               |
| period |          0 | PRIMARY  |            7 | subtype     | A         |         293 |     NULL | NULL   |      | BTREE      |         |               |
| period |          1 | rec_id   |            1 | rec_id      | A         |           2 |     NULL | NULL   |      | BTREE      |         |               |
| period |          1 | rec_id   |            2 | item_period | A         |         293 |     NULL | NULL   |      | BTREE      |         |               |
| period |          1 | process  |            1 | process     | A         |           5 |     NULL | NULL   |      | BTREE      |         |               |
| period |          1 | process  |            2 | item_period | A         |         293 |     NULL | NULL   |      | BTREE      |         |               |
| period |          1 | process  |            3 | type        | A         |         293 |     NULL | NULL   |      | BTREE      |         |               |
| period |          1 | process  |            4 | subtype     | A         |         293 |     NULL | NULL   |      | BTREE      |         |               |
| period |          1 | type     |            1 | type        | A         |          32 |     NULL | NULL   |      | BTREE      |         |               |
| period |          1 | type     |            2 | process     | A         |          48 |     NULL | NULL   |      | BTREE      |         |               |
| period |          1 | type     |            3 | item_period | A         |         293 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Comment by Elena Stepanova [ 2013-11-01 ]

I tried to reproduce the problem on some artificial data, but no luck so far.
Are you getting the crash in a single-node setup, or is it a cluster? Does it happen on the node where the DELETE is initially performed, or on some other nodes when it attempts to replicate it?

Comment by Jeff Armstrong [ 2013-11-02 ]

The crash occurs in a cluster of two nodes plus one garbd arbitrator. It seems to occur immediately on the node on which the SQL is executed; the other node continues to run. When the crashed node restarts, it performs IST/SST to recover, and the deletes have not been applied on any node.

I will try to pin it down further for you. For example, I will see whether I can cause the crash with wsrep_on=off, but this may take a few weeks, as I will have to do it over weekends.

Regards
Jeff

Comment by Nirbhay Choubey (Inactive) [ 2014-12-23 ]

mariadb@aquabolt.com Did you notice this error on later versions?

Comment by Jeff Armstrong [ 2014-12-23 ]

After running our trial for about 10 months, we decided that the MariaDB + Galera combination was not suitable for our specific requirements, so I no longer have a full-sized environment in which to repeat the test for you.

The failure was consistent and repeatable, and testing showed that it was directly related to the number of clauses in the WHERE condition. We modified our SQL handler to split complex statements (those with more than n clauses) into multiple statements, which worked around the issue in our application layer. After that change we ran for over three months without a recurrence.

Regards,
Jeff
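
The statement splitting described in this comment could look roughly like the following Python sketch. It is not the reporter's actual handler, which is not shown in this ticket: the function name `split_or_branches`, the `max_branches` threshold, and the assumption that the statement is a fixed prefix followed by a list of OR-ed branches are all illustrative.

```python
from typing import List


def split_or_branches(prefix: str, branches: List[str], max_branches: int) -> List[str]:
    """Build several statements, each OR-ing at most max_branches predicates.

    prefix       -- hypothetical: the part of the statement shared by all chunks
    branches     -- hypothetical: the individual OR-ed predicates, without parentheses
    max_branches -- hypothetical threshold (the "n" mentioned in the comment)
    """
    if max_branches < 1:
        raise ValueError("max_branches must be at least 1")
    statements = []
    for start in range(0, len(branches), max_branches):
        chunk = branches[start:start + max_branches]
        # Parenthesise every branch so AND/OR precedence is preserved.
        where = " OR ".join(f"({branch})" for branch in chunk)
        statements.append(f"{prefix} AND ({where})")
    return statements


# Three branches taken from the DELETE in the description, two per statement.
prefix = "DELETE FROM period WHERE process IN ('reconcile','statistics')"
branches = [
    "item_period < '20130830' AND rec_id IN ('10002607', '300032001')",
    "item_period < '20130906' AND rec_id IN ('10001410')",
    "item_period < '20131007' AND rec_id IN ('11172288', '300032002')",
]
statements = split_or_branches(prefix, branches, max_branches=2)
for statement in statements:
    print(statement)
```

Because the branches are OR-ed, running the chunks one after another deletes the same set of rows as the single statement, provided no other session changes the table in between. The chunks are no longer one atomic statement unless the caller wraps them in a transaction.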

Comment by Nirbhay Choubey (Inactive) [ 2014-12-23 ]

Ok, thanks!

Comment by Nirbhay Choubey (Inactive) [ 2014-12-23 ]

BTW, do you have the error log by any chance? I would be interested in the log around signal 6.
