[MDEV-714] LP:1020645 - crash (sig 11) with union query Created: 2012-07-03  Updated: 2014-12-03  Resolved: 2014-02-14

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.4, 5.3.12, 5.5.33a
Fix Version/s: 5.5.36, 10.0.9, 5.3.13

Type: Bug Priority: Major
Reporter: Peter (Stig) Edwards (Inactive) Assignee: Sergei Golubchik
Resolution: Fixed Votes: 1
Labels: Launchpad

Attachments: XML File LPexportBug1020645.xml    
Issue Links:
Relates
relates to MDEV-7256 SEGV in JOIN::exec() during normal sh... Closed

 Description   

Hello and thanks for mariadb-5.3.7-Linux-x86_64.tar.gz,

Running on 2.6.18-274.7.1.el5 x86_64 x86_64 x86_64 GNU/Linux RedHat EL.

We recently upgraded some mysqld instances in a pool from MySQL 5.1.X to MariaDB 5.3.7. We have had two crashes (on different instances on different hosts) with the same backtrace and very similar queries. The pool is a production pool, so the priority has been restoration of service and a rollback. The query from the first backtrace did not cause a crash when run in isolation on our development and staging instances (I have not tested the second yet). The development and staging instances do not have identical configurations, data or queries running, but are the same version (MariaDB 5.3.7) on the same OS on the same architecture. Also, both MariaDB instances ran for several days in production before crashing.

Here are the contents of the error log for the first crash (the backtrace is the same for the second crash and the query is very similar). I have removed the actual query reported and have included a representation of it instead; I can send the actual query and table definitions privately.

I am wondering if (and hoping that) the backtrace looks familiar.

120627  5:20:14 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 5.3.7-MariaDB-log
key_buffer_size=33554432
read_buffer_size=2097152
max_used_connections=319
max_threads=3001
thread_count=29
connection_count=29
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18510029 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x2ab7fc3499b0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x512fc0f8 thread_stack 0x48000
./bin/mysqld(my_print_stacktrace+0x2e) [0xa2b62e]
./bin/mysqld(handle_fatal_signal+0x3f9) [0x7627f9]
/lib64/libpthread.so.0 [0x341f00eb70]
./bin/mysqld [0x6be068]
./bin/mysqld(JOIN::exec()+0x852) [0x6cfc52]
./bin/mysqld(st_select_lex_unit::exec()+0x184) [0x7ab904]
./bin/mysqld(mysql_union(THD*, st_lex*, select_result*, st_select_lex_unit*, unsigned long)+0x2e) [0x7add0e]
./bin/mysqld(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x82) [0x6d2632]
./bin/mysqld [0x647e7e]
./bin/mysqld(mysql_execute_command(THD*)+0x3a58) [0x64da78]
./bin/mysqld(mysql_parse(THD*, char*, unsigned int, char const**)+0x299) [0x650859]
./bin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0xa9b) [0x65174b]
./bin/mysqld(do_command(THD*)+0x101) [0x6522a1]
./bin/mysqld(handle_one_connection+0xfd) [0x6436ad]
/lib64/libpthread.so.0 [0x341f00673d]
/lib64/libc.so.6(clone+0x6d) [0x341e4d44bd]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x2ab7deb1abf8): 
(select many fields from a few tables with joins and inner joins group by sort by limit 5000)
UNION ALL 
(select many fields from a few tables with joins and inner joins group by sort by limit 5000) 
order by limit 5000
Connection ID (thread ID): 30483258
Status: KILL_CONNECTION
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

Thank you.



 Comments   
Comment by Elena Stepanova [ 2012-07-03 ]

Re: crashes (sig 11) with 5.3.7-MariaDB union query
Hi Peter,

According to the log, the connection is in KILL_CONNECTION status – were you trying to kill it because it was hung?
If so, it somewhat resembles bug #998516 (don't be fooled by the 'released' status, it was fixed after 5.3.7).
Otherwise, there is not much of a stack trace, but I will see what else we had with UNION recently.

Comment by Peter (Stig) Edwards (Inactive) [ 2012-07-03 ]

Re: crashes (sig 11) with 5.3.7-MariaDB union query
Not trying to kill the connection. I don't think there is anything in place doing that but I shall check our logs. The other crash also had KILL_CONNECTION status.
Thanks.

Comment by Peter (Stig) Edwards (Inactive) [ 2012-07-03 ]

Re: crashes (sig 11) with 5.3.7-MariaDB union query
I don't see any queries killing other queries or connections at around that time, nor anything in the binlog that looks like it might kill queries. We have a scheduled rollback of the remaining MariaDB pool member this week, and will probably try again with 5.3.8 when it is out and internally QA'ed. If the remaining 5.3.7 instance crashes before we roll it back, then I'll aim to at least try running the query from its backtrace before the rollback; otherwise we will try (again) to reproduce in development and staging environments. Thanks.

Comment by Elena Stepanova [ 2012-07-03 ]

Re: crashes (sig 11) with 5.3.7-MariaDB union query
Please do upload the query, table structures and my.cnf to the private FTP. If you can provide data dumps, it might help too; otherwise please at least give us an idea of how many rows the tables contain.
Thanks.

Comment by Peter (Stig) Edwards (Inactive) [ 2012-07-03 ]

Re: crashes (sig 11) with 5.3.7-MariaDB union query
I have uploaded the query, the table structures, row counts and the my.cnf to the private FTP; the filename has this bug number in it. I cannot provide a data dump right now, but may be able to when we do the rollback. It would be about 2GB of data. Thanks.

Comment by Elena Stepanova [ 2012-07-03 ]

Re: crashes (sig 11) with 5.3.7-MariaDB union query
Hi Peter,

You mentioned earlier that you tried the query on your dev/staging instances; but the query that comes with the backtrace that you provided is corrupted (all group by and a part of the order by in the first union part is gone, as well as the end of the query). Do you have the full version?

Comment by Peter (Stig) Edwards (Inactive) [ 2012-07-03 ]

Re: crashes (sig 11) with 5.3.7-MariaDB union query
Sorry, resent with the original full version. Thanks.

Comment by Rasmus Johansson (Inactive) [ 2012-07-04 ]

Launchpad bug id: 1020645

Comment by Peter (Stig) Edwards (Inactive) [ 2012-07-04 ]

Re: crashes (sig 11) with 5.3.7-MariaDB union query
I was able to take dumps of the tables (changes are frequent, so the data is not the same as at the point of the crash) and import them into a development instance. The development instance has the same my.cnf apart from the port and the innodb_buffer_pool_size (smaller). This was with the latest crash query, and it did not crash the development instance. Some of the data is sensitive/private and I am unable to send it to you without first changing much of it, so until I have a reproducer I will hold off on changing the data so that it can be sent.
Thanks for looking.

Comment by Elena Stepanova [ 2013-03-26 ]

Possibly the issue was related to MDEV-3217 (the stack trace is similar, as well as UNION etc.). Otherwise, not enough information to proceed.

Comment by Peter (Stig) Edwards [ 2013-06-07 ]

Hello and thanks for mariadb-5.3.12-Linux-x86_64.tar.gz

We have had two more crashes on production mysqld instances running 5.3.12 (based on mysql 5.1.67) on two different hosts, both RHEL5 x86_64. No reproducer yet. The hosts had each been up and running with a production load for several weeks.

The trace is the same as previously reported in this bug, which was the same database/application but with mariadb-5.3.7-Linux-x86_64.tar.gz (based on 5.1.62),
and for the latest 2 crashes the status is also "KILL_CONNECTION".

I was wondering if these stack traces might look familiar and be a bug fixed in 5.3.13 (or mysql 5.1.69).

http://bugs.mysql.com/bug.php?id=67707 - the stacktrace looks a little similar, as does https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=842052 and this https://bugzilla.redhat.com/show_bug.cgi?id=880104
http://bugs.mysql.com/bug.php?id=65971 - I don't have access to this.
http://dev.mysql.com/doc/relnotes/mysql/5.1/en/news-5-1-69.html - mentions a few fixes for KILL QUERY with InnoDB (the tables are InnoDB). From some of the comments in the Percona bug, I'm not sure if this was a regression introduced in 5.1.67 or earlier (the MySQL bugs are internal), but mysql.com bug 67707 is against RedHat 5.1.66, and RedHat bug 842052 is against RedHat 5.1.61 (I know RedHat may have backported security fixes into these versions).
http://bugs.mysql.com/bug.php?id=68051
http://bugs.mysql.com/bug.php?id=68928
https://bugs.launchpad.net/percona-server/+bug/1134757

Stack traces below, thank you.
(table names changed, fields removed and changed and values changed for public
bug report)

130522  4:25:30
 
Server version: 5.3.12-MariaDB-log
key_buffer_size=33554432
read_buffer_size=2097152
max_used_connections=726
max_threads=3001
thread_count=35
connection_count=35
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18510193 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x2ab8214cd180
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x588030f8 thread_stack 0x48000
./bin/mysqld(my_print_stacktrace+0x2e) [0x998a1e]
./bin/mysqld(handle_fatal_signal+0x3f9) [0x71ede9]
/lib64/libpthread.so.0 [0x35e940eb70]
./bin/mysqld [0x679018]
./bin/mysqld(JOIN::exec()+0x856) [0x68afb6]
./bin/mysqld(st_select_lex_unit::exec()+0x1b4) [0x768514]
./bin/mysqld(mysql_union(THD*, st_lex*, select_result*, st_select_lex_unit*, unsigned long)+0x2e) [0x76a91e]
./bin/mysqld(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x82) [0x68d9e2]
./bin/mysqld [0x601e9e]
./bin/mysqld(mysql_execute_command(THD*)+0x3a43) [0x607833]
./bin/mysqld(mysql_parse(THD*, char*, unsigned int, char const**)+0x299) [0x60a629]
./bin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0xdf9) [0x60b879]
./bin/mysqld(do_command(THD*)+0x101) [0x60c0b1]
./bin/mysqld(handle_one_connection+0xfd) [0x5fd7bd]
/lib64/libpthread.so.0 [0x35e940673d]
/lib64/libc.so.6(clone+0x6d) [0x35e88d44bd]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x2ab8216c2648): /* comment */ 
(select fields from docs1 left join ret1 using (id) inner join  ( select id from docs1 left join look1 using (id) where ((look1.id IN ('id1',...,'id1')) AND (look1.sdate <= 20130212)) group by id order by docs1.sdate desc, docs1.stime desc  limit 6000 ) AS tmp using (id)) 
UNION ALL 
(select fields from docs2 left join ret2 using (id) inner join  ( select id from docs2 left join look2 using (id) where ((lookup2.id IN ('id1',...'id2')) AND (look2.sdate >= 20120905)) group by id order by docs1.sdate desc, docs2.time desc  limit 6000 ) AS tmp using (id)) 
order by sdate desc, stime desc limit 6000
 
Connection ID (thread ID): 173717823
Status: KILL_CONNECTION
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

130606 16:07:30 
 
Server version: 5.3.12-MariaDB-log
key_buffer_size=33554432
read_buffer_size=2097152
max_used_connections=1612
max_threads=3001
thread_count=19
connection_count=19
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18510193 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x2ab767c53db0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x500e80f8 thread_stack 0x48000
./bin/mysqld(my_print_stacktrace+0x2e) [0x998a1e]
./bin/mysqld(handle_fatal_signal+0x3f9) [0x71ede9]
/lib64/libpthread.so.0 [0x32bc80eb70]
./bin/mysqld [0x679018]
./bin/mysqld(JOIN::exec()+0x856) [0x68afb6]
./bin/mysqld(st_select_lex_unit::exec()+0x1b4) [0x768514]
./bin/mysqld(mysql_union(THD*, st_lex*, select_result*, st_select_lex_unit*, unsigned long)+0x2e) [0x76a91e]
./bin/mysqld(handle_select(THD*, st_lex*, select_result*, unsigned long)+0x82) [0x68d9e2]
./bin/mysqld [0x601e9e]
./bin/mysqld(mysql_execute_command(THD*)+0x3a43) [0x607833]
./bin/mysqld(mysql_parse(THD*, char*, unsigned int, char const**)+0x299) [0x60a629]
./bin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0xdf9) [0x60b879]
./bin/mysqld(do_command(THD*)+0x101) [0x60c0b1]
./bin/mysqld(handle_one_connection+0xfd) [0x5fd7bd]
/lib64/libpthread.so.0 [0x32bc80673d]
/lib64/libc.so.6(clone+0x6d) [0x32bbcd44bd]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
 
Query (0x2ab7c74140e8): /* comment */ 
(select fields from docs1 inner join  ( select id from docs1 left join lookup1 using (id) where ((look1.lan IN ('id1',...,'id2')) AND (lookup1.sdate <= 20130301)) group by id limit 100 ) AS tmp using (id)) 
UNION ALL 
(select fields from docs2 inner join  ( select id from docs2 left join lookup2 using (id) where ((lookup2.lan IN ('id1',...,'id2')) AND (lookup2.sdate >= 20120801)) group by id limit 100 ) AS tmp using (id)) limit 100
 
Connection ID (thread ID): 294799704
Status: KILL_CONNECTION
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on
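
The backtraces in this report come from different builds (5.3.7 and 5.3.12), so the addresses and offsets differ even where the call chain is the same. As a rough sketch of how the "same trace" claim could be checked mechanically, the Python below strips addresses and offsets and compares only module and symbol names. The `normalize` helper and the frame pattern are illustrative assumptions, not part of any MariaDB tool, and the sample frames are copied from the logs above.

```python
import re
from typing import List, Optional, Tuple

# Matches frames such as "./bin/mysqld(JOIN::exec()+0x852) [0x6cfc52]"
# as well as unresolved frames such as "./bin/mysqld [0x6be068]".
FRAME_RE = re.compile(
    r"^(?P<module>\S+?)"
    r"(?:\((?P<symbol>.*)\+0x[0-9a-f]+\))?"
    r"\s+\[0x[0-9a-f]+\]$"
)


def normalize(trace: str) -> List[Tuple[str, Optional[str]]]:
    """Return (module, symbol) pairs, dropping offsets and addresses.

    Lines that do not look like stack frames are ignored.
    """
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("module"), match.group("symbol")))
    return frames


# Three frames from the 5.3.7 crash and the matching frames from 5.3.12.
crash_537 = """\
./bin/mysqld [0x6be068]
./bin/mysqld(JOIN::exec()+0x852) [0x6cfc52]
./bin/mysqld(st_select_lex_unit::exec()+0x184) [0x7ab904]
"""
crash_5312 = """\
./bin/mysqld [0x679018]
./bin/mysqld(JOIN::exec()+0x856) [0x68afb6]
./bin/mysqld(st_select_lex_unit::exec()+0x1b4) [0x768514]
"""

print(normalize(crash_537) == normalize(crash_5312))  # prints True
```

Matching symbol names only show that the call chains agree. The topmost mysqld frame is unresolved in every trace, so this comparison cannot say which function actually faulted.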

Comment by Elena Stepanova [ 2013-10-06 ]

The failure appeared in 5.3 tree with the following revision:

revno: 3049 [merge]
revision-id: igor@askmonty.org-20110616044838-qt4dju9ikwomwhh9
timestamp: Wed 2011-06-15 21:48:38 -0700
message:
  Merge of mwl #106 into 5.3.

RQG grammar to reproduce (mdev714.yy):

query_init:
        CREATE TABLE IF NOT EXISTS `t1` (`i` TINYINT) ENGINE=InnoDB ; INSERT INTO `t1` VALUES (1),(9),(5),(7),(5),(1),(NULL),(NULL),(8),(NULL) ;
 
thread2:
        KILL QUERY CONNECTION_ID() - 1 ;
 
query:
        DELETE FROM `t1` WHERE `i` BETWEEN _tinyint[invariant] AND _tinyint[invariant] + 2 |
        INSERT INTO `t1` (`i`) SELECT `i` FROM `t1` UNION SELECT `i` FROM `t1` |
        BEGIN ;
 

Command line (assuming the server is already running on port 3306):

perl ./gentest.pl --threads=2 --duration=300 --queries=100M --grammar=mdev714.yy --dsn=dbi:mysql:host=127.0.0.1:port=3306:user=root:database=test
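
For readers unfamiliar with RQG, here is a simplified Python sketch of the statements the grammar above produces. It is not RQG itself and does not connect to a server. It assumes that the `query` rule picks one of its three alternatives at random, that `_tinyint[invariant]` expands to one signed TINYINT value reused for both occurrences in a statement, and that the `thread2` rule replaces `query` for the second thread. The function names are hypothetical.

```python
import random

INSERT_UNION = (
    "INSERT INTO `t1` (`i`) SELECT `i` FROM `t1` "
    "UNION SELECT `i` FROM `t1`"
)
KILL_PREVIOUS = "KILL QUERY CONNECTION_ID() - 1"


def expand_query(rng: random.Random) -> str:
    """Pick one alternative of the `query` rule (assumed uniform choice)."""
    choice = rng.randrange(3)
    if choice == 0:
        # Assumption: [invariant] means the same value is used twice.
        low = rng.randint(-128, 127)
        return f"DELETE FROM `t1` WHERE `i` BETWEEN {low} AND {low} + 2"
    if choice == 1:
        return INSERT_UNION
    return "BEGIN"


def expand_thread2() -> str:
    """The second thread only ever issues the KILL statement."""
    return KILL_PREVIOUS


if __name__ == "__main__":
    rng = random.Random(714)  # fixed seed so runs are repeatable
    for _ in range(5):
        print("thread1:", expand_query(rng))
        print("thread2:", expand_thread2())
```

Under these assumptions the first thread keeps running UNION statements against an InnoDB table while the second thread repeatedly kills the query on the connection whose ID is one lower, presumably the first thread's. That matches the KILL_CONNECTION status seen in the production crash logs only loosely, since the grammar uses KILL QUERY.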

Comment by Peter (Stig) Edwards [ 2013-11-21 ]

I was wondering if this bug title should have 5.3.7 removed from it, as 5.5 and 10.0 are also affected? I know the "Affects Version/s:" field lists these versions, but just thought the title alone could be misleading. Hit it again today. Thanks.

Comment by Sergei Golubchik [ 2013-11-21 ]

I think having 5.5 and 10.0 listed in "Affected Versions" is clear enough. But ok, you're the reporter — as you like. Changed.

Comment by Sergei Golubchik [ 2014-01-25 ]

elenst, I've tried your rqg grammar for both 5.3 and 5.5, debug and optimized builds. No crashes.

Comment by Elena Stepanova [ 2014-01-31 ]

Still crashes for me (tried current 5.5 tree).
I'll set up the test on perro.

Comment by Elena Stepanova [ 2014-01-31 ]

I've set it up on perro, for 5.3 and 5.5, current trees, debug builds.

  • log in as usual;
    cd mdev714
    . ./run53
    or
    . ./run55

Logs, including coredumps, are in ./vardir53 and ./vardir55, correspondingly.

Comment by Peter (Stig) Edwards [ 2014-02-14 ]

Thank you.
We are still running 5.3.12 and waiting for 10 to be GA before starting on a roll out.
With this fix and other fixes made in 5.3.13 since 5.3.12 was released, I am keen to go to 5.3.13 before starting on 10.
I was wondering if any of the fixes in 5.3.13 would result in a 5.3.13 release before April this year, or if I should go with the latest revision in the 5.3 branch that is successful in the buildbot.
Thanks again.

Comment by Sergei Golubchik [ 2014-02-17 ]

If you absolutely want to stay with 5.3, I'd suggest grabbing the latest revision from buildbot.
But really, you should consider upgrading to 5.5. It may be problematic to upgrade directly to 10.0.

See, for example, MDEV-4860 — you cannot upgrade from InnoDB-5.1 to InnoDB-5.6 without upgrading to InnoDB-5.5 first. And MariaDB-5.3 uses InnoDB-5.1, MariaDB-10.0 uses InnoDB-5.6.
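
The stepwise upgrade described above can be sketched as a small lookup. The 5.3 and 10.0 entries come from the comment itself; the assumption that MariaDB 5.5 ships InnoDB 5.5 is implied by the advice but not stated, and `upgrade_path` is a hypothetical helper rather than an existing tool.

```python
# MariaDB series -> bundled InnoDB generation.
# "5.5": "5.5" is an assumption; the other two are stated in the comment.
INNODB_VERSION = {"5.3": "5.1", "5.5": "5.5", "10.0": "5.6"}

# InnoDB generations in the order they must be passed through.
INNODB_ORDER = ["5.1", "5.5", "5.6"]


def upgrade_path(source: str, target: str) -> list:
    """List the MariaDB series to install in turn, one per InnoDB step."""
    start = INNODB_ORDER.index(INNODB_VERSION[source])
    end = INNODB_ORDER.index(INNODB_VERSION[target])
    if start > end:
        raise ValueError("downgrades are not covered by this sketch")
    by_innodb = {innodb: mariadb for mariadb, innodb in INNODB_VERSION.items()}
    return [by_innodb[gen] for gen in INNODB_ORDER[start:end + 1]]


print(upgrade_path("5.3", "10.0"))  # prints ['5.3', '5.5', '10.0']
```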
