[MDEV-10419] crash in mariadb 10.1.16-MariaDB-1~trusty Created: 2016-07-22  Updated: 2016-09-02  Resolved: 2016-07-30

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.1.16, 5.5, 10.0, 10.1
Fix Version/s: 5.5.51, 10.1.17, 10.0.27, 10.2.2

Type: Bug
Priority: Major
Reporter: Sander Pilon
Assignee: Oleksandr Byelkin
Resolution: Fixed
Votes: 0
Labels: None

Attachments: File dump.sql    
Sprint: 10.2.2-3

 Description   

After an upgrade to 10.1.16, MariaDB started crashing (again) ... a lot.

160722 23:14:03 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.16-MariaDB-1~trusty
key_buffer_size=16777216
read_buffer_size=131072
max_used_connections=23
max_threads=502
thread_count=13
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1118998 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7f57a83d6008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f61375fedf0 thread_stack 0x80000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f65391b28fe]
/usr/sbin/mysqld(handle_fatal_signal+0x2d5)[0x7f6538cd9e75]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f653722f330]
/usr/sbin/mysqld(+0x4812d4)[0x7f6538baa2d4]
/usr/sbin/mysqld(_ZN13st_select_lex5printEP3THDP6String15enum_query_type+0x4b2)[0x7f6538baac02]
/usr/sbin/mysqld(_ZN14Item_subselect5printEP6String15enum_query_type+0x5f)[0x7f6538d69faf]
/usr/sbin/mysqld(_ZN22Item_func_conv_charset5printEP6String15enum_query_type+0x44)[0x7f6538d57e64]
/usr/sbin/mysqld(_ZN9Item_func10print_argsEP6Stringj15enum_query_type+0x4f)[0x7f6538d3a9df]
/usr/sbin/mysqld(_ZN9Item_func5printEP6String15enum_query_type+0x4d)[0x7f6538d3aabd]
/usr/sbin/mysqld(_ZN9Item_func10print_argsEP6Stringj15enum_query_type+0x4f)[0x7f6538d3a9df]
/usr/sbin/mysqld(_ZN9Item_func5printEP6String15enum_query_type+0x4d)[0x7f6538d3aabd]
/usr/sbin/mysqld(_ZN4Item17print_item_w_nameEP6String15enum_query_type+0x1c)[0x7f6538cf0fdc]
/usr/sbin/mysqld(_ZN13st_select_lex5printEP3THDP6String15enum_query_type+0x226)[0x7f6538baa976]
/usr/sbin/mysqld(_ZN18st_select_lex_unit5printEP6String15enum_query_type+0x57)[0x7f6538b4c267]
/usr/sbin/mysqld(_Z29mysqld_show_create_get_fieldsP3THDP10TABLE_LISTP4ListI4ItemEP6String+0x302)[0x7f6538bb9eb2]
/usr/sbin/mysqld(_Z18mysqld_show_createP3THDP10TABLE_LIST+0xc0)[0x7f6538bba8e0]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x3fbc)[0x7f6538b5aecc]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x26d)[0x7f6538b6073d]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x2460)[0x7f6538b63a80]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x169)[0x7f6538b64239]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x18a)[0x7f6538c2889a]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x7f6538c28a70]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f6537227184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f653674637d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f57a70fe020): is an invalid pointer
Connection ID (thread ID): 478

For my config, see MDEV-9674.

Downgrading to 10.1.14 solved the crashing.



 Comments   
Comment by Elena Stepanova [ 2016-07-22 ]

CrewOne,

What exactly do you mean by "started again"? Did MDEV-9674 stop happening for you at some point, or did you downgrade to some other version and have now upgraded again?
From which version did you upgrade this time? Did anything change besides the version – workflow, configuration, dataset?

I think at some point you enabled general_log, do you still have it enabled? If so, can you find there the last query from connection 478 right before the crash?

Comment by Sander Pilon [ 2016-07-22 ]

Hi Elena,

We have been crash-free for a while now, a few months. I think you yourself found the offending query in MDEV-9674; we removed all queries like that from the code and it has been more or less fine since then. Is it fixed? Probably not, I guess. But we have a work-around in place.

I'm afraid that I can't replicate this bug for you at the moment. By that I mean that I have no resources to replicate a server, and that the crashing is immediate and severe so I daren't upgrade to 10.1.16 again at the moment.
(Downgrading to 10.1.14 was quite a pain as well... manually downloading all the packages and installing them.)

Nothing changed; all I did was an "apt-get upgrade", and after that it started crashing immediately. Really nothing changed since the end of 2016, as far as server setup and config files go.

And by "again" I also mean MDEV-9560 and MDEV-9728.

Comment by Elena Stepanova [ 2016-07-22 ]

CrewOne, thanks for clarification.

Regarding the downgrade, for future reference: our mirrors usually have repos for several previous versions. E.g. if you normally use the repo
http://mirror.netinch.com/pub/mariadb/repo/10.1/ubuntu
then 10.1.14 could be installed from
http://mirror.netinch.com/pub/mariadb/mariadb-10.1.14/repo/ubuntu/

I can't promise it applies to all mirrors, but it should work for most.
Hopefully you'll never need it again, but in case you do, maybe it will smooth things a bit.

Comment by Elena Stepanova [ 2016-07-23 ]

CrewOne,

Would you be able to provide the schemata dump (structures only, no data)? Something like mysqldump -A -d -R --triggers – it shouldn't be too big.

Also, could you please attach the error log or at least check yourself if there is anything else there after the Connection ID line? There has been a change recently so that the server might print something more useful than invalid pointer. It might look strange or even corrupted, but it's better to have it anyway.

Comment by Elena Stepanova [ 2016-07-28 ]

CrewOne, thank you very much for the dump, I found the view and was able to reproduce the problem.

The culprit was a view whose definition contains a subquery and references a non-existent function. That is no excuse for a crash, of course; the server should just throw a warning.

The problem appeared in 5.5 tree (and was later merged up) with the following revision:

commit 79f852a069fb6ba5e18fd66ea2a24fa91c245c24
Author: Oleksandr Byelkin <sanja@mariadb.com>
Date:   Wed Jun 22 14:17:06 2016 +0200
 
    MDEV-10050: Crash in subselect
    
    thd should not be taken earlier then fix_field and reset on fix_fields if it is needed.

Test case

CREATE TABLE t1 (c1 CHAR(13));
CREATE TABLE t2 (c2 CHAR(13));
 
CREATE FUNCTION f() RETURNS INT RETURN 0;
CREATE OR REPLACE VIEW v1 AS select f() from t1 where c1 in (select c2 from t2);
DROP FUNCTION f;
 
SHOW CREATE VIEW v1;
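
On a build that contains the fix, the same statement is expected to return the view definition together with a warning rather than crash, as noted above. A minimal follow-up sketch, assuming the test case above has just been run in the current schema; the exact warning text is not reproduced in this report, so none is shown here:

```sql
-- Assumes v1, t1 and t2 from the test case above exist and f() has been dropped.
SHOW CREATE VIEW v1;   -- should return the definition instead of crashing
SHOW WARNINGS;         -- the missing function f() should be reported here

-- Clean up the objects created by the test case.
DROP VIEW v1;
DROP TABLE t1, t2;
```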

Stack trace from 5.5 commit 1b5da2ca49f69605ccfe4d98e9207e7b8551e21f

#3  <signal handler called>
#4  0x000000000056b6b6 in Query_arena::alloc (this=0xa5a5a5a5a5a5a5bd, size=8) at /data/src/5.5/sql/sql_class.h:771
#5  0x000000000066b297 in print_join (thd=0xa5a5a5a5a5a5a5a5, eliminated_tables=0, str=0x7f03712de320, tables=0x7f0369db65d8, query_type=QT_ORDINARY) at /data/src/5.5/sql/sql_select.cc:23169
#6  0x000000000066bfa7 in st_select_lex::print (this=0x7f0369db6460, thd=0xa5a5a5a5a5a5a5a5, str=0x7f03712de320, query_type=QT_ORDINARY) at /data/src/5.5/sql/sql_select.cc:23449
#7  0x000000000084e022 in subselect_single_select_engine::print (this=0x7f0369d98810, str=0x7f03712de320, query_type=QT_ORDINARY) at /data/src/5.5/sql/item_subselect.cc:3749
#8  0x000000000084635e in Item_subselect::print (this=0x7f0369d98660, str=0x7f03712de320, query_type=QT_ORDINARY) at /data/src/5.5/sql/item_subselect.cc:891
#9  0x000000000084b4ac in Item_in_subselect::print (this=0x7f0369d98660, str=0x7f03712de320, query_type=QT_ORDINARY) at /data/src/5.5/sql/item_subselect.cc:2649
#10 0x000000000066c060 in st_select_lex::print (this=0x7f0369d49048, thd=0x7f036ae74060, str=0x7f03712de320, query_type=QT_ORDINARY) at /data/src/5.5/sql/sql_select.cc:23468
#11 0x00000000005f735e in st_select_lex_unit::print (this=0x7f0369d48968, str=0x7f03712de320, query_type=QT_ORDINARY) at /data/src/5.5/sql/sql_lex.cc:2388
#12 0x0000000000679deb in view_store_create_info (thd=0x7f036ae74060, table=0x7f0369d48150, buff=0x7f03712de320) at /data/src/5.5/sql/sql_show.cc:2087
#13 0x0000000000676880 in mysqld_show_create (thd=0x7f036ae74060, table_list=0x7f0369d48150) at /data/src/5.5/sql/sql_show.cc:1032
#14 0x00000000006042ea in mysql_execute_command (thd=0x7f036ae74060) at /data/src/5.5/sql/sql_parse.cc:2808
#15 0x000000000060c692 in mysql_parse (thd=0x7f036ae74060, rawbuf=0x7f0369d48078 "SHOW CREATE VIEW v1", length=19, parser_state=0x7f03712df650) at /data/src/5.5/sql/sql_parse.cc:5934
#16 0x00000000006003a7 in dispatch_command (command=COM_QUERY, thd=0x7f036ae74060, packet=0x7f036bb4e061 "SHOW CREATE VIEW v1", packet_length=19) at /data/src/5.5/sql/sql_parse.cc:1079
#17 0x00000000005ff561 in do_command (thd=0x7f036ae74060) at /data/src/5.5/sql/sql_parse.cc:793
#18 0x00000000007016df in do_handle_one_connection (thd_arg=0x7f036ae74060) at /data/src/5.5/sql/sql_connect.cc:1270
#19 0x000000000070146c in handle_one_connection (arg=0x7f036ae74060) at /data/src/5.5/sql/sql_connect.cc:1186
#20 0x00000000009435ab in pfs_spawn_thread (arg=0x7f036bb7a300) at /data/src/5.5/storage/perfschema/pfs.cc:1015
#21 0x00007f0370f1b0a4 in start_thread (arg=0x7f03712e0700) at pthread_create.c:309
#22 0x00007f036f34187d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Comment by Sander Pilon [ 2016-07-28 ]

Ok... so which view/function should we delete and would you say it's safe to upgrade after we remove them?

Comment by Elena Stepanova [ 2016-07-28 ]

CrewOne,

The view that I found and used was `Manpo`.`Boeken_in_ramsj`, and the missing function was `calc_prijs`. But I see that some other views reference a function of the same name (although maybe from a different schema) as well, and in general the schema has hundreds of views, with complicated structure and lots of referenced objects. If one is missing, I suppose it's possible that some others are missing too, or could go missing during operation. So, I don't really think it's safe to upgrade after removing this view alone. If there is no pressing need to upgrade, and 10.1.14 works well for you, I'd rather wait. If you do need to upgrade ASAP, I can try to see which views are affected in the dump, but there is no guarantee that other views won't become affected if yet another object is dropped later.
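
One way to look for other views with missing referenced objects is to ask the server which views it currently considers invalid. The following is a minimal sketch, not something that was run as part of this report. It assumes that information_schema.TABLES reports the "references invalid ..." error in TABLE_COMMENT for broken views. It has not been verified whether these statements avoid the crash on affected versions, so they are best tried on a copy of the schema or on 10.1.14.

```sql
-- List views whose TABLE_COMMENT carries an "invalid" reference error
-- (assumption: the server stores the error text there for broken views).
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_COMMENT
FROM information_schema.TABLES
WHERE TABLE_TYPE = 'VIEW'
  AND TABLE_COMMENT LIKE '%invalid%';

-- Check a single view by name (the view identified in this comment).
CHECK TABLE `Manpo`.`Boeken_in_ramsj`;
```

Any view returned by the first query is a candidate for the same problem. As said above, an empty result today does not guarantee that no view becomes affected later.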

Comment by Sander Pilon [ 2016-07-28 ]

Elena,

Thanks. I'll wait

Comment by Oleksandr Byelkin [ 2016-07-29 ]

revision-id: b5327887096f9f98a87d724163402748ca4f1fb8 (mariadb-5.5.50-15-gb532788)
parent(s): 15ef38d2ea97575c71b83db6669ee20000c23a6b
committer: Oleksandr Byelkin
timestamp: 2016-07-29 18:21:08 +0200
message:

MDEV-10419: crash in mariadb 10.1.16-MariaDB-1~trusty

Fixed initialization and usage of THD reference in subselect engines.

Comment by Sergei Petrunia [ 2016-07-30 ]

sanja, ok to push

Generated at Thu Feb 08 07:42:05 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.