[MDEV-6970] MariaDB 10.0.13 crash while using partitioning Created: 2014-10-29  Updated: 2018-07-12  Resolved: 2018-07-12

Status: Closed
Project: MariaDB Server
Component/s: Partitioning
Affects Version/s: 10.0.13, 10.0
Fix Version/s: 10.0.36

Type: Bug Priority: Critical
Reporter: Sergii Assignee: Sergei Golubchik
Resolution: Fixed Votes: 5
Labels: None
Environment:

Oracle Linux 6.5 2.6.32-431.29.2.el6.x86_64
Server version: 10.0.13-MariaDB-wsrep-log MariaDB Server, wsrep_25.10.r4123


Attachments: File 02122014.sql     Text File 02122014.txt     Text File MySQLCurrentSettings.txt     File dump.sql     Text File log_event.txt     File mdev6464-my.cnf.single     File mdev6464-schema.sql     File mysql.err     Text File neominder_sqlcrashlog.txt     Text File procedures.txt     Text File show_create.txt     Text File show_index.txt     Text File show_status.txt    
Issue Links:
Relates
relates to MDEV-6464 MariaDB 10.0.12 crash while using Zabbix Closed
Sprint: 10.0.20, 10.0.26, 10.0.29

 Description   

141029  4:00:04 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,.
something is definitely wrong and this may fail.
 
Server version: 10.0.13-MariaDB-wsrep-log
key_buffer_size=2097152
read_buffer_size=2097152
max_used_connections=513
max_threads=514
thread_count=105
It is possible that mysqld could use up to.
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2118249 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7f8c75a70008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f8b15707ce0 thread_stack 0x48000
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xb9541b]
/usr/sbin/mysqld(handle_fatal_signal+0x398)[0x744b48]
/lib64/libpthread.so.0[0x379e80f710]
/usr/sbin/mysqld[0x94bc7e]
/usr/sbin/mysqld[0x8a25b7]
/usr/sbin/mysqld(_ZN7handler13ha_index_nextEPh+0x10c)[0x74987c]
/usr/sbin/mysqld(_ZN7handler15read_range_nextEv+0x29)[0x74a259]
/usr/sbin/mysqld[0xb6ffb5]
/usr/sbin/mysqld(_ZN7handler21multi_range_read_nextEPPv+0xb2)[0x6ca3a2]
/usr/sbin/mysqld(_ZN18QUICK_RANGE_SELECT8get_nextEv+0x52)[0x815632]
/usr/sbin/mysqld[0x8331d8]
/usr/sbin/mysqld(_Z10sub_selectP4JOINP13st_join_tableb+0x1b2)[0x602232]
/usr/sbin/mysqld[0x61931d]
/usr/sbin/mysqld(_ZN4JOIN10exec_innerEv+0x6da)[0x62be2a]
/usr/sbin/mysqld(_ZN4JOIN4execEv+0x11)[0x62e001]
/usr/sbin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x1dd)[0x62abcd]
/usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x28d)[0x62e35d]
/usr/sbin/mysqld[0x5d3329]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x543e)[0x5debee]
/usr/sbin/mysqld[0x5e1752]
/usr/sbin/mysqld[0x5e210b]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x16d0)[0x5e3c30]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x132)[0x5e4402]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x54b)[0x6a31cb]
/usr/sbin/mysqld(handle_one_connection+0x42)[0x6a32c2]
/lib64/libpthread.so.0[0x379e8079d1]
/lib64/libc.so.6(clone+0x6d)[0x379e4e886d]
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
141029 04:00:04 mysqld_safe Number of processes running now: 0
141029 04:00:04 mysqld_safe WSREP: not restarting wsrep node automatically
141029 04:00:04 mysqld_safe mysqld from pid file /nmc/mysql/mysql.pid ended



 Comments   
Comment by Sergii [ 2014-10-29 ]

Hi, I'm attached event, procedure that started at 4 am and file with variables that we used. Now we also have general log enabled for a logging new crashes.

Comment by Sergii [ 2014-11-03 ]

part of the general.log starting at 04:00:00 o'clock in the morning

Comment by Sergii [ 2014-11-03 ]

Hi, Elena.
I attached the log file with error and part of general.log while the crash appear.

Comment by Elena Stepanova [ 2014-11-04 ]

Hi Sergii,

Thank you.
So indeed, as discussed in MDEV-6464, there is a lot of activity at this time besides re-partitioning. Here is the query which apparently caused the crash:

SELECT * FROM history_uint h WHERE h.itemid='521083' AND h.clock>1414720810 ORDER BY h.clock DESC LIMIT 2 OFFSET 0

Possibly it's a race condition between SELECT and re-partitioning.
Interestingly, there is no re-partitioning in this log, I presume it's because of your Galera setup? You run the event on a different node?

Would you be able to provide a datadump (structure + data) of `history_uint` table? Preferably from the day when the crash happened, but if not – whichever you have.
If the data is too big or private, you can upload it to our ftp.askmonty.org/private.

Thanks.

Comment by Sergii [ 2014-11-05 ]

Hi Elena,

Now we use just one node. Also disable replication because we have problems with the launch event at the two nodes simultaneously. I can't attach a dump because partition size is too big.

Comment by Elena Stepanova [ 2014-11-05 ]

Hi Sergii,

Just for a note, you can easily solve the problem with launching event on several nodes by setting EVENT_SCHEDULER = OFF on all notes but one where you want it to be run.

It's strange though that there is no sign of the event activity in the general log you provided, I wonder how it can be if the even is run on this node...

If you can't provide the dump, please paste output of
SHOW CREATE TABLE history_uint (preferably from right after the crash, but if you don't have it, then the current one)
SHOW INDEX IN history_uint
SHOW TABLE STATUS LIKE 'history_uint'

Based on that, I'll try to generate artificial data.

Thanks.

Comment by Sergii [ 2014-11-07 ]

Hi Elena,

Thanks for the advice. I attached the file which you ask.

Comment by Elena Stepanova [ 2014-11-29 ]

Hi blacksmith,

Do you keep getting the crashes? If so, how often, and have you maybe noticed any other similarities?
So far I couldn't reproduce it, so whatever information you have that you think might be significant would be much appreciated.

Comment by Sergii [ 2014-12-01 ]

Hi Elena.
Now the situation has become worse. Crash occurs at a frequency several times a week. This is due to the increasing load on the base. I'm enabled general.log and based on its I will prepare statistics. Do you have any recommendations for us?

Also we have a new bug. Maybe it is not relevant to the current bug if so I will describe it in a new ticket:

2014-11-25 04:35:18 7f7a2e296700 InnoDB: error: return value 16 when calling
InnoDB: pthread_mutex_destroy().
InnoDB: Byte contents of the pthread mutex at 0x7f7c25ccb400:
len 48; hex 000091a8c8360000502530b40000ff71af06000000000091b4525464000000000000009e73e000000000000000000000; asc 6 P%0 q RTd s ;
2014-11-25 04:35:18 7f7a2e296700 InnoDB: Assertion failure in thread 140162737202944 in file os0sync.cc line 265
InnoDB: Failing assertion: pthread_cond_destroy(cond) == 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
141125 4:35:18 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.0.13-MariaDB-wsrep-log
key_buffer_size=2097152
read_buffer_size=2097152
max_used_connections=420
max_threads=514
thread_count=407
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2118249 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x7f7b4d148008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f7a2e295ce0 thread_stack 0x48000
Fatal signal 11 while backtracing
141125 04:35:22 mysqld_safe Number of processes running now: 0
141125 04:35:22 mysqld_safe mysqld restarted

Comment by Sergii [ 2014-12-02 ]

Hi Elena
I attached the log file with error and part of general.log while the crash appear. Time when the error occurred 04:00:04 02/12/2014.

If such data are necessary to solve our problem, I will collect them after each crash.

Comment by Elena Stepanova [ 2014-12-02 ]

blacksmith,

Does it always crash with "Failing assertion: pthread_cond_destroy(cond) == 0" now, or is it different each time?

Upd: I see in the attachment that the answer is "no", there it crashed with signal 11. Have you seen any other kinds of crashes, or is it always one of these two?

Comment by Sergii [ 2014-12-02 ]

Now I do not have any statistics about second crash (signal 6). it occurred only 1 time. I have not seen any other kinds of crashes. Only 2 that I've described.

Comment by Elena Stepanova [ 2014-12-02 ]

I see.
I checked the logs, and there is still the mystery that struck me last time. As I understand, you use a single node, and you run your nightly partitioning job on this node, but there is no sign of it in the general log... I wonder how it can be so. Could you please quote an earlier part of the error log (before the crash), where it says something – anything – about the event scheduler?

Also, do you see any sign in table structures that the partitioning job is executed as expected?

Comment by Sergii [ 2014-12-02 ]

I attached the log file with error which you ask.

Of course I see the results of partitioning. The event is fulfilled correctly. Results are expected and correct.

I am concerned about record:
141201 4:01:56 [ERROR] mysqld: Table './mysql/event' is marked as crashed and should be repaired
141201 4:01:56 [ERROR] mysqld: Table 'event' is marked as crashed and should be repaired
141201 4:01:56 [Warning] Checking table: './mysql/event'
141201 4:01:56 [ERROR] mysql.event: 1 client is using or hasn't closed the table properly

but event table are correct after start of service.

Comment by Elena Stepanova [ 2014-12-02 ]

It's expected, so to speak.
The table is marked as crashed because it wasn't closed properly (naturally, if the server crashed before while an event was being executed).
You have an automatic MyISAM recovery on server startup, so the table gets repaired, and you see it in a normal state after startup.

Finally found the event – in the last general log you attached it's the very first line, procedure call. And the previous log starts a second later, I guess the call just didn't make it there. One mystery solved.

Comment by Elena Stepanova [ 2014-12-02 ]

You have binary log enabled. Would you be able to provide the log which covers the same 5 seconds 141202 4:00:00 - 4:00:04 (or maybe starting a few seconds earlier, just in case)?
It can be done two ways. You can either upload the whole log which relates to that period to our ftp.askmonty.org/private, or you can run

mysqlbinlog --verbose --base64-output=DECODE-ROWS [--start-datetime='YYYY-MM-DD hh:mm:ss' --stop-datetime='YYYY-MM-DD hh:mm:ss'] <binlog name>

, check that you get the right interval and attach it (it should be very small since the most of activity is selects which don't go to the binlog).

The point of this exercise is to (try to) see which part of the event gets executed by the time of the crash, and maybe this way to narrow down the concurrency issue.

Comment by Sergii [ 2014-12-03 ]

Ok. I will try tomorrow to prepare the data which you ask.

Comment by Elena Stepanova [ 2015-02-11 ]

Hi blacksmith,

Did you ever get to collecting the data as above?
Are you still observing the problem?

Comment by Sergii [ 2015-03-02 ]

Hi, Elena.
The problem is still relevant. Now we just off the load on the database during partitioning and also I can't collect data because crashe doesn't appear due to load off

Comment by Elena Stepanova [ 2015-06-04 ]

From MDEV-6464:

We are using partitioning for some tables. Partitioning is started by event scheduler at 4 am. Procedure that we use, you can see following the link: https://www.zabbix.org/wiki/Docs/howto/mysql_partitioning

Comment by Elena Stepanova [ 2015-06-04 ]

Debug assertion failure:

Stack trace from 10.0 commit a99efc00a68

0.0/sql/ha_partition.cc:5090: bool ha_partition::init_record_priority_queue(): Assertion `!m_ordered_rec_buffer' failed.
150604  1:14:43 [ERROR] mysqld got signal 6 ;
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f1c2f8dc088): SELECT * FROM `history_uint` H WHERE H.`itemid`= '456' AND H.`clock` > /* -10 */ 1433369672 ORDER BY H.`clock` DESC LIMIT 2 OFFSET 0
Connection ID (thread ID): 7
Status: NOT_KILLED

# 2015-06-04T01:15:01 [29787] #6  0x00007f1c58eb9311 in *__GI___assert_fail (assertion=0x112ba11 "!m_ordered_rec_buffer", file=<optimized out>, line=5090, function=0x112e700 "bool ha_partition::init_record_priority_queue()") at assert.c:81
# 2015-06-04T01:15:01 [29787] #7  0x0000000000e1fda4 in ha_partition::init_record_priority_queue (this=0x7f1c2f87c888) at 10.0/sql/ha_partition.cc:5090
# 2015-06-04T01:15:01 [29787] #8  0x0000000000e20262 in ha_partition::index_init (this=0x7f1c2f87c888, inx=0, sorted=true) at 10.0/sql/ha_partition.cc:5216
# 2015-06-04T01:15:01 [29787] #9  0x0000000000614ffb in handler::ha_index_init (this=0x7f1c2f87c888, idx=0, sorted=true) at 10.0/sql/handler.h:2618
# 2015-06-04T01:15:01 [29787] #10 0x000000000099225e in QUICK_RANGE_SELECT::reset (this=0x7f1c2f810a00) at 10.0/sql/opt_range.cc:11693
# 2015-06-04T01:15:01 [29787] #11 0x000000000099ad44 in QUICK_SELECT_DESC::reset (this=0x7f1c2f810a00) at 10.0/sql/opt_range.h:975
# 2015-06-04T01:15:01 [29787] #12 0x00000000006df422 in join_init_read_record (tab=0x7f1c2f9ded20) at 10.0/sql/sql_select.cc:18684
# 2015-06-04T01:15:01 [29787] #13 0x00000000006dd45c in sub_select (join=0x7f1c2f8dd388, join_tab=0x7f1c2f9ded20, end_of_records=false) at 10.0/sql/sql_select.cc:17798
# 2015-06-04T01:15:01 [29787] #14 0x00000000006dcd23 in do_select (join=0x7f1c2f8dd388, fields=0x7f1c40a5a1d0, table=0x0, procedure=0x0) at 10.0/sql/sql_select.cc:17463
# 2015-06-04T01:15:01 [29787] #15 0x00000000006ba1ad in JOIN::exec_inner (this=0x7f1c2f8dd388) at 10.0/sql/sql_select.cc:3079
# 2015-06-04T01:15:01 [29787] #16 0x00000000006b76c2 in JOIN::exec (this=0x7f1c2f8dd388) at 10.0/sql/sql_select.cc:2368
# 2015-06-04T01:15:01 [29787] #17 0x00000000006baa57 in mysql_select (thd=0x7f1c40a56070, rref_pointer_array=0x7f1c40a5a330, tables=0x7f1c2f8dc368, wild_num=1, fields=..., conds=0x7f1c2f8dcfa0, og_num=1, order=0x7f1c2f8dd1e0, group=0x0, having=0x0, proc_param=0x0, select_options=2148829952, result=0x7f1c2f8dd368, unit=0x7f1c40a599c8, select_lex=0x7f1c40a5a0b8) at 10.0/sql/sql_select.cc:3308
# 2015-06-04T01:15:01 [29787] #18 0x00000000006b1041 in handle_select (thd=0x7f1c40a56070, lex=0x7f1c40a59900, result=0x7f1c2f8dd368, setup_tables_done_option=0) at 10.0/sql/sql_select.cc:373
# 2015-06-04T01:15:01 [29787] #19 0x00000000006856f3 in execute_sqlcom_select (thd=0x7f1c40a56070, all_tables=0x7f1c2f8dc368) at 10.0/sql/sql_parse.cc:5274
# 2015-06-04T01:15:01 [29787] #20 0x000000000067da24 in mysql_execute_command (thd=0x7f1c40a56070) at 10.0/sql/sql_parse.cc:2562
# 2015-06-04T01:15:01 [29787] #21 0x000000000068824b in mysql_parse (thd=0x7f1c40a56070, rawbuf=0x7f1c2f8dc088 "SELECT * FROM `history_uint` H WHERE H.`itemid`= '456' AND H.`clock` > /* -10 */ 1433369672 ORDER BY H.`clock` DESC LIMIT 2 OFFSET 0", length=132, parser_state=0x7f1c5b06c650) at 10.0/sql/sql_parse.cc:6529
# 2015-06-04T01:15:01 [29787] #22 0x000000000067ac37 in dispatch_command (command=COM_QUERY, thd=0x7f1c40a56070, packet=0x7f1c40a5c071 "", packet_length=134) at 10.0/sql/sql_parse.cc:1308
# 2015-06-04T01:15:01 [29787] #23 0x0000000000679f1d in do_command (thd=0x7f1c40a56070) at 10.0/sql/sql_parse.cc:999
# 2015-06-04T01:15:01 [29787] #24 0x0000000000797fdf in do_handle_one_connection (thd_arg=0x7f1c472e5070) at 10.0/sql/sql_connect.cc:1375
# 2015-06-04T01:15:01 [29787] #25 0x0000000000797d32 in handle_one_connection (arg=0x7f1c472e5070) at 10.0/sql/sql_connect.cc:1289
# 2015-06-04T01:15:01 [29787] #26 0x00007f1c5acb3b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
# 2015-06-04T01:15:01 [29787] #27 0x00007f1c58f6995d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

Comment by Elena Stepanova [ 2015-06-04 ]

Slightly different stack trace (same tree):

10.0/sql/ha_partition.cc:5090: bool ha_partition::init_record_priority_queue(): Assertion `!m_ordered_rec_buffer' failed.
150604  2:29:09 [ERROR] mysqld got signal 6 ;

# 2015-06-04T02:29:17 [2391] #6  0x00007fcb03c47311 in *__GI___assert_fail (assertion=0x112ba11 "!m_ordered_rec_buffer", file=<optimized out>, line=5090, function=0x112e700 "bool ha_partition::init_record_priority_queue()") at assert.c:81
# 2015-06-04T02:29:17 [2391] #7  0x0000000000e1fda4 in ha_partition::init_record_priority_queue (this=0x7fcad18ce888) at 10.0/sql/ha_partition.cc:5090
# 2015-06-04T02:29:17 [2391] #8  0x0000000000e20262 in ha_partition::index_init (this=0x7fcad18ce888, inx=0, sorted=true) at 10.0/sql/ha_partition.cc:5216
# 2015-06-04T02:29:17 [2391] #9  0x0000000000614ffb in handler::ha_index_init (this=0x7fcad18ce888, idx=0, sorted=true) at 10.0/sql/handler.h:2618
# 2015-06-04T02:29:17 [2391] #10 0x00000000006def39 in join_read_last_key (tab=0x7fcad1839bd0) at 10.0/sql/sql_select.cc:18567
# 2015-06-04T02:29:17 [2391] #11 0x00000000006dd45c in sub_select (join=0x7fcad1823278, join_tab=0x7fcad1839bd0, end_of_records=false) at 10.0/sql/sql_select.cc:17798
# 2015-06-04T02:29:17 [2391] #12 0x00000000006dcd23 in do_select (join=0x7fcad1823278, fields=0x7fcaf12311d0, table=0x0, procedure=0x0) at 10.0/sql/sql_select.cc:17463
# 2015-06-04T02:29:17 [2391] #13 0x00000000006ba1ad in JOIN::exec_inner (this=0x7fcad1823278) at 10.0/sql/sql_select.cc:3079
# 2015-06-04T02:29:17 [2391] #14 0x00000000006b76c2 in JOIN::exec (this=0x7fcad1823278) at 10.0/sql/sql_select.cc:2368
# 2015-06-04T02:29:17 [2391] #15 0x00000000006baa57 in mysql_select (thd=0x7fcaf122d070, rref_pointer_array=0x7fcaf1231330, tables=0x7fcad18222f8, wild_num=1, fields=..., conds=0x7fcad1822f28, og_num=1, order=0x7fcad1823168, group=0x0, having=0x0, proc_param=0x0, select_options=2148797184, result=0x7fcad1823258, unit=0x7fcaf12309c8, select_lex=0x7fcaf12310b8) at 10.0/sql/sql_select.cc:3308
# 2015-06-04T02:29:17 [2391] #16 0x00000000006b1041 in handle_select (thd=0x7fcaf122d070, lex=0x7fcaf1230900, result=0x7fcad1823258, setup_tables_done_option=0) at 10.0/sql/sql_select.cc:373
# 2015-06-04T02:29:17 [2391] #17 0x00000000006856f3 in execute_sqlcom_select (thd=0x7fcaf122d070, all_tables=0x7fcad18222f8) at 10.0/sql/sql_parse.cc:5274
# 2015-06-04T02:29:17 [2391] #18 0x000000000067da24 in mysql_execute_command (thd=0x7fcaf122d070) at 10.0/sql/sql_parse.cc:2562
# 2015-06-04T02:29:17 [2391] #19 0x000000000068824b in mysql_parse (thd=0x7fcaf122d070, rawbuf=0x7fcad1822088 "SELECT * FROM `t1` WHERE `id`= 233 AND `tm` > 1433374143 ORDER BY `tm` DESC LIMIT 2", length=83, parser_state=0x7fcb05d2c650) at 10.0/sql/sql_parse.cc:6529
# 2015-06-04T02:29:17 [2391] #20 0x000000000067ac37 in dispatch_command (command=COM_QUERY, thd=0x7fcaf122d070, packet=0x7fcaf1239071 "", packet_length=85) at 10.0/sql/sql_parse.cc:1308
# 2015-06-04T02:29:17 [2391] #21 0x0000000000679f1d in do_command (thd=0x7fcaf122d070) at 10.0/sql/sql_parse.cc:999
# 2015-06-04T02:29:17 [2391] #22 0x0000000000797fdf in do_handle_one_connection (thd_arg=0x7fcaf122d070) at 10.0/sql/sql_connect.cc:1375
# 2015-06-04T02:29:17 [2391] #23 0x0000000000797d32 in handle_one_connection (arg=0x7fcaf122d070) at 10.0/sql/sql_connect.cc:1289
# 2015-06-04T02:29:17 [2391] #24 0x00007fcb05a41b50 in start_thread (arg=<optimized out>) at pthread_create.c:304

Comment by Elena Stepanova [ 2015-06-04 ]

I also got the crash on a release server (10.0.12) once, it looks similar to the initially reported one, although not identical; but on some reason it happens extremely rarely for me, while debug assertions are much more frequent.

Version: '10.0.12-MariaDB'  socket: '/tmp/mysql.sock'  port: 3306  MariaDB Server
150603  8:13:00 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.0.12-MariaDB
key_buffer_size=2097152
read_buffer_size=2097152
max_used_connections=9
max_threads=153
thread_count=11
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 945224 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7f5934e67008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f59d7b22e40 thread_stack 0x48000
mysys/stacktrace.c:247(my_print_stacktrace)[0xbb674e]
sql/signal_handler.cc:153(handle_fatal_signal)[0x7152ec]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0)[0x7f59da1b90a0]
include/que0que.ic:63(que_fork_get_first_thr)[0x9f6fb0]
handler/ha_innodb.cc:8648(ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int))[0x952528]
sql/handler.cc:2646(handler::ha_index_prev(unsigned char*))[0x717eac]
sql/ha_partition.cc:6406(ha_partition::handle_ordered_prev(unsigned char*))[0xb46e83]
sql/handler.cc:2646(handler::ha_index_prev(unsigned char*))[0x717eac]
sql/opt_range.cc:11916(QUICK_SELECT_DESC::get_next())[0x81c224]
sql/records.cc:346(rr_quick)[0x8369d5]
sql/sql_select.cc:17613(sub_select(JOIN*, st_join_table*, bool))[0x5d4be7]
sql/sql_select.cc:17255(do_select)[0x5d52ed]
sql/sql_select.cc:3077(JOIN::exec_inner())[0x5e829c]
sql/sql_select.cc:2368(JOIN::exec())[0x5e6b0a]
sql/sql_select.cc:372(handle_select(THD*, LEX*, select_result*, unsigned long))[0x5ea71d]
sql/sql_parse.cc:5264(execute_sqlcom_select)[0x591180]
sql/sql_parse.cc:2554(mysql_execute_command(THD*))[0x593f17]
sql/sql_parse.cc:6409(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x59b2da]
sql/sql_parse.cc:1311(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x59d087]
sql/sql_parse.cc:1006(do_command(THD*))[0x59d7a5]
sql/sql_connect.cc:1379(do_handle_one_connection(THD*))[0x664874]
sql/sql_connect.cc:1295(handle_one_connection)[0x6649b2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50)[0x7f59da1b0b50]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f59d912c95d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f5932c2f020): SELECT * FROM `history_uint` H WHERE H.`itemid`= '31' AND H.`clock` > 1433308320 ORDER BY H.`clock` DESC LIMIT 2 OFFSET 0
Connection ID (thread ID): 16
Status: NOT_KILLED

Comment by Elena Stepanova [ 2015-06-04 ]

Concurrent tests to reproduce debug assertion failures above:

For "in join_read_last_key" variation:

RQG grammar 1.yy

thread1_init:
	SET GLOBAL query_cache_size = 1024*1024*128
	; DROP DATABASE `test`; CREATE DATABASE `test`; USE `test`
	; { $first_partition = 0; $last_partition = 1; '' } 
	CREATE TABLE `t1` (`id` INT, `tm` INT, INDEX (`id`,`tm`)) ENGINE=InnoDB 
	PARTITION BY RANGE (`tm`) (PARTITION p0 VALUES LESS THAN (UNIX_TIMESTAMP('2015-01-01 00:00:00')),
	PARTITION p1 VALUES LESS THAN (UNIX_TIMESTAMP('2015-01-01 00:00:10'))) ;
 
thread2:
	INSERT INTO `t1` VALUES values, values, values, values, values, values, values, values, values, values ;
 
thread1:
	ALTER TABLE `t1` DROP PARTITION { 'p'.($first_partition++) }; create_partition { sleep 5; '' }; 
 
thread3:
	BEGIN ; select ; select ; select ; select ; select ; select ; select ; select ; select ; select ; COMMIT ;
 
values:
	( item_id , { int(time()) } );
 
select:
	SELECT * FROM `t1` WHERE `id`= item_id AND `tm` > { int(time()-5) } ORDER BY `tm` DESC LIMIT 2 ;
 
create_partition:
	SET @stmt = CONCAT('ALTER TABLE `t1` ADD PARTITION (
	PARTITION { 'p'.(++$last_partition) } VALUES LESS THAN (',UNIX_TIMESTAMP(),') ENGINE = InnoDB)')
	; PREPARE stmt FROM @stmt; EXECUTE stmt; DEALLOCATE PREPARE stmt;
 
item_id:
	{ int(rand(500)) } ;

RQG command line

perl ./runall-new.pl --skip-gendata --threads=3 --grammar=1.yy --queries=100M --duration=3600 --basedir=<your basedir> --vardir=<your vardir>

For "in QUICK_RANGE_SELECT::reset" variation:

thread1_init:
		CREATE TABLE `history_uint` (`itemid` BIGINT UNSIGNED NOT NULL, `clock` INT DEFAULT '0'NOT NULL, INDEX `history_uint_1` (`itemid`,`clock`)) ENGINE=InnoDB
	; SET @stmt = CONCAT('ALTER TABLE `history_uint` PARTITION BY RANGE (`clock`) (PARTITION p0 VALUES LESS THAN (UNIX_TIMESTAMP("2015-01-01 00:00:00")) ENGINE = InnoDB, 
	PARTITION pnow VALUES LESS THAN (',UNIX_TIMESTAMP(),') ENGINE = InnoDB)')
	; PREPARE stmt FROM @stmt; EXECUTE stmt; DEALLOCATE PREPARE stmt; SET NAMES UTF8;
 
thread2_init:
	{ sleep 5; '' };
 
thread2:
	BEGIN ; insert ; insert ; insert ; insert ; insert ; COMMIT ;
 
thread3:
	CALL test.drop_partitions(); CALL test.create_partition(); { sleep 10; '' }; 
 
thread1:
	BEGIN ; select ; select ; select ; select ; select ; select ; select ; select ; select ; select ; COMMIT ;
 
insert:
	INSERT INTO `history_uint` VALUES values10;
 
values10:
	values, values, values, values, values, values, values, values, values, values ;
 
values:
	( { int(rand(500)) }, { int(time()) } );
 
select:
	SELECT * FROM `history_uint` H WHERE H.`itemid`= { "'".int(rand(500))."'" } AND H.`clock` > select_time ORDER BY H.`clock` DESC LIMIT 2 OFFSET 0 ;
 
select_time:
	/* -10 */ { int(time() - 10) } ;

Initial SQL 2.sql

DROP DATABASE IF EXISTS test; 
CREATE DATABASE test; 
USE test;
 
DELIMITER $$
 
CREATE PROCEDURE `create_partition`()
BEGIN
    DECLARE ROWS_CNT INT UNSIGNED;
    DECLARE BEGINTIME TIMESTAMP;
        DECLARE ENDTIME INT UNSIGNED;
        DECLARE PARTITIONNAME VARCHAR(64);
        SET BEGINTIME = NOW() + INTERVAL 10 SECOND;
        SET PARTITIONNAME = DATE_FORMAT( BEGINTIME, 'p%Y_%m_%d_%H_%i_%s' );
        SET ENDTIME = UNIX_TIMESTAMP(BEGINTIME + INTERVAL 10 SECOND);
        SELECT COUNT(*) INTO ROWS_CNT
                FROM information_schema.partitions
                WHERE table_schema = 'test' AND table_name = 'history_uint' AND partition_name = PARTITIONNAME;
    IF ROWS_CNT = 0 THEN
                     SET @SQL = CONCAT( 'ALTER TABLE `test`.`history_uint`',
                                ' ADD PARTITION (PARTITION ', PARTITIONNAME, ' VALUES LESS THAN (', ENDTIME, '));' );
                PREPARE STMT FROM @SQL;
                EXECUTE STMT;
                DEALLOCATE PREPARE STMT;
        ELSE
        SELECT 'partition already exists' AS result;
        END IF;
END$$
DELIMITER ;
 
----------------------------------------------------------------------------------------------------------------
 
----------------------------------------------------------------------------------------------------------------
 
DELIMITER $$
USE `test`$$
DROP PROCEDURE IF EXISTS `drop_partitions`$$
CREATE PROCEDURE `drop_partitions`()
BEGIN
    DECLARE PARTITIONNAME_TMP VARCHAR(64);
    DECLARE VALUES_LESS_TMP INT;
    DECLARE KEEP_HISTORY_BEFORE INT;
    DECLARE DONE INT DEFAULT 0;
    DECLARE get_partitions CURSOR FOR
        SELECT p.`partition_name`, LTRIM(RTRIM(p.`partition_description`))
            FROM information_schema.partitions p
            WHERE p.table_schema = 'test' AND p.table_name = 'history_uint'
            ORDER BY p.table_name, p.subpartition_ordinal_position;
     DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
    OPEN get_partitions;
    loop_check_prt: LOOP
        IF DONE THEN
            LEAVE loop_check_prt;
        END IF;
        FETCH get_partitions INTO PARTITIONNAME_TMP, VALUES_LESS_TMP;
        SET KEEP_HISTORY_BEFORE = UNIX_TIMESTAMP(NOW() - INTERVAL 5 MINUTE);
  IF KEEP_HISTORY_BEFORE >= VALUES_LESS_TMP THEN
                CALL drop_old_partition(PARTITIONNAME_TMP);
        END IF;
        END LOOP loop_check_prt;
        CLOSE get_partitions;
END$$
DELIMITER ;
 
----------------------------------------------------------------------------------------------------------------
 
DELIMITER $$
USE `test`$$
DROP PROCEDURE IF EXISTS `drop_old_partition`$$
CREATE PROCEDURE `drop_old_partition`(IN_PARTITIONNAME VARCHAR(64))
BEGIN
    DECLARE ROWS_CNT INT UNSIGNED;
        SELECT COUNT(*) INTO ROWS_CNT
                FROM information_schema.partitions
                WHERE table_schema = 'test' AND table_name = 'history_uint' AND partition_name = IN_PARTITIONNAME;
    IF ROWS_CNT = 1 THEN
                     SET @SQL = CONCAT( 'ALTER TABLE `test`.`history_uint`',
                                ' DROP PARTITION ', IN_PARTITIONNAME, ';' );
                PREPARE STMT FROM @SQL;
                EXECUTE STMT;
                DEALLOCATE PREPARE STMT;
        ELSE
        SELECT 'partition does not exist' AS result;
        END IF;
END$$
DELIMITER ;

  • Start server with --query-cache-size=128M;
  • Feed 2.sql to it;
  • run (assuming your server is running on port 3306, otherwise modify dsn)

    perl ./gentest.pl --dsn="dbi:mysql:host=127.0.0.1:port=3306:user=root:database=test" --threads=3 --grammar=2.yy --queries=100M --duration=3600

Comment by Elena Stepanova [ 2015-06-04 ]

cvicentiu,

I'm assigning it to you thinking there might be a slim chance it's related to MDEV-6500 you are working on now (query cache, partitions, something that should or shouldn't be there...).
I could not reproduce it on 5.5, so it's a task for 10.0. When its time comes, please check if your fix for MDEV-6500 fixes it also. If not, you can choose either to fix it separately, or to return it to me (or re-assign to whoever you think it belongs).

To whoever ends up fixing it:

Please pay special attention to the stack traces.
It is important to understand whether all of these are the same problem:
1) release stack trace from the bug description;
2) debug assertion with stack trace 1;
3) debug assertion with stack trace 2;
4) another release stack trace.

If they are not the same, and you are not sure you have fixed release stack trace(s) by fixing debug assertion failures, please do not close the bug report but return it to me instead for further testing.

Also, please take a look at the one-time failure described in this comment. If it's unrelated (which it probably is), please comment saying so, so that we know it did not get fixed along with the primary problem.

Comment by Thomas Daugherty [ 2015-07-11 ]

Just wanted to comment that I'm having this same issue. Stored procedures to drop/create partitions runs once per day. A few times a week a node will crash within a minute of the procedure running. This is a database for a Zabbix install, I'm using the stored procedures referenced above. Please let me know if there are any data you'd like me to submit.

Edit: Using 10.0.16

Comment by Jeff Gilbertson [ 2015-07-15 ]

Just a comment to say that I'm having the issue, too, exactly as described above.

Comment by DD Brown [ 2015-07-15 ]

This seems to be coming more of an issue in our environment. What started out as once every couple of months seems to be increasing in frequency. It is now happening several times a week.

Comment by Thomas Daugherty [ 2015-07-30 ]

elenst and cvicentiu,

Would it be possible to get this bug looked at again? It's really causing havoc for us. Its happening twice a week or more right after the create/drop partition stored procedure runs.

neominder_sqlcrashlog.txt

Comment by Eugen O [ 2015-08-29 ]

Hi,
I've hit the same situation, stack trace looks like this :

150827  5:28:54 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.0.20-MariaDB-log
key_buffer_size=536870912
read_buffer_size=131072
max_used_connections=62
max_threads=1502
thread_count=34
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3822319 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x19a89bf8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f8e7a7dcd78 thread_stack 0x48000
mysys/stacktrace.c:247(my_print_stacktrace)[0xb81c6e]
sql/signal_handler.cc:153(handle_fatal_signal)[0x726cab]
/lib64/libpthread.so.0(+0xf710)[0x7f8eafa8d710]
sql/sql_plist.h:128(I_P_List<MDL_ticket, I_P_List_adapter<MDL_ticket, &(MDL_ticket::next_in_lock), &(MDL_ticket::prev_in_lock)>, I_P_List_null_counter, I_P_List_fast_push_back<MDL_ticket> >::remove(MDL_ticket*))[0x68c800]
sql/sql_plist.h:126(I_P_List<MDL_ticket, I_P_List_adapter<MDL_ticket, &(MDL_ticket::next_in_context), &(MDL_ticket::prev_in_context)>, I_P_List_null_counter, I_P_List_no_push_back<MDL_ticket> >::remove(MDL_ticket*))[0x68c968]
sql/sql_parse.cc:5196(mysql_execute_command(THD*))[0x5acb05]
sql/sql_parse.cc:6529(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x5b325d]
sql/sql_parse.cc:1310(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x5b4f49]
sql/sql_connect.cc:1378(do_handle_one_connection(THD*))[0x68218e]
sql/sql_connect.cc:1295(handle_one_connection)[0x682262]
/lib64/libpthread.so.0(+0x79d1)[0x7f8eafa859d1]
/lib64/libc.so.6(clone+0x6d)[0x7f8eae19f8fd]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f8b14004c60): is an invalid pointer
Connection ID (thread ID): 1357300
 
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
150827 05:28:55 mysqld_safe Number of processes running now: 0
150827 05:28:55 mysqld_safe mysqld restarted
150827  5:28:56 [Note] /usr/sbin/mysqld (mysqld 10.0.20-MariaDB-log) starting as process 25031 ...
150827  5:28:56 [Note] InnoDB: Using mutexes to ref count buffer pool pages
150827  5:28:56 [Note] InnoDB: The InnoDB memory heap is disabled
150827  5:28:56 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
150827  5:28:56 [Note] InnoDB: Memory barrier is not used
150827  5:28:56 [Note] InnoDB: Compressed tables use zlib 1.2.3
150827  5:28:56 [Note] InnoDB: Using Linux native AIO
150827  5:28:56 [Note] InnoDB: Using CPU crc32 instructions
150827  5:28:56 [Note] InnoDB: Initializing buffer pool, size = 12.0G
150827  5:28:56 [Note] InnoDB: Completed initialization of buffer pool
150827  5:28:57 [Note] InnoDB: Highest supported file format is Barracuda.
150827  5:28:57 [Note] InnoDB: Log scan progressed past the checkpoint lsn 114661609834
150827  5:28:57 [Note] InnoDB: Database was not shutdown normally!
150827  5:28:57 [Note] InnoDB: Starting crash recovery.
150827  5:28:57 [Note] InnoDB: Reading tablespace information from the .ibd files...
150827  5:28:58 [Note] InnoDB: Restoring possible half-written data pages
150827  5:28:58 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Doing recovery: scanned up to log sequence number 114661791274
150827  5:28:59 [Note] InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percent: 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
150827  5:29:00 [Note] InnoDB: 128 rollback segment(s) are active.
150827  5:29:00 [Note] InnoDB: Waiting for purge to start
150827  5:29:00 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.24-72.2 started; log sequence number 114661791274
150827  5:29:00 [Note] Plugin 'FEEDBACK' is disabled.
150827  5:29:00 [Note] Server socket created on IP: '::'.
150827  5:29:00 [ERROR] mysqld: Table './mysql/event' is marked as crashed and should be repaired
150827  5:29:00 [Warning] Checking table:   './mysql/event'
150827  5:29:00 [ERROR] mysql.event: 1 client is using or hasn't closed the table properly
150827  5:29:00 [Note] Event Scheduler: Loaded 1 event
150827  5:29:00 [Note] Event Scheduler: scheduler thread started with id 1
150827  5:29:00 [Note] /usr/sbin/mysqld: ready for connections.
 

At that hour partitioning was occuring. Let me know if anything else is needed.

Comment by Rick James (Inactive) [ 2015-10-20 ]

Perhaps not important, but note the lack of an explicit PRIMARY KEY.

Comment by Sergei Golubchik [ 2018-07-12 ]

Both tests (run1 and run2) crashes quickly on commit 9a3ff0789f (when the tests were written), but do not crash on the latest 10.0 (tests run for an hour and exit successfully).

Apparently the crash was fixed meanwhile, so closing.

Generated at Thu Feb 08 07:15:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.