|
Hi, I'm attached event, procedure that started at 4 am and file with variables that we used. Now we also have general log enabled for a logging new crashes.
|
|
part of the general.log starting at 04:00:00 o'clock in the morning
|
|
Hi, Elena.
I attached the log file with error and part of general.log while the crash appear.
|
|
Hi Sergii,
Thank you.
So indeed, as discussed in MDEV-6464, there is a lot of activity at this time besides re-partitioning. Here is the query which apparently caused the crash:
SELECT * FROM history_uint h WHERE h.itemid='521083' AND h.clock>1414720810 ORDER BY h.clock DESC LIMIT 2 OFFSET 0
|
Possibly it's a race condition between SELECT and re-partitioning.
Interestingly, there is no re-partitioning in this log, I presume it's because of your Galera setup? You run the event on a different node?
Would you be able to provide a datadump (structure + data) of `history_uint` table? Preferably from the day when the crash happened, but if not – whichever you have.
If the data is too big or private, you can upload it to our ftp.askmonty.org/private.
Thanks.
|
|
Hi Elena,
Now we use just one node. Also disable replication because we have problems with the launch event at the two nodes simultaneously. I can't attach a dump because partition size is too big.
|
|
Hi Sergii,
Just for a note, you can easily solve the problem with launching event on several nodes by setting EVENT_SCHEDULER = OFF on all notes but one where you want it to be run.
It's strange though that there is no sign of the event activity in the general log you provided, I wonder how it can be if the even is run on this node...
If you can't provide the dump, please paste output of
SHOW CREATE TABLE history_uint (preferably from right after the crash, but if you don't have it, then the current one)
SHOW INDEX IN history_uint
SHOW TABLE STATUS LIKE 'history_uint'
Based on that, I'll try to generate artificial data.
Thanks.
|
|
Hi Elena,
Thanks for the advice. I attached the file which you ask.
|
|
Hi blacksmith,
Do you keep getting the crashes? If so, how often, and have you maybe noticed any other similarities?
So far I couldn't reproduce it, so whatever information you have that you think might be significant would be much appreciated.
|
|
Hi Elena.
Now the situation has become worse. Crash occurs at a frequency several times a week. This is due to the increasing load on the base. I'm enabled general.log and based on its I will prepare statistics. Do you have any recommendations for us?
Also we have a new bug. Maybe it is not relevant to the current bug if so I will describe it in a new ticket:
2014-11-25 04:35:18 7f7a2e296700 InnoDB: error: return value 16 when calling
InnoDB: pthread_mutex_destroy().
InnoDB: Byte contents of the pthread mutex at 0x7f7c25ccb400:
len 48; hex 000091a8c8360000502530b40000ff71af06000000000091b4525464000000000000009e73e000000000000000000000; asc 6 P%0 q RTd s ;
2014-11-25 04:35:18 7f7a2e296700 InnoDB: Assertion failure in thread 140162737202944 in file os0sync.cc line 265
InnoDB: Failing assertion: pthread_cond_destroy(cond) == 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
141125 4:35:18 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.0.13-MariaDB-wsrep-log
key_buffer_size=2097152
read_buffer_size=2097152
max_used_connections=420
max_threads=514
thread_count=407
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2118249 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0x7f7b4d148008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f7a2e295ce0 thread_stack 0x48000
Fatal signal 11 while backtracing
141125 04:35:22 mysqld_safe Number of processes running now: 0
141125 04:35:22 mysqld_safe mysqld restarted
|
|
Hi Elena
I attached the log file with error and part of general.log while the crash appear. Time when the error occurred 04:00:04 02/12/2014.
If such data are necessary to solve our problem, I will collect them after each crash.
|
|
blacksmith,
Does it always crash with "Failing assertion: pthread_cond_destroy(cond) == 0" now, or is it different each time?
Upd: I see in the attachment that the answer is "no", there it crashed with signal 11. Have you seen any other kinds of crashes, or is it always one of these two?
|
|
Now I do not have any statistics about second crash (signal 6). it occurred only 1 time. I have not seen any other kinds of crashes. Only 2 that I've described.
|
|
I see.
I checked the logs, and there is still the mystery that struck me last time. As I understand, you use a single node, and you run your nightly partitioning job on this node, but there is no sign of it in the general log... I wonder how it can be so. Could you please quote an earlier part of the error log (before the crash), where it says something – anything – about the event scheduler?
Also, do you see any sign in table structures that the partitioning job is executed as expected?
|
|
I attached the log file with error which you ask.
Of course I see the results of partitioning. The event is fulfilled correctly. Results are expected and correct.
I am concerned about record:
141201 4:01:56 [ERROR] mysqld: Table './mysql/event' is marked as crashed and should be repaired
141201 4:01:56 [ERROR] mysqld: Table 'event' is marked as crashed and should be repaired
141201 4:01:56 [Warning] Checking table: './mysql/event'
141201 4:01:56 [ERROR] mysql.event: 1 client is using or hasn't closed the table properly
but event table are correct after start of service.
|
|
It's expected, so to speak.
The table is marked as crashed because it wasn't closed properly (naturally, if the server crashed before while an event was being executed).
You have an automatic MyISAM recovery on server startup, so the table gets repaired, and you see it in a normal state after startup.
Finally found the event – in the last general log you attached it's the very first line, procedure call. And the previous log starts a second later, I guess the call just didn't make it there. One mystery solved.
|
|
You have binary log enabled. Would you be able to provide the log which covers the same 5 seconds 141202 4:00:00 - 4:00:04 (or maybe starting a few seconds earlier, just in case)?
It can be done two ways. You can either upload the whole log which relates to that period to our ftp.askmonty.org/private, or you can run
mysqlbinlog --verbose --base64-output=DECODE-ROWS [--start-datetime='YYYY-MM-DD hh:mm:ss' --stop-datetime='YYYY-MM-DD hh:mm:ss'] <binlog name>
|
, check that you get the right interval and attach it (it should be very small since the most of activity is selects which don't go to the binlog).
The point of this exercise is to (try to) see which part of the event gets executed by the time of the crash, and maybe this way to narrow down the concurrency issue.
|
|
Ok. I will try tomorrow to prepare the data which you ask.
|
|
Hi blacksmith,
Did you ever get to collecting the data as above?
Are you still observing the problem?
|
|
Hi, Elena.
The problem is still relevant. Now we just off the load on the database during partitioning and also I can't collect data because crashe doesn't appear due to load off
|
|
From MDEV-6464:
We are using partitioning for some tables. Partitioning is started by event scheduler at 4 am. Procedure that we use, you can see following the link: https://www.zabbix.org/wiki/Docs/howto/mysql_partitioning
|
|
Debug assertion failure:
|
Stack trace from 10.0 commit a99efc00a68
|
0.0/sql/ha_partition.cc:5090: bool ha_partition::init_record_priority_queue(): Assertion `!m_ordered_rec_buffer' failed.
|
150604 1:14:43 [ERROR] mysqld got signal 6 ;
|
|
Trying to get some variables.
|
Some pointers may be invalid and cause the dump to abort.
|
Query (0x7f1c2f8dc088): SELECT * FROM `history_uint` H WHERE H.`itemid`= '456' AND H.`clock` > /* -10 */ 1433369672 ORDER BY H.`clock` DESC LIMIT 2 OFFSET 0
|
Connection ID (thread ID): 7
|
Status: NOT_KILLED
|
# 2015-06-04T01:15:01 [29787] #6 0x00007f1c58eb9311 in *__GI___assert_fail (assertion=0x112ba11 "!m_ordered_rec_buffer", file=<optimized out>, line=5090, function=0x112e700 "bool ha_partition::init_record_priority_queue()") at assert.c:81
|
# 2015-06-04T01:15:01 [29787] #7 0x0000000000e1fda4 in ha_partition::init_record_priority_queue (this=0x7f1c2f87c888) at 10.0/sql/ha_partition.cc:5090
|
# 2015-06-04T01:15:01 [29787] #8 0x0000000000e20262 in ha_partition::index_init (this=0x7f1c2f87c888, inx=0, sorted=true) at 10.0/sql/ha_partition.cc:5216
|
# 2015-06-04T01:15:01 [29787] #9 0x0000000000614ffb in handler::ha_index_init (this=0x7f1c2f87c888, idx=0, sorted=true) at 10.0/sql/handler.h:2618
|
# 2015-06-04T01:15:01 [29787] #10 0x000000000099225e in QUICK_RANGE_SELECT::reset (this=0x7f1c2f810a00) at 10.0/sql/opt_range.cc:11693
|
# 2015-06-04T01:15:01 [29787] #11 0x000000000099ad44 in QUICK_SELECT_DESC::reset (this=0x7f1c2f810a00) at 10.0/sql/opt_range.h:975
|
# 2015-06-04T01:15:01 [29787] #12 0x00000000006df422 in join_init_read_record (tab=0x7f1c2f9ded20) at 10.0/sql/sql_select.cc:18684
|
# 2015-06-04T01:15:01 [29787] #13 0x00000000006dd45c in sub_select (join=0x7f1c2f8dd388, join_tab=0x7f1c2f9ded20, end_of_records=false) at 10.0/sql/sql_select.cc:17798
|
# 2015-06-04T01:15:01 [29787] #14 0x00000000006dcd23 in do_select (join=0x7f1c2f8dd388, fields=0x7f1c40a5a1d0, table=0x0, procedure=0x0) at 10.0/sql/sql_select.cc:17463
|
# 2015-06-04T01:15:01 [29787] #15 0x00000000006ba1ad in JOIN::exec_inner (this=0x7f1c2f8dd388) at 10.0/sql/sql_select.cc:3079
|
# 2015-06-04T01:15:01 [29787] #16 0x00000000006b76c2 in JOIN::exec (this=0x7f1c2f8dd388) at 10.0/sql/sql_select.cc:2368
|
# 2015-06-04T01:15:01 [29787] #17 0x00000000006baa57 in mysql_select (thd=0x7f1c40a56070, rref_pointer_array=0x7f1c40a5a330, tables=0x7f1c2f8dc368, wild_num=1, fields=..., conds=0x7f1c2f8dcfa0, og_num=1, order=0x7f1c2f8dd1e0, group=0x0, having=0x0, proc_param=0x0, select_options=2148829952, result=0x7f1c2f8dd368, unit=0x7f1c40a599c8, select_lex=0x7f1c40a5a0b8) at 10.0/sql/sql_select.cc:3308
|
# 2015-06-04T01:15:01 [29787] #18 0x00000000006b1041 in handle_select (thd=0x7f1c40a56070, lex=0x7f1c40a59900, result=0x7f1c2f8dd368, setup_tables_done_option=0) at 10.0/sql/sql_select.cc:373
|
# 2015-06-04T01:15:01 [29787] #19 0x00000000006856f3 in execute_sqlcom_select (thd=0x7f1c40a56070, all_tables=0x7f1c2f8dc368) at 10.0/sql/sql_parse.cc:5274
|
# 2015-06-04T01:15:01 [29787] #20 0x000000000067da24 in mysql_execute_command (thd=0x7f1c40a56070) at 10.0/sql/sql_parse.cc:2562
|
# 2015-06-04T01:15:01 [29787] #21 0x000000000068824b in mysql_parse (thd=0x7f1c40a56070, rawbuf=0x7f1c2f8dc088 "SELECT * FROM `history_uint` H WHERE H.`itemid`= '456' AND H.`clock` > /* -10 */ 1433369672 ORDER BY H.`clock` DESC LIMIT 2 OFFSET 0", length=132, parser_state=0x7f1c5b06c650) at 10.0/sql/sql_parse.cc:6529
|
# 2015-06-04T01:15:01 [29787] #22 0x000000000067ac37 in dispatch_command (command=COM_QUERY, thd=0x7f1c40a56070, packet=0x7f1c40a5c071 "", packet_length=134) at 10.0/sql/sql_parse.cc:1308
|
# 2015-06-04T01:15:01 [29787] #23 0x0000000000679f1d in do_command (thd=0x7f1c40a56070) at 10.0/sql/sql_parse.cc:999
|
# 2015-06-04T01:15:01 [29787] #24 0x0000000000797fdf in do_handle_one_connection (thd_arg=0x7f1c472e5070) at 10.0/sql/sql_connect.cc:1375
|
# 2015-06-04T01:15:01 [29787] #25 0x0000000000797d32 in handle_one_connection (arg=0x7f1c472e5070) at 10.0/sql/sql_connect.cc:1289
|
# 2015-06-04T01:15:01 [29787] #26 0x00007f1c5acb3b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
|
# 2015-06-04T01:15:01 [29787] #27 0x00007f1c58f6995d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
|
|
|
Slightly different stack trace (same tree):
10.0/sql/ha_partition.cc:5090: bool ha_partition::init_record_priority_queue(): Assertion `!m_ordered_rec_buffer' failed.
|
150604 2:29:09 [ERROR] mysqld got signal 6 ;
|
# 2015-06-04T02:29:17 [2391] #6 0x00007fcb03c47311 in *__GI___assert_fail (assertion=0x112ba11 "!m_ordered_rec_buffer", file=<optimized out>, line=5090, function=0x112e700 "bool ha_partition::init_record_priority_queue()") at assert.c:81
|
# 2015-06-04T02:29:17 [2391] #7 0x0000000000e1fda4 in ha_partition::init_record_priority_queue (this=0x7fcad18ce888) at 10.0/sql/ha_partition.cc:5090
|
# 2015-06-04T02:29:17 [2391] #8 0x0000000000e20262 in ha_partition::index_init (this=0x7fcad18ce888, inx=0, sorted=true) at 10.0/sql/ha_partition.cc:5216
|
# 2015-06-04T02:29:17 [2391] #9 0x0000000000614ffb in handler::ha_index_init (this=0x7fcad18ce888, idx=0, sorted=true) at 10.0/sql/handler.h:2618
|
# 2015-06-04T02:29:17 [2391] #10 0x00000000006def39 in join_read_last_key (tab=0x7fcad1839bd0) at 10.0/sql/sql_select.cc:18567
|
# 2015-06-04T02:29:17 [2391] #11 0x00000000006dd45c in sub_select (join=0x7fcad1823278, join_tab=0x7fcad1839bd0, end_of_records=false) at 10.0/sql/sql_select.cc:17798
|
# 2015-06-04T02:29:17 [2391] #12 0x00000000006dcd23 in do_select (join=0x7fcad1823278, fields=0x7fcaf12311d0, table=0x0, procedure=0x0) at 10.0/sql/sql_select.cc:17463
|
# 2015-06-04T02:29:17 [2391] #13 0x00000000006ba1ad in JOIN::exec_inner (this=0x7fcad1823278) at 10.0/sql/sql_select.cc:3079
|
# 2015-06-04T02:29:17 [2391] #14 0x00000000006b76c2 in JOIN::exec (this=0x7fcad1823278) at 10.0/sql/sql_select.cc:2368
|
# 2015-06-04T02:29:17 [2391] #15 0x00000000006baa57 in mysql_select (thd=0x7fcaf122d070, rref_pointer_array=0x7fcaf1231330, tables=0x7fcad18222f8, wild_num=1, fields=..., conds=0x7fcad1822f28, og_num=1, order=0x7fcad1823168, group=0x0, having=0x0, proc_param=0x0, select_options=2148797184, result=0x7fcad1823258, unit=0x7fcaf12309c8, select_lex=0x7fcaf12310b8) at 10.0/sql/sql_select.cc:3308
|
# 2015-06-04T02:29:17 [2391] #16 0x00000000006b1041 in handle_select (thd=0x7fcaf122d070, lex=0x7fcaf1230900, result=0x7fcad1823258, setup_tables_done_option=0) at 10.0/sql/sql_select.cc:373
|
# 2015-06-04T02:29:17 [2391] #17 0x00000000006856f3 in execute_sqlcom_select (thd=0x7fcaf122d070, all_tables=0x7fcad18222f8) at 10.0/sql/sql_parse.cc:5274
|
# 2015-06-04T02:29:17 [2391] #18 0x000000000067da24 in mysql_execute_command (thd=0x7fcaf122d070) at 10.0/sql/sql_parse.cc:2562
|
# 2015-06-04T02:29:17 [2391] #19 0x000000000068824b in mysql_parse (thd=0x7fcaf122d070, rawbuf=0x7fcad1822088 "SELECT * FROM `t1` WHERE `id`= 233 AND `tm` > 1433374143 ORDER BY `tm` DESC LIMIT 2", length=83, parser_state=0x7fcb05d2c650) at 10.0/sql/sql_parse.cc:6529
|
# 2015-06-04T02:29:17 [2391] #20 0x000000000067ac37 in dispatch_command (command=COM_QUERY, thd=0x7fcaf122d070, packet=0x7fcaf1239071 "", packet_length=85) at 10.0/sql/sql_parse.cc:1308
|
# 2015-06-04T02:29:17 [2391] #21 0x0000000000679f1d in do_command (thd=0x7fcaf122d070) at 10.0/sql/sql_parse.cc:999
|
# 2015-06-04T02:29:17 [2391] #22 0x0000000000797fdf in do_handle_one_connection (thd_arg=0x7fcaf122d070) at 10.0/sql/sql_connect.cc:1375
|
# 2015-06-04T02:29:17 [2391] #23 0x0000000000797d32 in handle_one_connection (arg=0x7fcaf122d070) at 10.0/sql/sql_connect.cc:1289
|
# 2015-06-04T02:29:17 [2391] #24 0x00007fcb05a41b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
|
|
|
I also got the crash on a release server (10.0.12) once, it looks similar to the initially reported one, although not identical; but on some reason it happens extremely rarely for me, while debug assertions are much more frequent.
Version: '10.0.12-MariaDB' socket: '/tmp/mysql.sock' port: 3306 MariaDB Server
|
150603 8:13:00 [ERROR] mysqld got signal 11 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.0.12-MariaDB
|
key_buffer_size=2097152
|
read_buffer_size=2097152
|
max_used_connections=9
|
max_threads=153
|
thread_count=11
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 945224 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x0x7f5934e67008
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x7f59d7b22e40 thread_stack 0x48000
|
mysys/stacktrace.c:247(my_print_stacktrace)[0xbb674e]
|
sql/signal_handler.cc:153(handle_fatal_signal)[0x7152ec]
|
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf0a0)[0x7f59da1b90a0]
|
include/que0que.ic:63(que_fork_get_first_thr)[0x9f6fb0]
|
handler/ha_innodb.cc:8648(ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int))[0x952528]
|
sql/handler.cc:2646(handler::ha_index_prev(unsigned char*))[0x717eac]
|
sql/ha_partition.cc:6406(ha_partition::handle_ordered_prev(unsigned char*))[0xb46e83]
|
sql/handler.cc:2646(handler::ha_index_prev(unsigned char*))[0x717eac]
|
sql/opt_range.cc:11916(QUICK_SELECT_DESC::get_next())[0x81c224]
|
sql/records.cc:346(rr_quick)[0x8369d5]
|
sql/sql_select.cc:17613(sub_select(JOIN*, st_join_table*, bool))[0x5d4be7]
|
sql/sql_select.cc:17255(do_select)[0x5d52ed]
|
sql/sql_select.cc:3077(JOIN::exec_inner())[0x5e829c]
|
sql/sql_select.cc:2368(JOIN::exec())[0x5e6b0a]
|
sql/sql_select.cc:372(handle_select(THD*, LEX*, select_result*, unsigned long))[0x5ea71d]
|
sql/sql_parse.cc:5264(execute_sqlcom_select)[0x591180]
|
sql/sql_parse.cc:2554(mysql_execute_command(THD*))[0x593f17]
|
sql/sql_parse.cc:6409(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x59b2da]
|
sql/sql_parse.cc:1311(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x59d087]
|
sql/sql_parse.cc:1006(do_command(THD*))[0x59d7a5]
|
sql/sql_connect.cc:1379(do_handle_one_connection(THD*))[0x664874]
|
sql/sql_connect.cc:1295(handle_one_connection)[0x6649b2]
|
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50)[0x7f59da1b0b50]
|
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f59d912c95d]
|
|
Trying to get some variables.
|
Some pointers may be invalid and cause the dump to abort.
|
Query (0x7f5932c2f020): SELECT * FROM `history_uint` H WHERE H.`itemid`= '31' AND H.`clock` > 1433308320 ORDER BY H.`clock` DESC LIMIT 2 OFFSET 0
|
Connection ID (thread ID): 16
|
Status: NOT_KILLED
|
|
|
Concurrent tests to reproduce debug assertion failures above:
|
RQG grammar 1.yy
|
thread1_init:
|
SET GLOBAL query_cache_size = 1024*1024*128
|
; DROP DATABASE `test`; CREATE DATABASE `test`; USE `test`
|
; { $first_partition = 0; $last_partition = 1; '' }
|
CREATE TABLE `t1` (`id` INT, `tm` INT, INDEX (`id`,`tm`)) ENGINE=InnoDB
|
PARTITION BY RANGE (`tm`) (PARTITION p0 VALUES LESS THAN (UNIX_TIMESTAMP('2015-01-01 00:00:00')),
|
PARTITION p1 VALUES LESS THAN (UNIX_TIMESTAMP('2015-01-01 00:00:10'))) ;
|
|
thread2:
|
INSERT INTO `t1` VALUES values, values, values, values, values, values, values, values, values, values ;
|
|
thread1:
|
ALTER TABLE `t1` DROP PARTITION { 'p'.($first_partition++) }; create_partition { sleep 5; '' };
|
|
thread3:
|
BEGIN ; select ; select ; select ; select ; select ; select ; select ; select ; select ; select ; COMMIT ;
|
|
values:
|
( item_id , { int(time()) } );
|
|
select:
|
SELECT * FROM `t1` WHERE `id`= item_id AND `tm` > { int(time()-5) } ORDER BY `tm` DESC LIMIT 2 ;
|
|
create_partition:
|
SET @stmt = CONCAT('ALTER TABLE `t1` ADD PARTITION (
|
PARTITION { 'p'.(++$last_partition) } VALUES LESS THAN (',UNIX_TIMESTAMP(),') ENGINE = InnoDB)')
|
; PREPARE stmt FROM @stmt; EXECUTE stmt; DEALLOCATE PREPARE stmt;
|
|
item_id:
|
{ int(rand(500)) } ;
|
|
RQG command line
|
perl ./runall-new.pl --skip-gendata --threads=3 --grammar=1.yy --queries=100M --duration=3600 --basedir=<your basedir> --vardir=<your vardir>
|
thread1_init:
|
CREATE TABLE `history_uint` (`itemid` BIGINT UNSIGNED NOT NULL, `clock` INT DEFAULT '0'NOT NULL, INDEX `history_uint_1` (`itemid`,`clock`)) ENGINE=InnoDB
|
; SET @stmt = CONCAT('ALTER TABLE `history_uint` PARTITION BY RANGE (`clock`) (PARTITION p0 VALUES LESS THAN (UNIX_TIMESTAMP("2015-01-01 00:00:00")) ENGINE = InnoDB,
|
PARTITION pnow VALUES LESS THAN (',UNIX_TIMESTAMP(),') ENGINE = InnoDB)')
|
; PREPARE stmt FROM @stmt; EXECUTE stmt; DEALLOCATE PREPARE stmt; SET NAMES UTF8;
|
|
thread2_init:
|
{ sleep 5; '' };
|
|
thread2:
|
BEGIN ; insert ; insert ; insert ; insert ; insert ; COMMIT ;
|
|
thread3:
|
CALL test.drop_partitions(); CALL test.create_partition(); { sleep 10; '' };
|
|
thread1:
|
BEGIN ; select ; select ; select ; select ; select ; select ; select ; select ; select ; select ; COMMIT ;
|
|
insert:
|
INSERT INTO `history_uint` VALUES values10;
|
|
values10:
|
values, values, values, values, values, values, values, values, values, values ;
|
|
values:
|
( { int(rand(500)) }, { int(time()) } );
|
|
select:
|
SELECT * FROM `history_uint` H WHERE H.`itemid`= { "'".int(rand(500))."'" } AND H.`clock` > select_time ORDER BY H.`clock` DESC LIMIT 2 OFFSET 0 ;
|
|
select_time:
|
/* -10 */ { int(time() - 10) } ;
|
|
Initial SQL 2.sql
|
DROP DATABASE IF EXISTS test;
|
CREATE DATABASE test;
|
USE test;
|
|
DELIMITER $$
|
|
CREATE PROCEDURE `create_partition`()
|
BEGIN
|
DECLARE ROWS_CNT INT UNSIGNED;
|
DECLARE BEGINTIME TIMESTAMP;
|
DECLARE ENDTIME INT UNSIGNED;
|
DECLARE PARTITIONNAME VARCHAR(64);
|
SET BEGINTIME = NOW() + INTERVAL 10 SECOND;
|
SET PARTITIONNAME = DATE_FORMAT( BEGINTIME, 'p%Y_%m_%d_%H_%i_%s' );
|
SET ENDTIME = UNIX_TIMESTAMP(BEGINTIME + INTERVAL 10 SECOND);
|
SELECT COUNT(*) INTO ROWS_CNT
|
FROM information_schema.partitions
|
WHERE table_schema = 'test' AND table_name = 'history_uint' AND partition_name = PARTITIONNAME;
|
IF ROWS_CNT = 0 THEN
|
SET @SQL = CONCAT( 'ALTER TABLE `test`.`history_uint`',
|
' ADD PARTITION (PARTITION ', PARTITIONNAME, ' VALUES LESS THAN (', ENDTIME, '));' );
|
PREPARE STMT FROM @SQL;
|
EXECUTE STMT;
|
DEALLOCATE PREPARE STMT;
|
ELSE
|
SELECT 'partition already exists' AS result;
|
END IF;
|
END$$
|
DELIMITER ;
|
|
----------------------------------------------------------------------------------------------------------------
|
|
----------------------------------------------------------------------------------------------------------------
|
|
DELIMITER $$
|
USE `test`$$
|
DROP PROCEDURE IF EXISTS `drop_partitions`$$
|
CREATE PROCEDURE `drop_partitions`()
|
BEGIN
|
DECLARE PARTITIONNAME_TMP VARCHAR(64);
|
DECLARE VALUES_LESS_TMP INT;
|
DECLARE KEEP_HISTORY_BEFORE INT;
|
DECLARE DONE INT DEFAULT 0;
|
DECLARE get_partitions CURSOR FOR
|
SELECT p.`partition_name`, LTRIM(RTRIM(p.`partition_description`))
|
FROM information_schema.partitions p
|
WHERE p.table_schema = 'test' AND p.table_name = 'history_uint'
|
ORDER BY p.table_name, p.subpartition_ordinal_position;
|
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
|
OPEN get_partitions;
|
loop_check_prt: LOOP
|
IF DONE THEN
|
LEAVE loop_check_prt;
|
END IF;
|
FETCH get_partitions INTO PARTITIONNAME_TMP, VALUES_LESS_TMP;
|
SET KEEP_HISTORY_BEFORE = UNIX_TIMESTAMP(NOW() - INTERVAL 5 MINUTE);
|
IF KEEP_HISTORY_BEFORE >= VALUES_LESS_TMP THEN
|
CALL drop_old_partition(PARTITIONNAME_TMP);
|
END IF;
|
END LOOP loop_check_prt;
|
CLOSE get_partitions;
|
END$$
|
DELIMITER ;
|
|
----------------------------------------------------------------------------------------------------------------
|
|
DELIMITER $$
|
USE `test`$$
|
DROP PROCEDURE IF EXISTS `drop_old_partition`$$
|
CREATE PROCEDURE `drop_old_partition`(IN_PARTITIONNAME VARCHAR(64))
|
BEGIN
|
DECLARE ROWS_CNT INT UNSIGNED;
|
SELECT COUNT(*) INTO ROWS_CNT
|
FROM information_schema.partitions
|
WHERE table_schema = 'test' AND table_name = 'history_uint' AND partition_name = IN_PARTITIONNAME;
|
IF ROWS_CNT = 1 THEN
|
SET @SQL = CONCAT( 'ALTER TABLE `test`.`history_uint`',
|
' DROP PARTITION ', IN_PARTITIONNAME, ';' );
|
PREPARE STMT FROM @SQL;
|
EXECUTE STMT;
|
DEALLOCATE PREPARE STMT;
|
ELSE
|
SELECT 'partition does not exist' AS result;
|
END IF;
|
END$$
|
DELIMITER ;
|
- Start server with --query-cache-size=128M;
- Feed 2.sql to it;
- run (assuming your server is running on port 3306, otherwise modify dsn)
perl ./gentest.pl --dsn="dbi:mysql:host=127.0.0.1:port=3306:user=root:database=test" --threads=3 --grammar=2.yy --queries=100M --duration=3600
|
|
|
cvicentiu,
I'm assigning it to you thinking there might be a slim chance it's related to MDEV-6500 you are working on now (query cache, partitions, something that should or shouldn't be there...).
I could not reproduce it on 5.5, so it's a task for 10.0. When its time comes, please check if your fix for MDEV-6500 fixes it also. If not, you can choose either to fix it separately, or to return it to me (or re-assign to whoever you think it belongs).
To whoever ends up fixing it:
Please pay special attention to the stack traces.
It is important to understand whether all of these are the same problem:
1) release stack trace from the bug description;
2) debug assertion with stack trace 1;
3) debug assertion with stack trace 2;
4) another release stack trace.
If they are not the same, and you are not sure you have fixed release stack trace(s) by fixing debug assertion failures, please do not close the bug report but return it to me instead for further testing.
Also, please take a look at the one-time failure described in this comment. If it's unrelated (which it probably is), please comment saying so, so that we know it did not get fixed along with the primary problem.
|
|
Just wanted to comment that I'm having this same issue. Stored procedures to drop/create partitions runs once per day. A few times a week a node will crash within a minute of the procedure running. This is a database for a Zabbix install, I'm using the stored procedures referenced above. Please let me know if there are any data you'd like me to submit.
Edit: Using 10.0.16
|
|
Just a comment to say that I'm having the issue, too, exactly as described above.
|
|
This seems to be coming more of an issue in our environment. What started out as once every couple of months seems to be increasing in frequency. It is now happening several times a week.
|
|
elenst and cvicentiu,
Would it be possible to get this bug looked at again? It's really causing havoc for us. Its happening twice a week or more right after the create/drop partition stored procedure runs.
neominder_sqlcrashlog.txt
|
|
Hi,
I've hit the same situation, stack trace looks like this :
150827 5:28:54 [ERROR] mysqld got signal 11 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.0.20-MariaDB-log
|
key_buffer_size=536870912
|
read_buffer_size=131072
|
max_used_connections=62
|
max_threads=1502
|
thread_count=34
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3822319 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x0x19a89bf8
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x7f8e7a7dcd78 thread_stack 0x48000
|
mysys/stacktrace.c:247(my_print_stacktrace)[0xb81c6e]
|
sql/signal_handler.cc:153(handle_fatal_signal)[0x726cab]
|
/lib64/libpthread.so.0(+0xf710)[0x7f8eafa8d710]
|
sql/sql_plist.h:128(I_P_List<MDL_ticket, I_P_List_adapter<MDL_ticket, &(MDL_ticket::next_in_lock), &(MDL_ticket::prev_in_lock)>, I_P_List_null_counter, I_P_List_fast_push_back<MDL_ticket> >::remove(MDL_ticket*))[0x68c800]
|
sql/sql_plist.h:126(I_P_List<MDL_ticket, I_P_List_adapter<MDL_ticket, &(MDL_ticket::next_in_context), &(MDL_ticket::prev_in_context)>, I_P_List_null_counter, I_P_List_no_push_back<MDL_ticket> >::remove(MDL_ticket*))[0x68c968]
|
sql/sql_parse.cc:5196(mysql_execute_command(THD*))[0x5acb05]
|
sql/sql_parse.cc:6529(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x5b325d]
|
sql/sql_parse.cc:1310(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x5b4f49]
|
sql/sql_connect.cc:1378(do_handle_one_connection(THD*))[0x68218e]
|
sql/sql_connect.cc:1295(handle_one_connection)[0x682262]
|
/lib64/libpthread.so.0(+0x79d1)[0x7f8eafa859d1]
|
/lib64/libc.so.6(clone+0x6d)[0x7f8eae19f8fd]
|
|
Trying to get some variables.
|
Some pointers may be invalid and cause the dump to abort.
|
Query (0x7f8b14004c60): is an invalid pointer
|
Connection ID (thread ID): 1357300
|
|
Status: NOT_KILLED
|
|
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
|
|
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
|
information that should help you find out what is causing the crash.
|
150827 05:28:55 mysqld_safe Number of processes running now: 0
|
150827 05:28:55 mysqld_safe mysqld restarted
|
150827 5:28:56 [Note] /usr/sbin/mysqld (mysqld 10.0.20-MariaDB-log) starting as process 25031 ...
|
150827 5:28:56 [Note] InnoDB: Using mutexes to ref count buffer pool pages
|
150827 5:28:56 [Note] InnoDB: The InnoDB memory heap is disabled
|
150827 5:28:56 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
|
150827 5:28:56 [Note] InnoDB: Memory barrier is not used
|
150827 5:28:56 [Note] InnoDB: Compressed tables use zlib 1.2.3
|
150827 5:28:56 [Note] InnoDB: Using Linux native AIO
|
150827 5:28:56 [Note] InnoDB: Using CPU crc32 instructions
|
150827 5:28:56 [Note] InnoDB: Initializing buffer pool, size = 12.0G
|
150827 5:28:56 [Note] InnoDB: Completed initialization of buffer pool
|
150827 5:28:57 [Note] InnoDB: Highest supported file format is Barracuda.
|
150827 5:28:57 [Note] InnoDB: Log scan progressed past the checkpoint lsn 114661609834
|
150827 5:28:57 [Note] InnoDB: Database was not shutdown normally!
|
150827 5:28:57 [Note] InnoDB: Starting crash recovery.
|
150827 5:28:57 [Note] InnoDB: Reading tablespace information from the .ibd files...
|
150827 5:28:58 [Note] InnoDB: Restoring possible half-written data pages
|
150827 5:28:58 [Note] InnoDB: from the doublewrite buffer...
|
InnoDB: Doing recovery: scanned up to log sequence number 114661791274
|
150827 5:28:59 [Note] InnoDB: Starting an apply batch of log records to the database...
|
InnoDB: Progress in percent: 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
|
InnoDB: Apply batch completed
|
150827 5:29:00 [Note] InnoDB: 128 rollback segment(s) are active.
|
150827 5:29:00 [Note] InnoDB: Waiting for purge to start
|
150827 5:29:00 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.24-72.2 started; log sequence number 114661791274
|
150827 5:29:00 [Note] Plugin 'FEEDBACK' is disabled.
|
150827 5:29:00 [Note] Server socket created on IP: '::'.
|
150827 5:29:00 [ERROR] mysqld: Table './mysql/event' is marked as crashed and should be repaired
|
150827 5:29:00 [Warning] Checking table: './mysql/event'
|
150827 5:29:00 [ERROR] mysql.event: 1 client is using or hasn't closed the table properly
|
150827 5:29:00 [Note] Event Scheduler: Loaded 1 event
|
150827 5:29:00 [Note] Event Scheduler: scheduler thread started with id 1
|
150827 5:29:00 [Note] /usr/sbin/mysqld: ready for connections.
|
|
At that hour partitioning was occuring. Let me know if anything else is needed.
|
|
Perhaps not important, but note the lack of an explicit PRIMARY KEY.
|
|
Both tests (run1 and run2) crashes quickly on commit 9a3ff0789f (when the tests were written), but do not crash on the latest 10.0 (tests run for an hour and exit successfully).
Apparently the crash was fixed meanwhile, so closing.
|