|
Another similar crash:
210212 7:54:32 [ERROR] mysqld got exception 0xc0000005 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.5.8-MariaDB-log
key_buffer_size=67108864
read_buffer_size=1048576
max_used_connections=113
max_threads=65537
thread_count=133
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 88784 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x236552825b8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
ha_spider.dll!spider_flush_table_mon_cache_init()
ha_spider.dll!spider_ping_table_init()
ha_spider.dll!spider_ping_table_init()
ha_spider.dll!spider_ping_table_init()
server.dll!ha_partition::handle_unordered_scan_next_partition()[ha_partition.cc:7533]
server.dll!ha_partition::multi_range_read_next()[ha_partition.cc:6656]
server.dll!QUICK_RANGE_SELECT::get_next()[opt_range.cc:12263]
server.dll!find_all_keys()[filesort.cc:892]
server.dll!filesort()[filesort.cc:352]
server.dll!create_sort_index()[sql_select.cc:23858]
server.dll!st_join_table::sort_table()[sql_select.cc:21605]
server.dll!join_init_read_record()[sql_select.cc:21542]
server.dll!sub_select()[sql_select.cc:20616]
server.dll!do_select()[sql_select.cc:20153]
server.dll!JOIN::exec_inner()[sql_select.cc:4459]
server.dll!JOIN::exec()[sql_select.cc:4241]
server.dll!mysql_select()[sql_select.cc:4657]
server.dll!handle_select()[sql_select.cc:417]
server.dll!execute_sqlcom_select()[sql_parse.cc:6266]
server.dll!mysql_execute_command()[sql_parse.cc:3968]
server.dll!mysql_parse()[sql_parse.cc:8048]
server.dll!dispatch_command()[sql_parse.cc:1875]
server.dll!do_command()[sql_parse.cc:1353]
server.dll!threadpool_process_request()[threadpool_common.cc:363]
server.dll!tp_callback()[threadpool_common.cc:194]
ntdll.dll!RtlReleaseSRWLockExclusive()
ntdll.dll!RtlReleaseSRWLockExclusive()
KERNEL32.DLL!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()
|
The stack trace is not very accurate, since PDBs are not installed by default for plugins (they are a separate download now).
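To make the point about missing symbols concrete, here is a minimal sketch that flags the frames in a log like the one above that carry no `[file:line]` information. The frame layout `module!function()[file:line]` is taken from the log itself; the helper names (`parse_frame`, `frames_without_source`) are made up for illustration and are not part of any MariaDB tooling.

```python
import re

# Frame layout seen in the Windows crash log: module!function()[file:line].
# The [file:line] part is absent when no symbol files (PDBs) were available.
FRAME_RE = re.compile(
    r"^(?P<module>[^!\s]+)!(?P<function>.+?)\(\)"
    r"(?:\[(?P<file>[^:\]]+):(?P<line>\d+)\])?$"
)


def parse_frame(text):
    """Return a dict for one frame line, or None if the line is not a frame."""
    match = FRAME_RE.match(text.strip())
    if match is None:
        return None
    frame = match.groupdict()
    frame["has_source"] = frame["file"] is not None
    return frame


def frames_without_source(lines):
    """Yield the frames that carry no file:line information."""
    for line in lines:
        frame = parse_frame(line)
        if frame is not None and not frame["has_source"]:
            yield frame


# A few frames copied from the crash log above.
sample = [
    "ha_spider.dll!spider_flush_table_mon_cache_init()",
    "ha_spider.dll!spider_ping_table_init()",
    "server.dll!ha_partition::handle_unordered_scan_next_partition()[ha_partition.cc:7533]",
    "server.dll!filesort()[filesort.cc:352]",
    "ntdll.dll!RtlReleaseSRWLockExclusive()",
]

for frame in frames_without_source(sample):
    print(frame["module"], frame["function"])
```

Frames flagged this way (here the ha_spider.dll and ntdll.dll ones) are the ones whose function names should be treated with caution, since without symbols the name shown may only be the nearest exported symbol.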
|
|
While investigating the present bug, I hit another crashing bug and created an issue for it: MDEV-26158.
|
|
valerii Hi! Just to make sure, let me ask you about the issue, since you are the reporter. The title of the issue is "Spider crash when selecting all rows from the partitioned table". So, did you (or a customer) identify the query that caused the crash?
|
|
As you can see in the initial description, it's as simple as:
select * from db.spider_table limit 0, 1000;
The only special thing is that the Spider table is partitioned across more than one back-end node, "current" and "archive". More details and other similar crashing queries are in the associated issues. Please check them.
|
|
Thanks. Sorry, I should have checked the issue description in more detail. I was confused with other issues.
|
|
valerii, how easily can they repeat the bug? Is it enough to start the server and run a query, or do they need to put it in production and wait a few days?
Do they even still have a setup on which the issue can be repeated?
|
|
One option could be that we provide a debug binary and the crash is repeated under rr. Then we get the rr trace and fix the bug. There may be many follow-up questions about the exact hardware (/proc/cpuinfo), the system libraries installed, the kernel version, etc.
|
|
Thank you all for the advice. Using rr sounds good, or at least not hopeless, to me. Of course, it will not be easy to satisfy the prerequisites for rr to work as intended. I am not very proficient with rr, so please give me some time to look into it.
|
|
I reproduced and reopened MDEV-26158 and provided further details there.
|
|
Extensive testing has not yet produced a stack that contains 'spider_flush_table_mon_cache_init'.
|
|
I have however seen a SIGSEGV in spider_check_direct_order_limit (mentioned earlier in this ticket) in two instances. Attempting reduction on those.
|
|
Roel The stack trace in the issue description might not be accurate, because spider_flush_table_mon_cache_init() is a trivial function with nothing in it that could crash. I think that spider_check_direct_order_limit() is a likely candidate for the cause of the bug.
|
|
nayuta-yanagisawa Understood. Thank you.
|
|
Various (quite sporadic) testcases can be, and have been, simplified to short ones that have at least once (in their current form) produced stacks with spider_check_direct_order_limit as a prominent frame. However, when those testcases are replayed manually, I keep running into MDEV-26539, MDEV-26583 or MDEV-26582 instead.
It looks like there are conflicting crashes. Fixing, or at least understanding, MDEV-26539, MDEV-26583 and MDEV-26582 may help with this bug in one of three ways: by directly revealing the underlying issue of this bug (if the crashes do not conflict but are two different types of crash resulting from one core issue), by clearing the path for a well-working (though likely very sporadic) testcase for this bug (if the crashes conflict and have two different underlying issues), or by clarifying that the spider_check_direct_order_limit bugs seen thus far are not related.
A few example stacks showing what was seen for spider_check_direct_order_limit:
|
10.7.0 1bc82aaf0a7746c0921a94034aff2d51f0d75cd0 (Optimized)
Core was generated by `/test/MD040921-mariadb-10.7.0-linux-x86_64-opt/bin/mysqld --no-defaults --max_a'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 spider_check_index_merge (table=0x14c0b80a0938,
    select_lex=select_lex@entry=0x14c0b9626458)
    at /test/10.7_opt/storage/spider/spd_table.cc:9648
[Current thread is 1 (Thread 0x14c194464700 (LWP 3272649))]
(gdb) bt
#0 spider_check_index_merge (table=0x14c0b80a0938, select_lex=select_lex@entry=0x14c0b9626458) at /test/10.7_opt/storage/spider/spd_table.cc:9648
#1 0x000014c17d7155c9 in spider_check_direct_order_limit (spider=spider@entry=0x14c0b80d2fb0) at /test/10.7_opt/storage/spider/spd_table.cc:9252
#2 0x000014c17d732f21 in ha_spider::check_direct_order_limit (this=0x14c0b80d2fb0) at /test/10.7_opt/storage/spider/ha_spider.cc:13096
#3 ha_spider::check_direct_order_limit (this=0x14c0b80d2fb0) at /test/10.7_opt/storage/spider/ha_spider.cc:13089
#4 0x000014c17d73cfcd in ha_spider::index_last_internal (buf=0x14c0b80c8c28 "", this=0x14c0b80d2fb0) at /test/10.7_opt/storage/spider/ha_spider.cc:3350
#5 ha_spider::index_last_internal (this=0x14c0b80d2fb0, buf=0x14c0b80c8c28 "") at /test/10.7_opt/storage/spider/ha_spider.cc:3310
#6 0x000014c17d746119 in ha_spider::get_auto_increment (this=0x14c0b80d2fb0, offset=<optimized out>, increment=3, nb_desired_values=1, first_value=0x14c194463ac0, nb_reserved_values=0x14c194463ac8) at /test/10.7_opt/storage/spider/ha_spider.cc:9812
#7 0x000056272f1fd848 in handler::update_auto_increment (this=this@entry=0x14c0b80d2fb0) at /test/10.7_opt/sql/handler.cc:3954
#8 0x000014c17d735286 in ha_spider::update_auto_increment (this=0x14c0b80d2fb0) at /test/10.7_opt/storage/spider/ha_spider.cc:9755
#9 0x000014c17d747345 in ha_spider::write_row (this=0x14c0b80d2fb0, buf=0x14c0b80c8bb8 "\376") at /test/10.7_opt/storage/spider/ha_spider.cc:10014
#10 0x000056272f2031f0 in handler::ha_write_row (this=0x14c0b80d2fb0, buf=0x14c0b80c8bb8 "\376") at /test/10.7_opt/sql/handler.cc:7514
#11 0x000056272ef79a8d in write_record (thd=thd@entry=0x14c0b971f808, table=0x14c0b80a0938, info=info@entry=0x14c0b97260e0, sink=sink@entry=0x0) at /test/10.7_opt/sql/sql_insert.cc:2135
#12 0x000056272ef7c4bd in Delayed_insert::handle_inserts (this=0x14c0b971f7e8) at /test/10.7_opt/sql/sql_insert.cc:3576
#13 0x000056272ef83915 in handle_delayed_insert (arg=arg@entry=0x14c0b971f7e8) at /test/10.7_opt/sql/sql_insert.cc:3316
#14 0x000056272f427298 in pfs_spawn_thread (arg=0x5627321be8e8) at /test/10.7_opt/storage/perfschema/pfs.cc:2201
#15 0x000014c194bae609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#16 0x000014c19479c293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
And
|
10.7.0 1bc82aaf0a7746c0921a94034aff2d51f0d75cd0 (Optimized)
Core was generated by `/test/MD040921-mariadb-10.7.0-linux-x86_64-opt/bin/mysqld --no-defaults --max_a'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 spider_check_index_merge (table=0x1517480274a8,
    select_lex=select_lex@entry=0x151748011048)
    at /test/10.7_opt/storage/spider/spd_table.cc:9640
[Current thread is 1 (Thread 0x15187006a700 (LWP 2980262))]
(gdb) bt
#0 spider_check_index_merge (table=0x1517480274a8, select_lex=select_lex@entry=0x151748011048) at /test/10.7_opt/storage/spider/spd_table.cc:9640
#1 0x00001518645255c9 in spider_check_direct_order_limit (spider=spider@entry=0x15174801ab00) at /test/10.7_opt/storage/spider/spd_table.cc:9252
#2 0x0000151864542f21 in ha_spider::check_direct_order_limit (this=0x15174801ab00) at /test/10.7_opt/storage/spider/ha_spider.cc:13096
#3 ha_spider::check_direct_order_limit (this=0x15174801ab00) at /test/10.7_opt/storage/spider/ha_spider.cc:13089
#4 0x000015186454cfcd in ha_spider::index_last_internal (buf=0x151748028958 "", this=0x15174801ab00) at /test/10.7_opt/storage/spider/ha_spider.cc:3350
#5 ha_spider::index_last_internal (this=0x15174801ab00, buf=0x151748028958 "") at /test/10.7_opt/storage/spider/ha_spider.cc:3310
#6 0x0000151864556119 in ha_spider::get_auto_increment (this=0x15174801ab00, offset=<optimized out>, increment=1, nb_desired_values=1, first_value=0x151870069ac0, nb_reserved_values=0x151870069ac8) at /test/10.7_opt/storage/spider/ha_spider.cc:9812
#7 0x000056017b2af848 in handler::update_auto_increment (this=this@entry=0x15174801ab00) at /test/10.7_opt/sql/handler.cc:3954
#8 0x0000151864545286 in ha_spider::update_auto_increment (this=0x15174801ab00) at /test/10.7_opt/storage/spider/ha_spider.cc:9755
#9 0x0000151864557345 in ha_spider::write_row (this=0x15174801ab00, buf=0x151748028948 "\376") at /test/10.7_opt/storage/spider/ha_spider.cc:10014
#10 0x000056017b2b51f0 in handler::ha_write_row (this=0x15174801ab00, buf=0x151748028948 "\376") at /test/10.7_opt/sql/handler.cc:7514
#11 0x000056017b02ba8d in write_record (thd=thd@entry=0x151748063b88, table=0x1517480274a8, info=info@entry=0x15174806a460, sink=sink@entry=0x0) at /test/10.7_opt/sql/sql_insert.cc:2135
#12 0x000056017b02e4bd in Delayed_insert::handle_inserts (this=0x151748063b68) at /test/10.7_opt/sql/sql_insert.cc:3576
#13 0x000056017b035915 in handle_delayed_insert (arg=arg@entry=0x151748063b68) at /test/10.7_opt/sql/sql_insert.cc:3316
#14 0x000056017b4d9298 in pfs_spawn_thread (arg=0x56017d09ead8) at /test/10.7_opt/storage/perfschema/pfs.cc:2201
#15 0x0000151879a3a609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#16 0x0000151879628293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
Other UniqueIDs observed:
SIGSEGV|spider_get_select_limit_from_select_lex|spider_get_select_limit_from_select_lex|spider_check_direct_order_limit|ha_spider::check_direct_order_limit
SIGSEGV|spider_check_index_merge|spider_check_direct_order_limit|ha_spider::check_direct_order_limit|ha_spider::index_last_internal
|
The first one forms part of MDEV-26583.
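As a rough illustration of how the two UniqueIDs relate, the sketch below splits each ID on `|` and lists the frames they have in common. It assumes only what is visible in the strings themselves (a signal name followed by frame names); how the testing framework actually builds these IDs is not shown in this ticket, and the helper names are hypothetical.

```python
def split_unique_id(unique_id):
    """Split 'SIGNAL|frame|frame|...' into (signal, [frames])."""
    signal, *frames = unique_id.split("|")
    return signal, frames


def shared_frames(unique_ids):
    """Return the frames present in every UniqueID, in order of first appearance."""
    frame_lists = [split_unique_id(uid)[1] for uid in unique_ids]
    common = set(frame_lists[0]).intersection(*frame_lists[1:])
    ordered = []
    for frame in frame_lists[0]:
        if frame in common and frame not in ordered:
            ordered.append(frame)
    return ordered


# The two UniqueIDs quoted in this comment.
observed = [
    "SIGSEGV|spider_get_select_limit_from_select_lex|spider_get_select_limit_from_select_lex"
    "|spider_check_direct_order_limit|ha_spider::check_direct_order_limit",
    "SIGSEGV|spider_check_index_merge|spider_check_direct_order_limit"
    "|ha_spider::check_direct_order_limit|ha_spider::index_last_internal",
]

print(shared_frames(observed))
```

Both IDs differ in the crashing frame but share spider_check_direct_order_limit and ha_spider::check_direct_order_limit as callers, which is why they are discussed together here.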
|
|
Regarding spider_check_direct_order_limit, mentioned by nayuta-yanagisawa in an earlier comment, we still have MDEV-26583, which is currently in review by holyfoot. It would be interesting to know whether the customer uses INSERT DELAYED. Irrespective of whether the customer/users used DELAYED, future testing with a build that includes that patch, as well as the various other patches made since work on this bug started, would be a good avenue imho. This is especially so because the original stack highlights spider_set_direct_limit_offset as the crashing function, and bugs MDEV-27522 and MDEV-27388 (and previously MDEV-27171) have the same crashing function. The just-lodged MDEV-27522 may be of particular interest, though it has thus far only been seen in 10.5+. The other two point back to MDEV-27240, which was a core issue. IOW, when all of these tickets have been resolved, there is a good possibility that the issue listed here (which may be the same as MDEV-26583, or a duplicate of MDEV-27240 and/or sister bugs) is resolved too.
|
|
In recent (note: pre-MDEV-27240 patch) runs, the following issues have shown up:
SIGSEGV|spider_set_direct_limit_offset|ha_spider::check_direct_order_limit|ha_spider::check_direct_order_limit|ha_spider::rnd_next_internal
SIGSEGV|spider_set_direct_limit_offset|ha_spider::check_direct_order_limit|ha_spider::rnd_next_internal|ha_spider::pre_rnd_next
SIGSEGV|spider_set_direct_limit_offset|ha_spider::check_direct_order_limit|ha_spider::rnd_next_internal|ha_spider::rnd_next
|
Note the similarity with the OP.
FAULTING_IP: ha_spider!spider_set_direct_limit_offset+45
server.dll!handler::ha_rnd_next()[handler.cc:3066]
server.dll!ha_partition::rnd_next()[ha_partition.cc:5220]
server.dll!handler::ha_rnd_next()[handler.cc:3066]
|
|
|
As per nayuta-yanagisawa, the UniqueIDs in the last comment are likely to be fixed by the MDEV-27240 patch, which was just merged into the 10.5 trunk and upmerged all the way to 10.8.
A testcase for this UniqueID:
SIGSEGV|spider_set_direct_limit_offset|ha_spider::check_direct_order_limit|ha_spider::rnd_next_internal|ha_spider::rnd_next
|
Being:
INSTALL PLUGIN spider SONAME 'ha_spider.so';
CREATE TABLE t0 (a INT,KEY a (a)) ENGINE=SPIDER;
CREATE TEMPORARY TABLE sql_temp0_innodb (a INT,b TEXT) ENGINE=SPIDER;
LOAD DATA INFILE''INTO TABLE t;
SELECT a,b FROM t0 outer_table WHERE a=(SELECT a FROM t0 WHERE outer_table.a=@arg0 AND a=0) AND b='';
ALTER TABLE t0 CHANGE c0 c0 INT UNSIGNED;
DELETE FROM t0;
|
This testcase indeed confirms it: it crashes pre-fix but does not crash post-fix.
There is thus a high or very high possibility that this bug is fixed in the latest source.
|
|
I am closing this bug as fixed by the MDEV-27240 patch.
Anyone who has run into this bug in production should retry with the latest 10.5 or 10.6 source code, which will in due time be incorporated into the next MariaDB server release.
If anyone should see this bug, or a similar one, after revisions
10.5 e44439ab7354c5dff20707325694839e9346fb27
10.6 bd03c0e51629e1c3969a171137712a6bb854c232
10.7 b0998583f8e7331245597a7bf9a878cf6c7fa969 (not for production use yet)
10.8 e222e44d1bfc995870430bb90d8ac97e91f66cb4 (not for production use yet)
|
please let us know and we will reopen the bug.
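For anyone building from source, one way to confirm that a checkout already includes the relevant revision is `git merge-base --is-ancestor`. The sketch below is only an illustration under assumptions: it presumes the build comes from a git checkout of the server repository in which these hashes exist, and `repo_dir` and the helper names are hypothetical.

```python
import subprocess

# Fix revisions quoted in this ticket, keyed by branch.
FIX_REVISIONS = {
    "10.5": "e44439ab7354c5dff20707325694839e9346fb27",
    "10.6": "bd03c0e51629e1c3969a171137712a6bb854c232",
    "10.7": "b0998583f8e7331245597a7bf9a878cf6c7fa969",
    "10.8": "e222e44d1bfc995870430bb90d8ac97e91f66cb4",
}


def ancestor_check_command(fix_revision, ref="HEAD"):
    """Build the git command that tests whether fix_revision is an ancestor of ref."""
    return ["git", "merge-base", "--is-ancestor", fix_revision, ref]


def checkout_contains_fix(repo_dir, branch, ref="HEAD"):
    """Return True if the checkout in repo_dir (hypothetical path) includes the fix."""
    command = ancestor_check_command(FIX_REVISIONS[branch], ref)
    result = subprocess.run(command, cwd=repo_dir, capture_output=True, text=True)
    # Exit status 0 means "is an ancestor", 1 means "is not"; anything else is an error.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    return result.returncode == 0
```

For example, `checkout_contains_fix("/path/to/server", "10.5")` would report whether the current HEAD of a 10.5 checkout at that (placeholder) path is at or past the revision listed above.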
|