Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.5.9
-
Microsoft Windows Server 2016 Standard, VMware x-64 based PC, 32 GB Memory
Description
MariaDB has crashed due to the following error. Is it possible to know the exact reason for this crash? Attached mariadbd dump and database error file for reference.
2021-08-01 04:09:31 0x11d8 InnoDB: Assertion failure in file D:\winx64-packages\build\src\storage\innobase\fil\fil0fil.cc line 488
|
InnoDB: Failing assertion: space->is_ready_to_close() || space->purpose == FIL_TYPE_TEMPORARY || srv_fast_shutdown == 2 || !srv_was_started
|
InnoDB: We intentionally generate a memory trap.
|
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
|
InnoDB: If you get repeated assertion failures or crashes, even
|
InnoDB: immediately after the mysqld startup, there may be
|
InnoDB: corruption in the InnoDB tablespace. Please refer to
|
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
|
InnoDB: about forcing recovery.
|
210801 4:09:31 [ERROR] mysqld got exception 0x80000003 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.5.9-MariaDB-log
|
key_buffer_size=33554432
|
read_buffer_size=8388608
|
max_used_connections=64
|
max_threads=65537
|
thread_count=56
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 52371 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x274c52f9038
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
server.dll!my_parameter_handler()
|
ucrtbase.dll!raise()
|
ucrtbase.dll!abort()
|
server.dll!json_valid()
|
server.dll!json_valid()
|
server.dll!json_valid()
|
server.dll!json_valid()
|
server.dll!json_valid()
|
server.dll!json_valid()
|
server.dll!json_valid()
|
server.dll!json_valid()
|
server.dll!json_valid()
|
server.dll!json_valid()
|
server.dll!json_valid()
|
server.dll!json_valid()
|
server.dll!json_valid()
|
server.dll!json_valid()
|
server.dll!json_valid()
|
server.dll!?ha_open@handler@@QEAAHPEAUTABLE@@PEBDHIPEAUst_mem_root@@PEAV?$List@VString@@@@@Z()
|
server.dll!?open_table_from_share@@YA?AW4open_frm_error@@PEAVTHD@@PEAUTABLE_SHARE@@PEBUst_mysql_const_lex_string@@IIIPEAUTABLE@@_NPEAV?$List@VString@@@@@Z()
|
server.dll!?open_table@@YA_NPEAVTHD@@PEAUTABLE_LIST@@PEAVOpen_table_context@@@Z()
|
server.dll!?open_and_lock_tables@@YA_NPEAVTHD@@AEBUDDL_options_st@@PEAUTABLE_LIST@@_NIPEAVPrelocking_strategy@@@Z()
|
server.dll!?open_tables@@YA_NPEAVTHD@@AEBUDDL_options_st@@PEAPEAUTABLE_LIST@@PEAIIPEAVPrelocking_strategy@@@Z()
|
server.dll!?mysqld_show_create_get_fields@@YA_NPEAVTHD@@PEAUTABLE_LIST@@PEAV?$List@VItem@@@@PEAVString@@@Z()
|
server.dll!?mysqld_show_create@@YA_NPEAVTHD@@PEAUTABLE_LIST@@@Z()
|
server.dll!?mysql_execute_command@@YAHPEAVTHD@@@Z()
|
server.dll!?mysql_parse@@YAXPEAVTHD@@PEADIPEAVParser_state@@_N3@Z()
|
server.dll!?dispatch_command@@YA_NW4enum_server_command@@PEAVTHD@@PEADI_N3@Z()
|
server.dll!?do_command@@YA_NPEAVTHD@@@Z()
|
server.dll!?pool_of_threads_scheduler@@YAXPEAUscheduler_functions@@PEAKPEAV?$Atomic_counter@I@@@Z()
|
server.dll!?tp_callback@@YAXPEAUTP_connection@@@Z()
|
KERNEL32.DLL!TermsrvOpenRegEntry()
|
ntdll.dll!RtlReleasePebLock()
|
ntdll.dll!RtlReleaseSRWLockExclusive()
|
KERNEL32.DLL!BaseThreadInitThunk()
|
ntdll.dll!RtlUserThreadStart()
|
|
Trying to get some variables.
|
Some pointers may be invalid and cause the dump to abort.
|
Query (0x274c2c215b0): show create table `iv_crl_ca_details`
|
Connection ID (thread ID): 39444
|
Status: NOT_KILLED
|
|
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
|
|
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
|
information that should help you find out what is causing the crash.
|
Writing a core file at C:\Program Files\McAfee\Network Security Manager\MariaDB\data\
|
Minidump written to C:\Program Files\McAfee\Network Security Manager\MariaDB\data\mariadbd.dmp
|
Attachments
- crashreport.txt
- 86 kB
- mariadbd.dmp
- 243 kB
- mariadbd.exe
- 24 kB
- mariadbd.pdb
- 716 kB
- mariadbd104.dmp
- 223 kB
- mariadbd182.dmp
- 163 kB
- MDEV-26293.cfg
- 43 kB
- MDEV-26293.yy
- 0.6 kB
- out.txt
- 24 kB
- screenshot-1.png
- 580 kB
- serverdll.zip
- 5.52 MB
- serverpdb.zip.001
- 5.00 MB
- serverpdb.zip.002
- 5.00 MB
- serverpdb.zip.003
- 5.00 MB
- serverpdb.zip.004
- 2.39 MB
- VK-REL-NSM.err
- 24 kB
- WIN104.err
- 10 kB
- WIN-G57FI8VPMSD.err
- 42 kB
Issue Links
- is caused by
-
MDEV-23855 InnoDB log checkpointing causes regression for write-heavy OLTP
-
- Closed
-
- relates to
-
MDEV-26381 InnoDB: Failing assertion: fsize != os_offset_t(-1) on CREATE TABLE with many partitions / ENOMEM (out of kernel memory) from fstat handling
-
- Closed
-
-
MDEV-32851 Assertion failure crash
-
- Closed
-
Activity
schandol, this does not seem to be our 10.5.9 either. At least both Visual Studio and cdb debugger detect mismatches. there is some noisy symbol loading information in out.txt that says that.
DBGHELP: C:\bug\server.dll - file not found
|
DBGHELP: c:\bug\mariadb-10.5.9-winx64\bin\server.dll - mismatched timestamp
|
DBGHELP: server.dll not found in c:\bug\mariadb-10.5.9-winx64\bin
|
DBGENG: C:\Program Files\McAfee\Network Security Manager\MariaDB\bin\server.dll - Couldn't map image from disk.
|
DBGENG: server.dll - Partial symbol image load missing image info
|
DBGHELP: Module is not fully loaded into memory.
|
DBGHELP: Searching for symbols using debugger-provided data.
|
DBGHELP: c:\bug\mariadb-10.5.9-winx64\bin\server.pdb - mismatched pdb
|
DBGHELP: c:\bug\mariadb-10.5.9-winx64\bin\dll\server.pdb - file not found
|
DBGHELP: c:\bug\mariadb-10.5.9-winx64\bin\symbols\dll\server.pdb - file not found
|
SYMSRV: BYINDEX: 0xA
|
What's interesting is that the paths n assertion are correct (D:\winx64-packages\build\src\storage\innobase\fil\fil0fil.cc - this is where it is built from in our build)
So I do not really know, looks like executable was tampered by someone else, or maybe .dmp is somehow damaged.
Vladislav Vaintroub, Thanks for the update. However, we are not bundling all the debug files. As suggested for deciphering will include mariadbd.pdb file also. Just curious to know whether any dependency files also needs to be bundled? If so can you please update.
We are using mariadb named binaries downloaded from https://downloads.mariadb.org/mariadb/10.5.9/. For the actual problem mentioned i.e., "that the paths are correct .... but looks like executable was tampered by someone else". Also as a note, we are able to restart the service and login to database. Whether this tampering has happened by some process while database is running?
I do not know, I can't load symbols for this dmp with the released version. I have no other clues than "mismatched binary" by cdb and Visual Studio. Maybe you have some better clues than I have, and I do something wrong, and your output of cdb command is different?
cdb -lines -z C:\bug\mariadb-10.5.9-winx64\bin\mariadbd.dmp -i C:\bug\mariadb-10.5.9-winx64\bin -y C:\bug\mariadb-10.5.9-winx64\bin -logo out.txt -c "!sym noisy;!sym prompts;.reload;lml;!uniqstack -p;q"
|
Of course with fixed path, the "bin" directory (in my case is the C:\bug\mariadb-10.5.9-winx64\bin) is the one that contains the crashdump mariadbd.dmp, server.dll, server.pdb (also mariadbd.exe, but this one is no more important, as server.dll now has all the server functionality)
Vladislav Vaintroub, we have hit this issue on two systems with similar error
InnoDB: Assertion failure in file D:\winx64-packages\build\src\storage\innobase\fil\fil0fil.cc line 488
InnoDB: Failing assertion: space->is_ready_to_close() || space->purpose == FIL_TYPE_TEMPORARY || srv_fast_shutdown == 2 || !srv_was_started
InnoDB: We intentionally generate a memory trap.
Also, the two systems uses windows 2016 as OS.
Attached the two crash dumps with err files for reference.
Can we have more priority for this one to know exact reason?
schandol, Well, analyzing a minidump is not a big deal. It is well documented, and it should be possible to do that with the exact command that I wrote in previous entry (cdb.exe should be installed). It does not work for me for your dumps, neither the old, nor the new.
Thus I alleged, your executable must be different somehow from what we ship. If I'm wrong, and you are right, you should easily execute a couple of cdb commands, as above https://jira.mariadb.org/browse/MDEV-26293?focusedCommentId=195859&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-195859 , or Visual Studio commands, and prove me wrong. However, the stackdumping code in the program itself, and cdb , and Visual Studio , none of those think they can do anything with that dump, and I'm relying on one of them to work, so there is nothing I can do.
See, "no matching binary found" for mariadbd.exe, for server.dll , and (you can't see it) for server_audit.dll . The symbol path includes 10.5.9 bin directory, and there is pdb for server.dll and for mariadbd.exe in there. Proof :
Ok I know what exactly happened. Some releases were rebuilt due to https://jira.mariadb.org/browse/MDEV-24944 , and downloads were fixed, so that they contained newer builds .
If you downloaded just after release, the timestamp of the executable is Feb.19
This is what cdb says for your .dmp
00007ff7`f0580000 00007ff7`f058a000 mariadbd (deferred)
|
Image path: C:\Program Files\McAfee\Network Security Manager\MariaDB\bin\mariadbd.exe
|
Image name: mariadbd.exe
|
Timestamp: Fri Feb 19 11:36:22 2021 (602F94A6)
|
...
|
the one I got downloaded now has the timestamp of March,22. Output of dumpbin /headers contains
FILE HEADER VALUES
|
8664 machine (x64)
|
7 number of sections
|
6058F60E time date stamp Mon Mar 22 20:54:54 2021
|
...
|
Both files are built from the same source revision 3a8ca9096ea82ca61811450775511533d6cb1bb4, but I can't use debug symbols of the new build on the old exe.
Thus schandol, if you did not save the server.dll,server.pdb and mariadbd.exe,mariadbd.pdb somewhere (which are in the ZIP distro that you downloaded) , if you have not got them, we are out of luck, because I don't that those files either. I would suggest to use the executables from the current download, or better yet, 10.5.12 or whatever is current. I would suggest not throw away the pdbs in the future, they can be useful.
If you saved them, can you please attach them to the ticket.
schandol, thanks, and server.dll/pdb also, if possible , please
Vladislav Vaintroub, Thanks for the update. Added three files server.dll, mariadbd.exe,mariadbd.pdb. The server.dll file has been added as a zip file since uploading file limit is 10 MB. However, the server.pdb file size is 55.9 MB, Exceeding 10 MB for all the windows/linux zip formats. Is there any alternate way to upload this file?
Also, mentioned that the binary set of files have been updated for 10.5.9 in downloads section of mariadb. Curious to know the fixes that went into with the latest 10.5.9 when compared to that one used at our end.
schandol, thanks. you could just share the remaining file on a google/azure drive, and add the link here. Or, you can use 7zip to split large archives like here https://www.techulator.com/resources/6601-How-split-join-files-using-7-zip-free-tool.aspx
new 10.5.9 is built from the very same sources, the binaries and MSI was just signed differently, with timestamping. only MDEV-24944 was fixed, nothing else.
Vladislav Vaintroub, Uploaded the file by splitting into 4 files using 7 zip.
So, marko.
One of the crashes is this
server.dll!ut_dbg_assertion_failed(const char * expr, const char * file, unsigned int line) Line 60 C++
|
server.dll!fil_node_t::prepare_to_close_or_detach() Line 487 C++
|
server.dll!fil_node_t::close() Line 466 C++
|
server.dll!fil_space_t::try_to_close(bool print_info) Line 110 C++
|
server.dll!fil_node_open_file(fil_node_t * node) Line 438 C++
|
server.dll!fil_space_t::prepare(bool have_mutex) Line 666 C++
|
server.dll!fil_space_t::get(unsigned __int64 id) Line 1446 C++
|
server.dll!buf_flush_space(const unsigned int id) Line 1226 C++
|
server.dll!buf_do_flush_list_batch(unsigned __int64 max_n, unsigned __int64 lsn) Line 1450 C++
|
server.dll!buf_flush_lists(unsigned __int64 max_n, unsigned __int64 lsn) Line 1580 C++
|
server.dll!buf_flush_page_cleaner(void * __formal) Line 2195 C++
|
another one is this
> server.dll!my_parameter_handler(const wchar_t * expression, const wchar_t * function, const wchar_t * file, unsigned int line, unsigned __int64 pReserved) Line 278 C
|
ucrtbase.dll!raise() Unknown
|
ucrtbase.dll!abort() Unknown
|
server.dll!ut_dbg_assertion_failed(const char * expr, const char * file, unsigned int line) Line 60 C++
|
server.dll!fil_node_t::prepare_to_close_or_detach() Line 487 C++
|
server.dll!fil_node_t::close() Line 466 C++
|
server.dll!fil_space_t::try_to_close(bool print_info) Line 110 C++
|
server.dll!fil_node_open_file(fil_node_t * node) Line 438 C++
|
server.dll!fil_space_t::prepare(bool have_mutex) Line 666 C++
|
server.dll!fil_space_t::acquire() Line 535 C++
|
server.dll!btr_cur_prefetch_siblings(const buf_block_t * block, const dict_index_t * index) Line 3341 C++
|
server.dll!btr_cur_optimistic_delete_func(btr_cur_t * cursor, mtr_t * mtr) Line 5620 C++
|
server.dll!row_purge_remove_clust_if_poss_low(purge_node_t * node, unsigned __int64 mode) Line 138 C++
|
server.dll!row_purge_remove_clust_if_poss(purge_node_t * node) Line 185 C++
|
server.dll!row_purge_record_func(purge_node_t * node, unsigned char * undo_rec, bool updated_extern) Line 1047 C++
|
server.dll!row_purge(purge_node_t * node, unsigned char * undo_rec, que_thr_t * thr) Line 1112 C++
|
server.dll!row_purge_step(que_thr_t * thr) Line 1159 C++
|
server.dll!que_thr_step(que_thr_t * thr) Line 946 C++
|
server.dll!que_run_threads_low(que_thr_t * thr) Line 1011 C++
|
server.dll!que_run_threads(que_thr_t * thr) Line 1050 C++
|
server.dll!srv_task_execute() Line 1872 C++
|
server.dll!purge_worker_callback(void * __formal) Line 2034 C++
|
server.dll!tpool::task_group::execute(tpool::task * t) Line 57 C++
|
ntdll.dll!TppSimplepExecuteCallback() Unknown
|
ntdll.dll!TppWorkerThread() Unknown
|
There is a third one, with all stacks , it is in crashreport.txt, look around STACK_TEXT. There are also all threads stacktraces in this file
wlad, can you find all threads that are accessing the same tablespace object? Is the value of fil_space_t::n_pending available?
fil_space_t::n_pending is not available. The crashreport.txt can give an idea of what threads are doing. I'm not sure how to automate finding threads using the same object there.
Can we reproduce this with a small value of innodb_open_files? What is the value of that parameter, and roughly how many distinct .ibd files or InnoDB tablespaces are being accessed concurrently?
Vladislav Vaintroub / Marko Mäkelä , Just curious to know if we found out the reason for the mentioned abrupt shutdowns. Can you please provide an update.
schandol, I guess the last marko's question was for your.
Also after producing the stacktrace, which marko asked for, I resort to uninvolved observer's role
Vladislav Vaintroub, I was in the impression that the chat was between You and Marko Mäkelä. We are using innodb_open_files parameter with value as 300. There are 752 tables in the application. Also the configuration has been done using innodb_file_per_table format. So with this there will be 752 ibd files. Also, there will be one tablespace with a default value of 100M which will be autoextended.
Marko Mäkelä, Can you suggest a value for this parameter based on crash dumps or analysis that has been done from your end. Also, I think we are accessing roughly around 10 distinct .ibd files as per the database log or slowquery log file during this crash. Is there any way to find exact access count?
schandol, based on the information that is available so far, I cannot suggest a safe value for this parameter. I would not think that 300 is unreasonably small. Because of several layers of caching, there is no sharp correlation between SQL statements being executed and the number of files being open. It might be turn out that innodb_open_files is useless on modern operating systems, and having a large number of open file descriptors is not an actual problem anymore. If that is the case, setting the parameter to the maximum possible value should work around the problem.
wlad is our Windows expert, and he can perform tests on Windows. Because the file system APIs work differently between Windows and (say) Linux, and because we have not observed this problem on Linux, where most of our internal testing takes place, I think that our best chances to repeat this problem would be on Microsoft Windows.
What kind of storage is being used? HDD or SSD? How big are the tables?
The errors are usually portable, and reproducible elsewhere if reasonable efforts are made. The exception from there is real Windows APIs , like asynchronous IO.
Having said that, you can have innodb_open_files easily as million, the number of open handles in a Windows program is limited by 2^24.
I note (https://jira.mariadb.org/secure/attachment/58503/screenshot-1.png) that a Virus scanner (McAfee Network Security Manager) is operating on this system and that it somehow seems to be interacting or trying to interact with mariadb.exe/server.dll. Virus scanners may cause issues with mysqld. Please try excluding mysqld.exe/dll's (in fact the whole directory/subdirectories in which the server is installed as well as the data directory and it's subdirectories) from any virus/network scanner activity (in disk and memory), at the very least as a test. Other possible mysqld interaction causes are anti-virus/anti-malware, firewall, and UAC. Also ensure that the OS is fully patched/updated.
2021-08-11 5:20:15 39251 [Note] InnoDB: Cannot close file .\lf\iv_dev_prof_data.ibd because of 22 pending operations
|
....
|
2021-08-11 5:20:56 39251 [Note] InnoDB: Cannot close file .\lf\iv_dev_prof_data.ibd because of 0 pending operations and pending fsync
|
2021-08-11 5:20:56 39251 [Note] InnoDB: Cannot close file .\lf\iv_dev_prof_data.ibd because of 0 pending operations and pending fsync
|
2021-08-11 5:20:56 39251 [Note] InnoDB: Cannot close file .\lf\iv_dev_prof_data.ibd because of 0 pending operations and pending fsync
|
2021-08-11 5:20:56 39251 [Note] InnoDB: Cannot close file .\lf\iv_dev_prof_data.ibd because of 0 pending operations and pending fsync
|
2021-08-11 5:20:56 39251 [Note] InnoDB: Cannot close file .\lf\iv_dev_prof_data.ibd because of 0 pending operations and pending fsync
|
2021-08-11 05:21:49 0x187c InnoDB: Assertion failure in file D:\winx64-packages\build\src\storage\innobase\fil\fil0fil.cc line 488
|
InnoDB: Failing assertion: space->is_ready_to_close() || space->purpose == FIL_TYPE_TEMPORARY || srv_fast_shutdown == 2 || !srv_was_started
|
InnoDB: We intentionally generate a memory trap.
|
There are regular pending operations and pending fsync's. In this case the crash happens very close to them. I would not be surpised if the issues were connected, and anti-virus or some other 3rd party interaction remains a major possibility imho. schandol What sort of SAN or similar are you using please?
Also interesting is that the FAILURE_BUCKET_ID is BREAKPOINT_80000003_server.dll!my_parameter_handler which may indicate underlying issues on the system or OS itself, or a memory leak in the server. Some limited info on these example URL's:
https://techcult.com/fix-error-0x80000003/
https://www.auslogics.com/en/articles/fix-breakpoint-error-0x80000003/ (I don't recommend using the driver updater!)
https://www.thewindowsclub.com/breakpoint-reached-error-0x80000003
Marko Mäkelä, Vladislav Vaintroub and Roel Van de Paar, Thanks for the updates.
The kind of storage used in these systems is HDD. Generally there will be around six tables which are of in units to tens of GBs. In all the crashed systems the highest table data is around 10 GB and remaining 5 tables will vary between 2 GB to 7 GB. The remaining tables are of few MBs.
Regarding innodb_open_files parameter noticed that we can give up to 4096 in one of the system. However more than has been given (for example 8092) it is taking as 256 (equal to table_open_cache value). But in documentation (https://mariadb.com/kb/en/innodb-system-variables/#innodb_open_files) it has been mentioned that we can have upto 4294967295. Is it dependent on other parameters? Also how we can set with higher value?
As a note, still now we have not faced issues by having having 300 as its value in previous mariadb versions of 10.3.Now we are facing with 10.5 for the first time.
On other hand on other system if this parameter has been set with still lower value say 10, it is getting crashed with in 2 hours after raising semaphore and wait issues. We have changed to 4096 to check its behavior.
Also McAfee Network Security Manager is not an antivirus application. On all the affected systems the virus scan is not running.
However for other points mentioned will check with concerned and will update accordingly.
schandol, in 10.5 the code flow is different from older versions mostly due to MDEV-23855, which reduced contention on fil_system.mutex especially for write workloads.
Customer got the same assertion failure in different context:
Thread 1 (Thread 0x7e6d68762700 (LWP 263532)):
|
#0 0x00007fefb327aaa1 in pthread_kill () from /lib64/libpthread.so.0
|
No symbol table info available.
|
#1 0x0000562c5f35827e in handle_fatal_signal (sig=6) at /usr/src/debug/MariaDB-/src_0/sql/signal_handler.cc:330
|
curr_time = 1655794807
|
tm = {tm_sec = 7, tm_min = 0, tm_hour = 14, tm_mday = 21, tm_mon = 5, tm_year = 122, tm_wday = 2, tm_yday = 171, tm_isdst = 0, tm_gmtoff = 25200, tm_zone = 0x562c62d416d0 "+07"}
|
thd = 0x7e6bb40009b8
|
print_invalid_query_pointer = true
|
#2 <signal handler called>
|
No symbol table info available.
|
#3 0x00007fefb13453d7 in raise () from /lib64/libc.so.6
|
No symbol table info available.
|
#4 0x00007fefb1346ac8 in abort () from /lib64/libc.so.6
|
No symbol table info available.
|
#5 0x0000562c5f02d3af in ut_dbg_assertion_failed (expr=expr@entry=0x562c5fc0c130 "space->is_ready_to_close() || space->purpose == FIL_TYPE_TEMPORARY || srv_fast_shutdown == 2 || !srv_was_started", file=file@entry=0x562c5fc0c060 "/home/jenkins/workspace/MariaDBE-Custom-RPM/label/rhel-7/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX_ON_ES_BACKUP_DEBUGSOURCE/storage/innobase/fil/fil0fil.cc", line=line@entry=488) at /usr/src/debug/MariaDB-/src_0/storage/innobase/ut/ut0dbg.cc:60
|
No locals.
|
#6 0x0000562c5f87aac8 in fil_node_t::prepare_to_close_or_detach (this=this@entry=0x7e6bd502fba8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/fil/fil0fil.cc:487
|
No locals.
|
#7 0x0000562c5f87ab27 in fil_node_t::close (this=0x7e6bd502fba8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/fil/fil0fil.cc:463
|
No locals.
|
#8 0x0000562c5f87aca3 in fil_space_t::try_to_close (print_info=false) at /usr/src/debug/MariaDB-/src_0/storage/innobase/fil/fil0fil.cc:109
|
node = 0x7e6bd502fba8
|
space = 0x7e6bd517f948
|
#9 0x0000562c5f87fc03 in fil_node_open_file (node=node@entry=0x7e6bc4307098) at /usr/src/debug/MariaDB-/src_0/storage/innobase/fil/fil0fil.cc:438
|
count = 0
|
#10 0x0000562c5f03245c in fil_space_t::prepare (this=0x7e6bc4304238, have_mutex=have_mutex@entry=false) at /usr/src/debug/MariaDB-/src_0/storage/innobase/fil/fil0fil.cc:666
|
node = 0x7e6bc4307098
|
is_open = <optimized out>
|
#11 0x0000562c5f7f4535 in acquire (this=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/include/fil0fil.h:535
|
No locals.
|
#12 btr_cur_prefetch_siblings (block=block@entry=0x7eea2b818c00, index=index@entry=0x7e6b71ba93a0) at /usr/src/debug/MariaDB-/src_0/storage/innobase/btr/btr0cur.cc:3343
|
page = <optimized out>
|
#13 0x0000562c5f7fcc11 in btr_cur_optimistic_insert (flags=flags@entry=0, cursor=cursor@entry=0x7e6d68760190, offsets=offsets@entry=0x7e6d687600f0, heap=heap@entry=0x7e6d687600e0, entry=entry@entry=0x7e6bade5d690, rec=rec@entry=0x7e6d687608b0, big_rec=big_rec@entry=0x7e6d687600d0, n_ext=n_ext@entry=0, thr=thr@entry=0x7e6b9efc8bc8, mtr=mtr@entry=0x7e6d68760b10) at /usr/src/debug/MariaDB-/src_0/storage/innobase/btr/btr0cur.cc:3470
|
big_rec_vec = 0x0
|
err = DB_FAIL
|
max_size = <optimized out>
|
n_recs = <optimized out>
|
page_cursor = <optimized out>
|
block = 0x7eea2b818c00
|
page = 0x7eea2c068000 <Address 0x7eea2c068000 out of bounds>
|
dummy = 0x7e6bb4a296f6 "\201\064\212Í€"
|
rec_size = 196
|
index = 0x7e6b71ba93a0
|
reorg = <optimized out>
|
inherit = true
|
#14 0x0000562c5f7341c4 in row_ins_clust_index_entry_low (flags=flags@entry=0, mode=<optimized out>, mode@entry=2, index=index@entry=0x7e6b71ba93a0, n_uniq=n_uniq@entry=1, entry=entry@entry=0x7e6bade5d690, n_ext=n_ext@entry=0, thr=thr@entry=0x7e6b9efc8bc8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/row/row0ins.cc:2739
|
insert_rec = 0x7f6a87477100 <Address 0x7f6a87477100 out of bounds>
|
pcur = {btr_cur = {index = 0x7e6b71ba93a0, page_cur = {index = 0x0, rec = 0x7eea2c06b4eb <Address 0x7eea2c06b4eb out of bounds>, offsets = 0x0, block = 0x7eea2b818c00}, purge_node = 0x0, left_block = 0x0, thr = 0x7e6b9efc8bc8, flag = BTR_CUR_BINARY, tree_height = 3, up_match = 0, up_bytes = 0, low_match = 0, low_bytes = 0, n_fields = 0, n_bytes = 0, fold = 0, path_arr = 0x0, rtr_info = 0x0}, latch_mode = 2, old_stored = false, old_rec = 0x0, old_n_fields = 0, rel_pos = 0, block_when_stored = {m_block = 0x0, m_page_id = {m_id = 0}}, modify_clock = 0, pos_state = BTR_PCUR_IS_POSITIONED, search_mode = PAGE_CUR_LE, trx_if_known = 0x0, old_rec_buf = 0x0, buf_size = 0}
|
cursor = 0x7e6d68760190
|
err = <optimized out>
|
big_rec = 0x0
|
mtr = {m_last = 0x0, m_last_offset = 0, m_log_mode = 0, m_modifications = 0, m_made_dirty = 1, m_inside_ibuf = 0, m_trim_pages = 0, m_memo = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7e6d68760b48, prev = 0x7e6d68760b48}}, size_ = 1}, m_size = 64, m_first_block = {<ilist_node<void>> = {next = 0x7e6d68760b28, prev = 0x7e6d68760b28}, m_data = "\000\000\000\000\000\000\000\000 \000\000\000m~\000\000\000\000\000\000\000\000\000\000\001\000\000\000\323\177\000\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\000\214\201+\352~\000\000\002", '\000' <repeats 15 times>, "\022\000\000\000k~\000\000\000j\337}\vmF\251", '\000' <repeats 16 times>, "p\021vhm~\000\000\230\376¶k~", '\000' <repeats 35 times>, "h\221k\246~\000\000\030\000\f_,V\000\000\000\000\000\000\000\000\000\000(\fvhm~\000\000(\fvhm~\000\000\001", '\000' <repeats 15 times>..., m_used = 64}}, m_log = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7e6d68760d78, prev = 0x7e6d68760d78}}, size_ = 1}, m_size = 0, m_first_block = {<ilist_node<void>> = {next = 0x7e6d68760d58, prev = 0x7e6d68760d58}, m_data = "`\220E`,V\000\000pf\276\341\062V\000\000r\000\000\000\000\000\000\000\330J\324b,V\000\000\021\000\000\000\000\000\000\000\060\016vhm~\000\000\241\251\230_,V\000\000pf\276\341\062V\000\000\000\000\000\000\000\000\000\000n\000\000\000\022\000\000\000(o\250\267k~\000\000\310\065!\350j~\000\000\250\245\256\261k~\000\000\030\066[\324k~\000\000\212\001\213\001\214\001\215\001\060\016vhm~\000\000 o\250\267k~\000\000pf\276\341\062V\000\000\277O\230_,V\000\000 o\250\267k~\000\000pf\276\341\062V\000\000\300\016vhm~\000\000\021\355$_,V\000\000H\000\000\000\000\000\000\000\004\000\000\000\000\000\000\000"..., m_used = 0}}, m_user_space = 0x7e6bc4304238, m_commit_lsn = 0, m_freed_space = 0x0, m_freed_pages = 0x0}
|
auto_inc = <optimized out>
|
offsets_heap = 0x0
|
offsets_ = {300, 66, 32814, 4, 10, 16, 23, 27, 34, 36, 46, 48, 51, 53, 56, 60, 64, 67, 99, 107, 123, 127, 131, 134, 138, 140, 142, 174, 175, 176, 177, 178, 179, 187, 199, 207, 219, 222, 225, 228, 230, 234, 237, 241, 245, 251, 257, 265, 266, 274, 275, 276, 277, 278, 280, 281, 283, 285, 287, 290, 295, 296, 297, 298, 299, 305, 306, 834, 841, 424, 427, 437, 447, 450, 453, 463, 473, 476, 479, 489, 300, 3, 32774, 4, 10, 42, 531, 541, 551, 554, 557, 567, 577, 586, 595, 604, 613, 622, 631, 640, 649, 658, 667, 676, 685, 694, 703, 712, 721, 730, 739, 748, 300, 8, 32777, 4, 10, 16, 23, 24, 678, 679, 686, 856, 865, 874, 883, 892, 901, 910, 919, 928, 937, 946, 955, 964, 973, 982, 991, 1000, 1009, 1018, 1027, 1036, 1045, 1054, 1063, 1072, 1081, 1090, 1099, 1108, 1117, 1126, 1135, 1144, 1153, 1162, 1171, 1180, 1189, 1198, 1207, 1216, 1225, 1234, 1243, 1252, 1261, 1273, 1289, 1301, 1317, 1321, 1393, 1399, 1519, 1522, 1523, 1645, 1648, 1673, 1676, 1684, 1693, 1697, 1705, 1708, 1738, 1768, 1828, 1832, 1839, 0, 0, 0, 19672, 40316, 32363, 0...}
|
offsets = 0x7e6d68760650
|
#15 0x0000562c5f736149 in row_ins_clust_index_entry (index=index@entry=0x7e6b71ba93a0, entry=entry@entry=0x7e6bade5d690, thr=thr@entry=0x7e6b9efc8bc8, n_ext=n_ext@entry=0) at /usr/src/debug/MariaDB-/src_0/storage/innobase/row/row0ins.cc:3210
|
err = <optimized out>
|
n_uniq = 1
|
flags = 0
|
skip_locking = <optimized out>
|
orig_n_fields = 25
|
#16 0x0000562c5f73685c in row_ins_index_entry (thr=0x7e6b9efc8bc8, entry=0x7e6bade5d690, index=0x7e6b71ba93a0) at /usr/src/debug/MariaDB-/src_0/storage/innobase/row/row0ins.cc:3333
|
No locals.
|
#17 row_ins_index_entry_step (thr=0x7e6b9efc8bc8, node=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/row/row0ins.cc:3502
|
No locals.
|
#18 row_ins (thr=0x7e6b9efc8bc8, node=<optimized out>) at /usr/src/debug/MariaDB-/src_0/storage/innobase/row/row0ins.cc:3661
|
index = <optimized out>
|
type = <optimized out>
|
#19 row_ins_step (thr=thr@entry=0x7e6b9efc8bc8) at /usr/src/debug/MariaDB-/src_0/storage/innobase/row/row0ins.cc:3807
|
node = <optimized out>
|
err = <optimized out>
|
sel_node = <optimized out>
|
#20 0x0000562c5f74936b in row_insert_for_mysql (mysql_rec=<optimized out>, prebuilt=0x7e6b9efc7b70, ins_mode=ROW_INS_NORMAL) at /usr/src/debug/MariaDB-/src_0/storage/innobase/row/row0mysql.cc:1421
|
err = 1604526928
|
was_lock_wait = <optimized out>
|
node = 0x7e6b9efc8478
|
savept = {least_undo_no = 19}
|
trx = 0x7fefac7c6ff0
|
table = 0x7e6b701515e0
|
blob_heap = <optimized out>
|
#21 0x0000562c5f67b604 in ha_innobase::write_row (this=0x7e6b9d9b6080, record=0x7e6b9d23c388 "BONEW1PI339J8GLW", ' ' <repeats 80 times>, "\201\064\212\315TRM 0351040", ' ' <repeats 14 times>, "\r") at /usr/src/debug/MariaDB-/src_0/storage/innobase/handler/ha_innodb.cc:7637
|
No locals.
|
#22 0x0000562c5f36547f in handler::ha_write_row (this=0x7e6b9d9b6080, buf=buf@entry=0x7e6b9d23c388 "BONEW1PI339J8GLW", ' ' <repeats 80 times>, "\201\064\212\315TRM 0351040", ' ' <repeats 14 times>, "\r") at /usr/src/debug/MariaDB-/src_0/sql/handler.cc:7210
|
No locals.
|
#23 0x0000562c5f5ca318 in ha_partition::write_row (this=0x7e6b9eaaf900, buf=0x7e6b9d23c388 "BONEW1PI339J8GLW", ' ' <repeats 80 times>, "\201\064\212\315TRM 0351040", ' ' <repeats 14 times>, "\r") at /usr/src/debug/MariaDB-/src_0/sql/ha_partition.cc:4492
|
part_id = 431
|
func_value = 3541341631
|
sms = {thd = 0x7e6bb40009b8, old_mode = 524288}
|
saved_auto_inc_field_not_null = true
|
error = 0
|
have_auto_increment = <optimized out>
|
thd = 0x7e6bb40009b8
|
#24 0x0000562c5f36547f in handler::ha_write_row (this=0x7e6b9eaaf900, buf=0x7e6b9d23c388 "BONEW1PI339J8GLW", ' ' <repeats 80 times>, "\201\064\212\315TRM 0351040", ' ' <repeats 14 times>, "\r") at /usr/src/debug/MariaDB-/src_0/sql/handler.cc:7210
|
No locals.
|
#25 0x0000562c5f471705 in Rows_log_event::write_row (this=this@entry=0x7e6bef5c3848, rgi=rgi@entry=0x7e6a2f7fddc0, overwrite=<optimized out>) at /usr/src/debug/MariaDB-/src_0/sql/log_event_server.cc:7287
|
key = {m_ptr = 0x0}
|
table = 0x7e6b9e17d428
|
error = <optimized out>
|
keynum = <optimized out>
|
invoke_triggers = false
|
#26 0x0000562c5f471b6d in Write_rows_log_event::do_exec_row (this=0x7e6bef5c3848, rgi=0x7e6a2f7fddc0) at /usr/src/debug/MariaDB-/src_0/sql/log_event_server.cc:7507
|
tmp = 0x562c5fa27835 "Executing"
|
message = <optimized out>
|
error = <optimized out>
|
#27 0x0000562c5f466fb4 in Rows_log_event::do_apply_event (this=0x7e6bef5c3848, rgi=0x7e6a2f7fddc0) at /usr/src/debug/MariaDB-/src_0/sql/log_event_server.cc:5679
|
old_thd = 0x7e6bb40009b8
|
transactional_table = true
|
after_image = <optimized out>
|
saved_sql_mode = 1411383296
|
rli = 0x5632e1d8cb68
|
lex = <optimized out>
|
table = 0x7e6b9e17d428
|
error = <optimized out>
|
new_trg_event_map = <optimized out>
|
#28 0x0000562c5f09b36f in apply_event_and_update_pos_apply (ev=0x7e6bef5c3848, thd=0x7e6bb40009b8, rgi=0x7e6a2f7fddc0, reason=<optimized out>) at /usr/src/debug/MariaDB-/src_0/sql/log_event.h:1498
|
No locals.
|
#29 0x0000562c5f294f6e in rpt_handle_event (qev=qev@entry=0x7e6bedb60138, rpt=0x7e6bec009c60) at /usr/src/debug/MariaDB-/src_0/sql/rpl_parallel.cc:62
|
err = <optimized out>
|
rli = 0x5632e1d8cb68
|
rgi = 0x7e6a2f7fddc0
|
thd = 0x7e6bb40009b8
|
ev = 0x7e6bef5c3848
|
#30 0x0000562c5f29750b in handle_rpl_parallel_thread (arg=arg@entry=0x7e6bec009c60) at /usr/src/debug/MariaDB-/src_0/sql/rpl_parallel.cc:1363
|
event_type = WRITE_ROWS_EVENT_V1
|
rgi = 0x7e6a2f7fddc0
|
entry = 0x7e6bec01d318
|
end_of_group = <optimized out>
|
group_ending = 0
|
wait_count = <optimized out>
|
qev = <optimized out>
|
next_qev = 0x7e6bef53cdd8
|
in_event_group = true
|
group_rgi = 0x7e6a2f7fddc0
|
event_gtid_sub_id = 1752521350
|
sql_info = {cached_charset = "!\000!\000!", rpl_filter = 0x562c62d548b0}
|
err = <optimized out>
|
rpt = <optimized out>
|
group_standalone = false
|
thd = 0x7e6bb40009b8
|
old_stage = {m_key = 0, m_name = 0x562c5fa27a27 "After apply log event", m_flags = 1615069400}
|
events = <optimized out>
|
skip_event_group = false
|
gco = <optimized out>
|
#31 0x0000562c5f5cdc1d in pfs_spawn_thread (arg=0x7e6bec00b998) at /usr/src/debug/MariaDB-/src_0/storage/perfschema/pfs.cc:2201
|
typed_arg = 0x7e6bec00b998
|
user_arg = 0x7e6bec009c60
|
pfs = <optimized out>
|
user_start_routine = 0x562c5f296dd0 <handle_rpl_parallel_thread(void*)>
|
klass = <optimized out>
|
#32 0x00007fefb3275ea5 in start_thread () from /lib64/libpthread.so.0
|
No symbol table info available.
|
#33 0x00007fefb140d9fd in clone () from /lib64/libc.so.6
|
No symbol table info available.
|
valerii, can you also get the following output from that crash (which is related to enforcing the innodb_open_files limit)?
frame 6
|
print *this
|
print *space
|
Furthermore, in the output of thread apply all backtrace, do you see any pointers to either object in any other thread?
Neither the being-closed tablespace object nor the name of the partitioned table occur in any other thread.
The value of space->n_pending in the core dump is CLOSING+1, which would seem to satisfy the assertion expression space->is_ready_to_close().
I think that we should try to reproduce this with astress test run with a small value of innodb_open_files and large numbers of partitions, on a non-debug executable. Once we have something reproducible, it should be easier to figure out what is going on.
I am able to replay the assert on
|
origin/10.5 674842bee0a564a2d94e99c4e1319a716aa10aa9 2022-06-16T00:12:11+03:00
|
compiled without debug.
|
All attempts to replay with invoking 'rr' or using a build with debug had no luck.
|
|
RQG
|
===
|
# git clone https://github.com/mleich1/rqg --branch experimental RQG
|
#
|
# GIT_SHOW: HEAD -> experimental, origin/experimental a32979c6d789ddf0982e6a56af687565be00590c 2022-06-14T17:21:56+02:00
|
# rqg.pl : Version 4.0.6 (2022-05)
|
#
|
# $RQG_HOME/rqg.pl \
|
# --duration=300 \
|
# --queries=10000000 \
|
# --no_mask \
|
# --seed=random \
|
# --rpl_mode=none \
|
# --max_gd_duration=1200 \
|
# --engine=InnoDB \
|
# --mysqld=--file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
|
# --mysqld=--innodb-lock-wait-timeout=50 \
|
# --mysqld=--sync-binlog=1 \
|
# --mysqld=--net_read_timeout=30 \
|
# --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
|
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
|
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
|
# --mysqld=--slave_net_timeout=60 \
|
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
|
# --mysqld=--innodb_open_files=15 \
|
# --mysqld=--plugin-load-add=file_key_management.so \
|
# --mysqld=--wait_timeout=28800 \
|
# --mysqld=--interactive_timeout=28800 \
|
# --mysqld=--lock-wait-timeout=86400 \
|
# --mysqld=--log_bin_trust_function_creators=1 \
|
# --mysqld=--loose-idle_write_transaction_timeout=0 \
|
# --mysqld=--connect_timeout=60 \
|
# --mysqld=--loose-max-statement-time=30 \
|
# --mysqld=--loose-idle_transaction_timeout=0 \
|
# --mysqld=--innodb-buffer-pool-size=8M \
|
# --mysqld=--loose-innodb_read_only_compressed=OFF \
|
# --mysqld=--log-bin \
|
# --mysqld=--log-output=none \
|
# --mysqld=--loose-table_lock_wait_timeout=50 \
|
# --mysqld=--net_write_timeout=60 \
|
# --mysqld=--innodb_page_size=8K \
|
# --mysqld=--loose-plugin-load-add=provider_lz4.so \
|
# --reporters=Backtrace,Deadlock1,ErrorLog \
|
# --validators=None \
|
# --grammar=MDEV-26293.yy \
|
# --threads=7 \
|
# <local settings>
|
|
In average several hundred RQG runs are required for one replay.
|
|
The simplified RQG grammar MDEV-26293 shows
|
alter:
|
ALTER TABLE table_name_part alter_operation ;
|
|
alter_operation:
|
ADD PARTITION (PARTITION p3 VALUES LESS THAN MAXVALUE) |
|
ORDER BY `col_int_nokey` ;
|
|
create_part:
|
CREATE TABLE IF NOT EXISTS table_name_part (
|
`col_int_nokey` INTEGER,
|
`col_int_key` INTEGER NOT NULL,
|
KEY (`col_int_key`)
|
) ENGINE = INNODB ;
|
|
exec_sql:
|
alter |
|
alter |
|
alter |
|
alter |
|
alter |
|
alter |
|
create_part |
|
create_part |
|
create_part ;
|
|
query:
|
exec_sql ;
|
|
query_init:
|
;
|
|
table_name_part:
|
d |
|
f |
|
g |
|
h |
|
j |
|
k |
|
n |
|
q |
|
t |
|
y ;
|
|
thread_connect:
|
;
|
I was unable to reproduce this locally. A core dump that mleich provided illustrated the problem. I think that the scenario was as follows:
- fil_space_t::try_to_close() had called fil_space_t::set_closing(), which returned 0, meaning that no references exist and no NEEDS_FSYNC flag was set.
- Another thread executed fil_space_t::acquire_low(), which incremented the n_pending from CLOSING to CLOSING+1.
- The assertion failed due to the unexpected reference.
My suggested fix is to avoid incrementing the reference count if the CLOSING flag is set and the caller is not holding fil_system.mutex. Incrementing the reference is OK if the NEEDS_FSYNC flag is set. If the reference was not incremented due to the CLOSING flag, fil_space_t::acquire() must acquire fil_system.mutex and try incrementing the reference again. Because fil_space_t::try_to_close() is holding fil_system.mutex, that fix should prevent this race condition.
There is a mysqld.pdb or mariadbd.pdb depending on the version, that is distributed in all MariaDB packages by us.
schandol, it would be nice if you put that into your McAfee distribution, next to the executable, so that we do not always have to decipher the stacktrace