[MDEV-20481] innodb.blob-crash crashes during recovery - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Duplicate
Affects Version/s: 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5
Fix Version/s: 10.3.28, 10.4.18, 10.5.9, 10.6.0
Component/s: Storage Engine - InnoDB, Tests
Labels:
None
Environment:
The description is from a crash with bb-10.3-monty, a merge of 10.2 to 10.3
The reported crash is on Windows 64bits

Description

innodb.blob-crash '16k,debug,innodb' w3 [ fail ]
Test ended at 2019-09-03 15:35:37

CURRENT_TEST: innodb.blob-crash
...
2019-09-03 15:35:30 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=9301498
Assertion failed: recv_sys->mlog_checkpoint_lsn <= recv_sys->recovered_lsn, file D:\winx64-debug\build\src\storage\innobase\log\log0recv.cc, line 2838
190903 15:35:30 [ERROR] mysqld got exception 0x80000003 ;

mysqld.exe!my_sigabrt_handler()[my_thr_init.c:485]
mysqld.exe!raise()[signal.cpp:516]
mysqld.exe!abort()[abort.cpp:71]
mysqld.exe!common_assert_to_stderr<wchar_t>()[assert.cpp:149]
mysqld.exe!_wassert()[assert.cpp:404]
mysqld.exe!recv_parse_log_recs()[log0recv.cc:2837]
mysqld.exe!recv_scan_log_recs()[log0recv.cc:3309]
mysqld.exe!recv_group_scan_log_recs()[log0recv.cc:3403]
mysqld.exe!recv_recovery_from_checkpoint_start()[log0recv.cc:3705]
mysqld.exe!srv_start()[srv0start.cc:1973]
mysqld.exe!innodb_init()[ha_innodb.cc:4262]
mysqld.exe!ha_initialize_handlerton()[handler.cc:549]
mysqld.exe!plugin_initialize()[sql_plugin.cc:1433]

There are 5 failures in buildbot cross reference about other failures with blob.crash in 10.4 and 10.5. The other crashes comes from this line
mysqld: /home/buildbot/buildbot/build/mariadb-10.4.8/storage/innobase/fil/fil0fil.cc:4567: static ulint Check::validate(const fil_space_t*): Assertion `space->id != TRX_SYS_SPACE || space == fil_system.sys_space' failed.

Attachments

Issue Links

relates to

MDEV-26733 assert on shutdown lock->lock_word == X_LOCK_DECR in test

Open

MDEV-13830 Assertion failed: recv_sys->mlog_checkpoint_lsn <= recv_sys->recovered_lsn

Closed

MDEV-19298 Assertion `space->id != 0xFFFFFFFEU || space == fil_system.temp_space' failed in Check::validate upon crash upgrade from 10.2.22

Closed

MDEV-20159 Assertion `space->id != TRX_SYS_SPACE || space == fil_system.sys_space' failed in buildbot

Closed

MDEV-21843 innodb_encryption-page-compression crashes with "Assertion `first || buf_pool_get_dirty_pages_count(i d) == 0'

Closed

Activity

Ascending order - Click to sort in descending order

View 7 older comments

Alice Sherepa added a comment - 2020-10-14 10:09

http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/21312/steps/test/logs/stdio

innodb.blob-crash '4k,debug,innodb'      w3 [ fail ]

        Test ended at 2020-10-09 10:15:45

201009 10:15:43 [ERROR] mysqld got exception 0x80000003 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.5.7-MariaDB-debug-log

key_buffer_size=1048576

read_buffer_size=131072

max_used_connections=0

max_threads=65537

thread_count=0

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4635 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

server.dll!my_sigabrt_handler()[my_thr_init.c:465]

ucrtbase.dll!raise()

ucrtbase.dll!abort()

server.dll!ib::fatal::~fatal()[ut0ut.cc:580]

server.dll!buf_pool_t::assert_all_freed()[buf0buf.cc:4398]

server.dll!buf_pool_invalidate()[buf0buf.cc:4438]

server.dll!recv_sys_t::apply()[log0recv.cc:2782]

server.dll!recv_group_scan_log_recs()[log0recv.cc:3140]

server.dll!recv_recovery_from_checkpoint_start()[log0recv.cc:3590]

server.dll!srv_start()[srv0start.cc:1531]

server.dll!innodb_init()[ha_innodb.cc:3899]

server.dll!ha_initialize_handlerton()[handler.cc:645]

server.dll!plugin_initialize()[sql_plugin.cc:1459]

server.dll!plugin_init()[sql_plugin.cc:1751]

server.dll!init_server_components()[mysqld.cc:4913]

server.dll!win_main()[mysqld.cc:5496]

server.dll!mysql_service()[mysqld.cc:5744]

server.dll!mysqld_main()[mysqld.cc:5962]

mariadbd.exe!main()[main.cc:26]

mariadbd.exe!__scrt_common_main_seh()[exe_common.inl:288]

KERNEL32.DLL!BaseThreadInitThunk()

ntdll.dll!RtlUserThreadStart()

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains

information that should help you find out what is causing the crash.

Writing a core file at D:\winx64-debug\build\mysql-test\var\3\mysqld.1\data\

Alice Sherepa added a comment - 2020-10-14 10:09 http://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/21312/steps/test/logs/stdio innodb.blob-crash '4k,debug,innodb' w3 [ fail ] Test ended at 2020-10-09 10:15:45 201009 10:15:43 [ERROR] mysqld got exception 0x80000003 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.5.7-MariaDB-debug-log key_buffer_size=1048576 read_buffer_size=131072 max_used_connections=0 max_threads=65537 thread_count=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4635 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... server.dll!my_sigabrt_handler()[my_thr_init.c:465] ucrtbase.dll!raise() ucrtbase.dll!abort() server.dll!ib::fatal::~fatal()[ut0ut.cc:580] server.dll!buf_pool_t::assert_all_freed()[buf0buf.cc:4398] server.dll!buf_pool_invalidate()[buf0buf.cc:4438] server.dll!recv_sys_t::apply()[log0recv.cc:2782] server.dll!recv_group_scan_log_recs()[log0recv.cc:3140] server.dll!recv_recovery_from_checkpoint_start()[log0recv.cc:3590] server.dll!srv_start()[srv0start.cc:1531] server.dll!innodb_init()[ha_innodb.cc:3899] server.dll!ha_initialize_handlerton()[handler.cc:645] server.dll!plugin_initialize()[sql_plugin.cc:1459] server.dll!plugin_init()[sql_plugin.cc:1751] server.dll!init_server_components()[mysqld.cc:4913] server.dll!win_main()[mysqld.cc:5496] server.dll!mysql_service()[mysqld.cc:5744] server.dll!mysqld_main()[mysqld.cc:5962] mariadbd.exe!main()[main.cc:26] mariadbd.exe!__scrt_common_main_seh()[exe_common.inl:288] KERNEL32.DLL!BaseThreadInitThunk() ntdll.dll!RtlUserThreadStart() The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains information that should help you find out what is causing the crash. Writing a core file at D:\winx64-debug\build\mysql-test\var\3\mysqld.1\data\

Marko Mäkelä added a comment - 2023-01-17 14:22

The last failure of innodb.blob-crash for 10.5 with "still fixed or dirty" occured in 2020. That seems to be exactly the failure that alice posted above:

10.5 a891fe6aa5ede1bf1f4551cfa75cf856
innodb.blob-crash '4k,debug,innodb' w3 [ fail ]
…
2020-10-09 10:15:42 0 [Note] InnoDB: Starting a batch to recover 338 pages from redo log.
2020-10-09 10:15:42 0 [Note] InnoDB: Starting a batch to recover 317 pages from redo log.
2020-10-09 10:15:43 0 [Note] InnoDB: Starting a batch to recover 317 pages from redo log.
2020-10-09 10:15:43 0 [ERROR] [FATAL] InnoDB: Page [page id: space=5, page number=2] still fixed or dirty

~~MDEV-23399~~ and ~~MDEV-23855~~ were pushed to the 10.5 branch soon after this. They could have fixed this trouble for this particular test.

I remember seeing some possibly related failure in 10.5, even recently. I think it was this one:

void

rw_lock_free_func(

/*==============*/

	rw_lock_t*	lock)	/*!< in/out: rw-lock */

	ut_ad(rw_lock_validate(lock));

	ut_a(lock->lock_word == X_LOCK_DECR);

In ~~MDEV-24142~~ (10.6), that code was replaced with debug assertions in srw_lock::destroy() and sux_lock::free(), but I do not remember seeing those assertions failing.

Marko Mäkelä added a comment - 2023-01-17 14:22 The last failure of innodb.blob-crash for 10.5 with "still fixed or dirty" occured in 2020. That seems to be exactly the failure that alice posted above: 10.5 a891fe6aa5ede1bf1f4551cfa75cf856 innodb.blob-crash '4k,debug,innodb' w3 [ fail ] … 2020-10-09 10:15:42 0 [Note] InnoDB: Starting a batch to recover 338 pages from redo log. 2020-10-09 10:15:42 0 [Note] InnoDB: Starting a batch to recover 317 pages from redo log. 2020-10-09 10:15:43 0 [Note] InnoDB: Starting a batch to recover 317 pages from redo log. 2020-10-09 10:15:43 0 [ERROR] [FATAL] InnoDB: Page [page id: space=5, page number=2] still fixed or dirty MDEV-23399 and MDEV-23855 were pushed to the 10.5 branch soon after this. They could have fixed this trouble for this particular test. I remember seeing some possibly related failure in 10.5, even recently. I think it was this one: void rw_lock_free_func( /*==============*/ rw_lock_t* lock) /*!< in/out: rw-lock */ { ut_ad(rw_lock_validate(lock)); ut_a(lock->lock_word == X_LOCK_DECR); In MDEV-24142 (10.6), that code was replaced with debug assertions in srw_lock::destroy() and sux_lock::free() , but I do not remember seeing those assertions failing.

Marko Mäkelä added a comment - 2023-01-17 14:56

The related failure that I am referring to is MDEV-26733, seemingly specific to 10.5 and the test innodb.undo_truncate.

Marko Mäkelä added a comment - 2023-01-17 14:56 The related failure that I am referring to is MDEV-26733 , seemingly specific to 10.5 and the test innodb.undo_truncate .

Marko Mäkelä added a comment - 2023-01-17 15:19

Because I had some trouble with the Buildbot cross-reference, elenst provided to me a list of some failures for the failing still fixed or dirty check on shutdown. I accidentally asked her to search anything after 2019-10-09, instead of 2020-10-09, which was the last failure returned also by her search.

Failures are rather rare: none in 2020-01, 2020-02, 2020-11, 2020-12, and otherwise it was failing about once per month in some main branch. Here are some of other latest failures:

10.3 fbef428645e01113521824c2eb0a42fe
CURRENT_TEST: innodb_gis.types
…
2020-04-03 8:18:10 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=31648455
2020-04-03 8:18:11 0 [Note] InnoDB: Starting a batch to recover 307 pages from redo log.
2020-04-03 8:18:11 0 [Note] InnoDB: Starting a batch to recover 293 pages from redo log.
2020-04-03 8:18:11 0 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=511] still fixed or dirty

10.5 9ae015878f11be3e3033fd1b35357ea5
CURRENT_TEST: innodb.blob-crash
…
2020-03-10 15:37:09 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47877
2020-03-10 15:37:09 0 [Note] InnoDB: Starting a batch to recover 110 pages from redo log.
2020-03-10 15:37:09 0 [Note] InnoDB: Starting a batch to recover 89 pages from redo log.
2020-03-10 15:37:09 0 [ERROR] [FATAL] InnoDB: Page [page id: space=5, page number=2] still fixed or dirty

10.5 90b7ac28a917fde7bceed573956aa4c6
CURRENT_TEST: innodb.blob-crash
…
2020-03-17 16:43:28 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=7956119
2020-03-17 16:43:28 0 [Note] InnoDB: Starting a batch to recover 168 pages from redo log.
2020-03-17 16:43:28 0 [Note] InnoDB: Starting a batch to recover 170 pages from redo log.
2020-03-17 16:43:28 0 [Note] InnoDB: Starting a batch to recover 170 pages from redo log.
2020-03-17 16:43:29 0 [ERROR] [FATAL] InnoDB: Page [page id: space=5, page number=0] still fixed or dirty

~~MDEV-24449~~ was fixed in 10.2 on December 28, and in 10.5 on December 21. The test innodb.blob-crash does define a secondary index on the table (t3), so the change buffer might be used for it. Likewise, the test innodb_gis.types, which often runs out of memory on recovery (~~MDEV-22512~~), defines a secondary index on the table t_wl6455. I think that it is plausible that this bug was fixed by ~~MDEV-24449~~.

Marko Mäkelä added a comment - 2023-01-17 15:19 Because I had some trouble with the Buildbot cross-reference, elenst provided to me a list of some failures for the failing still fixed or dirty check on shutdown. I accidentally asked her to search anything after 2019-10-09, instead of 2020-10-09, which was the last failure returned also by her search. Failures are rather rare: none in 2020-01, 2020-02, 2020-11, 2020-12, and otherwise it was failing about once per month in some main branch. Here are some of other latest failures: 10.3 fbef428645e01113521824c2eb0a42fe CURRENT_TEST: innodb_gis.types … 2020-04-03 8:18:10 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=31648455 2020-04-03 8:18:11 0 [Note] InnoDB: Starting a batch to recover 307 pages from redo log. 2020-04-03 8:18:11 0 [Note] InnoDB: Starting a batch to recover 293 pages from redo log. 2020-04-03 8:18:11 0 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=511] still fixed or dirty 10.5 9ae015878f11be3e3033fd1b35357ea5 CURRENT_TEST: innodb.blob-crash … 2020-03-10 15:37:09 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=47877 2020-03-10 15:37:09 0 [Note] InnoDB: Starting a batch to recover 110 pages from redo log. 2020-03-10 15:37:09 0 [Note] InnoDB: Starting a batch to recover 89 pages from redo log. 2020-03-10 15:37:09 0 [ERROR] [FATAL] InnoDB: Page [page id: space=5, page number=2] still fixed or dirty 10.5 90b7ac28a917fde7bceed573956aa4c6 CURRENT_TEST: innodb.blob-crash … 2020-03-17 16:43:28 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=7956119 2020-03-17 16:43:28 0 [Note] InnoDB: Starting a batch to recover 168 pages from redo log. 2020-03-17 16:43:28 0 [Note] InnoDB: Starting a batch to recover 170 pages from redo log. 2020-03-17 16:43:28 0 [Note] InnoDB: Starting a batch to recover 170 pages from redo log. 2020-03-17 16:43:29 0 [ERROR] [FATAL] InnoDB: Page [page id: space=5, page number=0] still fixed or dirty MDEV-24449 was fixed in 10.2 on December 28, and in 10.5 on December 21. The test innodb.blob-crash does define a secondary index on the table ( t3 ), so the change buffer might be used for it. Likewise, the test innodb_gis.types , which often runs out of memory on recovery ( MDEV-22512 ), defines a secondary index on the table t_wl6455 . I think that it is plausible that this bug was fixed by MDEV-24449 .

Marko Mäkelä added a comment - 2023-01-17 15:27

I believe that this was fixed by ~~MDEV-24449~~ for 10.3 and 10.4, and by ~~MDEV-24448~~ for 10.5 and 10.6.

Marko Mäkelä added a comment - 2023-01-17 15:27 I believe that this was fixed by MDEV-24449 for 10.3 and 10.4, and by MDEV-24448 for 10.5 and 10.6.

MariaDB Server

innodb.blob-crash crashes during recovery

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration