Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL)
Description
On Buildbot, we have a few cases of assertion failures in mariabackup --prepare, because buf_pool.n_pend_reads is not zero. mleich was able to reproduce the error under rr record, and I analyzed a case which is reported in MDEV-24448.
In the analyzed trace, we are recovering 4 page reads. The read callback for 3 of the pages was finished (with no log to be applied, because the pages were up to date), but for the last page, the read callback was still in progress. Yet, srv_start() would concurrently initiate shutdown. We failed to wait for all asynchronous reads to complete. This omission could cause the failure to apply log to some data pages.
While this error started to occur on Buildbot after we replaced the InnoDB mutexes with native ones (MDEV-21452), I think that it is possible in 10.5, and probably in older versions as well, even though the code was heavily refactored in 10.5, starting with MDEV-16264. The earliest known failure for the 10.6 main branch was as follows:
10.6 cf2480dd77a45cf56389015a1eabf567 |
mariabackup.xb_fulltext_encrypted w2 [ fail ]
|
Test ended at 2020-12-16 04:19:07
|
|
CURRENT_TEST: mariabackup.xb_fulltext_encrypted
|
mysqltest: At line 19: exec of '/mnt/buildbot/build/mariadb-10.6.0/extra/mariabackup/mariabackup --innobackupex --apply-log /mnt/buildbot/build/mariadb-10.6.0/mysql-test/var/2/tmp/backup 2>&1' failed, error: 34304, status: 134, errno: 11
|
…
|
2020-12-16 4:19:05 0 [Note] InnoDB: Starting final batch to recover 7 pages from redo log.
|
mariabackup: /home/buildbot/buildbot/build/mariadb-10.6.0/storage/innobase/srv/srv0start.cc:1582: dberr_t srv_start(bool): Assertion `!buf_pool.any_io_pending()' failed.
|
I think that there are two causes to this bug in 10.5 and 10.6:
- io_callback() was invoking read_slots->release() before invoking the callback function (for write callbacks the existing order is OK)
- recv_sys_t::apply() is missing a call to read_slots->wait() before flushing the buffer pool. The recv_sys.apply(true) was called by srv_start(), so this should fix the observed race condition (
MDEV-9663).
Attachments
Issue Links
- relates to
-
MDEV-22373 Unable to find a record to delete-mark ends up crashing mysqld process after upgrading from 10.1.43 to 10.4
-
- Closed
-
-
MDEV-25198 mariadb backup fails with signal 6 and signal 11 during prepare
-
- Closed
-
-
MDEV-25344 Two hosts crashed at the same time with: Failing assertion: !cursor->index->is_committed()
-
- Closed
-
-
MDEV-27734 Set innodb_change_buffering=none by default
-
- Closed
-
-
MDEV-29694 Remove the InnoDB change buffer
-
- Closed
-
-
MDEV-9663 InnoDB assertion failure: *cursor->index->name == TEMP_INDEX_PREFIX, or !cursor->index->is_committed()
-
- Closed
-
-
MDEV-14643 InnoDB: Failing assertion: !cursor->index->is_committed()
-
- Closed
-
-
MDEV-15608 Crash during transaction rollback when using optimistic parallel replication, few threads, non-durable configuration.
-
- Closed
-
-
MDEV-16264 Implement a common work queue for InnoDB background tasks
-
- Closed
-
-
MDEV-20934 Infinite loop on innodb_fast_shutdown=0 with inconsistent change buffer
-
- Closed
-
-
MDEV-21452 Use condition variables and normal mutexes instead of InnoDB os_event and mutex
-
- Closed
-
-
MDEV-21811 MariaDB not starting
-
- Closed
-
-
MDEV-23192 Crash in row_search_mvcc() probably related to secondary index corruption
-
- Closed
-
-
MDEV-24141 Failing assertion
-
- Closed
-
-
MDEV-24448 mariabackup: storage/innobase/srv/srv0start.cc:1582: dberr_t srv_start(bool): Assertion `!buf_pool.any_io_pending()' failed.
-
- Closed
-
-
MDEV-24709 Assertion !recv_no_ibuf_operations failed in ibuf_page_low()
-
- Closed
-
-
MDEV-30422 Merge new release of InnoDB 5.7.41 to 10.3
-
- Closed
-
- mentioned in
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
Activity
Field | Original Value | New Value |
---|---|---|
Link |
This issue relates to |
Link |
This issue relates to |
Component/s | Storage Engine - InnoDB [ 10129 ] | |
Summary | mariabackup --prepare may fails to apply some log | Crash recovery or mariabackup --prepare may fail to apply some log |
Link |
This issue is duplicated by |
Labels | corruption rr-profile-analyzed | rr-profile-analyzed |
Status | Open [ 1 ] | In Progress [ 3 ] |
Link |
This issue relates to |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Affects Version/s | 5.5 [ 15800 ] | |
Affects Version/s | 10.1 [ 16100 ] | |
Affects Version/s | 10.2 [ 14601 ] | |
Affects Version/s | 10.3 [ 22126 ] | |
Affects Version/s | 10.4 [ 22408 ] | |
Affects Version/s | 10.0 [ 16000 ] | |
Affects Version/s | 10.5 [ 23123 ] | |
Affects Version/s | 10.6 [ 24028 ] | |
Assignee | Marko Mäkelä [ marko ] | Thirunarayanan Balathandayuthapani [ thiru ] |
Description |
Note: I did not analyze this in older versions than 10.5. On Buildbot, we have a few cases of assertion failures in {{mariabackup --prepare}}, because {{buf_pool.n_pend_reads}} is not zero. [~mleich] was able to reproduce the error under {{rr record}}, and I analyzed a case.
In the analyzed trace, we are recovering 4 page reads. The read callback for 3 of the pages was finished (with no log to be applied, because the pages were up to date), but for the last page, the read callback was still in progress. Yet, {{srv_start()}} would concurrently initiate shutdown. We failed to wait for all asynchronous reads to complete. This omission could cause the failure to apply log to some data pages. While this error started to occur on Buildbot after we replaced the InnoDB mutexes with native ones ( {noformat:title=10.6 cf2480dd77a45cf56389015a1eabf567} mariabackup.xb_fulltext_encrypted w2 [ fail ] Test ended at 2020-12-16 04:19:07 CURRENT_TEST: mariabackup.xb_fulltext_encrypted mysqltest: At line 19: exec of '/mnt/buildbot/build/mariadb-10.6.0/extra/mariabackup/mariabackup --innobackupex --apply-log /mnt/buildbot/build/mariadb-10.6.0/mysql-test/var/2/tmp/backup 2>&1' failed, error: 34304, status: 134, errno: 11 … 2020-12-16 4:19:05 0 [Note] InnoDB: Starting final batch to recover 7 pages from redo log. mariabackup: /home/buildbot/buildbot/build/mariadb-10.6.0/storage/innobase/srv/srv0start.cc:1582: dberr_t srv_start(bool): Assertion `!buf_pool.any_io_pending()' failed. {noformat} I think that there are two causes to this bug in 10.5 and 10.6: * {{io_callback()}} was invoking {{read_slots->release()}} before invoking the callback function (for write callbacks the existing order is OK) * {{recv_sys_t::apply()}} is missing a call to {{read_slots->wait()}} before flushing the buffer pool. The {{recv_sys.apply(true)}} was called by {{srv_start()}}, so this should fix the observed race condition. |
On Buildbot, we have a few cases of assertion failures in {{mariabackup --prepare}}, because {{buf_pool.n_pend_reads}} is not zero. [~mleich] was able to reproduce the error under {{rr record}}, and I analyzed a case which is reported in In the analyzed trace, we are recovering 4 page reads. The read callback for 3 of the pages was finished (with no log to be applied, because the pages were up to date), but for the last page, the read callback was still in progress. Yet, {{srv_start()}} would concurrently initiate shutdown. We failed to wait for all asynchronous reads to complete. This omission could cause the failure to apply log to some data pages. While this error started to occur on Buildbot after we replaced the InnoDB mutexes with native ones ( {noformat:title=10.6 cf2480dd77a45cf56389015a1eabf567} mariabackup.xb_fulltext_encrypted w2 [ fail ] Test ended at 2020-12-16 04:19:07 CURRENT_TEST: mariabackup.xb_fulltext_encrypted mysqltest: At line 19: exec of '/mnt/buildbot/build/mariadb-10.6.0/extra/mariabackup/mariabackup --innobackupex --apply-log /mnt/buildbot/build/mariadb-10.6.0/mysql-test/var/2/tmp/backup 2>&1' failed, error: 34304, status: 134, errno: 11 … 2020-12-16 4:19:05 0 [Note] InnoDB: Starting final batch to recover 7 pages from redo log. mariabackup: /home/buildbot/buildbot/build/mariadb-10.6.0/storage/innobase/srv/srv0start.cc:1582: dberr_t srv_start(bool): Assertion `!buf_pool.any_io_pending()' failed. {noformat} I think that there are two causes to this bug in 10.5 and 10.6: * {{io_callback()}} was invoking {{read_slots->release()}} before invoking the callback function (for write callbacks the existing order is OK) * {{recv_sys_t::apply()}} is missing a call to {{read_slots->wait()}} before flushing the buffer pool. The {{recv_sys.apply(true)}} was called by {{srv_start()}}, so this should fix the observed race condition ( |
Labels | rr-profile-analyzed | corruption not-10.5 |
Summary | Crash recovery or mariabackup --prepare may fail to apply some log | Potential corruption of last recovered page |
Link |
This issue relates to |
Link |
This issue relates to |
Link |
This issue is duplicated by |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Assignee | Thirunarayanan Balathandayuthapani [ thiru ] | Marko Mäkelä [ marko ] |
Link |
This issue relates to |
Summary | Potential corruption of last recovered page | Corruption of system tablespace or last recovered page |
issue.field.resolutiondate | 2020-12-28 10:36:40.0 | 2020-12-28 10:36:40.204 |
Fix Version/s | 10.2.37 [ 25112 ] | |
Fix Version/s | 10.3.28 [ 25111 ] | |
Fix Version/s | 10.4.18 [ 25110 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Resolution | Fixed [ 1 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Link |
This issue relates to |
Link |
This issue relates to |
Labels | corruption not-10.5 | CS0180575 corruption not-10.5 |
Link | This issue blocks TODO-2696 [ TODO-2696 ] |
Link |
This issue relates to |
Labels | CS0180575 corruption not-10.5 | CS0156770 CS0180575 corruption not-10.5 |
Link |
This issue relates to |
Link |
This issue relates to |
Link |
This issue relates to |
Link |
This issue relates to |
Link |
This issue relates to |
Link |
This issue relates to |
Link |
This issue relates to |
Link |
This issue relates to |
Link | This issue relates to MENT-1004 [ MENT-1004 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31117 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 31165 ] |
Labels | CS0156770 CS0180575 corruption not-10.5 | CS0156770 CS0180575 ServiceNow corruption not-10.5 |
Labels | CS0156770 CS0180575 ServiceNow corruption not-10.5 | 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z CS0156770 CS0180575 corruption not-10.5 |
Labels | 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z CS0156770 CS0180575 corruption not-10.5 | 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z corruption not-10.5 |
Labels | 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z corruption not-10.5 | corruption not-10.5 |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 32331 ] |
Workflow | MariaDB v3 [ 117227 ] | MariaDB v4 [ 158711 ] |
Link |
This issue relates to |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 33285 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 34202 ] |
Link |
This issue relates to |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 34687 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 34706 ] |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 34706 ] |
Link |
This issue relates to |
Remote Link | This issue links to "Page (MariaDB Confluence)" [ 35966 ] |
Zendesk Related Tickets | 201658 171635 | |
Zendesk active tickets | 201658 |