Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Duplicate
- Affects Version/s: 10.11.3, 10.6.12
- Fix Version/s: None
Description
After upgrading to Ubuntu 22.04, and alongside that to MariaDB 10.6, we started experiencing table corruption during our nightly restores. Because these restores are to staging servers, loss of data was acceptable, so we worked around the problem by simply removing the corrupted table. But this obviously won't fly should we ever have to do a production restore (which would happen only very rarely).
The database portion of the restores is done using mydumper with its default of 4 threads.
What makes the problem particularly nasty is that it cannot be consistently reproduced; we noticed it seemed to happen in roughly 1 out of every 7 restores (where each restore takes about 40-50 minutes).
This made us think it was related to parallelism, so we tried running mydumper single-threaded, which did not solve the problem.
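For reference, a minimal sketch of what such a dump and restore could look like with the mydumper tool suite; the directory, database name, and exact options below are hypothetical placeholders, as the actual invocation is not included in this report:
{noformat}
# Dump: 4 threads is the mydumper default
mydumper --database=mydb --outputdir=/backups/mydb --threads=4

# Restore: --threads=1 would correspond to the single-threaded attempt mentioned above
myloader --directory=/backups/mydb --database=mydb --overwrite-tables --threads=4
{noformat}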
We have also tried upgrading to various versions of MariaDB/mydumper, most notably:
MariaDB | mydumper
---|---
10.6.12 | 0.15
10.11.3 | 0.10
10.11.3 | 0.15
With all of the above version combinations, the problem still occurred.
Eventually we found that we could no longer reproduce the problem while running MariaDB 10.5.21 with mydumper 0.10, but we are still unsure of the underlying cause.
Provided files:
The included table structure is just one table of our database; we were able to reproduce the corruption by restoring backups of only this table.
Because it is quite time-consuming to reproduce and we would have to generate a large amount of dummy data (existing dumps all contain customer data), we have not included a database dump for now.
But we still wanted to create this bug report just in case you might already see something strange based on the included table structure and error log.
In case there is no immediately apparent problem and you still want to look further into this, we would of course be happy to provide a database dump.
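As an aside not taken from the original report: a quick, hypothetical way to confirm after a restore whether a table ended up corrupted, and whether it uses ROW_FORMAT=COMPRESSED (which the linked issues below suggest is relevant); the schema and table names are placeholders:
{noformat}
# Placeholder schema/table names
mariadb -e "CHECK TABLE mydb.mytable EXTENDED;"
mariadb -e "SHOW TABLE STATUS FROM mydb LIKE 'mytable'\G" | grep -i row_format
{noformat}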
Update:
After running on 10.6.15 I was again able to reproduce it. I generated a stack trace from the core dump (mariadbd_full_bt_all_threads.txt).
Attachments
Issue Links
- duplicates
  - MDEV-34453 Trying to read 16384 bytes at 70368744161280 outside the bounds of the file: ./ibdata1 (Closed)
- relates to
  - MDEV-16281 Implement parallel CREATE INDEX, ALTER TABLE, or bulk load (Open)
  - MDEV-31441 BLOB corruption on UPDATE of PRIMARY KEY with FOREIGN KEY (Closed)
  - MDEV-31817 SIGSEGV after btr_page_get_father_block() returns nullptr on corrupted data (Closed)
  - MDEV-35413 InnoDB: Cannot load compressed BLOB (ROW_FORMAT=COMPRESSED table) (Closed)
  - MDEV-30882 Crash on ROLLBACK of DELETE or UPDATE in a ROW_FORMAT=COMPRESSED table (Closed)
  - MDEV-35779 Index Corruption on Database Restore (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Description | (original text; version table listed mariadb 11.1) | (version table corrected to 10.11.3) |
Description | (text without the update) | (update added: reproduced on 10.6.15, stack trace generated from the core dump) |
Description | (update truncated mid-sentence) | (update completed with attachment name mariadbd_full_bt_all_threads.txt) |
Attachment | mariadbd_full_bt_all_threads.txt [ 72008 ] |
Attachment | syslog-restore-10.6.15.txt [ 72011 ] |
Attachment | mariadbd_full_bt_all_threads.txt [ 72008 ] |
Attachment | mariadbd_full_bt_all_threads.txt [ 72012 ] |
Attachment | mariadbd_full_bt_all_threads.txt [ 72012 ] |
Attachment | mariadbd_full_bt_all_threads.txt [ 72013 ] |
Link | This issue relates to |
Fix Version/s | 10.6 [ 24028 ] |
Component/s | Storage Engine - InnoDB [ 10129 ] |
Assignee | Marko Mäkelä [ marko ] |
Link | This issue relates to MDEV-16281 [ MDEV-16281 ] |
Link | This issue relates to |
Status | Open [ 1 ] | Needs Feedback [ 10501 ] |
Status | Needs Feedback [ 10501 ] | Open [ 1 ] |
Status | Open [ 1 ] | Confirmed [ 10101 ] |
Link | This issue relates to |
Component/s | Backup [ 13902 ] | |
Summary | Table corruption during database restore | ROW_FORMAT=COMPRESSED table corruption due to ROLLBACK |
Attachment | MDEV-32174_ps.txt [ 72878 ] |
Assignee | Marko Mäkelä [ marko ] | Roel Van de Paar [ roel ] |
Status | Confirmed [ 10101 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | In Testing [ 10301 ] |
Link | This issue relates to |
Link | This issue relates to |
Assignee | Roel Van de Paar [ roel ] | Marko Mäkelä [ marko ] |
Status | In Testing [ 10301 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | Needs Feedback [ 10501 ] |
Comment |
[ Had an interesting run this morning with this. This time I used:
{code:sql}
threads=8000   # Number of concurrent threads
queries=100    # Number of t1/t2 INSERTs per thread/per test round
{code}
And here is what I saw:
{noformat}
Count: 100
Count: 200
Count: 300
Count: 400
Count: 500
...normal count operation continues...
Count: 2600
Count: 2700
Count: 2800
Count: 2900
Count: 3000
--------------
INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()))
--------------
ERROR 1062 (23000) at line 1: Duplicate entry '2198-2025-02-10 06:38:40' for key 'PRIMARY'
--------------
INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()))
--------------
ERROR 1062 (23000) at line 1: Duplicate entry '1931-2025-02-10 06:38:40' for key 'PRIMARY'
--------------
INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()))
--------------
...additional similar duplicate entries...
ERROR 1062 (23000) at line 1: Duplicate entry '2781-2025-02-10 06:38:40' for key 'PRIMARY'
--------------
INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()))
--------------
ERROR 1062 (23000) at line 1: Duplicate entry '3123-2025-02-10 06:38:40' for key 'PRIMARY'
--------------
INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()))
--------------
ERROR 1062 (23000) at line 1: Duplicate entry '3179-2025-02-10 06:38:40' for key 'PRIMARY'
Count: 3100
Count: 3200
Count: 3300
Count: 3400
Count: 3500
...normal count operation continues...
Count: 4600
Count: 4700
Count: 4800
Count: 4900
Count: 5000
--------------
INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()))
--------------
ERROR 1062 (23000) at line 1: Duplicate entry '858-2025-02-10 06:39:03' for key 'PRIMARY'
Count: 5100
Count: 5200
Count: 5300
Count: 5400
Count: 5500
Count: 5600
Count: 5700
Count: 5800
Count: 5900
Count: 6000
./script1.sh: line 23: 2193819 Killed ${client} -A --skip-ssl-verify-server-cert --skip-ssl --force --binary-mode -u ${user} -S ${socket} -D ${db} -e "${SQL}"
./script1.sh: line 23: 2193829 Killed ${client} -A --skip-ssl-verify-server-cert --skip-ssl --force --binary-mode -u ${user} -S ${socket} -D ${db} -e "${SQL}"
...many similar such messages...
./script1.sh: line 23: 2194480 Killed ${client} -A --skip-ssl-verify-server-cert --skip-ssl --force --binary-mode -u ${user} -S ${socket} -D ${db} -e "${SQL}"
./script1.sh: line 23: 2194481 Killed ${client} -A --skip-ssl-verify-server-cert --skip-ssl --force --binary-mode -u ${user} -S ${socket} -D ${db} -e "${SQL}"
Count: 6100
Count: 6200
Count: 6300
Count: 6400
Count: 6500
...script continues...
{noformat}
Firstly, it seems interesting that the single '858-2025-02-10 06:39:03' duplicate entry occurrence happened quite offset from the earlier batch of duplicate entries, again suggesting some sort of (highly sporadic) race condition bug, as we were discussing. Immediately after this (let's say around count 5500), I used CHECK TABLE and the table was fine:
{noformat}
10.6.21>CHECK TABLE t2 EXTENDED;
+---------+-------+----------+----------+
| Table   | Op    | Msg_type | Msg_text |
+---------+-------+----------+----------+
| test.t2 | check | status   | OK       |
+---------+-------+----------+----------+
1 row in set (0.037 sec)
{noformat}
However, a little later the client kills happened, likely as a result of automated server resource monitoring we use. Immediately after this, {{CHECK TABLE}} reported a warning:
{noformat}
10.6.21>CHECK TABLE t2 EXTENDED;
+---------+-------+----------+----------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Op    | Msg_type | Msg_text                                                                                                                               |
+---------+-------+----------+----------------------------------------------------------------------------------------------------------------------------------------+
| test.t2 | check | Warning  | InnoDB: Clustered index record with stale history in table `test`.`t2`: COMPACT RECORD(info_bits=0, 5 fields): {[4] (0x80000002),[5] i (0x99B5D469D5),[6] \ (0x000000015CBF),[7] T(0xD000000C0D0554),[8] (0x0000000000000000)} |
| test.t2 | check | status   | OK                                                                                                                                     |
+---------+-------+----------+----------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.655 sec)
{noformat}
And checking the error log, we see it is right amidst the killed connections:
{noformat}
...more aborted connections before...
2025-02-10  6:39:32 5829 [Warning] Aborted connection 5829 to db: 'test' user: 'root' host: 'localhost' (Got an error writing communication packets)
2025-02-10  6:39:32 5832 [Warning] Aborted connection 5832 to db: 'test' user: 'root' host: 'localhost' (Got an error writing communication packets)
2025-02-10  6:39:33 4052 [Warning] InnoDB: Clustered index record with stale history in table `test`.`t2`: COMPACT RECORD(info_bits=0, 5 fields): {[4] (0x80000002),[5] i (0x99B5D469D5),[6] \ (0x000000015CBF),[7] T(0xD000000C0D0554),[8] (0x0000000000000000)}
2025-02-10  6:39:33 5848 [Warning] Aborted connection 5848 to db: 'test' user: 'root' host: 'localhost' (Got an error writing communication packets)
2025-02-10  6:39:33 5912 [Warning] Aborted connection 5912 to db: 'test' user: 'root' host: 'localhost' (Got an error writing communication packets)
2025-02-10  6:39:34 5924 [Warning] Aborted connection 5924 to db: 'test' user: 'root' host: 'localhost' (Got an error writing communication packets)
...more aborted connections after...
{noformat} ]
Status | Needs Feedback [ 10501 ] | Open [ 1 ] |
Link | This issue duplicates |
Fix Version/s | 11.8.0 [ 29960 ] | |
Fix Version/s | 11.7.1 [ 29913 ] | |
Fix Version/s | 11.6.2 [ 29908 ] | |
Fix Version/s | 11.4.4 [ 29907 ] | |
Fix Version/s | 11.2.6 [ 29906 ] | |
Fix Version/s | 10.11.10 [ 29904 ] | |
Fix Version/s | 10.6.20 [ 29903 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Resolution | Duplicate [ 3 ] | |
Status | Open [ 1 ] | Closed [ 6 ] |
Hi bjhilbrands,
To give us a little more detail, please install the debug symbols:
sudo apt-get install mariadb-server-core-dbg
Then, using abrt-retrace to start a gdb session on the core dump, run thread apply all bt -frame-arguments all full.
If you are able to test with 10.6.15 that would be much appreciated (repo config), or even 10.11, though an earlier version will tell us more reliably where to fix it.
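A minimal sketch of those steps, assuming a systemd-coredump setup; the core-dump location, output file name, and server binary path are placeholders rather than anything specified in this ticket:
{noformat}
# Debug symbols, as suggested above
sudo apt-get install mariadb-server-core-dbg

# Extract the most recent mariadbd core dump (assumes systemd-coredump is in use)
coredumpctl dump mariadbd -o /tmp/mariadbd.core

# Capture a full backtrace of all threads into a text file
gdb /usr/sbin/mariadbd /tmp/mariadbd.core \
    -ex 'set logging file mariadbd_full_bt_all_threads.txt' \
    -ex 'set logging on' \
    -ex 'thread apply all bt -frame-arguments all full' \
    -ex 'quit'
{noformat}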