[MDEV-21100] [ERROR] [FATAL] InnoDB: Unknown error code 19: Required history data has been deleted
Created: 2019-11-20 Updated: 2023-11-30 Resolved: 2023-04-11 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Storage Engine - InnoDB |
| Affects Version/s: | 10.3.10 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Sebastian Halfar | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Won't Fix | Votes: | 1 |
| Labels: | None |
| Environment: |
Ubuntu 18.04 with 3 nodes Galera-Cluster |
||
| Attachments: |
|
| Issue Links: |
|
| Description |
|
The cluster survived the first crash, but we would like to know what the stack trace tells you. I hope I have provided everything you need for a hint. We could upgrade, but first I would like to know whether this is a version issue or a code failure. Thanks in advance. |
| Comments |
| Comment by Elena Stepanova [ 2019-11-25 ] |
|
From the error log:
|
| Comment by Marko Mäkelä [ 2019-11-26 ] |
|
As far as I can tell, the error code DB_MISSING_HISTORY should be impossible. The condition purge_sys.view.changes_visible() should not hold for record versions that are visible in any active read view. I do not remember seeing this error with anything else than Galera. The previous occurrence could have been about 2 years ago. Also back then, we did not get anything reproducible. I do not know exactly how the Galera library interacts with InnoDB, so I cannot exclude the possibility that Galera is breaking some assumptions around the purge view. With Galera, data nodes can be cloned to each other. Could a snapshot state transfer have gone wrong in the past? How was this node initialized? If there has been an in-place upgrade of existing data files to 10.3.10, were the files originally created with something older than 10.3? If yes, was a slow shutdown (innodb_fast_shutdown=0) performed prior to the upgrade? |
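The invariant described in the comment above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not InnoDB's actual code: `ReadView`, `toy_purge_view` and `history_missing` are hypothetical names, and the only identifier taken from the thread is `changes_visible`. The model assumes the purge view is at least as old as every active read view, in which case no reader should ever need history that purge considers removable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadView:
    # Toy stand-in for a read view (hypothetical, heavily simplified).
    up_limit_id: int                     # transaction ids below this are visible
    low_limit_id: int                    # ids at or above this are not visible
    active_ids: frozenset = frozenset()  # ids in between that were still active

    def changes_visible(self, trx_id: int) -> bool:
        if trx_id < self.up_limit_id:
            return True
        if trx_id >= self.low_limit_id:
            return False
        return trx_id not in self.active_ids


def toy_purge_view(views):
    # Hypothetical helper: build a view that is at least as old as every
    # active read view, so it never sees more than any of them.
    return ReadView(
        up_limit_id=min(v.up_limit_id for v in views),
        low_limit_id=min(v.low_limit_id for v in views),
        active_ids=frozenset().union(*(v.active_ids for v in views)),
    )


def history_missing(purge, reader, trx_id):
    # The reader cannot see trx_id and so needs an older record version,
    # yet the purge view already treats that history as removable.
    return purge.changes_visible(trx_id) and not reader.changes_visible(trx_id)


views = [
    ReadView(up_limit_id=10, low_limit_id=20, active_ids=frozenset({12, 15})),
    ReadView(up_limit_id=14, low_limit_id=30, active_ids=frozenset({15, 22})),
]
purge = toy_purge_view(views)
violations = [
    (t, v) for v in views for t in range(40) if history_missing(purge, v, t)
]
print(violations)  # [] in this toy setup: the invariant holds
```

In this model, a violation can only appear if the purge view is newer than some reader's view, which is the kind of broken assumption the comment speculates about.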
| Comment by Sebastian Halfar [ 2019-11-27 ] |
|
As far as I know, the files were not created on 10.3; it should have been an older version. I don't think any snapshot state transfer has gone wrong, and we initialized the node with mysql start, which joined the node to the cluster. I don't think we did a slow shutdown in the past, but I will ask my colleague about the slow shutdown and the version of MariaDB he used when he started the DB. Thanks for helping. The cluster has been running since 01.10.2018 without issues. |
| Comment by Jan Lindström (Inactive) [ 2019-12-09 ] |
|
Can you still repeat this? If yes, can you provide instructions on how to repeat it? Also, please provide the cluster configuration and the full unedited error logs. |
| Comment by Sebastian Halfar [ 2019-12-10 ] |
|
Hello Jan, so far it has not happened again, and it had never happened before. The logs are nearly unedited; we only removed IP addresses and timestamps.
"If there has been an in-place upgrade of existing data files to 10.3.10, were the files originally created with something older than 10.3?" Yes.
"If yes, was a slow shutdown (innodb_fast_shutdown=0) performed prior to the upgrade?" No. |
| Comment by Ovidiu Stanila [ 2019-12-27 ] |
|
Hello Jan, we recently hit this issue on MariaDB 10.3.21 (see the attached crash error log, mariadb-10.3.21-crash.txt). We usually keep the MariaDB server up to date, upgrading within a week or two after a new version is released. None of the upgrades were done with innodb_fast_shutdown=0. We are running on an up-to-date system with CentOS 7.7.1908, kernel 3.10.0-1062.9.1.el7.x86_64. Just let us know if you need any additional information regarding the latest crash. We don't have much information to go on regarding the previous crash (10.3.18), besides the attached log file. |
| Comment by Sebastian Halfar [ 2020-01-16 ] |
|
Hello Jan, this issue hit us again. We are still on MariaDB 10.3.10 (galera_2-crash.txt). Kind regards |
| Comment by Marko Mäkelä [ 2020-01-16 ] |
|
Seb0, I hope that you are not executing any ALTER TABLE…ADD COLUMN without , FORCE, because I had one more thought: Due to |
| Comment by Sebastian Halfar [ 2020-01-17 ] |
|
Marko Mäkelä, do you mean whether we have ever used "ALTER TABLE…ADD COLUMN without , FORCE", or whether we were using it at the moment of the crash? I will ask our backend team and come back to you. |
| Comment by Ovidiu Stanila [ 2020-05-19 ] |
|
Same issue, same system, same configuration, same behaviour, on MariaDB 10.3.22. Any idea what could cause these random crashes? |
| Comment by Ovidiu Stanila [ 2020-05-29 ] |
|
Hit this issue again, same behaviour. mariadb-10.3.23-crash.txt
We never encountered this problem more than once after an upgrade. Might this be related to something left over from the upgrade? |
| Comment by Ovidiu Stanila [ 2020-06-16 ] |
|
Encountered this issue again: mariadb-10.3.23-crash-2.txt |
| Comment by Ovidiu Stanila [ 2020-07-20 ] |
|
Hit this again, twice in a row today, 2 hours apart: mariadb_10.3.23_crash_2020-07-20_02.txt |
| Comment by Ovidiu Stanila [ 2020-07-31 ] |
|
One more occurrence of this issue: mariadb_10.3.23_crash_2020-07-31.txt |
| Comment by Ovidiu Stanila [ 2020-12-10 ] |
|
We are still experiencing this issue from time to time, most recently after upgrading to the latest version in the 10.3 branch (10.3.27): mariadb_10.3.27_crash_2020-12-10.txt |
| Comment by Ovidiu Stanila [ 2021-02-01 ] |
|
Hello @Marko , we are still hitting this randomly from time to time. We have no idea what could cause it and did not find a specific pattern, except that it always references the same table (there are many concurrent queries on that table all the time, both reads and writes). Here is the latest crash report: mariadb_10.3.27_crash_2020-02-01.txt We plan on moving to 10.4 and are waiting for the next release to come out, but I fear we won't escape this even then. Any ideas on what could trigger this? |
| Comment by Marko Mäkelä [ 2021-06-09 ] |
|
ovidiu.stanila, sorry, I missed your updates, and accidentally found this ticket while searching for similar issues before filing I think that we would need a somewhat repeatable test case in order to fix this. There have been locking bugs in Galera. It is difficult to tell whether In MariaDB Server 10.4 or later, galera-4 is being used instead of galera-3. As part of that change, many things were cleaned up. Therefore it could make sense to try a newer major version. |
| Comment by Marko Mäkelä [ 2021-06-09 ] |
|
Seb0, in my note about |
| Comment by Ovidiu Stanila [ 2021-06-10 ] |
|
Hello @Marko , for us I don't think this is related to Galera; we don't use it on the server where we noticed the problem. At the time, I did not find any other reference to this kind of error except in this ticket, and thought the two might be related somehow. Thanks for checking. We will upgrade to 10.4.20 when it is available and make a note if we happen to hit this problem again, but so far that workflow workaround has managed to avoid it. |
| Comment by Ovidiu Stanila [ 2022-07-28 ] |
|
Hello @Jan, it has been over a year since we updated to MariaDB 10.4.20+. We are now on MariaDB 10.4.25, and the issue is still there. We hit it randomly every couple of months or so, with no clear pattern as to why it happens. |
| Comment by Marko Mäkelä [ 2022-07-28 ] |
|
ovidiu.stanila, if you used the default wsrep_sst_method=rsync, I believe that it failed to block all concurrent writes unless you happened to run the donor with innodb_use_native_aio=0. This was fixed in A fundamental downside of physical replication (in this context, anything else than wsrep_sst_method=mysqldump) is that it will copy also any form of corruption. Especially if you are making snapshots from any node in the cluster, say, A→B→C→A, you may end up propagating corruption to the entire cluster, including the originally healthy node A in my example. |
| Comment by Ovidiu Stanila [ 2022-07-29 ] |
|
@Marko, that's the thing: we don't use Galera on these servers. The affected server is the "main" master (the other one is just a hot copy used in emergency cases) in a classic master-master setup. But this was the only bug report that had the same error as the one we were experiencing. I think for us it is mostly something related to the InnoDB engine and indexing. |
| Comment by Marko Mäkelä [ 2022-07-29 ] |
|
I would suspect that something bad has happened to the data files in the past. It could be something that broke the crash recovery (such as, an error made when copying data files while a server was running, or someone removed ib_logfile* or used innodb_force_recovery=6 to ‘fix’ a recovery error), or an upgrade to 10.3 from very old data files, before I do not think that we see this error in our internal stress testing. By design, InnoDB should never purge any history of committed transactions that may still be visible to old read views. ovidiu.stanila, if your scenario does not involve Galera (and its state transfers), then it would be better to file a separate ticket about it, especially if you have a reproducible test case that is based on SQL statements and not starting a server on some previously written data files. |
| Comment by Ovidiu Stanila [ 2022-07-29 ] |
|
The problem started showing up after upgrading from 10.2.25 to 10.3.16; the upgrade itself went smoothly and we did not have any issues. All updates were done incrementally from one version to the next (10.2 -> 10.3 -> 10.4), and the same for minor versions (10.2.23 -> 10.2.24 -> 10.2.25). We did not use innodb_force_recovery on this instance, as it was fairly stable throughout its history, except for this issue, which does not affect us that much because the recovery is fast. |
| Comment by Marko Mäkelä [ 2023-04-11 ] |
|
A possible explanation of this bug could be |
| Comment by Marko Mäkelä [ 2023-11-23 ] |
|
MDEV-32115 is a potential data corruption bug in wsrep_sst_method=rsync that appears to affect the 10.4 release series only. |
| Comment by Marko Mäkelä [ 2023-11-30 ] |
|
Here are some more possible causes of this:
|