During the restore step the customer was getting "Restore failed on 10.2.27 with signal 6 and 11"
Oct 9 08:19:43 db166020 kernel: [5710622.181729] mariabackup[143845]: segfault at 0 ip 0000561e04a2dc88 sp 00007f2b853d97f0 error 6 in mariabackup[561e041da000+913000]
Below is the stack trace from the mariabackup output:
2021-10-09  8:19:42 139824895743744 [ERROR] [FATAL] InnoDB: is_short 0, info_and_status_bits 0, offset 10140, o_offset 9, mismatch index 18446744073709551594, end_seg_len 31 parsed len 3
211009  8:19:42 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.2.27-MariaDB
key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=1
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5419 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /glide/mysqld/customecert_3400_peta/temp/restore_2021-10-08_1820332/customer_3400_db170011_s_2021-10-08_0612271
Resource Limits:
Fatal signal 11 while backtracing
The customer managed to capture thread dumps and the mariabackup log, which I have attached to this ticket.
Unfortunately, no core dump was captured during the event.
Furthermore, the customer was able to avoid the mariabackup crash during the restore by increasing "--use-memory" from an already sizeable 32GB to 256GB. However, they did not see the following log:
[Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync) log: 0; buffer pool: 0. 5129 OS file reads, 0 OS file writes, 0 OS fsyncs.
Marko Mäkelä
added a comment - Before MDEV-19586, MDEV-21351, and MDEV-26784 and possibly other changes in MariaDB Server 10.5, recovery as well as mariadb-backup --prepare may run out of memory. The intended behaviour is to reserve at most ⅓ of the buffer pool for buffered log records.
Porting MDEV-21351 from 10.5 to earlier versions is highly nontrivial.
Marko Mäkelä
added a comment - The mariadb-backup parameter --use-memory is equivalent to the innodb_buffer_pool_size of the server during recovery. In the implementation of log-based recovery, up to one third of that memory will be used for buffering log records, and the rest for copies of data pages.
Before 10.5, it was actually quite a bit more complicated than that. First of all, a single log record could be split into multiple parts if it was longer than RECV_DATA_BLOCK_SIZE. In the MDEV-12353 format, such splitting is not necessary, and MDEV-21351 simplified the memory management during recovery.
I am afraid that before 10.5, it is possible (but unlikely) that recovery may run out of memory even when using the same size of buffer pool that the server was using previously. Preparing a backup is similar to executing crash recovery. The reason for this is that the enforcement of the one-third rule is somewhat inaccurate and scattered in the code. Accurate implementation of the memory limit was possible in 10.5 thanks to the data structure simplification and the rewrite of the recovery logic.
Starting with 10.5, the expectation is that recovery will work even if the buffer pool size is significantly reduced, or when using an innodb_log_file_size that is several times innodb_buffer_pool_size. The latest fix MDEV-26784 was a glitch that might affect only extremely small buffer pools (something like 256×innodb_page_size or less).
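The memory arithmetic described in this comment can be sketched as follows. This is only an illustration of the nominal one-third rule, not code from MariaDB: the function names are hypothetical, the 16 KiB page size is the assumed default innodb_page_size, and the "256 pages" threshold is the approximate figure quoted above. Before 10.5 the real enforcement is less accurate than this, which is why actual usage could exceed the nominal budget.

```python
from typing import Tuple

GIB = 1024 ** 3


def nominal_recovery_split(use_memory_bytes: int) -> Tuple[int, int]:
    """Split --use-memory into (log record budget, data page budget) in bytes.

    Hypothetical helper: applies the nominal rule that at most one third of
    the buffer pool is used for buffered log records and the rest for pages.
    """
    if use_memory_bytes <= 0:
        raise ValueError("use_memory_bytes must be positive")
    log_budget = use_memory_bytes // 3
    page_budget = use_memory_bytes - log_budget
    return log_budget, page_budget


def is_extremely_small_pool(use_memory_bytes: int,
                            innodb_page_size: int = 16 * 1024) -> bool:
    """True if the pool is about 256 pages or less (approximate threshold)."""
    return use_memory_bytes <= 256 * innodb_page_size


if __name__ == "__main__":
    # The two --use-memory values mentioned in the report.
    for size_gib in (32, 256):
        log_budget, page_budget = nominal_recovery_split(size_gib * GIB)
        print(f"--use-memory={size_gib}G: "
              f"up to {log_budget / GIB:.1f} GiB for log records, "
              f"at least {page_budget / GIB:.1f} GiB for data pages")
```

Under this nominal rule, raising --use-memory from 32GB to 256GB raises the log record budget by the same factor of eight, which is consistent with the larger setting avoiding the crash.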
The recommendation is to prepare backups with the same version of the backup tool that was used for creating a backup. Personally, I would recommend always preparing every backup, to ensure the validity of the backup. The preparation should best be run on a separate system, to avoid disturbing the database server with extra I/O and memory usage.
Due to the log file format change of MDEV-12353, mariadb-backup version 10.5 or later will refuse to prepare a backup that was created with an older version, just like the server will refuse to ‘crash-upgrade’ from an earlier version than 10.5.
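The version guidance in this comment could be turned into a pre-flight check along the following lines. This is a minimal sketch with hypothetical function names and return strings. It encodes only the two rules stated above (a 10.5 or later tool refuses backups created before 10.5, and the same version is recommended) and treats every other combination as merely "not recommended".

```python
from typing import Tuple

# First release series with the MDEV-12353 log file format.
FORMAT_CHANGE = (10, 5)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as '10.2.27-MariaDB' into (10, 2, 27)."""
    numeric = text.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


def prepare_advice(backup_version: str, tool_version: str) -> str:
    """Classify a (backup, tool) version pair; hypothetical helper."""
    backup = parse_version(backup_version)
    tool = parse_version(tool_version)
    # A 10.5+ tool will refuse a backup created by a pre-10.5 version.
    if backup[:2] < FORMAT_CHANGE <= tool[:2]:
        return "refused"
    # Preparing with the exact version that created the backup is recommended.
    if backup == tool:
        return "recommended"
    return "not recommended"


if __name__ == "__main__":
    print(prepare_advice("10.2.27-MariaDB", "10.5.12"))
    print(prepare_advice("10.2.27-MariaDB", "10.2.27"))
```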
Matthias Leich
added a comment -
There were no serious mariabackup related tests for MariaDB version < 10.5 within the last months.
The dataset used for RQG tests invoking mariabackup is rather small.
Geoff Montee (Inactive)
added a comment - Details and recommendations about memory usage while preparing a backup have been added to the following documentation pages:
https://mariadb.com/docs/recovery/mariadb-enterprise-backup/#preparing-a-full-backup-for-recovery
https://mariadb.com/docs/reference/mdb/cli/mariadb-backup/use-memory/
People
Geoff Montee (Inactive)
Scott Sommerville (Inactive)