[MDEV-22423] mariabackup hangs on prepare Created: 2020-04-30 Updated: 2021-12-07 Resolved: 2020-05-12 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | mariabackup |
| Affects Version/s: | 10.3.22 |
| Fix Version/s: | 10.2.32 |
| Type: | Bug | Priority: | Major |
| Reporter: | Claudio Nanni | Assignee: | Vladislav Lesin |
| Resolution: | Not a Bug | Votes: | 2 |
| Labels: | None | ||
| Environment: |
Ubuntu 18.04 LTS |
||
| Attachments: |
|
| Description |
|
The prepare gets stuck here for hours: Suspecting: I asked to remove the --export option, with the same result. We tried to collect information with gdb and perf, but neither returned anything. In only one case did gdb return a snapshot (a single sample), attached as backtrace-txt.log. One of the three threads appears to be in xtrabackup.cc:4284: /* Wait for threads to exit */ } So I wonder whether it is stuck in a sleep loop, though I am not sure that is compatible with the observations. |
| Comments |
| Comment by Marko Mäkelä [ 2020-04-30 ] | |||||||||||||||||||
|
claudio.nanni, are you sure that the backtrace-txt.log The mariabackup --prepare of a backup is close to invoking InnoDB crash recovery. I do not see any recovery here, only log-copying and data-file-copying threads, which should not be running during --prepare. To debug --prepare, I would suggest running a debug build to get verbose output from the recovery code:
| |||||||||||||||||||
| Comment by Claudio Nanni [ 2020-04-30 ] | |||||||||||||||||||
|
marko, yes, it should be --prepare: {{mariabackup --prepare --dbug=d,ib_log --use-memory=4G --target-dir=/mybackup/backup/}}
And it hangs. | |||||||||||||||||||
| Comment by Vladislav Lesin [ 2020-05-12 ] | |||||||||||||||||||
|
According to the strace log sent by the customer, mariabackup hangs after the following calls:
It turned out that the target backup directory was located on NFS, and advisory locks can hang on NFS volumes. There are several workarounds, but the hang was not caused by mariabackup code, which is why the issue is closed. | |||||||||||||||||||
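The strace calls are not reproduced in this issue, so the following is only a sketch under an assumption: that the hang sits in an fcntl()-style advisory record lock on a file in the target directory. The Python helper `advisory_lock_probe` is hypothetical (it is not part of mariabackup or any MariaDB tool). It creates a scratch file in a directory, takes and releases a non-blocking exclusive lock on it, and reports "hung" if that does not finish within a timeout. On a healthy local filesystem it should return "ok" almost immediately.

```python
import errno
import fcntl
import os
import tempfile
import threading


def advisory_lock_probe(directory, timeout_s=5.0):
    """Take and release an exclusive fcntl() record lock on a scratch file
    in `directory`, giving up after `timeout_s` seconds.

    Returns "ok", "error: <errno name>" or "hung".
    Hypothetical helper for illustration; not part of mariabackup.
    """
    result = {}

    def describe(exc):
        return "error: " + errno.errorcode.get(exc.errno, str(exc.errno))

    def attempt():
        try:
            fd, path = tempfile.mkstemp(prefix="lockprobe-", dir=directory)
        except OSError as exc:
            result["status"] = describe(exc)
            return
        try:
            # With LOCK_NB, lockf() issues fcntl(F_SETLK), which does not wait
            # for a conflicting holder. Nobody else knows this fresh file, so
            # a long stall here likely points at the lock service (e.g. NFS lockd).
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(fd, fcntl.LOCK_UN)
            result["status"] = "ok"
        except OSError as exc:
            result["status"] = describe(exc)
        finally:
            os.close(fd)
            os.unlink(path)

    # Run the attempt in a daemon thread: a lock call stuck in the kernel
    # may not be interruptible, but the caller can stop waiting for it.
    worker = threading.Thread(target=attempt, daemon=True)
    worker.start()
    worker.join(timeout_s)
    return result.get("status", "hung")


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(target, "->", advisory_lock_probe(target))
```

For example, saved as lockprobe.py (an arbitrary name), it could be run as `python3 lockprobe.py /mybackup/backup/` before starting --prepare. A "hung" result only shows that locking in that directory is unresponsive; it does not prove that mariabackup would hang there, and the stuck worker thread may delay the process from exiting.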
| Comment by Brendan P [ 2021-12-07 ] | |||||||||||||||||||
|
Unfortunately, I can consistently reproduce this bug on a good Galera SST dump, 6 times out of 10, with a very large data set (approximately 16TB) on a 1TB system with --use-memory of 800G, all stored on a 24TB fast SSD RAID 0 array. Mariabackup either hangs using no CPU while waiting on a thread, or hangs while keeping 2 or more cores busy, like this: {{[pid 18785] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)}}
There are no errors in the log; all is well until this point in the crash-recovery phase (this being an SST):
Each attempt can freeze at a random point, never with the same number of pages to recover, and at random it simply works: it passes and the member rejoins the cluster as if this had never been an issue. Example of a hung mariabackup:
I will rerun with --dbug=d,ib_log during the log-apply phase to see what can be found. This used to happen rarely, but now it occurs far more regularly. I've seen other reports of this happening, but no real follow-ups. At first I figured it was related to a RAM limitation, but it can happen early, during the first batch read, before mariabackup has consumed a large amount of memory.
├─mysqld─┬─sh───wsrep_sst_maria─┬─logger
Thanks in advance, Bren |