[MDEV-28859] MariaDB Assert Crash Using mysqlbackup Created: 2022-06-15  Updated: 2023-10-23  Resolved: 2023-10-23

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.8.3
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Lee Thompson Assignee: Daniel Black
Resolution: Incomplete Votes: 0
Labels: None
Environment:

Running in Docker


Issue Links:
Relates
relates to MDEV-27593 Crashing on I/O error is unhelpful Closed
relates to MDEV-28857 innodb assertion cb->m_err == DB_SUCC... Closed

 Description   

My main MariaDB container has started crashing intermittently during nightly backups. It does not crash every time, which makes me think it's not a data issue, and when it does crash, it is not at the same point.

The backup script (bash) has not changed in two years, and this only started recently; I'm not sure why. After the crash, the server responds with "Too many connections".

The part of the script running the backup issues this command:

sudo docker exec $CONTAINER_NAME mysqldump --user=$MARIADB_USER --lock-tables --all-databases > $BACKUP_TARGET_HOST

MariaDB Log (minus the bug reporting advice)

2022-06-15 13:49:12 0x7f956a7fc640  InnoDB: Assertion failure in file ./storage/innobase/os/os0file.cc line 3540
InnoDB: Failing assertion: cb->m_err == DB_SUCCESS
220615 13:49:12 [ERROR] mysqld got signal 6 
Server version: 10.8.3-MariaDB-1:10.8.3+maria~jammy
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=3
max_threads=153
thread_count=3
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467997 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
mariadbd(my_print_stacktrace+0x32)[0x7f95a4905212]
mariadbd(handle_fatal_signal+0x478)[0x7f95a43da7e8]



 Comments   
Comment by Marko Mäkelä [ 2022-06-16 ]

The crash is at the start of the following function:

static void io_callback(tpool::aiocb *cb)
{
  ut_a(cb->m_err == DB_SUCCESS);
  /* ... rest of the function ... */
}

Can you please try to find out the value of cb->m_err? danblack should be able to assist you with enabling and analyzing core dumps in a Docker environment.
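Once the value is known, it can be turned into something readable. A minimal C++ sketch, assuming that on Linux cb->m_err holds 0 on success (which is what the assertion checks) and otherwise a plain POSIX errno value; the helper name describe_io_error is hypothetical and not part of MariaDB.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper: translate the integer found in cb->m_err into text.
// Assumption: 0 means success, any other value is a POSIX errno code.
std::string describe_io_error(int err)
{
  if (err == 0)
    return "success";
  // Normalise a negative value (raw -errno) just in case.
  int code = err < 0 ? -err : err;
  std::string name;
  switch (code) {
  case EIO:    name = "EIO"; break;    // low-level I/O error
  case ENOSPC: name = "ENOSPC"; break; // no space left on device
  case EINVAL: name = "EINVAL"; break; // e.g. misaligned direct I/O
  case EAGAIN: name = "EAGAIN"; break; // resource temporarily unavailable
  case EBADF:  name = "EBADF"; break;  // bad file descriptor
  default:     name = "errno " + std::to_string(code); break;
  }
  // strerror() supplies the C library's description of the code.
  return name + ": " + std::strerror(code);
}
```

For example, if p *cb in gdb showed m_err = 5, describe_io_error(5) would report EIO on Linux, pointing at the storage layer rather than at InnoDB itself.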

Which Linux kernel version are you using? It could play a role here.

Comment by Marko Mäkelä [ 2022-06-16 ]

A work-around would be to set innodb_use_native_aio=0 in the configuration.

If the name "jammy" refers to Ubuntu 22.04, I think that the native AIO implementation should be liburing (MDEV-24883) and not libaio.

Comment by Lee Thompson [ 2022-06-16 ]

I have no idea how to find out the cb->m_err value. I'll need step by step instructions.

The container reports (uname -a):

Linux 3.10.105 #25426 SMP Wed Jul 8 03:19:33 CST 2020 x86_64 x86_64 x86_64 GNU/Linux

The host OS reports (uname -a):

Linux 3.10.105 #25426 SMP Wed Jul 8 03:19:33 CST 2020 x86_64 GNU/Linux synology_avoton_1817+

Comment by Lee Thompson [ 2022-06-16 ]

Since upgrading to 10.8.x, it is already falling back to innodb_use_native_aio=0:

2022-06-15 22:01:07 0 [Warning] mariadbd: io_uring_queue_init() failed with ENOSYS: check seccomp filters, and the kernel version (newer than 5.1 required)
2022-06-15 22:01:07 0 [Warning] InnoDB: liburing disabled: falling back to innodb_use_native_aio=OFF
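The warning depends partly on the running kernel's version (io_uring first appeared in Linux 5.1). A minimal C++ sketch of that version half of the check, assuming only the leading "major.minor" of the release string matters; the helper names are hypothetical. Inside a container, uname() reports the host kernel, because containers share it. A version check is only a heuristic: as the warning itself says, seccomp filters can also make io_uring_queue_init() fail with ENOSYS, so the reliable test is whether that call succeeds.

```cpp
#include <cstdio>
#include <string>
#include <sys/utsname.h>

// Hypothetical helper: true if a release string such as "5.15.0-41-generic"
// is at least want_major.want_minor. Only the leading "X.Y" is inspected.
bool kernel_at_least(const std::string &release, int want_major, int want_minor)
{
  int major = 0, minor = 0;
  if (std::sscanf(release.c_str(), "%d.%d", &major, &minor) != 2)
    return false; // unparseable: be conservative
  return major > want_major || (major == want_major && minor >= want_minor);
}

// Hypothetical helper: the running kernel's release, via uname(2).
// In a container this is the host kernel's release.
std::string running_kernel_release()
{
  struct utsname u;
  if (uname(&u) != 0)
    return "";
  return std::string(u.release);
}
```

With the kernel reported in this issue, kernel_at_least("3.10.105", 5, 1) is false, which is consistent with the fallback shown in the log; the 4.4.180+ kernel mentioned below would fail the same check.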

Comment by Marko Mäkelä [ 2022-06-16 ]

leethompson, I can’t give step-by-step instructions for Docker, and I suspect that https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ would not work out of the box. Usually (but not always) in core dumps, the crashing thread is Thread 1. Once you have identified the crashing thread in the output of thread apply all backtrace, you would have to use a command like thread 1 to switch to that thread, and then something like frame 4 (I am not sure about the number) to get to the function io_callback, and then print cb->m_err to display the value. You would likely need a separate debug symbol package installed for the last step to work.

Are there any messages about file system corruption or other trouble in the system logs or in the kernel message buffer (sudo dmesg)? Does smartctl report any storage errors? Which file system and type of storage are you using?

I was under the wrong impression that io_callback() would not be invoked by the "simulated AIO" implementation. So, there is no work-around for this at the moment. Perhaps wlad has some ideas about this, since the code was last refactored by him in MDEV-16264.
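To illustrate why the simulated path can reach the same assertion: a simulated AIO implementation performs the I/O with an ordinary blocking system call and then hands the result, including any error, to the same completion callback that native AIO would use. The following C++ sketch uses a heavily simplified, hypothetical sim_aiocb structure and simulated_read() function; it is not MariaDB's tpool code, whose details differ.

```cpp
#include <cerrno>
#include <cstddef>
#include <functional>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical stand-in for an AIO control block: only the fields needed
// to show how an error reaches the completion callback.
struct sim_aiocb {
  int fd;
  void *buffer;
  size_t len;
  off_t offset;
  size_t ret_len = 0; // bytes actually transferred
  int err = 0;        // 0 on success, otherwise an errno value
  std::function<void(sim_aiocb *)> callback;
};

// "Simulated" AIO: a blocking pread() (in real code this would run on a
// worker thread), followed by the same completion callback native AIO uses.
// The error is only recorded in cb->err, so a callback that asserts
// err == 0 fails on this path exactly as it would with native AIO.
void simulated_read(sim_aiocb *cb)
{
  ssize_t n = pread(cb->fd, cb->buffer, cb->len, cb->offset);
  if (n < 0) {
    cb->ret_len = 0;
    cb->err = errno;
  } else {
    cb->ret_len = static_cast<size_t>(n);
    cb->err = 0;
  }
  if (cb->callback)
    cb->callback(cb);
}
```

Calling simulated_read() on an invalid descriptor, for example, delivers err == EBADF to the callback; a callback written like io_callback() would then abort instead of reporting the error.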

Comment by Marko Mäkelä [ 2022-06-16 ]

Side note: Native AIO should be much more efficient than the fallback implementation. The liburing interface is rather recent, whereas libaio has been available since some point in the Linux 2.6 series. Because a given MariaDB Server executable will not support both implementations, it would be better to use an executable that was built for libaio. However, this would not solve the problem at hand.

Comment by Lee Thompson [ 2022-06-16 ]

@Marko Mäkelä, most of that went over my head. Fortunately, the container seems to have apt-get, so I hope to get that working. It's getting late here, so I'll try it tomorrow.

The file system (on the host) is btrfs, but it's complicated: it is a hybrid RAID 6 array (Synology Hybrid Raid 2) on a Synology DS1817+. There are no errors. MariaDB's data is bind-mounted from the host file system rather than stored in a Docker volume.

I have also been switching my backup script to mariabackup, and that works fine, so I'm fairly sure (99%?) that this isn't a file system issue.

Comment by Lee Thompson [ 2022-06-16 ]

@Marko Mäkelä, changing the kernel or the MariaDB binary is unlikely. I'm just using mariadb:latest with whatever it ships. Synology's newer models and operating system run kernel 4.4.180+, which wouldn't help either.

(*nix is not my forte but you've probably guessed that by now.)

Comment by Daniel Black [ 2022-06-16 ]

You are right that a kernel older than 5.1 won't have io_uring, so the server has reverted to simulated AIO.

Instead of mariadb:latest, can you run the container quay.io/mariadb-foundation/mariadb-debug:10.8 (same interface), adding --cap-add CAP_SYS_PTRACE to the docker options when starting the container? (The CAP_ prefix might need to be removed.)

Before doing the backup run the following and leave it running:

sudo docker exec -ti $CONTAINER_NAME gdb -p 1
(gdb) c

"c" continues execution.

Run the backup; when the assertion fires, gdb should stop at that point.

Capture the output of the following command and include it here:
(gdb) thread apply all bt -frame-arguments all full

Then repeat:
(gdb) up

until you are in the io_callback function, and run:
(gdb) p *cb

This shows the contents of cb, including the m_err value that marko and I would like to see, along with the function that was in progress.

Comment by Lee Thompson [ 2022-06-16 ]

I ended up staying up for something else, so I took a stab at getting debug symbols into the container.

The mariadb:latest image is stripped, so I followed the instructions, but it ended in failure.

Suggestion for MariaDB: Make debug images. If I could swap in a mariadb:debug_latest image, it would make this a lot easier for all of us, especially for those of us on systems where building a custom container is not much of an option.

sudo add-apt-repository 'deb [arch=amd64,arm64,ppc64el,s390x]  https://ftp.osuosl.org/pub/mariadb/repo/10.5/ubuntu focal main/debug'

This failed in the container for two reasons: sudo is not installed, and neither is add-apt-repository.

root@MariaDB:/# apt-get update && apt-get install -y mariadb-server-core-10.8.3-dbgsym 
Get:1 http://archive.ubuntu.com/ubuntu jammy InRelease [270 kB]                                     
Get:2 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]                                    
Get:4 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [109 kB]                                      
Get:5 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [99.8 kB]               
Get:3 https://archive.mariadb.org/mariadb-10.8.3/repo/ubuntu jammy InRelease [7728 B]
Get:6 http://archive.ubuntu.com/ubuntu jammy/main amd64 Packages [1792 kB]
Get:7 http://security.ubuntu.com/ubuntu jammy-security/multiverse amd64 Packages [4648 B]
Get:8 http://security.ubuntu.com/ubuntu jammy-security/restricted amd64 Packages [212 kB]
Get:9 http://archive.ubuntu.com/ubuntu jammy/restricted amd64 Packages [164 kB] 
Get:10 http://archive.ubuntu.com/ubuntu jammy/multiverse amd64 Packages [266 kB]                              
Get:11 http://archive.ubuntu.com/ubuntu jammy/universe amd64 Packages [17.5 MB]                                
Get:12 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [229 kB]                            
Get:13 https://archive.mariadb.org/mariadb-10.8.3/repo/ubuntu jammy/main amd64 Packages [9823 B]                                                    
Get:14 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [89.8 kB]                                                           
Get:15 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [153 kB]                                                              
Get:16 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages [236 kB]                                                            
Get:17 http://archive.ubuntu.com/ubuntu jammy-updates/multiverse amd64 Packages [4648 B]                                                            
Get:18 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [380 kB]                                                                  
Get:19 http://archive.ubuntu.com/ubuntu jammy-backports/universe amd64 Packages [2016 B]                                                            
Fetched 21.6 MB in 13s (1640 kB/s)                                                                                                                  
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package mariadb-server-core-10.8.3-dbgsym
E: Couldn't find any package by glob 'mariadb-server-core-10.8.3-dbgsym'
E: Couldn't find any package by regex 'mariadb-server-core-10.8.3-dbgsym'

Comment by Marko Mäkelä [ 2022-06-16 ]

leethompson, MariaDB supplies packages for many operating systems. I am not familiar with containers, so I do not know if this is relevant or applicable, but: If there is a Docker container of MariaDB based on Ubuntu 20.04 instead of 22.04, that one should use libaio instead of liburing.

Since btrfs was mentioned, this might be related to MDEV-24854 and you could try innodb_flush_method=fsync to disable the use of O_DIRECT. But I would still like to know the error code.
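At the system-call level, the difference between O_DIRECT and innodb_flush_method=fsync is whether the data files are opened with the O_DIRECT flag, which bypasses the operating system's page cache. A minimal C++ sketch, assuming Linux and a hypothetical helper open_data_file(); it falls back to a buffered open if the file system rejects O_DIRECT with EINVAL. This is an illustration only, not InnoDB's actual file-opening code.

```cpp
#include <cerrno>
#include <fcntl.h>

// Hypothetical helper: open a data file, optionally with O_DIRECT.
// used_direct reports whether O_DIRECT actually took effect.
// Returns a file descriptor, or -1 on a real error (errno is set).
int open_data_file(const char *path, bool want_direct, bool &used_direct)
{
  const int flags = O_RDWR | O_CREAT;
  used_direct = false;
  if (want_direct) {
    int fd = open(path, flags | O_DIRECT, 0660);
    if (fd >= 0) {
      used_direct = true; // page cache bypassed for this descriptor
      return fd;
    }
    if (errno != EINVAL)
      return -1; // a genuine error, not "O_DIRECT unsupported"
  }
  // Buffered open: durability then relies on explicit fsync() calls.
  return open(path, flags, 0660);
}
```

Passing want_direct = false corresponds to the fsync setting suggested above: the file is opened through the page cache, and the O_DIRECT path, together with any file-system-specific behaviour it triggers, is never exercised.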

Comment by Lee Thompson [ 2022-06-16 ]

@Daniel Black, I will do that. I don't want my services down for long, so I'll set up the debug container with a clone of the data from the main one and work on getting you the information you need.

Comment by Daniel Black [ 2022-06-16 ]

The server debug symbol package does exist (https://archive.mariadb.org/mariadb-10.8.3/repo/ubuntu/pool/main/m/mariadb-10.8/mariadb-server-core-10.8-dbgsym_10.8.3%2Bmaria~jammy_amd64.ddeb); it's just missing from the repository information somehow. Downloading it and installing it with dpkg -i should work.

marko, containers provide only the userspace, not the kernel. Hence "mariadbd: io_uring_queue_init() failed with ENOSYS": the kernel interface is missing on the host.

If you want to build your own focal (20.04) based container for 10.8.3 - https://github.com/grooverdan/mariadb/tree/focal_images/10.8-focal.

Comment by Daniel Black [ 2022-06-16 ]

Regarding add-apt-repository: if you use 10.8 instead of 10.5 and jammy instead of focal, that line should work directly as a repository entry.

Comment by Lee Thompson [ 2022-06-21 ]

@Daniel Black, I'm having trouble building the container; the Synology DiskStation DS1817+ has its own Docker UI, which is somewhat limited. I've now got Portainer working and should be able to use that.

I have never built a container image myself, so this may take some time.

Comment by Daniel Black [ 2023-09-19 ]

With MDEV-27593 resolved, the code path around this assertion and all the stacks above it have significantly improved.

Is this still an issue?

Generated at Thu Feb 08 10:04:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.