[MDEV-29969] Random crashes (signal 8) when restoring mariadb-server memory state using CRIU (OpenVZ 7) Created: 2022-11-07 Updated: 2022-11-07 Resolved: 2022-11-07 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | N/A |
| Affects Version/s: | 10.9.3 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Philippe | Assignee: | Unassigned |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Host : OpenVZ 7 (7.0.18) |
||
| Attachments: |
|
| Description |
|
Good evening! Long story short: MariaDB randomly crashes after being restored (using a backup) and having its memory state restored by CRIU. Command that triggers the bug: "vzctl resume <ctid>". MariaDB 10.9.3 is installed inside an OpenVZ 7 container (Debian 10). When resuming/restoring this CT using OpenVZ 7 commands, MariaDB sometimes crashes inside the container (mysqld got signal 8).
Please find attached the MariaDB "error log", the "gdb log" and the "config CT" (which contains all commands to reproduce the environment). If needed, I can also provide the CRIU "dump.log" and "restore.log" and the MariaDB core dump (246 MB). Have a great evening! |
| Comments |
| Comment by Daniel Black [ 2022-11-07 ] |
|
Signal 8 is SIGFPE, raised here in __difftime. It looks like https://bugzilla.kernel.org/show_bug.cgi?id=4532, but that seems too old. Probably not our bug. |
| Comment by Philippe [ 2022-11-07 ] |
|
Hello Daniel. Many thanks for your time and your help. My knowledge of C is limited but, if I understand correctly, there is an inconsistency with the "last_monitor_time" value? (0, not set or undefined?) OpenVZ 7.0.18 uses the RHEL7 kernel branch (3.10), checked with "uname -r" and "cat /etc/os-release". |
| Comment by Daniel Black [ 2022-11-07 ] |
|
Sorry, I was wrong about the original C code I posted. A little later in the code, monitor_state.last_monitor_time is set to 0. difftime is a calculation implemented in glibc, and it is quite simple. It requires the floating point unit (FPU) of the processor to be enabled, as its arguments are actually handled as double precision numbers. The SIGFPE is the result of an attempt to use that processor feature without it being initialized. Either a) the kernel should have fully initialized it before passing control to userspace code, or b) the kernel should enable the FPU and allow mariadb to continue. I've done a brief search on https://bugs.openvz.org/issues/?jql=text%20~%20%22SIGFPE%22 and https://bugzilla.redhat.com and have been unable to find this bug. So I recommend reporting it to openvz with the reproducer attachments you posted here. While the code at ./storage/innobase/srv/srv0srv.cc +1194 could be written with a non-double based subtraction, there are other parts of the codebase that use double numbers and could just as easily trigger it. |
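To illustrate the last point of the comment above, here is a minimal C sketch contrasting the two ways of computing an elapsed interval. The helper names elapsed_seconds_double and elapsed_seconds_integer are hypothetical and are not taken from srv0srv.cc; the sketch only shows that difftime() produces a double, while a plain subtraction on time_t values cast to an integer type does not involve double arithmetic. As noted in the comment, switching one call site to the integer form would not remove the underlying problem, since other code paths use doubles too.

```c
#include <time.h>

/* Hypothetical helper: elapsed seconds computed with difftime().
 * difftime() is the standard C library call and returns a double,
 * so the result is produced with floating point arithmetic. */
double elapsed_seconds_double(time_t now, time_t last)
{
    return difftime(now, last);
}

/* Hypothetical helper: the same interval computed with integer
 * subtraction only. Assumes time_t holds whole seconds and fits in
 * a long long, which holds on Linux/glibc but is not guaranteed by
 * the C standard. */
long long elapsed_seconds_integer(time_t now, time_t last)
{
    return (long long)now - (long long)last;
}
```

Both helpers return the same interval for whole-second timestamps; only the arithmetic used to get there differs.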
| Comment by Daniel Black [ 2022-11-07 ] |
|
Closing as "Not our bug". Thanks for the well-written bug report, which hopefully the openvz folks can parse and correct. Thanks for using MariaDB and reporting bugs. |
| Comment by Philippe [ 2022-11-07 ] |
|
Again, many many thanks! This seems related to the CRIU project. I will contact the OpenVZ devs as you suggested. MariaDB is a wonderful project, and having devs like you respond to "normal" people is just awesome. Have a great day! |