[MDEV-28580] Server crash when creating an index after adding a foreign key Created: 2022-05-16  Updated: 2023-06-21  Resolved: 2022-09-04

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.7.3
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Joris Guffens Assignee: Daniel Black
Resolution: Incomplete Votes: 0
Labels: crash
Environment:

docker desktop on windows


Attachments: File Logfile3.PML     File docker-compose.yml     File mariadb-1.strace     Text File mariadb-crash-1.log     File mariadb.strace    
Issue Links:
Duplicate
is duplicated by MDEV-31486 mariadb crash after startup on bitnam... Closed
Relates
relates to MDEV-24189 Rename Table twice raise error "Table... Closed
relates to MDEV-29015 InnoDB: Assertion failure in file /ho... Closed
relates to MDEV-29260 Server crashes when running migration... Open

 Description   

When using a newly created database, executing the following sql statements will cause a server crash.

create table player_profiles (
  id                            varchar(40) not null,
  name                          varchar(255) not null,
  tokens                        integer default 0 not null,
  constraint pk_player_profiles primary key (id)
);
 
create table player_report (
  id                            varchar(40) not null,
  sender_player_id              varchar(40) not null,
  target_player_id              varchar(40) not null,
  created_at                    datetime(6) not null,
  constraint pk_player_report primary key (id)
);
 
create index ix_player_report_sender_player_id on player_report (sender_player_id);
 
-- the next line adds a foreign key, removing this line will prevent a server crash
alter table player_report add constraint fk_player_report_sender_player_id foreign key (sender_player_id) references player_profiles (id) on delete restrict on update restrict;
 
-- server crashes on this query
create index ix_player_report_target_player_id on player_report (target_player_id);

The crash occurs when adding a foreign key, followed by the creation of an index. This happens on the same table but on different columns.

The docker image was freshly pulled from Docker Hub and run on a Windows 10 computer.



 Comments   
Comment by Marko Mäkelä [ 2022-05-16 ]

mariadb-crash-1.log shows that an assertion ut_a(fsize != os_offset_t(-1)) fails in the function fil_space_extend_must_retry(). What are the build versions of Docker and Windows 10? I think that this should be specific to that platform, not native GNU/Linux or native Microsoft Windows.

Comment by Joris Guffens [ 2022-05-16 ]

After some more experimenting I figured out that this only happens when a docker volume is bound to the host file system. I included the docker-compose.yml

I don't think this is desired behavior. Correct me if I'm wrong.

Comment by Marko Mäkelä [ 2022-05-16 ]

Side note: ADD FOREIGN KEY would avoid a table rebuild if you set foreign_key_checks=0. I think that any attempt to invoke a table rebuild on a large enough table should reproduce the same. Examples:

OPTIMIZE TABLE player_report;
ALTER TABLE player_report FORCE;

How large is the table? Is the file system anywhere near full? How was the data loaded in the first place? By using a data directory that was created by some other means?

Comment by Joris Guffens [ 2022-05-16 ]

Windows 10 Pro 21H2, build 19044.1706
Docker Desktop 4.8.1

The docker-compose.yml was put in an empty directory therfore the database is completely empty.
Then I ran `docker-compose up -d`
I created a database with

CREATE DATABASE `testdb` /*!40100 COLLATE 'utf8mb4_general_ci' */;

The database is empty and the script provided in this issue creates new tables and then modifies them causing the crash.
My filesystem has plenty of available space.

Because running the container with a dedicated volume solves the problem, this might be a Docker issue.

Comment by Alice Sherepa [ 2022-05-16 ]

https://stackoverflow.com/questions/70784437/innodb-preallocating-147456-bytes-for-file-xxx-ibd-failed-with-error-2
this seems to be a similar problem.

Comment by Daniel Black [ 2022-05-18 ]

jorisguffens, are you able to make the trace recordings described in: docker library issue 403?

Comment by Joris Guffens [ 2022-05-18 ]

mariadb.strace Is this what you need? I followed the exact steps as described in the issue, I also have a procmon log file but that is too big to upload here.

Comment by Daniel Black [ 2022-05-18 ]

Apologizes I missed a -f option in strace to trace all the threads. With there would be a fallocate call. In your procmon log can you see what the fallocate linux system calls get translated to in windows and their results? The surrounding system calls related to the same file handles would be useful too.

Comment by Joris Guffens [ 2022-05-19 ]

mariadb.strace This strace is much more verbose but I don't have much experience with this so I don't know what to look for. I just found this when looking for fallocate.

fallocate(50, 0, 0, 65536)        = -1 EOPNOTSUPP (Operation not supported)

Logfile3.PML With some filtering I shrunk down the proclog file so I can upload here. Hopefully I didn't filter out too much.

Comment by Marko Mäkelä [ 2022-05-20 ]

InnoDB should handle EOPNOTSUPP from fallocate() for quite some time now, and fall back to extending the file by writes. I think that MDEV-16015 was addressed by that contributed fix. I understood that the GNU libc replacement in the Docker for Windows would implement a simpler version of posix_fallocate(), without an internal fallback for EOPNOTSUPP.

Comment by Daniel Black [ 2022-05-20 ]

WSL I think is providing a kernel interface. The glibc interface within the container is the raw ubuntu-20.04 glibc provided by the base container image of mariadb:10.7.

On mariadb-1.strace

pwrite based fallback after fallocate fails seem to be working.

Line:

659: 68    openat(AT_FDCWD, "./testdb/#sql-alter-1-9a.ibd", O_RDWR|O_CREAT|O_EXCL|O_CLOEXEC, 0660) = 50
..
706: 68    pwrite64(50, "\0", 1, 114687)     = 1
707: 68    fdatasync(50)                     = 0
708: 68    fdatasync(50)                     = 0
..
(no closing of fd 50)
...
740: 68    stat("./testdb/#sql-alter-1-9a.ibd",  <unfinished ...>
744: 68    <... stat resumed>{st_mode=S_IFREG|0660, st_size=114688, ...}) = 0
 
830: 68    rename("./testdb/#sql-alter-1-9a.ibd", "./testdb/player_report.ibd") = 0
...
 
968: 68    recvfrom(41, "\3-- server crashes on this que....
..
1002: 68    fstat(50, 0x7fd0144c9130)         = -1 ENOENT (No such file or directory)
1003: 68    write(2, "2022-05-19 10:51:51 154 [ERROR] InnoDB: preallocating 131072 bytes for file ./testdb/player_report."..., 123) = 123
1004: 68    fstat(50, 0x7fd0144c92e0)         = -1 ENOENT (No such file or directory)
1005: 68    write(2, "2022-05-19 10:51:51 0x7fd0144ce700", 34) = 34
1006: 68    write(2, "  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.7.3/storage/innobase/f"..., 122) = 122

In the strace there is no closing of file descriptor 50.

So its looking like fstat on an open file descriptor is loosing track of the file because it was renamed earlier.

jorisguffens if you could provide the info from the template bug report related to wsl information and kernel information.

I haven't looked though the PML file yet. My quick look so far can't see a WSL bug on this. I'm happy to write WSL bug using the provide information and I'll search again just to be sure first.

Comment by Joris Guffens [ 2022-05-20 ]

The information you requested:
Microsoft Windows [Version 10.0.19044.1706]
WSL 2
Kernel version: 5.10.102.1
Docker Desktop (Windows), version 4.8.2

While looking for your information I noticed my wsl kernel was quite a few versions behind (5.4.72). I updated and I was still able to reproduce this issue with the latest kernel.

Comment by Daniel Black [ 2022-05-25 ]

WSL bug 8443 lodged.

Comment by Daniel Black [ 2022-07-11 ]

FYI I've implemented an attempted workaround (10.3) based. This is available as quay.io/mariadb-foundation/mariadb-devel:10.3-mdev-29015-avoid-wsl8443. If you are able to test this to see if it crashes in a Windows WSL environment that would be appreciated.

If it fails for some reason, can you perform a strace on the test case like the instructions above.

Comment by Daniel Black [ 2022-07-25 ]

I retested on Windows 10 - 19044.1826 and was unable to reproduce this.

I also tested on Windows 11 22000.795 and couldn't reproduce it there either.

Can you please retest if your Windows updates resolve this.

Generated at Thu Feb 08 10:01:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.