[MDEV-24189] Rename Table twice raise error "Tablespace is missing for a table" Created: 2020-11-11  Updated: 2024-01-29  Resolved: 2022-07-25

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.7
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Daniel Black
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Windows Docker container only. Not Linux, not Windows native.


Attachments: ProcessMonitorLogfile.CSV, WSL Disabled - Working with Hyper-V.png, files.PNG, rename-wsl.strace, rename.c, rename.strace, script.sql, strace.out
Issue Links:
Relates
relates to MDEV-24184 InnoDB RENAME TABLE recovery failure ... Closed
relates to MDEV-28580 Server crash when creating an index a... Closed

 Description   

From: https://github.com/docker-library/mariadb/issues/331

Run docker-compose up -d
connect with your favorite client (I'm using DBeaver Enterprise)
Run a create table statement:

CREATE TABLE NewTable (
ID int primary key,
name varchar(10)
)

Run a RENAME table statement like this:
RENAME TABLE mytestdb.NewTable TO mytestdb.NewTablea;

Then run another RENAME statement:
RENAME TABLE mytestdb.NewTablea TO mytestdb.NewTableb;

The error I'm getting is:
SQL Error [1025] [HY000]: (conn=4) Error on rename of './mytestdb/newtablea' to './mytestdb/newtableb' (errno: 194 "Tablespace is missing for a table")

Notes:
1 - When I install the MariaDB client via the .msi installer for Windows, everything works fine.
2 - I changed the variable lower_case_table_names to 1, because the default on Windows comes up as 2 for me, but without success; the error is still the same.
3 - I created the same container with the mysql:latest image instead of mariadb, and everything works fine.
4 - I tried adjusting the folder permissions on Windows, giving all permissions to everyone (like a chmod 777 on Linux), but the error is still the same.

So I think there is some problem with Windows + the mariadb:latest image.



 Comments   
Comment by Vladislav Vaintroub [ 2020-11-11 ]

I do not have the slightest idea where this might come from, nor am I that familiar with Docker. I guess it might be InnoDB on Linux not handling case-insensitivity as expected on a slightly weird platform.

Comment by Rafael Ambrosio [ 2020-11-11 ]

Hi everyone. Thank you @Daniel Black for replicating the issue I opened on GitHub. I appreciate your effort.
Here is an update on some tests I've done:

I enabled case sensitivity on the mapped persistence folder on Windows using this: https://www.windowscentral.com/how-enable-ntfs-treat-folders-case-sensitive-windows-10

I was pretty confident that this workaround would work, but it did not.

I also tried setting

 lower_case_table_names = 0 


and even

lower_case_table_names = 1

to force everything to lower case, but neither setting helped.
Maybe the problem isn't about case sensitivity at all, but about something else we don't know yet. The only thing I know is that the MySQL container can handle it.

I opened the persisted volumes of both containers (mariadb and mysql) side by side and found that, even though both containers use the InnoDB engine, MySQL does not create .frm files, only the .ibd files...

Comment by Marko Mäkelä [ 2020-11-11 ]

For what it is worth, there is CIOPFS (Case-Insensitive-On-Purpose-File-System), which allows case-sensitive Linux file systems to emulate case-insensitive ones. I used it maybe 5 years ago to troubleshoot similar issues.

The lower_case_table_names parameter is a crude hack. If I remember correctly, on Windows, InnoDB ignores that parameter and always treats it as if it were 1. The InnoDB data structures are case-sensitive (there is a strcmp() like comparison in a hash table of table names), and we try to emulate case-insensitive table names by converting them to lower case. And to begin with, storage engines see table names in the filename-safe encoding, which uses 1, 3, or 5 ASCII characters per character. The 1-byte and 3-byte encodings are supposed to be compatible with the lower-case mapping.
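
The lower-casing approach described above can be illustrated with a small sketch. This is not InnoDB code: the TableNameCache class and its methods are hypothetical names, and Python's str.lower() merely stands in for whatever conversion the server applies. It only shows why a lookup structure that compares names exactly behaves case-insensitively solely when every name is normalized before it is stored or looked up.

```python
class TableNameCache:
    """Hypothetical stand-in for a case-sensitive hash table of table names."""

    def __init__(self, fold_case):
        self.fold_case = fold_case   # True emulates lower_case_table_names=1
        self.tables = {}             # dict keys compare exactly, like strcmp()

    def _key(self, name):
        # Normalize the name only when case folding is enabled.
        return name.lower() if self.fold_case else name

    def add(self, name):
        self.tables[self._key(name)] = name

    def rename(self, old, new):
        # Raises KeyError if the (normalized) old name is not cached.
        self.tables.pop(self._key(old))
        self.tables[self._key(new)] = new

    def contains(self, name):
        return self._key(name) in self.tables


# With folding, any spelling of the name finds the table.
folded = TableNameCache(fold_case=True)
folded.add("mytestdb/NewTable")
folded.rename("mytestdb/newtable", "mytestdb/NewTablea")

# Without folding, only the exact spelling matches.
exact = TableNameCache(fold_case=False)
exact.add("mytestdb/NewTable")
```

If any layer skips the normalization step, or the filesystem underneath disagrees about case, the cached name and the on-disk name can drift apart.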

Comment by Rafael Ambrosio [ 2020-11-11 ]

The strange thing is:
If I install MariaDB from the .msi for Windows and run it natively, this problem does not happen.
It only happens with this combination:
1 - Windows as the host OS
2 - Using https://www.docker.com/products/docker-desktop
3 - Using the mariadb:latest Docker image
4 - Mapping the /var/lib/mysql directory from the container to a directory on the host (because I don't want to lose all my data when destroying the container with the docker-compose down command)

If you have a Windows machine, please try to reproduce it; it's easy:

Install Docker Desktop, create an empty folder, and put a docker-compose.yml file in it with this:

version: '3'
services:
  database_container:
    container_name: my_db
    image: mariadb:latest
    ports:
      - 3306:3306
    environment:
      MYSQL_DATABASE: mytestdb
      MYSQL_USER: usr_db
      MYSQL_PASSWORD: 123456
      MYSQL_ROOT_PASSWORD: 123456
    volumes:
      - ./db/persist:/var/lib/mysql

Open cmd or another terminal (I'm using Git Bash) in the same directory as docker-compose.yml and
run:

docker-compose up -d

Connect to the database, via a terminal or via some DB client (I'm using DBeaver), and
run:

CREATE TABLE NewTable (
	ID int primary key,
	name varchar(10)
)

then:

-- first rename ok
RENAME TABLE mytestdb.NewTable TO mytestdb.NewTablea;
 
-- second rename raises the error: Error on rename of './mytestdb/newtablea' to './mytestdb/newtableb' (errno: 194 "Tablespace is missing for a table")
RENAME TABLE mytestdb.NewTablea TO mytestdb.NewTableb;

That's why I opened the issue https://github.com/docker-library/mariadb/issues/331
on GitHub, but they told me that they can't reproduce it in a Linux environment
and told me to open an issue on https://github.com/docker/for-win/issues,
a Docker repo with 1.1K issues :O

@Daniel Black, the author of this issue, copied the GitHub issue and posted it here, trying to help us.

I kindly ask you to read the GitHub thread and its responses.
And feel free to contact us.

Comment by Vladislav Vaintroub [ 2020-11-11 ]

ambrosiora, I think that from a practical point of view, if you're running MariaDB in production, on Linux or not, it perhaps makes more sense to run MariaDB natively on Windows locally, rather than MySQL on Linux on Docker on Windows.
That does not invalidate the bug report, but running MySQL locally, just because it runs on Linux on Docker on Windows, will probably make a bigger difference from your production.

Comment by Rafael Ambrosio [ 2020-11-12 ]

Yes @Vladislav Vaintroub, you're right on this point, and in production I run MariaDB on AWS RDS, which is way more consistent.
As you say, it does not invalidate the bug, but it is good advice that I will take. Thanks!

Comment by Marko Mäkelä [ 2020-11-12 ]

wlad, I think that you are best equipped to diagnose this.

I tried to test with ciopfs on top of /dev/shm as well as on ext4, but the bootstrap failed as follows:

2020-11-12  8:03:09 0 [ERROR] InnoDB: Unable to lock ./ibdata1 error: 37
2020-11-12  8:03:09 0 [ERROR] InnoDB: Operating system error number 37 in a file operation.
2020-11-12  8:03:09 0 [ERROR] InnoDB: Error number 37 means 'No locks available'
2020-11-12  8:03:09 0 [Note] InnoDB: Some operating system error numbers are described at https://mariadb.com/kb/en/library/operating-system-error-codes/
2020-11-12  8:03:09 0 [ERROR] InnoDB: Cannot open datafile './ibdata1'

So, it looks like ciopfs has fallen victim to some bit-rot, because this definitely worked a few years ago. I patched out the file locking, and then the test passed:

diff --git a/mysys/my_lock.c b/mysys/my_lock.c
index 7597436f381..33fd5db4dc2 100644
--- a/mysys/my_lock.c
+++ b/mysys/my_lock.c
@@ -137,7 +137,7 @@ static int win_lock(File fd, int locktype, my_off_t start, my_off_t length,
 int my_lock(File fd, int locktype, my_off_t start, my_off_t length,
 	    myf MyFlags)
 {
-#ifdef HAVE_FCNTL
+#if 0
   int value;
   ALARM_VARIABLES;
 #endif
@@ -159,7 +159,7 @@ int my_lock(File fd, int locktype, my_off_t start, my_off_t length,
     if (win_lock(fd, locktype, start, length, timeout_sec) == 0)
       DBUG_RETURN(0);
   }
-#else
+#elif 0
 #if defined(HAVE_FCNTL)
   {
     struct flock lock;
diff --git a/storage/innobase/os/os0file.cc b/storage/innobase/os/os0file.cc
index 24134479d8e..da8560eb1f9 100644
--- a/storage/innobase/os/os0file.cc
+++ b/storage/innobase/os/os0file.cc
@@ -327,7 +327,7 @@ class SyncFileIO
 };
 
 #undef USE_FILE_LOCK
-#ifndef _WIN32
+#if 0 //ndef _WIN32
 /* On Windows, mandatory locking is used */
 # define USE_FILE_LOCK
 #endif
diff --git a/storage/maria/ma_control_file.c b/storage/maria/ma_control_file.c
index 65b8b0922aa..837ef4fbab6 100644
--- a/storage/maria/ma_control_file.c
+++ b/storage/maria/ma_control_file.c
@@ -226,7 +226,7 @@ static int lock_control_file(const char *name, my_bool do_retry)
     @todo BUG We should explore my_sopen(_SH_DENYWRD) to open or create the
     file under Windows.
   */
-#ifndef __WIN__
+#if 0
   uint retry= 0;
   uint retry_count= do_retry ? MARIA_MAX_CONTROL_FILE_LOCK_RETRY : 0;
 

The test that I used was as follows:

--source include/have_innodb.inc
CREATE DATABASE mytestdb;
CREATE TABLE mytestdb.NewTable (ID int primary key, name varchar(10))
ENGINE=InnoDB;
RENAME TABLE mytestdb.NewTable TO mytestdb.NewTablea;
RENAME TABLE mytestdb.NewTablea TO mytestdb.NewTableb;
DROP DATABASE mytestdb;

Comment by Vladislav Vaintroub [ 2020-11-12 ]

On WSL1, on NTFS partition there are no problems with the test. Server correctly determines lower_case

MariaDB [(none)]> set global innodb_file_per_table=1;
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [(none)]> CREATE DATABASE mytestdb;
Query OK, 1 row affected (0.005 sec)
 
MariaDB [(none)]> CREATE TABLE mytestdb.NewTable (ID int primary key, name varchar(10))
    -> ENGINE=InnoDB;
Query OK, 0 rows affected (0.082 sec)
 
MariaDB [(none)]> RENAME TABLE mytestdb.NewTable TO mytestdb.NewTablea;
Query OK, 0 rows affected (0.011 sec)
 
MariaDB [(none)]> RENAME TABLE mytestdb.NewTablea TO mytestdb.NewTableb;
Query OK, 0 rows affected (0.009 sec)
 
MariaDB [(none)]> show variables like 'lower_case%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| lower_case_file_system | ON    |
| lower_case_table_names | 2     |
+------------------------+-------+
2 rows in set (0.001 sec)

$ dir var/mysqld.1/data/mytestdb/
NewTableb.frm db.opt newtableb.ibd

I had to remove the "ignore world-writable my.cnf" bits from my_default.c to make this work, and also instruct the server to put the AF_UNIX socket on the "/home" filesystem.

Comment by Vladislav Vaintroub [ 2020-11-12 ]

I can reproduce on WSL2 though

Comment by Vladislav Vaintroub [ 2020-11-12 ]

I reproduced on WSL2.
A stat() call fails after the file is created via rename(), as can be seen in the attached strace.out.
The file is there; I can see it in Explorer. This seems to be a bug specific to the WSL2 filesystem. The failing stat() is just before "[ERROR" in strace.out, if you search for [ERROR. This stat() has no corresponding entry when I trace filesystem access on the Windows side (while other filesystem manipulations are visible and are executed from vp9fs.dll, which seems to be something WSL2-ish).

As for InnoDB, I am not sure why InnoDB is doing the stat() at all, if the only thing it needs to do is a rename().

ambrosiora, this error does not happen if you use innodb-file-per-table=0, in case this offers any consolation.

Comment by Vladislav Vaintroub [ 2020-11-12 ]

So, upon even further examination, I can see the issue in both WSL1 and WSL2. I also captured some data with procmon (ProcessMonitorLogfile.CSV), which corresponds exactly to the failing RENAME command.

By taking a closer look, there is something in procmon that corresponds to the failing stat().

from strace.out

 stat("./mytestdb/newtablea.ibd", 0x7f6e73163170) = -1 ENOENT (No such file or directory)

On the Windows side, in procmon, the old name is used.
From ProcessMonitorLogfile.CSV:

18:18:49,9451684,"DllHost.exe","2792","CreateFile","C:\work\data\mytestdb\newtable.ibd","NAME NOT FOUND","Desired Access: Read Attributes, Disposition: Open, Options: Open Reparse Point, Attributes: n/a, ShareMode: Read, Write, Delete, AllocationSize: n/a"

So, there is a mismatch: Linux calls stat() on the new name, ./mytestdb/newtablea.ibd, while Windows tries to open the old name of that file, C:\work\data\mytestdb\newtable.ibd, and bad things happen.
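
The sequence that fails in strace.out (a rename() followed by a stat() of the new name) can be sketched as below. This is a hypothetical Python analogue, not one of the attachments and not InnoDB code; the function name rename_then_stat and the placeholder file contents are assumptions. Python's os.rename() and os.stat() are thin wrappers over the corresponding system calls. On a healthy filesystem every stat succeeds; whether this script triggers the failure on an affected bind mount is not confirmed anywhere in this thread.

```python
import errno
import os
import tempfile


def rename_then_stat(directory, names):
    """Create names[0], rename it through each later name, stat after each step.

    Returns a list of (new_name, errno_or_None) tuples.
    """
    current = os.path.join(directory, names[0])
    with open(current, "wb") as f:
        f.write(b"\0" * 4096)          # placeholder content, not a real .ibd
    results = []
    for name in names[1:]:
        target = os.path.join(directory, name)
        os.rename(current, target)
        try:
            os.stat(target)            # the call that returned ENOENT in the trace
            results.append((name, None))
        except OSError as e:
            results.append((name, e.errno))
        current = target
    return results


if __name__ == "__main__":
    # Replace the temporary directory with a bind-mounted path to test a
    # possibly affected setup.
    with tempfile.TemporaryDirectory() as d:
        for name, err in rename_then_stat(
                d, ["newtable.ibd", "newtablea.ibd", "newtableb.ibd"]):
            status = "ok" if err is None else errno.errorcode.get(err, str(err))
            print(name, status)
```

Running it under strace, with procmon capturing on the Windows side at the same time, would allow the same comparison of file names as above.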

I'm not sure what more we could do here. danblack, if you're into filing external bugs maybe you can file one here https://github.com/microsoft/WSL

Comment by Daniel Black [ 2020-11-12 ]

You caught me out: I don't have a Windows environment available, and the WSL issue tracker is strict about inputs. ambrosiora, do you want to compile the rename.c attached here (`gcc -o rename rename.c; strace -o rename.strace -f ./rename`) and see whether, like the above, the strace shows stat returning ENOENT and the ProcessMonitor log shows the incorrect file access? If it follows that pattern, can you write a WSL issue?

Comment by Rafael Ambrosio [ 2020-11-13 ]

Sure I can try
I can't guarantee that I'll be able to do it myself ¯_(ツ)_/¯ but I think so. haha.

Comment by Rafael Ambrosio [ 2020-11-13 ]

Hey danblack, I did it.

strace is not returning ENOENT, but it seems that there is an exception. I added files.PNG just to show that test1 is there, and added rename.strace for analysis.

Comment by Rafael Ambrosio [ 2020-11-13 ]

Hey, I recently installed Docker Desktop and a message was displayed saying: "Docker is using WSL now, blah blah blah.. enable it, etc..."

Now, I just read that if I go to

C:\Users\<username>\AppData\Roaming\Docker\settings.json

and set wslEngineEnabled to false, I guess that Docker will run in full VM mode, just like before WSL.

So, I was wondering: what if I temporarily disable WSL and recreate the test case, just to see if the problem shows up?
What do you guys think?
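
The settings change described above could be scripted roughly as follows. This is a minimal sketch that assumes settings.json is a plain JSON object; the key name wslEngineEnabled and the file location come from the comment above, while the helper name set_wsl_engine is hypothetical. Back up the file before trying it.

```python
import json


def set_wsl_engine(settings_path, enabled):
    """Set the wslEngineEnabled key in a Docker Desktop settings.json file.

    All other keys are written back unchanged.
    """
    with open(settings_path, "r", encoding="utf-8") as f:
        settings = json.load(f)
    settings["wslEngineEnabled"] = enabled
    with open(settings_path, "w", encoding="utf-8") as f:
        json.dump(settings, f, indent=2)
    return settings
```

For example, set_wsl_engine(r"C:\Users\<username>\AppData\Roaming\Docker\settings.json", False), with <username> filled in, would request the non-WSL mode.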

Comment by Rafael Ambrosio [ 2020-11-13 ]

Hey guys, I don't know if this is good or bad news, but after disabling WSL and going back to Docker containers running with Hyper-V, the error doesn't come up.
I did the test just now, look (screenshot attached: WSL Disabled - Working with Hyper-V.png):

I guess WSL is far superior, so how can we deal with this, danblack and wlad?
I mean, should we do some other tests or try to do the "external bugs" thing (haha) mentioned above?

Comment by Daniel Black [ 2020-11-13 ]

The steps I quoted were for a WSL Debian or Ubuntu terminal. I think you managed a Windows compile; I'm not sure which strace you used, but it wasn't the Linux one. Can you try again? When you file the WSL bug report, point directly at Vlad's "I reproduced on WSL2." comment.

Yes, you have a workaround for Docker. My attempt at a WSL Linux install stopped with an error about an encrypted disk or disk space. Reporting bugs to the right location is the only way they get fixed.

Comment by Rafael Ambrosio [ 2020-11-13 ]

Hey danblack, you're right, I did it on Windows. Now I have set up WSL on Windows like this: https://www.youtube.com/watch?v=epZOKY83t8g
and tried again.

Here are the results: rename-wsl.strace

Tell me if I did it right this time.

Comment by Daniel Black [ 2020-11-16 ]

ambrosiora did it right; however, something I did while minimising the test case into rename.c dropped whatever caused the bug. If you could follow the WSL bug report instructions to the letter, include your original problem, and point at Vlad's traces, that is hopefully sufficient for them to fix it.

Comment by Marko Mäkelä [ 2020-11-17 ]

wlad, it seems to me that the questionable stat() call is in the function dict_table_rename_in_cache(), invoked via the following:

		/* Delete any temp file hanging around. */
		if (os_file_status(filepath, &exists, &ftype)
		    && exists
		    && !os_file_delete_if_exists(innodb_temp_file_key,
						 filepath, NULL)) {
 
			ib::info() << "Delete of " << filepath << " failed.";
		}

I cannot entirely confirm this, because the message in strace.out is seriously truncated:

684   stat("./mytestdb/newtablea.ibd", 0x7f6e73163170) = -1 ENOENT (No such file or directory)
684   write(2, "2020-11-12 17:22:51 20 [ERROR] I"..., 150) = 150

Which message was actually written to the error log?
By the way, the above code path is handling a case where table->space==nullptr. That would suggest that something went wrong already earlier.

Comment by Vladislav Vaintroub [ 2020-11-17 ]

2020-11-17 19:43:35 8 [ERROR] InnoDB: Cannot rename './mytestdb/newtablea.ibd' to './mytestdb/newtableb.ibd' because the source file does not exist
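
The message above reports a missing source file rather than a failed rename(), which is consistent with an existence check running before the rename. A minimal sketch of that pattern follows, assuming a check-then-rename sequence; the function names are hypothetical and this is not the InnoDB implementation. The `exists` parameter is injectable so that a filesystem wrongly answering "no such file" can be simulated: in that case the operation is abandoned even though the file is present and the rename itself might have succeeded.

```python
import os


def rename_with_precheck(src, dst, exists=os.path.exists):
    """Check-then-rename: give up if the existence check says src is missing.

    Returns None on success, or an error message string.
    """
    if not exists(src):
        return ("Cannot rename '%s' to '%s' because the source file "
                "does not exist" % (src, dst))
    os.rename(src, dst)
    return None


def rename_direct(src, dst):
    """Rename without a prior check; report whatever rename() itself says."""
    try:
        os.rename(src, dst)
        return None
    except OSError as e:
        return "Cannot rename '%s' to '%s': %s" % (src, dst, e.strerror)
```

Calling rename_with_precheck(src, dst, exists=lambda p: False) on an existing file returns the error message and leaves the file untouched, while rename_direct() on the same file succeeds.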

Comment by Daniel Black [ 2022-07-25 ]

ambrosiora sorry for taking so long.

I retested on Windows 10 - 19044.1826 and was unable to reproduce this.

I also tested on Windows 11 22000.795 and couldn't reproduce it there either.

Can you please retest to see whether your Windows updates resolve this?

Comment by Daniel Black [ 2022-07-25 ]

Closing as not a bug, but really not our bug.

If this is still an issue, let's continue the discussion in MDEV-27580.

I prepared a workaround for 10.3 that is available as quay.io/mariadb-foundation/mariadb-devel:10.3-mdev-29015-avoid-wsl8443. If the Windows update resolves this I'd rather leave the older code alone.

Comment by Rafael Ambrosio [ 2022-07-25 ]

Hi! Unfortunately, I don't have the environment anymore, nor the laptop. Since I started having slowdown problems working with Docker on WSL, I decided to migrate to a real Linux environment. Sorry, but thanks for all the effort!

Comment by Andreas Tschersich [ 2022-12-12 ]

The error is back with Windows 10 19044.2311.

Comment by Vladislav Vaintroub [ 2022-12-12 ]

If so, maybe danblack can reopen https://github.com/microsoft/WSL/issues/8443. It still remains a "not MariaDB bug".

Comment by Daniel Seiler [ 2024-01-27 ]

We had a user reproduce this issue on Win10 Pro 22H2, Docker Desktop + WSL2, MariaDB 1:11.2.2+maria~ubu2204; that is, a migration failed with

pymysql.err.OperationalError: (1025, 'Error on rename of \'./darkflame/accounts\' to \'./darkflame/#sql-backup-1-20\' (errno: 194 "Tablespace is missing for a table")')

with a MariaDB log of

[ERROR] InnoDB: Cannot rename './darkflame/accounts.ibd' to './darkflame/#sql-backup-1-20.ibd' because the source file does not exist.

I'm commenting on it because the user reported that the reproducer in https://github.com/microsoft/WSL/issues/8443#issuecomment-1836812926 did not print any sort of "error on fstat errno 2". So while I think it's still possible that MariaDB triggers some weird behaviour in 9P on Windows 10, it may be too quick to dismiss it as WSL#8443 and "not a MariaDB bug".

Comment by Daniel Black [ 2024-01-29 ]

Hi Xiphoseer,

Sorry to hear we've still got a new form of this. To validate the system calls used:

  • Install strace in the container (new container with "FROM mariadb:11.2.2\nRUN apt-get update && apt-get install -y strace").
  • After initializing some data and before the rename, start the container with --user mysql --entrypoint "strace -s 99" --command mariadbd, then perform the rename. As wlad did above, use ProcessMonitor to capture the rename at the same time.
  • Yes, I'm the user who posted the broken reproducer. I was missing something while doing the tests: the error didn't show up even though the reproducer looks exactly like the strace conditions, nor did it show up under the basic rename. If you can create a C file and a reproducer image that works in your environment, that would be much appreciated.

    Dockerfile

    ARG VERSION=22.04
    FROM ubuntu:$VERSION
     
    COPY wsl8443.c .
     
    RUN apt-get update && apt-get install -y gcc strace && rm -rf /var/lib/apt/lists/*
     
    RUN gcc -o wsl8443 wsl8443.c
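
Once a trace has been captured, the failing calls can be picked out mechanically. The following is a minimal sketch, assuming strace's default text output with optional pid prefixes as in the strace.out excerpts earlier in this issue; the helper name failed_path_calls and the default list of system calls are hypothetical choices, not part of any tool.

```python
import re

# Matches lines such as:
# 684   stat("./mytestdb/newtablea.ibd", 0x7f6e73163170) = -1 ENOENT (No such file or directory)
FAILED_CALL = re.compile(
    r'^\s*(?:(?P<pid>\d+)\s+)?(?P<call>\w+)\([^"]*"(?P<path>[^"]*)".*\)'
    r'\s+=\s+-1\s+(?P<err>E[A-Z0-9]+)'
)


def failed_path_calls(lines, calls=("stat", "lstat", "newfstatat"), err="ENOENT"):
    """Return (pid, call, path) for each listed call that failed with `err`.

    pid is None when the line has no pid prefix; path is the first quoted
    argument of the call.
    """
    hits = []
    for line in lines:
        m = FAILED_CALL.match(line)
        if m and m.group("call") in calls and m.group("err") == err:
            hits.append((m.group("pid"), m.group("call"), m.group("path")))
    return hits
```

Passing the lines of a captured trace file to failed_path_calls() lists the paths to look for in the matching ProcessMonitor capture.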
    

Generated at Thu Feb 08 09:28:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.