|
The single command in MDEV-26381_2.sql can be used to start generating the "Cannot close file ... pending operations and pending fsync" messages. It happens consistently after approximately 30-35 seconds; the query takes over a minute even on a fast machine.
|
|
I tried repeating this on 10.5 and 10.6 both with Debug and RelWithDebInfo, to no avail. My invocation was like this:
./mtr --parallel=auto --repeat=100 innodb.MDEV-26381_2{,,,,,,,,,,,,,,,,,,,,,,,}
|
This was like MDEV-26381_2.sql, but with the following added around the file:
--source include/have_innodb.inc
--source include/have_partition.inc
…
DROP TABLE t2;
|
Please provide an rr replay trace of one failure.
|
|
I think that my storage (NVMe) is too fast. I realized that my initial attempt was on /dev/shm. But even after symlinking mysql-test/var to something real and after disabling true asynchronous I/O, the test refuses to fail:
./mtr --repeat=100 --mysqld=--skip-innodb-use-native-aio innodb.MDEV-26381_2
|
Each run takes only 8.5 seconds. If I start 14 tests in parallel, then the time of one concurrent test will double to 16 seconds and the storage will write 600 to 750 MB/s, but I get no crashes. If I let the server use io_uring, the time for one test still remains around 16 seconds.
|
|
marko,
I hope you take into account that the MTR test will never actually fail due to "Cannot close file ... pending operations and pending fsync" – it is a note, not a warning, so MTR pays no attention to it when it appears in the error log.
|
|
Roel, I am still waiting for an rr replay trace of the crash. Please note that the message rate was recently reduced in the fix of MDEV-25215. I do not think that it should affect the assertion failure.
|
|
Observed in a 10.7 debug build from 9 Oct 21, running 200 threads of MDEV-26381_1.sql with drop/create/use of the test database added twice (added as MDEV-26381_3.sql):
10.7.1 5cc9cf9a05582307abefd8f3b57548945da86b1a (Debug)
2021-10-14 9:32:33 0 [Note] /test/MD091021-mariadb-10.7.1-linux-x86_64-dbg/bin/mysqld: ready for connections.
Version: '10.7.1-MariaDB-debug-log' socket: '/test/MD091021-mariadb-10.7.1-linux-x86_64-dbg/socket.sock' port: 42436 MariaDB Server
2021-10-14 9:32:46 6 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 9:32:51 6 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
...repeated...
2021-10-14 9:33:29 6 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 9:34:11 13 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 9:34:21 13 [Note] InnoDB: Cannot close file ./test/t2#P#p59.ibd because of pending fsync
2021-10-14 9:34:26 13 [Note] InnoDB: Cannot close file ./mysql/innodb_table_stats.ibd because of 1 pending operations
2021-10-14 9:34:32 13 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
...repeated...
2021-10-14 9:34:54 13 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 9:35:01 13 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 9:38:11 839 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of 1 pending operations and pending fsync
2021-10-14 9:38:16 839 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 9:38:21 839 [Note] InnoDB: Cannot close file ./mysql/innodb_table_stats.ibd because of pending fsync
2021-10-14 9:39:17 1287 [Note] InnoDB: Cannot close file ./test/t2#P#p126.ibd because of pending fsync
2021-10-14 9:39:22 1287 [Note] InnoDB: Cannot close file ./test/t2#P#p886.ibd because of pending fsync
2021-10-14 9:39:27 1287 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 9:39:33 1287 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
...repeated...
2021-10-14 9:40:01 1287 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 9:40:39 1619 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 9:40:49 1619 [Note] InnoDB: Cannot close file ./test/t2#P#p111.ibd because of pending fsync
2021-10-14 9:40:54 1619 [Note] InnoDB: Cannot close file ./test/t2#P#p1022.ibd because of pending fsync
2021-10-14 9:40:59 1619 [Note] InnoDB: Cannot close file ./test/t2#P#p1806.ibd because of pending fsync
...
What is also interesting is that the way the CLI output renders differs over time; even though this is running 200 threads at the same time, it is clear that threads/situations are affecting each other and the output changes over time, for example long periods of ERROR 1050 (42S01) at line x: Table 't2' already exists versus a burst period of ERROR 1146 (42S02) at line x: Table 'test.t2' doesn't exist and ERROR 1146 (42S02) at line x: Table 'test.t1' doesn't exist. It seems this simple test setup is able to stress all possible interaction patterns that exist in InnoDB in this area.
|
|
Also ran into this with an earlier, longer version of the testcase. Added info there.
|
|
Observed in a 10.7 trunk build as of today (14 Oct 21) that at least the messages can be reproduced easily with this SQL (abbreviated here), executed in random order in many (for example 1300) threads simultaneously.
DROP TABLE IF EXISTS t2;
DROP TABLE IF EXISTS t1;
INSERT INTO t1 VALUES (0);
INSERT INTO t2 VALUES (0);
SELECT * FROM t1;
SELECT * FROM t2;
CREATE TABLE t1 (a INT(11) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY LIST (a)( PARTITION p0 VALUES IN (0,1,2,3) ENGINE = InnoDB, PARTITION p1 VALUES IN (4,5,6,7) ENGINE = InnoDB, ... ... ... .... .... PARTITION p8190 VALUES IN (32760,32761,32762,32763) ENGINE = InnoDB, PARTITION p8191 VALUES IN (32764,32765,32766,32767) ENGINE = InnoDB);
CREATE TABLE t2 (a INT(11) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY LIST (a)( PARTITION p0 VALUES IN (0,1,2,3) ENGINE = InnoDB, PARTITION p1 VALUES IN (4,5,6,7) ENGINE = InnoDB, ... ... ... .... .... PARTITION p8190 VALUES IN (32760,32761,32762,32763) ENGINE = InnoDB, PARTITION p8191 VALUES IN (32764,32765,32766,32767) ENGINE = InnoDB);
And this with no extra mysqld options added. The full SQL is attached as MDEV-26381_4.sql
10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Debug)
2021-10-14 10:22:08 0 [Note] /test/MD141021-mariadb-10.7.1-linux-x86_64-dbg/bin/mysqld: ready for connections.
Version: '10.7.1-MariaDB-debug' socket: '/test/MD141021-mariadb-10.7.1-linux-x86_64-dbg/socket.sock' port: 12249 MariaDB Server
2021-10-14 10:22:35 11 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:22:42 5 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
...repeated...
2021-10-14 10:24:19 5 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:24:29 11 [Note] InnoDB: Cannot close file ./mysql/innodb_table_stats.ibd because of pending fsync
2021-10-14 10:27:06 23 [Note] InnoDB: Cannot close file ./test/t1#P#p74.ibd because of pending fsync
2021-10-14 10:27:13 23 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:27:21 19 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
...repeated...
2021-10-14 10:28:52 19 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:29:39 19 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:31:45 29 [Note] InnoDB: Cannot close file ./test/t1#P#p0.ibd because of pending fsync
2021-10-14 10:32:05 29 [Note] InnoDB: Cannot close file ./mysql/innodb_table_stats.ibd because of pending fsync
2021-10-14 10:32:10 32 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
...
2021-10-14 10:35:38 20 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:36:08 42 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:36:29 42 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of 1 pending operations and pending fsync
2021-10-14 10:36:40 42 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of 1 pending operations and pending fsync
2021-10-14 10:36:50 42 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:36:58 42 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
...
2021-10-14 10:38:57 9 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
2021-10-14 10:39:04 9 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of 2 pending operations and pending fsync
2021-10-14 10:39:12 9 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of pending fsync
Indeed the verbosity of the messages seems reduced here, but they still do appear. This likely also means that the crash (which was already highly sporadic) is now very hard to produce.
|
|
Note to self/others: whilst the messages show with MDEV-26381_4.sql, the assert may depend on the mysqld options.
|
|
10000 threads of MDEV-26381_OTHER_1.sql (a simplified SQL file with fewer partitions) does not produce the "Cannot close file" messages, but does produce these automatically (with no extra mysqld options set):
10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Debug)
2021-10-14 10:47:38 0 [Note] /test/MD141021-mariadb-10.7.1-linux-x86_64-dbg/bin/mysqld: ready for connections.
Version: '10.7.1-MariaDB-debug-log' socket: '/test/MD141021-mariadb-10.7.1-linux-x86_64-dbg/socket.sock' port: 12249 MariaDB Server
2021-10-14 10:47:41 508 [Note] Detected table cache mutex contention at instance 1: 21% waits. Additional table cache instance activated. Number of instances after activation: 2.
2021-10-14 10:47:42 689 [Note] Detected table cache mutex contention at instance 2: 28% waits. Additional table cache instance activated. Number of instances after activation: 3.
2021-10-14 10:47:58 2709 [Note] InnoDB: Number of pools: 2
2021-10-14 10:48:28 6327 [Note] InnoDB: Number of pools: 3
2021-10-14 10:49:28 13193 [Note] InnoDB: Number of pools: 4
2021-10-14 10:50:18 19404 [Note] InnoDB: Number of pools: 5
With larger versions of the testcase (many partitions/tables), MDEV-26381_OTHER_2.sql, this quickly ramps up to 11 pools and then stabilizes. Smaller versions stabilize around 4-5 pools.
|
|
Here are some seemingly interesting observations on the two testcases from the last comment:
- Neither testcase produces the "Cannot close file" messages. This is very odd at 10k threads, especially for MDEV-26381_OTHER_2.sql, which has the CREATE TABLEs with the large number of partitions (this time in 5 tables, all with the same definition as the one which produces the messages, i.e. the same as MDEV-26381_4.sql).
- After some time, both servers (running one testcase each) hang/lock up in 'Commit implicit' (the server/OS is still normally responsive, the disk is not full, and there is plenty of free memory):
- MDEV-26381_OTHER_1.sql running against debug with no extra mysqld options.
10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Debug)
+-------+------+-----------+------+---------+------+---------------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+-------+------+-----------+------+---------+------+---------------------------------+------------------------------------------------------------------------------------------------------+----------+
| 60 | root | localhost | test | Query | 1735 | Opening tables | INSERT INTO t1 VALUES (0) | 0.000 |
| 62 | root | localhost | test | Query | 1745 | Opening tables | CREATE TABLE t1 (a INT(11) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY LIST (a)( | 0.000 |
| 64 | root | localhost | test | Query | 1746 | Opening tables | CREATE TABLE t1 (a INT(11) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY LIST (a)( | 0.000 |
| 74 | root | localhost | test | Query | 1735 | Opening tables | SELECT * FROM t1 | 0.000 |
...
| 19735 | root | localhost | test | Query | 1589 | checking permissions | DROP TABLE IF EXISTS t1 | 0.000 |
...
| 20004 | root | localhost | test | Query | 1491 | Opening tables | INSERT INTO t1 VALUES (9) | 0.000 |
+-------+------+-----------+------+---------+------+---------------------------------+------------------------------------------------------------------------------------------------------+----------+
Most threads are in 'Opening tables' or 'checking permissions', as per the snapshot above, except these:
10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Debug)
| 1721 | root | localhost | test | Query | 1736 | Commit implicit | DROP TABLE IF EXISTS t1 | 0.000 |
| 8921 | root | localhost | test | Query | 1709 | Waiting for table metadata lock | CREATE TABLE t1 (a INT(11) DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=latin1 PARTITION BY LIST (a)( | 0.000 |
| 8927 | root | localhost | test | Query | 1709 | Waiting for table metadata lock | INSERT INTO t1 VALUES (4) | 0.000 |
| 15655 | root | localhost | test | Query | 1654 | Waiting for table metadata lock | INSERT INTO t1 VALUES (0) | 0.000 |
| 15659 | root | localhost | test | Query | 1654 | Waiting for table metadata lock | INSERT INTO t1 VALUES (1) | 0.000 |
| 15665 | root | localhost | test | Query | 1654 | Waiting for table metadata lock | INSERT INTO t1 VALUES (4) | 0.000 |
| 15669 | root | localhost | test | Query | 1654 | Waiting for table metadata lock | INSERT INTO t1 VALUES (23) | 0.000 |
| 20008 | root | localhost | test | Query | 0 | starting | show processlist | 0.000 |
Interestingly, there are some threads older than the 'Commit implicit' one.
It is the same on optimized:
- MDEV-26381_OTHER_2.sql running against optimized with all mysqld options from the original description.
Excluding all 'Opening tables' and 'checking permissions' threads, we have:
10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Optimized)
+-------+------+-----------+------+---------+------+----------------------+---------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+-------+------+-----------+------+---------+------+----------------------+---------------------------+----------+
| 5063 | root | localhost | test | Query | 1497 | Commit implicit | DROP TABLE IF EXISTS t1 | 0.000 |
| 20007 | root | localhost | test | Query | 0 | starting | show processlist | 0.000 |
+-------+------+-----------+------+---------+------+----------------------+---------------------------+----------+
|
- Added hang tag for the hangs in Commit implicit. It remains to be seen whether the issues are related.
- Added MDEV-26381_OTHER_1.threads.dbg.log and MDEV-26381_OTHER_2.threads.opt.log showing full SHOW PROCESSLIST outputs during the hangs.
- After a long time, the debug server partially unlocked (more statements processing/providing output when they failed), whilst part of the server remains locked. This seemed quite odd. Updated processlist at that point in MDEV-26381_OTHER_1.threads_partial_unlock.dbg.log. This time many threads were in 'Closing tables' and no threads in 'Commit explicit'. The optimized server remains locked in the same way. The debug server possibly became partially unlocked due to an event like this:
10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Debug)
2021-10-14 11:41:03 11885 [Note] Detected table cache mutex contention at instance 3: 28% waits. Additional table cache instance activated. Number of instances after activation: 4.
2021-10-14 11:41:03 14948 [Note] Detected table cache mutex contention at instance 1: 24% waits. Additional table cache instance activated. Number of instances after activation: 5.
2021-10-14 11:41:03 7467 [Note] Detected table cache mutex contention at instance 3: 30% waits. Additional table cache instance activated. Number of instances after activation: 6.
2021-10-14 11:41:03 11019 [Note] Detected table cache mutex contention at instance 4: 30% waits. Additional table cache instance activated. Number of instances after activation: 7.
2021-10-14 11:44:20 15395 [Note] InnoDB: Number of pools: 6
- The optimized server remained at the last error log message and the Commit implicit hang before it was terminated (45 minutes after that last message):
10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Optimized)
2021-10-14 11:05:03 19264 [Note] InnoDB: Number of pools: 11

10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Optimized)
| 5063 | root | localhost | test | Query | 2751 | Commit implicit | DROP TABLE IF EXISTS t1 | 0.000 |
- A smaller number of threads with a large number of partitions produces the "Cannot close file" messages more easily. A large number of threads, irrespective of the number of partitions (two dozen versus many), can create the hangs. Setups as per the above comments.
- The partial debug server lock never cleared, even though new transactions are going through.
10.7.1 b4911f5a34f8dcfb642c6f14535bc9d5d97ade44 (Debug)
| 14449 | root | localhost | test | Query | 3630 | closing tables | INSERT INTO t1 VALUES (0)
|
|
In spite of many hours of trying, both today and earlier, the assert was never seen again, not even in normal runs. An rr trace thus regrettably cannot be provided. Debugging the messages and the hang above may give further clues as to the remaining race conditions and issues. Thanks
|
|
Roel, thank you for your efforts. Messages of the following type are not issued by InnoDB. I have occasionally noticed them as well when running load tests.
2021-10-14 11:41:03 11885 [Note] Detected table cache mutex contention at instance 3: 28% waits. Additional table cache instance activated. Number of instances after activation: 4.
|
Messages like the following refer to the InnoDB trx_pools, which are used for allocating transaction objects:
2021-10-14 11:05:03 19264 [Note] InnoDB: Number of pools: 11
|
In MDEV-25215, the string "0 pending operations" was removed from the following type of messages, and rate-limiting was implemented.
2021-10-14 10:32:05 29 [Note] InnoDB: Cannot close file ./mysql/innodb_table_stats.ibd because of pending fsync
|
Those messages are not entirely harmless, because they are a sign that the configured limit on the maximum number of open files may be exceeded.
Roel, it is unclear to me whether the originally reported crash is reproducible. Maybe you should try ulimit -n 100 or some other small number in the shell where you start mariadbd? If creating an rr replay trace is not possible for this crash, I would be happy with a core dump on some computer that I can access.
|
|
Based on some analysis of a core dump, the reason for the assertion failure could be that the fstat system call in the following failed with ENOMEM (out of memory in the kernel):
os_offset_t
os_file_get_size(os_file_t file)
{
	struct stat statbuf;
	return fstat(file, &statbuf) ? os_offset_t(-1) : statbuf.st_size;
}
|
I was suspecting that there could have been a race condition that would have allowed another thread to close the file handle.
I do not think that this failure is any recent regression. InnoDB is notoriously unforgiving in this respect. Also memory allocation in the user address space is basically always expected to succeed.
Roel, the hangs that you observed are not directly related to the originally reported crash. Please file a separate ticket for the hangs, with some stack traces. The culprit for those could lie outside of InnoDB.
|
|
Thank you marko!
|
|
To clarify: once this bug is split, the hangs will be handled in the split-off bug MDEV-26873, and the ENOMEM (out of kernel memory) failure will be fixed in this bug.
|
|
Here is an untested patch for the 10.2 series:
diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
index 8075defac4c..a41d1c8b712 100644
--- a/storage/innobase/fil/fil0fil.cc
+++ b/storage/innobase/fil/fil0fil.cc
@@ -1003,7 +1003,9 @@ fil_space_extend_must_retry(
 				to determine how much we were able to
 				extend it */
 		os_offset_t fsize = os_file_get_size(node->handle);
-		ut_a(fsize != os_offset_t(-1));
+		if (fsize == os_offset_t(-1)) {
+			return false;
+		}
 
 		last_page_no = ulint(fsize / page_size)
 			+ file_start_page_no;
I think that we need DBUG_EXECUTE_IF fault injection and a test case that covers this. It remains to be seen whether the callers are really ready to deal with the failure to extend a file.
|
|
Roel, did the patch work for you?
|
|
marko The issue was seen with very low frequency before, and I have not seen it in recent runs. I have also not observed the 'because of pending fsync' messages in recent runs. In other words: I don't know. Would it be possible to implement the patch above as-is?
|
|
With which Linux kernel version and file system did you reproduce this? I just checked fs/stat.c and ext4_getattr() in fs/ext4/inode.c in a Linux 5.17 release candidate, and I did not immediately see where they would return ENOMEM.
|
|
Filesystem: ext4.
Kernel: it would have been the latest "sudo apt update/upgrade" kernel (or the one before that) at the time of logging this bug (17 Aug 21), on a Google Cloud instance running Ubuntu Focal 20.04.4 LTS, which is currently running kernel 5.11.0-1028-gcp.
|
|
A new testcase, which seemingly consistently reproduces the "because of pending fsync" messages on 10.5 to 10.9, and additional info, were added in MDEV-25215.
|
|
Is this still reproducible?
|
|
We just stumbled over this issue today with MariaDB 10.7.6, and also with the 10.8 and 10.9 releases.
We tracked it down to the following setup:
- MariaDB running as a Docker container on WSL2 with Docker for Windows
- Configured with a Docker volume mount `C:\dev\local-project-temp-dir\mysql:/var/lib/mysql:rw`
- Running a bunch of SQL migrations with Flyway in a Spring Boot application
- ALTER, MODIFY or DROP statements fail (causing a deadlock situation) in a migration (let's call it "v3") if the table these statements alter was created by a previous migration (say "v1") in the same Flyway migration run.
- The migration succeeds, however, if the table in question wasn't created in the same Flyway migration run.
- If the local volume is located inside the WSL, this problem doesn't occur.
- If a Docker volume (created by `docker volume create xxx`) is used instead, the scripts run just fine.
This is the log file from the MariaDB container:
2022-09-28T08:10:57.597081197Z 2022-09-28 10:10:57+02:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.7.4+maria~focal started.
2022-09-28T08:10:57.991234750Z 2022-09-28 10:10:57+02:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2022-09-28T08:10:57.996564582Z 2022-09-28 10:10:57+02:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:10.7.4+maria~focal started.
2022-09-28T08:10:58.219679191Z 2022-09-28 10:10:58+02:00 [Note] [Entrypoint]: MariaDB upgrade not required
2022-09-28T08:10:58.255829792Z 2022-09-28 10:10:58 0 [Note] mariadbd (server 10.7.4-MariaDB-1:10.7.4+maria~focal) starting as process 1 ...
2022-09-28T08:10:58.261891906Z 2022-09-28 10:10:58 0 [Warning] Setting lower_case_table_names=2 because file system for /var/lib/mysql/ is case insensitive
2022-09-28T08:10:58.282132535Z 2022-09-28 10:10:58 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-09-28T08:10:58.283357965Z 2022-09-28 10:10:58 0 [Note] InnoDB: Number of transaction pools: 1
2022-09-28T08:10:58.283411465Z 2022-09-28 10:10:58 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-09-28T08:10:58.283418859Z 2022-09-28 10:10:58 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)
2022-09-28T08:10:58.395555300Z 2022-09-28 10:10:58 0 [Note] InnoDB: Using Linux native AIO
2022-09-28T08:10:58.395729241Z 2022-09-28 10:10:58 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk size = 134217728
2022-09-28T08:10:58.396459670Z 2022-09-28 10:10:58 0 [Note] InnoDB: Completed initialization of buffer pool
2022-09-28T08:10:58.446937846Z 2022-09-28 10:10:58 0 [Note] InnoDB: 128 rollback segments are active.
2022-09-28T08:10:58.519288724Z 2022-09-28 10:10:58 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-09-28T08:10:58.521303669Z 2022-09-28 10:10:58 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-09-28T08:10:58.963321691Z 2022-09-28 10:10:58 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-09-28T08:10:58.968652765Z 2022-09-28 10:10:58 0 [Note] InnoDB: 10.7.4 started; log sequence number 159937042; transaction id 186163
2022-09-28T08:10:58.968749185Z 2022-09-28 10:10:58 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-09-28T08:10:58.971548383Z 2022-09-28 10:10:58 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2022-09-28T08:10:58.983788282Z 2022-09-28 10:10:58 0 [Warning] You need to use --log-bin to make --expire-logs-days or --binlog-expire-logs-seconds work.
2022-09-28T08:10:58.991933171Z 2022-09-28 10:10:58 0 [Note] Server socket created on IP: '0.0.0.0'.
2022-09-28T08:10:58.992049779Z 2022-09-28 10:10:58 0 [Note] Server socket created on IP: '::'.
2022-09-28T08:10:59.055339824Z 2022-09-28 10:10:59 0 [Note] InnoDB: Buffer pool(s) load completed at 220928 10:10:59
2022-09-28T08:10:59.086882855Z 2022-09-28 10:10:59 0 [Note] mariadbd: ready for connections.
2022-09-28T08:10:59.086951574Z Version: '10.7.4-MariaDB-1:10.7.4+maria~focal' socket: '/run/mysqld/mysqld.sock' port: 3306 mariadb.org binary distribution
2022-09-28T08:12:17.779003626Z 2022-09-28 10:12:17 17 [ERROR] InnoDB: preallocating 147456 bytes for file ./mpn4/track.ibd failed with error 2
2022-09-28T08:12:17.779102832Z 2022-09-28 10:12:17 0x7fa520494700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.7.4/storage/innobase/fil/fil0fil.cc line 610
2022-09-28T08:12:17.779121587Z InnoDB: Failing assertion: fsize != os_offset_t(-1)
2022-09-28T08:12:17.779127719Z InnoDB: We intentionally generate a memory trap.
2022-09-28T08:12:17.779132437Z InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
2022-09-28T08:12:17.779137337Z InnoDB: If you get repeated assertion failures or crashes, even
2022-09-28T08:12:17.779141985Z InnoDB: immediately after the mariadbd startup, there may be
2022-09-28T08:12:17.779146574Z InnoDB: corruption in the InnoDB tablespace. Please refer to
2022-09-28T08:12:17.779151012Z InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
2022-09-28T08:12:17.779155531Z InnoDB: about forcing recovery.
2022-09-28T08:12:17.779159899Z 220928 10:12:17 [ERROR] mysqld got signal 6 ;
2022-09-28T08:12:17.779164397Z This could be because you hit a bug. It is also possible that this binary
2022-09-28T08:12:17.779180347Z or one of the libraries it was linked against is corrupt, improperly built,
2022-09-28T08:12:17.779184355Z or misconfigured. This error can also be caused by malfunctioning hardware.
2022-09-28T08:12:17.779188142Z
2022-09-28T08:12:17.779191719Z To report this bug, see https://mariadb.com/kb/en/reporting-bugs
2022-09-28T08:12:17.779195586Z
2022-09-28T08:12:17.779199263Z We will try our best to scrape up some info that will hopefully help
2022-09-28T08:12:17.779203380Z diagnose the problem, but since we have already crashed,
2022-09-28T08:12:17.779238897Z something is definitely wrong and this may fail.
2022-09-28T08:12:17.779243866Z
2022-09-28T08:12:17.779247924Z Server version: 10.7.4-MariaDB-1:10.7.4+maria~focal
2022-09-28T08:12:17.779251972Z key_buffer_size=134217728
2022-09-28T08:12:17.779255859Z read_buffer_size=131072
2022-09-28T08:12:17.779259616Z max_used_connections=11
2022-09-28T08:12:17.779263343Z max_threads=153
2022-09-28T08:12:17.779267100Z thread_count=11
2022-09-28T08:12:17.779270977Z It is possible that mysqld could use up to
2022-09-28T08:12:17.779274874Z key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467995 K bytes of memory
2022-09-28T08:12:17.779278852Z Hope that's ok; if not, decrease some variables in the equation.
2022-09-28T08:12:17.779282729Z
2022-09-28T08:12:17.779286416Z Thread pointer: 0x7fa4d4000c58
2022-09-28T08:12:17.779290243Z Attempting backtrace. You can use the following information to find out
2022-09-28T08:12:17.779294171Z where mysqld died. If you see no messages after this, something went
2022-09-28T08:12:17.779298118Z terribly wrong...
2022-09-28T08:12:17.779303017Z stack_bottom = 0x7fa520493d98 thread_stack 0x49000
2022-09-28T08:12:17.781611209Z Printing to addr2line failed
2022-09-28T08:12:17.781784559Z mariadbd(my_print_stacktrace+0x32)[0x561b6b988e52]
2022-09-28T08:12:17.782347920Z mariadbd(handle_fatal_signal+0x485)[0x561b6b4632b5]
2022-09-28T08:12:17.784873369Z /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7fa535db1420]
2022-09-28T08:12:17.786797421Z /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fa5358b500b]
2022-09-28T08:12:17.786866150Z /lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7fa535894859]
2022-09-28T08:12:17.788989976Z mariadbd(+0x6721e8)[0x561b6b0ab1e8]
2022-09-28T08:12:17.789205635Z mariadbd(+0x6825bb)[0x561b6b0bb5bb]
2022-09-28T08:12:17.789539300Z mariadbd(+0x682863)[0x561b6b0bb863]
2022-09-28T08:12:17.789862061Z mariadbd(+0x68583d)[0x561b6b0be83d]
2022-09-28T08:12:17.790313677Z mariadbd(+0xe34f06)[0x561b6b86df06]
2022-09-28T08:12:17.790827351Z mariadbd(+0xd2ec09)[0x561b6b767c09]
2022-09-28T08:12:17.791293038Z mariadbd(+0xd2225b)[0x561b6b75b25b]
2022-09-28T08:12:17.791670050Z mariadbd(+0x8966ee)[0x561b6b2cf6ee]
2022-09-28T08:12:17.792054371Z mariadbd(_Z17mysql_alter_tableP3THDPK25st_mysql_const_lex_stringS3_P14HA_CREATE_INFOP10TABLE_LISTP10Alter_infojP8st_orderbb+0x4aad)[0x561b6b2ddf6d]
2022-09-28T08:12:17.792456985Z mariadbd(_ZN19Sql_cmd_alter_table7executeEP3THD+0x3c7)[0x561b6b347617]
2022-09-28T08:12:17.792872695Z mariadbd(_Z21mysql_execute_commandP3THDb+0x110e)[0x561b6b22435e]
2022-09-28T08:12:17.793232864Z mariadbd(_Z11mysql_parseP3THDPcjP12Parser_state+0x1e7)[0x561b6b213f07]
2022-09-28T08:12:17.793580742Z mariadbd(_Z16dispatch_command19enum_server_commandP3THDPcjb+0x1095)[0x561b6b2201e5]
2022-09-28T08:12:17.793955719Z mariadbd(_Z10do_commandP3THDb+0x138)[0x561b6b222428]
2022-09-28T08:12:17.794383056Z mariadbd(_Z24do_handle_one_connectionP7CONNECTb+0x3b7)[0x561b6b342637]
2022-09-28T08:12:17.794778822Z mariadbd(handle_one_connection+0x5d)[0x561b6b34298d]
2022-09-28T08:12:17.795246299Z mariadbd(+0xc50556)[0x561b6b689556]
2022-09-28T08:12:17.797282040Z /lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7fa535da5609]
2022-09-28T08:12:17.799272466Z /lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fa535991133]
2022-09-28T08:12:17.799322119Z
2022-09-28T08:12:17.799329503Z Trying to get some variables.
2022-09-28T08:12:17.799331937Z Some pointers may be invalid and cause the dump to abort.
2022-09-28T08:12:17.799334212Z Query (0x7fa4d4010920): -- Table 'track' -------------------------------------------------------------------------------------------------------
2022-09-28T08:12:17.799336225Z
2022-09-28T08:12:17.799337848Z -- update musical features
2022-09-28T08:12:17.799341345Z alter table track
2022-09-28T08:12:17.799345062Z add musical_feature_secondary_mood_cluster varchar(50) null after musical_feature_primary_mood_cluster,
2022-09-28T08:12:17.799351815Z add musical_feature_secondary_instrument varchar(50) null after musical_feature_dominant_instrument,
2022-09-28T08:12:17.799392220Z drop column musical_feature_bpm
2022-09-28T08:12:17.799401257Z
2022-09-28T08:12:17.799412989Z Connection ID (thread ID): 17
2022-09-28T08:12:17.799414953Z Status: NOT_KILLED
2022-09-28T08:12:17.799416646Z
2022-09-28T08:12:17.799418560Z Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
2022-09-28T08:12:17.799422697Z
2022-09-28T08:12:17.799424330Z The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
2022-09-28T08:12:17.799426154Z information that should help you find out what is causing the crash.
2022-09-28T08:12:17.799477245Z Writing a core file...
2022-09-28T08:12:17.799536877Z Working directory at /var/lib/mysql
2022-09-28T08:12:17.799568316Z Resource Limits:
2022-09-28T08:12:17.799574297Z Limit Soft Limit Hard Limit Units
2022-09-28T08:12:17.799578605Z Max cpu time unlimited unlimited seconds
2022-09-28T08:12:17.799582552Z Max file size unlimited unlimited bytes
2022-09-28T08:12:17.799586409Z Max data size unlimited unlimited bytes
2022-09-28T08:12:17.799590237Z Max stack size 8388608 unlimited bytes
|
2022-09-28T08:12:17.799594004Z Max core file size 0 unlimited bytes
|
2022-09-28T08:12:17.799597761Z Max resident set unlimited unlimited bytes
|
2022-09-28T08:12:17.799601498Z Max processes unlimited unlimited processes
|
2022-09-28T08:12:17.799605265Z Max open files 1048576 1048576 files
|
2022-09-28T08:12:17.799609743Z Max locked memory 83968000 83968000 bytes
|
2022-09-28T08:12:17.799615324Z Max address space unlimited unlimited bytes
|
2022-09-28T08:12:17.799621024Z Max file locks unlimited unlimited locks
|
2022-09-28T08:12:17.799626555Z Max pending signals 62834 62834 signals
|
2022-09-28T08:12:17.799632175Z Max msgqueue size 819200 819200 bytes
|
2022-09-28T08:12:17.799638637Z Max nice priority 0 0
|
2022-09-28T08:12:17.799686116Z Max realtime priority 0 0
|
2022-09-28T08:12:17.799725771Z Max realtime timeout unlimited unlimited us
|
2022-09-28T08:12:17.799736621Z Core pattern: core
|
2022-09-28T08:12:17.799741841Z
|
2022-09-28T08:12:17.799748593Z Fatal signal 11 while backtracing
|
To me, it looks like MariaDB in the Docker image cannot read/write and lock/unlock files fast enough on a local volume on Windows.
Just as a side note: We encountered similar issues half a year ago and they were related to table renames and we were able to link them to this specific WSL issue: https://github.com/microsoft/WSL/issues/8443
However, this issue seems to be resolved now...
|
|
tofi86, thank you. That WSL bug was identified as the cause of MDEV-24189. Can you please provide more details there?
This bug that was filed by Roel feels to me more like a potential Linux kernel bug, while your bug looks like something in Microsoft’s code.
I have seen some Linux kernels misbehave when a system is grossly overloaded. This happens in our internal stress testing environments, which could be running hundreds of database server processes concurrently. Occasionally, server reboots are necessary.
|
|
A critical aspect of this bug is: preallocating 98304 bytes for file ./test/t2#P#p4560.ibd failed with error 9
Error code 9 is EBADF, so the file descriptor isn't valid. In os_file_set_size this could come straight from fstat rather than from the posix_fallocate that the error message implies.
os_file_set_size is called by fil_space_extend_must_retry.
In fil_mutex_enter_and_prepare_for_io there is a } else if (!node || node->is_open()). If this condition fails, the assumption is that the else branch will open the node; however, it doesn't, and fil_space_extend_must_retry can then be called on a file that is not open.
Maybe it can be forced with:
$ git diff
|
diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
|
index ebeb1f66101..567e5ff0288 100644
|
--- a/storage/innobase/fil/fil0fil.cc
|
+++ b/storage/innobase/fil/fil0fil.cc
|
@@ -930,7 +930,14 @@ fil_mutex_enter_and_prepare_for_io(ulint space_id)
|
anything; if the space does not exist, we handle the
|
situation in the function which called this
|
function */
|
+ if (node->is_open() && space->recv_size !=0) {
|
+ node->n_pending= 0;
|
+ node->close();
|
+ goto next;
|
+ }
|
+
|
} else {
|
+next:
|
while (fil_system.n_open >= srv_max_n_open_files) {
|
/* Too many files are open */
|
if (fil_try_to_close_file_in_LRU(count > 1)) {
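The EBADF (error 9) part of the analysis above can be demonstrated in isolation. This is a stand-alone sketch, not server code: it uses python3 from the shell purely to reach the raw posix_fallocate() call on a descriptor that was closed underneath us, as the race described above would do.

```shell
# Hypothetical demo: preallocate through a descriptor that has already
# been closed; posix_fallocate() reports EBADF (9), matching the
# "failed with error 9" message quoted above.
python3 - <<'PYEOF'
import os, tempfile
fd, path = tempfile.mkstemp()
os.close(fd)                  # node->close() happened under us
os.unlink(path)
try:
    os.posix_fallocate(fd, 0, 98304)   # same 98304-byte preallocation
except OSError as e:
    print("errno", e.errno, "-", e.strerror)
PYEOF
```

This prints errno 9 - Bad file descriptor, i.e. the same errno regardless of whether fstat or posix_fallocate hits the stale descriptor first.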
|
|
|
|
The frequency of the crash is very low across all test runs. It does, however, happen from time to time, and I found 3 occurrences: two in an 11.2[.0] debug build of 22 April 2023 (exact revision unknown; build previously deleted) and one in an 11.0 debug build of 7 April 2023 at revision 8e55d7ea4a2f94ae3f38fdd8785778612d4b1203. Updating the affected versions to match.
fsize != os_offset_t(-1)|SIGABRT|ut_dbg_assertion_failed|fil_space_extend_must_retry|fil_space_extend|fsp_try_extend_data_file_with_pages
|
In all 3 cases, partitions are involved. The important part of the SQL trace seems to always be something similar to:
CREATE TABLE t1 (a INT NOT NULL, KEY(a)) ENGINE='InnoDB' PARTITION BY HASH(a) PARTITIONS 8192;
|
Or
CREATE TABLE t2 (a INT, KEY(a)) ENGINE='a' PARTITION BY KEY(a) PARTITIONS 8192;
|
I expect that running enough iterations of either of these statements (or both) would eventually lead to a crash.
|
|
I was able to repeat a related issue under MTR in the 11.0 build (of 7 April 2023) quoted in the previous comment, using this testcase:
--source include/have_innodb.inc
|
--source include/have_partition.inc
|
--delimiter $$
|
BEGIN NOT ATOMIC
|
DECLARE CONTINUE HANDLER FOR 1210 BEGIN END;
|
LOOP
|
DROP TABLE IF EXISTS t1;
|
DROP TABLE IF EXISTS t2;
|
CREATE TABLE t1 (a INT NOT NULL, KEY(a)) ENGINE='InnoDB' PARTITION BY HASH(a) PARTITIONS 8192;
|
CREATE TABLE t2 (a INT, KEY(a)) ENGINE='InnoDB' PARTITION BY KEY(a) PARTITIONS 8192;
|
SELECT SLEEP(1);
|
END LOOP;
|
END $$
|
--delimiter ;
|
Executed as:
./mtr --parallel 50 --repeat 1000 test
|
|
11.0.2 8e55d7ea4a2f94ae3f38fdd8785778612d4b1203
|
2023-09-08 14:09:19 0 [Note] /data/NEWBUGS/test/mariadb-11.0.2-linux-x86_64/bin/mariadbd: ready for connections.
|
Version: '11.0.2-MariaDB-debug-log' socket: '/data/NEWBUGS/test/mariadb-11.0.2-linux-x86_64/mysql-test/var/tmp/4/mysqld.1.sock' port: 16280 MariaDB Server
|
2023-09-08 14:09:23 4 [Note] InnoDB: Cannot close file ./test/t1#P#p412.ibd because of 1 pending operations
|
2023-09-08 14:09:30 4 [Note] InnoDB: Cannot close file ./test/t1#P#p1250.ibd because of 1 pending operations
|
2023-09-08 14:09:35 4 [Note] InnoDB: Cannot close file ./test/t1#P#p1884.ibd because of 1 pending operations
|
2023-09-08 14:09:40 4 [Note] InnoDB: Cannot close file ./test/t1#P#p2448.ibd because of 1 pending operations
|
2023-09-08 14:09:45 4 [Note] InnoDB: Cannot close file ./test/t1#P#p3074.ibd because of 1 pending operations
|
2023-09-08 14:09:47 0x14d4519ec640 InnoDB: Assertion failure in file /test/11.0_dbg/storage/innobase/fil/fil0fil.cc line 2981
|
InnoDB: Failing assertion: fil_system.n_open == n_open
|
InnoDB: We intentionally generate a memory trap.
|
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
|
InnoDB: If you get repeated assertion failures or crashes, even
|
InnoDB: immediately after the mariadbd startup, there may be
|
InnoDB: corruption in the InnoDB tablespace. Please refer to
|
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
|
InnoDB: about forcing recovery.
|
The assertion happens 2 seconds after the last "Cannot close file" note.
This assert is also mentioned in MDEV-31347, MDEV-31152, and MDEV-31080.
|
11.0.2 8e55d7ea4a2f94ae3f38fdd8785778612d4b1203
|
Core was generated by `/data/NEWBUGS/test/mariadb-11.0.2-linux-x86_64/bin/mariadbd --defaults-group-su'.
|
Program terminated with signal SIGABRT, Aborted.
|
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=22902134982208)
|
at ./nptl/pthread_kill.c:44
|
[Current thread is 1 (Thread 0x14d4519ec640 (LWP 263893))]
|
(gdb) bt
|
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=22902134982208) at ./nptl/pthread_kill.c:44
|
#1 __pthread_kill_internal (signo=6, threadid=22902134982208) at ./nptl/pthread_kill.c:78
|
#2 __GI___pthread_kill (threadid=22902134982208, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
|
#3 0x000014d474442476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
|
#4 0x000014d4744287f3 in __GI_abort () at ./stdlib/abort.c:79
|
#5 0x000055f3dfb1592c in ut_dbg_assertion_failed (expr=expr@entry=0x55f3e0102372 "fil_system.n_open == n_open", file=file@entry=0x55f3e01003e0 "/test/11.0_dbg/storage/innobase/fil/fil0fil.cc", line=line@entry=2981) at /test/11.0_dbg/storage/innobase/ut/ut0dbg.cc:60
|
#6 0x000055f3dfbf5ad0 in fil_validate () at /test/11.0_dbg/storage/innobase/fil/fil0fil.cc:2981
|
#7 0x000055f3dfbf5b30 in fil_validate_skip () at /test/11.0_dbg/storage/innobase/fil/fil0fil.cc:212
|
#8 0x000055f3dfbf5b98 in fil_space_t::io (this=0x55f3e20bfc20, type=@0x14d4519eb6e0: {bpage = 0x14d4732ce820, slot = 0x0, node = 0x55f3e20bfd60, type = IORequest::WRITE_ASYNC}, offset=49152, len=len@entry=16384, buf=buf@entry=0x14d4732fc000, bpage=bpage@entry=0x14d4732ce820) at /test/11.0_dbg/storage/innobase/fil/fil0fil.cc:2759
|
#9 0x000055f3dfb862e8 in buf_dblwr_t::flush_buffered_writes_completed (this=<optimized out>, request=@0x14d4519eb960: {bpage = 0x0, slot = 0x0, node = 0x55f3e2097ab0, type = IORequest::DBLWR_BATCH}) at /test/11.0_dbg/storage/innobase/buf/buf0dblwr.cc:713
|
#10 0x000055f3dfbf6420 in fil_aio_callback (request=@0x14d4519eb960: {bpage = 0x0, slot = 0x0, node = 0x55f3e2097ab0, type = IORequest::DBLWR_BATCH}) at /test/11.0_dbg/storage/innobase/fil/fil0fil.cc:2866
|
#11 0x000055f3df9dd26d in io_callback (cb=0x55f3e205d838) at /test/11.0_dbg/storage/innobase/os/os0file.cc:3452
|
#12 0x000055f3dfc732a1 in tpool::task_group::execute (this=0x55f3e2051568, t=t@entry=0x55f3e205d890) at /test/11.0_dbg/tpool/task_group.cc:70
|
#13 0x000055f3dfc734f9 in tpool::task::execute (this=0x55f3e205d890) at /test/11.0_dbg/tpool/task.cc:32
|
#14 0x000055f3dfc71284 in tpool::thread_pool_generic::worker_main (this=0x55f3e1ff7e20, thread_var=0x55f3e1ff9050) at /test/11.0_dbg/tpool/tpool_generic.cc:580
|
#15 0x000055f3dfc72600 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/11/bits/invoke.h:74
|
#16 std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=<optimized out>) at /usr/include/c++/11/bits/invoke.h:96
|
#17 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:253
|
#18 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:260
|
#19 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:211
|
#20 0x000014d4748dc253 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
|
#21 0x000014d474494b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
|
#22 0x000014d474526a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
|
The assertion is readily repeatable.
|
|
11.3 also shows the "Cannot close file" notes after some time, though this time only for innodb_index_stats.ibd:
|
11.3.0 7ba9c7fb84b5f28e4736656b57d9508b70ca6369 (Debug)
|
2023-09-08 15:32:39 0 [Note] /test/MD020923-mariadb-11.3.0-linux-x86_64-dbg/bin/mariadbd: ready for connections.
|
Version: '11.3.0-MariaDB-debug-log' socket: '/test/MD020923-mariadb-11.3.0-linux-x86_64-dbg/mariadb-test/var/tmp/28/mysqld.1.sock' port: 16500 MariaDB Server
|
2023-09-08 15:32:55 4 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of 1 pending operations
|
2023-09-08 15:40:14 4 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of 8 pending operations
|
2023-09-08 15:41:20 4 [Note] InnoDB: Cannot close file ./mysql/innodb_table_stats.ibd because of 3 pending operations
|
2023-09-08 15:47:25 4 [Note] InnoDB: Cannot close file ./mysql/innodb_index_stats.ibd because of 1 pending operations
|
The number of open files is set to the maximum (1048576), so it is unlikely that we ran out of file descriptors. sudo lsof | wc -l gives 455507 (~43% of capacity).
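For reference, a quick Linux-only sketch for checking descriptor pressure on a single process rather than system-wide. It uses the current shell's PID ($$) as a stand-in; substitute the mariadbd PID in practice.

```shell
# Count descriptors actually held by one process and show its
# per-process cap. $$ (this shell) is a placeholder PID; use the
# mariadbd PID when checking the server.
pid=$$
echo "open fds: $(ls /proc/"$pid"/fd | wc -l)"
grep 'Max open files' /proc/"$pid"/limits
```

Comparing this per-process count against the "Max open files" line is more direct than the system-wide lsof total, since the limit that matters here is per process.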
Also, oddly, only ./var/28/log/mysqld.1.err exists (which showed the Cannot close file issue above); no other MTR instance created an error log. All var subdirectories are present and contain a full data directory, but only directory 28 has files in its log dir.
/test/MD020923-mariadb-11.3.0-linux-x86_64-dbg/mariadb-test$ ls var/*/log/*err
|
var/28/log/mysqld.1.err
|
test/MD020923-mariadb-11.3.0-linux-x86_64-dbg/mariadb-test$
|
|
|
From testing, it looks like at least 1) the fil_system.n_open == n_open assertion and 2) the "Cannot close file ... t1#P#p412.ibd"-like notes are resolved post 11.0.2 8e55d7ea4a2f94ae3f38fdd8785778612d4b1203, or possibly earlier.
It is not clear whether the original crash still exists.
Also, even on recent builds, there are still the Cannot close file ./mysql/innodb_index_stats.ibd notes, which are somewhat easily reproducible with the MTR testcase given here.
And there is the MTR oddity, when running that testcase with --parallel=50, that only a single var subdir seems to end up with log files.
marko, can you please have a look at the innodb_index_stats.ibd notes, and see whether you get the same MTR oddness when running the testcase?
(Any MTR 900 second timeouts are normal and due to the loop in the testcase. It likely indicates proper working of the server.)
|
|
danblack had written the following:
Critical aspect of this bug is : preallocating 98304 bytes for file ./test/t2#P#p4560.ibd failed with error 9
The 9 error code is EBADF, so the file descriptor isn't valid. in os_file_set_size this could be straight from fstat rather than posix_fallocate that the error implies.
The cause for that could be MDEV-31347 or MDEV-31256.
There is an open bug related to InnoDB persistent statistics and partitioned tables, in MDEV-29566. Roel, is everything working for you if you set innodb_stats_persistent=OFF?
|
|
The MTR single error log/thread oddity turns out to be an MTR bug, logged as MDEV-32185 "MTR: --parallel does not work correctly when using LOOP".
|
|
We encountered a similar crash and managed to reproduce it by doing intensive table operations. After investigation, we found that the problem was the global variable innodb_open_files, which we had set to 2000. After raising it to 16384, the problem disappeared.
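For anyone hitting the same wall: innodb_open_files is, to my knowledge, not changeable at runtime, so it has to go into the server configuration before startup. A minimal my.cnf sketch (the section name and the open_files_limit value are assumptions; only the 16384 figure comes from this comment):

```ini
# my.cnf sketch: raise the InnoDB open-files cap as described above.
# open_files_limit is an assumed companion setting; the OS-level cap
# must be at least as large for innodb_open_files to be effective.
[mariadb]
innodb_open_files = 16384
open_files_limit  = 32768
```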
The following scenario managed to crash it in ~3-5 minutes when innodb_open_files was set to 2000.
First, we created some dummy SQL files:
n_dbs=20
|
n_tbls=1000
|
|
for (( i=1; i<=${n_dbs}; i++ )); do
|
cat <<EOF >db${i}.sql
|
DROP DATABASE IF EXISTS db${i};
|
CREATE DATABASE db${i};
|
USE db${i};
|
CREATE TABLE \`cycle\` (
|
\`cc_cycle_id\` int(10) unsigned NOT NULL AUTO_INCREMENT,
|
PRIMARY KEY (\`cc_cycle_id\`)
|
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
|
CREATE TABLE \`account\` (
|
\`ac_account_id\` int(10) unsigned NOT NULL AUTO_INCREMENT,
|
PRIMARY KEY (\`ac_account_id\`)
|
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
|
EOF
|
for (( j=1; j<=${n_tbls}; j++ )); do
|
cat <<EOF >>db${i}.sql
|
CREATE TABLE \`johndoe${j}\` (
|
\`xyz_johndoe_id\` int(10) unsigned NOT NULL AUTO_INCREMENT,
|
\`xyz_cc_cycle_id\` int(10) unsigned NOT NULL,
|
\`xyz_transaction_number\` varchar(100) NOT NULL,
|
\`xyz_record_number\` varchar(45) DEFAULT NULL,
|
\`xyz_ac_account_id\` int(10) unsigned NOT NULL,
|
\`xyz_buy_sell\` tinyint(4) NOT NULL,
|
\`xyz_quantity\` int(10) unsigned NOT NULL,
|
\`xyz_in_instrument_id\` int(10) unsigned NOT NULL,
|
\`xyz_price\` decimal(18,8) NOT NULL,
|
\`xyz_settlement_price\` decimal(18,8) NOT NULL,
|
\`xyz_market_value\` decimal(12,2) NOT NULL,
|
\`xyz_pmvarn\` decimal(12,2) NOT NULL,
|
\`xyz_multiplier\` decimal(12,2) NOT NULL,
|
\`xyz_comm\` decimal(9,2) DEFAULT NULL,
|
\`xyz_fee1\` decimal(9,2) DEFAULT NULL,
|
\`xyz_fee2\` decimal(9,2) DEFAULT NULL,
|
\`xyz_fee3\` decimal(9,2) DEFAULT NULL,
|
\`xyz_fee4\` decimal(9,2) DEFAULT NULL,
|
\`xyz_fee5\` decimal(9,2) DEFAULT NULL,
|
\`xyz_fee6\` decimal(9,2) DEFAULT NULL,
|
\`xyz_fee7\` decimal(9,2) DEFAULT NULL,
|
\`xyz_fee8\` decimal(9,2) DEFAULT NULL,
|
\`xyz_fee9\` decimal(9,2) DEFAULT NULL,
|
\`xyz_gichg\` decimal(9,2) DEFAULT NULL,
|
\`xyz_bkchg\` decimal(9,2) DEFAULT NULL,
|
\`xyz_other\` decimal(9,2) DEFAULT NULL,
|
\`xyz_batch\` varchar(10) DEFAULT NULL,
|
\`xyz_source_file\` varchar(200) NOT NULL,
|
\`xyz_source_record_no\` int(11) NOT NULL,
|
PRIMARY KEY (\`xyz_johndoe_id\`),
|
KEY \`FKa_lop_tt_trade_transaction_id_idx${j}\` (\`xyz_transaction_number\`),
|
KEY \`FKa_lop_cc_cycle_id_idx${j}\` (\`xyz_cc_cycle_id\`),
|
KEY \`FKa_lop_linn_tt_idx${j}\` (\`xyz_transaction_number\`),
|
KEY \`fka_LINN_open_position_account1_idx${j}\` (\`xyz_ac_account_id\`),
|
KEY \`fka_LINN_open_position_instrument1_idx${j}\` (\`xyz_in_instrument_id\`),
|
CONSTRAINT \`FKa_lop_cc_cycle_id${j}\` FOREIGN KEY (\`xyz_cc_cycle_id\`) REFERENCES \`cycle\` (\`cc_cycle_id\`) ON DELETE NO ACTION ON UPDATE NO ACTION,
|
CONSTRAINT \`fka_LINN_open_position_account1${j}\` FOREIGN KEY (\`xyz_ac_account_id\`) REFERENCES \`account\` (\`ac_account_id\`) ON DELETE NO ACTION ON UPDATE NO ACTION
|
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
|
EOF
|
done
|
done
|
And then applied them in parallel:
creds="-uroot -pjohndoe -h127.0.0.1 -P3306"
|
touch dump_run
|
date
|
for db in $(find . -name "*.sql"); do
|
while [[ -f ./dump_run ]]; do
|
mysql ${creds} <$db || break
|
done &
|
done
|
|
|
Sebieee, which version of MariaDB were you using? Did it include the fixes for MDEV-31256 and MDEV-31347?
|
|
For the reproducible test case I used the image docker.io/library/mariadb:11.0.2, running with podman on CentOS 7.
|
|
Sebieee, 11.0.2 is affected by MDEV-31256 and MDEV-31347; the fixed releases include 11.0.3. Can you test something newer?
|