MariaDB Server
MDEV-25215

Excessive logging "Cannot close file ./test/FTS_xxx.ibd because of n pending operations and pending fsync"

Details

    Description

      Within a ~1 min test on current 10.6 with the default log_warnings level, I'm getting ~8500 records like this in the error log:

      2021-03-22  2:58:35 8 [Note] InnoDB: Cannot close file ./test/FTS_00000000000012c8_00000000000015f9_INDEX_1.ibd because of 0 pending operations and pending fsync
      2021-03-22  2:58:35 8 [Note] InnoDB: Cannot close file ./test/FTS_00000000000012c8_00000000000015f9_INDEX_2.ibd because of 0 pending operations and pending fsync
      2021-03-22  2:58:35 8 [Note] InnoDB: Cannot close file ./test/FTS_00000000000012c8_00000000000015f9_INDEX_4.ibd because of 0 pending operations and pending fsync
      2021-03-22  2:58:35 8 [Note] InnoDB: Cannot close file ./test/FTS_00000000000012c8_00000000000015f9_INDEX_5.ibd because of 1 pending operations and pending fsync
      2021-03-22  2:58:35 8 [Note] InnoDB: Cannot close file ./test/FTS_00000000000012c8_00000000000015f9_INDEX_6.ibd because of 0 pending operations and pending fsync
      ...
      

      10.5 seems to produce only those with "0 pending operations", so there are fewer of them, but still ~4000 in a few-minute test. And these ones (with zero) are especially confusing.

      I think that if these are real problems which a user needs to take care of, they should be warnings or errors (and given the persistence, maybe we are missing at least a debug assertion); and if these are just technical notifications, there shouldn't be so many of them.
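      Quantifying the flood described above is easiest with a quick pass over the error log. A minimal shell sketch; the log contents below are a fabricated stand-in for illustration, and the real error-log path varies by distribution:

```shell
# Stand-in error log; in practice point this at the server's error log
# (the path varies by distribution, e.g. /var/lib/mysql/<host>.err).
log=$(mktemp)
cat > "$log" <<'EOF'
2021-03-22 2:58:35 8 [Note] InnoDB: Cannot close file ./test/FTS_x_INDEX_1.ibd because of 0 pending operations and pending fsync
2021-03-22 2:58:35 8 [Note] InnoDB: Cannot close file ./test/FTS_x_INDEX_2.ibd because of 0 pending operations and pending fsync
2021-03-22 2:59:01 8 [Note] InnoDB: Cannot close file ./test/FTS_x_INDEX_4.ibd because of 1 pending operations and pending fsync
EOF

# Count "Cannot close file" notes per minute: field 1 is the date,
# field 2 the time; strip the seconds before aggregating.
per_min=$(awk '/Cannot close file/ { split($2, t, ":"); print $1, t[1] ":" t[2] }' "$log" \
  | sort | uniq -c)
echo "$per_min"
rm -f "$log"
```

      This makes it easy to see whether the notes arrive in sustained bursts (thousands per minute, as reported here) or only occasionally.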

      Attachments

        1. cannotclosefile.JPG
          cannotclosefile.JPG
          83 kB
        2. mycnf.txt
          2 kB
        3. mysql-0-20211028.log
          219 kB
        4. screenshot-1.png
          screenshot-1.png
          19 kB
        5. screenshot-2.png
          screenshot-2.png
          35 kB

        Issue Links

          Activity

            elenst Elena Stepanova created issue -
            digdilem Simon Avery added a comment - - edited

            We encountered this bug today on a busy production server, and researching it brought us here.

            MariaDB server version: 10.5.11 (from MariaDB's C7 repo)
            OS: CentOS 7

            2021-08-09 22:49:11 5443978 [Note] InnoDB: Cannot close file ./DB0/Tab1.ibd because of 20 pending operations and pending fsync
            2021-08-03 13:05:10 145279 [Note] InnoDB: Cannot close file ./DB1/Tab1#P#per.ibd because of 0 pending operations and pending fsync
            2021-08-03 13:05:10 145911 [Note] InnoDB: Cannot close file ./DB1/Tab1#P#per.ibd because of 0 pending operations and pending fsync
            2021-08-03 13:05:10 145515 [Note] InnoDB: Cannot close file ./DB1/Tab1#P#per.ibd because of 0 pending operations and pending fsync

            Variations of this, affecting multiple databases and tables.

            Total number of similar log entries is 200149 during this period.

            Restarting MariaDB appears to have (temporarily) resolved the issue.

            digdilem Simon Avery made changes -
            Field Original Value New Value
            Attachment screenshot-1.png [ 58600 ]
            digdilem Simon Avery added a comment -

            Still ongoing for us. There is some suggestion that it is linked to queries that aren't completing, although I can't confirm that definitively.

            To give some indication of the sheer number of lines this is creating in the log, here are the counts from one server over 14 days.


            digdilem, do you see this with MariaDB 10.5.12? MDEV-25113, MDEV-25801, MDEV-25954 were fixed in that release.

            marko Marko Mäkelä added a comment -
            digdilem Simon Avery added a comment -

            Hi Marko

            Maybe, maybe not...

            I have two machines reporting this.

            Server 1 is on 10.5.9, so let's not focus on that. That's still reporting it occasionally, but not in volume.

            Server 2 was upgraded from 10.5.11 to 10.5.12 (from MariaDB.org's CentOS7/EL7 repos) on 3rd August (just before my last report). It's still reporting occasional output like so:

            2021-08-21 21:39:37 4117695 [Note] InnoDB: Cannot close file ./Db1/Tb1.ibd because of 0 pending operations and pending fsync

            However, nothing more recent than this - so that's 10 days clear.

            I'll continue to monitor and should it resurface on anything 10.5.12 onwards, I'll update this.

            Thanks all,


            elenst, can you please check whether you can reproduce this yourself with recent 10.5 or 10.6 trees?

            I see that we had an internal discussion about this on August 30, and mleich only found a very small number of occurrences (I guess from runs with rr record, which at times creates very ‘unfair’ scheduling of threads). He conducted some tests on HDD in order to get more realistic I/O performance and was not able to reproduce massive amounts of this.

            marko Marko Mäkelä added a comment -

            No, sorry, I can't, not in the near future. I assume that if Matthias was able to reproduce them at all, he can take one of the versions for which the complaints were raised and run the tests on a normal non-rr release-like build.

            elenst Elena Stepanova added a comment -
            julien.fritsch Julien Fritsch made changes -
            Assignee Marko Mäkelä [ marko ] Elena Stepanova [ elenst ]
            elenst Elena Stepanova made changes -
            Assignee Elena Stepanova [ elenst ] Marko Mäkelä [ marko ]

            digdilem, how does it look after more time running MariaDB 10.5.12?

            Apart from MDEV-25113, MDEV-25801, MDEV-25954, which were fixed in 10.5.12, there is also MDEV-26445 that could be of interest, once the fix is available.

            marko Marko Mäkelä added a comment -
            digdilem Simon Avery made changes -
            Attachment cannotclosefile.JPG [ 59112 ]
            digdilem Simon Avery made changes -
            Attachment mycnf.txt [ 59113 ]
            digdilem Simon Avery added a comment -

            @Marko Mäkelä - thanks for following up.

            This is still an ongoing issue.

            Below is a 14-day plot for this string from the same production server as before. This is CentOS 7 and a busy machine.

            Aside from the increased logging and marginal overhead associated with that, I have not noticed any other effects, nor have any issues been reported by the users.

            We run another 27 MariaDB 10.5.12 servers on C7, but no others show this problem. The MariaDB configurations are not all identical, and the problem server has more RAM (64G) and activity than most.

            I've also attached the my.cnf* variables, in case anything odd pops out that might be linked.

            Thank you

            MariaDB [(none)]> select version();
            +---------------------+
            | version()           |
            +---------------------+
            | 10.5.12-MariaDB-log |
            +---------------------+

            mycnf.txt

            mleich Matthias Leich added a comment - - edited

            I have replayed the problem on a 10.6 tree from mid September 2021.
            After I removed the "Versioning" feature from the test, the problem
            disappears entirely or at least becomes extremely rare (no replay within ~2000 RQG tests).

            julien.fritsch Julien Fritsch made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]

            digdilem, thank you. In mycnf.txt I see that your innodb_log_file_size is only half a gigabyte, which is a fraction of the innodb_buffer_pool_size. That will force very frequent log checkpoints, which in turn will cause stalls. That could actually be the root cause of your messages, if those messages always say "0 pending operations and pending fsync". I do not think that there is any need to execute fsync() or fdatasync() outside log checkpoints. Starting with 10.5, thanks to MDEV-19176, recovery should work fine even if you make the redo log as big as the buffer pool, or possibly even larger.

            Another factor is that if innodb_open_files is not specified, open_files_limit will be used instead. That may force frequent closing of data files. In mycnf.txt I see open_files_limit=65535, which may or may not be reasonable. But the innodb_log_file_size is definitely way too small.

            Internally, we have repeated something, but only when using system versioned tables and FULLTEXT INDEX. Theoretically, there could be a bug in some system versioning code that forgets to commit a mini-transaction (which would release a page latch). But I think that such a bug should cause more trouble than just these messages. At the very least, it should prevent log checkpoints and cause log file overruns, for which we have a separate message.

            Some of the logs from our internal testing show a nonzero "pending operations" count, but we also have logs that exclusively show "0 pending operations and pending fsync". I see that within a second, we can issue the message for several files. For us, they are the numerous internal tables for FULLTEXT INDEX for a single user table. In one log that I checked, all file names would start with the same prefix that includes the numeric main table identifier, such as test/FTS_000000000000125b_. Those messages were almost certainly issued by the LRU logic that is attempting to enforce the ‘open files’ limit in InnoDB.

            We should definitely rate-limit that output somehow, even if this run is using a ‘misconfigured’ system. If all files are located in the same journaled file system, all files should be ‘on the same boat’ waiting for the fsync() or fdatasync() to complete.

            marko Marko Mäkelä added a comment -
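            The sizing advice above can be turned into a quick sanity check. A shell sketch with hard-coded example values (a 64 GiB buffer pool and a 512 MiB redo log, matching the configuration discussed; the one-quarter threshold is an illustrative choice, not an official rule):

```shell
# Example values; in practice read them from the server, e.g.
#   mysql -Nse "SELECT @@innodb_buffer_pool_size, @@innodb_log_file_size"
buffer_pool_size=$((64 * 1024 * 1024 * 1024))  # 64 GiB
log_file_size=$((512 * 1024 * 1024))           # 512 MiB

# A redo log much smaller than the buffer pool forces frequent
# checkpoints; flag anything under a quarter of the pool size.
if [ "$log_file_size" -lt $((buffer_pool_size / 4)) ]; then
  verdict="undersized: consider raising innodb_log_file_size"
else
  verdict="ok"
fi
echo "innodb_log_file_size check: $verdict"
```

            Per the comment above, starting with 10.5 the redo log can safely be made as large as the buffer pool, or even larger.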

            Clarification: We normally execute frequent fdatasync() or fsync() on the InnoDB write-ahead log (ib_logfile0). On data files, I would believe that we only do that on log checkpoint or when extending files.

            When using innodb_flush_method=O_DIRECT (which MDEV-24854 finally made the default on 10.6) in combination with innodb_use_native_aio=ON, the fsync() or fdatasync() for data files seems to be a no-op on my development system (ext4fs on NVMe, SSD, or HDD). You should check that the server is actually using native asynchronous I/O (io_submit() and friends) and not simulated I/O. With libaio (the only option on Linux before MDEV-24883 in 10.6 introduced io_uring) it is possible that you must adjust /proc/sys/fs/aio-max-nr so that io_setup() will not fail on startup.

            marko Marko Mäkelä added a comment -
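            The AIO limit check suggested above can be scripted. A sketch; /proc/sys/fs/aio-max-nr is the standard Linux location, but it may be absent (e.g. in minimal containers), hence the fallback:

```shell
# Report the kernel limit on AIO contexts that io_setup() is subject to.
if [ -r /proc/sys/fs/aio-max-nr ]; then
  aio_max=$(cat /proc/sys/fs/aio-max-nr)
else
  aio_max=unavailable
fi
echo "fs.aio-max-nr = $aio_max"
```

            If the value is too low, it can be raised with `sysctl -w fs.aio-max-nr=<N>`; the server's startup log should indicate whether native asynchronous I/O could not be initialized.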
            marko Marko Mäkelä added a comment - - edited

            I am able to reproduce the excessive logging with the following test:

            --source include/have_innodb.inc
            --source include/have_partition.inc
            --source include/have_sequence.inc
             
            CREATE TABLE t1 (pk INT AUTO_INCREMENT PRIMARY KEY, b INT) ENGINE=InnoDB
            PARTITION BY key (pk) PARTITIONS 100;
             
            INSERT INTO t1 (pk) SELECT * FROM seq_1_to_100;
            BEGIN;
            let $n=500;
            while ($n)
            {
            eval UPDATE t1 SET b=pk+$n;
            dec $n;
            }
            COMMIT;
            DROP TABLE t1;
            

            Edit: And with the following .opt file:

            --innodb-open-files=10
            --innodb-log-file-size=4m
            

            The messages are displayed by fil_node_open_file() and fil_space_t::try_to_close(). I think that both need to be revised to limit the messages to at most 1 per fil_node_open_file() call, only in the case when the maximum configured number of open files will be exceeded.

            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Component/s Full-text Search [ 10104 ]
            Fix Version/s 10.6 [ 24028 ]
            Labels regression-10.5

            Here is a slightly simpler and smaller version of the test.

            --source include/have_innodb.inc
            --source include/have_partition.inc
            --source include/have_sequence.inc
             
            CREATE TABLE t1 (pk INT AUTO_INCREMENT PRIMARY KEY) ENGINE=InnoDB
            PARTITION BY KEY (pk) PARTITIONS 100;
             
            INSERT INTO t1 SELECT * FROM seq_1_to_100;
            --disable_query_log
            let $n=300;
            while ($n)
            {
            BEGIN; DELETE FROM t1; ROLLBACK;
            dec $n;
            }
            --enable_query_log
            DROP TABLE t1;
            

            marko Marko Mäkelä added a comment -

            We will display a message for at most one file every 5 seconds.

            marko Marko Mäkelä added a comment -
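            The policy above (at most one such note every 5 seconds) can also be applied after the fact to condense logs from unfixed servers. This is an editor's sketch, not the server-side fix; it assumes all entries fall within one day, and the sample lines are fabricated:

```shell
# Fabricated sample burst standing in for a real error log.
log=$(mktemp)
cat > "$log" <<'EOF'
2021-03-22 02:58:35 8 [Note] InnoDB: Cannot close file ./t/a.ibd because of 0 pending operations and pending fsync
2021-03-22 02:58:36 8 [Note] InnoDB: Cannot close file ./t/b.ibd because of 0 pending operations and pending fsync
2021-03-22 02:58:41 8 [Note] InnoDB: Cannot close file ./t/c.ibd because of 0 pending operations and pending fsync
EOF

# Pass through everything except "Cannot close file" bursts, of which at
# most one line per 5-second window is kept (simplistic: seconds-of-day
# computed from field 2, hh:mm:ss, so midnight wraparound is ignored).
filtered=$(awk '
  !/Cannot close file/ { print; next }
  {
    split($2, t, ":")
    ts = t[1] * 3600 + t[2] * 60 + t[3]
    if (last == "" || ts - last >= 5) { print; last = ts }
  }' "$log")
echo "$filtered"
rm -f "$log"
```

            On the sample above this keeps the first and third lines and drops the one in between, mirroring the behaviour the fix introduces.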
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2021-09-30 07:40:16.0 2021-09-30 07:40:16.078
            marko Marko Mäkelä made changes -
            Fix Version/s 10.5.13 [ 26026 ]
            Fix Version/s 10.6.5 [ 26034 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Resolution Fixed [ 1 ]
            Status Confirmed [ 10101 ] Closed [ 6 ]
            digdilem Simon Avery added a comment -

            @Marko - Thanks for your comments and suggestions on our configuration, notably `innodb_log_file_size` - appreciated.

            I've made the change on that specific server (setting it to 25G to match innodb_buffer_pool_size), although as it's not a dynamic variable I won't get a chance to restart the server to pick up the change for a few days, so I won't immediately be able to report whether it eliminates or reduces these occurrences.

            Thanks again.


            digdilem, you’re welcome. In the MariaDB Enterprise Server starting with version 10.5, innodb_log_file_size can be changed without restart; the log file will be resized on the next checkpoint. But, changing innodb_flush_method would require a server restart.

            marko Marko Mäkelä added a comment -
            digdilem Simon Avery added a comment -

            Interesting. Is that dynamic change an Enterprise-only feature?

            Trying it in community 10.5.12 (Rocky8/EL8) doesn't seem to fly, and the docs state it's not dynamic.

            MariaDB [(none)]> set global innodb_log_file_size = 26843545600;
            ERROR 1238 (HY000): Variable 'innodb_log_file_size' is a read only variable


            digdilem, that is right, the Community Server only allows changing innodb_log_file_size in the start-up configuration, to take effect when the server is restarted.

            marko Marko Mäkelä added a comment -
            digdilem Simon Avery made changes -
            Attachment screenshot-2.png [ 59650 ]
            digdilem Simon Avery added a comment -

            Hi Marko.

            Update - no change.

            I raised innodb_log_file_size to equal innodb_buffer_pool_size. The same server was restarted at 06:00 on the 5th and picked up the change:

            MariaDB [(none)]> show variables like 'innodb_log_file_size';
            +----------------------+-------------+
            | Variable_name        | Value       |
            +----------------------+-------------+
            | innodb_log_file_size | 26843545600 |
            +----------------------+-------------+
            1 row in set (0.001 sec)

            On checking for the same log message, I am still seeing it at high frequency in 10.5.12-MariaDB-log:

            2021-10-06 23:33:29 1369618 [Note] InnoDB: Cannot close file ./DB1/TB1.ibd because of 0 pending operations and pending fsync

            So it doesn't look like the frequency is affected by thrashing of the binary log due to a small innodb_log_file_size - attached is a 7-day frequency chart.

            Hope this helps diagnose and fix.


            digdilem, the output will be rate-limited in the upcoming 10.5.13 and 10.6.5 releases. I meant that the likelihood of seeing any such message at all should be related to the open files limit and the frequency of InnoDB log checkpoints. The bug simply was that we would output long bursts of those messages when the open files limit is being exceeded. The limit may still be exceeded; that cannot be prevented easily. I suppose that it is more likely when running on slower storage or when not using innodb_flush_method=O_DIRECT. This has nothing to do with the binary log; the checkpoints involve only the ib_logfile0 and the persistent InnoDB data files.

            marko Marko Mäkelä added a comment -
            ralfbecker Ralf Becker made changes -
            Attachment mysql-0-20211028.log [ 60054 ]
            ralfbecker Ralf Becker added a comment -

            I want to raise attention to the combination of this bug with a Galera cluster: most of the time this bug takes out ALL nodes of the cluster.

            We hit this again this morning, for the 3rd time in 4 weeks, on our geo-distributed 5-node Galera cluster, which we recently updated to 10.5.12.

            What it causes on Galera is that all nodes accumulate more and more queries until the cluster becomes completely unresponsive for updating queries. Trying to kill the queries won't work; they are stuck in state "killed" and the situation does not improve.

            The only way we have found so far out of that situation is to completely shut down the cluster / all nodes and restart them one by one with an SST.

            I'm attaching the log of one of our nodes, in case someone wants to check it (I had to X out the database names, as they are customer names).

            Is there a date for 10.5.13, or do we need to downgrade to 10.4 to fix the problem?

            Ralf

            mysql-0-20211028.log

            ralfbecker Ralf Becker added a comment -

            I forgot to mention that we already used innodb_flush_method=O_DIRECT, and after the 2nd incident and finding this bug report we also set innodb_log_file_size=innodb_buffer_pool_size=8GB.

            The above did NOT prevent the described issue on a Galera cluster!

            Ralf

            gopinathkarangula Gopinath added a comment -

            Hi,

            Encountered this issue even with MariaDB 10.5.13 Enterprise Edition.

            2022-01-11 8:40:52 0 [Note] InnoDB: 10.5.13 started; log sequence number 361848522653; transaction id 405336
            2022-01-11 8:40:52 0 [Note] InnoDB: Creating #1 encryption thread id 140471212390144 total threads 4.
            2022-01-11 8:40:52 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
            2022-01-11 8:40:52 0 [Note] InnoDB: Creating #2 encryption thread id 140471103624960 total threads 4.
            2022-01-11 8:40:52 0 [Note] InnoDB: Creating #3 encryption thread id 140471095232256 total threads 4.
            2022-01-11 8:40:52 0 [Note] InnoDB: Creating #4 encryption thread id 140471086839552 total threads 4.
            2022-01-11 8:40:52 0 [Note] Plugin 'FEEDBACK' is disabled.
            2022-01-11 8:40:52 0 [Note] Using encryption key id 2 for temporary files
            2022-01-11 8:40:52 0 [Note] InnoDB: Buffer pool(s) load completed at 220111 8:40:52
            2022-01-11 8:40:52 0 [Warning] Failed to create a socket for IPv6 '::': errno: 97.
            2022-01-11 8:40:52 0 [Note] Server socket created on IP: '0.0.0.0'.
            2022-01-11 8:40:52 0 [Warning] 'proxies_priv' entry '@% root@xxxxxxx' ignored in --skip-name-resolve mode.
            2022-01-11 8:40:52 0 [Note] Reading of all Master_info entries succeeded
            2022-01-11 8:40:52 0 [Note] Added new Master_info '' to hash table
            2022-01-11 8:40:52 0 [Note] mysqld: ready for connections.
            Version: '10.5.13-9-MariaDB-enterprise-log' socket: '/var/run/mysqld/mysqld.sock' port: 53805 MariaDB Enterprise Server
            2022-01-11 8:57:51 21 [Note] Start binlog_dump to slave_server(52396), pos(mysqld-bin.000001, 369), using_gtid(0), gtid('')
            2022-01-11 9:10:35 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_avl_hist.ibd because of 206 pending operations and pending fsync
            2022-01-11 9:13:40 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_bp_hist.ibd because of pending fsync
            2022-01-11 9:14:42 22 [Note] InnoDB: Cannot close file ./ownbfmprd/fallback_stf_stat_availability_office_xs#P#p736782.ibd because of 1 pending operations
            2022-01-11 9:23:08 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_crystd_hist.ibd because of pending fsync
            2022-01-11 9:25:07 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_ediavl_hist.ibd because of 222 pending operations and pending fsync
            2022-01-11 9:27:11 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_edistd_hist.ibd because of pending fsync
            2022-01-11 9:27:24 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_ab_targetoffice_mp#P#p738529.ibd because of 1 pending operations
            2022-01-11 9:27:33 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_bench_met.ibd because of 1 pending operations
            2022-01-11 9:28:39 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_excluded_hist.ibd because of pending fsync
            2022-01-11 9:29:35 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_excluded_lfs_hist.ibd because of 146 pending operations and pending fsync
            2022-01-11 9:31:17 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_fop_hist.ibd because of pending fsync
            2022-01-11 9:32:38 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_fp_hist.ibd because of pending fsync
            2022-01-11 10:57:54 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_mp_hist.ibd because of pending fsync


            marko Marko Mäkelä added a comment -

            gopinathkarangula, what you encountered is not the same as the originally reported issue, where the server error log was flooded with such messages. The rate-limiting seems to work for you. Please file a new ticket for this problem if you think that the configuration is reasonable.

            ralfbecker, by innodb_buffer_cache_size I assume that you mean innodb_buffer_pool_size. Does 10.5.13 or 10.6.5 work for you?

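            A quick way to confirm which of the two variable names the server actually recognizes (a generic check, not specific to this report) is a pattern match against the global variables:

```sql
-- Returns one row for the real variable; an unknown name such as
-- innodb_buffer_cache_size simply matches nothing (empty result set).
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_cache_size';
```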
            ralfbecker Ralf Becker added a comment -

            @Marko Mäkelä: yes, innodb_buffer_pool_size, but it seems to have had no effect on the issue we had.
            Since the update to 10.5.13 it has never appeared again - so far.

            Ralf


            People

              Assignee: marko Marko Mäkelä
              Reporter: elenst Elena Stepanova
              Votes: 4
              Watchers: 11
