MariaDB Server
MDEV-25215

Excessive logging "Cannot close file ./test/FTS_xxx.ibd because of n pending operations and pending fsync"

Details

    Description

      Within a ~1 min test on current 10.6 with the default log_warnings level, I'm getting ~8500 records like this in the error log:

      2021-03-22  2:58:35 8 [Note] InnoDB: Cannot close file ./test/FTS_00000000000012c8_00000000000015f9_INDEX_1.ibd because of 0 pending operations and pending fsync
      2021-03-22  2:58:35 8 [Note] InnoDB: Cannot close file ./test/FTS_00000000000012c8_00000000000015f9_INDEX_2.ibd because of 0 pending operations and pending fsync
      2021-03-22  2:58:35 8 [Note] InnoDB: Cannot close file ./test/FTS_00000000000012c8_00000000000015f9_INDEX_4.ibd because of 0 pending operations and pending fsync
      2021-03-22  2:58:35 8 [Note] InnoDB: Cannot close file ./test/FTS_00000000000012c8_00000000000015f9_INDEX_5.ibd because of 1 pending operations and pending fsync
      2021-03-22  2:58:35 8 [Note] InnoDB: Cannot close file ./test/FTS_00000000000012c8_00000000000015f9_INDEX_6.ibd because of 0 pending operations and pending fsync
      ...
      

      10.5 seems to produce only those with "0 pending operations", so there are fewer of them, but still ~4000 in a few-minute test. And these ones (with zero) are especially confusing.

      I think that if these are real problems which a user needs to take care of, they should be warnings or errors (and given the persistence, maybe we are missing at least a debug assertion); and if these are just technical notifications, there shouldn't be so many of them.
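      Quantifying the flood described above is easiest with a quick pass over the error log. A minimal shell sketch; the log contents below are a fabricated stand-in for illustration, and the real error-log path varies by distribution:

```shell
# Stand-in error log; in practice point this at the server's error log
# (the path varies by distribution, e.g. /var/lib/mysql/<host>.err).
log=$(mktemp)
cat > "$log" <<'EOF'
2021-03-22 2:58:35 8 [Note] InnoDB: Cannot close file ./test/FTS_x_INDEX_1.ibd because of 0 pending operations and pending fsync
2021-03-22 2:58:35 8 [Note] InnoDB: Cannot close file ./test/FTS_x_INDEX_2.ibd because of 0 pending operations and pending fsync
2021-03-22 2:59:01 8 [Note] InnoDB: Cannot close file ./test/FTS_x_INDEX_4.ibd because of 1 pending operations and pending fsync
EOF

# Count "Cannot close file" notes per minute: field 1 is the date,
# field 2 the time; strip the seconds before aggregating.
per_min=$(awk '/Cannot close file/ { split($2, t, ":"); print $1, t[1] ":" t[2] }' "$log" \
  | sort | uniq -c)
echo "$per_min"
rm -f "$log"
```

      This makes it easy to see whether the notes arrive in sustained bursts (thousands per minute, as reported here) or only occasionally.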

      Attachments

        1. cannotclosefile.JPG
          cannotclosefile.JPG
          83 kB
        2. mycnf.txt
          2 kB
        3. mysql-0-20211028.log
          219 kB
        4. screenshot-1.png
          screenshot-1.png
          19 kB
        5. screenshot-2.png
          screenshot-2.png
          35 kB

        Issue Links

          Activity

            elenst Elena Stepanova created issue -
            digdilem Simon Avery added a comment - - edited

            We encountered this bug today on a busy production server, and researching it brought us here.

            MariaDB server version: 10.5.11 (from MariaDB's C7 repo)
            OS: CentOS 7

            2021-08-09 22:49:11 5443978 [Note] InnoDB: Cannot close file ./DB0/Tab1.ibd because of 20 pending operations and pending fsync
            2021-08-03 13:05:10 145279 [Note] InnoDB: Cannot close file ./DB1/Tab1#P#per.ibd because of 0 pending operations and pending fsync
            2021-08-03 13:05:10 145911 [Note] InnoDB: Cannot close file ./DB1/Tab1#P#per.ibd because of 0 pending operations and pending fsync
            2021-08-03 13:05:10 145515 [Note] InnoDB: Cannot close file ./DB1/Tab1#P#per.ibd because of 0 pending operations and pending fsync

            Variations of this, affecting multiple databases and tables.

            Total number of similar log entries is 200149 during this period.

            Restarting MariaDB appears to have (temporarily) resolved the issue.

            digdilem Simon Avery made changes -
            Field Original Value New Value
            Attachment screenshot-1.png [ 58600 ]
            digdilem Simon Avery added a comment -

            Still ongoing for us. There is some suggestion that it is linked to queries that aren't completing, although I can't confirm that definitively.

            To give some indication of the sheer number of lines this is creating in the log, here are the counts from one server over 14 days.


            digdilem, do you see this with MariaDB 10.5.12? MDEV-25113, MDEV-25801, MDEV-25954 were fixed in that release.

            marko Marko Mäkelä added a comment -
            digdilem Simon Avery added a comment -

            Hi Marko

            Maybe, maybe not...

            I have two machines reporting this.

            Server 1 is on 10.5.9, so let's not focus on that. That's still reporting it occasionally, but not in volume.

            Server 2 was upgraded from 10.5.11 to 10.5.12 (from MariaDB.org's CentOS7/EL7 repos) on 3rd August (just before my last report). It's still reporting occasional output like so:

            2021-08-21 21:39:37 4117695 [Note] InnoDB: Cannot close file ./Db1/Tb1.ibd because of 0 pending operations and pending fsync

            However, nothing more recent than this - so that's 10 days clear.

            I'll continue to monitor and should it resurface on anything 10.5.12 onwards, I'll update this.

            Thanks all,


            elenst, can you please check whether you can reproduce this yourself with recent 10.5 or 10.6 trees?

            I see that we had an internal discussion about this on August 30, and mleich only found a very small number of occurrences (I guess from runs with rr record, which at times creates very ‘unfair’ scheduling of threads). He conducted some tests on HDD in order to get more realistic I/O performance and was not able to reproduce massive amounts of this.

            marko Marko Mäkelä added a comment -

            No, sorry, I can't, not in the near future. I assume that if Matthias was able to reproduce them at all, he can take one of the versions for which the complaints were raised and run the tests on a normal non-rr release-like build.

            elenst Elena Stepanova added a comment -
            julien.fritsch Julien Fritsch made changes -
            Assignee Marko Mäkelä [ marko ] Elena Stepanova [ elenst ]
            elenst Elena Stepanova made changes -
            Assignee Elena Stepanova [ elenst ] Marko Mäkelä [ marko ]

            digdilem, how does it look after more time running MariaDB 10.5.12?

            Apart from MDEV-25113, MDEV-25801, MDEV-25954, which were fixed in 10.5.12, there is also MDEV-26445 that could be of interest, once the fix is available.

            marko Marko Mäkelä added a comment -
            digdilem Simon Avery made changes -
            Attachment cannotclosefile.JPG [ 59112 ]
            digdilem Simon Avery made changes -
            Attachment mycnf.txt [ 59113 ]
            digdilem Simon Avery added a comment -

            @Marko Mäkelä - thanks for following up.

            This is still an ongoing issue.

            Below is a 14-day plot for this string from the same production server as before. This is CentOS 7 and a busy machine.

            Aside from the increased logging and marginal overhead associated with that, I have not noticed any other effects, nor have any issues been reported by the users.

            We run another 27 MariaDB 10.5.12 servers on C7, but no others show this problem. The MariaDB configurations are not all identical, and the problem server has more RAM (64G) and activity than most.

            I've also attached the my.cnf* variables, in case anything odd pops out that might be linked.

            Thank you

            MariaDB [(none)]> select version();
            +---------------------+
            | version()           |
            +---------------------+
            | 10.5.12-MariaDB-log |
            +---------------------+

            mycnf.txt

            mleich Matthias Leich added a comment - - edited

            I have replayed the problem on a 10.6 tree from mid September 2021.
            After I removed the "Versioning" feature from the test, the problem
            disappears entirely or at least becomes extremely rare (no replay within ~2000 RQG tests).

            julien.fritsch Julien Fritsch made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]

            digdilem, thank you. In mycnf.txt I see that your innodb_log_file_size is only half a gigabyte, which is a fraction of the innodb_buffer_pool_size. That will force very frequent log checkpoints, which in turn will cause stalls. That could actually be the root cause of your messages, if those messages always say "0 pending operations and pending fsync". I do not think that there is any need to execute fsync() or fdatasync() outside log checkpoints. Starting with 10.5, thanks to MDEV-19176, recovery should work fine even if you make the redo log as big as the buffer pool, or possibly even larger.

            Another factor is that if innodb_open_files is not specified, open_files_limit will be used instead. That may force frequent closing of data files. In mycnf.txt I see open_files_limit=65535, which may or may not be reasonable. But the innodb_log_file_size is definitely way too small.

            Internally, we have repeated something, but only when using system versioned tables and FULLTEXT INDEX. Theoretically, there could be a bug in some system versioning code that forgets to commit a mini-transaction (which would release a page latch). But I think that such a bug should cause more trouble than just these messages. At the very least, it should prevent log checkpoints and cause log file overruns, for which we have a separate message.

            Some of the logs from our internal testing show a nonzero "pending operations" count, but we also have logs that exclusively show "0 pending operations and pending fsync". I see that within a second, we can issue the message for several files. For us, they are the numerous internal tables for FULLTEXT INDEX for a single user table. In one log that I checked, all file names would start with the same prefix that includes the numeric main table identifier, such as test/FTS_000000000000125b_. Those messages were almost certainly issued by the LRU logic that is attempting to enforce the ‘open files’ limit in InnoDB.

            We should definitely rate-limit that output somehow, even if this run is using a ‘misconfigured’ system. If all files are located in the same journaled file system, all files should be ‘on the same boat’ waiting for the fsync() or fdatasync() to complete.

            marko Marko Mäkelä added a comment -
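            The sizing advice above can be turned into a quick sanity check. A shell sketch with hard-coded example values (a 64 GiB buffer pool and a 512 MiB redo log, matching the configuration discussed; the one-quarter threshold is an illustrative choice, not an official rule):

```shell
# Example values; in practice read them from the server, e.g.
#   mysql -Nse "SELECT @@innodb_buffer_pool_size, @@innodb_log_file_size"
buffer_pool_size=$((64 * 1024 * 1024 * 1024))  # 64 GiB
log_file_size=$((512 * 1024 * 1024))           # 512 MiB

# A redo log much smaller than the buffer pool forces frequent
# checkpoints; flag anything under a quarter of the pool size.
if [ "$log_file_size" -lt $((buffer_pool_size / 4)) ]; then
  verdict="undersized: consider raising innodb_log_file_size"
else
  verdict="ok"
fi
echo "innodb_log_file_size check: $verdict"
```

            Per the comment above, starting with 10.5 the redo log can safely be made as large as the buffer pool, or even larger.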

            Clarification: We normally execute frequent fdatasync() or fsync() on the InnoDB write-ahead log (ib_logfile0). On data files, I would believe that we only do that on log checkpoint or when extending files.

            When using innodb_flush_method=O_DIRECT (which MDEV-24854 finally made the default on 10.6) in combination with innodb_use_native_aio=ON, the fsync() or fdatasync() for data files seems to be a no-op on my development system (ext4fs on NVMe, SSD, or HDD). You should check that the server is actually using native asynchronous I/O (io_submit() and friends) and not simulated I/O. With libaio (the only option on Linux before MDEV-24883 in 10.6 introduced io_uring) it is possible that you must adjust /proc/sys/fs/aio-max-nr so that io_setup() will not fail on startup.

            marko Marko Mäkelä added a comment -
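            The AIO limit check suggested above can be scripted. A sketch; /proc/sys/fs/aio-max-nr is the standard Linux location, but it may be absent (e.g. in minimal containers), hence the fallback:

```shell
# Report the kernel limit on AIO contexts that io_setup() is subject to.
if [ -r /proc/sys/fs/aio-max-nr ]; then
  aio_max=$(cat /proc/sys/fs/aio-max-nr)
else
  aio_max=unavailable
fi
echo "fs.aio-max-nr = $aio_max"
```

            If the value is too low, it can be raised with `sysctl -w fs.aio-max-nr=<N>`; the server's startup log should indicate whether native asynchronous I/O could not be initialized.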
            marko Marko Mäkelä added a comment - - edited

            I am able to reproduce the excessive logging with the following test:

            --source include/have_innodb.inc
            --source include/have_partition.inc
            --source include/have_sequence.inc
             
            CREATE TABLE t1 (pk INT AUTO_INCREMENT PRIMARY KEY, b INT) ENGINE=InnoDB
            PARTITION BY key (pk) PARTITIONS 100;
             
            INSERT INTO t1 (pk) SELECT * FROM seq_1_to_100;
            BEGIN;
            let $n=500;
            while ($n)
            {
            eval UPDATE t1 SET b=pk+$n;
            dec $n;
            }
            COMMIT;
            DROP TABLE t1;
            

            Edit: And with the following .opt file:

            --innodb-open-files=10
            --innodb-log-file-size=4m
            

            The messages are displayed by fil_node_open_file() and fil_space_t::try_to_close(). I think that both need to be revised to limit the messages to at most 1 per fil_node_open_file() call, only in the case when the maximum configured number of open files will be exceeded.

            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Component/s Full-text Search [ 10104 ]
            Fix Version/s 10.6 [ 24028 ]
            Labels regression-10.5

            Here is a slightly simpler and smaller version of the test.

            --source include/have_innodb.inc
            --source include/have_partition.inc
            --source include/have_sequence.inc
             
            CREATE TABLE t1 (pk INT AUTO_INCREMENT PRIMARY KEY) ENGINE=InnoDB
            PARTITION BY KEY (pk) PARTITIONS 100;
             
            INSERT INTO t1 SELECT * FROM seq_1_to_100;
            --disable_query_log
            let $n=300;
            while ($n)
            {
            BEGIN; DELETE FROM t1; ROLLBACK;
            dec $n;
            }
            --enable_query_log
            DROP TABLE t1;
            

            marko Marko Mäkelä added a comment -

            We will display a message for at most one file every 5 seconds.

            marko Marko Mäkelä added a comment -
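            The policy above (at most one such note every 5 seconds) can also be applied after the fact to condense logs from unfixed servers. This is an editor's sketch, not the server-side fix; it assumes all entries fall within one day, and the sample lines are fabricated:

```shell
# Fabricated sample burst standing in for a real error log.
log=$(mktemp)
cat > "$log" <<'EOF'
2021-03-22 02:58:35 8 [Note] InnoDB: Cannot close file ./t/a.ibd because of 0 pending operations and pending fsync
2021-03-22 02:58:36 8 [Note] InnoDB: Cannot close file ./t/b.ibd because of 0 pending operations and pending fsync
2021-03-22 02:58:41 8 [Note] InnoDB: Cannot close file ./t/c.ibd because of 0 pending operations and pending fsync
EOF

# Pass through everything except "Cannot close file" bursts, of which at
# most one line per 5-second window is kept (simplistic: seconds-of-day
# computed from field 2, hh:mm:ss, so midnight wraparound is ignored).
filtered=$(awk '
  !/Cannot close file/ { print; next }
  {
    split($2, t, ":")
    ts = t[1] * 3600 + t[2] * 60 + t[3]
    if (last == "" || ts - last >= 5) { print; last = ts }
  }' "$log")
echo "$filtered"
rm -f "$log"
```

            On the sample above this keeps the first and third lines and drops the one in between, mirroring the behaviour the fix introduces.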
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2021-09-30 07:40:16.0 2021-09-30 07:40:16.078
            marko Marko Mäkelä made changes -
            Fix Version/s 10.5.13 [ 26026 ]
            Fix Version/s 10.6.5 [ 26034 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Resolution Fixed [ 1 ]
            Status Confirmed [ 10101 ] Closed [ 6 ]
            digdilem Simon Avery added a comment -

            @Marko - Thanks for your comments and suggestions on our configuration, notably `innodb_log_file_size` - appreciated.

            I've made the change on that specific server (setting it to 25G to match innodb_buffer_pool_size), although as it's not a dynamic variable I won't get a chance to restart the server to pick up the change for a few days, so I won't immediately be able to report whether it eliminates or reduces these occurrences.

            Thanks again.


            digdilem, you’re welcome. In the MariaDB Enterprise Server starting with version 10.5, innodb_log_file_size can be changed without restart; the log file will be resized on the next checkpoint. But, changing innodb_flush_method would require a server restart.

            marko Marko Mäkelä added a comment -
            digdilem Simon Avery added a comment -

            Interesting. Is that dynamic change an Enterprise-only feature?

            Trying it in community 10.5.12 (Rocky8/EL8) doesn't seem to fly, and the docs state it's not dynamic.

            MariaDB [(none)]> set global innodb_log_file_size = 26843545600;
            ERROR 1238 (HY000): Variable 'innodb_log_file_size' is a read only variable


            digdilem, that is right, the Community Server only allows changing innodb_log_file_size in the start-up configuration, to take effect when the server is restarted.

            marko Marko Mäkelä added a comment -
            digdilem Simon Avery made changes -
            Attachment screenshot-2.png [ 59650 ]
            digdilem Simon Avery added a comment -

            Hi Marko.

            Update - no change.

            I raised innodb_log_file_size to equal innodb_buffer_pool_size. The same server was restarted at 06:00 on the 5th and picked up the change:

            MariaDB [(none)]> show variables like 'innodb_log_file_size';
            +----------------------+-------------+
            | Variable_name        | Value       |
            +----------------------+-------------+
            | innodb_log_file_size | 26843545600 |
            +----------------------+-------------+
            1 row in set (0.001 sec)

            On checking for the same log message, I am still seeing it at high frequency in 10.5.12-MariaDB-log:

            2021-10-06 23:33:29 1369618 [Note] InnoDB: Cannot close file ./DB1/TB1.ibd because of 0 pending operations and pending fsync

            So it doesn't look like the frequency is affected by thrashing of the binary log due to a small innodb_log_file_size - attached is a 7-day frequency chart.

            Hope this helps diagnose and fix.


            digdilem, the output will be rate-limited in the upcoming 10.5.13 and 10.6.5 releases. I meant that the likelihood of seeing any such message at all should be related to the open files limit and the frequency of InnoDB log checkpoints. The bug simply was that we would output long bursts of those messages when the open files limit is being exceeded. The limit may still be exceeded; that cannot be prevented easily. I suppose that it is more likely when running on slower storage or when not using innodb_flush_method=O_DIRECT. This has nothing to do with the binary log; the checkpoints involve only the ib_logfile0 and the persistent InnoDB data files.

            marko Marko Mäkelä added a comment -
            ralfbecker Ralf Becker made changes -
            Attachment mysql-0-20211028.log [ 60054 ]
            ralfbecker Ralf Becker added a comment -

            I want to raise attention to the combination of this bug with a Galera cluster: most of the time this bug takes out ALL nodes of the cluster.

            We hit this again this morning, for the 3rd time in 4 weeks, on our geo-distributed 5-node Galera cluster, which we recently updated to 10.5.12.

            What it causes on Galera is that all nodes accumulate more and more queries until the cluster becomes completely unresponsive for updating queries. Trying to kill the queries won't work; they are stuck in state "killed" and the situation does not improve.

            The only way we have found so far out of that situation is to completely shut down the cluster / all nodes and restart them one by one with an SST.

            I'm attaching the log of one of our nodes, in case someone wants to check it (I had to X out the database names, as they are customer names).

            Is there a date for 10.5.13, or do we need to downgrade to 10.4 to fix the problem?

            Ralf

            mysql-0-20211028.log

            ralfbecker Ralf Becker added a comment -

            I forgot to mention that we already used innodb_flush_method=O_DIRECT, and after the 2nd incident and finding this bug report we also set innodb_log_file_size=innodb_buffer_pool_size=8GB.

            The above did NOT prevent the described issue on a Galera cluster!

            Ralf

            gopinathkarangula Gopinath added a comment -

            Hi,

            Encountered this issue even with MariaDB 10.5.13 Enterprise Edition.

            2022-01-11 8:40:52 0 [Note] InnoDB: 10.5.13 started; log sequence number 361848522653; transaction id 405336
            2022-01-11 8:40:52 0 [Note] InnoDB: Creating #1 encryption thread id 140471212390144 total threads 4.
            2022-01-11 8:40:52 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
            2022-01-11 8:40:52 0 [Note] InnoDB: Creating #2 encryption thread id 140471103624960 total threads 4.
            2022-01-11 8:40:52 0 [Note] InnoDB: Creating #3 encryption thread id 140471095232256 total threads 4.
            2022-01-11 8:40:52 0 [Note] InnoDB: Creating #4 encryption thread id 140471086839552 total threads 4.
            2022-01-11 8:40:52 0 [Note] Plugin 'FEEDBACK' is disabled.
            2022-01-11 8:40:52 0 [Note] Using encryption key id 2 for temporary files
            2022-01-11 8:40:52 0 [Note] InnoDB: Buffer pool(s) load completed at 220111 8:40:52
            2022-01-11 8:40:52 0 [Warning] Failed to create a socket for IPv6 '::': errno: 97.
            2022-01-11 8:40:52 0 [Note] Server socket created on IP: '0.0.0.0'.
            2022-01-11 8:40:52 0 [Warning] 'proxies_priv' entry '@% root@xxxxxxx' ignored in --skip-name-resolve mode.
            2022-01-11 8:40:52 0 [Note] Reading of all Master_info entries succeeded
            2022-01-11 8:40:52 0 [Note] Added new Master_info '' to hash table
            2022-01-11 8:40:52 0 [Note] mysqld: ready for connections.
            Version: '10.5.13-9-MariaDB-enterprise-log' socket: '/var/run/mysqld/mysqld.sock' port: 53805 MariaDB Enterprise Server
            2022-01-11 8:57:51 21 [Note] Start binlog_dump to slave_server(52396), pos(mysqld-bin.000001, 369), using_gtid(0), gtid('')
            2022-01-11 9:10:35 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_avl_hist.ibd because of 206 pending operations and pending fsync
            2022-01-11 9:13:40 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_bp_hist.ibd because of pending fsync
            2022-01-11 9:14:42 22 [Note] InnoDB: Cannot close file ./ownbfmprd/fallback_stf_stat_availability_office_xs#P#p736782.ibd because of 1 pending operations
            2022-01-11 9:23:08 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_crystd_hist.ibd because of pending fsync
            2022-01-11 9:25:07 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_ediavl_hist.ibd because of 222 pending operations and pending fsync
            2022-01-11 9:27:11 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_edistd_hist.ibd because of pending fsync
            2022-01-11 9:27:24 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_ab_targetoffice_mp#P#p738529.ibd because of 1 pending operations
            2022-01-11 9:27:33 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_bench_met.ibd because of 1 pending operations
            2022-01-11 9:28:39 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_excluded_hist.ibd because of pending fsync
            2022-01-11 9:29:35 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_excluded_lfs_hist.ibd because of 146 pending operations and pending fsync
            2022-01-11 9:31:17 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_fop_hist.ibd because of pending fsync
            2022-01-11 9:32:38 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_fp_hist.ibd because of pending fsync
            2022-01-11 10:57:54 22 [Note] InnoDB: Cannot close file ./ownbfmprd/stf_stat_availability_office_mp_hist.ibd because of pending fsync


            marko Marko Mäkelä added a comment -

            gopinathkarangula, what you encountered is not the same as the originally reported issue, where the server error log was flooded with such messages. The rate-limiting seems to work for you. Please file a new ticket for this problem if you think that the configuration is reasonable.

            ralfbecker, by innodb_buffer_cache_size I assume that you mean innodb_buffer_pool_size. Does 10.5.13 or 10.6.5 work for you?

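            A quick way to confirm which of the two variable names the server actually recognizes (a generic check, not specific to this report) is a pattern match against the global variables:

```sql
-- Returns one row for the real variable; an unknown name such as
-- innodb_buffer_cache_size simply matches nothing (empty result set).
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_cache_size';
```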
            ralfbecker Ralf Becker added a comment -

            @Marko Mäkelä: yes, innodb_buffer_pool_size, but it seems to have had no effect on the issue we had.
            Since the update to 10.5.13 it has never appeared again - so far.

            Ralf


            People

              Assignee: marko Marko Mäkelä
              Reporter: elenst Elena Stepanova
              Votes: 4
              Watchers: 11
