Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33379

innodb_log_file_buffering=OFF causes corruption on bcachefs

Details

    Description

      Hello,

      When I installed MariaDB and try to init databases using
      ```sh
      mariadb-install-db --user=mysql --basedir=/usr --datadir=/var/lib/mysql
      ```
      I receive this kind of crash:
      ```
      mariadb-install-db --user=mysql --basedir=/usr --datadir=/var/lib/mysql --verbose
      Installing MariaDB/MySQL system tables in '/var/lib/mysql' ...
      2024-02-03 15:30:30 0 [Note] Starting MariaDB 11.2.2-MariaDB source revision 929532a9426d085111c24c63de9c23cc54382259 as process 28383
      2024-02-03 15:30:30 0 [Note] InnoDB: The first data file './ibdata1' did not exist. A new tablespace will be created!
      2024-02-03 15:30:30 0 [Note] InnoDB: Compressed tables use zlib 1.3
      2024-02-03 15:30:30 0 [Note] InnoDB: Number of transaction pools: 1
      2024-02-03 15:30:30 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2024-02-03 15:30:30 0 [Note] InnoDB: Using liburing
      2024-02-03 15:30:30 0 [Note] InnoDB: Initializing buffer pool, total size = 128.000MiB, chunk size = 2.000MiB
      2024-02-03 15:30:30 0 [Note] InnoDB: Completed initialization of buffer pool
      2024-02-03 15:30:30 0 [Note] InnoDB: Setting file './ibdata1' size to 12.000MiB. Physically writing the file full; Please wait ...
      2024-02-03 15:30:30 0 [Note] InnoDB: File './ibdata1' size is now 12.000MiB.
      2024-02-03 15:30:30 0 [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
      2024-02-03 15:30:30 0 [Warning] InnoDB: 512 bytes should have been written at 12288 from (unknown file), but got only 0. Retrying.
      2024-02-03 15:30:30 0 [Warning] InnoDB: 512 bytes should have been written at 12288 from (unknown file), but got only 0. Retrying.
      2024-02-03 15:30:30 0 [Warning] InnoDB: 512 bytes should have been written at 12288 from (unknown file), but got only 0. Retrying.
      2024-02-03 15:30:30 0 [Warning] InnoDB: 512 bytes should have been written at 12288 from (unknown file), but got only 0. Retrying.
      2024-02-03 15:30:30 0 [Warning] InnoDB: 512 bytes should have been written at 12288 from (unknown file), but got only 0. Retrying.
      2024-02-03 15:30:30 0 [Warning] InnoDB: 512 bytes should have been written at 12288 from (unknown file), but got only 0. Retrying.
      2024-02-03 15:30:30 0 [Warning] InnoDB: 512 bytes should have been written at 12288 from (unknown file), but got only 0. Retrying.
      2024-02-03 15:30:30 0 [Warning] InnoDB: 512 bytes should have been written at 12288 from (unknown file), but got only 0. Retrying.
      2024-02-03 15:30:30 0 [Warning] InnoDB: 512 bytes should have been written at 12288 from (unknown file), but got only 0. Retrying.
      2024-02-03 15:30:30 0 [Warning] InnoDB: 512 bytes should have been written at 12288 from (unknown file), but got only 0. Retrying.
      2024-02-03 15:30:30 0 [Warning] InnoDB: Retry attempts for writing partial data failed.
      2024-02-03 15:30:30 0 [ERROR] InnoDB: Write to file ib_logfile0 failed at offset 12288, 512 bytes should have been written, only 0 were written. Operating system error number 2. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.
      2024-02-03 15:30:30 0 [ERROR] InnoDB: Error number 2 means 'No such file or directory'
      2024-02-03 15:30:30 0 [Note] InnoDB: Some operating system error numbers are described at https://mariadb.com/kb/en/library/operating-system-error-codes/
      2024-02-03 15:30:30 0 [ERROR] [FATAL] InnoDB: write("ib_logfile0") returned I/O error
      240203 15:30:30 [ERROR] mysqld got signal 6 ;
      Sorry, we probably made a mistake, and this is a bug.

      Your assistance in bug reporting will enable us to fix this for the next release.
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 11.2.2-MariaDB source revision: 929532a9426d085111c24c63de9c23cc54382259
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=0
      max_threads=153
      thread_count=0
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 468089 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x49000
      /usr/bin/mariadbd(my_print_stacktrace+0x31)[0x5c8cc8cc4591]
      /usr/bin/mariadbd(handle_fatal_signal+0x47a)[0x5c8cc88a690a]
      /usr/lib/libc.so.6(+0x3a710)[0x7e1151bdc710]
      /usr/lib/libc.so.6(+0x8a83c)[0x7e1151c2c83c]
      /usr/lib/libc.so.6(raise+0x18)[0x7e1151bdc668]
      /usr/lib/libc.so.6(abort+0xd7)[0x7e1151bc44b8]
      /usr/bin/mariadbd(+0x5f0f29)[0x5c8cc8444f29]
      /usr/bin/mariadbd(+0x620b7f)[0x5c8cc8474b7f]
      /usr/bin/mariadbd(+0xd08b4f)[0x5c8cc8b5cb4f]
      /usr/bin/mariadbd(+0xdb04a1)[0x5c8cc8c044a1]
      /usr/bin/mariadbd(+0x5f9511)[0x5c8cc844d511]
      /usr/bin/mariadbd(+0x5e6d3e)[0x5c8cc843ad3e]
      /usr/bin/mariadbd(+0x5eadf6)[0x5c8cc843edf6]
      /usr/bin/mariadbd(+0xcc34d1)[0x5c8cc8b174d1]
      /usr/bin/mariadbd(_Z24ha_initialize_handlertonP13st_plugin_int+0x8e)[0x5c8cc88a7a2e]
      /usr/bin/mariadbd(+0x7d6a13)[0x5c8cc862aa13]
      /usr/bin/mariadbd(+0x7da466)[0x5c8cc862e466]
      /usr/bin/mariadbd(_Z11plugin_initPiPPci+0xe85)[0x5c8cc8635165]
      /usr/bin/mariadbd(+0x6a1821)[0x5c8cc84f5821]
      /usr/bin/mariadbd(_Z11mysqld_mainiPPc+0x2c1a)[0x5c8cc85024ea]
      /usr/lib/libc.so.6(+0x23cd0)[0x7e1151bc5cd0]
      /usr/lib/libc.so.6(__libc_start_main+0x8a)[0x7e1151bc5d8a]
      /usr/bin/mariadbd(_start+0x25)[0x5c8cc84bb0a5]
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /var/lib/mysql
      Resource Limits:
      Limit Soft Limit Hard Limit Units
      Max cpu time unlimited unlimited seconds
      Max file size unlimited unlimited bytes
      Max data size unlimited unlimited bytes
      Max stack size 8388608 unlimited bytes
      Max core file size 0 0 bytes
      Max resident set unlimited unlimited bytes
      Max processes 127351 127351 processes
      Max open files 32190 32190 files
      Max locked memory 8388608 8388608 bytes
      Max address space unlimited unlimited bytes
      Max file locks unlimited unlimited locks
      Max pending signals 127351 127351 signals
      Max msgqueue size 819200 819200 bytes
      Max nice priority 0 0
      Max realtime priority 0 0
      Max realtime timeout unlimited unlimited us
      Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

      Kernel version: Linux version 6.7.3-arch1-1 (linux@archlinux) (gcc (GCC) 13.2.1 20230801, GNU ld (GNU Binutils) 2.42.0) #1 SMP PREEMPT_DYNAMIC Thu, 01 Feb 2024 10:30:35 +0000

      /usr/bin/mariadb-install-db: line 581: 28383 Aborted (core dumped) "$mysqld_bootstrap" $defaults $defaults_group_suffix "$mysqld_opt" -bootstrap $silent_startup "basedir=$basedir" "datadir=$ldata" --log-warnings=0 --enforce-storage-engine="" "-plugin-dir=${plugindir}" $args --max_allowed_packet=8M --net_buffer_length=16K

      Installation of system tables failed! Examine the logs in
      /var/lib/mysql for more information.

      The problem could be conflicting information in an external
      my.cnf files. You can ignore these by doing:

      shell> /usr/bin/mariadb-install-db --defaults-file=~/.my.cnf

      You can also try to start the mariadbd daemon with:

      shell> /usr/bin/mariadbd --skip-grant-tables --general-log &

      and use the command line tool /usr/bin/mariadb
      to connect to the mysql database and look at the grant tables:

      shell> /usr/bin/mariadb -u root mysql
      MariaDB> show tables;

      Try '/usr/bin/mariadbd --help' if you have problems with paths. Using
      --general-log gives you a log in /var/lib/mysql that may be helpful.

      The latest information about mariadb-install-db is available at
      https://mariadb.com/kb/en/installing-system-tables-mysql_install_db
      You can find the latest source at https://downloads.mariadb.org and
      the maria-discuss email list at https://launchpad.net/~maria-discuss

      Please check all of the above before submitting a bug report
      at https://mariadb.org/jira
      ```

      I dont know whether is it mariaDB or bcachefs issue (as used filesystem). But I would like to help here so I found out some rules for working configurations that "solve" this issue. Working configuration is:
      ```sh
      innodb_flush_log_at_trx_commit = 2
      innodb_flush_method = O_DIRECT_NO_FSYNC
      ```
      `innodb_flush_method` can also be `nosync`. Everything else does not work (`fsync / O_DSYNC / littlesync / O_DIRECT`).

      `innodb_flush_log_at_trx_commit` must always be `= 2` regardless of `innodb_flush_method`. If I set 0 or 1 it just does not work also.

      With this configuration `mariadb-install-db` just works and I can start `mariadb.service`. I did not try real functionality of the mysql server but I expect that as I started without any errors then it could work.

      Could you please navigate me to find out what is wrong? Maybe it is a mariadb issue itself but maybe a filesystem issue. I would like to find out with your support what part of filesystem does not work as expected from mariadb point of view and then I can open an issue in bcachefs if you confirm it is not mariadb issue.

      Just for reference, my bcachefs volume is created using:
      ```sh
      bcachefs format --block_size=4k --label=ssd.ssd1 /dev/nvme0n1p2 --label=ssd.ssd2 /dev/nvme1n1p2 --label=hdd.hdd1 /dev/sda --label=hdd.hdd2 /dev/sdb --replicas=2 --foreground_target=ssd --promote_target=ssd --background_target=hdd
      ```

      Attachments

        Issue Links

          Activity

            The parameter innodb_flush_method was deprecated in MDEV-30136 (MariaDB Server 11.0).

            Can you please try differentn values of the 4 Boolean parameters:

            • innodb_log_file_buffering (disable O_DIRECT on ib_logfile0)
            • innodb_data_file_buffering (disable O_DIRECT on data files)
            • innodb_log_file_write_through (enable O_DSYNC on the log)
            • innodb_data_file_write_through (enable O_DSYNC on persistent data files)

            I would expect O_DIRECT to not be supported on many copy-on-write file systems that support transparent compression. Ideally, the file system should return an error from a system call that tries to enable O_DIRECT, instead of corrupting any data. As you can read in issues linked from MDEV-24854, also btrfs has had similar trouble with O_DIRECT.

            marko Marko Mäkelä added a comment - The parameter innodb_flush_method was deprecated in MDEV-30136 (MariaDB Server 11.0). Can you please try differentn values of the 4 Boolean parameters: innodb_log_file_buffering (disable O_DIRECT on ib_logfile0 ) innodb_data_file_buffering (disable O_DIRECT on data files) innodb_log_file_write_through (enable O_DSYNC on the log) innodb_data_file_write_through (enable O_DSYNC on persistent data files) I would expect O_DIRECT to not be supported on many copy-on-write file systems that support transparent compression. Ideally, the file system should return an error from a system call that tries to enable O_DIRECT , instead of corrupting any data. As you can read in issues linked from MDEV-24854 , also btrfs has had similar trouble with O_DIRECT .

            Thank you for your quick support.

            These are results from testing. I used default configuration (removed innodb_flush_log_at_trx_commit and innodb_flush_method).

            log_file_buffering data_file_buffering log_file_write_through data_file_write_through mariadb-install-db
            OFF OFF OFF OFF ERROR
            ON OFF OFF OFF OK
            OFF ON OFF OFF ERROR
            OFF OFF ON OFF ERROR
            OFF OFF OFF ON ERROR
            OFF OFF ON ON ERROR
            ON OFF ON OFF OK
            ON ON ON ON OK

            I opened an issue in bcachefs and will link this for reference. It looks like some issue with O_DIRECT.

            kezyr Miroslav Rýzek added a comment - Thank you for your quick support. These are results from testing. I used default configuration (removed innodb_flush_log_at_trx_commit and innodb_flush_method). log_file_buffering data_file_buffering log_file_write_through data_file_write_through mariadb-install-db OFF OFF OFF OFF ERROR ON OFF OFF OFF OK OFF ON OFF OFF ERROR OFF OFF ON OFF ERROR OFF OFF OFF ON ERROR OFF OFF ON ON ERROR ON OFF ON OFF OK ON ON ON ON OK I opened an issue in bcachefs and will link this for reference. It looks like some issue with O_DIRECT.

            kezyr, thank you for testing this. Your table does not quite look like a truth table with 4 inputs, because there are 5 instead of 4 lines innodb_log_file_buffering=OFF. In any case, the error seems to correlate exactly with that.

            One more difference between the log and data files is that the ib_logfile0 is being written synchronously (we may revisit MDEV-26603 some day) while the data files are typically written asynchronously via io_uring or libaio (io_setup(), io_submit(), io_getevents()). But, it does not look like this would be a problem. After all, we are using the simpler (synchronous) API for log file I/O.

            I see that on Microsoft Windows, we pass the flags FILE_FLAG_NO_BUFFERING (corresponding to O_DIRECT) and FILE_FLAG_WRITE_THROUGH (corresponding to O_DSYNC or O_SYNC) to the CreateFile() call, which corresponds to open(2) in POSIX environments. Before MDEV-30136, on POSIX like systems we never passed the O_DIRECT flag to open(2). Only the O_DSYNC or O_SYNC were be passed to it. The commit message includes the following:

            We will open files straight in O_DSYNC or O_SYNC mode when applicable. Data files we will try to open straight in O_DIRECT mode when the page size is at least 4096 bytes. For atomically creating data files, we will invoke os_file_set_nocache() to enable O_DIRECT afterwards, because O_DIRECT is not supported on some file systems. We will also continue to invoke os_file_set_nocache() on ib_logfile0 when innodb_log_file_buffering=OFF can be fulfilled.

            I think that we should be able to fix this in MariaDB Server, by removing the function os_file_set_nocache() and always trying to open the files in O_DIRECT mode when needed. We probably should fix this in 10.11 as well, because it is the newest long-term supported (LTS) release. Fixing this in older LTS (10.5 or 10.6) could require more extensive changes.

            marko Marko Mäkelä added a comment - kezyr , thank you for testing this. Your table does not quite look like a truth table with 4 inputs, because there are 5 instead of 4 lines innodb_log_file_buffering=OFF . In any case, the error seems to correlate exactly with that. One more difference between the log and data files is that the ib_logfile0 is being written synchronously (we may revisit MDEV-26603 some day) while the data files are typically written asynchronously via io_uring or libaio ( io_setup() , io_submit() , io_getevents() ). But, it does not look like this would be a problem. After all, we are using the simpler (synchronous) API for log file I/O. I see that on Microsoft Windows, we pass the flags FILE_FLAG_NO_BUFFERING (corresponding to O_DIRECT ) and FILE_FLAG_WRITE_THROUGH (corresponding to O_DSYNC or O_SYNC ) to the CreateFile() call, which corresponds to open(2) in POSIX environments. Before MDEV-30136 , on POSIX like systems we never passed the O_DIRECT flag to open(2) . Only the O_DSYNC or O_SYNC were be passed to it. The commit message includes the following: We will open files straight in O_DSYNC or O_SYNC mode when applicable. Data files we will try to open straight in O_DIRECT mode when the page size is at least 4096 bytes. For atomically creating data files, we will invoke os_file_set_nocache() to enable O_DIRECT afterwards, because O_DIRECT is not supported on some file systems. We will also continue to invoke os_file_set_nocache() on ib_logfile0 when innodb_log_file_buffering=OFF can be fulfilled. I think that we should be able to fix this in MariaDB Server, by removing the function os_file_set_nocache() and always trying to open the files in O_DIRECT mode when needed. We probably should fix this in 10.11 as well, because it is the newest long-term supported (LTS) release. Fixing this in older LTS (10.5 or 10.6) could require more extensive changes.

            Looks ok

            wlad Vladislav Vaintroub added a comment - Looks ok

            People

              marko Marko Mäkelä
              kezyr Miroslav Rýzek
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.