  1. MariaDB Server
  2. MDEV-25121

innodb_flush_method=O_DIRECT fails on compressed tables

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 10.5.9
    • Fix Version/s: 10.2.38, 10.3.29, 10.4.19, 10.5.10, 10.6.0
    • None

    Description

      Steps to reproduce (100% reproducible):

      1. Use either an Intel P4610 or a ScaleFlux CSD 2000 SSD for the Sysbench benchmark and set its logical sector size to 4 KiB (the Intel P4610 is used as the example below)

      [root@localhost~]# isdct start -intelssd 0 -nvmeformat LBAFormat=1
      [root@localhost~]# nvme list
      Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
      ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
      /dev/nvme0n1     PHLJ8230004W4P0DXX   INTEL SSDPE2KX040XX                      1         3.84 TB / 3.84 TB          4 KiB + 0 B      VDV10131
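Before running the benchmark, it can help to confirm the logical sector size that the kernel actually reports for the device. The following is a minimal sketch, assuming a Linux system that exposes the value in sysfs under /sys/block/<device>/queue/logical_block_size. The helper name logical_block_size and its sysfs_root parameter are hypothetical and exist only for this illustration.

```python
from pathlib import Path


def logical_block_size(device: str, sysfs_root: str = "/sys/block") -> int:
    """Return the logical block size in bytes that the kernel reports for a block device."""
    path = Path(sysfs_root) / device / "queue" / "logical_block_size"
    return int(path.read_text().strip())


if __name__ == "__main__":
    # "nvme0n1" matches the device in this report; adjust it for other systems.
    try:
        print(logical_block_size("nvme0n1"))
    except OSError as exc:
        print(f"cannot read block size: {exc}")
```

A device formatted as in step 1 would be expected to report 4096 here, and one left at the default 512-byte format would report 512.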

      2. Format the Intel P4610 with the ext4 file system and mount it

      [root@localhost~]# mkfs.ext4 /dev/nvme0n1
      [root@localhost~]# mount /dev/nvme0n1 -o discard /data/nvme0n1

      3. Install MariaDB 10.5.9 from the binary tar package and initialize it (steps omitted). The configuration options are as follows

      [client]
      socket=/data/sfdv0n1/mariadb-10.5.9/mysql.sock

      [mysqld_safe]
      user=mysql
      log_error=/data/sfdv0n1/mariadb-10.5.9/error.log

      [mysqld]
      socket=/data/sfdv0n1/mariadb-10.5.9/mysql.sock
      datadir=/data/sfdv0n1/mariadb-10.5.9
      basedir=/opt/app/mariadb-10.5.9
      user=mysql
      log_error=/data/sfdv0n1/mariadb-10.5.9/error.log
      explicit_defaults_for_timestamp=1

      innodb_page_size=16384
      innodb_buffer_pool_size=32G
      innodb_buffer_pool_instances=8
      innodb_page_cleaners=8
      innodb_log_file_size=8G
      innodb_log_buffer_size=128M
      innodb_flush_log_at_trx_commit=1
      innodb_thread_concurrency=0
      innodb_open_files=100000
      innodb_file_per_table=1
      innodb_flush_method=O_DIRECT
      innodb_change_buffering=all
      innodb_adaptive_flushing=1
      innodb_old_blocks_time=1000
      innodb_use_native_aio=1
      innodb_lock_wait_timeout=120
      lock_wait_timeout=60
      innodb_io_capacity_max=100000
      innodb_flush_neighbors=0
      innodb_log_write_ahead_size=8192

      innodb_doublewrite=1
      innodb_compression_algorithm=zlib
      max_connections=65536
      max_prepared_stmt_count=1048576

      4. Run sysbench. More than 20 GB of data must be written to disk to trigger the problem. The sysbench command is as follows

      [root@localhost~]# sysbench --db-driver=mysql --time=900 --threads=16 --report-interval=1 --mysql-socket=/data/sfdv0n1/mariadb-10.5.9/mysql.sock --mysql-user=qbench --mysql-password=qbench --mysql-db=sysbench --tables=32 --table-size=20000000 oltp_read_write --db-ps-mode=disable --percentile=99 --mysql-ignore-errors=1062,1213 --mysql_storage_engine=innodb --create_table_options="page_compressed=1" --rand-type=uniform prepare

      5. After a minute or so, sysbench reports that the connection was lost. The MariaDB error log then contains a crash message similar to the one shown in the figure below

      PS: If the logical sector size of the SSD is set to 512 bytes, the crash does not occur. We repeated the test on several different servers: whenever the SSD's logical sector size is set to 4 KiB, enabling page compression crashes the server process, and whenever it is set to 512 bytes, enabling page compression does not crash it. It seems that page compression does not properly support SSDs with 4 KiB logical sectors?
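One plausible way to read the difference between 512-byte and 4 KiB sectors: on Linux, O_DIRECT I/O generally requires the file offset and the transfer length to be multiples of the device's logical block size, and a misaligned request fails with EINVAL (OS error 22). The sketch below illustrates only that general alignment rule. It is not InnoDB code, the function names are hypothetical, and the 5000-byte compressed length is an invented example value.

```python
def round_up(length: int, block_size: int) -> int:
    """Round length up to the next multiple of block_size."""
    return -(-length // block_size) * block_size


def is_direct_io_aligned(offset: int, length: int, block_size: int) -> bool:
    """True if both offset and length are multiples of block_size."""
    return offset % block_size == 0 and length % block_size == 0


# Hypothetical example: a 16 KiB page that compressed down to 5000 bytes.
compressed_len = 5000
device_block_size = 4096  # logical sector size of the SSD in this report

for assumed_block_size in (512, 4096):
    write_len = round_up(compressed_len, assumed_block_size)
    ok = is_direct_io_aligned(0, write_len, device_block_size)
    print(f"assumed={assumed_block_size} write_len={write_len} aligned={ok}")
```

If the writer assumes a 512-byte block size, the payload is padded to 5120 bytes, which is not a multiple of 4096 and would be rejected on a 4 KiB device. Padding to the real block size gives 8192 bytes, which is aligned.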

          Activity

            xiaoboluo768 Xiaobo Luo added a comment -

            Okay, thank you Marko

            xiaoboluo768 Xiaobo Luo added a comment -

            I think I should wait for this problem to be fixed before I run the stress test script, because the test is not very urgent at the moment.

            marko Marko Mäkelä added a comment -

            There was an earlier attempt to fix this bug: MDEV-21584 Linux aio returned OS error 22
            However, because we were unable to test it on a system that has a larger block size than 512 bytes, the fix turned out to be incomplete.

            marko Marko Mäkelä added a comment -

            I have a fix for 10.5 and 10.6 that makes the tests pass on the remote system.

            The earlier attempt at fixing this (MDEV-21584) intended to disable O_DIRECT for page_compressed tables, but the check was incorrect for files that were created with innodb_checksum_algorithm=full_crc32, and the check was omitted during file creation (it was executed only when opening pre-existing files). But, as far as I can tell, there is no need to disable O_DIRECT; we only have to ensure that fil_node_t::block_size is set correctly.

            Before my fix, I also got failures for ROW_FORMAT=COMPRESSED tables using KEY_BLOCK_SIZE=1 (1024 bytes) or KEY_BLOCK_SIZE=2 (2048 bytes). It is easiest to refuse O_DIRECT for them. I intend to deprecate and remove that format; MDEV-23497 is the first step towards that.

            wlad is now checking that after my fix, everything will work correctly on Microsoft Windows, and then I will have to port and test the fix on 10.2, 10.3, 10.4. I believe that all major versions will differ a little in this area.
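The ROW_FORMAT=COMPRESSED case can be illustrated with a small sketch. It assumes a simple rule, namely that unbuffered I/O is only usable when the on-disk page size is a multiple of the device block size. This is an illustration of why 1024- and 2048-byte pages are a problem on a 4 KiB device, not the logic of the actual patch, and the function names are hypothetical.

```python
def zip_page_size(key_block_size_kib: int) -> int:
    """On-disk page size in bytes for a given KEY_BLOCK_SIZE (expressed in KiB)."""
    return key_block_size_kib * 1024


def o_direct_allowed(page_size: int, block_size: int) -> bool:
    """Assumed rule: every page write must cover whole device blocks."""
    return page_size >= block_size and page_size % block_size == 0


for block_size in (512, 4096):
    refused = [
        kbs for kbs in (1, 2, 4, 8, 16)
        if not o_direct_allowed(zip_page_size(kbs), block_size)
    ]
    print(f"block_size={block_size} KEY_BLOCK_SIZE values refused: {refused}")
```

Under this rule, a 4096-byte block size rules out KEY_BLOCK_SIZE=1 and KEY_BLOCK_SIZE=2, while a 512-byte block size rules out none, which is consistent with the failures described in the comment.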

            marko Marko Mäkelä added a comment -

            I verified that MariaDB 10.2 is already broken. Furthermore, MDEV-21584 unnecessarily disabled O_DIRECT on page_compressed data files, and it failed to disable the equivalent (FILE_FLAG_NO_BUFFERING) for ROW_FORMAT=COMPRESSED tables with 1024- or 2048-byte page size on Microsoft Windows.

            I tested 10.2, 10.4 and 10.6 based branches without and with my fix. I will keep this ticket open until the fix has been merged up to 10.6.

            On 10.4, I used the following invocation:

            ./mtr --parallel=$(nproc) --big-test --suite=innodb,innodb_zip,encryption,mariabackup --mysqld=--loose-innodb-flush-method=O_DIRECT --mysqld=--loose-innodb-checksum-algorithm=full_crc32

            while mysql-test/var was a symlink to a directory on the SSD with a 4KiB block size. 10.4 would normally use innodb_checksum_algorithm=crc32; the default was changed for 10.5 in MDEV-19534.

            People

              marko Marko Mäkelä
              xiaoboluo768 Xiaobo Luo