Details
- Bug
- Status: Closed
- Blocker
- Resolution: Fixed
- 10.5.9
- None
[root@localhost~]# cat /etc/redhat-release
CentOS Linux release 7.8.2003 (Core)
[root@localhost~]# uname -r
4.14.105-19-0013
[root@localhost~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
[root@localhost~]# free -g
total used free shared buff/cache available
Mem: 125 46 42 1 35 75
Swap: 3 0 3
The SSD device is Intel P4610 3.2TB and the database version is MariaDB 10.5.9 Community Edition.
Description
Steps to reproduce (100% reproducible):
1. Use either an Intel P4610 or a ScaleFlux CSD 2000 as the device for the Sysbench benchmark, and set its logical sector size to 4K (the Intel P4610 is used as the example below)
[root@localhost~]# isdct start -intelssd 0 -nvmeformat LBAFormat=1
[root@localhost~]# nvme list
Node SN Model Namespace Usage Format FW Rev
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1 PHLJ8230004W4P0DXX INTEL SSDPE2KX040XX 1 3.84 TB / 3.84 TB 4 KiB + 0 B VDV10131
2. Use the ext4 file system to format and mount the Intel P4610
[root@localhost~]# mkfs.ext4 /dev/nvme0n1
[root@localhost~]# mount /dev/nvme0n1 -o discard /data/nvme0n1
3. Initialize the installation from the MariaDB 10.5.9 binary tar package (steps are abbreviated). The configuration options are as follows:
[client]
socket=/data/sfdv0n1/mariadb-10.5.9/mysql.sock
[mysqld_safe]
user=mysql
log_error=/data/sfdv0n1/mariadb-10.5.9/error.log
[mysqld]
socket=/data/sfdv0n1/mariadb-10.5.9/mysql.sock
datadir=/data/sfdv0n1/mariadb-10.5.9
basedir=/opt/app/mariadb-10.5.9
user=mysql
log_error=/data/sfdv0n1/mariadb-10.5.9/error.log
explicit_defaults_for_timestamp=1
innodb_page_size=16384
innodb_buffer_pool_size =32G
innodb_buffer_pool_instances=8
innodb_page_cleaners=8
innodb_log_file_size =8G
innodb_log_buffer_size = 128M
innodb_flush_log_at_trx_commit=1
innodb_thread_concurrency=0
innodb_open_files=100000
innodb_file_per_table=1
innodb_flush_method=O_DIRECT
innodb_change_buffering=all
innodb_adaptive_flushing=1
innodb_old_blocks_time=1000
innodb_use_native_aio=1
innodb_lock_wait_timeout=120
lock_wait_timeout=60
innodb_io_capacity_max = 100000
innodb_flush_neighbors = 0
innodb_log_write_ahead_size=8192
innodb_doublewrite=1
innodb_compression_algorithm=zlib
max_connections=65536
max_prepared_stmt_count=1048576
4. Run sysbench so that more than 20GB of data is written to disk. The sysbench command is as follows:
[root@localhost~]# sysbench --db-driver=mysql --time=900 --threads=16 --report-interval=1 --mysql-socket=/data/sfdv0n1/mariadb-10.5.9/mysql.sock --mysql-user=qbench --mysql-password=qbench --mysql-db=sysbench --tables=32 --table-size=20000000 oltp_read_write --db-ps-mode=disable --percentile=99 --mysql-ignore-errors=1062,1213 --mysql_storage_engine=innodb --create_table_options="page_compressed=1" --rand-type=uniform prepare
5. After a minute or so, sysbench reports that the connection is broken. The MariaDB error log shows a crash message similar to the one in the figure below
PS: If the logical sector size of the SSD is set to 512 bytes, the crash described above does not occur. We repeated this on many different servers: whenever the logical sector size of the SSD is set to 4K, enabling page compression crashes the server process, and whenever it is set to 512 bytes, enabling page compression does not crash the server process. It seems that page compression does not support SSDs with 4K logical sectors well enough?
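Because the outcome depends on the logical sector size, it can help to confirm what the kernel actually reports for the device before running the benchmark. The following is a minimal sketch, not part of the original report: it reads the standard Linux sysfs attribute `queue/logical_block_size`. The function names and the `SYSFS_ROOT` override are hypothetical and exist only for illustration and for testing without real hardware.

```shell
#!/bin/sh
# Print the logical sector size (in bytes) that the kernel reports for a
# block device such as nvme0n1. SYSFS_ROOT is a hypothetical override that
# lets the function be exercised without real hardware.
logical_sector_size() {
    dev=$1
    cat "${SYSFS_ROOT:-/sys}/block/$dev/queue/logical_block_size"
}

# Classify a device as using 4K or 512-byte logical sectors.
sector_class() {
    size=$(logical_sector_size "$1") || return 1
    case $size in
        4096) echo "4K" ;;
        512)  echo "512" ;;
        *)    echo "other ($size)" ;;
    esac
}

# Example (device name taken from the nvme list output above):
#   sector_class nvme0n1
```

Running `sector_class` before and after reformatting the namespace gives a quick check that the device really switched between the 512-byte and 4K configurations being compared.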
Issue Links
- relates to
  - MDEV-16328 ALTER TABLE…page_compression_level should not rebuild table (Closed)
  - MDEV-21584 Linux aio returned OS error 22 (Closed)
  - MDEV-26040 os_file_set_size() may not work on O_DIRECT files (Closed)
  - MDEV-16264 Implement a common work queue for InnoDB background tasks (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue relates to | |
Component/s | Storage Engine - InnoDB [ 10129 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Assignee | Marko Mäkelä [ marko ] | |
Labels | need_feedback |
Attachment | isdct-3.0.26.400-1.x86_64.rpm [ 56637 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Link | This issue relates to | |
Link | This issue relates to | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Assignee | Marko Mäkelä [ marko ] | Vladislav Vaintroub [ wlad ] |
Labels | need_feedback | |
Summary | The SSD device set 4K logical sectors to run page compression causing the process to crash | innodb_flush_method=O_DIRECT fails on compressed tables |
Assignee | Vladislav Vaintroub [ wlad ] | Marko Mäkelä [ marko ] |
Priority | Major [ 3 ] | Blocker [ 1 ] |
issue.field.resolutiondate | 2021-03-18 13:49:57.0 | 2021-03-18 13:49:57.797 |
Fix Version/s | 10.2.38 [ 25207 ] | |
Fix Version/s | 10.3.29 [ 25206 ] | |
Fix Version/s | 10.4.19 [ 25205 ] | |
Fix Version/s | 10.5.10 [ 25204 ] | |
Fix Version/s | 10.6.0 [ 24431 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Link | This issue relates to | |
Workflow | MariaDB v3 [ 120048 ] | MariaDB v4 [ 159019 ] |
Thank you for the report. The crash occurs due to the following:
{
ut_a(cb->m_err == DB_SUCCESS);
This code was refactored in MDEV-16264.
I see that you are using innodb_flush_method=O_DIRECT, which should ensure that DMA is being used. Without it, the Linux kernel could use more CPU in the io_submit() call. In 10.6, we finally changed that to be the default (MDEV-24854).
I suspect that this is somehow related to the use of page_compressed=1 tables. We have tested MariaDB on various hardware (including NVMe). I think that I ran the ./mtr regression test suite on my NVMe (Intel Optane 960, INTEL SSDPED1D960GAY) when I implemented MDEV-24854. It includes some tests with page_compressed=1.
I think that we need more information to fix this. Could you provide some strace output that could hint at what could have gone wrong?
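One way to collect such a trace is sketched below. This is only an assumption about what might be useful: the list of system calls is a guess at the I/O-related calls involved, and the helper name, the PID and the output path are hypothetical. The strace options used (`-f`, `-tt`, `-p`, `-o`, `-e trace=`) are standard.

```shell
#!/bin/sh
# Build an strace command line that records I/O-related system calls of a
# running server process. The syscall list is an assumption about what is
# relevant here, not something this report specifies.
strace_io_cmd() {
    pid=$1 out=$2
    echo "strace -f -tt -p $pid -o $out -e trace=openat,fcntl,io_submit,io_getevents,pwrite64,fallocate"
}

# Print the command for review (12345 is a placeholder PID). To actually
# attach, substitute the real server PID and pipe the output to sh.
strace_io_cmd 12345 /tmp/server.strace
```

Printing the command first, rather than running it directly, makes it easy to adjust the syscall list before attaching to a busy server.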
Did I get it right that this does not occur if you are not using page_compressed tables? What about ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1 (using 1KiB block size)? I understood that we should avoid setting O_DIRECT on the files in either case. The strace output would help verify that.
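The concern about small block sizes can be illustrated with a simple alignment check. This sketch rests on an assumption that is general Linux behaviour rather than anything confirmed in this report: with O_DIRECT, the offset and length of a request normally have to be multiples of the device's logical sector size, and other requests are typically rejected with EINVAL (OS error 22). The helper name is hypothetical, and the check is not claimed to be the confirmed cause of this crash.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): report whether a request of
# LENGTH bytes at OFFSET is a multiple of SECTOR bytes in both values.
# Assumption: O_DIRECT requires this alignment to the logical sector size.
direct_io_aligned() {
    offset=$1 length=$2 sector=$3
    if [ $((offset % sector)) -eq 0 ] && [ $((length % sector)) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

# A full 16 KiB page (innodb_page_size=16384) fits both sector sizes.
direct_io_aligned 0 16384 512
direct_io_aligned 0 16384 4096

# A 1 KiB block (KEY_BLOCK_SIZE=1) fits 512-byte sectors but not 4K ones.
direct_io_aligned 0 1024 512
direct_io_aligned 0 1024 4096
```

Under that assumption, any write shorter than 4096 bytes, or starting at an offset that is not a multiple of 4096, could only succeed on the 512-byte configuration, which would be consistent with the behaviour described in the report.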
If I have understood it correctly, for the ScaleFlux hardware, we would probably want to change the page_compressed code so that IORequest::PUNCH is never used, and sequences of NUL bytes are written instead. That is, we would want to let the file system treat the data files as regular files, and the smart storage would transparently compress the individual sectors.
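The difference between the two approaches can be shown on an ordinary file. The sketch below is only an illustration with hypothetical helper names. It assumes that hole punching corresponds to deallocating a byte range, as `fallocate --punch-hole` from util-linux does, while the alternative simply overwrites the same range with NUL bytes and leaves it allocated.

```shell
#!/bin/sh
# Two ways to treat the unused tail of a page in a regular file.
# Both helpers and their names are hypothetical illustrations.

# Variant 1: deallocate the range, keeping the file size unchanged.
# Requires file system support for hole punching.
punch_range() {
    file=$1 offset=$2 length=$3
    fallocate --punch-hole --offset "$offset" --length "$length" "$file"
}

# Variant 2: overwrite the range with NUL bytes and keep it allocated, so
# that compressing storage can shrink it transparently.
# bs=1 is slow but keeps the byte arithmetic obvious in this sketch.
zero_range() {
    file=$1 offset=$2 length=$3
    dd if=/dev/zero of="$file" bs=1 seek="$offset" count="$length" conv=notrunc 2>/dev/null
}

# Example: free the second half of a 16 KiB page stored at offset 0.
#   punch_range datafile 8192 8192
#   zero_range  datafile 8192 8192
```

With the second variant the file system sees a fully allocated regular file, so any space saving would come only from the storage device itself.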