We have innodb_use_native_aio=ON by default since the introduction of that parameter in MariaDB 5.5. However, to really benefit from the setting, the files should be opened in O_DIRECT mode, to bypass the file system cache. In this way, the reads and writes can be submitted with DMA, using the InnoDB buffer pool directly, and no processor cycles need to be used for copying data.
In all other respects, innodb_flush_method=O_DIRECT should be equivalent to the old default innodb_flush_method=fsync; only the file system cache is bypassed.
Note: innodb_flush_method=O_DIRECT in combination with a tiny innodb_buffer_pool_size may cause a significant performance regression, because we will no longer be able to take advantage of the file system cache of the operating system kernel. The InnoDB buffer pool will completely replace it. Affected users should configure innodb_flush_method=fsync.
This change will not affect Microsoft Windows. The default there is innodb_flush_method=unbuffered, which is roughly equivalent to O_DIRECT.
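For the small-buffer-pool case mentioned in the note, the following is a minimal sketch of pinning the old behaviour in an option file. The temporary directory and the file name flush-method.cnf are hypothetical and chosen only for illustration; only innodb_flush_method=fsync comes from the description.

```shell
# Sketch: write an option file that keeps buffered I/O (the old default).
# conf_dir and conf_file are hypothetical locations used for illustration.
conf_dir=$(mktemp -d)
conf_file="$conf_dir/flush-method.cnf"

cat > "$conf_file" <<'EOF'
[mysqld]
# Small buffer pool: keep relying on the file system cache of the kernel.
innodb_flush_method=fsync
EOF

# Show what was written so that it can be reviewed before use.
cat "$conf_file"
```

Such a file could be passed to the server with --defaults-extra-file, or the same setting could be given on the command line as --innodb-flush-method=fsync. Which of these fits depends on how the server is started in a given deployment.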
Issue Links
causes
MDEV-33095: innodb_flush_method=O_DIRECT creates excessive errors on Solaris (Closed)
relates to
MDEV-26040: os_file_set_size() may not work on O_DIRECT files (Closed)
MDEV-27772: Performance regression with default configuration in 10.6
Daniel Black
added a comment - The btrfs test also succeeded without direct I/O on the loop device.
Without direct I/O:
$ sudo losetup --direct-io=off -f ../btrfs.blk
$ sudo mount /dev/loop6 /mnt/
$ losetup -a
/dev/loop1: []: (/var/lib/snapd/snaps/multipass_5317.snap)
/dev/loop6: []: (/home/dan/repos/btrfs.blk)
survived reinstall and sysbench prepare and innodb_flush_method=O_DIRECT
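When repeating this kind of test on another file system, it may help to check first whether files in the target directory accept O_DIRECT writes at all. The following is a rough sketch that relies on the oflag=direct option of GNU dd. The helper name probe_o_direct and the probe file name are hypothetical, and a failure can also simply mean that the installed dd does not support oflag=direct.

```shell
# Sketch: probe whether a directory accepts O_DIRECT writes.
# probe_o_direct is a hypothetical helper, not part of MariaDB.
probe_o_direct() {
    probe_dir=${1:-.}
    probe_file="$probe_dir/o_direct_probe.$$"
    # oflag=direct asks dd to use O_DIRECT for the output file.
    # A single 4096-byte block keeps the write size aligned for
    # common sector sizes.
    if dd if=/dev/zero of="$probe_file" bs=4096 count=1 oflag=direct 2>/dev/null; then
        echo "O_DIRECT write succeeded in $probe_dir"
        probe_result=0
    else
        echo "O_DIRECT write failed in $probe_dir"
        probe_result=1
    fi
    # Remove the probe file regardless of the outcome.
    rm -f "$probe_file"
    return $probe_result
}

# Probe the directory given as the first argument, or the current one.
probe_o_direct "${1:-.}" || true
```

A successful probe only shows that one aligned write was accepted. It says nothing about performance or about the correctness issues discussed in the other comments.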
Marko Mäkelä
added a comment - For MDEV-28111 (MariaDB Server 10.8.3), I tested O_DIRECT writes of the InnoDB redo log ib_logfile0 with and without O_DSYNC on 3 non-FUA devices today: a SATA HDD, SSD, and a PCIe NVMe drive.
In my test, O_DSYNC was slower on the HDD, and slightly faster on the SSD and NVMe drives. According to https://lwn.net/Articles/400541/ Linux should work correctly on devices that lack FUA support. The unsafety claim that wlad made matches the situation before 2010: https://linux-scsi.vger.kernel.narkive.com/yNnBRBPn/o-direct-and-barriers
Marko Mäkelä
added a comment - MDEV-28766 (starting with MariaDB Server 10.8.4) allows O_DIRECT to be enabled or disabled on the InnoDB write-ahead log file ib_logfile0 on Linux and Microsoft Windows:
SET GLOBAL innodb_log_file_buffering=OFF;
SET GLOBAL innodb_log_file_buffering=ON;
Marko Mäkelä
added a comment - At least on bcachefs, the use of fcntl(fd, F_SETFL, O_DIRECT) can lead to data corruption. In MDEV-33379 (to be released as part of MariaDB Server 10.11.8, 11.0.6) the code was refactored so that instead of invoking fcntl(2), we pass the O_DIRECT flag to the open(2) system call.
People
Marko Mäkelä
Marko Mäkelä
btrfs test on Linux kernel 5.15.14-200.fc35.x86_64
$ dd if=/dev/zero of=../btrfs.blk bs=1M count=2K
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB, 2.0 GiB) copied, 1.50241 s, 1.4 GB/s
$ sudo losetup --direct-io=on -f ../btrfs.blk
$ sudo mkfs.btrfs /dev/loop6
btrfs-progs v5.15.1
See http://btrfs.wiki.kernel.org for more information.
Performing full device TRIM /dev/loop6 (2.00GiB) ...
NOTE: several default settings have changed in version 5.15, please make sure
this does not affect your deployments:
- DUP for metadata (-m dup)
- enabled no-holes (-O no-holes)
- enabled free-space-tree (-R free-space-tree)
Label: (null)
UUID: c364f0a2-b9a9-4b13-a21a-7639fd896765
Node size: 16384
Sector size: 4096
Filesystem size: 2.00GiB
Block group profiles:
Data: single 8.00MiB
Metadata: DUP 102.38MiB
System: DUP 8.00MiB
SSD detected: yes
Zoned device: no
Incompat features: extref, skinny-metadata, no-holes
Runtime features: free-space-tree
Checksum: crc32c
Number of devices: 1
Devices:
ID SIZE PATH
1 2.00GiB /dev/loop6
$ sudo mount /dev/loop6 /mnt/
$ sudo chown dan: /mnt/
$ scripts/mysql_install_db --no-defaults --srcdir=$OLDPWD --builddir=$PWD --datadir=/mnt/dd
Installing MariaDB/MySQL system tables in '/mnt/dd' ...
OK
$ sql/mysqld --no-defaults --skip-networking --datadir=/mnt/dd --verbose
2022-01-24 14:35:49 0 [Note] sql/mysqld (server 10.8.0-MariaDB) starting as process 41741 ...
2022-01-24 14:35:49 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-01-24 14:35:49 0 [Note] InnoDB: Number of transaction pools: 1
2022-01-24 14:35:49 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-01-24 14:35:49 0 [Note] InnoDB: Using liburing
2022-01-24 14:35:49 0 [Note] InnoDB: Initializing buffer pool, total size = 128.000MiB, chunk size = 2.000MiB
2022-01-24 14:35:49 0 [Note] InnoDB: Completed initialization of buffer pool
2022-01-24 14:35:49 0 [Note] InnoDB: 128 rollback segments are active.
2022-01-24 14:35:49 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-01-24 14:35:49 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
2022-01-24 14:35:49 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
2022-01-24 14:35:49 0 [Note] InnoDB: 10.8.0 started; log sequence number 42173; transaction id 14
2022-01-24 14:35:49 0 [Note] InnoDB: Loading buffer pool(s) from /mnt/dd/ib_buffer_pool
2022-01-24 14:35:49 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-01-24 14:35:49 0 [Note] InnoDB: Buffer pool(s) load completed at 220124 14:35:49
2022-01-24 14:35:49 0 [Note] sql/mysqld: ready for connections.
Version: '10.8.0-MariaDB' socket: '/tmp/mysql.sock' port: 0 Source distribution
$ client/mariadb -S /tmp/mysql.sock
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.8.0-MariaDB Source distribution
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> show global variables like 'innodb_flush_method%';
+---------------------+----------+
| Variable_name | Value |
+---------------------+----------+
| innodb_flush_method | O_DIRECT |
+---------------------+----------+
1 row in set (0.002 sec)
$ echo $SYSBENCH
sysbench /usr/share/sysbench/oltp_update_index.lua --mysql-socket=/tmp/mysql.sock --mysql-user=dan --mysql-db=test --percentile=99 --tables=2 --table_size=2000000
$ $SYSBENCH prepare
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
Creating table 'sbtest1'...
Inserting 2000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...
Creating table 'sbtest2'...
Inserting 2000000 records into 'sbtest2'
Creating a secondary index on 'sbtest2'...
$ $SYSBENCH --rand-seed=42 --rand-type=uniform --max-requests=0 --time=60 --report-interval=5 --threads=2 run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
Running the test with following options:
Number of threads: 2
Report intermediate results every 5 second(s)
Initializing random number generator from seed (42).
Initializing worker threads...
Threads started!
[ 5s ] thds: 2 tps: 99.77 qps: 99.77 (r/w/o: 0.00/99.77/0.00) lat (ms,99%): 30.26 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 2 tps: 90.14 qps: 90.14 (r/w/o: 0.00/90.14/0.00) lat (ms,99%): 31.94 err/s: 0.00 reconn/s: 0.00
[ 15s ] thds: 2 tps: 98.88 qps: 98.88 (r/w/o: 0.00/98.88/0.00) lat (ms,99%): 48.34 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 2 tps: 62.60 qps: 62.60 (r/w/o: 0.00/62.60/0.00) lat (ms,99%): 89.16 err/s: 0.00 reconn/s: 0.00
[ 25s ] thds: 2 tps: 32.98 qps: 32.98 (r/w/o: 0.00/32.98/0.00) lat (ms,99%): 186.54 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 2 tps: 61.20 qps: 61.20 (r/w/o: 0.00/61.20/0.00) lat (ms,99%): 54.83 err/s: 0.00 reconn/s: 0.00
[ 35s ] thds: 2 tps: 65.19 qps: 65.19 (r/w/o: 0.00/65.19/0.00) lat (ms,99%): 44.98 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 2 tps: 72.01 qps: 72.01 (r/w/o: 0.00/72.01/0.00) lat (ms,99%): 41.85 err/s: 0.00 reconn/s: 0.00
[ 45s ] thds: 2 tps: 65.19 qps: 65.19 (r/w/o: 0.00/65.19/0.00) lat (ms,99%): 40.37 err/s: 0.00 reconn/s: 0.00
[ 50s ] thds: 2 tps: 54.21 qps: 54.21 (r/w/o: 0.00/54.21/0.00) lat (ms,99%): 211.60 err/s: 0.00 reconn/s: 0.00
[ 55s ] thds: 2 tps: 78.00 qps: 78.00 (r/w/o: 0.00/78.00/0.00) lat (ms,99%): 38.94 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 2 tps: 64.59 qps: 64.59 (r/w/o: 0.00/64.59/0.00) lat (ms,99%): 62.19 err/s: 0.00 reconn/s: 0.00
SQL statistics:
queries performed:
read: 0
write: 4226
other: 0
total: 4226
transactions: 4226 (70.39 per sec.)
queries: 4226 (70.39 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)
General statistics:
total time: 60.0297s
total number of events: 4226
Latency (ms):
min: 7.73
avg: 28.40
max: 481.22
99th percentile: 95.81
sum: 120025.38
Threads fairness:
events (avg/stddev): 2113.0000/0.00
execution time (avg/stddev): 60.0127/0.01