MariaDB Server / MDEV-31227

innodb_flush_method=O_DIRECT causes 3x regression in workload

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not a Bug
    • Affects Version/s: 10.6.12, 10.6.13
    • Fix Version/s: N/A
    • Labels: None
    • Environment: CentOS 7, kernel 3.10

    Description

      The repro procedure is the same as in MDEV-30567; additional details are in the comments.

      In short, this is the performance discrepancy we're seeing in 10.6.13 (10.6.12 performs similarly, though slightly worse):

      # Default (O_DIRECT)
      # * Q0: real: 14.21  min: 14.21  max: 14.21
       
      # innodb-flush-method = fsync
      # * Q0: real: 4.94  min: 4.94  max: 4.94
       
      # innodb-flush-method = O_DSYNC
      # * Q0: real: 4.92  min: 4.92  max: 4.92
       
      # innodb-flush-method = littlesync
      # * Q0: real: 4.91  min: 4.91  max: 4.91
       
      # innodb-flush-method = nosync
      # * Q0: real: 4.95  min: 4.95  max: 4.95
       
      # innodb-flush-method = O_DIRECT
      # * Q0: real: 14.21  min: 14.21  max: 14.21
       
      # innodb-flush-method = O_DIRECT_NO_FSYNC
      # * Q0: real: 12.81  min: 12.81  max: 12.81

      The customer for whom we are concerned about this issue is currently using 10.4, where innodb_flush_method=fsync is the default. In 10.6, though, the default changes to O_DIRECT. For now, we're advising the customer to explicitly set innodb_flush_method=fsync on 10.6. However, we know that O_DIRECT was made the default in 10.6 because it generally improves on fsync, so we'd like to know whether the regression we're seeing here is due to a bug that, if fixed, would make O_DIRECT the all-around best choice it is intended to be.
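
      For reference, a minimal sketch of that workaround on 10.6 (the option is the standard innodb_flush_method server variable; the file path and [mariadb] group are placeholders for whatever the customer's configuration already uses):

      # /etc/my.cnf.d/server.cnf (or equivalent)
      [mariadb]
      # revert to the 10.4 default until the O_DIRECT regression is understood
      innodb_flush_method = fsync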

      Attachments

        Issue Links

          Activity

            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) created issue -
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Priority: Major → Minor
            marko Marko Mäkelä added a comment -

            As per MDEV-29967, the linear read-ahead is broken already in 10.5.

            Also MDEV-30986 has been filed for something similar.
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s: 10.4, 10.5, 10.6
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s: 10.4, 10.5
            julien.fritsch Julien Fritsch made changes -
            Assignee: Marko Mäkelä

            marko Marko Mäkelä added a comment -

            rob.schwyzer@mariadb.com, can you please test the MDEV-30986 branch, which implements logical read-ahead?
            marko Marko Mäkelä made changes -
            Assignee: Marko Mäkelä → Rob Schwyzer
            Status: Open → Needs Feedback
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Assignee: Rob Schwyzer → Marko Mäkelä
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Status: Needs Feedback → Open
            marko Marko Mäkelä made changes -
            Assignee: Marko Mäkelä → Rob Schwyzer
            Status: Open → Needs Feedback

            marko Marko Mäkelä added a comment -

            If I understood correctly, the workload consists solely of queries like the following:

            SELECT c1 FROM "table" ORDER BY c1 LIMIT 1;

            This looks like random I/O to me. Would setting innodb_random_read_ahead=ON help here? It is disabled by default.
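
            For reference, innodb_random_read_ahead is a dynamic global variable, so it can be toggled for a quick test without a server restart; a minimal sketch (the status counter is the standard InnoDB one):

            SET GLOBAL innodb_random_read_ahead = ON;
            SHOW GLOBAL VARIABLES LIKE 'innodb_random_read_ahead';
            -- after the test run, check how much random read-ahead was actually triggered
            SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_ahead_rnd';
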
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Assignee: Rob Schwyzer → Aleksey Midenkov
            midenok Aleksey Midenkov made changes -
            Status: Needs Feedback → Open
            midenok Aleksey Midenkov made changes -
            Assignee: Aleksey Midenkov → Marko Mäkelä
            julien.fritsch Julien Fritsch made changes -
            Status: Open → Confirmed
            marko Marko Mäkelä made changes -
            Status: Confirmed → Open

            marko Marko Mäkelä added a comment -

            Would the performance of innodb_flush_method=fsync degrade to the level of innodb_flush_method=O_DIRECT if the Linux kernel’s file system cache is emptied before starting the workload? I read somewhere that the command for that should be sudo sysctl vm.drop_caches=3, but I did not try that.

            If the performance of innodb_flush_method=fsync remains better than that of innodb_flush_method=O_DIRECT even then, I would think that the file system cache is acting as a cache between the InnoDB buffer pool and the storage.

            How large would the file system cache be at the end of the benchmark run? To have a fair comparison, I think that for the innodb_flush_method=O_DIRECT case, innodb_buffer_pool_size must be increased by that amount, so that both cases have access to the same amount of memory.
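
            For reference, a minimal sketch of emptying the kernel cache before each run (sync first so dirty pages are written back and can actually be reclaimed; requires root):

            sync
            sudo sysctl vm.drop_caches=3   # drops the page cache plus dentries and inodes
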
            marko Marko Mäkelä made changes -
            Assignee: Marko Mäkelä → Rob Schwyzer
            Status: Open → Needs Feedback
            midenok Aleksey Midenkov added a comment (edited) -

            After dropping the caches, fsync is slightly worse (7.33 is the good number now):

            {10.6} midenok@snsvr01:cmake$ sudo sysctl vm.drop_caches=3
             
            vm.drop_caches = 3
            {10.6} midenok@snsvr01:cmake$ free -h
                          total        used        free      shared  buff/cache   available
            Mem:           565G         10G        550G        4.0G        4.7G        549G
            Swap:          127G        115M        127G
             
            {10.6} midenok@snsvr01:cmake$ mp -q
            ! Started: 127393
            * Q0: real: 9.50  min: 9.50  max: 9.50
            ! Stopped: 127393
            * Server dirty stats:
            faults: 155  rfaults: 17424  fsi: 1279376  fso: 184  rss_max: 226332  cpu: 88  sys: 0.56  usr: 9.19  real: 11.04  ctx0: 8  ctx1: 15353
             
            {10.6} midenok@snsvr01:cmake$ free -h
                          total        used        free      shared  buff/cache   available
            Mem:           565G         10G        549G        4.0G        5.3G        549G
            Swap:          127G        115M        127G
             
            {10.6} midenok@snsvr01:cmake$ mp -q
            ! Started: 127686
            * Q0: real: 7.33  min: 7.33  max: 7.33
            ! Stopped: 127686
            * Server dirty stats:
            rfaults: 16757  fsi: 160  fso: 184  rss_max: 229176  cpu: 95  sys: 0.43  usr: 8.00  real: 8.84  ctx0: 19  ctx1: 7081
             
            {10.6} midenok@snsvr01:cmake$ free -h
                          total        used        free      shared  buff/cache   available
            Mem:           565G         10G        549G        4.0G        5.3G        549G
            Swap:          127G        115M        127G
            

            O_DIRECT is just as bad:

            {10.6} midenok@snsvr01:cmake$ mp -q
            ! Started: 128117
            * Q0: real: 14.46  min: 14.46  max: 14.46
            ! Stopped: 128117
            * Server dirty stats:
            rfaults: 20930  fsi: 1300064  fso: 184  rss_max: 245112  cpu: 68  sys: 0.48  usr: 10.52  real: 15.96  ctx0: 43  ctx1: 44265
             
            {10.6} midenok@snsvr01:cmake$ free -h
                          total        used        free      shared  buff/cache   available
            Mem:           565G         10G        549G        4.0G        5.3G        549G
            Swap:          127G        115M        127G
            

            Marko, I'd suggest paying attention to the fsi and ctx1 numbers. For O_DIRECT they are much bigger.

            fsi: Number of file system inputs by the process.
            ctx1: Number of times that the program was context-switched voluntarily, for instance while waiting for an I/O operation to complete.

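            For reference, these counters look like GNU time(1) statistics; a minimal sketch of collecting the same numbers directly, assuming /usr/bin/time is GNU time and that ctx0 corresponds to %c (involuntary context switches):

            # hypothetical demo command (sleep 1); substitute the real server/benchmark invocation
            /usr/bin/time -f 'rfaults: %R  fsi: %I  fso: %O  rss_max: %M  ctx0: %c  ctx1: %w  real: %e' sleep 1
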
            midenok Aleksey Midenkov made changes -
            Assignee: Rob Schwyzer → Marko Mäkelä
            Status: Needs Feedback → Open

            marko Marko Mäkelä added a comment -

            midenok, thank you for the update. If I interpreted the output correctly, the file system cache grew by 0.6 GiB (from 4.7GiB to 5.3GiB) during the test with innodb_flush_method=fsync (file system cache enabled). If that is the case, for the innodb_flush_method=O_DIRECT test to be fair, the innodb_buffer_pool_size should have been increased by the same amount (and the file system cache emptied before running the test).

            Which server configuration parameters had been specified? I do not see any server parameters mentioned either in this ticket or in the test script itself.

            In particular, had innodb_random_read_ahead=ON been set? Would the prototype of logical read-ahead (MDEV-30986) improve the O_DIRECT performance? I suppose that when the Linux kernel’s file system cache is being used, the kernel can submit some read-ahead requests on its own.

            The most important one would be innodb_buffer_pool_size. The output rss_max is from time(1):

            %M Maximum resident set size of the process during its
            lifetime, in Kbytes.

            Based on this and the amount of available memory, I would assume innodb_buffer_pool_size to be over 100 gigabytes, and hence the 0.6GiB increase of the file system cache might not make much difference. But I do not think that the kernel file system cache would be covered in this number. We would need some monitoring of the global system state.

            The ctx1 and fsi are from the following time(1) attributes:

            %w Number of waits: times that the program was context-
            switched voluntarily, for instance while waiting for an
            I/O operation to complete.
            %I Number of filesystem inputs by the process.

            Supposedly, copying data from the kernel’s file system cache to the memory of the process would not affect either metric.

            Read latency would be tracked better by the instrumentation that MDEV-31558 is adding.

            I would expect there to be a number of performance improvements in the Linux kernel, especially in 5.x and 6.x.

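            For reference, a minimal sketch of monitoring the global file system cache while the benchmark runs (sampling /proc/meminfo once per second):

            while sleep 1; do
                date '+%T'
                grep -E '^(Buffers|Cached|MemAvailable):' /proc/meminfo
            done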

            marko Marko Mäkelä added a comment -

            There are two more sources of nondeterminism that should be accounted for, to prevent any unexpected "warm-up" of the buffer pool (see the sketch after this list):

            • If the file ib_buffer_pool exists in the data directory, it must be deleted before starting the server under test.
            • Crash recovery, or purging the history of committed transactions: the server should previously have been shut down with innodb_fast_shutdown=0.
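
            For reference, a minimal sketch of preparing the data directory that way (the datadir path and client invocation are placeholders for whatever the test setup uses):

            # request a full (slow) shutdown so no recovery or purge work is left over
            mariadb -e 'SET GLOBAL innodb_fast_shutdown=0; SHUTDOWN;'
            # delete the saved buffer pool dump so the next start begins with a cold buffer pool
            rm -f /var/lib/mysql/ib_buffer_pool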

            midenok Aleksey Midenkov added a comment -

            innodb_buffer_pool_size did the trick.

            O_DIRECT

            * Q0: real: 7.52  min: 7.52  max: 7.52
            ! Stopped: 220861
            * Server dirty stats:
            rfaults: 18686  fsi: 1238336  fso: 464  rss_max: 748692  cpu: 110  sys: 1.17  usr: 8.78  real: 9.04  ctx0: 20  ctx1: 128360
            

            fsync

            * Q0: real: 7.55  min: 7.55  max: 7.55
            ! Stopped: 221140
            * Server dirty stats:
            rfaults: 19576  fsi: 2144  fso: 456  rss_max: 744456  cpu: 110  sys: 1.24  usr: 8.77  real: 9.06  ctx0: 43  ctx1: 11396
            

            InnoDB config:

            innodb_random_read_ahead=ON
            innodb_buffer_pool_size=1342177280
            innodb-flush-method = fsync
            # innodb-flush-method = O_DSYNC
            # innodb-flush-method = littlesync
            # innodb-flush-method = nosync
            # innodb-flush-method = O_DIRECT
            # innodb-flush-method = O_DIRECT_NO_FSYNC
            innodb_stats_persistent=OFF
            innodb_stats_auto_recalc=OFF
            


            midenok Aleksey Midenkov added a comment -

            After dropping the FS cache, with the above config:

            O_DIRECT

            {10.6} midenok@snsvr01:cmake$ sudo sysctl vm.drop_caches=3
            vm.drop_caches = 3
            {10.6} midenok@snsvr01:cmake$ mp -q
            ! Started: 222519
            * Q0: real: 8.31  min: 8.31  max: 8.31
            ! Stopped: 222519
            * Server dirty stats:
            faults: 149  rfaults: 21529  fsi: 1275536  fso: 456  rss_max: 752284  cpu: 104  sys: 1.30  usr: 8.98  real: 9.80  ctx0: 50  ctx1: 117028
            

            fsync

            {10.6} midenok@snsvr01:cmake$ sudo sysctl vm.drop_caches=3
            vm.drop_caches = 3
            {10.6} midenok@snsvr01:cmake$ mp -q
            ! Started: 223108
            * Q0: real: 7.86  min: 7.86  max: 7.86
            ! Stopped: 223108
            * Server dirty stats:
            faults: 162  rfaults: 23074  fsi: 1280512  fso: 456  rss_max: 761780  cpu: 107  sys: 1.17  usr: 8.80  real: 9.32  ctx0: 45  ctx1: 103667
            


            midenok Aleksey Midenkov added a comment -

            O_DIRECT innodb_random_read_ahead=OFF vs ON

            innodb_random_read_ahead=OFF

            innodb_random_read_ahead=OFF
            innodb_buffer_pool_size=1342177280
            innodb-flush-method = O_DIRECT
            innodb_stats_persistent=OFF
            innodb_stats_auto_recalc=OFF
            

            {10.6} midenok@snsvr01:cmake$ sudo sysctl vm.drop_caches=3 && mp -q3
            vm.drop_caches = 3
            ! Started: 245416
            * Q0: real: 10.33  min: 10.33  max: 10.33
            * Q1: real: 7.23  min: 7.23  max: 10.33
            * Q2: real: 7.35  min: 7.23  max: 10.33
            * Average for 3 queries:
            real: 8.30  min: 7.23  max: 10.33
            ! Stopped: 245416
            * Server dirty stats:
            faults: 164  rfaults: 21545  fsi: 1278528  fso: 456  rss_max: 741840  cpu: 91  sys: 0.93  usr: 23.69  real: 26.91  ctx0: 60  ctx1: 87688
            

            innodb_random_read_ahead=ON

            innodb_random_read_ahead=ON
            innodb_buffer_pool_size=1342177280
            innodb-flush-method = O_DIRECT
            innodb_stats_persistent=OFF
            innodb_stats_auto_recalc=OFF
            

            {10.6} midenok@snsvr01:cmake$ sudo sysctl vm.drop_caches=3 && mp -q3
            vm.drop_caches = 3
            ! Started: 245958
            * Q0: real: 8.08  min: 8.08  max: 8.08
            * Q1: real: 7.40  min: 7.40  max: 8.08
            * Q2: real: 7.31  min: 7.31  max: 8.08
            * Average for 3 queries:
            real: 7.59  min: 7.31  max: 8.08
            ! Stopped: 245958
            * Server dirty stats:
            faults: 164  rfaults: 21209  fsi: 1278112  fso: 472  rss_max: 745548  cpu: 100  sys: 1.24  usr: 23.55  real: 24.68  ctx0: 54  ctx1: 118957
            

            midenok Aleksey Midenkov added a comment (edited) -

            fsync innodb_random_read_ahead=OFF vs ON

            innodb_random_read_ahead=OFF

            innodb_random_read_ahead=OFF
            innodb_buffer_pool_size=1342177280
            innodb-flush-method = fsync
            innodb_stats_persistent=OFF
            innodb_stats_auto_recalc=OFF
            

            {10.6} midenok@snsvr01:cmake$ sudo sysctl vm.drop_caches=3 && mp -q3
            vm.drop_caches = 3
            ! Started: 246520
            * Q0: real: 8.53  min: 8.53  max: 8.53
            * Q1: real: 7.19  min: 7.19  max: 8.53
            * Q2: real: 7.13  min: 7.13  max: 8.53
            * Average for 3 queries:
            real: 7.61  min: 7.13  max: 8.53
            ! Stopped: 246520
            * Server dirty stats:
            faults: 164  rfaults: 24188  fsi: 1280376  fso: 456  rss_max: 762416  cpu: 96  sys: 0.96  usr: 23.11  real: 24.84  ctx0: 25  ctx1: 65197
            

            innodb_random_read_ahead=ON

            innodb_random_read_ahead=ON
            innodb_buffer_pool_size=1342177280
            innodb-flush-method = fsync
            innodb_stats_persistent=OFF
            innodb_stats_auto_recalc=OFF
            

            {10.6} midenok@snsvr01:cmake$ sudo sysctl vm.drop_caches=3 && mp -q3
            vm.drop_caches = 3
            ! Started: 247049
            * Q0: real: 8.32  min: 8.32  max: 8.32
            * Q1: real: 7.59  min: 7.59  max: 8.32
            * Q2: real: 7.48  min: 7.48  max: 8.32
            * Average for 3 queries:
            real: 7.79  min: 7.48  max: 8.32
            ! Stopped: 247049
            * Server dirty stats:
            faults: 164  rfaults: 31906  fsi: 1281136  fso: 472  rss_max: 765080  cpu: 101  sys: 1.29  usr: 24.53  real: 25.34  ctx0: 44  ctx1: 103060
            


            marko Marko Mäkelä added a comment -

            Thank you! One of my suspicions was that this was a case of the working set exceeding innodb_buffer_pool_size, and then having all "excess" RAM acting as a cache between the durable storage and InnoDB, or not having any kernel file system cache at all.

            Here is one more resource that I just came across: the README.md file of the nocache tool explains how you might be able to limit the size of the Linux kernel’s file system cache while a process is running. It might be useful for a valid comparison when using a smaller InnoDB buffer pool.

            Is there anything that needs to be changed in the code for this particular ticket? Out of curiosity, what was the original innodb_buffer_pool_size? The default 128MiB (vs. 1280MiB for the updated results)?

            marko Marko Mäkelä made changes -
            Assignee: Marko Mäkelä → Rob Schwyzer
            Status: Open → Needs Feedback

            midenok Aleksey Midenkov added a comment -

            > The default 128MiB (vs. 1280MiB for the updated results)?

            Right, it was the default.

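            For reference, 1342177280 bytes is exactly 1280 MiB; a quick way to verify the running value:

            SELECT @@innodb_buffer_pool_size AS bytes,
                   @@innodb_buffer_pool_size / 1024 / 1024 AS mib;
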
            midenok Aleksey Midenkov made changes -
            Assignee: Rob Schwyzer → Aleksey Midenkov
            midenok Aleksey Midenkov made changes -
            Status: Needs Feedback → Open
            midenok Aleksey Midenkov made changes -
            Component/s: Storage Engine - InnoDB
            Fix Version/s: 10.6 → N/A
            Resolution: Not a Bug
            Status: Open → Closed

            marko Marko Mäkelä added a comment -

            In MDEV-30986 the following approach worked for me. The statement under test would access about 400MiB of data.

            mkdir /var/tmp/var && rm -fr var && ln -s /var/tmp/var var
            ./mtr --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 name_of_test
            ./mtr --mysqld=--innodb-buffer-pool-size=400m --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 name_of_test
            sudo sysctl vm.drop_caches=3
            systemd-run --scope --property=MemoryLimit=250M -- ./mtr --mysqld=--innodb-flush-method=fsync --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 name_of_test
            sudo sysctl vm.drop_caches=3
            systemd-run --scope --property=MemoryLimit=550M -- ./mtr --mysqld=--innodb-flush-method=fsync --mysqld=--innodb-buffer-pool-size=400m --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 name_of_test
            

            For that test, I got the best read performance (for either buffer pool size) by using the default innodb_flush_method=O_DIRECT and enabling innodb_random_read_ahead=ON.

            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets: 201658
            Zendesk active tickets: 201658

            People

              Assignee: midenok Aleksey Midenkov
              Reporter: rob.schwyzer@mariadb.com Rob Schwyzer (Inactive)
              Votes: 0
              Watchers: 7

