MariaDB Server / MDEV-31227

innodb_flush_method=O_DIRECT causes 3x regression in workload

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Not a Bug
    • Affects Version/s: 10.6.12, 10.6.13
    • Fix Version/s: N/A
    • Component/s: None
    • Environment: CentOS 7, kernel 3.10

    Description

      Repro procedure is the same as MDEV-30567. Additional details for that in the comments.

      In short, this is the performance discrepancy we're seeing in 10.6.13 (10.6.12 performs similarly, but slightly worse):

      # Default (O_DIRECT)
      # * Q0: real: 14.21  min: 14.21  max: 14.21
       
      # innodb-flush-method = fsync
      # * Q0: real: 4.94  min: 4.94  max: 4.94
       
      # innodb-flush-method = O_DSYNC
      # * Q0: real: 4.92  min: 4.92  max: 4.92
       
      # innodb-flush-method = littlesync
      # * Q0: real: 4.91  min: 4.91  max: 4.91
       
      # innodb-flush-method = nosync
      # * Q0: real: 4.95  min: 4.95  max: 4.95
       
      # innodb-flush-method = O_DIRECT
      # * Q0: real: 14.21  min: 14.21  max: 14.21
       
      # innodb-flush-method = O_DIRECT_NO_FSYNC
      # * Q0: real: 12.81  min: 12.81  max: 12.81
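For reference, the roughly 3x figure in the summary can be reproduced from the Q0 timings above (a quick sketch; the numbers are copied verbatim from the runs):

```python
# Q0 wall-clock times in seconds, per innodb_flush_method, from the runs above.
times = {
    "O_DIRECT": 14.21,
    "fsync": 4.94,
    "O_DSYNC": 4.92,
    "littlesync": 4.91,
    "nosync": 4.95,
    "O_DIRECT_NO_FSYNC": 12.81,
}

# Slowdown of each method relative to the fsync baseline.
slowdown = {m: round(t / times["fsync"], 2) for m, t in times.items()}
print(slowdown["O_DIRECT"])  # prints 2.88, i.e. roughly a 3x regression
```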

      The customer for whom we are concerned about this issue is currently using 10.4, where innodb_flush_method=fsync is the default. In 10.6, the default changes to O_DIRECT. For now, we're advising the customer to explicitly set innodb_flush_method=fsync on 10.6. However, we understand that O_DIRECT was made the default in 10.6 because it generally improves on fsync, so we'd like to know whether the regression we're seeing here is due to a bug that, if fixed, would make O_DIRECT the all-around ideal it is intended to be.
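The workaround described above amounts to pinning the pre-10.6 default in the server configuration, e.g. (a minimal sketch; the exact configuration file location varies by distribution):

```ini
[mysqld]
# Revert to the pre-10.6 default flush method as a workaround
# for the O_DIRECT read regression described in this report.
innodb_flush_method = fsync
```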


          Activity

            O_DIRECT innodb_random_read_ahead=OFF vs ON

            innodb_random_read_ahead=OFF

            innodb_random_read_ahead=OFF
            innodb_buffer_pool_size=1342177280
            innodb-flush-method = O_DIRECT
            innodb_stats_persistent=OFF
            innodb_stats_auto_recalc=OFF
            

            {10.6} midenok@snsvr01:cmake$ sudo sysctl vm.drop_caches=3 && mp -q3
            vm.drop_caches = 3
            ! Started: 245416
            * Q0: real: 10.33  min: 10.33  max: 10.33
            * Q1: real: 7.23  min: 7.23  max: 10.33
            * Q2: real: 7.35  min: 7.23  max: 10.33
            * Average for 3 queries:
            real: 8.30  min: 7.23  max: 10.33
            ! Stopped: 245416
            * Server dirty stats:
            faults: 164  rfaults: 21545  fsi: 1278528  fso: 456  rss_max: 741840  cpu: 91  sys: 0.93  usr: 23.69  real: 26.91  ctx0: 60  ctx1: 87688
            

            innodb_random_read_ahead=ON

            innodb_random_read_ahead=ON
            innodb_buffer_pool_size=1342177280
            innodb-flush-method = O_DIRECT
            innodb_stats_persistent=OFF
            innodb_stats_auto_recalc=OFF
            

            {10.6} midenok@snsvr01:cmake$ sudo sysctl vm.drop_caches=3 && mp -q3
            vm.drop_caches = 3
            ! Started: 245958
            * Q0: real: 8.08  min: 8.08  max: 8.08
            * Q1: real: 7.40  min: 7.40  max: 8.08
            * Q2: real: 7.31  min: 7.31  max: 8.08
            * Average for 3 queries:
            real: 7.59  min: 7.31  max: 8.08
            ! Stopped: 245958
            * Server dirty stats:
            faults: 164  rfaults: 21209  fsi: 1278112  fso: 472  rss_max: 745548  cpu: 100  sys: 1.24  usr: 23.55  real: 24.68  ctx0: 54  ctx1: 118957
            

            midenok Aleksey Midenkov added a comment
            midenok Aleksey Midenkov added a comment - edited

            fsync innodb_random_read_ahead=OFF vs ON

            innodb_random_read_ahead=OFF

            innodb_random_read_ahead=OFF
            innodb_buffer_pool_size=1342177280
            innodb-flush-method = fsync
            innodb_stats_persistent=OFF
            innodb_stats_auto_recalc=OFF
            

            {10.6} midenok@snsvr01:cmake$ sudo sysctl vm.drop_caches=3 && mp -q3
            vm.drop_caches = 3
            ! Started: 246520
            * Q0: real: 8.53  min: 8.53  max: 8.53
            * Q1: real: 7.19  min: 7.19  max: 8.53
            * Q2: real: 7.13  min: 7.13  max: 8.53
            * Average for 3 queries:
            real: 7.61  min: 7.13  max: 8.53
            ! Stopped: 246520
            * Server dirty stats:
            faults: 164  rfaults: 24188  fsi: 1280376  fso: 456  rss_max: 762416  cpu: 96  sys: 0.96  usr: 23.11  real: 24.84  ctx0: 25  ctx1: 65197
            

            innodb_random_read_ahead=ON

            innodb_random_read_ahead=ON
            innodb_buffer_pool_size=1342177280
            innodb-flush-method = fsync
            innodb_stats_persistent=OFF
            innodb_stats_auto_recalc=OFF
            

            {10.6} midenok@snsvr01:cmake$ sudo sysctl vm.drop_caches=3 && mp -q3
            vm.drop_caches = 3
            ! Started: 247049
            * Q0: real: 8.32  min: 8.32  max: 8.32
            * Q1: real: 7.59  min: 7.59  max: 8.32
            * Q2: real: 7.48  min: 7.48  max: 8.32
            * Average for 3 queries:
            real: 7.79  min: 7.48  max: 8.32
            ! Stopped: 247049
            * Server dirty stats:
            faults: 164  rfaults: 31906  fsi: 1281136  fso: 472  rss_max: 765080  cpu: 101  sys: 1.29  usr: 24.53  real: 25.34  ctx0: 44  ctx1: 103060
            


            Thank you! One of my suspicions was that this was a case of the working set exceeding innodb_buffer_pool_size, and then having all "excess" RAM acting as a cache between the durable storage and InnoDB, or not having any kernel file system cache at all.

            Here is one more resource that I just came across: the README.md file of the nocache tool explains how you might be able to limit the size of the Linux kernel’s file system cache while a process is running. It might be useful for a valid comparison when using a smaller InnoDB buffer pool.

            Is there anything that needs to be changed in the code for this particular ticket? Out of curiosity, what was the original innodb_buffer_pool_size? The default 128MiB (vs. 1280MiB for the updated results)?

            marko Marko Mäkelä added a comment

            > The default 128MiB (vs. 1280MiB for the updated results)?

            Right, it was the default.

            midenok Aleksey Midenkov added a comment

            In MDEV-30986 the following approach worked for me. The statement under test would access about 400MiB of data.

            mkdir /var/tmp/var && rm -fr var && ln -s /var/tmp/var var
            ./mtr --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 name_of_test
            ./mtr --mysqld=--innodb-buffer-pool-size=400m --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 name_of_test
            sudo sysctl vm.drop_caches=3
            systemd-run --scope --property=MemoryLimit=250M -- ./mtr --mysqld=--innodb-flush-method=fsync --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 name_of_test
            sudo sysctl vm.drop_caches=3
            systemd-run --scope --property=MemoryLimit=550M -- ./mtr --mysqld=--innodb-flush-method=fsync --mysqld=--innodb-buffer-pool-size=400m --mysqld=--innodb-buffer-pool-dump-at-shutdown=0 name_of_test
            

            For that test, I got the best read performance (for either buffer pool size) by using the default innodb_flush_method=O_DIRECT and enabling innodb_random_read_ahead=ON.
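Tabulating the three-query averages from the four runs in this thread (a quick sketch; numbers copied verbatim from the logs above):

```python
# Average "real" time over 3 queries, keyed by
# (innodb_flush_method, innodb_random_read_ahead), from the runs above.
avg_real = {
    ("O_DIRECT", "OFF"): 8.30,
    ("O_DIRECT", "ON"): 7.59,
    ("fsync", "OFF"): 7.61,
    ("fsync", "ON"): 7.79,
}

# Find the fastest configuration in these runs.
best = min(avg_real, key=avg_real.get)
print(best, avg_real[best])  # O_DIRECT with read-ahead ON is fastest here
```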

            marko Marko Mäkelä added a comment

            People

              Assignee: midenok Aleksey Midenkov
              Reporter: rob.schwyzer@mariadb.com Rob Schwyzer (Inactive)
              Votes: 0
              Watchers: 7

