[MDEV-25341] innodb buffer pool soft decommit of memory Created: 2021-04-05  Updated: 2023-08-18  Resolved: 2022-09-12

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Fix Version/s: 10.11.0

Type: Task Priority: Major
Reporter: Daniel Black Assignee: Daniel Black
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
causes MDEV-31953 madvise(frame, srv_page_size, MADV_FR... Closed
Relates
relates to MDEV-29429 "free" innodb pages from dropped tabl... Open
relates to MDEV-29431 SQL interface and Storage Engine inte... Open
relates to MDEV-29432 innodb huge pages reclaim Open
relates to MDEV-29445 reorganise innodb buffer pool (and re... Stalled
relates to MDEV-24670 avoid OOM by linux kernel co-operativ... Closed
relates to MDEV-25340 Server startup with large innodb_buff... Open
relates to MDEV-25342 autosize innodb_buffer_pool_chunk_size Closed

 Description   

The InnoDB buffer pool can fill up during operations like a mariadb-dump.

The logical successor to MDEV-25340, keeping only the memory InnoDB actually requires mapped in, is to purge old pages off the LRU if they really haven't been used in a while. These pages will still be allocated to mariadbd, but only in a virtual, non-paged-in sense.

This can be done with madvise(MADV_FREE) (Linux and FreeBSD). Windows has VirtualFree with MEM_DECOMMIT, or VirtualAlloc with MEM_RESET.

These are proactive releases, and the mechanisms implemented can be reused to serve reactive mechanisms like MDEV-24670.



 Comments   
Comment by Daniel Black [ 2022-08-12 ]

Used MADV_FREE (which will fail on huge pages), on a page-by-page basis.

MEM_RESET appears good, but would require MEM_RESET_UNDO when the page is allocated again. Happy to include this too.
https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualalloc

Comment by Marko Mäkelä [ 2022-08-12 ]

The change so far looks OK to me. For huge pages, I think that we could add something to actively free pages. Should there be some interface, say, a SET GLOBAL variable, to prompt immediate eviction of pages that are no longer useful? The choice of actions could be something like the following:

  • Evict all pages that became useless after DROP TABLE or a table rebuild.
  • Evict not-recently-used pages that are in a clean state and not in use.
  • Evict not-recently-used pages that are not in use.

Related to the "not-recently-used" above, there probably should be a "minimum buffer pool size" parameter, so that some of the buffer pool would always be preserved. The usage scenario for that could be that most of the time the working set is small, but occasionally more memory is needed for running a larger query.

The retained pages could be relocated together, so that there could be a chance to free a number of adjacent buffers (entire huge pages) at once. Relocation is not possible if a page latch or a reference is being held by some thread.

Comment by Matthias Leich [ 2022-08-16 ]

origin/bb-10.11-danielblack-MDEV-25341-innodb-politeness 6b76c3313fc3936f1b26aea6c2d8c5b9841bb042 2022-08-14T10:12:12+10:00
performed well in RQG testing. The failures observed also occur in the official trees.

Comment by Marko Mäkelä [ 2022-08-31 ]

This looks OK to me. Let mleich test the updated branch once more.

Comment by Daniel Black [ 2022-09-03 ]

Test run 1:

Test of the updated PR on a 30G RAM, 16-vCPU laptop:

Started in the default configuration plus a 20G InnoDB buffer pool, with the datadir on tmpfs.

Prepared with sysbench --mysql-user=dan --mysql-db=test --mysql-socket=/tmp/build-mariadb-server-10.11.sock --threads=16 --time=600 --report-interval=20 --tables=32 --table_size=1000000 oltp_read_write prepare, though the server was restarted after the prepare before running.

On a cold start, the server began with 273 maps and 5.69G resident:

start of test

$ while [ 1 ] ; do echo "$(date) pages: $(wc -l < /proc/$(pidof mysqld)/maps) DRS,TRS,VSZ,RSS  $(ps  -C mysqld -o drs,trs,vsz,rss | tail -n 1)"; sleep 1 ; done
Sat 03 Sep 2022 09:56:57 AEST pages: 273 DRS,TRS,VSZ,RSS  24585907 9156 24595064 5691376
Sat 03 Sep 2022 09:56:58 AEST pages: 273 DRS,TRS,VSZ,RSS  24585907 9156 24595064 5691376
Sat 03 Sep 2022 09:56:59 AEST pages: 273 DRS,TRS,VSZ,RSS  24585907 9156 24595064 5691376
Sat 03 Sep 2022 09:57:00 AEST pages: 273 DRS,TRS,VSZ,RSS  24585907 9156 24595064 5691376
Sat 03 Sep 2022 09:57:01 AEST pages: 273 DRS,TRS,VSZ,RSS  24585907 9156 24595064 5691376

Sat 03 Sep 2022 09:49:00 AEST $sysbench --mysql-user=dan --mysql-db=test --mysql-socket=/tmp/build-mariadb-server-10.11.sock  --threads=16 --time=600 --report-interval=20 --tables=32 --table_size=1000000 oltp_read_write run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
 
Running the test with following options:
Number of threads: 16
Report intermediate results every 20 second(s)
Initializing random number generator from current time
 
 
Initializing worker threads...
 
Threads started!
 
[ 20s ] thds: 16 tps: 9096.27 qps: 181935.92 (r/w/o: 127357.04/36385.53/18193.34) lat (ms,95%): 2.91 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 16 tps: 8904.07 qps: 178079.45 (r/w/o: 124655.16/35616.14/17808.14) lat (ms,95%): 3.07 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 16 tps: 8853.59 qps: 177073.10 (r/w/o: 123951.41/35414.51/17707.18) lat (ms,95%): 3.07 err/s: 0.00 reconn/s: 0.00
[ 80s ] thds: 16 tps: 8864.84 qps: 177294.25 (r/w/o: 124105.51/35459.06/17729.68) lat (ms,95%): 3.07 err/s: 0.00 reconn/s: 0.00
[ 100s ] thds: 16 tps: 9133.53 qps: 182673.78 (r/w/o: 127872.70/36534.02/18267.06) lat (ms,95%): 2.97 err/s: 0.00 reconn/s: 0.00
[ 120s ] thds: 16 tps: 9184.83 qps: 183696.48 (r/w/o: 128587.20/36739.62/18369.66) lat (ms,95%): 2.97 err/s: 0.00 reconn/s: 0.00
[ 140s ] thds: 16 tps: 9212.69 qps: 184249.44 (r/w/o: 128973.62/36850.48/18425.34) lat (ms,95%): 2.97 err/s: 0.00 reconn/s: 0.00
[ 160s ] thds: 16 tps: 9239.05 qps: 184782.91 (r/w/o: 129348.30/36956.45/18478.15) lat (ms,95%): 2.97 err/s: 0.00 reconn/s: 0.00
[ 180s ] thds: 16 tps: 9213.78 qps: 184275.92 (r/w/o: 128993.06/36855.30/18427.55) lat (ms,95%): 2.97 err/s: 0.00 reconn/s: 0.00
[ 200s ] thds: 16 tps: 9180.36 qps: 183607.59 (r/w/o: 128525.48/36721.39/18360.72) lat (ms,95%): 2.97 err/s: 0.00 reconn/s: 0.00
[ 220s ] thds: 16 tps: 9192.66 qps: 183852.76 (r/w/o: 128697.00/36770.49/18385.27) lat (ms,95%): 2.97 err/s: 0.00 reconn/s: 0.00
[ 240s ] thds: 16 tps: 9265.44 qps: 185306.56 (r/w/o: 129714.25/37061.44/18530.87) lat (ms,95%): 2.97 err/s: 0.00 reconn/s: 0.00

Sysbench ran at a consistent rate.

At this point the buffer pool was:

$mysql -u dan -S /tmp/build-mariadb-server-10.11.sock -e 'show global status like "innodb_buffer_pool_pages%"'
+-----------------------------------------+---------+
| Variable_name                           | Value   |
+-----------------------------------------+---------+
| Innodb_buffer_pool_pages_data           | 442706  |
| Innodb_buffer_pool_pages_dirty          | 88191   |
| Innodb_buffer_pool_pages_flushed        | 2960918 |
| Innodb_buffer_pool_pages_free           | 855278  |
| Innodb_buffer_pool_pages_made_not_young | 542455  |
| Innodb_buffer_pool_pages_made_young     | 902580  |
| Innodb_buffer_pool_pages_misc           | 0       |
| Innodb_buffer_pool_pages_old            | 163400  |
| Innodb_buffer_pool_pages_total          | 1297984 |
| Innodb_buffer_pool_pages_lru_flushed    | 0       |
| Innodb_buffer_pool_pages_lru_freed      | 0       |
+-----------------------------------------+---------+
 
 
$mysql -u dan -S /tmp/build-mariadb-server-10.11.sock -e 'set global innodb_max_dirty_pages_pct=0'

At this time the sysbench tps stayed the same:

[ 260s ] thds: 16 tps: 9234.88 qps: 184701.26 (r/w/o: 129291.76/36939.68/18469.82) lat (ms,95%): 2.97 err/s: 0.00 reconn/s: 0.00
[ 280s ] thds: 16 tps: 9167.76 qps: 183354.80 (r/w/o: 128347.84/36671.49/18335.47) lat (ms,95%): 2.97 err/s: 0.00 reconn/s: 0.00
[ 300s ] thds: 16 tps: 9199.79 qps: 183995.78 (r/w/o: 128797.35/36798.81/18399.63) lat (ms,95%): 2.97 err/s: 0.00 reconn/s: 0.00
[ 320s ] thds: 16 tps: 9085.71 qps: 181714.43 (r/w/o: 127200.09/36342.93/18171.41) lat (ms,95%): 2.97 err/s: 0.00 reconn/s: 0.00
[ 340s ] thds: 16 tps: 9198.14 qps: 183962.28 (r/w/o: 128773.36/36792.63/18396.29) lat (ms,95%): 2.97 err/s: 0.00 reconn/s: 0.00
[ 360s ] thds: 16 tps: 9170.48 qps: 183407.56 (r/w/o: 128384.71/36681.90/18340.95) lat (ms,95%): 2.97 err/s: 0.00 reconn/s: 0.00

And no big change in the virtual page mappings or RSS:

Sat 03 Sep 2022 10:01:54 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 8061700
Sat 03 Sep 2022 10:01:55 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 8070676
Sat 03 Sep 2022 10:01:56 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 8078068
Sat 03 Sep 2022 10:01:57 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 8087564
Sat 03 Sep 2022 10:01:58 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 8099440
Sat 03 Sep 2022 10:01:59 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 8108680
Sat 03 Sep 2022 10:02:00 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 8117392
Sat 03 Sep 2022 10:02:01 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 8125312
Sat 03 Sep 2022 10:02:02 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 8133760
Sat 03 Sep 2022 10:02:03 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 8142208
Sat 03 Sep 2022 10:02:04 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 8150656
Sat 03 Sep 2022 10:02:05 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 8158840
Sat 03 Sep 2022 10:02:06 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 8167552

Just before the next step the innodb buffer pool was:

mysql -u dan -S /tmp/build-mariadb-server-10.11.sock -e 'show global status like "innodb_buffer_pool_pages%"'
+-----------------------------------------+---------+
| Variable_name                           | Value   |
+-----------------------------------------+---------+
| Innodb_buffer_pool_pages_data           | 518717  |
| Innodb_buffer_pool_pages_dirty          | 88487   |
| Innodb_buffer_pool_pages_flushed        | 4978087 |
| Innodb_buffer_pool_pages_free           | 779267  |
| Innodb_buffer_pool_pages_made_not_young | 585320  |
| Innodb_buffer_pool_pages_made_young     | 1172425 |
| Innodb_buffer_pool_pages_misc           | 0       |
| Innodb_buffer_pool_pages_old            | 191459  |
| Innodb_buffer_pool_pages_total          | 1297984 |
| Innodb_buffer_pool_pages_lru_flushed    | 0       |
| Innodb_buffer_pool_pages_lru_freed      | 0       |
+-----------------------------------------+---------+

This is the point where memory was squeezed - 26G of locked memory:

$sudo ./target/release/eatmemory -m --alloc-definition=1 -s 26000
Killed

Killed by the OOM killer.

The server's RSS reduced quickly until the server itself was killed:

Sat 03 Sep 2022 10:03:57 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 9029708
Sat 03 Sep 2022 10:03:58 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 8214668
Sat 03 Sep 2022 10:03:59 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 6950240
Sat 03 Sep 2022 10:04:01 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 5461480
Sat 03 Sep 2022 10:04:02 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4832316
Sat 03 Sep 2022 10:04:03 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4605364
Sat 03 Sep 2022 10:04:05 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4587420
Sat 03 Sep 2022 10:04:06 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4559304
Sat 03 Sep 2022 10:04:07 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4538528
Sat 03 Sep 2022 10:04:08 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4489348
Sat 03 Sep 2022 10:04:10 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4446184
Sat 03 Sep 2022 10:04:13 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4385052
Sat 03 Sep 2022 10:04:14 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4345644
Sat 03 Sep 2022 10:04:15 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4296496
Sat 03 Sep 2022 10:04:17 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4226108
Sat 03 Sep 2022 10:04:19 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4157748
Sat 03 Sep 2022 10:04:21 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4179292
Sat 03 Sep 2022 10:04:22 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4188908
Sat 03 Sep 2022 10:04:23 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4196520
Sat 03 Sep 2022 10:04:24 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4209808
Sat 03 Sep 2022 10:04:26 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4235948
Sat 03 Sep 2022 10:04:27 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4253008
Sat 03 Sep 2022 10:04:28 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4262616
Sat 03 Sep 2022 10:04:29 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4280436
Sat 03 Sep 2022 10:04:30 AEST pages: 339 DRS,TRS,VSZ,RSS  25639583 9156 25648740 4302776
bash: /proc//maps: No such file or directory

expected sysbench dropoff

[ 380s ] thds: 16 tps: 8508.37 qps: 170170.18 (r/w/o: 119120.25/34033.19/17016.74) lat (ms,95%): 3.19 err/s: 0.00 reconn/s: 0.00
[ 400s ] thds: 16 tps: 1358.79 qps: 27175.66 (r/w/o: 19022.89/5435.20/2717.58) lat (ms,95%): 44.98 err/s: 0.00 reconn/s: 0.00
FATAL: mysql_stmt_execute() returned error 2013 (Lost connection to server during query) for query 'SELECT SUM(k) FROM sbtest10 WHERE id BETWEEN ? AND ?'

perf results, recorded from early in the test:

$ perf record -g -p $(pidof mysqld) -F 16
[ perf record: Woken up 5732801 times to write data ]
[ perf record: Captured and wrote 18.911 MB perf.data (82199 samples) ]

Of all the samples, a few notable functions related to this PR:

     0.00%     0.00%  mysqld  mariadbd  [.] buf_LRU_get_free_only    
 
buf_LRU_block_free_non_file_page, set_os_used and set_os_unused didn't show up in any samples.
 
     0.00%     0.00%  mysqld  libc.so.6  [.] __madvise              

OOM killer:

dmesg

[62583.817703] perf: interrupt took too long (5130 > 5031), lowering kernel.perf_event_max_sample_rate to 38000
[62594.988081] perf: interrupt took too long (6425 > 6412), lowering kernel.perf_event_max_sample_rate to 31000
[63089.747836] Purging GPU memory, 0 pages freed, 0 pages still pinned, 2112 pages left available.
[63089.747919] slack invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=200
[63089.747922] CPU: 4 PID: 3771 Comm: slack Tainted: G        W         -------  ---  6.0.0-0.rc2.19.fc38.x86_64 #1
[63089.747924] Hardware name: Acer Nitro AN515-57/Scala_TLS, BIOS V1.17 06/02/2022
[63089.747925] Call Trace:
[63089.747926]  <TASK>
[63089.747928]  dump_stack_lvl+0x44/0x5c
[63089.747932]  dump_header+0x4a/0x1ff
[63089.747934]  oom_kill_process.cold+0xb/0x10
[63089.747936]  out_of_memory+0x1be/0x4f0
[63089.747939]  __alloc_pages_slowpath.constprop.0+0xc17/0xd50
[63089.747943]  __alloc_pages+0x201/0x220
[63089.747944]  __folio_alloc+0x17/0x50
[63089.747946]  ? policy_node+0x51/0x70
[63089.747948]  vma_alloc_folio+0x88/0x320
[63089.747949]  do_swap_page+0x7ac/0xb40
[63089.747952]  ? psi_task_change+0x83/0x120
[63089.747955]  ? __mod_node_page_state+0x72/0xc0
[63089.747957]  __handle_mm_fault+0x837/0xe40
[63089.747959]  handle_mm_fault+0xae/0x290
[63089.747961]  do_user_addr_fault+0x1cd/0x690
[63089.747964]  exc_page_fault+0x70/0x170
[63089.747966]  asm_exc_page_fault+0x22/0x30

As expected, it was triggered by user-space faulting of a memory address (eatmemory):

[63089.748021] Mem-Info:
[63089.748021] active_anon:25995 inactive_anon:28763 isolated_anon:0
                active_file:1535 inactive_file:1417 isolated_file:0
                unevictable:6585743 dirty:0 writeback:3
                slab_reclaimable:40034 slab_unreclaimable:80365
                mapped:3943 shmem:26303 pagetables:22102 bounce:0
                kernel_misc_reclaimable:0
                free:50492 free_pcp:63 free_cma:0

A decent reclaimable amount.

[63089.748056] 34000 total pagecache pages
[63089.748056] 4267 pages in swap cache
[63089.748057] Free swap  = 2264564kB
[63089.748057] Total swap = 16502776kB
[63089.748058] 8332110 pages RAM
[63089.748058] 0 pages HighMem/MovableOnly
[63089.748058] 171770 pages reserved
[63089.748059] 0 pages cma reserved
[63089.748059] 0 pages hwpoisoned
[63089.748059] Tasks state (memory values in pages):
...
...
[63089.748296] [ 171566]     0 171566  6657181  6584482 52830208       39           200 eatmemory
[63089.748298] [ 172293]     0 172293    55367      106    69632       25             0 sleep
[63089.748299] [ 172298]  1000 172298    91841      592   368640    25168           200 perf
[63089.748300] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-7b4a5622-7723-4011-9c12-8a95251f2509.scope,task=eatmemory,pid=171566,uid=0
[63089.748310] Out of memory: Killed process 171566 (eatmemory) total-vm:26628724kB, anon-rss:26336484kB, file-rss:1444kB, shmem-rss:0kB, UID:0 pgtables:51592kB oom_score_adj:200
[63091.796602] oom_reaper: reaped process 171566 (eatmemory), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

(oddly, mysqld/mariadbd doesn't appear in the dmesg output, though the terminal reported a kill -9)

Comment by Daniel Black [ 2022-09-03 ]

Test 2 with changes:

As test 1 used < 50% of the buffer pool, and most of the impact of this change is write-based, let's look at the impact under a memory-constrained, high-write test.

$ mariadblocal --innodb-buffer-pool-size=8G --innodb_max_dirty_pages_pct=20

Ran the sysbench prepare, then started recording at the start of the sysbench run:

sysbench run

$sysbench --mysql-user=dan --mysql-db=test --mysql-socket=/tmp/build-mariadb-server-10.11.sock  --threads=16 --time=600 --report-interval=20 --tables=32 --table_size=1000000 oltp_write_only run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
 
Running the test with following options:
Number of threads: 16
Report intermediate results every 20 second(s)
Initializing random number generator from current time
 
 
Initializing worker threads...
 
Threads started!
 
[ 20s ] thds: 16 tps: 28841.06 qps: 173049.15 (r/w/o: 0.00/115366.23/57682.92) lat (ms,95%): 1.30 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 16 tps: 30279.59 qps: 181677.15 (r/w/o: 0.00/121118.07/60559.08) lat (ms,95%): 1.25 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 16 tps: 28521.88 qps: 171130.79 (r/w/o: 0.00/114087.03/57043.76) lat (ms,95%): 1.32 err/s: 0.00 reconn/s: 0.00
[ 80s ] thds: 16 tps: 26514.46 qps: 159087.68 (r/w/o: 0.00/106058.65/53029.03) lat (ms,95%): 1.32 err/s: 0.00 reconn/s: 0.00
[ 100s ] thds: 16 tps: 26596.77 qps: 159579.57 (r/w/o: 0.00/106386.23/53193.34) lat (ms,95%): 1.30 err/s: 0.00 reconn/s: 0.00
[ 120s ] thds: 16 tps: 26614.08 qps: 159684.84 (r/w/o: 0.00/106456.52/53228.31) lat (ms,95%): 1.30 err/s: 0.00 reconn/s: 0.00
[ 140s ] thds: 16 tps: 28534.46 qps: 171206.33 (r/w/o: 0.00/114137.45/57068.88) lat (ms,95%): 1.18 err/s: 0.00 reconn/s: 0.00
[ 160s ] thds: 16 tps: 29573.10 qps: 177438.79 (r/w/o: 0.00/118292.60/59146.20) lat (ms,95%): 1.14 err/s: 0.00 reconn/s: 0.00
[ 180s ] thds: 16 tps: 30068.12 qps: 180409.24 (r/w/o: 0.00/120272.90/60136.35) lat (ms,95%): 1.14 err/s: 0.00 reconn/s: 0.00
[ 200s ] thds: 16 tps: 30389.74 qps: 182338.22 (r/w/o: 0.00/121558.73/60779.49) lat (ms,95%): 1.12 err/s: 0.00 reconn/s: 0.00
[ 220s ] thds: 16 tps: 29932.09 qps: 179592.03 (r/w/o: 0.00/119728.00/59864.03) lat (ms,95%): 1.14 err/s: 0.00 reconn/s: 0.00
[ 240s ] thds: 16 tps: 29611.89 qps: 177671.80 (r/w/o: 0.00/118447.92/59223.88) lat (ms,95%): 1.14 err/s: 0.00 reconn/s: 0.00
[ 260s ] thds: 16 tps: 30119.01 qps: 180713.95 (r/w/o: 0.00/120475.98/60237.97) lat (ms,95%): 1.12 err/s: 0.00 reconn/s: 0.00
[ 280s ] thds: 16 tps: 30294.22 qps: 181765.84 (r/w/o: 0.00/121177.31/60588.53) lat (ms,95%): 1.14 err/s: 0.00 reconn/s: 0.00
[ 300s ] thds: 16 tps: 27774.82 qps: 166648.14 (r/w/o: 0.00/111098.59/55549.55) lat (ms,95%): 1.23 err/s: 0.00 reconn/s: 0.00
 
 
[ 320s ] thds: 16 tps: 39110.69 qps: 234664.67 (r/w/o: 0.00/156443.25/78221.42) lat (ms,95%): 0.74 err/s: 0.00 reconn/s: 0.00

perf stats

$ perf stat -p $(pidof mariadbd) -e major-faults -e minor-faults -e cs -e dTLB-load-misses -e dTLB-loads -e dTLB-store-misses -e dTLB-stores -- sleep 300
 
 Performance counter stats for process id '220488':
 
                 1      major-faults:u                                              
            70,871      minor-faults:u                                              
                 0      cs:u                                                        
     3,393,489,253      dTLB-load-misses:u        #    0.42% of all dTLB cache accesses
   811,245,260,049      dTLB-loads:u                                                
       744,565,106      dTLB-store-misses:u                                         
   489,518,021,210      dTLB-stores:u                                               
 
     300.029098475 seconds time elapsed

rss - rather unremarkable

$ while [ 1 ] ; do echo "$(date) pages: $(wc -l < /proc/$(pidof mysqld)/maps) DRS,TRS,VSZ,RSS  $(ps  -C mysqld -o drs,trs,vsz,rss | tail -n 1)"; sleep 1 ; done
Sat 03 Sep 2022 11:46:17 AEST pages: 342 DRS,TRS,VSZ,RSS  13000039 9156 13009196 8710408
Sat 03 Sep 2022 11:46:18 AEST pages: 342 DRS,TRS,VSZ,RSS  13000039 9156 13009196 8710408
Sat 03 Sep 2022 11:46:19 AEST pages: 342 DRS,TRS,VSZ,RSS  13000039 9156 13009196 8710408
Sat 03 Sep 2022 11:46:20 AEST pages: 342 DRS,TRS,VSZ,RSS  13000039 9156 13009196 8710408
Sat 03 Sep 2022 11:46:21 AEST pages: 342 DRS,TRS,VSZ,RSS  13000039 9156 13009196 8710408
Sat 03 Sep 2022 11:46:22 AEST pages: 342 DRS,TRS,VSZ,RSS  13000039 9156 13009196 8710408
Sat 03 Sep 2022 11:46:23 AEST pages: 342 DRS,TRS,VSZ,RSS  13000039 9156 13009196 8758300
Sat 03 Sep 2022 11:46:24 AEST pages: 342 DRS,TRS,VSZ,RSS  13000039 9156 13009196 8929108
Sat 03 Sep 2022 11:46:25 AEST pages: 342 DRS,TRS,VSZ,RSS  13000039 9156 13009196 8963956
Sat 03 Sep 2022 11:46:26 AEST pages: 342 DRS,TRS,VSZ,RSS  13000039 9156 13009196 8974516
Sat 03 Sep 2022 11:46:27 AEST pages: 342 DRS,TRS,VSZ,RSS  13000039 9156 13009196 8982436
Sat 03 Sep 2022 11:46:28 AEST pages: 342 DRS,TRS,VSZ,RSS  13000039 9156 13009196 8987188
...
Sat 03 Sep 2022 11:51:02 AEST pages: 344 DRS,TRS,VSZ,RSS  13000339 9156 13009496 8992104
Sat 03 Sep 2022 11:51:03 AEST pages: 344 DRS,TRS,VSZ,RSS  13000339 9156 13009496 8992104
Sat 03 Sep 2022 11:51:04 AEST pages: 344 DRS,TRS,VSZ,RSS  13000339 9156 13009496 8992104
Sat 03 Sep 2022 11:51:05 AEST pages: 344 DRS,TRS,VSZ,RSS  13000339 9156 13009496 8992104
Sat 03 Sep 2022 11:51:06 AEST pages: 344 DRS,TRS,VSZ,RSS  13000339 9156 13009496 8992104
Sat 03 Sep 2022 11:51:07 AEST pages: 344 DRS,TRS,VSZ,RSS  13000339 9156 13009496 8992104
Sat 03 Sep 2022 11:51:08 AEST pages: 344 DRS,TRS,VSZ,RSS  13000339 9156 13009496 8992104
Sat 03 Sep 2022 11:51:09 AEST pages: 344 DRS,TRS,VSZ,RSS  13000339 9156 13009496 8992104

perf report

Samples: 15M of event 'cycles:u', Event count (approx.): 4848655646262
Overhead  Comman  Shared Ob  Symbol
   0.00%  mysqld  libc.so.6  [.] __madvise                                                                                                                                                                      
   0.00%  mysqld  mariadbd   [.] madvise@plt    
   0.00%  mysqld  mariadbd  [.] buf_LRU_block_free_non_file_page       
   0.00%  mysqld  mariadbd  [.] buf_LRU_get_free_only    

During the test, InnoDB pages filled up quickly and, as expected, it ran with no free pages:

Sat 03 Sep 2022 11:38:38 AEST $mysql -u dan -S /tmp/build-mariadb-server-10.11.sock -e 'show global status like "innodb_buffer_pool_pages%"'
+-----------------------------------------+---------+
| Variable_name                           | Value   |
+-----------------------------------------+---------+
| Innodb_buffer_pool_pages_data           | 519168  |
| Innodb_buffer_pool_pages_dirty          | 89049   |
| Innodb_buffer_pool_pages_flushed        | 6801526 |
| Innodb_buffer_pool_pages_free           | 0       |
| Innodb_buffer_pool_pages_made_not_young | 2682167 |
| Innodb_buffer_pool_pages_made_young     | 544449  |
| Innodb_buffer_pool_pages_misc           | 0       |
| Innodb_buffer_pool_pages_old            | 191626  |
| Innodb_buffer_pool_pages_total          | 519168  |
| Innodb_buffer_pool_pages_lru_flushed    | 0       |
| Innodb_buffer_pool_pages_lru_freed      | 0       |
+-----------------------------------------+---------+
 
Sat 03 Sep 2022 11:41:36 AEST $mysql -u dan -S /tmp/build-mariadb-server-10.11.sock -e 'show global status like "innodb_buffer_pool_pages%"'
+-----------------------------------------+----------+
| Variable_name                           | Value    |
+-----------------------------------------+----------+
| Innodb_buffer_pool_pages_data           | 519168   |
| Innodb_buffer_pool_pages_dirty          | 88789    |
| Innodb_buffer_pool_pages_flushed        | 14824269 |
| Innodb_buffer_pool_pages_free           | 0        |
| Innodb_buffer_pool_pages_made_not_young | 2753474  |
| Innodb_buffer_pool_pages_made_young     | 762473   |
| Innodb_buffer_pool_pages_misc           | 0        |
| Innodb_buffer_pool_pages_old            | 191626   |
| Innodb_buffer_pool_pages_total          | 519168   |
| Innodb_buffer_pool_pages_lru_flushed    | 64       |
| Innodb_buffer_pool_pages_lru_freed      | 0        |
+-----------------------------------------+----------+

Comment by Daniel Black [ 2022-09-03 ]

This is the control test for Test 2 above

(as above but with the change reverted, i.e. running on bc563f1a4b0)

sysbench run

Sat 03 Sep 2022 12:03:20 AEST $sysbench --mysql-user=dan --mysql-db=test --mysql-socket=/tmp/build-mariadb-server-10.11.sock  --threads=16 --time=600 --report-interval=20 --tables=32 --table_size=1000000 oltp_write_only run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
 
Running the test with following options:
Number of threads: 16
Report intermediate results every 20 second(s)
Initializing random number generator from current time
 
 
Initializing worker threads...
 
Threads started!
 
[ 20s ] thds: 16 tps: 26622.12 qps: 159734.28 (r/w/o: 0.00/106489.39/53244.89) lat (ms,95%): 1.39 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 16 tps: 26362.80 qps: 158177.45 (r/w/o: 0.00/105451.70/52725.75) lat (ms,95%): 1.37 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 16 tps: 27388.88 qps: 164333.51 (r/w/o: 0.00/109555.76/54777.75) lat (ms,95%): 1.25 err/s: 0.00 reconn/s: 0.00
[ 80s ] thds: 16 tps: 28835.49 qps: 173012.43 (r/w/o: 0.00/115341.50/57670.93) lat (ms,95%): 1.18 err/s: 0.00 reconn/s: 0.00
[ 100s ] thds: 16 tps: 29049.53 qps: 174297.22 (r/w/o: 0.00/116198.12/58099.11) lat (ms,95%): 1.16 err/s: 0.00 reconn/s: 0.00
[ 120s ] thds: 16 tps: 28434.59 qps: 170607.55 (r/w/o: 0.00/113738.37/56869.18) lat (ms,95%): 1.18 err/s: 0.00 reconn/s: 0.00
[ 140s ] thds: 16 tps: 28294.86 qps: 169769.29 (r/w/o: 0.00/113179.61/56589.68) lat (ms,95%): 1.18 err/s: 0.00 reconn/s: 0.00
[ 160s ] thds: 16 tps: 28412.45 qps: 170474.89 (r/w/o: 0.00/113650.05/56824.85) lat (ms,95%): 1.18 err/s: 0.00 reconn/s: 0.00
[ 180s ] thds: 16 tps: 27445.89 qps: 164674.72 (r/w/o: 0.00/109782.95/54891.77) lat (ms,95%): 1.23 err/s: 0.00 reconn/s: 0.00
[ 200s ] thds: 16 tps: 27836.61 qps: 167019.49 (r/w/o: 0.00/111346.31/55673.18) lat (ms,95%): 1.21 err/s: 0.00 reconn/s: 0.00
[ 220s ] thds: 16 tps: 29266.08 qps: 175596.79 (r/w/o: 0.00/117064.52/58532.26) lat (ms,95%): 1.16 err/s: 0.00 reconn/s: 0.00
[ 240s ] thds: 16 tps: 30208.79 qps: 181252.55 (r/w/o: 0.00/120835.17/60417.38) lat (ms,95%): 1.12 err/s: 0.00 reconn/s: 0.00
[ 260s ] thds: 16 tps: 29311.08 qps: 175866.82 (r/w/o: 0.00/117244.46/58622.36) lat (ms,95%): 1.14 err/s: 0.00 reconn/s: 0.00
[ 280s ] thds: 16 tps: 28993.19 qps: 173959.11 (r/w/o: 0.00/115972.69/57986.42) lat (ms,95%): 1.14 err/s: 0.00 reconn/s: 0.00
[ 300s ] thds: 16 tps: 29275.21 qps: 175651.41 (r/w/o: 0.00/117101.09/58550.32) lat (ms,95%): 1.10 err/s: 0.00 reconn/s: 0.00

perf stats

$ perf stat -p $(pidof mariadbd) -e major-faults -e minor-faults -e cs -e dTLB-load-misses -e dTLB-loads -e dTLB-store-misses -e dTLB-stores -- sleep 300
 
 Performance counter stats for process id '261382':
 
                 0      major-faults:u                                              
            70,684      minor-faults:u                                              
                 0      cs:u                                                        
     3,046,534,435      dTLB-load-misses:u        #    0.38% of all dTLB cache accesses
   801,705,778,029      dTLB-loads:u                                                
       704,918,347      dTLB-store-misses:u                                         
   483,420,678,702      dTLB-stores:u                                               
 
     300.031959788 seconds time elapsed

rss

$ while [ 1 ] ; do echo "$(date) pages: $(wc -l < /proc/$(pidof mysqld)/maps) DRS,TRS,VSZ,RSS  $(ps  -C mysqld -o drs,trs,vsz,rss | tail -n 1)"; sleep 1 ; done
Sat 03 Sep 2022 12:03:27 AEST pages: 343 DRS,TRS,VSZ,RSS  13073771 9156 13082928 8699840
Sat 03 Sep 2022 12:03:28 AEST pages: 343 DRS,TRS,VSZ,RSS  13073771 9156 13082928 8699840
Sat 03 Sep 2022 12:03:29 AEST pages: 343 DRS,TRS,VSZ,RSS  13073771 9156 13082928 8699840
Sat 03 Sep 2022 12:03:30 AEST pages: 343 DRS,TRS,VSZ,RSS  13073771 9156 13082928 8699840
Sat 03 Sep 2022 12:03:31 AEST pages: 343 DRS,TRS,VSZ,RSS  13073771 9156 13082928 8699840
Sat 03 Sep 2022 12:03:32 AEST pages: 343 DRS,TRS,VSZ,RSS  13073771 9156 13082928 8699840
..
Sat 03 Sep 2022 12:04:32 AEST pages: 343 DRS,TRS,VSZ,RSS  13073771 9156 13082928 8978144
Sat 03 Sep 2022 12:04:33 AEST pages: 343 DRS,TRS,VSZ,RSS  13073771 9156 13082928 8979184
Sat 03 Sep 2022 12:04:34 AEST pages: 343 DRS,TRS,VSZ,RSS  13073771 9156 13082928 8979304
Sat 03 Sep 2022 12:04:35 AEST pages: 343 DRS,TRS,VSZ,RSS  13073771 9156 13082928 8979304
..
Sat 03 Sep 2022 12:08:39 AEST pages: 343 DRS,TRS,VSZ,RSS  13073771 9156 13082928 8981108
Sat 03 Sep 2022 12:08:40 AEST pages: 343 DRS,TRS,VSZ,RSS  13073771 9156 13082928 8981108
Sat 03 Sep 2022 12:08:41 AEST pages: 343 DRS,TRS,VSZ,RSS  13073771 9156 13082928 8981108

So tps appears to be about the same.

There's a noticeable rise in dTLB-load-misses (+11.38%) and dTLB-store-misses (+5.62%), though the proportion of misses relative to total dTLB loads/stores remains low.

The cost to the CPU (MMU) of a load/store miss is an extra memory access to look up the mapping.

Comment by Matthias Leich [ 2022-09-05 ]

origin/bb-10.11-danielblack-MDEV-25341-innodb-politeness e9d2ef69408ab7c8c64330f7c359433e41e61c1e 2022-08-31T13:49:33+03:00
performed well in RQG testing.

Comment by Daniel Black [ 2022-09-12 ]

Thank you all for reviews and patience.

Generated at Thu Feb 08 09:36:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.