[MDEV-29967] innodb_read_ahead_threshold (linear read-ahead) does not work Created: 2022-11-07  Updated: 2023-09-26  Resolved: 2023-05-11

Status: Closed
Project: MariaDB Server
Component/s: Server, Storage Engine - InnoDB
Affects Version/s: 10.5.4, 10.6.0, 10.6.8, 10.9.3
Fix Version/s: 10.5.22, 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2

Type: Bug Priority: Critical
Reporter: Christian Hesse Assignee: Marko Mäkelä
Resolution: Fixed Votes: 3
Labels: performance, regression
Environment:

Arch Linux VM running on VMware ESXi


Issue Links:
Blocks
blocks MDEV-31227 innodb_flush_method=O_DIRECT causes 3... Closed
Problem/Incident
is caused by MDEV-15053 Reduce buf_pool_t::mutex contention Closed
Relates
relates to MDEV-26055 Adaptive flushing is still not gettin... Closed
relates to MDEV-30400 Assertion `height == btr_page_get_lev... Closed
relates to MDEV-31254 InnoDB: Trying to read doublewrite bu... Closed
relates to MDEV-29343 MariaDB 10.6.x slower mysqldump etc. Closed
relates to MDEV-30986 Slow full index scan in 10.6 vs 10.5 ... Closed

 Description   

We upgraded a MariaDB Galera Cluster from version 10.5.13 to 10.9.3. Everything is up and running from a client's point of view.

However, we see a regression in I/O performance when running mariadb-dump. A custom script (dumping and compressing every table into a separate file) ran for about 45 minutes with MariaDB 10.5.13; the run time increased to about two and a half hours with MariaDB 10.9.3.

A simple cp can do about 500 MB/s on these machines, but with mariadb-dump we get about 25 MB/s, limited by disk I/O (mariadbd is in I/O wait, state "D"). MariaDB 10.5.13 did about 100 MB/s, limited by compression, so it could probably do a lot more.

This happens with all available I/O methods; tried with aio, uring, and native. Different filesystems (ext4 and btrfs) make no difference either. Same results on a cluster node and on a standalone (non-Galera) installation.

No idea whether I/O for regular SQL queries regressed as well. The machines have a reasonable amount of RAM, so no delay is noticeable there. Running mariabackup for Galera state transfer reaches the expected transfer rates.



 Comments   
Comment by Marko Mäkelä [ 2022-11-07 ]

What are your settings of innodb_buffer_pool_size and innodb_flush_method on 10.5.13 and 10.9.3? Note that the default for the latter was changed in MDEV-24854 (10.6).

Comment by Daniel Black [ 2022-11-08 ]

Are you able to test mariadb-dump from 10.5.x to see whether it is a regression in the SQL generated by the backup or in the server behaviour?

Are you able to enable the general log on the Galera node doing the backup and compare the timing of queries or groups of queries? Or enable the slow query log with a short long_query_time? Can the global status capture the effects of just the mariadb-dump run?
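For reference, capturing per-query timings for just the dump can be done by switching the slow query log on dynamically with a low threshold (these are standard server variables; the file name is illustrative, and the settings should be reverted after the dump):

```sql
-- Log every statement slower than 100 ms while the dump runs.
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 0.1;
SET GLOBAL slow_query_log_file = 'mariadb-dump-slow.log';
```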

Comment by Christian Hesse [ 2022-11-08 ]

Changing innodb_flush_method from O_DIRECT to fsync did the trick. mariadb-dump is back to the expected transfer rates. I will keep this setting for our backup instance, at least.
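The workaround amounts to a one-line server configuration change (section and comment are illustrative; the setting itself is the one described above; note that innodb_flush_method cannot be changed at runtime):

```ini
[mariadbd]
# Let the OS page cache (and its readahead) handle reads
# instead of O_DIRECT, as a workaround for this regression.
innodb_flush_method = fsync
```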

Just for completeness: the variable innodb_buffer_pool_size is set to 8589934592 (which is 8 GB). But please note that the instance is started from a filesystem snapshot just for the backup, so there are no other queries to warm the buffer pool. Everything is read from disk.

So it looks like O_DIRECT causes reads in small chunks, and the large number of these reads causes high I/O load and wait. The operating system's filesystem cache, on the other hand, reads bigger chunks (via read-ahead?). Is this something to be fixed for I/O via O_DIRECT? I think others may be suffering from the same issue, possibly without actively noticing (but drinking an unnecessary amount of coffee while waiting for their dumps).
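To make the mechanism this ticket is about concrete: InnoDB's linear read-ahead is governed by innodb_read_ahead_threshold. A simplified Python sketch of the documented trigger condition follows; this illustrates the semantics only and is not the actual InnoDB code:

```python
# Sketch of InnoDB linear read-ahead (per documented semantics of
# innodb_read_ahead_threshold): when a page at the end of a 64-page
# extent is read, and at least `threshold` pages of that extent were
# accessed sequentially, the whole next extent is prefetched.

EXTENT_PAGES = 64  # pages per extent

def linear_read_ahead(accessed_in_extent, page_no, threshold=56):
    """Return page numbers of the next extent to prefetch, or None.

    accessed_in_extent: pages of the current extent read sequentially.
    page_no: the page just requested. threshold: default 56, the
    default value of innodb_read_ahead_threshold.
    """
    # Only trigger on the last page of an extent ...
    if (page_no + 1) % EXTENT_PAGES != 0:
        return None
    # ... and only if enough of the extent was read sequentially.
    if accessed_in_extent < threshold:
        return None
    next_low = page_no + 1
    return list(range(next_low, next_low + EXTENT_PAGES))

# A full sequential scan reaching page 63 prefetches pages 64..127:
assert linear_read_ahead(64, 63) == list(range(64, 128))
# A sparse scan of the same extent does not trigger read-ahead:
assert linear_read_ahead(10, 63) is None
```

The regression reported here is that in the affected versions this trigger effectively never fires, so Innodb_buffer_pool_read_ahead stays at 0 even during full table scans.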

Comment by Arthur van Kleef [ 2022-11-10 ]

We are seeing this as well after upgrading from 10.4.26 to 10.6.10. Time to dump a big database schema jumped from ~4.5 hours to ~7 hours. We have always been using innodb_flush_method=O_DIRECT.

It's as if read-ahead is not working properly: on 10.6.10 the Innodb_buffer_pool_read_ahead status variable stays at 0, while on 10.4.x it was in the hundreds of millions.

We monitor performance with atop, which shows that avq (average queue depth, similar to iostat's aqu-sz, I think?) was at ~30.00 during mysqldump on 10.4, while on 10.6 it stays at ~1.00.

I tried enabling innodb_random_read_ahead (I don't know how to enable/disable the linear read-ahead), and that seems to restore mysqldump speed somewhat to the old 10.4 performance; loading the buffer pool at startup is also much faster then. But it only helps for a brief period: after a couple of hours it stops working (Innodb_buffer_pool_read_ahead_rnd no longer goes up).

Changing to innodb_flush_method=fsync seems to restore mysqldump speed (rareq-sz goes up from 16 KB to ~112 KB).

It seems very similar to MDEV-29343, by the way.

Comment by Daniel Black [ 2022-11-11 ]

eworm, at0r, most useful. Thanks.

I suspect innodb_flush_method=fsync, but not O_DIRECT, would get read-ahead benefits from /sys/block/<disk>/queue/read_ahead_kb.

Added in 10.6.10 (and not 10.5) was this commit affecting linear read-ahead (a large restructure, so I may have missed something).

To disable linear read-ahead from the default, the setting would be innodb_read_ahead=none.

Comment by Daniel Black [ 2022-11-14 ]

tested:

10.5.19-fe330b63fe4ee51a26ac6e222d8eb4ee44a73d50

MariaDB [test]> create table t1 as select seq from seq_1_to_10000000;
Query OK, 10000000 rows affected (39.684 sec)
Records: 10000000  Duplicates: 0  Warnings: 0
MariaDB [test]> shutdown;

Copied datadir to 10.6 instance. Removed ib_buffer_pool from both instances.

10.6.12-505da21e336f1e46b655f78fc0ad42e6cf17494d

$ mc -e 'show global status like "innodb_buffer_pool_read%"'
+---------------------------------------+-------+
| Variable_name                         | Value |
+---------------------------------------+-------+
| Innodb_buffer_pool_read_ahead_rnd     | 0     |
| Innodb_buffer_pool_read_ahead         | 0     |
| Innodb_buffer_pool_read_ahead_evicted | 0     |
| Innodb_buffer_pool_read_requests      | 991   |
| Innodb_buffer_pool_reads              | 189   |
+---------------------------------------+-------+

And it stays at this.

Compare a restarted 10.5, where within seconds of startup:

2022-11-14 10:39:10 0 [Note] InnoDB: 10.5.19 started; log sequence number 2732064850; transaction id 110
2022-11-14 10:39:10 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-11-14 10:39:10 0 [Note] InnoDB: Loading buffer pool(s) from /tmp/build-mariadb-server-10.5-datadir/ib_buffer_pool
2022-11-14 10:39:10 0 [Note] InnoDB: Cannot open '/tmp/build-mariadb-server-10.5-datadir/ib_buffer_pool' for reading: No such file or directory
2022-11-14 10:39:10 0 [Note] Reading of all Master_info entries succeeded
2022-11-14 10:39:10 0 [Note] Added new Master_info '' to hash table
2022-11-14 10:39:10 0 [Note] sql/mysqld: ready for connections.
Version: '10.5.19-MariaDB-1:10.5.13+maria~stretch'  socket: '/tmp/build-mariadb-server-10.5.sock'  port: 0  mariadb.org binary distribution
...
 
$  mc -e 'show global status like "innodb_buffer_pool_read%"; select now()'
+---------------------------------------+---------+
| Variable_name                         | Value   |
+---------------------------------------+---------+
| Innodb_buffer_pool_read_ahead_rnd     | 0       |
| Innodb_buffer_pool_read_ahead         | 0       |
| Innodb_buffer_pool_read_ahead_evicted | 0       |
| Innodb_buffer_pool_read_requests      | 7710479 |
| Innodb_buffer_pool_reads              | 8234    |
+---------------------------------------+---------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 10:39:12 |
+---------------------+
 
~/repos/build-mariadb-server-10.5 
$  mc -e 'show global status like "innodb_buffer_pool_read%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 0        |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 13399085 |
| Innodb_buffer_pool_reads              | 14165    |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 10:39:14 |
+---------------------+
 
~/repos/build-mariadb-server-10.5 
$  mc -e 'show global status like "innodb_buffer_pool_read%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 0        |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 16941559 |
| Innodb_buffer_pool_reads              | 17694    |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 10:39:15 |
+---------------------+
 
~/repos/build-mariadb-server-10.5 
$  mc -e 'show global status like "innodb_buffer_pool_read%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 0        |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 18306840 |
| Innodb_buffer_pool_reads              | 18755    |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 10:39:16 |
+---------------------+
 
~/repos/build-mariadb-server-10.5 
$  mc -e 'show global status like "innodb_buffer_pool_read%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 0        |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 20046063 |
| Innodb_buffer_pool_reads              | 20154    |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 10:39:17 |
+---------------------+

So 10.5 had something aggressively reading in the data from the start, without a single query being executed.

Six seconds of perf recording showed a lot of time in the purge thread.

$ perf record -g -- sql/mysqld --no-defaults --datadir=/tmp/${PWD##*/}-datadir --socket=/tmp/${PWD##*/}.sock --plugin-dir=${PWD}/mysql-test/var/plugins/ --verbose --skip-networking
2022-11-14 10:50:14 0 [Note] sql/mysqld (mysqld 10.5.19-MariaDB-1:10.5.13+maria~stretch) starting as process 33543 ...
2022-11-14 10:50:14 0 [Note] InnoDB: Uses event mutexes
2022-11-14 10:50:14 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-11-14 10:50:14 0 [Note] InnoDB: Number of pools: 1
2022-11-14 10:50:14 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-11-14 10:50:14 0 [Note] InnoDB: Using Linux native AIO
2022-11-14 10:50:14 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk size = 134217728
2022-11-14 10:50:14 0 [Note] InnoDB: Completed initialization of buffer pool
2022-11-14 10:50:14 0 [Note] InnoDB: 128 rollback segments are active.
2022-11-14 10:50:14 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-11-14 10:50:14 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-11-14 10:50:14 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-11-14 10:50:14 0 [Note] InnoDB: 10.5.19 started; log sequence number 2732064862; transaction id 110
2022-11-14 10:50:14 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-11-14 10:50:14 0 [Note] InnoDB: Loading buffer pool(s) from /tmp/build-mariadb-server-10.5-datadir/ib_buffer_pool
2022-11-14 10:50:14 0 [Note] InnoDB: Cannot open '/tmp/build-mariadb-server-10.5-datadir/ib_buffer_pool' for reading: No such file or directory
2022-11-14 10:50:14 0 [Note] Reading of all Master_info entries succeeded
2022-11-14 10:50:14 0 [Note] Added new Master_info '' to hash table
2022-11-14 10:50:14 0 [Note] sql/mysqld: ready for connections.
Version: '10.5.19-MariaDB-1:10.5.13+maria~stretch'  socket: '/tmp/build-mariadb-server-10.5.sock'  port: 0  mariadb.org binary distribution
2022-11-14 10:50:22 0 [Note] sql/mysqld (initiated by: unknown): Normal shutdown
2022-11-14 10:50:22 0 [Note] Event Scheduler: Purging the queue. 0 events
2022-11-14 10:50:22 0 [Note] InnoDB: FTS optimize thread exiting.
2022-11-14 10:50:22 0 [Note] InnoDB: Starting shutdown...
2022-11-14 10:50:22 0 [Note] InnoDB: Dumping buffer pool(s) to /tmp/build-mariadb-server-10.5-datadir/ib_buffer_pool
2022-11-14 10:50:22 0 [Note] InnoDB: Restricted to 2016 pages due to innodb_buf_pool_dump_pct=25
2022-11-14 10:50:22 0 [Note] InnoDB: Buffer pool(s) dump completed at 221114 10:50:22
2022-11-14 10:50:23 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2022-11-14 10:50:23 0 [Note] InnoDB: Shutdown completed; log sequence number 2732064874; transaction id 111
2022-11-14 10:50:23 0 [Note] sql/mysqld: Shutdown complete
 
[ perf record: Woken up 13 times to write data ]
[ perf record: Captured and wrote 3.899 MB perf.data (32688 samples) ]
 
~/repos/build-mariadb-server-10.5 
$ perf report -g
 
Samples: 32K of event 'cycles:u', Event count (approx.): 36937185750
  Children      Self  Command  Shared Object         Symbol
+   86.08%     0.00%  mysqld   libstdc++.so.6.0.30   [.] 0x00007f39be6dbb03
+   86.08%     0.00%  mysqld   mariadbd              [.] tpool::thread_pool_generic::worker_main
+   86.08%     0.00%  mysqld   mariadbd              [.] tpool::task_group::execute
+   50.53%     0.00%  mysqld   mariadbd              [.] purge_worker_callback
+   50.47%     4.01%  mysqld   mariadbd              [.] que_run_threads
+   45.43%     6.27%  mysqld   mariadbd              [.] row_purge_step
+   35.55%     0.00%  mysqld   mariadbd              [.] purge_coordinator_callback
+   34.30%     0.34%  mysqld   mariadbd              [.] row_purge_reset_trx_id
+   34.03%     7.32%  mysqld   mariadbd              [.] trx_purge
+   32.75%     0.08%  mysqld   mariadbd              [.] row_purge_reposition_pcur
+   31.78%     0.09%  mysqld   mariadbd              [.] row_search_on_row_ref
+   31.70%     0.28%  mysqld   mariadbd              [.] btr_pcur_open_low
+   31.02%     2.00%  mysqld   mariadbd              [.] btr_cur_search_to_nth_level
+   21.92%     2.91%  mysqld   mariadbd              [.] page_cur_search_with_match
+   20.56%     1.52%  mysqld   mariadbd              [.] buf_page_get_gen
+   19.12%    10.86%  mysqld   mariadbd              [.] buf_page_get_low
+   11.10%     2.51%  mysqld   mariadbd              [.] cmp_dtuple_rec_with_match_low
+    7.84%     5.83%  mysqld   mariadbd              [.] rec_get_offsets_func
+    6.03%     0.00%  mysqld   [unknown]             [.] 0000000000000000
+    5.28%     0.00%  mysqld   [unknown]             [.] 0x0000611000032b98
+    5.28%     0.00%  mysqld   [unknown]             [.] 0x00007f39995fb370

Now for read-ahead in 10.6: none happens on a clean startup:

10.6.12

$ mc -e 'show global status like "innodb_buffer_pool_read%"'
+---------------------------------------+-------+
| Variable_name                         | Value |
+---------------------------------------+-------+
| Innodb_buffer_pool_read_ahead_rnd     | 0     |
| Innodb_buffer_pool_read_ahead         | 0     |
| Innodb_buffer_pool_read_ahead_evicted | 0     |
| Innodb_buffer_pool_read_requests      | 991   |
| Innodb_buffer_pool_reads              | 189   |
+---------------------------------------+-------+
 
~/repos/build-mariadb-server-10.6 
$ time client/mysqldump -S /tmp/build-mariadb-server-10.6.sock test > /dev/null
 
real	0m2.111s
user	0m0.503s
sys	0m0.026s
 
~/repos/build-mariadb-server-10.6 
$ mc -e 'show global status like "innodb_buffer_pool_read%"'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 0        |
| Innodb_buffer_pool_read_ahead_evicted | 1        |
| Innodb_buffer_pool_read_requests      | 10022373 |
| Innodb_buffer_pool_reads              | 21560    |
+---------------------------------------+----------+

Compared to 10.5, after waiting the ~10-20 seconds for the reads/read requests to stabilize:

$  mc -e 'show global status like "innodb_buffer_pool%"; select now()'
+-----------------------------------------+-----------+
| Variable_name                           | Value     |
+-----------------------------------------+-----------+
| Innodb_buffer_pool_dump_status          |           |
| Innodb_buffer_pool_load_status          |           |
| Innodb_buffer_pool_resize_status        |           |
| Innodb_buffer_pool_load_incomplete      | OFF       |
| Innodb_buffer_pool_pages_data           | 8065      |
| Innodb_buffer_pool_bytes_data           | 132136960 |
| Innodb_buffer_pool_pages_dirty          | 0         |
| Innodb_buffer_pool_bytes_dirty          | 0         |
| Innodb_buffer_pool_pages_flushed        | 0         |
| Innodb_buffer_pool_pages_free           | 0         |
| Innodb_buffer_pool_pages_made_not_young | 36325519  |
| Innodb_buffer_pool_pages_made_young     | 21        |
| Innodb_buffer_pool_pages_misc           | 0         |
| Innodb_buffer_pool_pages_old            | 2975      |
| Innodb_buffer_pool_pages_total          | 8065      |
| Innodb_buffer_pool_pages_lru_flushed    | 0         |
| Innodb_buffer_pool_read_ahead_rnd       | 0         |
| Innodb_buffer_pool_read_ahead           | 0         |
| Innodb_buffer_pool_read_ahead_evicted   | 0         |
| Innodb_buffer_pool_read_requests        | 56189465  |
| Innodb_buffer_pool_reads                | 48791     |
| Innodb_buffer_pool_wait_free            | 0         |
| Innodb_buffer_pool_write_requests       | 515       |
+-----------------------------------------+-----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 11:06:13 |
+---------------------+
 
~/repos/build-mariadb-server-10.5 
$ time client/mysqldump -S /tmp/build-mariadb-server-10.5.sock test > /dev/null
 
real	0m6.390s
user	0m1.275s
sys	0m0.088s
 
~/repos/build-mariadb-server-10.5 
$  mc -e 'show global status like "innodb_buffer_pool%"; select now()'
+-----------------------------------------+-----------+
| Variable_name                           | Value     |
+-----------------------------------------+-----------+
| Innodb_buffer_pool_dump_status          |           |
| Innodb_buffer_pool_load_status          |           |
| Innodb_buffer_pool_resize_status        |           |
| Innodb_buffer_pool_load_incomplete      | OFF       |
| Innodb_buffer_pool_pages_data           | 8065      |
| Innodb_buffer_pool_bytes_data           | 132136960 |
| Innodb_buffer_pool_pages_dirty          | 0         |
| Innodb_buffer_pool_bytes_dirty          | 0         |
| Innodb_buffer_pool_pages_flushed        | 0         |
| Innodb_buffer_pool_pages_free           | 0         |
| Innodb_buffer_pool_pages_made_not_young | 46346890  |
| Innodb_buffer_pool_pages_made_young     | 21        |
| Innodb_buffer_pool_pages_misc           | 0         |
| Innodb_buffer_pool_pages_old            | 2975      |
| Innodb_buffer_pool_pages_total          | 8065      |
| Innodb_buffer_pool_pages_lru_flushed    | 0         |
| Innodb_buffer_pool_read_ahead_rnd       | 0         |
| Innodb_buffer_pool_read_ahead           | 0         |
| Innodb_buffer_pool_read_ahead_evicted   | 0         |
| Innodb_buffer_pool_read_requests        | 66210840  |
| Innodb_buffer_pool_reads                | 70161     |
| Innodb_buffer_pool_wait_free            | 0         |
| Innodb_buffer_pool_write_requests       | 515       |
+-----------------------------------------+-----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 11:06:28 |
+---------------------+

So a longer dump, roughly the same number of Innodb_buffer_pool_reads, and no read-ahead either.

10.4 by comparison, after the redo log was applied: reads and read-ahead are happening on startup.

10.4-20969aa41251045a890a29e7b68ae5abae7d0a95

$ sql/mysqld --no-defaults --datadir=/tmp/${PWD##*/}-datadir --socket=/tmp/${PWD##*/}.sock --plugin-dir=${PWD}/mysql-test/var/plugins/ --verbose --skip-networking --innodb-buffer-pool-load-at-startup=0
2022-11-14 11:34:23 0 [Note] sql/mysqld (mysqld 10.4.28-MariaDB) starting as process 47533 ...
2022-11-14 11:34:23 0 [Note] InnoDB: Using Linux native AIO
2022-11-14 11:34:23 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2022-11-14 11:34:23 0 [Note] InnoDB: Uses event mutexes
2022-11-14 11:34:23 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-11-14 11:34:23 0 [Note] InnoDB: Number of pools: 1
2022-11-14 11:34:23 0 [Note] InnoDB: Using SSE2 crc32 instructions
2022-11-14 11:34:23 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2022-11-14 11:34:23 0 [Note] InnoDB: Completed initialization of buffer pool
2022-11-14 11:34:23 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2022-11-14 11:34:23 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2022-11-14 11:34:23 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-11-14 11:34:23 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-11-14 11:34:23 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-11-14 11:34:23 0 [Note] InnoDB: Waiting for purge to start
2022-11-14 11:34:23 0 [Note] InnoDB: 10.4.28 started; log sequence number 842732234; transaction id 40
2022-11-14 11:34:23 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-11-14 11:34:23 0 [Note] Reading of all Master_info entries succeeded
2022-11-14 11:34:23 0 [Note] Added new Master_info '' to hash table
2022-11-14 11:34:23 0 [Note] sql/mysqld: ready for connections.
Version: '10.4.28-MariaDB'  socket: '/tmp/build-mariadb-server-10.4.sock'  port: 0  Source distribution
 
$  mc -e 'show global status like "innodb_buffer_pool%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_dump_status        |          |
| Innodb_buffer_pool_load_status        |          |
| Innodb_buffer_pool_resize_status      |          |
| Innodb_buffer_pool_load_incomplete    | OFF      |
| Innodb_buffer_pool_pages_data         | 3114     |
| Innodb_buffer_pool_bytes_data         | 51019776 |
| Innodb_buffer_pool_pages_dirty        | 131      |
| Innodb_buffer_pool_bytes_dirty        | 2146304  |
| Innodb_buffer_pool_pages_flushed      | 0        |
| Innodb_buffer_pool_pages_free         | 3586     |
| Innodb_buffer_pool_pages_misc         | 1327     |
| Innodb_buffer_pool_pages_total        | 8027     |
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 1626     |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 2827972  |
| Innodb_buffer_pool_reads              | 1358     |
| Innodb_buffer_pool_wait_free          | 0        |
| Innodb_buffer_pool_write_requests     | 715      |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 11:34:24 |
+---------------------+
 
~/repos/build-mariadb-server-10.4 
$  mc -e 'show global status like "innodb_buffer_pool%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_dump_status        |          |
| Innodb_buffer_pool_load_status        |          |
| Innodb_buffer_pool_resize_status      |          |
| Innodb_buffer_pool_load_incomplete    | OFF      |
| Innodb_buffer_pool_pages_data         | 4687     |
| Innodb_buffer_pool_bytes_data         | 76791808 |
| Innodb_buffer_pool_pages_dirty        | 0        |
| Innodb_buffer_pool_bytes_dirty        | 0        |
| Innodb_buffer_pool_pages_flushed      | 131      |
| Innodb_buffer_pool_pages_free         | 1349     |
| Innodb_buffer_pool_pages_misc         | 1991     |
| Innodb_buffer_pool_pages_total        | 8027     |
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 2420     |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 4281798  |
| Innodb_buffer_pool_reads              | 2137     |
| Innodb_buffer_pool_wait_free          | 0        |
| Innodb_buffer_pool_write_requests     | 715      |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 11:34:25 |
+---------------------+
 
~/repos/build-mariadb-server-10.4 
$  mc -e 'show global status like "innodb_buffer_pool%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_dump_status        |          |
| Innodb_buffer_pool_load_status        |          |
| Innodb_buffer_pool_resize_status      |          |
| Innodb_buffer_pool_load_incomplete    | OFF      |
| Innodb_buffer_pool_pages_data         | 5525     |
| Innodb_buffer_pool_bytes_data         | 90521600 |
| Innodb_buffer_pool_pages_dirty        | 0        |
| Innodb_buffer_pool_bytes_dirty        | 0        |
| Innodb_buffer_pool_pages_flushed      | 131      |
| Innodb_buffer_pool_pages_free         | 141      |
| Innodb_buffer_pool_pages_misc         | 2361     |
| Innodb_buffer_pool_pages_total        | 8027     |
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 2881     |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 5195230  |
| Innodb_buffer_pool_reads              | 2514     |
| Innodb_buffer_pool_wait_free          | 0        |
| Innodb_buffer_pool_write_requests     | 715      |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 11:34:26 |
+---------------------+

After stabilization:

$  mc -e 'show global status like "innodb_buffer_pool%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_dump_status        |          |
| Innodb_buffer_pool_load_status        |          |
| Innodb_buffer_pool_resize_status      |          |
| Innodb_buffer_pool_load_incomplete    | OFF      |
| Innodb_buffer_pool_pages_data         | 4931     |
| Innodb_buffer_pool_bytes_data         | 80789504 |
| Innodb_buffer_pool_pages_dirty        | 0        |
| Innodb_buffer_pool_bytes_dirty        | 0        |
| Innodb_buffer_pool_pages_flushed      | 131      |
| Innodb_buffer_pool_pages_free         | 1        |
| Innodb_buffer_pool_pages_misc         | 3095     |
| Innodb_buffer_pool_pages_total        | 8027     |
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 18277    |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 31143231 |
| Innodb_buffer_pool_reads              | 13594    |
| Innodb_buffer_pool_wait_free          | 0        |
| Innodb_buffer_pool_write_requests     | 715      |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 11:37:52 |
+---------------------+
 
~/repos/build-mariadb-server-10.4 
$  time client/mysqldump -S /tmp/build-mariadb-server-10.4.sock test > /dev/null
 
real	0m5.776s
user	0m0.575s
sys	0m0.038s
 
~/repos/build-mariadb-server-10.4 
$  mc -e 'show global status like "innodb_buffer_pool%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_dump_status        |          |
| Innodb_buffer_pool_load_status        |          |
| Innodb_buffer_pool_resize_status      |          |
| Innodb_buffer_pool_load_incomplete    | OFF      |
| Innodb_buffer_pool_pages_data         | 5878     |
| Innodb_buffer_pool_bytes_data         | 96305152 |
| Innodb_buffer_pool_pages_dirty        | 0        |
| Innodb_buffer_pool_bytes_dirty        | 0        |
| Innodb_buffer_pool_pages_flushed      | 131      |
| Innodb_buffer_pool_pages_free         | 0        |
| Innodb_buffer_pool_pages_misc         | 2149     |
| Innodb_buffer_pool_pages_total        | 8027     |
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 35647    |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 41182641 |
| Innodb_buffer_pool_reads              | 14523    |
| Innodb_buffer_pool_wait_free          | 0        |
| Innodb_buffer_pool_write_requests     | 715      |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 11:38:06 |
+---------------------+

A high number of Innodb_buffer_pool_read_ahead pages compared to Innodb_buffer_pool_reads satisfied the mysqldump.

Comment by Daniel Black [ 2022-11-15 ]

10.6 tracing from a select sum(seq) from tbl

Thread 9 "mysqld" hit Breakpoint 1, buf_read_ahead_linear (page_id=page_id@entry={m_id = 21474836607}, zip_size=0, ibuf=false) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:640
640	    count+= buf_read_page_low(&err, space, false, ibuf_mode, new_low, zip_size,
 
->
 
buf_read_page_low (err=err@entry=0x7ffff6c6aa7c, space=0x2525b18, sync=false, mode=mode@entry=132, page_id={m_id = 21474836547}, zip_size=zip_size@entry=0, unzip=<optimized out>) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:271
 
gets to:
 
(gdb) 
300		bpage = buf_page_init_for_read(mode, page_id, zip_size, unzip);
 
(buf_page_init_for_read always returns NULL here, so no pages are read ahead)

10.6 path through buf_page_init_for_read

buf_page_init_for_read (mode=mode@entry=132, page_id=page_id@entry={m_id = 21474836548}, zip_size=zip_size@entry=0, unzip=false) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:99
99	  mtr_t mtr;
(gdb) info locals
mtr = {m_last = 0x80, m_last_offset = 128, m_log_mode = 0, m_modifications = 0, m_made_dirty = 0, m_inside_ibuf = 0, m_trim_pages = 0, m_memo = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7ffff6c6a3f8, prev = 0x7ffff6c6a3f8}}, size_ = 1}, m_size = 0, m_first_block = {<ilist_node<void>> = {next = 0x7ffff6c6a3d8, prev = 0x7ffff6c6a3d8}, m_data = "\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000"..., m_used = 0}}, m_log = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7ffff6c6a628, prev = 0x7ffff6c6a628}}, size_ = 1}, m_size = 0, m_first_block = {<ilist_node<void>> = {next = 0x7ffff6c6a608, prev = 0x7ffff6c6a608}, m_data = '\000' <repeats 33 times>, "\227\375*W\246\261\005\000\000\000\000\000\000\000\000\000@\000\000\000\000\000\000\000\247\306\366\377\177\000\000\216@\321\000\000\000\000\000\001\000\000\000\000\000\000\000\200\354\001\254\377\177\000\000\023\000\005\000\350\373\364m\002mysql\000index_stats", '\000' <repeats 46 times>, "\220\247\306\366\377\177\000\000h\\R\002\000\000\000\000\340\354\340\323\377\177\000\000\030[R\002\000\000\000\000\001\316\001\254\377\177\000\000"..., m_used = 0}}, m_user_space = 0x0, m_commit_lsn = 0, m_freed_space = 
0x0, m_freed_pages = 0x0}
bpage = <optimized out>
block = <optimized out>
chain = <optimized out>
hash_page = <optimized out>
(gdb) p/x page_id
$8 = {m_id = 0x500000044}
(gdb) n
101	  if (mode == BUF_READ_IBUF_PAGES_ONLY)
(gdb) p mode
$9 = 132
(gdb) n
118	  if (!zip_size || unzip || recv_recovery_is_on())
(gdb) p zip_size 
$10 = 0
(gdb) p unzip
$11 = false
(gdb) s
120	    block= buf_LRU_get_free_block(false);
(gdb) n
[Thread 0x7fffd0e00640 (LWP 244150) exited]
121	    block->initialise(page_id, zip_size, buf_page_t::READ_FIX);
(gdb) p block
$12 = (buf_block_t *) 0x7fffd3e0ed80
(gdb) n
124	    block->page.lock.x_lock(true);
(gdb) p block
$13 = (buf_block_t *) 0x7fffd3e0ed80
(gdb) n
127	  buf_pool_t::hash_chain &chain= buf_pool.page_hash.cell_get(page_id.fold());
(gdb) 
129	  mysql_mutex_lock(&buf_pool.mutex);
(gdb) 
135	    if (block)
(gdb) p block
$14 = (buf_block_t *) 0x7fffd3e0ed80
(gdb) n
137	      block->page.lock.x_unlock(true);
(gdb) n
139	      buf_LRU_block_free_non_file_page(block);
(gdb) 
233	  mysql_mutex_unlock(&buf_pool.mutex);
(gdb) 
101	  if (mode == BUF_READ_IBUF_PAGES_ONLY)
(gdb) 
235	  if (mode == BUF_READ_IBUF_PAGES_ONLY)
(gdb) info locals
mtr = {m_last = 0x80, m_last_offset = 128, m_log_mode = 0, m_modifications = 0, m_made_dirty = 0, m_inside_ibuf = 0, m_trim_pages = 0, m_memo = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7ffff6c6a3f8, prev = 0x7ffff6c6a3f8}}, size_ = 1}, m_size = 0, m_first_block = {<ilist_node<void>> = {next = 0x7ffff6c6a3d8, prev = 0x7ffff6c6a3d8}, m_data = "\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000"..., m_used = 0}}, m_log = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7ffff6c6a628, prev = 0x7ffff6c6a628}}, size_ = 1}, m_size = 0, m_first_block = {<ilist_node<void>> = {next = 0x7ffff6c6a608, prev = 0x7ffff6c6a608}, m_data = '\000' <repeats 33 times>, "\227\375*W\246\261\005\000\000\000\000\000\000\000\000\000@\000\000\000\000\000\000\000\247\306\366\377\177\000\000\216@\321\000\000\000\000\000\001\000\000\000\000\000\000\000\200\354\001\254\377\177\000\000\023\000\005\000\350\373\364m\002mysql\000index_stats", '\000' <repeats 46 times>, "\220\247\306\366\377\177\000\000h\\R\002\000\000\000\000\340\354\340\323\377\177\000\000\030[R\002\000\000\000\000\001\316\001\254\377\177\000\000"..., m_used = 0}}, m_user_space = 0x0, m_commit_lsn = 0, m_freed_space = 
0x0, m_freed_pages = 0x0}
bpage = 0x0
block = <optimized out>
chain = @0x22cd6f0: {first = 0x7fffd3e0c800}
hash_page = <optimized out>
(gdb) finish
Run till exit from #0  buf_page_init_for_read (mode=mode@entry=132, page_id=page_id@entry={m_id = 21474836548}, zip_size=zip_size@entry=0, unzip=<optimized out>) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:235
buf_read_page_low (err=err@entry=0x7ffff6c6aa7c, space=0x2525b18, sync=false, mode=mode@entry=132, page_id={m_id = 21474836548}, zip_size=zip_size@entry=0, unzip=false) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:302
302		if (bpage == NULL) {
Value returned is $15 = (buf_page_t *) 0x0

On 10.5, I saw no increase of Innodb_buffer_pool_read_ahead for the same query.

10.4 was returning a non-null bpage at this same point.

Comment by Daniel Black [ 2022-11-16 ]

Well 10.4 comes up with a decent low -> high range to begin the read ahead:

Thread 32 "mysqld" hit Breakpoint 1, buf_read_ahead_linear (page_id={m_space = 7, m_page_no = 64}, zip_size=0, ibuf=false) at /home/dan/repos/mariadb-server-10.4/storage/innobase/buf/buf0rea.cc:570
570		for (i = low; i < high; i++) {
low = 64
high = 128

10.4, not always completely sane

Thread 32 "mysqld" hit Breakpoint 1, buf_read_ahead_linear (page_id={m_space = 7, m_page_no = 127}, zip_size=0, ibuf=false) at /home/dan/repos/mariadb-server-10.4/storage/innobase/buf/buf0rea.cc:570
low = 64
high = 128

(I'd expect low=128 at least: the current page 127 is already read, and the previous pages aren't "ahead" unless the scan is descending.) I'm not sure I understand the significance of the buf_read_ahead_linear_area mask.

But at least it's not 10.6, where a read-ahead covers the previous pages only (this occurred during create table t1 as select seq from seq_1_to_10000000;):

10.6

Thread 95 "mysqld" hit Breakpoint 2, buf_read_ahead_linear (page_id={m_id = 63}, zip_size=0, ibuf=false) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:631
631	  for (ulint ibuf_mode= ibuf ? BUF_READ_IBUF_PAGES_ONLY : BUF_READ_ANY_PAGE;
new_low = {m_id = 0}
new_high_1 = {m_id = 63}
 
Thread 95 "mysqld" hit Breakpoint 2, buf_read_ahead_linear (page_id={m_id = 255}, zip_size=0, ibuf=false) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:631
631	  for (ulint ibuf_mode= ibuf ? BUF_READ_IBUF_PAGES_ONLY : BUF_READ_ANY_PAGE;
new_low = {m_id = 192}
new_high_1 = {m_id = 255}

Or a basic case that never hits:

10.6 reproducer

MariaDB [test]> create table t1 as select seq from seq_1_to_10000000;
Query OK, 10000000 rows affected (8 min 20.608 sec)
Records: 10000000  Duplicates: 0  Warnings: 0
 
MariaDB [test]> set global innodb_fast_shutdown=0;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [test]> shutdown;
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [test]> show global status like 'innodb_buffer_pool%';
ERROR 2006 (HY000): Server has gone away
No connection. Trying to reconnect...
Connection id:    3
Current database: test
 
+-----------------------------------------+---------+
| Variable_name                           | Value   |
+-----------------------------------------+---------+
...
| Innodb_buffer_pool_pages_data           | 402     |
...
| Innodb_buffer_pool_pages_total          | 8089    |
..
| Innodb_buffer_pool_read_ahead_rnd       | 0       |
| Innodb_buffer_pool_read_ahead           | 0       |
| Innodb_buffer_pool_read_ahead_evicted   | 0       |
| Innodb_buffer_pool_read_requests        | 1163    |
| Innodb_buffer_pool_reads                | 271     |
+-----------------------------------------+---------+
25 rows in set (0.015 sec)
 
MariaDB [test]> select sum(seq) from t1;
+----------------+
| sum(seq)       |
+----------------+
| 50000005000000 |
+----------------+
1 row in set (54.859 sec)
 
MariaDB [test]> show global status like 'innodb_buffer_pool%';
+-----------------------------------------+-----------+
| Variable_name                           | Value     |
+-----------------------------------------+-----------+
..
| Innodb_buffer_pool_pages_data           | 8089      |
..
| Innodb_buffer_pool_read_ahead_rnd       | 0         |
| Innodb_buffer_pool_read_ahead           | 0         |
| Innodb_buffer_pool_read_ahead_evicted   | 1         |
| Innodb_buffer_pool_read_requests        | 10022534  |
| Innodb_buffer_pool_reads                | 21641     |
+-----------------------------------------+-----------+
25 rows in set (0.003 sec)

10.6 restart

$ gdb --args sql/mysqld --no-defaults --datadir=$HOME/mariadb-datadir  --socket=/tmp/${PWD##*/}.sock --plugin-dir=${PWD}/mysql-test/var/plugins/ --verbose --skip-networking --innodb-buffer-pool-load-at-startup=0 --innodb_read_ahead_threshold=4
..
(gdb) b storage/innobase/buf/buf0rea.cc:631
Breakpoint 1 at 0x152af8c: file /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc, line 631.
(gdb) r

(No breakpoints were hit: not a single page matched the criteria to attempt a read-ahead.)

Comment by Daniel Black [ 2022-11-16 ]

In 10.4, the low -> high read ranges map to the page_id in question:

10.4 a sane read ahead setup

Thread 32 "mysqld" hit Breakpoint 1, buf_read_ahead_linear (page_id={m_space = 7, m_page_no = 0}, zip_size=0, ibuf=false) at /home/dan/repos/mariadb-server-10.4/storage/innobase/buf/buf0rea.cc:570
570		for (i = low; i < high; i++) {
low = 0
high = 64

10.4, but not always completely sane

Thread 32 "mysqld" hit Breakpoint 1, buf_read_ahead_linear (page_id={m_space = 7, m_page_no = 127}, zip_size=0, ibuf=false) at /home/dan/repos/mariadb-server-10.4/storage/innobase/buf/buf0rea.cc:570
low = 64
high = 128

As a forward read-ahead, this should start at 128 (127 is already read by the time this is called), and it is read-ahead, not read-behind.

However, 10.6 is always trying to read behind (this was during create table t1 as select seq from seq_1_to_10000000;):

10.6

Thread 26 "mysqld" hit Breakpoint 2, buf_read_ahead_linear (page_id={m_id = 63}, zip_size=0, ibuf=false) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:631
631	  for (ulint ibuf_mode= ibuf ? BUF_READ_IBUF_PAGES_ONLY : BUF_READ_ANY_PAGE;
new_low = {m_id = 0}
new_high_1 = {m_id = 63}
 
or:
 
Thread 26 "mysqld" hit Breakpoint 2, buf_read_ahead_linear (page_id={m_id = 255}, zip_size=0, ibuf=false) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:631
631	  for (ulint ibuf_mode= ibuf ? BUF_READ_IBUF_PAGES_ONLY : BUF_READ_ANY_PAGE;
new_low = {m_id = 192}
new_high_1 = {m_id = 255}

A basic reproducer:

should produce read aheads

MariaDB [test]> create table t1 as select seq from seq_1_to_10000000;
Query OK, 10000000 rows affected (8 min 20.608 sec)
Records: 10000000  Duplicates: 0  Warnings: 0
 
MariaDB [test]> set global innodb_fast_shutdown=0;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [test]> shutdown;
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [test]> show global status like 'innodb_buffer_pool%';
ERROR 2006 (HY000): Server has gone away
No connection. Trying to reconnect...
Connection id:    3
Current database: test
 
+-----------------------------------------+---------+
| Variable_name                           | Value   |
+-----------------------------------------+---------+
| Innodb_buffer_pool_pages_data           | 402     |
| Innodb_buffer_pool_pages_dirty          | 0       |
| Innodb_buffer_pool_pages_free           | 7687    |
| Innodb_buffer_pool_pages_latched        | 144     |
| Innodb_buffer_pool_pages_total          | 8089    |
| Innodb_buffer_pool_read_ahead_rnd       | 0       |
| Innodb_buffer_pool_read_ahead           | 0       |
| Innodb_buffer_pool_read_ahead_evicted   | 0       |
| Innodb_buffer_pool_read_requests        | 1163    |
| Innodb_buffer_pool_reads                | 271     |
+-----------------------------------------+---------+
25 rows in set (0.015 sec)
 
MariaDB [test]> select sum(seq) from t1;
+----------------+
| sum(seq)       |
+----------------+
| 50000005000000 |
+----------------+
1 row in set (54.859 sec)
 
MariaDB [test]> show global status like 'innodb_buffer_pool%';
+-----------------------------------------+-----------+
| Variable_name                           | Value     |
+-----------------------------------------+-----------+
| Innodb_buffer_pool_pages_data           | 8089      |
| Innodb_buffer_pool_read_ahead_rnd       | 0         |
| Innodb_buffer_pool_read_ahead           | 0         |
| Innodb_buffer_pool_read_ahead_evicted   | 1         |
| Innodb_buffer_pool_read_requests        | 10022534  |
| Innodb_buffer_pool_reads                | 21641     |
+-----------------------------------------+-----------+

10.6 restarted with a breakpoint on buf0rea.cc line 631; the breakpoint never triggered

(gdb) b storage/innobase/buf/buf0rea.cc:631
Breakpoint 1 at 0x152af8c: file /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc, line 631.
(gdb) r
Starting program: /home/dan/repos/build-mariadb-server-10.6-debug/sql/mysqld --no-defaults --datadir=/home/dan/mariadb-datadir --socket=/tmp/build-mariadb-server-10.6-debug.sock --plugin-dir=/home/dan/repos/build-mariadb-server-10.6-debug/mysql-test/var/plugins/ --verbose --skip-networking --innodb-buffer-pool-load-at-startup=0 --innodb_read_ahead_threshold=4

Comment by Daniel Black [ 2022-11-16 ]

But a small bit of inspiration: buf_read_ahead_linear is being called from the wrong spot. A more logical place to call it would be btr_pcur_move_to_next_page / btr_pcur_move_to_prev.

The advantages here are:

  • At this point the direction of read ahead is obvious (by function name)
  • there's a page_id on cursor->btr_cur->page_cur->block.page
  • one of the cursor structures could keep track of the count of consecutive reads
  • further to the above, the fuzzy tree-walking and access-time heuristics can be dropped

disadvantages:

  • Applications doing ref lookups on incrementing values in a table won't have a read-ahead detected.
  • No read-ahead during buffer pool loads (which would be a rather easy fix)

Is it known at that layer whether a PK lookup is also required by the query? If so, a secondary-key look-ahead could prefetch the PK clustered-index pages as well.

Maybe too much for a 10.5 fix, but useful?

Comment by Marko Mäkelä [ 2023-01-31 ]

The B-tree cursors were refactored in MDEV-30400. I can think of an additional point where read-ahead could be initiated. In btr_cur_t::search_leaf() and btr_cur_t::open_leaf(), when reaching level 1 (a page that points to the leaf pages), we will know all the child page numbers. If the end of the key range were made available at this point, we could read ahead all child pages that are within the search range. Currently, we have no knowledge of the range here.

In addition, whenever we reach a leaf page, we might post a read-ahead request for its right sibling page (or left, in the rare case of reverse range scan). Also for this, it would be useful to know the other end of the desired key range, so that we can avoid unnecessary read-ahead when no access to the sibling page will be needed.

Comment by Marko Mäkelä [ 2023-01-31 ]

I believe that some progress on MDEV-16402 will be needed before we can reliably get the end-of-range conditions passed to the B-tree traversal.

Comment by Marko Mäkelä [ 2023-03-03 ]

MDEV-26055 might provide some improvement here, by periodically making some space available in buf_pool.free.

Comment by Marko Mäkelä [ 2023-04-25 ]

In a chat, psergei claimed the following:

You can overload handler::read_range_first() and you will see the endpoints of the range being scanned.

If that works, MDEV-16402 should not block this ticket.

Comment by Marko Mäkelä [ 2023-05-09 ]

The following test shows no read-ahead on 10.5 for me:

--source include/have_innodb.inc
--source include/have_sequence.inc
create table t1 engine=innodb as select seq from seq_1_to_10000000;
let $datadir=`select @@datadir`;
SET GLOBAL innodb_fast_shutdown=0;
--source include/shutdown_mysqld.inc
remove_file $datadir/ib_buffer_pool;
--source include/start_mysqld.inc
show global status like 'innodb_buffer_pool_read%';
sleep 10;
show global status like 'innodb_buffer_pool_read%';

The invocation was as follows:

./mtr --mysqld=--innodb-{buffer-pool,log-file}-size=1g main.name_of_test

10.5 d821fd7fab2723b84f774aaa4882acb86b35769d

show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	0
Innodb_buffer_pool_read_ahead	0
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	932
Innodb_buffer_pool_reads	155
show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	0
Innodb_buffer_pool_read_ahead	0
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	932
Innodb_buffer_pool_reads	155

On 10.4, the numbers are similar:

10.4 84b9fc25a29b94a37eb9d5ac2e2c0f75c0efafda

show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	0
Innodb_buffer_pool_read_ahead	0
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	1103
Innodb_buffer_pool_reads	161
show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	0
Innodb_buffer_pool_read_ahead	0
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	1103
Innodb_buffer_pool_reads	161

If I remove the SET GLOBAL innodb_fast_shutdown=0, then the purge of transaction history (MDEV-12288) will kick in and trigger read-ahead in 10.4, like danblack noted:

10.4 84b9fc25a29b94a37eb9d5ac2e2c0f75c0efafda

# restart
show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	0
Innodb_buffer_pool_read_ahead	0
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	341640
Innodb_buffer_pool_reads	508
show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	0
Innodb_buffer_pool_read_ahead	6673
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	11977134
Innodb_buffer_pool_reads	5182

In 10.4, all read-ahead is triggered by the following in buf_page_get_low():

	if (mode != BUF_PEEK_IF_IN_POOL && !access_time) {
		/* In the case of a first access, try to apply linear
		read-ahead */
 
		buf_read_ahead_linear(page_id, zip_size, ibuf_inside(mtr));
	}

In 10.5 there will be no read-ahead reported, just like danblack noted. The condition for detecting the first access to a page is apparently broken and will need to be revised.

Comment by Marko Mäkelä [ 2023-05-09 ]

The check in the 10.5 buf_read_ahead_linear() that verifies enough past page requests occurred in the expected order is filtering out all requests. That code was refactored in MDEV-15053.

Comment by Marko Mäkelä [ 2023-05-09 ]

It is more complex than that. In 10.4, buf_read_ahead_linear() is being invoked in descending order of page number, because row_purge_reset_trx_id() is being invoked in descending order of test.t1.DB_ROW_ID. It looks like the LIFO purge order was replaced with FIFO in this commit of MDEV-16678.

I think that the FIFO purge order makes sense, so I would not touch that. To make 10.4 use the FIFO order for test purposes, I applied the following patch:

diff --git a/storage/innobase/row/row0purge.cc b/storage/innobase/row/row0purge.cc
index 6ed03f0eabb..d30ca77f30f 100644
--- a/storage/innobase/row/row0purge.cc
+++ b/storage/innobase/row/row0purge.cc
@@ -1469,7 +1469,8 @@ row_purge_step(
 		trx_purge_rec_t*purge_rec;
 
 		purge_rec = static_cast<trx_purge_rec_t*>(
-			ib_vector_pop(node->undo_recs));
+			ib_vector_get(node->undo_recs, 0));
+		ib_vector_remove(node->undo_recs, *(void**)purge_rec);
 
 		node->roll_ptr = purge_rec->roll_ptr;
 

Also with this patch present, 10.4 will trigger some linear read-ahead:

show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	0
Innodb_buffer_pool_read_ahead	0
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	294129
Innodb_buffer_pool_reads	461
show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	0
Innodb_buffer_pool_read_ahead	2560
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	4128875
Innodb_buffer_pool_reads	1755

The first linear read-ahead was triggered when page 127 of the table t1 was being accessed.

Next, I will try to see where the 10.4 and 10.5 buf_read_ahead_linear() differ. The intention seems to be to enable read-ahead when in the previous 64-page extent, at most 8 pages had not been accessed in ascending order. This condition is never met in 10.5.

Comment by Marko Mäkelä [ 2023-05-09 ]

The following condition appears to be wrong:

diff --git a/storage/innobase/buf/buf0rea.cc b/storage/innobase/buf/buf0rea.cc
--- a/storage/innobase/buf/buf0rea.cc
+++ b/storage/innobase/buf/buf0rea.cc
@@ -582,7 +582,7 @@ buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf)
 
   /* We will check that almost all pages in the area have been accessed
   in the desired order. */
-  const bool descending= page_id == low;
+  const bool descending= page_id != low;
 
   if (!descending && page_id != high_1)
     /* This is not a border page of the area */

There is also something wrong with buf_read_page_low(): it always returns false.

Comment by Marko Mäkelä [ 2023-05-09 ]

There were a couple of off-by-one bugs too. I am really sorry for this. In MDEV-15053, I had removed one buffer page lookup, for the current page, and instead added a condition if (i == page_id) inside the loop.

diff --git a/storage/innobase/buf/buf0rea.cc b/storage/innobase/buf/buf0rea.cc
index ff163f74b08..26a84939306 100644
--- a/storage/innobase/buf/buf0rea.cc
+++ b/storage/innobase/buf/buf0rea.cc
@@ -582,7 +582,7 @@ buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf)
 
   /* We will check that almost all pages in the area have been accessed
   in the desired order. */
-  const bool descending= page_id == low;
+  const bool descending= page_id != low;
 
   if (!descending && page_id != high_1)
     /* This is not a border page of the area */
@@ -612,7 +612,7 @@ buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf)
                                uint32_t{buf_pool.read_ahead_area});
   page_id_t new_low= low, new_high_1= high_1;
   unsigned prev_accessed= 0;
-  for (page_id_t i= low; i != high_1; ++i)
+  for (page_id_t i= low; i <= high_1; ++i)
   {
     const ulint fold= i.fold();
     page_hash_latch *hash_lock= buf_pool.page_hash.lock<false>(fold);
@@ -647,12 +647,21 @@ buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf)
       if (prev == FIL_NULL || next == FIL_NULL)
         goto hard_fail;
       page_id_t id= page_id;
-      if (descending && next - 1 == page_id.page_no())
-        id.set_page_no(prev);
-      else if (!descending && prev + 1 == page_id.page_no())
-        id.set_page_no(next);
+      if (descending)
+      {
+        if (id == high_1)
+          ++id;
+        else if (next - 1 != page_id.page_no())
+          goto hard_fail;
+        else
+          id.set_page_no(prev);
+      }
       else
-        goto hard_fail; /* Successor or predecessor not in the right order */
+      {
+        if (prev + 1 != page_id.page_no())
+          goto hard_fail;
+        id.set_page_no(next);
+      }
 
       new_low= id - (id.page_no() % buf_read_ahead_area);
       new_high_1= new_low + (buf_read_ahead_area - 1);
@@ -693,7 +702,7 @@ buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf)
   /* If we got this far, read-ahead can be sensible: do it */
   count= 0;
   for (ulint ibuf_mode= ibuf ? BUF_READ_IBUF_PAGES_ONLY : BUF_READ_ANY_PAGE;
-       new_low != new_high_1; ++new_low)
+       new_low <= new_high_1; ++new_low)
   {
     if (ibuf_bitmap_page(new_low, zip_size))
       continue;

With this, I am seeing linear read-ahead taking place:

10.5 d821fd7fab2723b84f774aaa4882acb86b35769d with patch

show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	0
Innodb_buffer_pool_read_ahead	0
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	403049
Innodb_buffer_pool_reads	526
show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	0
Innodb_buffer_pool_read_ahead	11648
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	21995204
Innodb_buffer_pool_reads	5955

Comment by Marko Mäkelä [ 2023-05-10 ]

I reviewed the logic of buf_read_ahead_random(), and I do not notice any regression there. I see that the default is innodb_random_read_ahead=OFF; only linear read-ahead is enabled by default. It would be disabled by innodb_read_ahead_threshold=0; the default value is 56 and the maximum is 64.

The next step is to test with innodb_random_read_ahead=ON and innodb_read_ahead_threshold=0 to see if the random read-ahead will kick in.

Comment by Marko Mäkelä [ 2023-05-10 ]

Random read ahead seems to be fine. I added a wait for everything to be purged, so that we can compare the total execution time between 10.4 and 10.5.

--source include/have_innodb.inc
--source include/have_sequence.inc
create table t1 engine=innodb as select seq from seq_1_to_10000000;
let $datadir=`select @@datadir`;
--source include/shutdown_mysqld.inc
remove_file $datadir/ib_buffer_pool;
--source include/start_mysqld.inc
show global status like 'innodb_buffer_pool_read%';
SET GLOBAL innodb_max_purge_lag_wait=0;
show global status like 'innodb_buffer_pool_read%';
DROP TABLE t1;

./mtr --mysqld=--innodb-{buffer-pool,log-file}-size=1g --mysqld=--innodb-read-ahead-threshold=0 --mysqld=--innodb-random-read-ahead=ON --mysqld=--innodb-purge-threads=32 main.name_of_test

10.4 84b9fc25a29b94a37eb9d5ac2e2c0f75c0efafda

show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	283
Innodb_buffer_pool_read_ahead	0
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	292045
Innodb_buffer_pool_reads	515
SET GLOBAL innodb_max_purge_lag_wait=0;
show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	24883
Innodb_buffer_pool_read_ahead	0
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	31552786
Innodb_buffer_pool_reads	32008
Spent 102.762 of 110 seconds executing testcases

10.5 d821fd7fab2723b84f774aaa4882acb86b35769d

show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	383
Innodb_buffer_pool_read_ahead	0
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	408332
Innodb_buffer_pool_reads	640
SET GLOBAL innodb_max_purge_lag_wait=0;
show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	24883
Innodb_buffer_pool_read_ahead	0
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	40023169
Innodb_buffer_pool_reads	32001
Spent 64.291 of 67 seconds executing testcases

We can see that 10.5 was a little more eager to issue random read-ahead requests during the test. Despite there being 25% more read requests from the buffer pool (we should probably find the reason why), 10.5 completed the test much faster. During the wait for purge, 10.4 was using some 660% of CPU, despite there being 32 threads available. 10.5 was using only 100% of CPU during that phase, possibly due to MDEV-16678 and related changes. If there had been DML into multiple tables or multiple partitions of a table, it should have been able to use more threads. Starting with 10.6, enabling the MDEV-515 bulk load would avoid the need to purge individual rows altogether.

Comment by Matthias Leich [ 2023-05-11 ]

origin/bb-10.5-MDEV-29967 9e1ff0435dfa31c16d3138e802971dda8b25e3b1 2023-05-10T12:43:17+03:00
behaved well in RQG testing.

Comment by Marko Mäkelä [ 2023-05-11 ]

In 10.6, the test would not trigger any read-ahead, because the CREATE…SELECT statement will be executed with table-level and not row-level undo logging (MDEV-515). It would complete for me in about 31 seconds. A test with row-level undo logging will start like this:

create table t1(a int primary key) engine=innodb;
insert into t1 select seq from seq_1_to_10000000;

10.6 c271057288f71746d1816824f338f2d9c47f67c1

show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	0
Innodb_buffer_pool_read_ahead	0
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	5903
Innodb_buffer_pool_reads	413
SET GLOBAL innodb_max_purge_lag_wait=0;
show global status like 'innodb_buffer_pool_read%';
Variable_name	Value
Innodb_buffer_pool_read_ahead_rnd	0
Innodb_buffer_pool_read_ahead	14720
Innodb_buffer_pool_read_ahead_evicted	0
Innodb_buffer_pool_read_requests	670313
Innodb_buffer_pool_reads	24061
Spent 48.995 of 52 seconds executing testcases

Comment by Marko Mäkelä [ 2023-07-07 ]

In MDEV-30986 I just found out that setting innodb_random_read_ahead=ON may improve read performance for some workloads. It is disabled by default.

Generated at Thu Feb 08 10:12:38 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.