Details
Type: Bug
Priority: Critical
Status: Closed
Resolution: Fixed
Affects Versions: 10.5.4, 10.6.0, 10.6.8, 10.9.3
Environment: Arch Linux VM running on VMware ESXi
Description
We upgraded a MariaDB Galera Cluster from version 10.5.13 to 10.9.3. Everything is up and running from a client's point of view.
However, we see a regression in I/O access when running mariadb-dump. A custom script (dumping and compressing every table into a separate file) ran for about 45 minutes with MariaDB 10.5.13; the runtime increased to about two and a half hours with MariaDB 10.9.3.
A simple cp can do about 500 MB/s on these machines, but with mariadb-dump we get about 25 MB/s, limited by disk I/O (mariadbd is in I/O wait, state "D"). MariaDB 10.5.13 did about 100 MB/s, limited by compression, so it could probably have done a lot more.
This happens with all available I/O methods; we tried aio, uring and native. We also tried different filesystems (ext4 and btrfs), which makes no difference. Same results on a cluster node and on a standalone (non-Galera) installation.
No idea whether I/O for regular SQL queries regressed as well. The machines have a reasonable amount of RAM, so no delay is noticeable there. Running mariabackup for Galera state transfer reaches the expected transfer rates.
Attachments
Issue Links
- blocks
  - MDEV-31227 innodb_flush_method=O_DIRECT causes 3x regression in workload (Closed)
- is caused by
  - MDEV-15053 Reduce buf_pool_t::mutex contention (Closed)
- relates to
  - MDEV-26055 Adaptive flushing is still not getting invoked in 10.5.11 (Closed)
  - MDEV-30400 Assertion `height == btr_page_get_level(page_cur_get_page(page_cursor))' failed in btr_cur_search_to_nth_level on INSERT (Closed)
  - MDEV-31254 InnoDB: Trying to read doublewrite buffer page (Closed)
  - MDEV-29343 MariaDB 10.6.x slower mysqldump etc. (Closed)
  - MDEV-30986 Slow full index scan in 10.6 vs 10.5 for the (slow) I/O-bound case (Closed)
Activity
Are you able to test mariadb-dump from 10.5.x and see whether it's a regression in the SQL generated by the backup or in the server behaviour?
Are you able to enable the general log on the Galera node doing the backup and compare the relative timing of queries or groups of queries? Or enable the slow query log with a short long_query_time? Does the global status capture the effects of just the mariadb-dump run?
Changing innodb_flush_method from O_DIRECT to fsync did the trick. mariadb-dump is back to the expected transfer rates. I will keep this setting, at least for our backup instance.
Just for completeness: innodb_buffer_pool_size is set to 8589934592 (which is 8 GB). But please note that the instance is started from a filesystem snapshot just for the backup, so there are no other queries to fill the buffer pool. Everything is read from disk.
So it looks like O_DIRECT causes reads in small chunks, and the many small reads cause high I/O load and wait. The operating system's filesystem cache, on the other hand, reads bigger chunks (via read-ahead?). Is this something to be fixed for I/O via O_DIRECT? I think others may be suffering from the same issue, possibly without actively noticing (but drinking an unnecessary amount of coffee while waiting for their dumps).
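The chunk-size effect described here can be illustrated outside the server: sequentially reading a file in 16 KiB requests issues many times more I/O operations than reading it in 1 MiB requests, which is roughly what kernel read-ahead achieves for buffered I/O. A minimal sketch, using plain buffered I/O (not O_DIRECT) on a hypothetical temporary file; the sizes are arbitrary:

```python
import os
import tempfile

def read_in_chunks(path, chunk_size):
    """Sequentially read the whole file; return (bytes_read, request_count)."""
    total = requests = 0
    with open(path, "rb", buffering=0) as f:  # buffering=0: one read() per call
        while True:
            data = f.read(chunk_size)
            if not data:
                break
            total += len(data)
            requests += 1
    return total, requests

# Hypothetical test file: 16 MiB of zeroes in a temporary location.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * (16 * 1024 * 1024))
    path = tmp.name

small = read_in_chunks(path, 16 * 1024)    # 16 KiB requests (page-sized I/O)
large = read_in_chunks(path, 1024 * 1024)  # 1 MiB requests (read-ahead-sized I/O)
print(small, large)  # same bytes read, 64x as many requests in the small case
os.remove(path)
```

With the page cache this costs little, but when each small request becomes a separate physical read (as with O_DIRECT and no server-side read-ahead), the request count translates directly into I/O wait.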
We are seeing this as well after upgrading from 10.4.26 to 10.6.10. The time to dump a big database schema jumped from ~4.5 hours to ~7 hours. We have always been using innodb_flush_method=O_DIRECT.
It's as if read-ahead is not working properly: on 10.6.10 the Innodb_buffer_pool_read_ahead status variable stays at 0, while on 10.4.x it was in the hundreds of millions.
We monitor performance with atop, which shows that avq (average queue depth, similar to iostat's aqu-sz, I think) was at ~30.00 during mysqldump on 10.4, while on 10.6 it stays at ~1.00.
I tried enabling innodb_random_read_ahead (I don't know how to enable/disable the linear read-ahead), and that seems to restore mysqldump somewhat to the old 10.4 performance; loading the buffer pool at startup is also much faster then. But it only helps for a brief period: after a couple of hours it stops working (Innodb_buffer_pool_read_ahead_rnd no longer goes up).
Changing to innodb_flush_method=fsync seems to restore mysqldump speed (rareq-sz goes up from 16 KB to ~112 KB).
This seems very similar to MDEV-29343, by the way.
eworm, at0r, most useful. Thanks.
I suspect that innodb_flush_method=fsync, but not O_DIRECT, gets read-ahead benefits from /sys/block/<disk>/queue/read_ahead_kb.
Added 10.6.10 (and not 10.5): it was this commit that affected linear read-ahead (a large restructure, so I may have missed something).
To disable linear read-ahead from the default, use innodb_read_ahead=none.
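For reference, the block-layer read-ahead setting mentioned above can be inspected per device. A small sketch; device names vary per machine, and the sysfs directory only exists on Linux, so the result may be empty:

```python
import glob
import os

def read_ahead_settings():
    """Return {device: read_ahead_kb} for all visible block devices."""
    settings = {}
    for path in glob.glob("/sys/block/*/queue/read_ahead_kb"):
        device = path.split(os.sep)[3]  # /sys/block/<device>/queue/read_ahead_kb
        try:
            with open(path) as f:
                settings[device] = int(f.read().strip())
        except OSError:
            pass  # device disappeared or is not readable
    return settings

print(read_ahead_settings())  # e.g. {'sda': 128}; empty on non-Linux systems
```

Buffered (fsync-method) reads go through the page cache and benefit from this kernel read-ahead; O_DIRECT reads bypass it, so the server's own read-ahead is all there is.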
tested:
10.5.19-fe330b63fe4ee51a26ac6e222d8eb4ee44a73d50
MariaDB [test]> create table t1 as select seq from seq_1_to_10000000;
Query OK, 10000000 rows affected (39.684 sec)
Records: 10000000  Duplicates: 0  Warnings: 0
MariaDB [test]> shutdown;

Copied datadir to 10.6 instance. Removed ib_buffer_pool from both instances.
10.6.12-505da21e336f1e46b655f78fc0ad42e6cf17494d
$ mc -e 'show global status like "innodb_buffer_pool_read%"'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 0        |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 991      |
| Innodb_buffer_pool_reads              | 189      |
+---------------------------------------+----------+
And it stays at this.
Compared to 10.5 restarted, which shows this within seconds of starting:
2022-11-14 10:39:10 0 [Note] InnoDB: 10.5.19 started; log sequence number 2732064850; transaction id 110
2022-11-14 10:39:10 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-11-14 10:39:10 0 [Note] InnoDB: Loading buffer pool(s) from /tmp/build-mariadb-server-10.5-datadir/ib_buffer_pool
2022-11-14 10:39:10 0 [Note] InnoDB: Cannot open '/tmp/build-mariadb-server-10.5-datadir/ib_buffer_pool' for reading: No such file or directory
2022-11-14 10:39:10 0 [Note] Reading of all Master_info entries succeeded
2022-11-14 10:39:10 0 [Note] Added new Master_info '' to hash table
2022-11-14 10:39:10 0 [Note] sql/mysqld: ready for connections.
Version: '10.5.19-MariaDB-1:10.5.13+maria~stretch' socket: '/tmp/build-mariadb-server-10.5.sock' port: 0 mariadb.org binary distribution
...
$ mc -e 'show global status like "innodb_buffer_pool_read%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 0        |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 7710479  |
| Innodb_buffer_pool_reads              | 8234     |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 10:39:12 |
+---------------------+

~/repos/build-mariadb-server-10.5
$ mc -e 'show global status like "innodb_buffer_pool_read%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 0        |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 13399085 |
| Innodb_buffer_pool_reads              | 14165    |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 10:39:14 |
+---------------------+

~/repos/build-mariadb-server-10.5
$ mc -e 'show global status like "innodb_buffer_pool_read%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 0        |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 16941559 |
| Innodb_buffer_pool_reads              | 17694    |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 10:39:15 |
+---------------------+

~/repos/build-mariadb-server-10.5
$ mc -e 'show global status like "innodb_buffer_pool_read%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 0        |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 18306840 |
| Innodb_buffer_pool_reads              | 18755    |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 10:39:16 |
+---------------------+

~/repos/build-mariadb-server-10.5
$ mc -e 'show global status like "innodb_buffer_pool_read%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 0        |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 20046063 |
| Innodb_buffer_pool_reads              | 20154    |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 10:39:17 |
+---------------------+
So 10.5 had something aggressively reading in the data from the start, without a single query being executed.
6 seconds of perf showed a lot of time in the purge thread.
$ perf record -g -- sql/mysqld --no-defaults --datadir=/tmp/${PWD##*/}-datadir --socket=/tmp/${PWD##*/}.sock --plugin-dir=${PWD}/mysql-test/var/plugins/ --verbose --skip-networking
2022-11-14 10:50:14 0 [Note] sql/mysqld (mysqld 10.5.19-MariaDB-1:10.5.13+maria~stretch) starting as process 33543 ...
2022-11-14 10:50:14 0 [Note] InnoDB: Uses event mutexes
2022-11-14 10:50:14 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-11-14 10:50:14 0 [Note] InnoDB: Number of pools: 1
2022-11-14 10:50:14 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-11-14 10:50:14 0 [Note] InnoDB: Using Linux native AIO
2022-11-14 10:50:14 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk size = 134217728
2022-11-14 10:50:14 0 [Note] InnoDB: Completed initialization of buffer pool
2022-11-14 10:50:14 0 [Note] InnoDB: 128 rollback segments are active.
2022-11-14 10:50:14 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-11-14 10:50:14 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-11-14 10:50:14 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-11-14 10:50:14 0 [Note] InnoDB: 10.5.19 started; log sequence number 2732064862; transaction id 110
2022-11-14 10:50:14 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-11-14 10:50:14 0 [Note] InnoDB: Loading buffer pool(s) from /tmp/build-mariadb-server-10.5-datadir/ib_buffer_pool
2022-11-14 10:50:14 0 [Note] InnoDB: Cannot open '/tmp/build-mariadb-server-10.5-datadir/ib_buffer_pool' for reading: No such file or directory
2022-11-14 10:50:14 0 [Note] Reading of all Master_info entries succeeded
2022-11-14 10:50:14 0 [Note] Added new Master_info '' to hash table
2022-11-14 10:50:14 0 [Note] sql/mysqld: ready for connections.
Version: '10.5.19-MariaDB-1:10.5.13+maria~stretch' socket: '/tmp/build-mariadb-server-10.5.sock' port: 0 mariadb.org binary distribution
2022-11-14 10:50:22 0 [Note] sql/mysqld (initiated by: unknown): Normal shutdown
2022-11-14 10:50:22 0 [Note] Event Scheduler: Purging the queue. 0 events
2022-11-14 10:50:22 0 [Note] InnoDB: FTS optimize thread exiting.
2022-11-14 10:50:22 0 [Note] InnoDB: Starting shutdown...
2022-11-14 10:50:22 0 [Note] InnoDB: Dumping buffer pool(s) to /tmp/build-mariadb-server-10.5-datadir/ib_buffer_pool
2022-11-14 10:50:22 0 [Note] InnoDB: Restricted to 2016 pages due to innodb_buf_pool_dump_pct=25
2022-11-14 10:50:22 0 [Note] InnoDB: Buffer pool(s) dump completed at 221114 10:50:22
2022-11-14 10:50:23 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2022-11-14 10:50:23 0 [Note] InnoDB: Shutdown completed; log sequence number 2732064874; transaction id 111
2022-11-14 10:50:23 0 [Note] sql/mysqld: Shutdown complete
[ perf record: Woken up 13 times to write data ]
[ perf record: Captured and wrote 3.899 MB perf.data (32688 samples) ]

~/repos/build-mariadb-server-10.5
$ perf report -g
Samples: 32K of event 'cycles:u', Event count (approx.): 36937185750
  Children      Self  Command  Shared Object        Symbol
+   86.08%     0.00%  mysqld   libstdc++.so.6.0.30  [.] 0x00007f39be6dbb03
+   86.08%     0.00%  mysqld   mariadbd             [.] tpool::thread_pool_generic::worker_main
+   86.08%     0.00%  mysqld   mariadbd             [.] tpool::task_group::execute
+   50.53%     0.00%  mysqld   mariadbd             [.] purge_worker_callback
+   50.47%     4.01%  mysqld   mariadbd             [.] que_run_threads
+   45.43%     6.27%  mysqld   mariadbd             [.] row_purge_step
+   35.55%     0.00%  mysqld   mariadbd             [.] purge_coordinator_callback
+   34.30%     0.34%  mysqld   mariadbd             [.] row_purge_reset_trx_id
+   34.03%     7.32%  mysqld   mariadbd             [.] trx_purge
+   32.75%     0.08%  mysqld   mariadbd             [.] row_purge_reposition_pcur
+   31.78%     0.09%  mysqld   mariadbd             [.] row_search_on_row_ref
+   31.70%     0.28%  mysqld   mariadbd             [.] btr_pcur_open_low
+   31.02%     2.00%  mysqld   mariadbd             [.] btr_cur_search_to_nth_level
+   21.92%     2.91%  mysqld   mariadbd             [.] page_cur_search_with_match
+   20.56%     1.52%  mysqld   mariadbd             [.] buf_page_get_gen
+   19.12%    10.86%  mysqld   mariadbd             [.] buf_page_get_low
+   11.10%     2.51%  mysqld   mariadbd             [.] cmp_dtuple_rec_with_match_low
+    7.84%     5.83%  mysqld   mariadbd             [.] rec_get_offsets_func
+    6.03%     0.00%  mysqld   [unknown]            [.] 0000000000000000
+    5.28%     0.00%  mysqld   [unknown]            [.] 0x0000611000032b98
+    5.28%     0.00%  mysqld   [unknown]            [.] 0x00007f39995fb370
So, on read-ahead in 10.6: none on a clean startup.
10.6.12
$ mc -e 'show global status like "innodb_buffer_pool_read%"'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 0        |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 991      |
| Innodb_buffer_pool_reads              | 189      |
+---------------------------------------+----------+

~/repos/build-mariadb-server-10.6
$ time client/mysqldump -S /tmp/build-mariadb-server-10.6.sock test > /dev/null

real	0m2.111s
user	0m0.503s
sys	0m0.026s

~/repos/build-mariadb-server-10.6
$ mc -e 'show global status like "innodb_buffer_pool_read%"'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 0        |
| Innodb_buffer_pool_read_ahead_evicted | 1        |
| Innodb_buffer_pool_read_requests      | 10022373 |
| Innodb_buffer_pool_reads              | 21560    |
+---------------------------------------+----------+
Compared to 10.5, after the ~10-20 seconds it takes for the reads/read requests to stabilize:
$ mc -e 'show global status like "innodb_buffer_pool%"; select now()'
+-----------------------------------------+-----------+
| Variable_name                           | Value     |
+-----------------------------------------+-----------+
| Innodb_buffer_pool_dump_status          |           |
| Innodb_buffer_pool_load_status          |           |
| Innodb_buffer_pool_resize_status        |           |
| Innodb_buffer_pool_load_incomplete      | OFF       |
| Innodb_buffer_pool_pages_data           | 8065      |
| Innodb_buffer_pool_bytes_data           | 132136960 |
| Innodb_buffer_pool_pages_dirty          | 0         |
| Innodb_buffer_pool_bytes_dirty          | 0         |
| Innodb_buffer_pool_pages_flushed        | 0         |
| Innodb_buffer_pool_pages_free           | 0         |
| Innodb_buffer_pool_pages_made_not_young | 36325519  |
| Innodb_buffer_pool_pages_made_young     | 21        |
| Innodb_buffer_pool_pages_misc           | 0         |
| Innodb_buffer_pool_pages_old            | 2975      |
| Innodb_buffer_pool_pages_total          | 8065      |
| Innodb_buffer_pool_pages_lru_flushed    | 0         |
| Innodb_buffer_pool_read_ahead_rnd       | 0         |
| Innodb_buffer_pool_read_ahead           | 0         |
| Innodb_buffer_pool_read_ahead_evicted   | 0         |
| Innodb_buffer_pool_read_requests        | 56189465  |
| Innodb_buffer_pool_reads                | 48791     |
| Innodb_buffer_pool_wait_free            | 0         |
| Innodb_buffer_pool_write_requests       | 515       |
+-----------------------------------------+-----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 11:06:13 |
+---------------------+

~/repos/build-mariadb-server-10.5
$ time client/mysqldump -S /tmp/build-mariadb-server-10.5.sock test > /dev/null

real	0m6.390s
user	0m1.275s
sys	0m0.088s

~/repos/build-mariadb-server-10.5
$ mc -e 'show global status like "innodb_buffer_pool%"; select now()'
+-----------------------------------------+-----------+
| Variable_name                           | Value     |
+-----------------------------------------+-----------+
| Innodb_buffer_pool_dump_status          |           |
| Innodb_buffer_pool_load_status          |           |
| Innodb_buffer_pool_resize_status        |           |
| Innodb_buffer_pool_load_incomplete      | OFF       |
| Innodb_buffer_pool_pages_data           | 8065      |
| Innodb_buffer_pool_bytes_data           | 132136960 |
| Innodb_buffer_pool_pages_dirty          | 0         |
| Innodb_buffer_pool_bytes_dirty          | 0         |
| Innodb_buffer_pool_pages_flushed        | 0         |
| Innodb_buffer_pool_pages_free           | 0         |
| Innodb_buffer_pool_pages_made_not_young | 46346890  |
| Innodb_buffer_pool_pages_made_young     | 21        |
| Innodb_buffer_pool_pages_misc           | 0         |
| Innodb_buffer_pool_pages_old            | 2975      |
| Innodb_buffer_pool_pages_total          | 8065      |
| Innodb_buffer_pool_pages_lru_flushed    | 0         |
| Innodb_buffer_pool_read_ahead_rnd       | 0         |
| Innodb_buffer_pool_read_ahead           | 0         |
| Innodb_buffer_pool_read_ahead_evicted   | 0         |
| Innodb_buffer_pool_read_requests        | 66210840  |
| Innodb_buffer_pool_reads                | 70161     |
| Innodb_buffer_pool_wait_free            | 0         |
| Innodb_buffer_pool_write_requests       | 515       |
+-----------------------------------------+-----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 11:06:28 |
+---------------------+

So: a longer dump, about the same number of Innodb_buffer_pool_reads, and no read-ahead either.
10.4 by comparison, after the redo log was applied: reads and read-ahead happening on startup.
10.4-20969aa41251045a890a29e7b68ae5abae7d0a95
$ sql/mysqld --no-defaults --datadir=/tmp/${PWD##*/}-datadir --socket=/tmp/${PWD##*/}.sock --plugin-dir=${PWD}/mysql-test/var/plugins/ --verbose --skip-networking --innodb-buffer-pool-load-at-startup=0
2022-11-14 11:34:23 0 [Note] sql/mysqld (mysqld 10.4.28-MariaDB) starting as process 47533 ...
2022-11-14 11:34:23 0 [Note] InnoDB: Using Linux native AIO
2022-11-14 11:34:23 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2022-11-14 11:34:23 0 [Note] InnoDB: Uses event mutexes
2022-11-14 11:34:23 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-11-14 11:34:23 0 [Note] InnoDB: Number of pools: 1
2022-11-14 11:34:23 0 [Note] InnoDB: Using SSE2 crc32 instructions
2022-11-14 11:34:23 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2022-11-14 11:34:23 0 [Note] InnoDB: Completed initialization of buffer pool
2022-11-14 11:34:23 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2022-11-14 11:34:23 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2022-11-14 11:34:23 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-11-14 11:34:23 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-11-14 11:34:23 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-11-14 11:34:23 0 [Note] InnoDB: Waiting for purge to start
2022-11-14 11:34:23 0 [Note] InnoDB: 10.4.28 started; log sequence number 842732234; transaction id 40
2022-11-14 11:34:23 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-11-14 11:34:23 0 [Note] Reading of all Master_info entries succeeded
2022-11-14 11:34:23 0 [Note] Added new Master_info '' to hash table
2022-11-14 11:34:23 0 [Note] sql/mysqld: ready for connections.
Version: '10.4.28-MariaDB' socket: '/tmp/build-mariadb-server-10.4.sock' port: 0 Source distribution

$ mc -e 'show global status like "innodb_buffer_pool%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_dump_status        |          |
| Innodb_buffer_pool_load_status        |          |
| Innodb_buffer_pool_resize_status      |          |
| Innodb_buffer_pool_load_incomplete    | OFF      |
| Innodb_buffer_pool_pages_data         | 3114     |
| Innodb_buffer_pool_bytes_data         | 51019776 |
| Innodb_buffer_pool_pages_dirty        | 131      |
| Innodb_buffer_pool_bytes_dirty        | 2146304  |
| Innodb_buffer_pool_pages_flushed      | 0        |
| Innodb_buffer_pool_pages_free         | 3586     |
| Innodb_buffer_pool_pages_misc         | 1327     |
| Innodb_buffer_pool_pages_total        | 8027     |
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 1626     |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 2827972  |
| Innodb_buffer_pool_reads              | 1358     |
| Innodb_buffer_pool_wait_free          | 0        |
| Innodb_buffer_pool_write_requests     | 715      |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 11:34:24 |
+---------------------+

~/repos/build-mariadb-server-10.4
$ mc -e 'show global status like "innodb_buffer_pool%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_dump_status        |          |
| Innodb_buffer_pool_load_status        |          |
| Innodb_buffer_pool_resize_status      |          |
| Innodb_buffer_pool_load_incomplete    | OFF      |
| Innodb_buffer_pool_pages_data         | 4687     |
| Innodb_buffer_pool_bytes_data         | 76791808 |
| Innodb_buffer_pool_pages_dirty        | 0        |
| Innodb_buffer_pool_bytes_dirty        | 0        |
| Innodb_buffer_pool_pages_flushed      | 131      |
| Innodb_buffer_pool_pages_free         | 1349     |
| Innodb_buffer_pool_pages_misc         | 1991     |
| Innodb_buffer_pool_pages_total        | 8027     |
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 2420     |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 4281798  |
| Innodb_buffer_pool_reads              | 2137     |
| Innodb_buffer_pool_wait_free          | 0        |
| Innodb_buffer_pool_write_requests     | 715      |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 11:34:25 |
+---------------------+

~/repos/build-mariadb-server-10.4
$ mc -e 'show global status like "innodb_buffer_pool%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_dump_status        |          |
| Innodb_buffer_pool_load_status        |          |
| Innodb_buffer_pool_resize_status      |          |
| Innodb_buffer_pool_load_incomplete    | OFF      |
| Innodb_buffer_pool_pages_data         | 5525     |
| Innodb_buffer_pool_bytes_data         | 90521600 |
| Innodb_buffer_pool_pages_dirty        | 0        |
| Innodb_buffer_pool_bytes_dirty        | 0        |
| Innodb_buffer_pool_pages_flushed      | 131      |
| Innodb_buffer_pool_pages_free         | 141      |
| Innodb_buffer_pool_pages_misc         | 2361     |
| Innodb_buffer_pool_pages_total        | 8027     |
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 2881     |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 5195230  |
| Innodb_buffer_pool_reads              | 2514     |
| Innodb_buffer_pool_wait_free          | 0        |
| Innodb_buffer_pool_write_requests     | 715      |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 11:34:26 |
+---------------------+

After stabilization:
$ mc -e 'show global status like "innodb_buffer_pool%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_dump_status        |          |
| Innodb_buffer_pool_load_status        |          |
| Innodb_buffer_pool_resize_status      |          |
| Innodb_buffer_pool_load_incomplete    | OFF      |
| Innodb_buffer_pool_pages_data         | 4931     |
| Innodb_buffer_pool_bytes_data         | 80789504 |
| Innodb_buffer_pool_pages_dirty        | 0        |
| Innodb_buffer_pool_bytes_dirty        | 0        |
| Innodb_buffer_pool_pages_flushed      | 131      |
| Innodb_buffer_pool_pages_free         | 1        |
| Innodb_buffer_pool_pages_misc         | 3095     |
| Innodb_buffer_pool_pages_total        | 8027     |
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 18277    |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 31143231 |
| Innodb_buffer_pool_reads              | 13594    |
| Innodb_buffer_pool_wait_free          | 0        |
| Innodb_buffer_pool_write_requests     | 715      |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 11:37:52 |
+---------------------+

~/repos/build-mariadb-server-10.4
$ time client/mysqldump -S /tmp/build-mariadb-server-10.4.sock test > /dev/null

real	0m5.776s
user	0m0.575s
sys	0m0.038s

~/repos/build-mariadb-server-10.4
$ mc -e 'show global status like "innodb_buffer_pool%"; select now()'
+---------------------------------------+----------+
| Variable_name                         | Value    |
+---------------------------------------+----------+
| Innodb_buffer_pool_dump_status        |          |
| Innodb_buffer_pool_load_status        |          |
| Innodb_buffer_pool_resize_status      |          |
| Innodb_buffer_pool_load_incomplete    | OFF      |
| Innodb_buffer_pool_pages_data         | 5878     |
| Innodb_buffer_pool_bytes_data         | 96305152 |
| Innodb_buffer_pool_pages_dirty        | 0        |
| Innodb_buffer_pool_bytes_dirty        | 0        |
| Innodb_buffer_pool_pages_flushed      | 131      |
| Innodb_buffer_pool_pages_free         | 0        |
| Innodb_buffer_pool_pages_misc         | 2149     |
| Innodb_buffer_pool_pages_total        | 8027     |
| Innodb_buffer_pool_read_ahead_rnd     | 0        |
| Innodb_buffer_pool_read_ahead         | 35647    |
| Innodb_buffer_pool_read_ahead_evicted | 0        |
| Innodb_buffer_pool_read_requests      | 41182641 |
| Innodb_buffer_pool_reads              | 14523    |
| Innodb_buffer_pool_wait_free          | 0        |
| Innodb_buffer_pool_write_requests     | 715      |
+---------------------------------------+----------+
+---------------------+
| now()               |
+---------------------+
| 2022-11-14 11:38:06 |
+---------------------+

A high number of Innodb_buffer_pool_read_ahead relative to Innodb_buffer_pool_reads to satisfy the mysqldump.
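One way to quantify that observation is to diff the counters between two SHOW GLOBAL STATUS samples. A small sketch using the 10.4 values from the snapshots above (before and after the mysqldump run); the helper function is mine, and this assumes read-ahead pages are counted separately from Innodb_buffer_pool_reads, so the ratio is only a rough fraction of physical reads issued by read-ahead:

```python
def read_ahead_ratio(before, after):
    """Rough fraction of physical page reads issued by read-ahead between two samples."""
    ahead = after["Innodb_buffer_pool_read_ahead"] - before["Innodb_buffer_pool_read_ahead"]
    reads = after["Innodb_buffer_pool_reads"] - before["Innodb_buffer_pool_reads"]
    total = ahead + reads
    return ahead / total if total else 0.0

# Counter values from the 10.4 snapshots above (11:37:52 and 11:38:06).
before = {"Innodb_buffer_pool_read_ahead": 18277, "Innodb_buffer_pool_reads": 13594}
after = {"Innodb_buffer_pool_read_ahead": 35647, "Innodb_buffer_pool_reads": 14523}
print(f"{read_ahead_ratio(before, after):.0%}")  # the vast majority came via read-ahead
```

On the 10.6 snapshots the read-ahead delta is zero, so the same calculation yields 0%.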
10.6, tracing from a select sum(seq) from tbl:
Thread 9 "mysqld" hit Breakpoint 1, buf_read_ahead_linear (page_id=page_id@entry={m_id = 21474836607}, zip_size=0, ibuf=false) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:640
640	count+= buf_read_page_low(&err, space, false, ibuf_mode, new_low, zip_size,
->
buf_read_page_low (err=err@entry=0x7ffff6c6aa7c, space=0x2525b18, sync=false, mode=mode@entry=132, page_id={m_id = 21474836547}, zip_size=zip_size@entry=0, unzip=<optimized out>) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:271

gets to:

300	bpage = buf_page_init_for_read(mode, page_id, zip_size, unzip);

(always returns null here, so no pages are read ahead)

10.6 path through buf_page_init_for_read:
buf_page_init_for_read (mode=mode@entry=132, page_id=page_id@entry={m_id = 21474836548}, zip_size=zip_size@entry=0, unzip=false) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:99
99	mtr_t mtr;
(gdb) info locals
mtr = {m_last = 0x80, m_last_offset = 128, m_log_mode = 0, m_modifications = 0, m_made_dirty = 0, m_inside_ibuf = 0, m_trim_pages = 0, m_memo = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7ffff6c6a3f8, prev = 0x7ffff6c6a3f8}}, size_ = 1}, m_size = 0, m_first_block = {<ilist_node<void>> = {next = 0x7ffff6c6a3d8, prev = 0x7ffff6c6a3d8}, m_data = "\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000"..., m_used = 0}}, m_log = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7ffff6c6a628, prev = 0x7ffff6c6a628}}, size_ = 1}, m_size = 0, m_first_block = {<ilist_node<void>> = {next = 0x7ffff6c6a608, prev = 0x7ffff6c6a608}, m_data = '\000' <repeats 33 times>, "\227\375*W\246\261\005\000\000\000\000\000\000\000\000\000@\000\000\000\000\000\000\000\247\306\366\377\177\000\000\216@\321\000\000\000\000\000\001\000\000\000\000\000\000\000\200\354\001\254\377\177\000\000\023\000\005\000\350\373\364m\002mysql\000index_stats", '\000' <repeats 46 times>, "\220\247\306\366\377\177\000\000h\\R\002\000\000\000\000\340\354\340\323\377\177\000\000\030[R\002\000\000\000\000\001\316\001\254\377\177\000\000"..., m_used = 0}}, m_user_space = 0x0, m_commit_lsn = 0, m_freed_space = 0x0, m_freed_pages = 0x0}
bpage = <optimized out>
block = <optimized out>
chain = <optimized out>
hash_page = <optimized out>
(gdb) p/x page_id
$8 = {m_id = 0x500000044}
(gdb) n
101	if (mode == BUF_READ_IBUF_PAGES_ONLY)
(gdb) p mode
$9 = 132
(gdb) n
118	if (!zip_size || unzip || recv_recovery_is_on())
(gdb) p zip_size
$10 = 0
(gdb) p unzip
$11 = false
(gdb) s
120	block= buf_LRU_get_free_block(false);
(gdb) n
[Thread 0x7fffd0e00640 (LWP 244150) exited]
121	block->initialise(page_id, zip_size, buf_page_t::READ_FIX);
(gdb) p block
$12 = (buf_block_t *) 0x7fffd3e0ed80
(gdb) n
124	block->page.lock.x_lock(true);
(gdb) p block
$13 = (buf_block_t *) 0x7fffd3e0ed80
(gdb) n
127	buf_pool_t::hash_chain &chain= buf_pool.page_hash.cell_get(page_id.fold());
(gdb)
129	mysql_mutex_lock(&buf_pool.mutex);
(gdb)
135	if (block)
(gdb) p block
$14 = (buf_block_t *) 0x7fffd3e0ed80
(gdb) n
137	block->page.lock.x_unlock(true);
(gdb) n
139	buf_LRU_block_free_non_file_page(block);
(gdb)
233	mysql_mutex_unlock(&buf_pool.mutex);
(gdb)
101	if (mode == BUF_READ_IBUF_PAGES_ONLY)
(gdb)
235	if (mode == BUF_READ_IBUF_PAGES_ONLY)
(gdb) info locals
mtr = {m_last = 0x80, m_last_offset = 128, m_log_mode = 0, m_modifications = 0, m_made_dirty = 0, m_inside_ibuf = 0, m_trim_pages = 0, m_memo = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7ffff6c6a3f8, prev = 0x7ffff6c6a3f8}}, size_ = 1}, m_size = 0, m_first_block = {<ilist_node<void>> = {next = 0x7ffff6c6a3d8, prev = 0x7ffff6c6a3d8}, m_data = "\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000\200\000\000\000\000\000\000\000"..., m_used = 0}}, m_log = {m_heap = 0x0, m_list = {<ilist<mtr_buf_t::block_t, void>> = {sentinel_ = {next = 0x7ffff6c6a628, prev = 0x7ffff6c6a628}}, size_ = 1}, m_size = 0, m_first_block = {<ilist_node<void>> = {next = 0x7ffff6c6a608, prev = 0x7ffff6c6a608}, m_data = '\000' <repeats 33 times>, "\227\375*W\246\261\005\000\000\000\000\000\000\000\000\000@\000\000\000\000\000\000\000\247\306\366\377\177\000\000\216@\321\000\000\000\000\000\001\000\000\000\000\000\000\000\200\354\001\254\377\177\000\000\023\000\005\000\350\373\364m\002mysql\000index_stats", '\000' <repeats 46 times>, "\220\247\306\366\377\177\000\000h\\R\002\000\000\000\000\340\354\340\323\377\177\000\000\030[R\002\000\000\000\000\001\316\001\254\377\177\000\000"..., m_used = 0}}, m_user_space = 0x0, m_commit_lsn = 0, m_freed_space = 0x0, m_freed_pages = 0x0}
bpage = 0x0
block = <optimized out>
chain = @0x22cd6f0: {first = 0x7fffd3e0c800}
hash_page = <optimized out>
(gdb) finish
Run till exit from #0 buf_page_init_for_read (mode=mode@entry=132, page_id=page_id@entry={m_id = 21474836548}, zip_size=zip_size@entry=0, unzip=<optimized out>) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:235
buf_read_page_low (err=err@entry=0x7ffff6c6aa7c, space=0x2525b18, sync=false, mode=mode@entry=132, page_id={m_id = 21474836548}, zip_size=zip_size@entry=0, unzip=false) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:302
302	if (bpage == NULL) {
Value returned is $15 = (buf_page_t *) 0x0
On 10.5, I saw no increase of Innodb_buffer_pool_read_ahead on the same query.
10.4 was returning a non-NULL bpage at this same point.
10.4 ranges of reads low -> high map to the page_id in question:
10.4 a sane read ahead setup |
Thread 32 "mysqld" hit Breakpoint 1, buf_read_ahead_linear (page_id={m_space = 7, m_page_no = 0}, zip_size=0, ibuf=false) at /home/dan/repos/mariadb-server-10.4/storage/innobase/buf/buf0rea.cc:570
|
570 for (i = low; i < high; i++) {
|
low = 0
|
high = 64
|
10.4, but not always completely sane |
Thread 32 "mysqld" hit Breakpoint 1, buf_read_ahead_linear (page_id={m_space = 7, m_page_no = 127}, zip_size=0, ibuf=false) at /home/dan/repos/mariadb-server-10.4/storage/innobase/buf/buf0rea.cc:570
|
low = 64
|
high = 128
|
As a forward read-ahead, this should start at 128 (page 127 has already been read by the time this is called), and it is read-ahead, not read-behind.
However, 10.6 is always trying to read behind (this was during create table t1 as select seq from seq_1_to_10000000;):
10.6 |
Thread 26 "mysqld" hit Breakpoint 2, buf_read_ahead_linear (page_id={m_id = 63}, zip_size=0, ibuf=false) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:631
|
631 for (ulint ibuf_mode= ibuf ? BUF_READ_IBUF_PAGES_ONLY : BUF_READ_ANY_PAGE;
|
new_low = {m_id = 0}
|
new_high_1 = {m_id = 63}
|
|
or:
|
|
Thread 26 "mysqld" hit Breakpoint 2, buf_read_ahead_linear (page_id={m_id = 255}, zip_size=0, ibuf=false) at /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc:631
|
631 for (ulint ibuf_mode= ibuf ? BUF_READ_IBUF_PAGES_ONLY : BUF_READ_ANY_PAGE;
|
new_low = {m_id = 192}
|
new_high_1 = {m_id = 255}
|
A basic reproducer:
should produce read aheads |
MariaDB [test]> create table t1 as select seq from seq_1_to_10000000;
|
Query OK, 10000000 rows affected (8 min 20.608 sec)
|
Records: 10000000 Duplicates: 0 Warnings: 0
|
|
MariaDB [test]> set global innodb_fast_shutdown=0;
|
Query OK, 0 rows affected (0.000 sec)
|
|
MariaDB [test]> shutdown;
|
Query OK, 0 rows affected (0.001 sec)
|
|
MariaDB [test]> show global status like 'innodb_buffer_pool%';
|
ERROR 2006 (HY000): Server has gone away
|
No connection. Trying to reconnect...
|
Connection id: 3
|
Current database: test
|
|
+-----------------------------------------+---------+
|
| Variable_name | Value |
|
+-----------------------------------------+---------+
|
| Innodb_buffer_pool_pages_data | 402 |
|
| Innodb_buffer_pool_pages_dirty | 0 |
|
| Innodb_buffer_pool_pages_free | 7687 |
|
| Innodb_buffer_pool_pages_latched | 144 |
|
| Innodb_buffer_pool_pages_total | 8089 |
|
| Innodb_buffer_pool_read_ahead_rnd | 0 |
|
| Innodb_buffer_pool_read_ahead | 0 |
|
| Innodb_buffer_pool_read_ahead_evicted | 0 |
|
| Innodb_buffer_pool_read_requests | 1163 |
|
| Innodb_buffer_pool_reads | 271 |
|
+-----------------------------------------+---------+
|
25 rows in set (0.015 sec)
|
|
MariaDB [test]> select sum(seq) from t1;
|
+----------------+
|
| sum(seq) |
|
+----------------+
|
| 50000005000000 |
|
+----------------+
|
1 row in set (54.859 sec)
|
|
MariaDB [test]> show global status like 'innodb_buffer_pool%';
|
+-----------------------------------------+-----------+
|
| Variable_name | Value |
|
+-----------------------------------------+-----------+
|
| Innodb_buffer_pool_pages_data | 8089 |
|
| Innodb_buffer_pool_read_ahead_rnd | 0 |
|
| Innodb_buffer_pool_read_ahead | 0 |
|
| Innodb_buffer_pool_read_ahead_evicted | 1 |
|
| Innodb_buffer_pool_read_requests | 10022534 |
|
| Innodb_buffer_pool_reads | 21641 |
|
+-----------------------------------------+-----------+
|
10.6 restart executed with breakpoint on buf0rea.cc, line 631 failed to trigger |
(gdb) b storage/innobase/buf/buf0rea.cc:631
|
Breakpoint 1 at 0x152af8c: file /home/dan/repos/mariadb-server-10.6/storage/innobase/buf/buf0rea.cc, line 631.
|
(gdb) r
|
Starting program: /home/dan/repos/build-mariadb-server-10.6-debug/sql/mysqld --no-defaults --datadir=/home/dan/mariadb-datadir --socket=/tmp/build-mariadb-server-10.6-debug.sock --plugin-dir=/home/dan/repos/build-mariadb-server-10.6-debug/mysql-test/var/plugins/ --verbose --skip-networking --innodb-buffer-pool-load-at-startup=0 --innodb_read_ahead_threshold=4
|
A small bit of inspiration: buf_read_ahead_linear is being called from the wrong spot. A more logical place to call it would be btr_pcur_move_to_next_page / btr_pcur_move_to_prev.
The advantages here are:
- At this point the direction of read-ahead is obvious (from the function name)
- there's a page_id on cursor->btr_cur->page_cur->block.page
- one of the cursor structures could keep track of the count of consecutive reads
- further to the above, the fuzzy tree walking and access-time heuristics can be dropped
Disadvantages:
- Applications doing ref lookups on incrementally increasing keys won't get read-ahead detected.
- No read-ahead in buffer pool loads (which would be a rather easy fix)
Is it known at the SQL layer whether a PK lookup is also required by the query? If so, a secondary-key read-ahead could prefetch the clustered (PK) index pages as well.
Maybe too much for a 10.5 fix, but useful?
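The cursor-driven proposal above can be sketched as follows. This is a hypothetical standalone model, not InnoDB code: all names are invented, and the streak threshold of 8 is an illustrative stand-in for an innodb_read_ahead_threshold-style knob. The point is that a persistent cursor knows the scan direction from which move function is called, and can count consecutive page advances itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: a persistent cursor counts consecutive forward
// page moves, so the direction of the scan is explicit and the
// access-time heuristics of buf_read_ahead_linear() become unnecessary.
struct pcur_mock
{
  uint32_t page_no= 0;
  uint32_t forward_streak= 0;

  // Returns true when the caller should issue a read-ahead of the
  // following pages; the direction is implied by the function called.
  bool move_to_next_page()
  {
    ++page_no;
    ++forward_streak;
    return forward_streak >= 8; // illustrative threshold
  }

  void move_to_prev_page()
  {
    --page_no;
    forward_streak= 0; // direction changed; restart the streak
  }
};
```

With this shape, the read-ahead request would be issued from the cursor move rather than from the buffer pool lookup path.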
The B-tree cursors were refactored in MDEV-30400. I can think of an additional point where read-ahead could be initiated. In btr_cur_t::search_leaf() and btr_cur_t::open_leaf(), when reaching level 1 (a page that points to the leaf pages), we will know all the child page numbers. If the end of the key range were made available at this point, we could read ahead all child pages that are within the search range. Currently, we have no knowledge of the range here.
In addition, whenever we reach a leaf page, we might post a read-ahead request for its right sibling page (or left, in the rare case of reverse range scan). Also for this, it would be useful to know the other end of the desired key range, so that we can avoid unnecessary read-ahead when no access to the sibling page will be needed.
I believe that some progress on MDEV-16402 will be needed before we can reliably get the end-of-range conditions passed to the B-tree traversal.
MDEV-26055 might provide some improvement here, by periodically making some space available in buf_pool.free.
In a chat, psergei claimed the following:
You can overload handler::read_range_first() and you will see the endpoints of the range being scanned.
If that works, MDEV-16402 should not block this ticket.
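To illustrate psergei's suggestion, here is a mock, not the real MariaDB handler API (the struct and member names are invented for this sketch): an overridden read_range_first() sees both endpoints of the range being scanned, which is exactly the information a read-ahead heuristic would need before any leaf page is touched.

```cpp
#include <cassert>

// Invented stand-ins for the real key_range / handler types.
struct key_range_mock { int key; };

struct handler_mock
{
  virtual ~handler_mock() = default;
  virtual int read_range_first(const key_range_mock *start,
                               const key_range_mock *end)
  { return 0; }
};

struct ha_sketch : handler_mock
{
  int seen_start= -1, seen_end= -1;

  int read_range_first(const key_range_mock *start,
                       const key_range_mock *end) override
  {
    // Both endpoints are visible here, before the leaf pages are read;
    // a read-ahead for [start, end] could be primed at this point.
    if (start) seen_start= start->key;
    if (end)   seen_end=   end->key;
    return handler_mock::read_range_first(start, end);
  }
};
```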
The following test shows no read-ahead on 10.5 for me:
--source include/have_innodb.inc
|
--source include/have_sequence.inc
|
create table t1 engine=innodb as select seq from seq_1_to_10000000; |
let $datadir=`select @@datadir`; |
SET GLOBAL innodb_fast_shutdown=0; |
--source include/shutdown_mysqld.inc
|
remove_file $datadir/ib_buffer_pool;
|
--source include/start_mysqld.inc
|
show global status like 'innodb_buffer_pool_read%'; |
sleep 10;
|
show global status like 'innodb_buffer_pool_read%'; |
The invocation was as follows:
./mtr --mysqld=--innodb-{buffer-pool,log-file}-size=1g main.name_of_test |
10.5 d821fd7fab2723b84f774aaa4882acb86b35769d |
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 0
|
Innodb_buffer_pool_read_ahead 0
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 932
|
Innodb_buffer_pool_reads 155
|
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 0
|
Innodb_buffer_pool_read_ahead 0
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 932
|
Innodb_buffer_pool_reads 155
|
On 10.4, the numbers are similar:
10.4 84b9fc25a29b94a37eb9d5ac2e2c0f75c0efafda |
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 0
|
Innodb_buffer_pool_read_ahead 0
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 1103
|
Innodb_buffer_pool_reads 161
|
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 0
|
Innodb_buffer_pool_read_ahead 0
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 1103
|
Innodb_buffer_pool_reads 161
|
If I remove the SET GLOBAL innodb_fast_shutdown=0, then the purge of transaction history (MDEV-12288) will kick in and trigger read-ahead in 10.4, like danblack noted:
10.4 84b9fc25a29b94a37eb9d5ac2e2c0f75c0efafda |
# restart
|
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 0
|
Innodb_buffer_pool_read_ahead 0
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 341640
|
Innodb_buffer_pool_reads 508
|
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 0
|
Innodb_buffer_pool_read_ahead 6673
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 11977134
|
Innodb_buffer_pool_reads 5182
|
In 10.4, all read-ahead is triggered by the following in buf_page_get_low():
if (mode != BUF_PEEK_IF_IN_POOL && !access_time) {
	/* In the case of a first access, try to apply linear
	read-ahead */

	buf_read_ahead_linear(page_id, zip_size, ibuf_inside(mtr));
}
|
In 10.5 there will be no read-ahead reported, just like danblack noted. The condition for detecting the first access to a page is apparently broken and will need to be revised.
The check in the 10.5 buf_read_ahead_linear() that enough past page requests have been in the expected order is filtering out all requests. That code was refactored in MDEV-15053.
It is more complex than that. In 10.4, buf_read_ahead_linear() is being invoked in descending order of page number, because row_purge_reset_trx_id() is being invoked in descending order of test.t1.DB_ROW_ID. It looks like the LIFO purge order was replaced with FIFO in this commit of MDEV-16678.
I think that the FIFO purge order makes sense, so I would not touch that. To make 10.4 use the FIFO order for test purposes, I applied the following patch:
diff --git a/storage/innobase/row/row0purge.cc b/storage/innobase/row/row0purge.cc
index 6ed03f0eabb..d30ca77f30f 100644
--- a/storage/innobase/row/row0purge.cc
+++ b/storage/innobase/row/row0purge.cc
@@ -1469,7 +1469,8 @@ row_purge_step(
 	trx_purge_rec_t*	purge_rec;
 
 	purge_rec = static_cast<trx_purge_rec_t*>(
-		ib_vector_pop(node->undo_recs));
+		ib_vector_get(node->undo_recs, 0));
+	ib_vector_remove(node->undo_recs, *(void**)purge_rec);
 
 	node->roll_ptr = purge_rec->roll_ptr;
|
Also with this patch present, 10.4 will trigger some linear read-ahead:
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 0
|
Innodb_buffer_pool_read_ahead 0
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 294129
|
Innodb_buffer_pool_reads 461
|
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 0
|
Innodb_buffer_pool_read_ahead 2560
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 4128875
|
Innodb_buffer_pool_reads 1755
|
The first linear read-ahead was triggered when page 127 of the table t1 was being accessed.
Next, I will try to see where the 10.4 and 10.5 buf_read_ahead_linear() differ. The intention seems to be to enable read-ahead when in the previous 64-page extent, at most 8 pages had not been accessed in ascending order. This condition is never met in 10.5.
The following condition appears to be wrong:
diff --git a/storage/innobase/buf/buf0rea.cc b/storage/innobase/buf/buf0rea.cc
--- a/storage/innobase/buf/buf0rea.cc
+++ b/storage/innobase/buf/buf0rea.cc
@@ -582,7 +582,7 @@ buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf)
 
   /* We will check that almost all pages in the area have been accessed
   in the desired order. */
-  const bool descending= page_id == low;
+  const bool descending= page_id != low;
 
   if (!descending && page_id != high_1)
     /* This is not a border page of the area */
There is also something wrong with buf_read_page_low(), because it always returns false.
There were a couple of off-by-one bugs too. I am really sorry for this. In MDEV-15053, I had removed one buffer-page lookup for the current page, and instead added a condition if (i == page_id) inside the loop.
diff --git a/storage/innobase/buf/buf0rea.cc b/storage/innobase/buf/buf0rea.cc
index ff163f74b08..26a84939306 100644
--- a/storage/innobase/buf/buf0rea.cc
+++ b/storage/innobase/buf/buf0rea.cc
@@ -582,7 +582,7 @@ buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf)
 
   /* We will check that almost all pages in the area have been accessed
   in the desired order. */
-  const bool descending= page_id == low;
+  const bool descending= page_id != low;
 
   if (!descending && page_id != high_1)
     /* This is not a border page of the area */
@@ -612,7 +612,7 @@ buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf)
                     uint32_t{buf_pool.read_ahead_area});
   page_id_t new_low= low, new_high_1= high_1;
   unsigned prev_accessed= 0;
-  for (page_id_t i= low; i != high_1; ++i)
+  for (page_id_t i= low; i <= high_1; ++i)
   {
     const ulint fold= i.fold();
     page_hash_latch *hash_lock= buf_pool.page_hash.lock<false>(fold);
@@ -647,12 +647,21 @@ buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf)
     if (prev == FIL_NULL || next == FIL_NULL)
       goto hard_fail;
     page_id_t id= page_id;
-    if (descending && next - 1 == page_id.page_no())
-      id.set_page_no(prev);
-    else if (!descending && prev + 1 == page_id.page_no())
-      id.set_page_no(next);
+    if (descending)
+    {
+      if (id == high_1)
+        ++id;
+      else if (next - 1 != page_id.page_no())
+        goto hard_fail;
+      else
+        id.set_page_no(prev);
+    }
     else
-      goto hard_fail; /* Successor or predecessor not in the right order */
+    {
+      if (prev + 1 != page_id.page_no())
+        goto hard_fail;
+      id.set_page_no(next);
+    }
 
     new_low= id - (id.page_no() % buf_read_ahead_area);
     new_high_1= new_low + (buf_read_ahead_area - 1);
@@ -693,7 +702,7 @@ buf_read_ahead_linear(const page_id_t page_id, ulint zip_size, bool ibuf)
   /* If we got this far, read-ahead can be sensible: do it */
   count= 0;
   for (ulint ibuf_mode= ibuf ? BUF_READ_IBUF_PAGES_ONLY : BUF_READ_ANY_PAGE;
-       new_low != new_high_1; ++new_low)
+       new_low <= new_high_1; ++new_low)
   {
     if (ibuf_bitmap_page(new_low, zip_size))
       continue;
With this, I am seeing linear read-ahead taking place:
10.5 d821fd7fab2723b84f774aaa4882acb86b35769d with patch |
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 0
|
Innodb_buffer_pool_read_ahead 0
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 403049
|
Innodb_buffer_pool_reads 526
|
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 0
|
Innodb_buffer_pool_read_ahead 11648
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 21995204
|
Innodb_buffer_pool_reads 5955
|
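The area-boundary arithmetic that the off-by-one fixes rely on can be checked in a standalone model (an assumption-level sketch, not InnoDB code): new_low is the first page of the 64-page area containing a page, and new_high_1 is the last page of that area (hence the _1 suffix), which is why the loops must run while i <= high_1; the old i != high_1 form skipped the final page of the area.

```cpp
#include <cassert>
#include <cstdint>

// Standalone model of the read-ahead area boundaries.
static const uint32_t AREA= 64;

uint32_t area_low(uint32_t page_no)   { return page_no - page_no % AREA; }
uint32_t area_high_1(uint32_t page_no){ return area_low(page_no) + AREA - 1; }

// Count of pages visited by the fixed, inclusive loop over [low, high_1].
uint32_t pages_in_area(uint32_t low, uint32_t high_1)
{
  uint32_t n= 0;
  for (uint32_t i= low; i <= high_1; i++)
    n++;
  return n;
}
```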
I reviewed the logic of buf_read_ahead_random(), and I do not notice any regression there. I see that the default is innodb_random_read_ahead=OFF; only linear read-ahead is enabled by default. It would be disabled if innodb_read_ahead_threshold=0. The default value is 56 and the maximum is 64.
The next step is to test with innodb_random_read_ahead=ON and innodb_read_ahead_threshold=0 to see if the random read-ahead will kick in.
Random read ahead seems to be fine. I added a wait for everything to be purged, so that we can compare the total execution time between 10.4 and 10.5.
--source include/have_innodb.inc
|
--source include/have_sequence.inc
|
create table t1 engine=innodb as select seq from seq_1_to_10000000; |
let $datadir=`select @@datadir`; |
--source include/shutdown_mysqld.inc
|
remove_file $datadir/ib_buffer_pool;
|
--source include/start_mysqld.inc
|
show global status like 'innodb_buffer_pool_read%'; |
SET GLOBAL innodb_max_purge_lag_wait=0; |
show global status like 'innodb_buffer_pool_read%'; |
DROP TABLE t1; |
./mtr --mysqld=--innodb-{buffer-pool,log-file}-size=1g --mysqld=--innodb-read-ahead-threshold=0 --mysqld=--innodb-random-read-ahead=ON --mysqld=--innodb-purge-threads=32 main.name_of_test |
10.4 84b9fc25a29b94a37eb9d5ac2e2c0f75c0efafda |
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 283
|
Innodb_buffer_pool_read_ahead 0
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 292045
|
Innodb_buffer_pool_reads 515
|
SET GLOBAL innodb_max_purge_lag_wait=0;
|
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 24883
|
Innodb_buffer_pool_read_ahead 0
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 31552786
|
Innodb_buffer_pool_reads 32008
|
…
|
Spent 102.762 of 110 seconds executing testcases
|
10.5 d821fd7fab2723b84f774aaa4882acb86b35769d |
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 383
|
Innodb_buffer_pool_read_ahead 0
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 408332
|
Innodb_buffer_pool_reads 640
|
SET GLOBAL innodb_max_purge_lag_wait=0;
|
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 24883
|
Innodb_buffer_pool_read_ahead 0
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 40023169
|
Innodb_buffer_pool_reads 32001
|
…
|
Spent 64.291 of 67 seconds executing testcases
|
We can see that 10.5 was a little more eager to issue random read-ahead requests during the test. Despite there being 25% more read requests from the buffer pool (we should probably find the reason why), 10.5 completed the test much faster. During the wait for purge, 10.4 was using some 660% of CPU despite there being 32 threads available. 10.5 was using only 100% of CPU during that phase, possibly due to MDEV-16678 and related changes. If there had been DML into multiple tables or multiple partitions of a table, it should have been able to use more threads. Starting with 10.6, enabling the MDEV-515 bulk load would avoid the need to purge individual rows altogether.
origin/bb-10.5-MDEV-29967 9e1ff0435dfa31c16d3138e802971dda8b25e3b1 2023-05-10T12:43:17+03:00
behaved well in RQG testing.
In 10.6, the test would not trigger any read-ahead, because the CREATE…SELECT statement will be executed with table-level and not row-level undo logging (MDEV-515). It would complete for me in about 31 seconds. A test with row-level undo logging will start like this:
create table t1(a int primary key) engine=innodb; |
insert into t1 select seq from seq_1_to_10000000; |
10.6 c271057288f71746d1816824f338f2d9c47f67c1 |
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 0
|
Innodb_buffer_pool_read_ahead 0
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 5903
|
Innodb_buffer_pool_reads 413
|
SET GLOBAL innodb_max_purge_lag_wait=0;
|
show global status like 'innodb_buffer_pool_read%';
|
Variable_name Value
|
Innodb_buffer_pool_read_ahead_rnd 0
|
Innodb_buffer_pool_read_ahead 14720
|
Innodb_buffer_pool_read_ahead_evicted 0
|
Innodb_buffer_pool_read_requests 670313
|
Innodb_buffer_pool_reads 24061
|
…
|
Spent 48.995 of 52 seconds executing testcases
|
In MDEV-30986 I just found out that setting innodb_random_read_ahead=ON may improve read performance for some workloads. It is disabled by default.
What are your settings of innodb_buffer_pool_size and innodb_flush_method on 10.5.13 and 10.9.3? Note that the default for the latter was changed in MDEV-24854 (10.6).