Take snapshot of registered read-write transaction identifiers directly
from rw_trx_hash. It immediately saves one trx_sys.mutex lock, reduces
size of another critical section protected by this mutex, and makes
further optimisations like removing trx_sys_t::serialisation_list
possible.
Downside of this approach is bigger overhead for view opening, because
iterating LF_HASH is more expensive compared to taking snapshot of an
array. However for low concurrency overhead difference is negligible,
while for high concurrency mutex is much bigger evil.
Currently we still take trx_sys.mutex to serialise ReadView creation.
This is required to keep serialisation_list ordered by trx->no as well
as not to let purge thread to create more recent snapshot while another
thread gets suspended during creation of older snapshot. This will
become completely mutex free along with serialisation_list removal.
Compared to previous implementation removing element from rw_trx_hash
and serialisation_list is not atomic. We disregard all possible bad
consequences (if there're any) since it will be solved along with
serialisation_list removal.
serialisation_list was supposed to instantly give minimum registered
transaction serialisation number. However maintaining and accessing
this list requires global mutex protection.
Since we already take MVCC snapshot by iterating trx_sys_t::rw_trx_hash,
it is cheap to integrate minimum registered transaction lookup into this
iteration.
This looks OK, except for a possible race or starvation where we fail to update the max_trx_id in the TRX_SYS page before writing newer trx_id values to undo logs.
I think that a proper fix to that is to do the equivalent of SELECT MAX(trx_id) FROM undo_logs on startup, and only when there are no undo logs (not even to purge), read the value from the TRX_SYS page. (And only write to the TRX_SYS page when trx_undo_truncate_start() discards the last undo log.)
Marko Mäkelä
added a comment - This looks OK, except for a possible race or starvation where we fail to update the max_trx_id in the TRX_SYS page before writing newer trx_id values to undo logs.
I think that a proper fix to that is to do the equivalent of SELECT MAX(trx_id) FROM undo_logs on startup, and only when there are no undo logs (not even to purge), read the value from the TRX_SYS page. (And only write to the TRX_SYS page when trx_undo_truncate_start() discards the last undo log.)
Thank you! Very good work.
I posted some minor comments to the MVCC snapshot optimization, mainly suggesting additional comments around state transitions.
Marko Mäkelä
added a comment - Thank you! Very good work.
I posted some minor comments to the MVCC snapshot optimization , mainly suggesting additional comments around state transitions.
I got an assertion while running benchmark.(50% update-non-index, 50% select-on-pk), not sure if it relates to this change.
ersion: '10.3.5-MariaDB-rds-dev' socket: '/u01/maria/run/mysql.sock' port: 3306 Source distribution
2018-02-08 13:55:24 269 [ERROR][FATAL] InnoDB: Unknown error code 20: Required history data has been deleted
180208 13:55:24 [ERROR] mysqld got signal 6 ;
the latest commit is 029ab11cc883d486117f30900c787c4b2765b742
zhai weixiang
added a comment - I got an assertion while running benchmark.(50% update-non-index, 50% select-on-pk), not sure if it relates to this change.
ersion: '10.3.5-MariaDB-rds-dev' socket: '/u01/maria/run/mysql.sock' port: 3306 Source distribution
2018-02-08 13:55:24 269 [ERROR] [FATAL] InnoDB: Unknown error code 20: Required history data has been deleted
180208 13:55:24 [ERROR] mysqld got signal 6 ;
the latest commit is 029ab11cc883d486117f30900c787c4b2765b742
zhaiwx1987, sorry, we forgot to update this ticket to inform of this problem that axel found out when running benchmarks (actually to evaluate if MDEV-15058 could cause trouble). He got exactly the same type of errors, but only when running the benchmark for long enough. He isolated the problem to the very last scalability commit by svoj:
With trx_sys_t::rw_trx_ids removal, MVCC snapshot overhead became
slightly higher. That is instead of copying an array we now have to
iterate LF_HASH. All this done under trx_sys.mutex protection.
This patch moves MVCC snapshot out of trx_sys.mutex.
Please revert this commit, or test with its parent. axel ran benchmarks with is parent over last weekend, and did not report of any crashes. svoj has been working on a fix. I just filed MDEV-15246 to track this.
Marko Mäkelä
added a comment - zhaiwx1987 , sorry, we forgot to update this ticket to inform of this problem that axel found out when running benchmarks (actually to evaluate if MDEV-15058 could cause trouble). He got exactly the same type of errors, but only when running the benchmark for long enough. He isolated the problem to the very last scalability commit by svoj :
commit bc7a1dc1fbd27e6064d3b40443fe242397668af7
Author: Sergey Vojtovich
Date: Tue Jan 30 20:59:42 2018 +0400
MDEV-15104 - Optimise MVCC snapshot
With trx_sys_t::rw_trx_ids removal, MVCC snapshot overhead became
slightly higher. That is instead of copying an array we now have to
iterate LF_HASH . All this done under trx_sys.mutex protection.
This patch moves MVCC snapshot out of trx_sys.mutex .
Please revert this commit, or test with its parent. axel ran benchmarks with is parent over last weekend, and did not report of any crashes.
svoj has been working on a fix. I just filed MDEV-15246 to track this.
Hi, Marko, thank you for your reply,
I just also noticed that the code format of class ReadView is mixed with innodb style and server style, maybe that needs an adjustment ?
zhai weixiang
added a comment - - edited Hi, Marko, thank you for your reply,
I just also noticed that the code format of class ReadView is mixed with innodb style and server style, maybe that needs an adjustment ?
marko, please review patches for this task: https://github.com/MariaDB/server/commits/bb-10.3-arm