[MDEV-15104] Remove trx_sys_t::rw_trx_ids and trx_sys_t::serialisation_list Created: 2018-01-28  Updated: 2023-12-19  Resolved: 2018-01-31

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Fix Version/s: 10.3.5

Type: Task Priority: Major
Reporter: Sergey Vojtovich Assignee: Sergey Vojtovich
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
is blocked by MDEV-15132 Avoid accessing the TRX_SYS page Closed
Problem/Incident
causes MDEV-15155 [Draft] Assertion failed: m_state == ... Closed
causes MDEV-15246 InnoDB: Unknown error code 20: Requir... Closed
causes MDEV-22680 InnoDB trx_sys improvements Open
Relates
relates to MDEV-29612 ReadViewBase::snapshot() misses an op... Closed
relates to MDEV-33067 SCN(Sequence Commit Number) based MVCC Open
Epic Link: arm64 optimization
Sprint: 10.2.13

 Description   

Take snapshot of registered read-write transaction identifiers directly
from rw_trx_hash. It immediately saves one trx_sys.mutex lock, reduces
size of another critical section protected by this mutex, and makes
further optimisations like removing trx_sys_t::serialisation_list
possible.

Downside of this approach is bigger overhead for view opening, because
iterating LF_HASH is more expensive compared to taking snapshot of an
array. However for low concurrency overhead difference is negligible,
while for high concurrency mutex is much bigger evil.

Currently we still take trx_sys.mutex to serialise ReadView creation.
This is required to keep serialisation_list ordered by trx->no as well
as not to let purge thread to create more recent snapshot while another
thread gets suspended during creation of older snapshot. This will
become completely mutex free along with serialisation_list removal.

Compared to previous implementation removing element from rw_trx_hash
and serialisation_list is not atomic. We disregard all possible bad
consequences (if there're any) since it will be solved along with
serialisation_list removal.

serialisation_list was supposed to instantly give minimum registered
transaction serialisation number. However maintaining and accessing
this list requires global mutex protection.

Since we already take MVCC snapshot by iterating trx_sys_t::rw_trx_hash,
it is cheap to integrate minimum registered transaction lookup into this
iteration.



 Comments   
Comment by Sergey Vojtovich [ 2018-01-28 ]

marko, please review patches for this task: https://github.com/MariaDB/server/commits/bb-10.3-arm

Comment by Marko Mäkelä [ 2018-01-29 ]

This looks OK, except for a possible race or starvation where we fail to update the max_trx_id in the TRX_SYS page before writing newer trx_id values to undo logs.

I think that a proper fix to that is to do the equivalent of SELECT MAX(trx_id) FROM undo_logs on startup, and only when there are no undo logs (not even to purge), read the value from the TRX_SYS page. (And only write to the TRX_SYS page when trx_undo_truncate_start() discards the last undo log.)

Comment by Marko Mäkelä [ 2018-01-31 ]

Thank you! Very good work.
I posted some minor comments to the MVCC snapshot optimization, mainly suggesting additional comments around state transitions.

Comment by Marko Mäkelä [ 2018-01-31 ]

For the record, MDEV-15132 removes the race condition by removing the updates of the TRX_SYS page.

Comment by zhai weixiang [ 2018-02-08 ]

I got an assertion while running benchmark.(50% update-non-index, 50% select-on-pk), not sure if it relates to this change.

ersion: '10.3.5-MariaDB-rds-dev' socket: '/u01/maria/run/mysql.sock' port: 3306 Source distribution
2018-02-08 13:55:24 269 [ERROR] [FATAL] InnoDB: Unknown error code 20: Required history data has been deleted
180208 13:55:24 [ERROR] mysqld got signal 6 ;

the latest commit is 029ab11cc883d486117f30900c787c4b2765b742

Comment by Marko Mäkelä [ 2018-02-08 ]

zhaiwx1987, sorry, we forgot to update this ticket to inform of this problem that axel found out when running benchmarks (actually to evaluate if MDEV-15058 could cause trouble). He got exactly the same type of errors, but only when running the benchmark for long enough. He isolated the problem to the very last scalability commit by svoj:

commit bc7a1dc1fbd27e6064d3b40443fe242397668af7
Author: Sergey Vojtovich
Date: Tue Jan 30 20:59:42 2018 +0400

MDEV-15104 - Optimise MVCC snapshot

With trx_sys_t::rw_trx_ids removal, MVCC snapshot overhead became
slightly higher. That is instead of copying an array we now have to
iterate LF_HASH. All this done under trx_sys.mutex protection.

This patch moves MVCC snapshot out of trx_sys.mutex.

Please revert this commit, or test with its parent. axel ran benchmarks with is parent over last weekend, and did not report of any crashes.
svoj has been working on a fix. I just filed MDEV-15246 to track this.

Comment by zhai weixiang [ 2018-02-08 ]

Hi, Marko, thank you for your reply,
I just also noticed that the code format of class ReadView is mixed with innodb style and server style, maybe that needs an adjustment ?

Comment by Sergey Vojtovich [ 2018-02-08 ]

zhaiwx1987, it was agreed to use server coding style in new InnoDB code.

Generated at Thu Feb 08 08:18:43 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.