Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15104

Remove trx_sys_t::rw_trx_ids and trx_sys_t::serialisation_list

Details

    Description

      Take snapshot of registered read-write transaction identifiers directly
      from rw_trx_hash. It immediately saves one trx_sys.mutex lock, reduces
      size of another critical section protected by this mutex, and makes
      further optimisations like removing trx_sys_t::serialisation_list
      possible.

      Downside of this approach is bigger overhead for view opening, because
      iterating LF_HASH is more expensive compared to taking snapshot of an
      array. However for low concurrency overhead difference is negligible,
      while for high concurrency mutex is much bigger evil.

      Currently we still take trx_sys.mutex to serialise ReadView creation.
      This is required to keep serialisation_list ordered by trx->no as well
      as not to let purge thread to create more recent snapshot while another
      thread gets suspended during creation of older snapshot. This will
      become completely mutex free along with serialisation_list removal.

      Compared to previous implementation removing element from rw_trx_hash
      and serialisation_list is not atomic. We disregard all possible bad
      consequences (if there're any) since it will be solved along with
      serialisation_list removal.

      serialisation_list was supposed to instantly give minimum registered
      transaction serialisation number. However maintaining and accessing
      this list requires global mutex protection.

      Since we already take MVCC snapshot by iterating trx_sys_t::rw_trx_hash,
      it is cheap to integrate minimum registered transaction lookup into this
      iteration.

      Attachments

        Issue Links

          Activity

            svoj Sergey Vojtovich added a comment - marko , please review patches for this task: https://github.com/MariaDB/server/commits/bb-10.3-arm

            This looks OK, except for a possible race or starvation where we fail to update the max_trx_id in the TRX_SYS page before writing newer trx_id values to undo logs.

            I think that a proper fix to that is to do the equivalent of SELECT MAX(trx_id) FROM undo_logs on startup, and only when there are no undo logs (not even to purge), read the value from the TRX_SYS page. (And only write to the TRX_SYS page when trx_undo_truncate_start() discards the last undo log.)

            marko Marko Mäkelä added a comment - This looks OK, except for a possible race or starvation where we fail to update the max_trx_id in the TRX_SYS page before writing newer trx_id values to undo logs. I think that a proper fix to that is to do the equivalent of SELECT MAX(trx_id) FROM undo_logs on startup, and only when there are no undo logs (not even to purge), read the value from the TRX_SYS page. (And only write to the TRX_SYS page when trx_undo_truncate_start() discards the last undo log.)

            Thank you! Very good work.
            I posted some minor comments to the MVCC snapshot optimization, mainly suggesting additional comments around state transitions.

            marko Marko Mäkelä added a comment - Thank you! Very good work. I posted some minor comments to the MVCC snapshot optimization , mainly suggesting additional comments around state transitions.

            For the record, MDEV-15132 removes the race condition by removing the updates of the TRX_SYS page.

            marko Marko Mäkelä added a comment - For the record, MDEV-15132 removes the race condition by removing the updates of the TRX_SYS page.
            zhaiwx1987 zhai weixiang added a comment -

            I got an assertion while running benchmark.(50% update-non-index, 50% select-on-pk), not sure if it relates to this change.

            ersion: '10.3.5-MariaDB-rds-dev' socket: '/u01/maria/run/mysql.sock' port: 3306 Source distribution
            2018-02-08 13:55:24 269 [ERROR] [FATAL] InnoDB: Unknown error code 20: Required history data has been deleted
            180208 13:55:24 [ERROR] mysqld got signal 6 ;

            the latest commit is 029ab11cc883d486117f30900c787c4b2765b742

            zhaiwx1987 zhai weixiang added a comment - I got an assertion while running benchmark.(50% update-non-index, 50% select-on-pk), not sure if it relates to this change. ersion: '10.3.5-MariaDB-rds-dev' socket: '/u01/maria/run/mysql.sock' port: 3306 Source distribution 2018-02-08 13:55:24 269 [ERROR] [FATAL] InnoDB: Unknown error code 20: Required history data has been deleted 180208 13:55:24 [ERROR] mysqld got signal 6 ; the latest commit is 029ab11cc883d486117f30900c787c4b2765b742

            zhaiwx1987, sorry, we forgot to update this ticket to inform of this problem that axel found out when running benchmarks (actually to evaluate if MDEV-15058 could cause trouble). He got exactly the same type of errors, but only when running the benchmark for long enough. He isolated the problem to the very last scalability commit by svoj:

            commit bc7a1dc1fbd27e6064d3b40443fe242397668af7
            Author: Sergey Vojtovich
            Date: Tue Jan 30 20:59:42 2018 +0400

            MDEV-15104 - Optimise MVCC snapshot

            With trx_sys_t::rw_trx_ids removal, MVCC snapshot overhead became
            slightly higher. That is instead of copying an array we now have to
            iterate LF_HASH. All this done under trx_sys.mutex protection.

            This patch moves MVCC snapshot out of trx_sys.mutex.

            Please revert this commit, or test with its parent. axel ran benchmarks with is parent over last weekend, and did not report of any crashes.
            svoj has been working on a fix. I just filed MDEV-15246 to track this.

            marko Marko Mäkelä added a comment - zhaiwx1987 , sorry, we forgot to update this ticket to inform of this problem that axel found out when running benchmarks (actually to evaluate if MDEV-15058 could cause trouble). He got exactly the same type of errors, but only when running the benchmark for long enough. He isolated the problem to the very last scalability commit by svoj : commit bc7a1dc1fbd27e6064d3b40443fe242397668af7 Author: Sergey Vojtovich Date: Tue Jan 30 20:59:42 2018 +0400 MDEV-15104 - Optimise MVCC snapshot With trx_sys_t::rw_trx_ids removal, MVCC snapshot overhead became slightly higher. That is instead of copying an array we now have to iterate LF_HASH . All this done under trx_sys.mutex protection. This patch moves MVCC snapshot out of trx_sys.mutex . Please revert this commit, or test with its parent. axel ran benchmarks with is parent over last weekend, and did not report of any crashes. svoj has been working on a fix. I just filed MDEV-15246 to track this.
            zhaiwx1987 zhai weixiang added a comment - - edited

            Hi, Marko, thank you for your reply,
            I just also noticed that the code format of class ReadView is mixed with innodb style and server style, maybe that needs an adjustment ?

            zhaiwx1987 zhai weixiang added a comment - - edited Hi, Marko, thank you for your reply, I just also noticed that the code format of class ReadView is mixed with innodb style and server style, maybe that needs an adjustment ?

            zhaiwx1987, it was agreed to use server coding style in new InnoDB code.

            svoj Sergey Vojtovich added a comment - zhaiwx1987 , it was agreed to use server coding style in new InnoDB code.

            People

              svoj Sergey Vojtovich
              svoj Sergey Vojtovich
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.