MariaDB Server
MDEV-12674

Innodb_row_lock_current_waits has overflow

Details

    Description

      Innodb_row_lock_current_waits appears to have an overflow.

      MariaDB [(none)]> SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_current_waits';
      +-------------------------------+----------------------+
      | Variable_name                 | Value                |
      +-------------------------------+----------------------+
      | Innodb_row_lock_current_waits | 18446744073709551613 |
      +-------------------------------+----------------------+
      1 row in set (0.00 sec)
       
      MariaDB [(none)]> SHOW SESSION STATUS LIKE 'Innodb_row_lock_current_waits';
      +-------------------------------+----------------------+
      | Variable_name                 | Value                |
      +-------------------------------+----------------------+
      | Innodb_row_lock_current_waits | 18446744073709551613 |
      +-------------------------------+----------------------+
      1 row in set (0.00 sec)
      

      I don't know how to reproduce it, but I have access to the system showing this (and it is currently still in this state).


          Activity

            The underlying counter is srv_stats.n_lock_wait_current_count, which is only modified by the function lock_wait_suspend_thread(). The increment is protected by lock_sys->wait_mutex, but the decrement does not appear to be protected by anything. This mismatch could corrupt the counter when one lock wait ends at roughly the same time as a wait on (possibly another) lock begins.
            I think that both operations should be performed with atomic memory operations, and that the increment could then be done without holding any mutex.

            marko Marko Mäkelä added a comment

            The inaccuracy seems to be by design:

            /** Class for using fuzzy counters. The counter is not protected by any
            mutex and the results are not guaranteed to be 100% accurate but close
            enough. Creates an array of counters and separates each element by the
            CACHE_LINE_SIZE bytes */
            template <
            	typename Type,
            	int N = IB_N_SLOTS,
            	template<typename, int> class Indexer = default_indexer_t>
            class ib_counter_t
            

            That said, much of the time this template is instantiated with N=1, which appears to boil down to a single counter that is padded to two cache line sizes:

            typedef ib_counter_t<ulint, 64> ulint_ctr_64_t;
            typedef ib_counter_t<lsn_t, 1, single_indexer_t> lsn_ctr_1_t;
            typedef ib_counter_t<ulint, 1, single_indexer_t> ulint_ctr_1_t;
            typedef ib_counter_t<lint, 1, single_indexer_t> lint_ctr_1_t;
            typedef ib_counter_t<int64_t, 1, single_indexer_t> int64_ctr_1_t;
            typedef ib_counter_t<int64_t, IB_N_SLOTS> int64_counter_t;
            

            It seems reasonable to use atomics for the various _ctr_1_t counters when the performance impact is low. When atomic memory operations are not available, I think it would be more acceptable to have some overflows than to risk significantly reducing performance with some mutex operations.

            The only multi-slot counter types are ulint_ctr_64_t (used by a number of srv_stats counters) and int64_counter_t (used by rw_lock_stats). For the rw_lock_stats we should not care about accuracy. For srv_stats, the following counters are inaccurate by design. Based on the names of these counters, they are only ever incremented, never decremented, and it would seem to be acceptable to lose a few increments:

            ulint_ctr_64_t          page_compression_saved;
            ulint_ctr_64_t          index_pages_written;
            ulint_ctr_64_t          non_index_pages_written;
            ulint_ctr_64_t          pages_page_compressed;
            ulint_ctr_64_t          page_compressed_trim_op;
            ulint_ctr_64_t          pages_page_decompressed;
            ulint_ctr_64_t          pages_page_compression_error;
            ulint_ctr_64_t          pages_encrypted;
            ulint_ctr_64_t          pages_decrypted;
            ulint_ctr_64_t          n_rows_read;
            ulint_ctr_64_t          n_rows_updated;
            ulint_ctr_64_t          n_rows_deleted;
            ulint_ctr_64_t          n_rows_inserted;
            ulint_ctr_64_t          n_system_rows_read;
            ulint_ctr_64_t          n_system_rows_updated;
            ulint_ctr_64_t          n_system_rows_deleted;
            ulint_ctr_64_t          n_system_rows_inserted;
            ulint_ctr_64_t          n_sec_rec_cluster_reads;
            ulint_ctr_64_t          n_sec_rec_cluster_reads_avoided;
            ulint_ctr_64_t          page0_read;
            ulint_ctr_64_t          n_key_requests;
            ulint_ctr_64_t          n_log_scrubs;
            ulint_ctr_64_t          key_rotation_list_length;
            

            marko Marko Mäkelä added a comment

            Only the following counters are both incremented and decremented, and thus we probably should use atomic memory access for updating them:

            	/** Number of threads currently waiting on database locks */
            	lint_ctr_1_t		n_lock_wait_current_count;
            	/** Number of writes being done to the log files */
            	lint_ctr_1_t		os_log_pending_writes;
            

            marko Marko Mäkelä added a comment

            srv_stats.os_log_pending_writes should be fine, because it is protected by log_sys->mutex (10.0, 10.1) or log_sys->write_mutex (10.2).

            Similarly, fil_n_pending_log_flushes and fil_n_pending_tablespace_flushes are protected by fil_system->mutex.
            Finally, MONITOR_OS_PENDING_READS, MONITOR_OS_PENDING_WRITES and their predecessors before MDEV-12534 are safe.

            So, the only overflow-prone ‘pending’ counter is Innodb_row_lock_current_waits.

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä added a comment (edited)

            The fixes are in the following branches:
            bb-10.0-marko
            bb-10.1-marko
            bb-10.2-marko
            There is no test case. I was not even able to repeat the overflow.
            This bug could require a very long run with an application with very high record lock contention to reproduce.
            These fixes should not only fix the overflow; they could also slightly improve the performance of the counters.


            ok to push.

            jplindst Jan Lindström (Inactive) added a comment

            People

              marko Marko Mäkelä
              ccalender Chris Calender (Inactive)