Details

    Description

      Originally described here: https://bugs.mysql.com/bug.php?id=84958

      If there're multiple row versions in InnoDB, reading one row from PK may have O(N) complexity and reading from secondary keys may have O(N^2) complexity.

      How to repeat:
      setup: create table t1 (a int, b int, c int, primary key(a,b), key (b,c)) engine=InnoDB;

      T1: BEGIN; SELECT * FROM t1;

      T2: 10000 * INSERT INTO t1 VALUES (1,2,3) ON DUPLICATE KEY UPDATE c=c+1

      T1: SELECT * FROM t1 FORCE INDEX (PRIMARY)
      (this will be immediate and bump Innodb_buffer_pool_read_requests by ~10k)

      T1: SELECT * FROM t1 FORCE INDEX (b)
      (this will be not immediate and bump Innodb_buffer_pool_read_requests by ~100000k or ~100m or....)

      Operator: *weep* *sob* *weep*

      Analysis

      Attachments

        Issue Links

          Activity

            I see that the test innodb.innodb_bug14704286 is for the initial ‘fix’ of this bug in MySQL. Yes, it would still be O(n²), but at least you would be able to interrupt the slow query.

            marko Marko Mäkelä added a comment - I see that the test innodb.innodb_bug14704286 is for the initial ‘fix’ of this bug in MySQL. Yes, it would still be O(n²), but at least you would be able to interrupt the slow query.

            This is how MVCC for secondary index works (from MySQL docs):

            When a secondary index column is updated, old secondary index records are delete-marked, new records are inserted, and delete-marked records are eventually purged. When a secondary index record is delete-marked or the secondary index page is updated by a newer transaction, InnoDB looks up the database record in the clustered index. In the clustered index, the record's DB_TRX_ID is checked, and the correct version of the record is retrieved from the undo log if the record was modified after the reading transaction was initiated.

            If there are multiple versions of clustered index record it traversed all of them each time it needed to retrieve secondary index record. The task case retrieved multiple versions of secondary index records in process of searching the current one and searched corresponding current clustered index record by scanning undo log. For each version from secondary index it did full undo log scan. So the algorithm complexity was O(N ^ 2): traverse undo log versions for each secondary index version.

            The fix just optimized such process. So it remembers now once found clustered index record and when secondary index versions are traversed it retrieved this remembered clustered index record. And the complexity is now O(2 * N): traverse undo log (clustered index versions) and then traverse secondary index versions.

            So, the short version of the above: optimize scanning the undo log versions while scanning the secondary index versions.

            midenok Aleksey Midenkov added a comment - This is how MVCC for secondary index works (from MySQL docs): When a secondary index column is updated, old secondary index records are delete-marked, new records are inserted, and delete-marked records are eventually purged. When a secondary index record is delete-marked or the secondary index page is updated by a newer transaction, InnoDB looks up the database record in the clustered index. In the clustered index, the record's DB_TRX_ID is checked, and the correct version of the record is retrieved from the undo log if the record was modified after the reading transaction was initiated. If there are multiple versions of clustered index record it traversed all of them each time it needed to retrieve secondary index record. The task case retrieved multiple versions of secondary index records in process of searching the current one and searched corresponding current clustered index record by scanning undo log. For each version from secondary index it did full undo log scan. So the algorithm complexity was O(N ^ 2): traverse undo log versions for each secondary index version. The fix just optimized such process. So it remembers now once found clustered index record and when secondary index versions are traversed it retrieved this remembered clustered index record. And the complexity is now O(2 * N): traverse undo log (clustered index versions) and then traverse secondary index versions. So, the short version of the above: optimize scanning the undo log versions while scanning the secondary index versions.

            Quoting my comment from 2015-01-01 to a blog post:

            Peter, I forgot to mention that the MVCC and checks for implicit record locks on secondary indexes get slow when there are lots of secondary index leaf pages waiting for records to be purged. We actually got a bug report for this:
            Bug#14704286 SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE LOOKUPS
            We only addressed a usability aspect of the bug: the time-consuming code was not interruptible by KILL QUERY.

            I think that the logic for purge sleeps (innodb_max_purge_lag) is a hack, and the real solution is making the purge subsystem faster. This happened in MySQL 5.5 (separate purge thread) and 5.6 (multiple purge threads).

            In its original form (implemented by me according to directions from Heikki) the innodb_max_purge_lag does not take the purge view into account. That is, if nothing is purgeable yet, it would not make sense to delay the DMLs. I do not remember if we fixed this yet.

            Your reasoning that undo pages are being evicted from the buffer pool, only to be brought back soon later by the purge subsystem sounds plausible. I wonder if you could add some instrumentation to collect some evidence that this is actually the case.

            There have been suggestions of using dedicated buffer pools for certain stuff (say, avoid evicting undo pages, or avoid evicting other pages when doing a bulk operation on certain table).

            marko Marko Mäkelä added a comment - Quoting my comment from 2015-01-01 to a blog post : Peter, I forgot to mention that the MVCC and checks for implicit record locks on secondary indexes get slow when there are lots of secondary index leaf pages waiting for records to be purged. We actually got a bug report for this: Bug#14704286 SECONDARY INDEX UPDATES MAKE CONSISTENT READS DO O(N^2) UNDO PAGE LOOKUPS We only addressed a usability aspect of the bug: the time-consuming code was not interruptible by KILL QUERY. I think that the logic for purge sleeps (innodb_max_purge_lag) is a hack, and the real solution is making the purge subsystem faster. This happened in MySQL 5.5 (separate purge thread) and 5.6 (multiple purge threads). In its original form (implemented by me according to directions from Heikki) the innodb_max_purge_lag does not take the purge view into account. That is, if nothing is purgeable yet, it would not make sense to delay the DMLs. I do not remember if we fixed this yet. Your reasoning that undo pages are being evicted from the buffer pool, only to be brought back soon later by the purge subsystem sounds plausible. I wonder if you could add some instrumentation to collect some evidence that this is actually the case. There have been suggestions of using dedicated buffer pools for certain stuff (say, avoid evicting undo pages, or avoid evicting other pages when doing a bulk operation on certain table).

            I believe that MDEV-17598 would fix the O(n²) page accesses in secondary index operations. As far as I remember, this fix in MariaDB only changes the asymptotic complexity by caching some lookup results.

            marko Marko Mäkelä added a comment - I believe that MDEV-17598 would fix the O(n²) page accesses in secondary index operations. As far as I remember, this fix in MariaDB only changes the asymptotic complexity by caching some lookup results.

            People

              midenok Aleksey Midenkov
              midenok Aleksey Midenkov
              Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.