Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25758

InnoDB spatial indexes miss large geometry fields after MDEV-25459

Details

    Description

      MDEV-25459 corrected MVCC missing reads from CHAR or VARCHAR indexes, but omits handling InnoDB spatial r-tree indexes in storage/innobase/row/row0sel.cc row_sel_sec_rec_is_for_clust_rec. It seems to work for geometry fields which fit in the key prefix length.

      Checking for dict_index_is_spatial(sec_index) in check_for_blob: goto works, although repetitive code. Not familiar enough with the internals to tell if it would be safe to skip the other checks entirely in case of spatial index.

      Additionally, row_sel_sec_rec_is_for_blob doesn't seem to work right. When len > REC_VERSION_56_MAX_INDEX_COL_LEN it becomes REC_VERSION_56_MAX_INDEX_COL_LEN + 1, causing assert failure in debug build.

      Attachments

        Issue Links

          Activity

            Thank you for the report.
            Do you have a test case where len would exceed 3072 bytes? That should be the maximum column prefix length in a secondary index, ever since a larger limit was introduced to replace the original one of 768 bytes.

            marko Marko Mäkelä added a comment - Thank you for the report. Do you have a test case where len would exceed 3072 bytes? That should be the maximum column prefix length in a secondary index, ever since a larger limit was introduced to replace the original one of 768 bytes.
            jsantala Jukka Santala added a comment -

            Maximum geometry length varies a bit depending on the primary key, not sure how to easily check actual key length. Our production case is on commercial map data, so I can't share that, but linestring/polygon with 502 points seems to do it. Test case is using the mysql-test/suite/innodb_gis/t/create_spatial_index.test table structure; innodb_read_only_compressed on 10.6, however this is independent of table properies as long as engine is InnoDB and the secondary key is SPATIAL index, of course. Expected result is getting the inserted row for each SELECT, but they return none if spatial secondary index is used, and on DEBUG build it crashes with the assertion. SPATIAL indexes can't define a prefix, so before the MDEV-25459 patch the code would just skip to the spatial-key special case, although it does seem to work for shorter keys now.

            jsantala Jukka Santala added a comment - Maximum geometry length varies a bit depending on the primary key, not sure how to easily check actual key length. Our production case is on commercial map data, so I can't share that, but linestring/polygon with 502 points seems to do it. Test case is using the mysql-test/suite/innodb_gis/t/create_spatial_index.test table structure; innodb_read_only_compressed on 10.6, however this is independent of table properies as long as engine is InnoDB and the secondary key is SPATIAL index, of course. Expected result is getting the inserted row for each SELECT, but they return none if spatial secondary index is used, and on DEBUG build it crashes with the assertion. SPATIAL indexes can't define a prefix, so before the MDEV-25459 patch the code would just skip to the spatial-key special case, although it does seem to work for shorter keys now.

            Thank you, jsantala. I think that the following is a minimal test case. I was not able to reduce the size further by setting innodb_compression_level=0:

            --source include/have_innodb.inc
            --source include/have_sequence.inc
             
            CREATE TABLE t1(l LINESTRING NOT NULL, SPATIAL INDEX(l))
            ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;
             
            SELECT GROUP_CONCAT(CONCAT(seq, ' ', seq) SEPARATOR ',') INTO @g
            FROM seq_0_to_190;
             
            INSERT INTO t1 SET l=ST_GeomFromText(CONCAT('LINESTRING(',@g,',0 0)'));
             
            SELECT COUNT(*) FROM t1
            WHERE MBRIntersects(GeomFromText('Polygon((0 0,0 10,10 10,10 0,0 0))'), l);
            DROP TABLE t1;
            

            marko Marko Mäkelä added a comment - Thank you, jsantala . I think that the following is a minimal test case. I was not able to reduce the size further by setting innodb_compression_level=0 : --source include/have_innodb.inc --source include/have_sequence.inc   CREATE TABLE t1(l LINESTRING NOT NULL , SPATIAL INDEX (l)) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;   SELECT GROUP_CONCAT(CONCAT(seq, ' ' , seq) SEPARATOR ',' ) INTO @g FROM seq_0_to_190;   INSERT INTO t1 SET l=ST_GeomFromText(CONCAT( 'LINESTRING(' ,@g, ',0 0)' ));   SELECT COUNT (*) FROM t1 WHERE MBRIntersects(GeomFromText( 'Polygon((0 0,0 10,10 10,10 0,0 0))' ), l); DROP TABLE t1;

            The InnoDB SPATIAL INDEX implementation that was copied from MySQL 5.7.9 to MariaDB Server 10.2.2 is somewhat problematic, because the implementation employs a mix of duplicating B-tree specific code and ‘pushing down’ R-tree specific conditions to B-tree specific code. I think that it would be clearest to let row_sel_sec_rec_is_for_clust_rec() only cover B-tree indexes and to create a separate variant for SPATIAL INDEX.

            Note: There are problems with the InnoDB implementation of SPATIAL INDEX, especially in the area of MVCC. I do not completely understand the design decisions that were made in MySQL 5.7: making copies of buffer pool pages (MDEV-14059) and some tweaks to the purge of history, which does not seem to work for ROLLBACK (MDEV-15284). When I worked on MDEV-20612, I got suspicion of inconsistency in the predicate locking for R-tree indexes. Due to these reasons, I would not recommend using InnoDB SPATIAL INDEX on data that is changing frequently. For read-mostly data (or when updates and reads are not nearly concurrent) it could be safe.

            I realize that the ROW_FORMAT=COMPRESSED format (which I would seek to eventually phase out; see MDEV-23497) provides efficient storage format for geometry data, because it is essentially doing the same as MDEV-11371, storing the BLOB as a singly linked list of zlib compressed stream. It is a pity that the MDEV-11371 column compression cannot be combined with any indexes, in ‘normal’ tables.

            marko Marko Mäkelä added a comment - The InnoDB SPATIAL INDEX implementation that was copied from MySQL 5.7.9 to MariaDB Server 10.2.2 is somewhat problematic, because the implementation employs a mix of duplicating B-tree specific code and ‘pushing down’ R-tree specific conditions to B-tree specific code. I think that it would be clearest to let row_sel_sec_rec_is_for_clust_rec() only cover B-tree indexes and to create a separate variant for SPATIAL INDEX . Note: There are problems with the InnoDB implementation of SPATIAL INDEX , especially in the area of MVCC. I do not completely understand the design decisions that were made in MySQL 5.7: making copies of buffer pool pages ( MDEV-14059 ) and some tweaks to the purge of history, which does not seem to work for ROLLBACK ( MDEV-15284 ). When I worked on MDEV-20612 , I got suspicion of inconsistency in the predicate locking for R-tree indexes. Due to these reasons, I would not recommend using InnoDB SPATIAL INDEX on data that is changing frequently. For read-mostly data (or when updates and reads are not nearly concurrent) it could be safe. I realize that the ROW_FORMAT=COMPRESSED format (which I would seek to eventually phase out; see MDEV-23497 ) provides efficient storage format for geometry data, because it is essentially doing the same as MDEV-11371 , storing the BLOB as a singly linked list of zlib compressed stream. It is a pity that the MDEV-11371 column compression cannot be combined with any indexes, in ‘normal’ tables.
            jsantala Jukka Santala added a comment -

            ROW_FORMAT=COMPRESSED is not related to the bug, it was included because I wrote the test-case to use same database as the existing spatial index tests so it might be incorporated into them. I believe it makes no difference for the bug, although frequently changing SPATIAL INDEX column in ROW_FORMAT=COMPRESSED table would clearly be a bad idea for performance reasons alone.

            I would actually prefer to use AriaDB for these kinds of static map-data lookups that are written very rarely, but to my surprise at provisioning time InnoDB spatial indexes were significantly faster for our read-only use cases and we're using AriaDB as a work-around after noticing the problem. I could easily imagine use-cases like hourly geospatial tracks needing indexes within InnoDB transactions, however so I wouldn't say it's irrelevant even for us, though might be topic for another issue.

            Just to be clear, when you say "would not recommend" does that imply failing transaction guarantees which might be tolerable, or is there potential risk of crash/corruption?

            jsantala Jukka Santala added a comment - ROW_FORMAT=COMPRESSED is not related to the bug, it was included because I wrote the test-case to use same database as the existing spatial index tests so it might be incorporated into them. I believe it makes no difference for the bug, although frequently changing SPATIAL INDEX column in ROW_FORMAT=COMPRESSED table would clearly be a bad idea for performance reasons alone. I would actually prefer to use AriaDB for these kinds of static map-data lookups that are written very rarely, but to my surprise at provisioning time InnoDB spatial indexes were significantly faster for our read-only use cases and we're using AriaDB as a work-around after noticing the problem. I could easily imagine use-cases like hourly geospatial tracks needing indexes within InnoDB transactions, however so I wouldn't say it's irrelevant even for us, though might be topic for another issue. Just to be clear, when you say "would not recommend" does that imply failing transaction guarantees which might be tolerable, or is there potential risk of crash/corruption?

            I agree that my test case only requires a large enough @g so that it cannot be stored directly in the clustered index page. When I implemented ROW_FORMAT=COMPRESSED in the InnoDB Plugin for MySQL 5.1 between 2005 and 2007, a design constraint was that a page with a single record must fit even without any compression.

            MDEV-15284 proves that race conditions between rollback and MVCC reads exist (leading to a wrong number of records being counted via a spatial index), I would not personally trust it for anything write-heavy. Correct locking is a prerequisite for many of the letters in ACID. I do not remember encountering MariaDB support customer cases that would involve SPATIAL INDEX in InnoDB tables. Therefore, I assume that they are not widely used in read-write workloads. The lack of customer feedback and the effort involved in fixing this is a reason why there has been no progress in this area.

            marko Marko Mäkelä added a comment - I agree that my test case only requires a large enough @g so that it cannot be stored directly in the clustered index page. When I implemented ROW_FORMAT=COMPRESSED in the InnoDB Plugin for MySQL 5.1 between 2005 and 2007, a design constraint was that a page with a single record must fit even without any compression. MDEV-15284 proves that race conditions between rollback and MVCC reads exist (leading to a wrong number of records being counted via a spatial index), I would not personally trust it for anything write-heavy. Correct locking is a prerequisite for many of the letters in ACID. I do not remember encountering MariaDB support customer cases that would involve SPATIAL INDEX in InnoDB tables. Therefore, I assume that they are not widely used in read-write workloads. The lack of customer feedback and the effort involved in fixing this is a reason why there has been no progress in this area.

            People

              thiru Thirunarayanan Balathandayuthapani
              jsantala Jukka Santala
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.