[MDEV-25758] InnoDB spatial indexes miss large geometry fields after MDEV-25459 Created: 2021-05-22  Updated: 2021-06-04  Resolved: 2021-05-27

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.38, 10.3.29, 10.4.19, 10.5.10, 10.6.1
Fix Version/s: 10.6.2, 10.2.39, 10.3.30, 10.4.20, 10.5.11

Type: Bug Priority: Critical
Reporter: Jukka Santala Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 1
Labels: innodb, regression

Attachments: File jsantala_large_geometry_index.sql    
Issue Links:
Blocks
Duplicate
is duplicated by MDEV-25621 Spatial Index ignores some geometry Closed
Problem/Incident
is caused by MDEV-25459 MVCC read from index on CHAR or VARCH... Closed

 Description   

MDEV-25459 corrected MVCC missing reads from CHAR or VARCHAR indexes, but omits handling InnoDB spatial r-tree indexes in storage/innobase/row/row0sel.cc row_sel_sec_rec_is_for_clust_rec. It seems to work for geometry fields which fit in the key prefix length.

Checking for dict_index_is_spatial(sec_index) in check_for_blob: goto works, although repetitive code. Not familiar enough with the internals to tell if it would be safe to skip the other checks entirely in case of spatial index.

Additionally, row_sel_sec_rec_is_for_blob doesn't seem to work right. When len > REC_VERSION_56_MAX_INDEX_COL_LEN it becomes REC_VERSION_56_MAX_INDEX_COL_LEN + 1, causing assert failure in debug build.



 Comments   
Comment by Marko Mäkelä [ 2021-05-24 ]

Thank you for the report.
Do you have a test case where len would exceed 3072 bytes? That should be the maximum column prefix length in a secondary index, ever since a larger limit was introduced to replace the original one of 768 bytes.

Comment by Jukka Santala [ 2021-05-24 ]

Maximum geometry length varies a bit depending on the primary key, not sure how to easily check actual key length. Our production case is on commercial map data, so I can't share that, but linestring/polygon with 502 points seems to do it. Test case is using the mysql-test/suite/innodb_gis/t/create_spatial_index.test table structure; innodb_read_only_compressed on 10.6, however this is independent of table properies as long as engine is InnoDB and the secondary key is SPATIAL index, of course. Expected result is getting the inserted row for each SELECT, but they return none if spatial secondary index is used, and on DEBUG build it crashes with the assertion. SPATIAL indexes can't define a prefix, so before the MDEV-25459 patch the code would just skip to the spatial-key special case, although it does seem to work for shorter keys now.

Comment by Marko Mäkelä [ 2021-05-24 ]

Thank you, jsantala. I think that the following is a minimal test case. I was not able to reduce the size further by setting innodb_compression_level=0:

--source include/have_innodb.inc
--source include/have_sequence.inc
 
CREATE TABLE t1(l LINESTRING NOT NULL, SPATIAL INDEX(l))
ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;
 
SELECT GROUP_CONCAT(CONCAT(seq, ' ', seq) SEPARATOR ',') INTO @g
FROM seq_0_to_190;
 
INSERT INTO t1 SET l=ST_GeomFromText(CONCAT('LINESTRING(',@g,',0 0)'));
 
SELECT COUNT(*) FROM t1
WHERE MBRIntersects(GeomFromText('Polygon((0 0,0 10,10 10,10 0,0 0))'), l);
DROP TABLE t1;

Comment by Marko Mäkelä [ 2021-05-25 ]

The InnoDB SPATIAL INDEX implementation that was copied from MySQL 5.7.9 to MariaDB Server 10.2.2 is somewhat problematic, because the implementation employs a mix of duplicating B-tree specific code and ‘pushing down’ R-tree specific conditions to B-tree specific code. I think that it would be clearest to let row_sel_sec_rec_is_for_clust_rec() only cover B-tree indexes and to create a separate variant for SPATIAL INDEX.

Note: There are problems with the InnoDB implementation of SPATIAL INDEX, especially in the area of MVCC. I do not completely understand the design decisions that were made in MySQL 5.7: making copies of buffer pool pages (MDEV-14059) and some tweaks to the purge of history, which does not seem to work for ROLLBACK (MDEV-15284). When I worked on MDEV-20612, I got suspicion of inconsistency in the predicate locking for R-tree indexes. Due to these reasons, I would not recommend using InnoDB SPATIAL INDEX on data that is changing frequently. For read-mostly data (or when updates and reads are not nearly concurrent) it could be safe.

I realize that the ROW_FORMAT=COMPRESSED format (which I would seek to eventually phase out; see MDEV-23497) provides efficient storage format for geometry data, because it is essentially doing the same as MDEV-11371, storing the BLOB as a singly linked list of zlib compressed stream. It is a pity that the MDEV-11371 column compression cannot be combined with any indexes, in ‘normal’ tables.

Comment by Jukka Santala [ 2021-05-25 ]

ROW_FORMAT=COMPRESSED is not related to the bug, it was included because I wrote the test-case to use same database as the existing spatial index tests so it might be incorporated into them. I believe it makes no difference for the bug, although frequently changing SPATIAL INDEX column in ROW_FORMAT=COMPRESSED table would clearly be a bad idea for performance reasons alone.

I would actually prefer to use AriaDB for these kinds of static map-data lookups that are written very rarely, but to my surprise at provisioning time InnoDB spatial indexes were significantly faster for our read-only use cases and we're using AriaDB as a work-around after noticing the problem. I could easily imagine use-cases like hourly geospatial tracks needing indexes within InnoDB transactions, however so I wouldn't say it's irrelevant even for us, though might be topic for another issue.

Just to be clear, when you say "would not recommend" does that imply failing transaction guarantees which might be tolerable, or is there potential risk of crash/corruption?

Comment by Marko Mäkelä [ 2021-05-25 ]

I agree that my test case only requires a large enough @g so that it cannot be stored directly in the clustered index page. When I implemented ROW_FORMAT=COMPRESSED in the InnoDB Plugin for MySQL 5.1 between 2005 and 2007, a design constraint was that a page with a single record must fit even without any compression.

MDEV-15284 proves that race conditions between rollback and MVCC reads exist (leading to a wrong number of records being counted via a spatial index), I would not personally trust it for anything write-heavy. Correct locking is a prerequisite for many of the letters in ACID. I do not remember encountering MariaDB support customer cases that would involve SPATIAL INDEX in InnoDB tables. Therefore, I assume that they are not widely used in read-write workloads. The lack of customer feedback and the effort involved in fixing this is a reason why there has been no progress in this area.

Generated at Thu Feb 08 09:40:08 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.