Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10251

ST_INTERSECTS(tbl1.col, tbl2.col) does not use the spatial index

Details

    Description

      Let's say that we have the following tables:

      One table that keeps track of specific geolocation points, which is an InnoDB table:

      CREATE TABLE `point_table` (
        `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
        `Latitude` varchar(100) DEFAULT NULL,
        `Longitude` varchar(100) DEFAULT NULL,
        `geo_location` POINT,
        PRIMARY KEY (`id`),
        KEY `Latitude` (`Latitude`),
        KEY `Longitude` (`Longitude`)
      ) ENGINE=InnoDB;
      

      And one table that keeps track of predefined areas:

      CREATE TABLE `geometry_table` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `SHAPE` geometry NOT NULL,
        UNIQUE KEY `id` (`id`),
        SPATIAL KEY `SHAPE` (`SHAPE`)
      ) ENGINE=MyISAM;
      

      Some users want to find out which area each point is in, so they ran queries like this:

      SELECT pt.id, 
      ( SELECT gt.id
        FROM geometry_table gt
        WHERE ST_INTERSECTS( pt.geo_location, gt.shape )
        LIMIT 1
       )
      FROM point_table pt;
      

      This query does not use the spatial index:

      +------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+
      | id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
      +------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+
      |    1 | PRIMARY            | pt    | ALL  | NULL          | NULL | NULL    | NULL | 317301 |             |
      |    2 | DEPENDENT SUBQUERY | gt    | ALL  | SHAPE         | NULL | NULL    | NULL |     35 | Using where |
      +------+--------------------+-------+------+---------------+------+---------+------+--------+-------------+
      

      It will not use the index even if FORCE INDEX is used.

      The users also tried something like this:

      SELECT pt.id, gt.id
      FROM point_table pt
      JOIN geometry_table gt
      ON ST_INTERSECTS( pt.geo_location, gt.shape );
      

      That does not use the spatial index either:

      +------+-------------+-------+------+---------------+------+---------+------+--------+-------------------------------------------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows   | Extra                                           |
      +------+-------------+-------+------+---------------+------+---------+------+--------+-------------------------------------------------+
      |    1 | SIMPLE      | gt    | ALL  | SHAPE         | NULL | NULL    | NULL |     35 |                                                 |
      |    1 | SIMPLE      | pt    | ALL  | NULL          | NULL | NULL    | NULL | 317301 | Using where; Using join buffer (flat, BNL join) |
      +------+-------------+-------+------+---------------+------+---------+------+--------+-------------------------------------------------+
      2 rows in set (0.00 sec)
      

      Attachments

        Activity

          The spatial index is used when one uses a constant:

          MariaDB [test]> explain SELECT gt.id   FROM geometry_table gt   WHERE ST_INTERSECTS(shape, ST_GEOMFROMTEXT('POINT(10 10)'));
          +------+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
          | id   | select_type | table | type  | possible_keys | key   | key_len | ref  | rows | Extra       |
          +------+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
          |    1 | SIMPLE      | gt    | range | SHAPE         | SHAPE | 34      | NULL |    1 | Using where |
          +------+-------------+-------+-------+---------------+-------+---------+------+------+-------------+
          

          As far as I understand, this is a known limitation of the GIS optimizer. R-Tree index only supports "range-like" scans, that is, it only works when a GIS predicate compares an indexed column with a constant.

          It's similar with non-GIS ranges. If we consider the "<" comparison, "t.key < const" is handled efficiently, while "t1.key < t2.col" is not.

          One could argue that "t1.key < t2.col" avoids running cross-joins by using "Range checked for each record" plans. We don't see "range checked for each record" for GIS queries, for some reason. I'd like to check this with holyfoot.

          psergei Sergei Petrunia added a comment - The spatial index is used when one uses a constant: MariaDB [test]> explain SELECT gt.id FROM geometry_table gt WHERE ST_INTERSECTS(shape, ST_GEOMFROMTEXT('POINT(10 10)')); +------+-------------+-------+-------+---------------+-------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+-------+---------------+-------+---------+------+------+-------------+ | 1 | SIMPLE | gt | range | SHAPE | SHAPE | 34 | NULL | 1 | Using where | +------+-------------+-------+-------+---------------+-------+---------+------+------+-------------+ As far as I understand, this is a known limitation of the GIS optimizer. R-Tree index only supports "range-like" scans, that is, it only works when a GIS predicate compares an indexed column with a constant. It's similar with non-GIS ranges. If we consider the "<" comparison, "t.key < const" is handled efficiently, while "t1.key < t2.col" is not. One could argue that "t1.key < t2.col" avoids running cross-joins by using "Range checked for each record" plans. We don't see "range checked for each record" for GIS queries, for some reason. I'd like to check this with holyfoot .
          psergei Sergei Petrunia added a comment - - edited

          Wait. Actually with MariaDB 10.1.15, I do get "Range checked for each record":

          MariaDB [test]> explain SELECT pt.id, gt.id FROM point_table pt JOIN geometry_table gt ON ST_INTERSECTS( pt.geo_location, gt.shape );
          +------+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+
          | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                                          |
          +------+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+
          |    1 | SIMPLE      | pt    | ALL  | NULL          | NULL | NULL    | NULL | 1000 |                                                |
          |    1 | SIMPLE      | gt    | ALL  | SHAPE         | NULL | NULL    | NULL | 1000 | Range checked for each record (index map: 0x2) |
          +------+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+
          

          But I don't get it for the subquery:

          MariaDB [test]> explain SELECT pt.id, 
              -> ( SELECT gt.id
              ->   FROM geometry_table gt
              ->   WHERE ST_INTERSECTS( pt.geo_location, gt.shape )
              ->   LIMIT 1
              ->  )
              -> FROM point_table pt;
          +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
          | id   | select_type        | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
          +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
          |    1 | PRIMARY            | pt    | ALL  | NULL          | NULL | NULL    | NULL | 1000 |             |
          |    2 | DEPENDENT SUBQUERY | gt    | ALL  | SHAPE         | NULL | NULL    | NULL | 1000 | Using where |
          +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+
          

          Also tried with 10.0.26-MariaDB-debug. I get the same results as with 10.1

          psergei Sergei Petrunia added a comment - - edited Wait. Actually with MariaDB 10.1.15, I do get "Range checked for each record": MariaDB [test]> explain SELECT pt.id, gt.id FROM point_table pt JOIN geometry_table gt ON ST_INTERSECTS( pt.geo_location, gt.shape ); +------+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+ | 1 | SIMPLE | pt | ALL | NULL | NULL | NULL | NULL | 1000 | | | 1 | SIMPLE | gt | ALL | SHAPE | NULL | NULL | NULL | 1000 | Range checked for each record (index map: 0x2) | +------+-------------+-------+------+---------------+------+---------+------+------+------------------------------------------------+ But I don't get it for the subquery: MariaDB [test]> explain SELECT pt.id, -> ( SELECT gt.id -> FROM geometry_table gt -> WHERE ST_INTERSECTS( pt.geo_location, gt.shape ) -> LIMIT 1 -> ) -> FROM point_table pt; +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+ | 1 | PRIMARY | pt | ALL | NULL | NULL | NULL | NULL | 1000 | | | 2 | DEPENDENT SUBQUERY | gt | ALL | SHAPE | NULL | NULL | NULL | 1000 | Using where | +------+--------------------+-------+------+---------------+------+---------+------+------+-------------+ Also tried with 10.0.26-MariaDB-debug. I get the same results as with 10.1
          psergei Sergei Petrunia added a comment - Example I've tried with: https://gist.github.com/spetrunia/b9c32a9a6c4880d7e96538f240da5535

          The difference between having "Range checked for each record" for join and not having it for subselect has nothing to do with GIS.
          Here I'm observing it with a non-GIS predicate:

          MariaDB [test]> explain SELECT pt.id,  ( SELECT gt.id   FROM geometry_table gt   WHERE pt.id < gt.id   LIMIT 1  ) FROM point_table pt;
          +------+--------------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
          | id   | select_type        | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                    |
          +------+--------------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
          |    1 | PRIMARY            | pt    | index | NULL          | Latitude | 103     | NULL | 1000 | Using index              |
          |    2 | DEPENDENT SUBQUERY | gt    | index | id            | id       | 4       | NULL | 1000 | Using where; Using index |
          +------+--------------------+-------+-------+---------------+----------+---------+------+------+--------------------------+
          2 rows in set (0.00 sec)
          

          MariaDB [test]> explain SELECT pt.id, gt.id FROM point_table pt JOIN geometry_table gt ON pt.id < gt.id;
          +------+-------------+-------+-------+---------------+----------+---------+------+------+------------------------------------------------+
          | id   | select_type | table | type  | possible_keys | key      | key_len | ref  | rows | Extra                                          |
          +------+-------------+-------+-------+---------------+----------+---------+------+------+------------------------------------------------+
          |    1 | SIMPLE      | pt    | index | PRIMARY       | Latitude | 103     | NULL | 1000 | Using index                                    |
          |    1 | SIMPLE      | gt    | ALL   | id            | NULL     | NULL    | NULL | 1000 | Range checked for each record (index map: 0x1) |
          +------+-------------+-------+-------+---------------+----------+---------+------+------+------------------------------------------------+
          2 rows in set (0.00 sec)
          

          psergei Sergei Petrunia added a comment - The difference between having "Range checked for each record" for join and not having it for subselect has nothing to do with GIS. Here I'm observing it with a non-GIS predicate: MariaDB [test]> explain SELECT pt.id, ( SELECT gt.id FROM geometry_table gt WHERE pt.id < gt.id LIMIT 1 ) FROM point_table pt; +------+--------------------+-------+-------+---------------+----------+---------+------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+--------------------+-------+-------+---------------+----------+---------+------+------+--------------------------+ | 1 | PRIMARY | pt | index | NULL | Latitude | 103 | NULL | 1000 | Using index | | 2 | DEPENDENT SUBQUERY | gt | index | id | id | 4 | NULL | 1000 | Using where; Using index | +------+--------------------+-------+-------+---------------+----------+---------+------+------+--------------------------+ 2 rows in set (0.00 sec) MariaDB [test]> explain SELECT pt.id, gt.id FROM point_table pt JOIN geometry_table gt ON pt.id < gt.id; +------+-------------+-------+-------+---------------+----------+---------+------+------+------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+-------+---------------+----------+---------+------+------+------------------------------------------------+ | 1 | SIMPLE | pt | index | PRIMARY | Latitude | 103 | NULL | 1000 | Using index | | 1 | SIMPLE | gt | ALL | id | NULL | NULL | NULL | 1000 | Range checked for each record (index map: 0x1) | +------+-------------+-------+-------+---------------+----------+---------+------+------+------------------------------------------------+ 2 rows in set (0.00 sec)

          Ok, looking at the original issue, one can tell that it's not at all clear if making ST_INTERSECTS slightly faster would resolve the issue.

          psergei Sergei Petrunia added a comment - Ok, looking at the original issue, one can tell that it's not at all clear if making ST_INTERSECTS slightly faster would resolve the issue.

          Changing the bug attributes accordingly

          It would be nice to have efficient handling for use cases like

          select * from my_path, geom_features 
          where st_interects(my_path.point, geom_features.shape) and my_path.date=today()
          

          but

          • it's a big feature
          • it doesnt seem to be a priority ATM.
          psergei Sergei Petrunia added a comment - Changing the bug attributes accordingly It would be nice to have efficient handling for use cases like select * from my_path, geom_features where st_interects(my_path.point, geom_features.shape) and my_path.date=today() but it's a big feature it doesnt seem to be a priority ATM.

          People

            Unassigned Unassigned
            GeoffMontee Geoff Montee (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.