[MDEV-23431] SQL query range on a string field Created: 2020-08-07  Updated: 2020-08-10  Resolved: 2020-08-09

Status: Closed
Project: MariaDB Server
Component/s: Optimizer, Server
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Christopher Gillespie Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

MariaDB 10.5.4 on Linux using MySQL Workbench


Attachments: File CitySrch.sql     Microsoft Word Maine.csv    

 Description   

Hi,

If i run this query it only gives me the Cities that begin with the letter 'A'
SELECT * FROM vagtc_db.CitySrch WHERE City >= "0%" AND City <= "B%" ORDER BY City ASC ;

I would expect it to return all cities that start with 'A and 'B'. It seems to stop at the '<' and not include the '=' with it.

If i run this then i receive everything. But i had to specify the last city in the 'B' range. Or if I use 'C%' then I receive all cities that begin with 'A' and 'B'.
SELECT * FROM vagtc_db.CitySrch WHERE City >= "0%" AND City <= "Bucksport" ORDER BY City ASC ;

Best regards,
Chris



 Comments   
Comment by Varun Gupta (Inactive) [ 2020-08-09 ]

This is not how wildcards work with predicates having operators like >/>=/</ <=.
What you need is to use the LIKE and then the wildcard % would matche any number of characters, including zero.

Took a sample of your data :

MariaDB [test]> CREATE TABLE t1 (
    ->   `City` VARCHAR(30) DEFAULT NULL,
    ->   `State_Name` VARCHAR(30) DEFAULT NULL,
    ->   PRIMARY KEY (City, State_Name)
    -> );
Query OK, 0 rows affected (0.003 sec)
 
MariaDB [test]> 
MariaDB [test]> INSERT INTO t1 VALUES
    -> ("Alfred","Maine"),
    -> ("Anson","Maine"),
    -> ("Biddeford","Maine"),
    -> ("Blue Hill","Maine"),
    -> ("Boothbay Harbor","Maine");
Query OK, 5 rows affected (0.018 sec)
Records: 5  Duplicates: 0  Warnings: 0
 
MariaDB [test]> SELECT * FROM t1 WHERE City LIKE 'A%' OR City LIKE 'B%' ORDER BY City ASC ;
+-----------------+------------+
| City            | State_Name |
+-----------------+------------+
| Alfred          | Maine      |
| Anson           | Maine      |
| Biddeford       | Maine      |
| Blue Hill       | Maine      |
| Boothbay Harbor | Maine      |
+-----------------+------------+
5 rows in set (0.003 sec)

This is how it can be achieved using LIKE command. Also maybe you can find more info here https://mariadb.com/kb/en/like/.

Comment by Christopher Gillespie [ 2020-08-10 ]

I think you missed the point of my statement on >= or <=. If I run the query your way, it does not return anything beginning with B if I use the letter C. It only returns anything beginning with A or C, no B. If you run my sample query replacing B with D you get A through C.

Comment by Varun Gupta (Inactive) [ 2020-08-10 ]

Well what i tried to mean was that for LIKE the wildcard character % matches any number of characters. So if you want something between A -> C

you can try to use LIKE and do OR for all the characters in between:

City LIKE 'A%' OR City Like 'B%' or City Like 'C%'.

But if you use the wildcard character % with >/>=/</<= then it does not mean that it would match any character. It is compared as just a character. Let me show the range that is created internally.

If you are interested in more details on what the ranges are created internally, let me show you the optimizer trace output.

So for the query:

SELECT * FROM vagtc_db.CitySrch WHERE City >= "A%" AND City <= "B%" ORDER BY City ASC 

"range_scan_alternatives": [
                      {
                        "index": "PRIMARY",
                        "ranges": ["(A%) <= (City) <= (B%)"],
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": true,
                        "rows": 2,
                        "cost": 0.551834269,
                        "chosen": true
                      }
                    ],

For the query with LIKE, the range created is like this (using only one predicate here)

SELECT * FROM vagtc_db.CitySrch WHERE City LIKE "A%" ORDER BY City ASC 

"range_scan_alternatives": [
                      {
                        "index": "PRIMARY",
                        "ranges": [
                          "(A\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0) <= (City) <= (A?????????????????????????????)"
                        ],
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": true,
                        "rows": 2,
                        "cost": 0.551834269,
                        "chosen": true
                      }
                    ],

here \0 is min character and ? is max character, this is how wildcard % is handled with LIKE.

Comment by Christopher Gillespie [ 2020-08-10 ]

Based on the trace file you posted. The query optimizer does not see the = sign after the < sign. So the query cannot pull the full range as requested. I found if I use a z% or zzzz% as a part of the query I can get better results, provided it does not match with a field value starting with Cz.

SELECT * FROM vagtc_db.CitySrch WHERE City >= "A%" AND City <= "Cz%" ORDER BY City ASC

This returns everything starting with A through C.

Ultimately, a query like this is to be apart of a stored procedure with two inputs fields. One field value low the other field value high and returning a dataset with all values in-between. The LIKE and OR commands would not work very well in this case.

Generated at Thu Feb 08 09:22:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.