[MDEV-12403] REGEXP: performance can be very very poor Created: 2017-03-30 Updated: 2017-04-02 Resolved: 2017-03-30 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Server |
| Affects Version/s: | 10.1.22 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Minor |
| Reporter: | Pascal HOARAU | Assignee: | Sergei Golubchik |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: |
centos 7, CPU E5-1630 v3 @ 3.70GHz, 64 GB RAM, RAID1 SSD |
||
| Description |
|
Horrible performances when using REGEXP
|
| Comments |
| Comment by Sergei Golubchik [ 2017-03-30 ] |
|
Well, sure. Regular expressions can be slow. There are lots of articles in the internet and many books about how to write fast regexes. Also, you can first pre-filter your rows with a fast LIKE — just as you did! LIKE on a constant substring uses Boyer–Moore algorithm and should, probably, be faster than anything you can do with a regex. |
| Comment by Sergei Golubchik [ 2017-03-30 ] |
|
Technically, MariaDB could look inside a regex and automatically apply LIKE optimization. But it generally treats regexes as black boxes, and directly invokes pcre for them. If pcre would have an API to parse a regex and extract substrings that must always be present, we could've use it for optimizing regexes. |
| Comment by Pascal HOARAU [ 2017-03-30 ] |
|
more than 3mn for a powerful server and a table containing 13 000 rows is a real bug for me ! |
| Comment by Daniel Black [ 2017-03-31 ] |
|
I suggest you use a https://mariadb.com/kb/en/mariadb/virtual-computed-columns/ to extract (using REGEX_REPLACE) the amazonses.com part of the from the return_path column as a new column and index this. Then change the query to lookup this value in the generated column will use the new index. |