[MDEV-12942] REGEXP_INSTR returns 1 when using brackets Created: 2017-05-29  Updated: 2017-05-30  Resolved: 2017-05-30

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 10.1.13, 10.2.6
Fix Version/s: 10.1.25, 10.2.7

Type: Bug Priority: Major
Reporter: Konstantin Schmidt Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None
Environment:

MS Windows 7 Home Premium 64-bit, Intel i5 750



 Description   

The MariaDB documentation says of REGEXP_INSTR: "Returns the position of the first occurrence of the regular expression pattern in the string subject, or 0 if pattern was not found."

My task (simplified) is to find an o followed by a double consonant:

SELECT REGEXP_INSTR('a_kollision', 'o([lm])\\1'); # -> expected 4 got 1

It returns 1, but I would have expected 4, the position of oll.

Doing the same with replace works:

# replace works as expected
SELECT REGEXP_REPLACE('a_kollision', 'o([lm])\\1', '???'); # -> a_k???ision -- OK

It seems that the trouble starts when using brackets in REGEXP_INSTR:

SELECT REGEXP_REPLACE('a_kollision', 'oll', '???'); # -> a_k???ision  -- OK
SELECT REGEXP_REPLACE('a_kollision', '(oll)', '???'); # -> a_k???ision  -- OK
SELECT REGEXP_INSTR('a_kollision', 'oll'); # -> 4  -- OK
SELECT REGEXP_INSTR('a_kollision', '(oll)'); # -> 1  -- wrong

Checked with latest stable version 10.2.6, default settings.
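
As an independent cross-check of the expected values, the same patterns can be run through a different regex engine. The sketch below uses Python's re module, not the PCRE library that MariaDB links against, and the helper name regexp_instr is hypothetical: it only mimics the documented contract (1-based position of the first match, 0 if none) for ASCII input.

```python
import re

def regexp_instr(subject: str, pattern: str) -> int:
    """Return the 1-based position of the first match, or 0 if there is none."""
    match = re.search(pattern, subject)
    return match.start() + 1 if match else 0

# The queries from the report, with the values the reporter expected.
print(regexp_instr('a_kollision', r'oll'))        # 4
print(regexp_instr('a_kollision', r'(oll)'))      # 4
print(regexp_instr('a_kollision', r'o([lm])\1'))  # 4
print(regexp_instr('a_kollision', r'xyz'))        # 0
```

Adding a capturing group or a back-reference does not move the match, so all three patterns should report position 4.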



 Comments   
Comment by Sergei Golubchik [ 2017-05-29 ]

bar, do you remember what the reason was for this m_subpatterns_needed in the first place, instead of always using array_elements(m_SubStrVec)?

Comment by Alexander Barkov [ 2017-05-30 ]

Sergei,

I overlooked in the documentation that PCRE uses this buffer not only to return matching subpatterns, but also to store back-references, in the last third of the buffer.
http://www.pcre.org/original/doc/html/pcreapi.html

Your changes look fine. But perhaps we should just remove m_subpatterns_needed.

Comment by Sergei Golubchik [ 2017-05-30 ]

Yes, I've removed it in a follow-up cleanup patch.

While writing the cleanup patch I noticed another bug: re.init was called with subpatterns_needed=10, but this value should be a multiple of 3 (see man pcre_exec).
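
The sizing rule can be sketched as plain arithmetic. This is an illustration based on my reading of the pcreapi page linked above, not code from the MariaDB patch. It assumes that pcre_exec rounds the vector length down to a multiple of three, returns (start, end) pairs in the first two thirds, and keeps the last third as internal workspace. All function names are hypothetical.

```python
def usable_ovecsize(ovecsize: int) -> int:
    """Largest multiple of three that does not exceed ovecsize."""
    return ovecsize - ovecsize % 3

def max_returned_pairs(ovecsize: int) -> int:
    """Number of (start, end) pairs that fit in the first two thirds.

    Pair 0 describes the whole match; the rest describe capturing groups.
    """
    return usable_ovecsize(ovecsize) // 3

def ovecsize_for_groups(groups: int) -> int:
    """Vector length needed for the whole match plus `groups` captures."""
    return 3 * (groups + 1)

print(usable_ovecsize(10))     # 9
print(max_returned_pairs(10))  # 3
print(ovecsize_for_groups(9))  # 30
```

Under these assumptions a length of 10 behaves like 9, leaving room for the whole match and only two capturing groups, which is why a multiple of 3 is the safer choice.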
