[MDEV-20350] REGEXP REPLACE backreference in a function Created: 2019-08-14  Updated: 2019-08-16  Resolved: 2019-08-16

Status: Closed
Project: MariaDB Server
Component/s: Data types
Affects Version/s: 10.1, 10.1.38, 10.2, 10.3, 10.4
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Lay András Assignee: Alexander Barkov
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Debian



 Description   

If i use backreference in LPAD function, in the result the padding characters number always the same (2, but why?), not depends on the padded string length:

SELECT REGEXP_REPLACE('aaa 123 bbb 45 ccc 6 ddd', '([0-9]+)', LPAD('\\1', 4, '0'));

aaa 00123 bbb 0045 ccc 006 ddd

How can i reach this result:

aaa 0123 bbb 0045 ccc 0006 ddd



 Comments   
Comment by Elena Stepanova [ 2019-08-15 ]

Thanks for the report.

Reproducible as described, and I don't see any explanation for it.
If it's expected on whatever reason, it probably needs to be documented.

Comment by Alexander Barkov [ 2019-08-16 ]

This is not a bug.
The LPAD() expression is calculated before REGEX_REPLACE.
So you effectively run this query:

SELECT REGEXP_REPLACE('aaa 123 bbb 45 ccc 6 ddd', '([0-9]+)', ' 00\\1');

It always adds 2 zeros to numbers as expected.

As a possible option you can call REGEXP_REPLACE recursively like this:

SELECT REGEXP_REPLACE(
       REGEXP_REPLACE(
       REGEXP_REPLACE('aaa 123 bbb 45 ccc 6 ddd',
        '(?<=[ ])([0-9]{1})(?=[ ])', '000\\1'),
        '(?<=[ ])([0-9]{2})(?=[ ])', '00\\1'),
        '(?<=[ ])([0-9]{3})(?=[ ])', '0\\1') AS res;

+--------------------------------+
| res                            |
+--------------------------------+
| aaa 0123 bbb 0045 ccc 0006 ddd |
+--------------------------------+

Another option is to write a stored function and call REGEXP_REPLACE in a loop, replacing sequences with different lengths on separate iterations.

You may also try to ask on PCRE forums. Perhaps it can be done using a single call with some tricky regular expression syntax.

Comment by Alexander Barkov [ 2019-08-16 ]

I googled for pcre pad all numbers with zeros to certain length and found this link:

https://stackoverflow.com/questions/48891333/pad-number-with-variable-length-zeroes-using-regex

People suggest two alternatives:

  • Call replace recursively, once for each length.
  • Use a too complex regular expression with a dictionary.
Generated at Thu Feb 08 08:58:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.