[MDEV-23099] REGEXP_REPLACE(): unexpected result Created: 2020-07-05  Updated: 2023-04-27

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0, 10.1, 10.3.23, 10.2, 10.3, 10.4
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Marc Muelller Assignee: Alexander Barkov
Resolution: Unresolved Votes: 0
Labels: not-10.5, regexp_replace
Environment: AMD64

 Description   

SELECT REGEXP_REPLACE( 
'zdf_neo_HD_Monk20170218_184400.ts',
'^_*(Pro(7_MAXX|Sieben)|kabel(1|_eins)|(SUPER_)?RTL(_Televivion|2|NITRO|plus)?|Das_(Erste|VIERTE)|(00[12]_)?KiKA|Eins(Festival|Extra|Plus)|zdf(_neo|\.kultur|info(kanal)?|dokukanal)?)(_HD)?_*',
'XXX' );

MariaDB result:
zdf_neo_HD_Monk20170218_184400.ts

Expected result (should replace 'zdf_neo_HD_' with 'XXX'):
XXXMonk20170218_184400.ts

Omitting one alternative from the regular expression (e.g. '|kabel(1|_eins)' or '|Eins(Festival|Extra|Plus)') yields the correct result:

SELECT REGEXP_REPLACE( 
'zdf_neo_HD_Monk20170218_184400.ts',
'^_*(Pro(7_MAXX|Sieben)|(SUPER_)?RTL(_Televivion|2|NITRO|plus)?|Das_(Erste|VIERTE)|(00[12]_)?KiKA|Eins(Festival|Extra|Plus)|zdf(_neo|\.kultur|info(kanal)?|dokukanal)?)(_HD)?_*',
'XXX' );
-> XXXMonk20170218_184400.ts

SELECT REGEXP_REPLACE( 
'zdf_neo_HD_Monk20170218_184400.ts',
'^_*(Pro(7_MAXX|Sieben)|kabel(1|_eins)|(SUPER_)?RTL(_Televivion|2|NITRO|plus)?|Das_(Erste|VIERTE)|(00[12]_)?KiKA|zdf(_neo|\.kultur|info(kanal)?|dokukanal)?)(_HD)?_*',
'XXX' );
-> XXXMonk20170218_184400.ts

The incorrect result is confirmed on versions 10.3.23 and 10.4.13.
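
As a cross-check of the expected result, the three patterns above can be run through an independent regex engine. The sketch below uses Python's re module, which is not PCRE, so it only confirms what a correct match should produce and says nothing about MariaDB internals. The helper names (build_pattern, replace_prefix, group_count) are hypothetical and exist only for this illustration. It also prints the number of capturing groups per pattern: the failing pattern has 11, each reduced pattern has 10. Whether that difference is the cause is not established in this report.

```python
import re

SUBJECT = "zdf_neo_HD_Monk20170218_184400.ts"

# Alternatives copied from the pattern in the report, one per entry.
ALTERNATIVES = [
    r"Pro(7_MAXX|Sieben)",
    r"kabel(1|_eins)",
    r"(SUPER_)?RTL(_Televivion|2|NITRO|plus)?",
    r"Das_(Erste|VIERTE)",
    r"(00[12]_)?KiKA",
    r"Eins(Festival|Extra|Plus)",
    r"zdf(_neo|\.kultur|info(kanal)?|dokukanal)?",
]


def build_pattern(alternatives):
    """Rebuild the reported pattern from a list of alternatives."""
    return r"^_*(" + "|".join(alternatives) + r")(_HD)?_*"


def replace_prefix(alternatives, subject=SUBJECT):
    """Python equivalent of REGEXP_REPLACE(subject, pattern, 'XXX')."""
    return re.sub(build_pattern(alternatives), "XXX", subject)


def group_count(alternatives):
    """Number of capturing groups in the rebuilt pattern."""
    return re.compile(build_pattern(alternatives)).groups


# The full pattern and the two reduced variants from the report.
without_kabel = [a for a in ALTERNATIVES if not a.startswith("kabel")]
without_eins = [a for a in ALTERNATIVES if not a.startswith("Eins")]

for name, alts in [("full", ALTERNATIVES),
                   ("without kabel", without_kabel),
                   ("without Eins", without_eins)]:
    print(name, group_count(alts), replace_prefix(alts))
```

In Python all three variants return XXXMonk20170218_184400.ts, which matches the expected result stated above.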



 Comments   
Comment by Alice Sherepa [ 2020-07-06 ]

Repeatable on 10.0-10.4, but 10.5 returns the correct result.
10.4:

SELECT REGEXP_REPLACE( 'zdf_neo_HD_Monk.ts','((a)|(a)|(a)(2)|(a)|(a)|(a)|zdf(_neo|(a)))(_HD)*','XXX' );
REGEXP_REPLACE( 'zdf_neo_HD_Monk.ts','((a)|(a)|(a)(2)|(a)|(a)|(a)|zdf(_neo|(a)))(_HD)*','XXX' )
zdf_neo_HD_Monk.ts

10.5:

SELECT REGEXP_REPLACE( 'zdf_neo_HD_Monk.ts','((a)|(a)|(a)(2)|(a)|(a)|(a)|zdf(_neo|(a)))(_HD)*','XXX' );
REGEXP_REPLACE( 'zdf_neo_HD_Monk.ts','((a)|(a)|(a)(2)|(a)|(a)|(a)|zdf(_neo|(a)))(_HD)*','XXX' )
XXX_Monk.ts
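
The reduced test case can be checked the same way outside the server. The sketch below again uses Python's re module as a stand-in engine (an assumption for illustration; it is not the PCRE library used by MariaDB) and lists which capturing groups take part in the match. The function names are hypothetical.

```python
import re

# Reduced pattern and subject copied from the comment above.
PATTERN = r"((a)|(a)|(a)(2)|(a)|(a)|(a)|zdf(_neo|(a)))(_HD)*"
SUBJECT = "zdf_neo_HD_Monk.ts"


def reduced_case():
    """Python equivalent of the reduced REGEXP_REPLACE call."""
    return re.sub(PATTERN, "XXX", SUBJECT)


def participating_groups():
    """Map group number -> captured text for groups that matched."""
    match = re.search(PATTERN, SUBJECT)
    return {i: g for i, g in enumerate(match.groups(), 1) if g is not None}


print(reduced_case())
print(participating_groups())
```

Here the replacement yields XXX_Monk.ts, in agreement with the 10.5 output, and only groups 1, 9 and 11 capture text.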

Comment by Sergei Golubchik [ 2021-10-26 ]

10.0–10.4 use PCRE (version 8.x); 10.5 uses PCRE2. http://www.pcre.org/ says:

There are two major versions of the PCRE library. The current version, PCRE2, released in 2015, is now at version 10.37.

The older, but still widely deployed PCRE library, originally released in 1997, is at version 8.45. This version of PCRE is now at end of life, and is no longer being actively maintained. Version 8.45 is expected to be the final release of the older PCRE library, and new projects should use PCRE2 instead.

So there is a chance we won't be able to do anything about it besides advising an upgrade to 10.5 or later.
