[MDEV-17185] regexp_substr handles non capturing groups incorrectly Created: 2018-09-13  Updated: 2018-09-18  Resolved: 2018-09-14

Status: Closed
Project: MariaDB Server
Component/s: N/A
Affects Version/s: 10.2.17
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Kenneth Penza
Assignee: Unassigned
Resolution: Not a Bug
Votes: 0
Labels: None
Environment:

Ubuntu 16.04
MariaDB 10.2.17



 Description   

The behaviour of regexp_substr is inconsistent with PCRE when using a non-capturing group.

Test of regular expression using Perl:

$ perl -e '"cl_fl_id1_1" =~ /(?:(?:[a-z]{1,}_){2})([a-z0-9]{1,})/ && print "$1\n"'
id1
$

Test of regular expression using MariaDB:

MariaDB [test]> select regexp_substr('cl_fl_id1_1', '(?:(?:[a-z]{1,}_){2})([a-z0-9]{1,})') matched;
+-----------+
| matched   |
+-----------+
| cl_fl_id1 |
+-----------+
1 row in set (0.00 sec)
MariaDB [test]> 

Mysqld parameters:

$ mysqld --print-defaults
mysqld would have been started with the following arguments:
--user=mysql --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 --basedir=/usr --datadir=/var/lib/mysql --tmpdir=/tmp --lc_messages_dir=/usr/share/mysql --lc_messages=en_US --skip-external-locking --bind-address=0.0.0.0 --max_connections=100 --connect_timeout=5 --wait_timeout=600 --max_allowed_packet=256M --thread_cache_size=128 --sort_buffer_size=4M --bulk_insert_buffer_size=16M --tmp_table_size=32M --max_heap_table_size=32M --myisam_recover_options=BACKUP --key_buffer_size=128M --table_open_cache=400 --myisam_sort_buffer_size=512M --concurrent_insert=2 --read_buffer_size=2M --read_rnd_buffer_size=1M --query_cache_limit=128K --query_cache_size=64M --log_warnings=2 --slow_query_log_file=/var/log/mysql/mariadb-slow.log --long_query_time=10 --log_slow_verbosity=query_plan --sql_mode=STRICT_ALL_TABLES,ONLY_FULL_GROUP_BY --default_storage_engine=InnoDB --default_tmp_storage_engine=Aria --innodb_buffer_pool_size=256M --innodb_log_buffer_size=8M --innodb_file_per_table=1 --innodb_open_files=400 --innodb_io_capacity=400 --innodb_flush_method=O_DIRECT --innodb_flush_log_at_trx_commit=1 --sync_binlog=1 --histogram_size=128 --use_stat_tables=preferably
$

Testcase:

select regexp_substr('cl_fl_id1_1', '(?:(?:[a-z]{1,}_){2})([a-z0-9]{1,})') matched;



 Comments   
Comment by Alice Sherepa [ 2018-09-14 ]

Could you please explain why you consider this a bug?

perl -e '"cl_fl_id1_1" =~ /(?:(?:[a-z]{1,}_){2})([a-z0-9]{1,})/ && print "$1\n"'

This prints group 1, which is id1.
If I add parentheses around the whole pattern, then group 1 is the entire match:

$ perl -e '"cl_fl_id1_1" =~ /((?:(?:[a-z]{1,}_){2})([a-z0-9]{1,}))/ && print "$1\n"'
cl_fl_id1
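
The difference between the whole match and capture group 1 can be sketched outside the database as well. The snippet below is only an illustration and uses Python's re module rather than PCRE, on the assumption that both engines treat this particular pattern the same way; the variable names are hypothetical. It prints the entire matched substring (the value regexp_substr returned in the report) and then group 1 (the value Perl's $1 printed).

```python
import re

# Same pattern and subject string as in the report.
# Assumption: Python's re engine stands in for PCRE here.
PATTERN = r"(?:(?:[a-z]{1,}_){2})([a-z0-9]{1,})"
SUBJECT = "cl_fl_id1_1"

match = re.search(PATTERN, SUBJECT)
if match is None:
    raise SystemExit("pattern did not match")

whole_match = match.group(0)  # entire matched substring
first_group = match.group(1)  # first capturing group only

print(whole_match)  # cl_fl_id1
print(first_group)  # id1
```

Non-capturing groups still take part in the match; they are only left out of the group numbering, so the whole match includes the text they consumed.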

Comment by Kenneth Penza [ 2018-09-14 ]

It's not a bug; I misinterpreted the documentation. I went through the documentation again, ran some further tests, and can confirm that the function works as expected.

Generated at Thu Feb 08 08:34:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.