MariaDB Server / MDEV-12942

REGEXP_INSTR returns 1 when using brackets

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.1.13, 10.2.6
    • Fix Version/s: 10.1.25, 10.2.7
    • Component/s: OTHER
    • Labels: None
    • Environment: MS Windows 7 Home Premium 64-bit, Intel i5 750

    Description

      The MariaDB documentation says of REGEXP_INSTR: "Returns the position of the first occurrence of the regular expression pattern in the string subject, or 0 if pattern was not found."

      My task (simplified) is to find an "o" followed by a doubled consonant:

      SELECT REGEXP_INSTR('a_kollision', 'o([lm])\\1'); # -> expected 4 got 1
      

      It returns 1, but I would have expected 4, the position of "oll".

      The same pattern works as expected with REGEXP_REPLACE:

      # replace works as expected
      SELECT REGEXP_REPLACE('a_kollision', 'o([lm])\\1', '???'); # -> a_k???ision -- OK
      

      The trouble seems to start when the pattern passed to REGEXP_INSTR contains parentheses (a capturing group):

      SELECT REGEXP_REPLACE('a_kollision', 'oll', '???'); # -> a_k???ision  -- OK
      SELECT REGEXP_REPLACE('a_kollision', '(oll)', '???'); # -> a_k???ision  -- OK
      SELECT REGEXP_INSTR('a_kollision', 'oll'); # -> 4  -- OK
      SELECT REGEXP_INSTR('a_kollision', '(oll)'); # -> 1  -- wrong
      

      Checked with the latest stable version, 10.2.6, using default settings.
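
      As an independent cross-check of the expected values, the same patterns can be run through another regex engine. The following is a minimal sketch using Python's built-in re module, which is not PCRE but handles capturing groups and the \1 backreference the same way for these patterns. The helper name regexp_instr is hypothetical; it only converts Python's 0-based match offset into the 1-based position that the REGEXP_INSTR documentation describes.

```python
import re


def regexp_instr(subject, pattern):
    """Hypothetical stand-in for REGEXP_INSTR.

    Returns the 1-based position of the first match, or 0 if there is none.
    """
    match = re.search(pattern, subject)
    return match.start() + 1 if match else 0


subject = "a_kollision"

print(regexp_instr(subject, r"o([lm])\1"))   # 4
print(regexp_instr(subject, r"oll"))         # 4
print(regexp_instr(subject, r"(oll)"))       # 4
print(regexp_instr(subject, r"xyz"))         # 0 (no match)
print(re.sub(r"o([lm])\1", "???", subject))  # a_k???ision
```

      In this sketch, adding parentheses does not change the reported position, which is the behaviour the report expects from REGEXP_INSTR as well.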

        Activity

          KoSchmi Konstantin Schmidt created issue -
          serg Sergei Golubchik made changes -
          Status Open [ 1 ] Confirmed [ 10101 ]
          serg Sergei Golubchik made changes -
          Status Confirmed [ 10101 ] In Progress [ 3 ]
          serg Sergei Golubchik made changes -
          Assignee Sergei Golubchik [ serg ]
          serg Sergei Golubchik made changes -
          Status In Progress [ 3 ] Stalled [ 10000 ]

          serg Sergei Golubchik added a comment - bar, do you remember why m_subpatterns_needed was introduced in the first place, instead of always using array_elements(m_SubStrVec)?
          serg Sergei Golubchik made changes -
          Assignee Sergei Golubchik [ serg ] Alexander Barkov [ bar ]
          Status Stalled [ 10000 ] In Review [ 10002 ]
          serg Sergei Golubchik made changes -
          Fix Version/s 10.1 [ 16100 ]
          Fix Version/s 10.2 [ 14601 ]

          bar Alexander Barkov added a comment - Sergei,

          I overlooked in the documentation that PCRE uses this buffer not only to return matching subpatterns, but also to store back-references, in the last third of the buffer.
          http://www.pcre.org/original/doc/html/pcreapi.html

          Your changes look fine. But perhaps we should just remove m_subpatterns_needed.

          serg Sergei Golubchik added a comment - Yes, I've removed it in a follow-up cleanup patch:
          https://github.com/MariaDB/server/commit/2372bfaa7b4b9a40e418cbfec480d30eb84eaf21
          https://github.com/MariaDB/server/commit/5e0038b376b79ee5a2f47da1e0d71caa7d8fa99c

          While working on the cleanup patch I noticed another bug: re.init was called with subpatterns_needed=10, but this value should be a multiple of 3 (see man pcre_exec).
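
          The buffer layout discussed in these comments can be sketched with a little arithmetic. According to the pcre_exec documentation, the size of the output vector should be a multiple of three (a value that is not gets rounded down). The first two-thirds return the offset pairs for the whole match and each captured subpattern, and the last third is used as workspace. The Python helpers below are hypothetical illustrations of that rule only; they are not MariaDB or PCRE code.

```python
def usable_ovector_size(ovecsize):
    """Size pcre_exec effectively uses: ovecsize rounded down to a multiple of 3."""
    return ovecsize - ovecsize % 3


def max_returned_pairs(ovecsize):
    """Number of (start, end) offset pairs that fit in the first two-thirds.

    Pair 0 is the whole match; pairs 1..n are the capturing subpatterns.
    """
    return usable_ovector_size(ovecsize) // 3


def ovector_size_for(capture_groups):
    """Smallest size that can return the whole match plus every capture group."""
    return 3 * (capture_groups + 1)


print(usable_ovector_size(10))  # 9
print(max_returned_pairs(10))   # 3
print(ovector_size_for(1))      # 6
```

          Under this rule a requested size of 10 is effectively 9, and a pattern with one capturing group, such as 'o([lm])\\1' from the report, needs a vector of at least 6 integers to return both the whole match and the group.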
          serg Sergei Golubchik made changes -
          Component/s OTHER [ 10125 ]
          Fix Version/s 10.1.25 [ 22542 ]
          Fix Version/s 10.2.7 [ 22543 ]
          Fix Version/s 10.2 [ 14601 ]
          Fix Version/s 10.1 [ 16100 ]
          Resolution Fixed [ 1 ]
          Status In Review [ 10002 ] Closed [ 6 ]
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 80990 ] MariaDB v4 [ 152239 ]

          People

            Assignee: bar Alexander Barkov
            Reporter: KoSchmi Konstantin Schmidt
            Votes: 0
            Watchers: 3
