MariaDB Server / MDEV-38712

strnncollsp_nchars() virtual implementations are not correct for PAD SPACE collations

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 10.6, 11.8, 12.2, 12.3
    • 12.3
    • Character Sets
    • None
    • Related to performance

    Description

      I wrote a program that tests strnncollsp_nchars() in combination with PAD SPACE collations:

      #include <my_global.h>
      #include <m_ctype.h>
      #include <my_sys.h>
       
      void test_collation(const char *name)
      {
        CHARSET_INFO *cs= get_charset_by_name(name, MYF(0));
        if (!cs)
        {
          printf("Collation '%s' not found\n", name);
          return;
        }
        /*
          Intentionally don't pass the
          MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES flag
          to the last argument:
        */
        int cmp= (cs->coll->strnncollsp_nchars)(cs,
                                              (const uchar*) "abc", 3,
                                              (const uchar*) "abc ", 4,
                                               4/*nchars*/, 0/*flags*/);
        printf("cmp=%-10d %s\n", cmp, name);
      }
       
       
      int main()
      {
        my_init();
        test_collation("utf8mb3_uca1400_nopad_ai_ci");
        test_collation("utf8mb3_general_nopad_ci");
        test_collation("utf8mb3_nopad_bin");
        test_collation("latin1_swedish_nopad_ci");
        test_collation("latin1_nopad_bin");
        my_end(0);
        return 0;
      }
      

      and built it using this Makefile:

      SRCDIR=/home/bar/maria-git/12.3.hphsh/
      BUILDDIR=/home/bar/maria-git/12.3.hphsh/BUILD-DEB/
       
      INCLUDES=-I$(SRCDIR)/include/ -I$(BUILDDIR)/include/
      LIB=-L$(BUILDDIR)/mysys/ -L$(BUILDDIR)/dbug/ -L$(BUILDDIR)/strings/
       
      all: test
       
      test: test.cc
              g++ $(INCLUDES) $(LIB) test.cc -o test -lmysys -ldbug -lstrings
       
      clean:
              rm -rf test
      

      The output of the program is:

      cmp=-521       utf8mb3_uca1400_nopad_ai_ci
      cmp=0          utf8mb3_general_nopad_ci
      cmp=0          utf8mb3_nopad_bin
      cmp=0          latin1_swedish_nopad_ci
      cmp=0          latin1_nopad_bin
      

      Notice:

      • utf8mb3_uca1400_nopad_ai_ci reports that the string "abc" is smaller than "abc ". This is correct.
      • The other four collations report that "abc" and "abc " are equal. This is wrong.

      Emulation of padding the shorter string "abc" to "abc " (according to nchars=4) should happen only when the MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES flag is passed. When this flag is not passed, as with this input, the function should simply return what strnncollsp() returns.
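
      The expected flag behavior can be illustrated with a toy model. This sketch is not MariaDB code: it assumes a single-byte, binary, NO PAD comparison, and the names toy_compare_nchars() and TOY_EMULATE_TRIMMED_TRAILING_SPACES are hypothetical stand-ins for the real virtual method and for MY_STRNNCOLLSP_NCHARS_EMULATE_TRIMMED_TRAILING_SPACES.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the real flag; the value is an assumption.
constexpr unsigned TOY_EMULATE_TRIMMED_TRAILING_SPACES = 1;

// Toy model: compare at most nchars single-byte characters of a and b
// byte-wise, without padding (NO PAD), unless the flag asks to emulate
// trailing spaces that were trimmed from values shorter than nchars.
// Returns -1, 0 or 1.
int toy_compare_nchars(const std::string &a, const std::string &b,
                       std::size_t nchars, unsigned flags)
{
  // Limit both operands to nchars characters.
  std::string x = a.substr(0, nchars);
  std::string y = b.substr(0, nchars);

  if (flags & TOY_EMULATE_TRIMMED_TRAILING_SPACES)
  {
    // Emulate the trimmed trailing spaces: extend to exactly nchars.
    x.resize(nchars, ' ');
    y.resize(nchars, ' ');
  }

  // Plain byte-wise comparison; a proper prefix sorts first.
  const int c = x.compare(y);
  return (c > 0) - (c < 0);
}
```

      Under these assumptions, toy_compare_nchars("abc", "abc ", 4, 0) returns a negative value, and the same call with TOY_EMULATE_TRIMMED_TRAILING_SPACES returns 0, because "abc" is first extended to "abc ".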

      All virtual implementations of strnncollsp_nchars() should be checked and fixed to return a negative result, meaning that "abc" is smaller than "abc ".

      After this fix, it will be possible to use strnncollsp_nchars() to address the problem reported in MDEV-21543 (see the link to the Zulip topic). Without this change, a patch for MDEV-21543 can only use strnncollsp_nchars() for NO PAD collations.

            People

              bar Alexander Barkov