Details

    • 10.2.2-3, 10.2.2-1, 10.2.2-2, 10.2.2-4, 10.1.18

    Description

      Our current grammar in sql_yacc.yy uses LEX_STRING to return the TEXT_STRING and NCHAR_STRING terminal symbols from the tokenizer, and additionally uses Lex->text_string_is_7bit to distinguish between 7-bit and 8-bit strings (for optimization purposes).

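      This approach is error-prone. The tokenizer overwrites Lex->text_string_is_7bit every time it scans a string, so grammar changes that make bison read more look-ahead tokens before reducing a rule can put the flag out of sync with the bison values ($1, $2, $3, etc.). For example, by the time a rule is reduced, Lex->text_string_is_7bit may already describe $2 instead of the expected $1.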
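      To make the failure mode concrete, here is a small standalone C++ sketch, written under stated assumptions. It does not use MariaDB code: Token, Tokenizer, all_7bit() and shared_is_7bit are hypothetical stand-ins, with shared_is_7bit playing the role of Lex->text_string_is_7bit. Because the tokenizer overwrites one shared flag for every string it scans, once the parser has read a look-ahead string token the shared flag describes that token rather than $1, while a flag stored inside the token value itself stays correct.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token value: the string and its own 7-bit flag travel together.
struct Token
{
  std::string text;
  bool is_7bit;
};

// Returns true if every byte of s is below 0x80.
static bool all_7bit(const std::string &s)
{
  for (unsigned char c : s)
    if (c >= 0x80)
      return false;
  return true;
}

// Hypothetical tokenizer that, like the current code, also stores the flag
// of the most recently scanned string in a single shared variable.
struct Tokenizer
{
  std::vector<std::string> input;
  std::size_t pos= 0;
  bool shared_is_7bit= true;  // stand-in for Lex->text_string_is_7bit

  explicit Tokenizer(std::vector<std::string> in) : input(std::move(in)) {}

  Token next()
  {
    assert(pos < input.size());
    const std::string &s= input[pos++];
    Token t{s, all_7bit(s)};
    shared_is_7bit= t.is_7bit;  // overwritten on every scanned string
    return t;
  }
};
```

      For the input {"abc", "\xC3\xA9"}, calling next() twice (the second call standing in for a look-ahead read) leaves shared_is_7bit false even though the first token, the future $1, is 7-bit; the first Token's own is_7bit is still true.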

      A safer approach would be to return the LEX_STRING and its corresponding 7-bit/8-bit flag together as a single structure, like this:

      struct Lex_string_with_metadata_st: public LEX_STRING
      {
      private:
        bool m_is_8bit; // true if the string contains at least one byte >= 0x80
      public:
        // Set by the tokenizer together with str and length
        void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
        // Get string repertoire by the 8-bit flag and the character set
        uint repertoire(CHARSET_INFO *cs) const
        {
          return !m_is_8bit && my_charset_is_ascii_based(cs) ?
                 MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
        }
        // Get string repertoire by the 8-bit flag, for ASCII-based character sets
        uint repertoire() const
        {
          return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
        }
      };
      

      and use this structure for TEXT_STRING and NCHAR_STRING.
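      The following is a minimal standalone model of the proposed structure, so that its behaviour can be compiled and checked outside the server. It is a sketch under assumptions: LEX_STRING, CHARSET_INFO and my_charset_is_ascii_based() are simplified stand-ins for the MariaDB declarations, the numeric values of MY_REPERTOIRE_ASCII and MY_REPERTOIRE_UNICODE30 are assumed, and make_text_string() is a hypothetical tokenizer helper. What it shows is that the tokenizer fills in str, length and the 8-bit flag as one value, so the flag travels with $1 through the bison value stack.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins for MariaDB declarations (assumptions for this sketch).
typedef unsigned int uint;
struct LEX_STRING { char *str; std::size_t length; };
struct CHARSET_INFO { bool ascii_based; };  // hypothetical, much smaller than the real one
static bool my_charset_is_ascii_based(const CHARSET_INFO *cs)
{
  return cs->ascii_based;
}
const uint MY_REPERTOIRE_ASCII= 1;      // assumed value
const uint MY_REPERTOIRE_UNICODE30= 3;  // assumed value

struct Lex_string_with_metadata_st: public LEX_STRING
{
private:
  bool m_is_8bit;  // true if at least one byte is >= 0x80
public:
  void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
  // Repertoire from the 8-bit flag and the character set
  uint repertoire(CHARSET_INFO *cs) const
  {
    return !m_is_8bit && my_charset_is_ascii_based(cs) ?
           MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
  // Repertoire from the 8-bit flag, for ASCII-based character sets
  uint repertoire() const
  {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};

// Hypothetical tokenizer helper: sets str, length and the 8-bit flag together.
static Lex_string_with_metadata_st make_text_string(char *str, std::size_t length)
{
  assert(str != nullptr || length == 0);
  Lex_string_with_metadata_st res;
  res.str= str;
  res.length= length;
  bool is_8bit= false;
  for (std::size_t i= 0; i < length; i++)
  {
    if (static_cast<unsigned char>(str[i]) >= 0x80)
    {
      is_8bit= true;
      break;
    }
  }
  res.set_8bit(is_8bit);
  return res;
}
```

      In a grammar action the flag would then be read from the same value as the string, for example $1.repertoire(cs), rather than from Lex. Because the structure has no user-declared constructors, destructor or virtual functions, it stays trivially copyable, which keeps it usable as a member of a plain C union such as the one generated from a bison %union declaration.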

      The problem was revealed by valgrind in the bb-10.2-compatibility branch when extending this rule:

      sp_proc_stmt_return:
          RETURN_SYM expr
        ;
      

      to

      sp_proc_stmt_return:
          RETURN_SYM expr
        | RETURN_SYM /* from a procedure */
        ;
      

      Before making changes in the grammar, we should fix this problem.

          Activity

            bar Alexander Barkov created issue -
            bar Alexander Barkov made changes -
            Field Original Value New Value
            bar Alexander Barkov made changes -
            Description
            bar Alexander Barkov made changes -
            Description
            bar Alexander Barkov made changes -
            Description
            bar Alexander Barkov made changes -
            Description
            bar Alexander Barkov made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            bar Alexander Barkov made changes -
            Assignee Alexander Barkov [ bar ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            bar Alexander Barkov made changes -
            Description
            bar Alexander Barkov made changes -
            Description
            bar Alexander Barkov made changes -
            Description
            bar Alexander Barkov made changes -
            Description
            bar Alexander Barkov made changes -
            Component/s Stored routines [ 13905 ]
            bar Alexander Barkov made changes -
            Labels Compatibility
            bar Alexander Barkov made changes -
            Assignee Sergei Golubchik [ serg ] Alexander Barkov [ bar ]
            bar Alexander Barkov made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            bar Alexander Barkov made changes -
            issue.field.resolutiondate 2017-04-04 13:12:04.0 2017-04-04 13:12:04.548
            bar Alexander Barkov made changes -
            Fix Version/s 10.3.0 [ 22127 ]
            Fix Version/s 10.3 [ 22126 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            bar Alexander Barkov made changes -
            Description
            bar Alexander Barkov made changes -
            Labels Compatibility Compatibility refactoring
            bar Alexander Barkov made changes -
            Parent MDEV-10764 [ 57940 ]
            Issue Type Task [ 3 ] Technical task [ 7 ]
            alvinr Alvin Richards (Inactive) made changes -
            Parent MDEV-10764 [ 57940 ] MDEV-10142 [ 56873 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 80199 ] MariaDB v4 [ 151896 ]

            People

              Assignee: Alexander Barkov
              Reporter: Alexander Barkov
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved:
