[MDEV-12411] Remove Lex::text_string_is_7bit - Jira

Alexander Barkov created issue - 2017-03-30 18:14

Alexander Barkov made changes - 2017-03-30 18:14

Field	Original Value	New Value
Link		This issue blocks ~~MDEV-10142~~ [ ~~MDEV-10142~~ ]

Alexander Barkov made changes - 2017-03-30 18:16

Description

Our current grammar in {{sql_yacc.yy}} uses {{LEX_STRING}} to return {{TEXT_STRING}} and {{NCHAR_STRING}} terminal symbols from the tokenizer, and additionally uses {{Lex->text_string_is_7bit}} to know a difference between 7bit and 8bit strings (for optimization purposes).

This approach is error prone. Changes in grammar that require more look-ahead can put {{Lex->text_string_is_7bit}} out of sync from bison variables ({{$1}}, {{$2}}, {{$3}} etc), so for example {{Lex->text_string_is_7bit}} already corresponds {{$2}} instead of expected {{$1}}.

A safe approach would be to return the {{LEX_STRING}} and the corresponding 7/8 bit flag as a single structure.

Alexander Barkov made changes - 2017-03-30 18:17

Description

Our current grammar in {{sql_yacc.yy}} uses {{LEX_STRING}} to return {{TEXT_STRING}} and {{NCHAR_STRING}} terminal symbols from the tokenizer, and additionally uses {{Lex->text_string_is_7bit}} to know a difference between 7bit and 8bit strings (for optimization purposes).

This approach is error prone. Changes in grammar that require more look-ahead can put {{Lex->text_string_is_7bit}} out of sync from bison variables ({{$1}}, {{$2}}, {{$3}} etc), so for example {{Lex->text_string_is_7bit}} already corresponds {{$2}} instead of expected {{$1}}.

A safe approach would be to return the {{LEX_STRING}} and the corresponding 7/8 bit flag as a single structure like this:

{code:cpp}
struct Lex_string_with_metadata_st: public LEX_STRING
{
  bool m_is_8bit;
public:
  void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
  // Get string repertoire by the 8-bit flag and the character set
  uint repertoire(CHARSET_INFO *cs) const
  {
    return !m_is_8bit && my_charset_is_ascii_based(cs) ?
           MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
  // Get string repertoire by the 8-bit flag, for ASCII-based character sets
  uint repertoire() const
  {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};
{code}

Alexander Barkov made changes - 2017-03-30 18:18

Description

Our current grammar in {{sql_yacc.yy}} uses {{LEX_STRING}} to return {{TEXT_STRING}} and {{NCHAR_STRING}} terminal symbols from the tokenizer, and additionally uses {{Lex->text_string_is_7bit}} to know a difference between 7bit and 8bit strings (for optimization purposes).

This approach is error prone. Changes in grammar that require more look-ahead can put {{Lex->text_string_is_7bit}} out of sync from bison variables ({{$1}}, {{$2}}, {{$3}} etc), so for example {{Lex->text_string_is_7bit}} already corresponds {{$2}} instead of expected {{$1}}.

A safe approach would be to return the {{LEX_STRING}} and the corresponding 7/8 bit flag as a single structure like this:

{code:cpp}
struct Lex_string_with_metadata_st: public LEX_STRING
{
  bool m_is_8bit;
public:
  void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
  // Get string repertoire by the 8-bit flag and the character set
  uint repertoire(CHARSET_INFO *cs) const
  {
    return !m_is_8bit && my_charset_is_ascii_based(cs) ?
           MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
  // Get string repertoire by the 8-bit flag, for ASCII-based character sets
  uint repertoire() const
  {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};
{code}
and use this structure for {{TEXT_STRING}} and {{NCHAR_STRING}}.

Alexander Barkov made changes - 2017-03-30 18:18

Description

Our current grammar in {{sql_yacc.yy}} uses {{LEX_STRING}} to return {{TEXT_STRING}} and {{NCHAR_STRING}} terminal symbols from the tokenizer, and additionally uses {{Lex->text_string_is_7bit}} to know a difference between 7bit and 8bit strings (for optimization purposes).

This approach is error prone. Changes in grammar that require more look-ahead can put {{Lex->text_string_is_7bit}} out of sync from bison variables ({{$1}}, {{$2}}, {{$3}} etc), so for example {{Lex->text_string_is_7bit}} already corresponds to {{$2}} instead of expected {{$1}}.

A safe approach would be to return the {{LEX_STRING}} and the corresponding 7/8 bit flag as a single structure like this:

{code:cpp}
struct Lex_string_with_metadata_st: public LEX_STRING
{
  bool m_is_8bit;
public:
  void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
  // Get string repertoire by the 8-bit flag and the character set
  uint repertoire(CHARSET_INFO *cs) const
  {
    return !m_is_8bit && my_charset_is_ascii_based(cs) ?
           MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
  // Get string repertoire by the 8-bit flag, for ASCII-based character sets
  uint repertoire() const
  {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};
{code}
and use this structure for {{TEXT_STRING}} and {{NCHAR_STRING}}.

Alexander Barkov made changes - 2017-03-30 18:19

Status	Open [ 1 ]	In Progress [ 3 ]

Alexander Barkov made changes - 2017-03-30 18:19

Assignee	Alexander Barkov [ bar ]	Sergei Golubchik [ serg ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Alexander Barkov made changes - 2017-03-30 18:22

Description

Our current grammar in {{sql_yacc.yy}} uses {{LEX_STRING}} to return {{TEXT_STRING}} and {{NCHAR_STRING}} terminal symbols from the tokenizer, and additionally uses {{Lex->text_string_is_7bit}} to know a difference between 7bit and 8bit strings (for optimization purposes).

This approach is error prone. Changes in grammar that require more look-ahead can put {{Lex->text_string_is_7bit}} out of sync from bison variables ({{$1}}, {{$2}}, {{$3}} etc), so for example {{Lex->text_string_is_7bit}} already corresponds to {{$2}} instead of expected {{$1}}.

A safe approach would be to return the {{LEX_STRING}} and the corresponding 7/8 bit flag as a single structure like this:

{code:cpp}
struct Lex_string_with_metadata_st: public LEX_STRING
{
  bool m_is_8bit;
public:
  void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
  // Get string repertoire by the 8-bit flag and the character set
  uint repertoire(CHARSET_INFO *cs) const
  {
    return !m_is_8bit && my_charset_is_ascii_based(cs) ?
           MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
  // Get string repertoire by the 8-bit flag, for ASCII-based character sets
  uint repertoire() const
  {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};
{code}
and use this structure for {{TEXT_STRING}} and {{NCHAR_STRING}}.

The problem was revealed by valgrind when extending this rule:

{noformat}
sp_proc_stmt_return:
    RETURN_SYM expr
  ;
{noformat}
to
{noformat}
sp_proc_stmt_return:
    RETURN_SYM expr
  | RETURN /* from a procedure */
  ;
{noformat}

Alexander Barkov made changes - 2017-03-30 18:23

Description

Our current grammar in {{sql_yacc.yy}} uses {{LEX_STRING}} to return {{TEXT_STRING}} and {{NCHAR_STRING}} terminal symbols from the tokenizer, and additionally uses {{Lex->text_string_is_7bit}} to know a difference between 7bit and 8bit strings (for optimization purposes).

This approach is error prone. Changes in grammar that require more look-ahead can put {{Lex->text_string_is_7bit}} out of sync from bison variables ({{$1}}, {{$2}}, {{$3}} etc), so for example {{Lex->text_string_is_7bit}} already corresponds to {{$2}} instead of expected {{$1}}.

A safe approach would be to return the {{LEX_STRING}} and the corresponding 7/8 bit flag as a single structure like this:

{code:cpp}
struct Lex_string_with_metadata_st: public LEX_STRING
{
  bool m_is_8bit;
public:
  void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
  // Get string repertoire by the 8-bit flag and the character set
  uint repertoire(CHARSET_INFO *cs) const
  {
    return !m_is_8bit && my_charset_is_ascii_based(cs) ?
           MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
  // Get string repertoire by the 8-bit flag, for ASCII-based character sets
  uint repertoire() const
  {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};
{code}
and use this structure for {{TEXT_STRING}} and {{NCHAR_STRING}}.

The problem was revealed by valgrind when extending this rule:

{noformat}
sp_proc_stmt_return:
    RETURN_SYM expr
  ;
{noformat}
to
{noformat}
sp_proc_stmt_return:
    RETURN_SYM expr
  | RETURN_SYM /* from a procedure */
  ;
{noformat}

Alexander Barkov made changes - 2017-03-30 18:24

Description

Our current grammar in {{sql_yacc.yy}} uses {{LEX_STRING}} to return {{TEXT_STRING}} and {{NCHAR_STRING}} terminal symbols from the tokenizer, and additionally uses {{Lex->text_string_is_7bit}} to know a difference between 7bit and 8bit strings (for optimization purposes).

This approach is error prone. Changes in grammar that require more look-ahead can put {{Lex->text_string_is_7bit}} out of sync from bison variables ({{$1}}, {{$2}}, {{$3}} etc), so for example {{Lex->text_string_is_7bit}} already corresponds to {{$2}} instead of expected {{$1}}.

A safe approach would be to return the {{LEX_STRING}} and the corresponding 7/8 bit flag as a single structure like this:

{code:cpp}
struct Lex_string_with_metadata_st: public LEX_STRING
{
  bool m_is_8bit;
public:
  void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
  // Get string repertoire by the 8-bit flag and the character set
  uint repertoire(CHARSET_INFO *cs) const
  {
    return !m_is_8bit && my_charset_is_ascii_based(cs) ?
           MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
  // Get string repertoire by the 8-bit flag, for ASCII-based character sets
  uint repertoire() const
  {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};
{code}
and use this structure for {{TEXT_STRING}} and {{NCHAR_STRING}}.

The problem was revealed by valgrind when extending this rule:

{noformat}
sp_proc_stmt_return:
    RETURN_SYM expr
  ;
{noformat}
to
{noformat}
sp_proc_stmt_return:
    RETURN_SYM expr
  | RETURN_SYM /* from a procedure */
  ;
{noformat}

Before making changes in the grammar we should fix this problem.

Alexander Barkov made changes - 2017-03-30 18:24

Description

Our current grammar in {{sql_yacc.yy}} uses {{LEX_STRING}} to return {{TEXT_STRING}} and {{NCHAR_STRING}} terminal symbols from the tokenizer, and additionally uses {{Lex->text_string_is_7bit}} to know a difference between 7bit and 8bit strings (for optimization purposes).

This approach is error prone. Changes in the grammar that require more look-ahead can put {{Lex->text_string_is_7bit}} out of sync from bison variables ({{$1}}, {{$2}}, {{$3}} etc), so for example {{Lex->text_string_is_7bit}} already corresponds to {{$2}} instead of expected {{$1}}.

A safe approach would be to return the {{LEX_STRING}} and the corresponding 7/8 bit flag as a single structure like this:

{code:cpp}
struct Lex_string_with_metadata_st: public LEX_STRING
{
  bool m_is_8bit;
public:
  void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
  // Get string repertoire by the 8-bit flag and the character set
  uint repertoire(CHARSET_INFO *cs) const
  {
    return !m_is_8bit && my_charset_is_ascii_based(cs) ?
           MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
  // Get string repertoire by the 8-bit flag, for ASCII-based character sets
  uint repertoire() const
  {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};
{code}
and use this structure for {{TEXT_STRING}} and {{NCHAR_STRING}}.

The problem was revealed by valgrind when extending this rule:

{noformat}
sp_proc_stmt_return:
    RETURN_SYM expr
  ;
{noformat}
to
{noformat}
sp_proc_stmt_return:
    RETURN_SYM expr
  | RETURN_SYM /* from a procedure */
  ;
{noformat}

Before making changes in the grammar we should fix this problem.

Alexander Barkov made changes - 2017-03-31 08:14

Component/s

Stored routines [ 13905 ]

Alexander Barkov made changes - 2017-03-31 08:14

Labels

Compatibility

Alexander Barkov made changes - 2017-04-04 12:53

Assignee	Sergei Golubchik [ serg ]	Alexander Barkov [ bar ]

Alexander Barkov made changes - 2017-04-04 13:11

Status	In Review [ 10002 ]	Stalled [ 10000 ]

Alexander Barkov made changes - 2017-04-04 13:12

issue.field.resolutiondate	2017-04-04 13:12:04.0	2017-04-04 13:12:04.548

Alexander Barkov made changes - 2017-04-04 13:12

Fix Version/s		10.3.0 [ 22127 ]
Fix Version/s	10.3 [ 22126 ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Alexander Barkov made changes - 2017-04-04 13:13

Description

Our current grammar in {{sql_yacc.yy}} uses {{LEX_STRING}} to return the {{TEXT_STRING}} and {{NCHAR_STRING}} terminal symbols from the tokenizer, and additionally uses {{Lex->text_string_is_7bit}} to distinguish 7-bit from 8-bit strings (for optimization purposes).

This approach is error-prone. Changes in the grammar that require more look-ahead can put {{Lex->text_string_is_7bit}} out of sync with the bison variables ({{$1}}, {{$2}}, {{$3}}, etc.), so that, for example, {{Lex->text_string_is_7bit}} already corresponds to {{$2}} instead of the expected {{$1}}.

A safe approach would be to return the {{LEX_STRING}} and the corresponding 7/8 bit flag as a single structure like this:

{code:cpp}
struct Lex_string_with_metadata_st: public LEX_STRING
{
  bool m_is_8bit;
public:
  void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
  // Get string repertoire by the 8-bit flag and the character set
  uint repertoire(CHARSET_INFO *cs) const
  {
    return !m_is_8bit && my_charset_is_ascii_based(cs) ?
           MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
  // Get string repertoire by the 8-bit flag, for ASCII-based character sets
  uint repertoire() const
  {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};
{code}
and use this structure for {{TEXT_STRING}} and {{NCHAR_STRING}}.
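
For anyone who wants to try the proposed structure outside the server tree, here is a minimal sketch. The {{LEX_STRING}}, {{CHARSET_INFO}}, {{my_charset_is_ascii_based}} and {{MY_REPERTOIRE_*}} definitions below are simplified stand-ins (assumptions, not the real server headers), and {{make_text_token}} is a hypothetical helper showing how a tokenizer might fill in the flag when it finishes a string literal.

```cpp
#include <cassert>
#include <cstddef>

// --- Simplified stand-ins for server definitions (assumptions for this sketch) ---
typedef unsigned int uint;
struct LEX_STRING { char *str; std::size_t length; };
struct CHARSET_INFO { bool ascii_based; };            // hypothetical, heavily reduced
const uint MY_REPERTOIRE_ASCII= 1;                    // illustrative values
const uint MY_REPERTOIRE_UNICODE30= 3;
inline bool my_charset_is_ascii_based(const CHARSET_INFO *cs)
{ return cs->ascii_based; }

// The structure proposed in the description: the string and its flag travel together.
struct Lex_string_with_metadata_st: public LEX_STRING
{
  bool m_is_8bit;
  void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
  // Repertoire by the 8-bit flag and the character set
  uint repertoire(CHARSET_INFO *cs) const
  {
    return !m_is_8bit && my_charset_is_ascii_based(cs) ?
           MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
  // Repertoire by the 8-bit flag only, for ASCII-based character sets
  uint repertoire() const
  {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};

// Hypothetical helper: build a token and record whether any byte has the high bit set.
inline Lex_string_with_metadata_st make_text_token(char *str, std::size_t length)
{
  assert(str != nullptr || length == 0);
  Lex_string_with_metadata_st tok;
  tok.str= str;
  tok.length= length;
  bool is_8bit= false;
  for (std::size_t i= 0; i < length; i++)
  {
    if (static_cast<unsigned char>(str[i]) & 0x80)
    {
      is_8bit= true;
      break;
    }
  }
  tok.set_8bit(is_8bit);
  return tok;
}
```

Because the flag is stored in the token itself, a grammar action that receives the token as {{$1}} can call {{$1.repertoire()}} without consulting any parser-wide state.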

The problem was revealed by valgrind in the {{bb-10.2-compatibility}} branch when extending this rule:

{noformat}
sp_proc_stmt_return:
    RETURN_SYM expr
  ;
{noformat}
to
{noformat}
sp_proc_stmt_return:
    RETURN_SYM expr
  | RETURN_SYM /* from a procedure */
  ;
{noformat}

Before making changes in the grammar we should fix this problem.
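
The out-of-sync effect can be illustrated with a toy model. This is only a sketch under stated assumptions: {{ToyLex}}, {{ToyToken}}, {{scan}} and {{reduce_first_after_lookahead}} are hypothetical names that imitate a tokenizer writing a shared flag, and they are not the server's lexer or the bison-generated parser. The model scans one string, then scans a second one as look-ahead before the action for the first string runs.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for shared parser state (not the real LEX class).
struct ToyLex
{
  bool text_string_is_7bit = true;
};

// Hypothetical token that carries its own flag.
struct ToyToken
{
  std::string text;
  bool is_8bit;
};

inline bool has_8bit_byte(const std::string &s)
{
  for (unsigned char c : s)
    if (c & 0x80)
      return true;
  return false;
}

// Toy tokenizer: returns the token and, as a side effect, overwrites the shared flag.
inline ToyToken scan(ToyLex &lex, const std::string &raw)
{
  const bool is_8bit = has_8bit_byte(raw);
  lex.text_string_is_7bit = !is_8bit;
  return ToyToken{raw, is_8bit};
}

struct Observation
{
  bool shared_says_7bit;  // what the shared flag claims when the action runs
  bool token_says_7bit;   // what the first token's own flag claims
};

// Scans `first`, then scans `lookahead` before the "action" for `first` runs.
inline Observation reduce_first_after_lookahead(const std::string &first,
                                                const std::string &lookahead)
{
  ToyLex lex;
  const ToyToken t1 = scan(lex, first);
  scan(lex, lookahead);  // the parser needed one more token to decide
  return Observation{lex.text_string_is_7bit, !t1.is_8bit};
}

// Usage: the first string is pure ASCII, the look-ahead string is not.
inline void demo_desync()
{
  const Observation o = reduce_first_after_lookahead("abc", "caf\xC3\xA9");
  assert(!o.shared_says_7bit);  // shared flag now describes the look-ahead token
  assert(o.token_says_7bit);    // the per-token flag still describes "abc"
}
```

In this model the shared flag always reflects the most recently scanned string, so its meaning depends on how far ahead the parser has read, while the flag stored in the token does not.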

Alexander Barkov made changes - 2017-04-09 02:53

Labels	Compatibility	Compatibility refactoring

Alexander Barkov made changes - 2017-09-23 06:20

Parent		MDEV-10764 [ 57940 ]
Issue Type	Task [ 3 ]	Technical task [ 7 ]

Alvin Richards (Inactive) made changes - 2017-11-17 06:56

Parent	MDEV-10764 [ 57940 ]	~~MDEV-10142~~ [ 56873 ]

Sergei Golubchik made changes - 2021-12-06 21:44

Workflow	MariaDB v3 [ 80199 ]	MariaDB v4 [ 151896 ]
