Details
Type: Technical task
Status: Closed
Priority: Major
Resolution: Fixed
None
10.2.2-3, 10.2.2-1, 10.2.2-2, 10.2.2-4, 10.1.18
Description
Our current grammar in sql_yacc.yy uses LEX_STRING to return the TEXT_STRING and NCHAR_STRING terminal symbols from the tokenizer, and additionally uses Lex->text_string_is_7bit to distinguish 7-bit strings from 8-bit strings (for optimization purposes).
This approach is error prone. The tokenizer sets Lex->text_string_is_7bit whenever it scans a string, so grammar changes that require more look-ahead can put the flag out of sync with the Bison semantic values ($1, $2, $3, etc.). For example, by the time the action for $1 runs, Lex->text_string_is_7bit may already describe $2 instead of $1.
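The failure mode can be illustrated with a small, self-contained C++ simulation. It is not MariaDB code: Side_channel_lexer and flag_seen_by_action() are hypothetical names, and the single shared boolean merely plays the role of Lex->text_string_is_7bit. The sketch assumes the parser fetches a look-ahead token before executing the action that consumes $1, which a Bison-generated LALR(1) parser may do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the tokenizer: each token value carries only
// the string text, while the 7-bit property is published through a shared
// member, mirroring the role of Lex->text_string_is_7bit.
struct Side_channel_lexer
{
  std::vector<std::string> input;
  std::size_t pos= 0;
  bool text_string_is_7bit= true;    // overwritten on every scanned string

  std::string next()
  {
    assert(pos < input.size());
    const std::string &s= input[pos++];
    text_string_is_7bit= true;
    for (unsigned char c : s)
      if (c >= 0x80)
        text_string_is_7bit= false;  // any byte >= 0x80 makes it 8-bit
    return s;
  }
};

// Scans the token that becomes $1, then `lookahead` further tokens (as the
// parser may do before running the action for $1), and returns the flag
// value that action would read from the shared side channel.
bool flag_seen_by_action(Side_channel_lexer lex, int lookahead)
{
  lex.next();                        // token for $1
  for (int i= 0; i < lookahead; i++)
    lex.next();                      // look-ahead overwrites the flag
  return lex.text_string_is_7bit;
}
```

For example, with input set to {"abc", "\xC3\xA9"}, flag_seen_by_action(lex, 0) returns true, which is correct for "abc", while flag_seen_by_action(lex, 1) returns false: the value that belongs to the second string.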
A safe approach would be to return the LEX_STRING and the corresponding 7/8 bit flag as a single structure like this:
struct Lex_string_with_metadata_st: public LEX_STRING
{
  bool m_is_8bit;
public:
  void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
  // Get string repertoire by the 8-bit flag and the character set
  uint repertoire(CHARSET_INFO *cs) const
  {
    return !m_is_8bit && my_charset_is_ascii_based(cs) ?
           MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
  // Get string repertoire by the 8-bit flag, for ASCII-based character sets
  uint repertoire() const
  {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};
and use this structure for TEXT_STRING and NCHAR_STRING.
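A minimal sketch of the proposed structure in use follows. Because the real LEX_STRING, CHARSET_INFO, my_charset_is_ascii_based() and MY_REPERTOIRE_* definitions live in the MariaDB headers, the block replaces them with simplified stand-ins (the constant values and the mock_latin1/mock_ucs2 character sets are assumptions), and scan_text_string() is a hypothetical tokenizer helper. What it illustrates is that the 8-bit flag is stored inside each token value, so scanning a look-ahead token cannot change what an earlier token reports.

```cpp
#include <cassert>
#include <cstddef>

// ---- Hypothetical stand-ins for MariaDB types (not the real headers) ----
typedef unsigned int uint;
struct LEX_STRING { char *str; std::size_t length; };
struct charset_info_st { const char *name; bool ascii_based; };
typedef const struct charset_info_st CHARSET_INFO;
static bool my_charset_is_ascii_based(CHARSET_INFO *cs) { return cs->ascii_based; }
static const uint MY_REPERTOIRE_ASCII= 1;       // assumed values
static const uint MY_REPERTOIRE_UNICODE30= 3;

// ---- The structure proposed in this task ----
struct Lex_string_with_metadata_st: public LEX_STRING
{
  bool m_is_8bit;
public:
  void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
  uint repertoire(CHARSET_INFO *cs) const
  {
    return !m_is_8bit && my_charset_is_ascii_based(cs) ?
           MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
  uint repertoire() const
  {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};

// Hypothetical tokenizer helper: the string and its 8-bit flag are filled
// in together, so they travel as one semantic value ($1, $2, ...).
Lex_string_with_metadata_st scan_text_string(char *str, std::size_t length)
{
  Lex_string_with_metadata_st tok;
  tok.str= str;
  tok.length= length;
  bool is_8bit= false;
  for (std::size_t i= 0; i < length; i++)
    if ((unsigned char) str[i] >= 0x80)
      is_8bit= true;
  tok.set_8bit(is_8bit);
  return tok;
}

// Mock character sets: latin1 is ASCII-based, ucs2 is not.
static const charset_info_st mock_latin1= {"latin1", true};
static const charset_info_st mock_ucs2=   {"ucs2", false};
```

Scanning an 8-bit string after "abc" leaves the first token's repertoire() at MY_REPERTOIRE_ASCII; only a character set that is not ASCII-based, such as mock_ucs2 here, moves it to MY_REPERTOIRE_UNICODE30.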
The problem was revealed by valgrind in the bb-10.2-compatibility branch when extending this rule:
sp_proc_stmt_return:
          RETURN_SYM expr
        ;
to
sp_proc_stmt_return:
          RETURN_SYM expr
        | RETURN_SYM /* from a procedure */
        ;
Before making changes in the grammar we should fix this problem.
Issue Links
blocks: MDEV-10142 PL/SQL parser (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue blocks |
Status | Open [ 1 ] | In Progress [ 3 ] |
Assignee | Alexander Barkov [ bar ] | Sergei Golubchik [ serg ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Component/s | Stored routines [ 13905 ] |
Labels | Compatibility |
Assignee | Sergei Golubchik [ serg ] | Alexander Barkov [ bar ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Resolution Date | 2017-04-04 13:12:04.0 | 2017-04-04 13:12:04.548 |
Fix Version/s | 10.3.0 [ 22127 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Resolution | Fixed [ 1 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Labels | Compatibility | Compatibility refactoring |
Parent | MDEV-10764 [ 57940 ] | |
Issue Type | Task [ 3 ] | Technical task [ 7 ] |
Parent | MDEV-10764 [ 57940 ] |
|
Workflow | MariaDB v3 [ 80199 ] | MariaDB v4 [ 151896 ] |