Details
- Technical task
- Status: Closed
- Major
- Resolution: Fixed
- None
- 10.2.2-3, 10.2.2-1, 10.2.2-2, 10.2.2-4, 10.1.18
Description
Our current grammar in sql_yacc.yy uses LEX_STRING to return the TEXT_STRING and NCHAR_STRING terminal symbols from the tokenizer, and additionally uses Lex->text_string_is_7bit to distinguish 7-bit from 8-bit strings (for optimization purposes).
This approach is error-prone. Changes in the grammar that require more look-ahead can put Lex->text_string_is_7bit out of sync with the bison variables ($1, $2, $3, etc.), so that, for example, Lex->text_string_is_7bit already corresponds to $2 instead of the expected $1.
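The desynchronization can be illustrated with a small toy model. This is only a sketch, not the real tokenizer or parser: ToyLex, ToyTokenizer and flag_seen_by_action_for_first_token are hypothetical names invented for the illustration, and the "look-ahead" is simulated by simply fetching one extra token before the action for the first token reads the flag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the shared parser state (not the real LEX class).
struct ToyLex {
  bool text_string_is_7bit;
  ToyLex() : text_string_is_7bit(true) {}
};

// Returns true if every byte of s is in the 7-bit ASCII range.
static bool is_7bit(const std::string &s) {
  for (std::size_t i = 0; i < s.size(); i++)
    if (static_cast<unsigned char>(s[i]) & 0x80) return false;
  return true;
}

// Toy tokenizer: returns string tokens and, as a side effect, records
// the 7-bit flag of the most recently scanned token in the shared state.
struct ToyTokenizer {
  std::vector<std::string> input;
  std::size_t pos;
  ToyLex *lex;
  ToyTokenizer(const std::vector<std::string> &in, ToyLex *l)
    : input(in), pos(0), lex(l) {}
  std::string next() {
    const std::string &tok = input.at(pos++);
    lex->text_string_is_7bit = is_7bit(tok);
    return tok;
  }
};

// Simulates the action for the first token ($1). If the grammar needs a
// look-ahead token, the tokenizer has already scanned $2 by the time the
// action runs, so the shared flag describes $2 rather than $1.
static bool flag_seen_by_action_for_first_token(bool needs_lookahead) {
  std::vector<std::string> tokens;
  tokens.push_back("abc");        // $1: pure 7-bit
  tokens.push_back("\xC3\xA9");   // $2: contains 8-bit bytes
  ToyLex lex;
  ToyTokenizer tz(tokens, &lex);
  std::string first = tz.next();  // scan $1
  (void) first;
  if (needs_lookahead)
    tz.next();                    // scanning $2 overwrites the flag
  return lex.text_string_is_7bit;
}

// Small demonstration of both cases.
static void demo() {
  assert(flag_seen_by_action_for_first_token(false) == true);   // flag matches $1
  assert(flag_seen_by_action_for_first_token(true) == false);   // flag describes $2
}
```

Without look-ahead the action sees the flag that belongs to $1; with one token of look-ahead it sees the flag of $2, even though $1 itself is pure 7-bit.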
A safe approach would be to return the LEX_STRING and the corresponding 7-bit/8-bit flag together as a single structure, like this:
struct Lex_string_with_metadata_st: public LEX_STRING
{
  bool m_is_8bit;
public:
  void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
  // Get string repertoire by the 8-bit flag and the character set
  uint repertoire(CHARSET_INFO *cs) const
  {
    return !m_is_8bit && my_charset_is_ascii_based(cs) ?
           MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
  // Get string repertoire by the 8-bit flag, for ASCII-based character sets
  uint repertoire() const
  {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};
and use this structure for TEXT_STRING and NCHAR_STRING.
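To show how the flag would travel with the token value, here is a minimal standalone sketch. It is not the server code: LEX_STRING, CHARSET_INFO, my_charset_is_ascii_based and the MY_REPERTOIRE_* constants are simplified stand-ins defined locally (their values are placeholders), and make_text_token is a hypothetical helper playing the role of the tokenizer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified stand-ins for server types and helpers (assumptions for
// this sketch only; the real definitions live in the server headers).
struct LEX_STRING { const char *str; std::size_t length; };
struct CHARSET_INFO { bool ascii_based; };
static bool my_charset_is_ascii_based(const CHARSET_INFO *cs) {
  return cs->ascii_based;
}
enum { MY_REPERTOIRE_ASCII = 1, MY_REPERTOIRE_UNICODE30 = 3 };  // placeholder values

// The string and its 8-bit flag are carried in one value, so a bison
// variable such as $1 always holds the flag that belongs to it.
struct Lex_string_with_metadata_st : public LEX_STRING {
  bool m_is_8bit;
  void set_8bit(bool is_8bit) { m_is_8bit = is_8bit; }
  // Repertoire derived from the 8-bit flag and the character set.
  unsigned int repertoire(const CHARSET_INFO *cs) const {
    return !m_is_8bit && my_charset_is_ascii_based(cs)
             ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};

// Hypothetical tokenizer helper: builds the token value and sets the
// flag at the moment the string is scanned.
static Lex_string_with_metadata_st make_text_token(const char *s) {
  Lex_string_with_metadata_st tok;
  tok.str = s;
  tok.length = std::strlen(s);
  bool is_8bit = false;
  for (std::size_t i = 0; i < tok.length; i++)
    if (static_cast<unsigned char>(s[i]) & 0x80) is_8bit = true;
  tok.set_8bit(is_8bit);
  return tok;
}
```

Because each token owns its flag, scanning a later token (for look-ahead or any other reason) cannot change what an earlier token reports from repertoire().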
The problem was revealed by valgrind in the bb-10.2-compatibility branch when extending this rule:
sp_proc_stmt_return:
          RETURN_SYM expr
        ;
to
sp_proc_stmt_return:
          RETURN_SYM expr
        | RETURN_SYM /* from a procedure */
        ;
We should fix this problem before making further changes to the grammar.
Issue Links
- blocks MDEV-10142 PL/SQL parser (Closed)