Our current grammar in sql_yacc.yy uses LEX_STRING to return the TEXT_STRING and NCHAR_STRING terminal symbols from the tokenizer, and additionally uses Lex->text_string_is_7bit to distinguish between 7-bit and 8-bit strings (for optimization purposes).
This approach is error-prone. Changes in the grammar that require more look-ahead can put Lex->text_string_is_7bit out of sync with the bison variables ($1, $2, $3, etc.), so that, for example, Lex->text_string_is_7bit corresponds to $2 instead of the expected $1.
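The hazard can be reproduced outside the server with a toy model. The sketch below is only an illustration under stated assumptions: FakeLex, Token, next_token and flag_seen_for_first are hypothetical names, not server code, and the "parser" is just a function that optionally reads one extra token before looking at the flag.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the shared parser state (not the real LEX class).
struct FakeLex {
  bool text_string_is_7bit = true;
};

// Hypothetical token: carries only the text, as LEX_STRING does today.
struct Token {
  std::string text;
};

// True if every byte of s is in the 7-bit ASCII range.
bool is_7bit(const std::string &s) {
  for (unsigned char c : s)
    if (c > 0x7F) return false;
  return true;
}

// Toy tokenizer: returns the next string and, as a side effect,
// overwrites the single shared flag with the flag of that string.
Token next_token(FakeLex &lex, const std::vector<std::string> &input,
                 std::size_t &pos) {
  Token t{input[pos++]};
  lex.text_string_is_7bit = is_7bit(t.text);
  return t;
}

// Simulates a grammar action for the first string ($1). If the rule
// needs one token of look-ahead, the tokenizer has already been called
// for $2 by the time the action reads the shared flag.
bool flag_seen_for_first(const std::vector<std::string> &input,
                         bool needs_lookahead) {
  FakeLex lex;
  std::size_t pos = 0;
  Token first = next_token(lex, input, pos);
  if (needs_lookahead)
    next_token(lex, input, pos);  // look-ahead overwrites the flag
  (void) first;
  return lex.text_string_is_7bit;  // may now describe $2, not $1
}
```

With the input {"abc", "\xC3\xA9"}, the action sees the correct flag for "abc" when no look-ahead is needed, and the flag of the second string when look-ahead is needed.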
A safer approach would be to return the LEX_STRING and the corresponding 7-bit/8-bit flag as a single structure, and to use this structure for TEXT_STRING and NCHAR_STRING.
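A minimal sketch of what such a combined structure could look like follows. All names here (Lex_string_sketch, Lex_string_with_flag, make_text_token) are hypothetical and chosen for illustration; Lex_string_sketch is a simplified stand-in for the server's LEX_STRING, and the actual patch may be shaped differently.

```cpp
#include <cstddef>

// Simplified stand-in for LEX_STRING: a pointer plus a length.
struct Lex_string_sketch {
  char *str;
  std::size_t length;
};

// Hypothetical combined value: the string and its 7-bit flag travel
// together. It is a plain aggregate, so it can be used as a member of
// a bison %union.
struct Lex_string_with_flag {
  Lex_string_sketch str;
  bool is_7bit;
};

// Hypothetical tokenizer helper: builds the token value and computes
// the flag at the same time, with no shared state involved.
Lex_string_with_flag make_text_token(char *buf, std::size_t len) {
  bool seven_bit = true;
  for (std::size_t i = 0; i < len; i++) {
    if (static_cast<unsigned char>(buf[i]) > 0x7F) {
      seven_bit = false;
      break;
    }
  }
  return Lex_string_with_flag{{buf, len}, seven_bit};
}
```

With a value type like this, a grammar action would read the flag from the symbol itself (for example $1.is_7bit), so it cannot drift out of sync with $1 regardless of how much look-ahead the rule needs.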
The problem was revealed by valgrind in the bb-10.2-compatibility branch when extending this rule:
Before making changes in the grammar we should fix this problem.