PL/SQL parser (MDEV-10142)

[MDEV-12411] Remove Lex::text_string_is_7bit Created: 2017-03-30  Updated: 2018-08-31  Resolved: 2017-04-04

Status: Closed
Project: MariaDB Server
Component/s: Parser, Stored routines
Affects Version/s: None
Fix Version/s: 10.3.0

Type: Technical task Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: Compatibility, refactoring

Issue Links:
Blocks
blocks MDEV-10142 PL/SQL parser Closed
Sprint: 10.2.2-3, 10.2.2-1, 10.2.2-2, 10.2.2-4, 10.1.18

 Description   

Our current grammar in sql_yacc.yy uses LEX_STRING to return the TEXT_STRING and NCHAR_STRING terminal symbols from the tokenizer, and additionally uses Lex->text_string_is_7bit to distinguish 7-bit from 8-bit strings (for optimization purposes).

This approach is error-prone. Grammar changes that require more look-ahead can put Lex->text_string_is_7bit out of sync with the bison variables ($1, $2, $3, etc.): for example, by the time the action for $1 runs, Lex->text_string_is_7bit may already describe $2 instead of the expected $1.
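The desynchronization can be reproduced outside the server with a toy model. The following is only a sketch: Token, LexState, Tokenizer and flag_seen_for_first_token are hypothetical names invented for illustration, and the "parser" is reduced to reading one optional look-ahead token before the action for the first token inspects the flag.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these names come from the MariaDB sources.
struct Token {
  std::string text;
};

// Side-channel state, playing the role of Lex->text_string_is_7bit.
struct LexState {
  bool text_string_is_7bit = true;
};

// True if every byte of the string is in the 7-bit range.
static bool is_7bit(const std::string &s) {
  for (unsigned char c : s)
    if (c > 0x7F) return false;
  return true;
}

// A toy tokenizer: every call to next() overwrites the side-channel flag.
class Tokenizer {
public:
  Tokenizer(std::vector<std::string> input, LexState &lex)
      : m_input(std::move(input)), m_lex(lex) {}

  Token next() {
    Token t{m_input.at(m_pos++)};
    m_lex.text_string_is_7bit = is_7bit(t.text);
    return t;
  }

private:
  std::vector<std::string> m_input;
  std::size_t m_pos = 0;
  LexState &m_lex;
};

// Models a grammar action for the first token ($1). When needs_lookahead
// is true, the parser fetches the next token ($2) before the action runs.
// Returns the flag value that the action would observe for $1.
bool flag_seen_for_first_token(const std::vector<std::string> &input,
                               bool needs_lookahead) {
  assert(input.size() >= (needs_lookahead ? 2u : 1u));
  LexState lex;
  Tokenizer tok(input, lex);
  tok.next();                       // $1
  if (needs_lookahead) tok.next();  // $2 is read before the action runs
  return lex.text_string_is_7bit;   // meant for $1, may describe $2
}
```

With a 7-bit first token and an 8-bit second token, the function reports the first token as 7-bit only when no look-ahead is read. With look-ahead, the flag already belongs to the second token.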

A safe approach would be to return the LEX_STRING and the corresponding 7/8 bit flag as a single structure like this:

struct Lex_string_with_metadata_st: public LEX_STRING
{
  bool m_is_8bit;
public:
  void set_8bit(bool is_8bit) { m_is_8bit= is_8bit; }
  // Get string repertoire by the 8-bit flag and the character set
  uint repertoire(CHARSET_INFO *cs) const
  {
    return !m_is_8bit && my_charset_is_ascii_based(cs) ?
           MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
  // Get string repertoire by the 8-bit flag, for ASCII-based character sets
  uint repertoire() const
  {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};

and use this structure for TEXT_STRING and NCHAR_STRING.
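For experimenting with the proposed structure outside the server tree, a self-contained sketch follows. LEX_STRING, CHARSET_INFO, my_charset_is_ascii_based and the MY_REPERTOIRE_* constants are simplified stand-ins defined here so that the code compiles on its own; the real definitions differ in detail. make_text_string is a hypothetical helper that plays the tokenizer's role; it is not part of the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Simplified stand-ins so the sketch compiles outside the server tree.
// The real definitions live in the MariaDB headers and differ in detail.
struct LEX_STRING {
  char *str;
  std::size_t length;
};
struct CHARSET_INFO {
  bool ascii_based;
};
static const unsigned int MY_REPERTOIRE_ASCII = 1;      // placeholder value
static const unsigned int MY_REPERTOIRE_UNICODE30 = 3;  // placeholder value
static bool my_charset_is_ascii_based(const CHARSET_INFO *cs) {
  return cs->ascii_based;
}

// The structure from the description: the string and its 7/8-bit flag
// travel together, so the flag always describes this exact token.
struct Lex_string_with_metadata_st : public LEX_STRING {
  bool m_is_8bit;
public:
  void set_8bit(bool is_8bit) { m_is_8bit = is_8bit; }
  // Get string repertoire by the 8-bit flag and the character set
  unsigned int repertoire(CHARSET_INFO *cs) const {
    return !m_is_8bit && my_charset_is_ascii_based(cs)
               ? MY_REPERTOIRE_ASCII
               : MY_REPERTOIRE_UNICODE30;
  }
  // Get string repertoire by the 8-bit flag, for ASCII-based character sets
  unsigned int repertoire() const {
    return !m_is_8bit ? MY_REPERTOIRE_ASCII : MY_REPERTOIRE_UNICODE30;
  }
};

// Hypothetical helper: builds a token value the way a tokenizer might,
// scanning the bytes once and storing the flag inside the value itself.
Lex_string_with_metadata_st make_text_string(char *s) {
  assert(s != nullptr);
  Lex_string_with_metadata_st res;
  res.str = s;
  res.length = std::strlen(s);
  bool is_8bit = false;
  for (std::size_t i = 0; i < res.length; i++) {
    if (static_cast<unsigned char>(s[i]) > 0x7F) is_8bit = true;
  }
  res.set_8bit(is_8bit);
  return res;
}
```

Because each token value carries its own flag, an action that reads $1 gets the metadata of $1 regardless of how many look-ahead tokens the parser has already requested.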

The problem was revealed by valgrind in the bb-10.2-compatibility branch when extending this rule:

sp_proc_stmt_return:
    RETURN_SYM expr
  ;

to

sp_proc_stmt_return:
    RETURN_SYM expr
  | RETURN_SYM /* from a procedure */
  ;

Before making changes in the grammar we should fix this problem.



 Comments   
Comment by Alexander Barkov [ 2017-04-04 ]

Addressed Sergei's review suggestions. Pushed to bb-10.2-ext and 10.3.
