[MDEV-12518] Unify sql_yacc.yy and sql_yacc_ora.yy Created: 2017-04-18  Updated: 2020-05-05  Resolved: 2019-12-04

Status: Closed
Project: MariaDB Server
Component/s: Parser
Fix Version/s: 10.5.1

Type: Task Priority: Critical
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-21073 Collect different grammar rules into ... Closed
is blocked by MDEV-14415 Add Oracle-style FOR loop to sql_mode... Closed
is blocked by MDEV-17652 Add sql_mode specific tokens for some... Closed
is blocked by MDEV-17660 sql_mode=ORACLE: Some keywords do not... Closed
is blocked by MDEV-17661 Add sql_mode specific tokens for the ... Closed
is blocked by MDEV-17664 Add sql_mode specific tokens for ':' ... Closed
is blocked by MDEV-17666 sql_mode=ORACLE: Keyword ELSEIF shoul... Closed
is blocked by MDEV-17669 Add sql_mode specific tokens for the ... Closed
is blocked by MDEV-17687 Add sql_mode specific tokens for keyw... Closed
is blocked by MDEV-17693 Shift/reduce conflicts for NAMES,ROLE... Closed
is blocked by MDEV-17694 Add method LEX::sp_proc_stmt_statemen... Closed
is blocked by MDEV-18767 Port "MDEV-16294: INSTALL PLUGIN IF N... Closed
is blocked by MDEV-18779 Port sp_suid implementation from sql_... Closed
is blocked by MDEV-18789 Port "MDEV-7773 Aggregate stored func... Closed
is blocked by MDEV-18796 Synchronize PS grammar between sql_ya... Closed
is blocked by MDEV-18813 PROCEDURE and anonymous blocks silent... Closed
is blocked by MDEV-19533 Add methods make() and append_uniq() ... Closed
is blocked by MDEV-19535 sql_mode=ORACLE: 'SELECT INTO @var FO... Closed
is blocked by MDEV-20913 sql_mode=ORACLE: INET6 does not work ... Closed
is blocked by MDEV-20924 Unify grammar rules: field_type_strin... Closed
is blocked by MDEV-20985 Add LEX methods stmt_drop_{function|p... Closed
is blocked by MDEV-21023 Move LEX methods and related function... Closed
is blocked by MDEV-21043 Collect different bison %type declara... Closed
is blocked by MDEV-21064 Add a new class sp_expr_lex and a new... Closed
is blocked by MDEV-21110 Unify turn_parser_debug_on() in sql_y... Closed
Relates
relates to MDEV-10142 PL/SQL parser Closed
relates to MDEV-10485 "Unreserve" MariaDB reserved keywords... Open
relates to MDEV-10764 PL/SQL parser - Phase 2 Open

 Description   

Unify sql_yacc.yy and sql_yacc_ora.yy

Maintaining two independent bison grammar files is difficult.

We should eventually:

  • either get rid of the second bison grammar file sql_yacc_ora.yy
  • or generate sql_yacc_ora.yy from sql_yacc.yy, for example with help of CMake utilities

In order to make this possible, we should start with making the two grammar as close to each other as possible.

As the measure of similarity we can use the following shell command:

(
diff -u sql_yacc.yy sql_yacc_ora.yy |grep @@|wc -l
diff -u sql_yacc.yy sql_yacc_ora.yy |wc -l
diff -u sql_yacc.yy sql_yacc_ora.yy |wc -c
)

It returns three numbers:

  • the number of chunks in unified diff
  • the number of lines in unified diff
  • the number of bytes in unified diff

As of 2017-04-18 the above command returns:

112
3039
94157

The difference between two grammar files include:

General

  • sql_yacc.yy contains 4 global functions, e.g. turn_parser_debug_on()
  • sql_yacc.yy around 11 LEX methods, e.g. LEX::case_stmt_action_expr()
  • Different number of shift/reduce conflicts
  • Different set of %type directives
  • sql_yacc_ora.yy has different rules for allowed keywords in identifiers: SP variables and labels: ident_directly_assignable, keyword_directly_assignable, keyword_directly_not_assignable, keyword, keyword_sp, keyword_sp_data_type

Data types

  • sql_yacc_ora.yy additionally understands RAW, NUMBER, VARCHAR2 data types
  • sql_yacc_ora.yy translates DATE to DATETIME
  • sql_yacc_ora.yy translates BLOB to LONGBLOB
  • sql_yacc_ora.yy splits field_type into field_type_numeric, field_type_temporal, field_type_string, field_type_lob, field_type_misc. This is to allow VARCHAR without length in SP parameters.
  • row_type_body vs field_type_row
  • sp_param_name_and_type in sql_yacc_ora.yy uses a new rule sp_param_type_with_opt_collate instead of type_with_opt_collate, because in stored procedure parameters length for the VARCHAR and VARCHAR2 data type is optional. sp_param_field_type_string, field_type, opt_field_length_default_sp_param_varchar, opt_field_length_default_sp_param_char
  • sql_yacc_ora.yy allows to cast as VARCHAR and VARCHAR2: cast_type

Statements

  • sql_yacc_ora.yy does not have begin in verb_clause
  • sql_yacc_ora.yy does not have a separate rule sp_unlabeled_block_not_atomic. BEGIN starts an anonymous block rather than a transaction
  • sql_yacc_ora.yy additionally has set_assign in statement
  • sql_yacc_ora.yy additionally has raise_stmt in statement
  • sql_yacc_ora.yy extends TRUNCATE with an optional storage clause: truncate, opt_truncate_table_storage_clause
  • sql_yacc_ora.yy allows to call stored procedures from other stored routines without using the keyword CALL: sp_statement

Flow control

  • sql_yacc_ora.yy additionally supports FOR loops: sp_labeled_control, sp_unlabeled_control.
  • WHILE expr DO..END WHILE vs WHILE expr LOOP..END LOOP
  • sql_yacc_ora.yy supports GOTO: sp_proc_stmt_goto, sp_proc_stmts1_implicit_block, trigger_tail, sp_tail
  • ELSEIF vs ELSIF

Declarations

  • sql_yacc_ora.yy supports variable declarations after cursor declarations
  • Different position of sp_opt_inout in sp_pdparam
  • sql_yacc_ora.yy supports IN OUT as two separate words in sp_opt_inout
  • Declarations in sql_yacc_ora.yy reside in a separate block section DECLARE..BEGIN, or AS..BEGIN. In sql_yacc.yy DECLARE is a directive in the beginning of BEGIN..END block.
  • sql_yacc_ora.yy has the EXCEPTION section in SP sp_labeled_block, sp_block_statements_and_exceptions, opt_exception_clause,
  • sql_yacc_ora.yy supports user defined exception declaration: signal_value, sp_hcond
  • TYPE OF name vs name%TYPE and ROW TYPE OF name vs name%ROWTYPE

Functions and operators

  • Different semantics and syntax for the function DECODE: column_default_non_parenthesized_expr, decode_when_list
  • Different Item types created to handle concatenation: simple_expr
  • sql_yacc_ora.yy does not support % as a synonym for MOD. % is used for cursor attributes. '%' has a %left precedence in sql_yacc.yy. '%' is not in a %left directive in sql_yacc_ora.yy. bit_expr
  • sql_yacc_ora.yy has implicit cursor attributes: sp_cursor_name_and_offset, explicit_cursor_attr, simple_expr, function_call_keyword
  • sql_yacc_ora.yy has explicit cursor attributes: function_call_keyword

Other syntax differences

  • sql_yacc_ora.yy support UNIQUE_SYM as a synonym to DISTICT
  • sql_yacc_ora.yy additionally supports an optional SP name after the last END keyword, for easier SP code readability: opt_sp_name, sp_tail_standalone, sf_tail_standalone.
  • SQLEXCEPTION_SYN vs OTHERS_SYM in sp_hcond
  • Different label format: sp_block_label, sp_labeled_control,
  • Different PS parameter syntax ? vs :1, :name: param_marker, colon_with_pos, simple_ident_q2, internal_variable_name_directly_assignable
  • sql_yacc_ora.yy supports empty SP parameter list without parentheses. This adds rules sp_no_param, opt_sp_parenthesized_fdparam_list, opt_sp_parenthesized_pdparam_list
  • sql_yacc_ora.yy removes the rule opt_savepoint and embeds it directly to rollback to remove a shift/reduce conflict

Directions

We can use the following directions to unify the two grammar files:

1. Move the different code from .yy files to .h or .cc files.
Examples:

  • global functions, e.g. turn_parser_debug_on()
  • LEX methods, e.g. LEX::case_stmt_action_expr()
  • Different Item types created to handle concatenation

2. Port missing features from sql_yacc_ora.yy to sql_yacc.yy

  • GOTO
  • variable declarations after cursor declarations

3. Move the different grammar rules to the end of *.yy files, so they reside in one chunk which can be easily removed or replaced. This will be helpful if we decide to generate sql_yacc_ora.yy from sql_yacc.yy.

4. Token substitution in MYSQLlex() in sql_lex.cc when sql_mode=ORACLE.
Examples:

  • BEGIN_SYM could be substituted to ORA_BEGIN_SYM, to handle BEGIN as a transaction start and BEGIN as an anonymous block start.
  • The token % can be substituted to ORA_PERCENT_SIGN, to handle the MOD operator vs cursor attributes.

5. Raise syntax errors conditionally, depending on sql_mode
Examples:

  • We could unify the grammar for anonymous blocks by adding the DECLARE_SYM branch to sql_yacc.yy, but then generate a syntax error if sql_mode is not ORACLE:

    sp_unlabeled_block:
            ...
            | DECLARE_SYM
              {
                if (!(thd->variables.sql_mode & MODE_ORACLE)
                {
                  thd->syntax_error();
                  MYSQL_YYABORT;
                }
              }
    

6. Copy safe grammar changes between sql_yacc_ora.yy and sql_yacc.yy
Examples:

  • The field_type rules in sql_yacc_ora.yy was split into parts:

    field_type:
              field_type_numeric
            | field_type_temporal
            | field_type_string
            | field_type_lob
            | field_type_misc
            ;
    

    This was done to be able to have VARCHAR without length as an SP parameter. This split is not really necessary in sql_yacc.yy, but it is not harmful and can be safely copied from sql_yacc_ora.yy to sql_yacc.yy.

  • The field_type_row rule in sql_yacc_ora.yy includes ROW_SYM:

    field_type_row:
              ROW_SYM '(' row_field_definition_list ')' { $$= $3; }
            ;
    

    In sql_yacc.yy it was replaced to row_type_body without ROW_SYM:

    row_type_body:
             '(' row_field_definition_list ')' { $$= $2; }
            ;
    

    This was done to avoid shift/reduce conflicts in this rule:

    sp_decl_variable_list:
              sp_decl_idents_init_vars type_with_opt_collate sp_opt_default {...}
            | sp_decl_idents_init_vars TYPE_SYM OF_SYM optionally_qualified_column_ident sp_opt_default {...}
            | sp_decl_idents_init_vars ROW_SYM TYPE_SYM OF_SYM optionally_qualified_column_ident sp_opt_default {...}
            | sp_decl_idents_init_vars ROW_SYM row_type_body sp_opt_default {...}
            ;
    

    to handle the following declarations without conflicts:

    DECLARE
      var ROW (a INT);
      var ROW TYPE OF t1.x;
    

This change is not really necessary for sql_yacc_ora.yy, but it's not harmful and can be copied from sql_yacc.yy to sql_yacc_ora.yy.



 Comments   
Comment by Alexander Barkov [ 2019-03-11 ]

Hi ralf.gebhardt@mariadb.com, I've been working on this. Hoped to have it in 10.4.

Comment by Alexander Barkov [ 2019-10-30 ]

julien.fritsch, this one is really time consuming. I'm working in it in small steps.

After Barcelona I have already pushed three dependency patches:

  • MDEV-20913 sql_mode=ORACLE: INET6 does not work as a routine parameter type and return type
  • MDEV-20924 Unify grammar rules: field_type_string and sp_param_field_type_string
  • sql_yacc_ora.yy: syncing grammar for "MDEV-18844 Implement EXCEPT ALL and INTERSECT ALL operations"

Sorry, can't give a good estimation. But this should certainly be done in 10.5.

Comment by Alexander Barkov [ 2019-11-26 ]

serg, please review:

https://github.com/MariaDB/server/commits/bb-10.5-mdev-12518

There are two commits.

Comment by Sergei Golubchik [ 2019-12-02 ]

Ugh. This isn't nice, but I suspect it's the best we can do now.

Why did you use both

//%{#ifdef SQL_MODE_ORACLE_SPECIFIC

and

/* Start SQL_MODE_ORACLE_SPECIFIC */

it would be better to use only one set of markers.

Comment by Alexander Barkov [ 2019-12-03 ]

Fixed to use this comment style only:

/* Start SQL_MODE_ORACLE_SPECIFIC */

Generated at Thu Feb 08 07:58:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.