Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12518

Unify sql_yacc.yy and sql_yacc_ora.yy

    Details

    • Type: Task
    • Status: Open (View Workflow)
    • Priority: Critical
    • Resolution: Unresolved
    • Fix Version/s: 10.5
    • Component/s: Parser
    • Labels:
      None

      Description

      Unify sql_yacc.yy and sql_yacc_ora.yy

      Maintaining two independent bison grammar files is difficult.

      We should eventually:

      • either get rid of the second bison grammar file sql_yacc_ora.yy
      • or generate sql_yacc_ora.yy from sql_yacc.yy, for example with help of CMake utilities

      In order to make this possible, we should start with making the two grammar as close to each other as possible.

      As the measure of similarity we can use the following shell command:

      (
      diff -u sql_yacc.yy sql_yacc_ora.yy |grep @@|wc -l
      diff -u sql_yacc.yy sql_yacc_ora.yy |wc -l
      diff -u sql_yacc.yy sql_yacc_ora.yy |wc -c
      )
      

      It returns three numbers:

      • the number of chunks in unified diff
      • the number of lines in unified diff
      • the number of bytes in unified diff

      As of 2017-04-18 the above command returns:

      112
      3039
      94157
      

      The difference between two grammar files include:

      General

      • sql_yacc.yy contains 4 global functions, e.g. turn_parser_debug_on()
      • sql_yacc.yy around 11 LEX methods, e.g. LEX::case_stmt_action_expr()
      • Different number of shift/reduce conflicts
      • Different set of %type directives
      • sql_yacc_ora.yy has different rules for allowed keywords in identifiers: SP variables and labels: ident_directly_assignable, keyword_directly_assignable, keyword_directly_not_assignable, keyword, keyword_sp, keyword_sp_data_type

      Data types

      • sql_yacc_ora.yy additionally understands RAW, NUMBER, VARCHAR2 data types
      • sql_yacc_ora.yy translates DATE to DATETIME
      • sql_yacc_ora.yy translates BLOB to LONGBLOB
      • sql_yacc_ora.yy splits field_type into field_type_numeric, field_type_temporal, field_type_string, field_type_lob, field_type_misc. This is to allow VARCHAR without length in SP parameters.
      • row_type_body vs field_type_row
      • sp_param_name_and_type in sql_yacc_ora.yy uses a new rule sp_param_type_with_opt_collate instead of type_with_opt_collate, because in stored procedure parameters length for the VARCHAR and VARCHAR2 data type is optional. sp_param_field_type_string, field_type, opt_field_length_default_sp_param_varchar, opt_field_length_default_sp_param_char
      • sql_yacc_ora.yy allows to cast as VARCHAR and VARCHAR2: cast_type

      Statements

      • sql_yacc_ora.yy does not have begin in verb_clause
      • sql_yacc_ora.yy does not have a separate rule sp_unlabeled_block_not_atomic. BEGIN starts an anonymous block rather than a transaction
      • sql_yacc_ora.yy additionally has set_assign in statement
      • sql_yacc_ora.yy additionally has raise_stmt in statement
      • sql_yacc_ora.yy extends TRUNCATE with an optional storage clause: truncate, opt_truncate_table_storage_clause
      • sql_yacc_ora.yy allows to call stored procedures from other stored routines without using the keyword CALL: sp_statement

      Flow control

      • sql_yacc_ora.yy additionally supports FOR loops: sp_labeled_control, sp_unlabeled_control.
      • WHILE expr DO..END WHILE vs WHILE expr LOOP..END LOOP
      • sql_yacc_ora.yy supports GOTO: sp_proc_stmt_goto, sp_proc_stmts1_implicit_block, trigger_tail, sp_tail
      • ELSEIF vs ELSIF

      Declarations

      • sql_yacc_ora.yy supports variable declarations after cursor declarations
      • Different position of sp_opt_inout in sp_pdparam
      • sql_yacc_ora.yy supports IN OUT as two separate words in sp_opt_inout
      • Declarations in sql_yacc_ora.yy reside in a separate block section DECLARE..BEGIN, or AS..BEGIN. In sql_yacc.yy DECLARE is a directive in the beginning of BEGIN..END block.
      • sql_yacc_ora.yy has the EXCEPTION section in SP sp_labeled_block, sp_block_statements_and_exceptions, opt_exception_clause,
      • sql_yacc_ora.yy supports user defined exception declaration: signal_value, sp_hcond
      • TYPE OF name vs name%TYPE and ROW TYPE OF name vs name%ROWTYPE

      Functions and operators

      • Different semantics and syntax for the function DECODE: column_default_non_parenthesized_expr, decode_when_list
      • Different Item types created to handle concatenation: simple_expr
      • sql_yacc_ora.yy does not support % as a synonym for MOD. % is used for cursor attributes. '%' has a %left precedence in sql_yacc.yy. '%' is not in a %left directive in sql_yacc_ora.yy. bit_expr
      • sql_yacc_ora.yy has implicit cursor attributes: sp_cursor_name_and_offset, explicit_cursor_attr, simple_expr, function_call_keyword
      • sql_yacc_ora.yy has explicit cursor attributes: function_call_keyword

      Other syntax differences

      • sql_yacc_ora.yy support UNIQUE_SYM as a synonym to DISTICT
      • sql_yacc_ora.yy additionally supports an optional SP name after the last END keyword, for easier SP code readability: opt_sp_name, sp_tail_standalone, sf_tail_standalone.
      • SQLEXCEPTION_SYN vs OTHERS_SYM in sp_hcond
      • Different label format: sp_block_label, sp_labeled_control,
      • Different PS parameter syntax ? vs :1, :name: param_marker, colon_with_pos, simple_ident_q2, internal_variable_name_directly_assignable
      • sql_yacc_ora.yy supports empty SP parameter list without parentheses. This adds rules sp_no_param, opt_sp_parenthesized_fdparam_list, opt_sp_parenthesized_pdparam_list
      • sql_yacc_ora.yy removes the rule opt_savepoint and embeds it directly to rollback to remove a shift/reduce conflict

      Directions

      We can use the following directions to unify the two grammar files:

      1. Move the different code from .yy files to .h or .cc files.
      Examples:

      • global functions, e.g. turn_parser_debug_on()
      • LEX methods, e.g. LEX::case_stmt_action_expr()
      • Different Item types created to handle concatenation

      2. Port missing features from sql_yacc_ora.yy to sql_yacc.yy

      • GOTO
      • variable declarations after cursor declarations

      3. Move the different grammar rules to the end of *.yy files, so they reside in one chunk which can be easily removed or replaced. This will be helpful if we decide to generate sql_yacc_ora.yy from sql_yacc.yy.

      4. Token substitution in MYSQLlex() in sql_lex.cc when sql_mode=ORACLE.
      Examples:

      • BEGIN_SYM could be substituted to ORA_BEGIN_SYM, to handle BEGIN as a transaction start and BEGIN as an anonymous block start.
      • The token % can be substituted to ORA_PERCENT_SIGN, to handle the MOD operator vs cursor attributes.

      5. Raise syntax errors conditionally, depending on sql_mode
      Examples:

      • We could unify the grammar for anonymous blocks by adding the DECLARE_SYM branch to sql_yacc.yy, but then generate a syntax error if sql_mode is not ORACLE:

        sp_unlabeled_block:
                ...
                | DECLARE_SYM
                  {
                    if (!(thd->variables.sql_mode & MODE_ORACLE)
                    {
                      thd->syntax_error();
                      MYSQL_YYABORT;
                    }
                  }
        

      6. Copy safe grammar changes between sql_yacc_ora.yy and sql_yacc.yy
      Examples:

      • The field_type rules in sql_yacc_ora.yy was split into parts:

        field_type:
                  field_type_numeric
                | field_type_temporal
                | field_type_string
                | field_type_lob
                | field_type_misc
                ;
        

        This was done to be able to have VARCHAR without length as an SP parameter. This split is not really necessary in sql_yacc.yy, but it is not harmful and can be safely copied from sql_yacc_ora.yy to sql_yacc.yy.

      • The field_type_row rule in sql_yacc_ora.yy includes ROW_SYM:

        field_type_row:
                  ROW_SYM '(' row_field_definition_list ')' { $$= $3; }
                ;
        

        In sql_yacc.yy it was replaced to row_type_body without ROW_SYM:

        row_type_body:
                 '(' row_field_definition_list ')' { $$= $2; }
                ;
        

        This was done to avoid shift/reduce conflicts in this rule:

        sp_decl_variable_list:
                  sp_decl_idents_init_vars type_with_opt_collate sp_opt_default {...}
                | sp_decl_idents_init_vars TYPE_SYM OF_SYM optionally_qualified_column_ident sp_opt_default {...}
                | sp_decl_idents_init_vars ROW_SYM TYPE_SYM OF_SYM optionally_qualified_column_ident sp_opt_default {...}
                | sp_decl_idents_init_vars ROW_SYM row_type_body sp_opt_default {...}
                ;
        

        to handle the following declarations without conflicts:

        DECLARE
          var ROW (a INT);
          var ROW TYPE OF t1.x;
        

      This change is not really necessary for sql_yacc_ora.yy, but it's not harmful and can be copied from sql_yacc.yy to sql_yacc_ora.yy.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                bar Alexander Barkov
                Reporter:
                bar Alexander Barkov
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated: