Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9706

Deprecate use of quoted strings as select sublist aliases

    XMLWordPrintable

Details

    Description

      MariaDB supports this non-standard syntax:

      SELECT x 'alias' FROM t1;
      

      where the quoted string 'alias' is an alias.

      Problems

      There are a few problems with this syntax.

      1. Exceptions: a quoted string is not always an alias

      In some contexts a quoted string cannot be an alias, which is confusing:

      • Multi-chunk string literals

        SELECT 'str' 'alias';
        

        returns:

        +----------+
        | str      |
        +----------+
        | stralias |
        +----------+
        

        Notice, the 'alias' part here is concatenated to the string, according to the SQL standard. Therefore, a quoted string cannot be an alias for a string literal.

      • Temporal literals

        SELECT DATE '2001-01-01';
        

        returns

        +-------------------+
        | DATE '2001-01-01' |
        +-------------------+
        | 2001-01-01        |
        +-------------------+
        

        Notice, the '2001-01-01' part is not an alias here. A combination of the keywords DATE, TIME and TIMESTAMP followed by a quoted string is treated as a temporal literal, according to the SQL standard. Therefore, a quoted string cannot be an alias for column names DATE, TIME, TIMESTAMP.

      2. Shift/reduce conflicts.

      The above exception produce shift/reduce conflicts in sql_yacc.yy. Removing the "| TEXT_STRING_sys" options from the select_alias rule:

      select_alias:
                /* empty */ { $$=null_lex_str;}
              | AS ident { $$=$2; }
              | AS TEXT_STRING_sys { $$=$2; }
              | ident { $$=$1; }
              | TEXT_STRING_sys { $$=$1; }
              ;
      

      removes 4 shift/reduce conflicts.
      There is an option that we'll replace the current bison syntax analyzer to something else in the future. It would be nice to clean up all shift/reduce conflicts eventually, as they will be show stoppers for a new analyzer.

      3. Conflicts with the Unicode character string literals.

      We'll eventually add the SQL standard Unicode character string literals, e.g.:

      SELECT U&'\00FF' 'alias';
      

      The 'alias' part in the above query should be concatenated to the string, it's not an alias!

      4. Conflicts with multi-chunk hex literals.

      We'll eventually add the SQL standard multi-chunk hex literals, e.g.:

      SELECT X'60' '61';
      

      The above query should return a binary string 0x6061. '61' is not an alias!

      5. Conflicts with binary and nchar literals:

      We support this SQL standard syntax for binary and nchar literals:

      SELECT X'60';
      SELECT N'60';
      

      But we don't support this standard syntax (notice the space between X/N and the quoted string):

      SELECT X '60';
      SELECT N '60';
      

      According to the SQL standard, the space does not matter. To avoid conflicts, it's currently implemented on the tokenizer level in sql_lex.cc. It should be implemented on the syntax parser level instead.

      6. Conflicts with the non-standard typified literals.

      The SQL standard defines DATE, TIME, TIMESTAMP literals as the data type name followed by a quoted string:

      SELECT DATE '2001-01-01';
      SELECT TIME '10:20:30';
      SELECT TIMESTAMP '2001-01-01 10:20:30';
      

      PostgreSQL extends this idea to all other data types, e.g.:

      • A literal of the INET data type:

        bar=> SELECT INET'00f::000f';
         inet 
        ------
         f::f
        (1 row)
        

      • A literal of the BIGINT data type:

        bar=> SELECT BIGINT'00001';
         int8 
        ------
            1
        (1 row)
        

      Teradata also uses a similar syntax for the PERIOD data type literals:

      SELECT * FROM t1 WHERE flight_period = PERIOD '(08:00:00-08:00 - 15:40:00-05:00)';
      

      IBM DB2 uses a similar syntax for the GRAPHIC data type literals and UCS-2 graphic string literals:

      SELECT G'double-byte character string' FROM t1;
      SELECT UX'0042006F006200620079' FROM t1;
      

      We want to reuse the same idea for pluggable data types, and possibly even for built-in data types.

      Plan

      We'll deprecate use of quoted string literals as aliases, as follows:
      1. In 10.5 we'll add a warning:

      Using string literals as aliases is deprecated. Use non-delimited or backtick-delimited identifiers instead.
      

      2. In 10.5 we'll start using this syntax for known pluggable data types:

      SELECT INET6'00f::000f';
      

      • If INET6 is a known data type, this will be treated as an INET6 litersl
      • If INET6 is not a known data type, this will be treated in the old way: as an identifier followed by an alias

      3. In 10.6 we'll disallow this syntax by default and will return an error instead of the warning.
      4. In 10.6 we'll add a new sql_mode value STRING_AS_ALIAS, to switch to the legacy behavior when needed.
      5. In 10.6 we'll start using this syntax for all data types:

      SELECT INET6'00f::000f';
      

      to mean typified literals in the default sql_mode, so the above expression will mean a literal of the INET6 data type with 'f::f' as a value.
      Note, with sql_mode=STRING_AS_ALIAS the above query will resolve INET6 as an identifier (e.g. a column or an SP variable name) and '00f::000f' as an alias, for backward compatibility.

      Single-quoted and Double-quoted strings in sql_mode=default

      Note, in the sql_mode=default (when ANSI_QUOTES is not enabled) we'll deprecate use of both single and double quoted strings, so these queries will return an error:

      SELECT x 'alias' FROM t1;
      SELECT x "alias" FROM t1;
      

      Bar originally proposed to deprecate only single quotes, but Monty and Serg are afraid that it will be confusing, because MariaDB does not distinguish between single and double string elsewhere (when ANSI_QUOTES is not set).

      Note, this change will assume that this SQL-standard compliant query:

      SELECT x "alias" FROM t1;
      

      will stop working in the default sql_mode. But Monty and Serg think that this is OK: "alias" is a string here, it's not an identifier! Those users who strongly need this syntax can use sql_mode=ANSI_QUOTES or sql_mode=STRING_AS_ALIAS.

      Quoted strings and AS

      This syntax:

      SELECT x AS 'alias' FROM t1;
      SELECT x AS "alias" FROM t1;
      

      does not seem to cause any conflicts. We could still support this. However, for consistency and to move closer towards the SQL standard we'll also deprecate this syntax.

      Note, similar to the previous paragraph, this change will assume that this SQL-standard compliant query:

      SELECT x AS "alias" FROM t1;
      

      will stop working with the default sql_mode. But Monty and Serg think that this is OK: "alias" is a string here, it's not an identifier! Those users who strongly need this syntax can use sql_mode=ANSI_QUOTES or sql_mode=STRING_AS_ALIAS.

      Attachments

        Activity

          People

            Unassigned Unassigned
            bar Alexander Barkov
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.