Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25154

JSON_TABLE: Queries involving ordinality columns are unsafe for statement binlog and should be marked as such

Details

    Description

      ORDINALITY columns in some ways behave as auto-increment, and as auto-increment they can cause a non-deterministic result, so they should be marked as unsafe for statement binary log (produce a warning with binlog_format=statement and make binary logging switch to row with binlog_format=mixed). If it's easier, I suppose it can be extended onto any statements involving JSON_TABLE.

      Below is an example I have made up to demonstrate the point within the current MariaDB implementation; but in general, nothing in the standard guarantees any particular order in which values are returned and hence ordinality is assigned, and it's already clear that the order is different between implementations (e.g. between MariaDB and MySQL), and will likely variate between versions.

      bb-10.6-mdev17399-psergey2 8b533cc1d5

      MariaDB [test]> create table t (a int, key(a));
      Query OK, 0 rows affected (0.037 sec)
       
      MariaDB [test]> insert into t values (30),(20),(10);
      Query OK, 3 rows affected (0.013 sec)
      Records: 3  Duplicates: 0  Warnings: 0
      

      MariaDB [test]> create or replace table tt as select * from json_table((select json_arrayagg(a) from t), '$[*]' columns (o for ordinality, b int path '$')) t;
      Query OK, 3 rows affected (0.032 sec)
      Records: 3  Duplicates: 0  Warnings: 0
       
      MariaDB [test]> select * from tt;
      +------+------+
      | o    | b    |
      +------+------+
      |    1 |   10 |
      |    2 |   20 |
      |    3 |   30 |
      +------+------+
      3 rows in set (0.001 sec)
      

      MariaDB [test]> create or replace table tt as select * from json_table((select json_arrayagg(a) from t ignore index(a)), '$[*]' columns (o for ordinality, b int path '$')) t;
      Query OK, 3 rows affected (0.042 sec)
      Records: 3  Duplicates: 0  Warnings: 0
       
      MariaDB [test]> select * from tt;
      +------+------+
      | o    | b    |
      +------+------+
      |    1 |   30 |
      |    2 |   20 |
      |    3 |   10 |
      +------+------+
      3 rows in set (0.001 sec)
      

      Attachments

        Issue Links

          Activity

            This is not unique to JSON_TABLE. Here's an example with just GROUP_CONCAT:

            MariaDB [j4]> select @@log_bin, @@binlog_format;
            +-----------+-----------------+
            | @@log_bin | @@binlog_format |
            +-----------+-----------------+
            |         1 | STATEMENT       |
            +-----------+-----------------+
            1 row in set (0.001 sec)
            

            create table t1 (a int, key(a));
            insert into t1 values (30),(20),(10);
            create or replace table t2 as select group_concat(a) as A from t1;
            create or replace table t3 as select group_concat(a) as A from t1 ignore index(a);
            

            MariaDB [j4]> select * from t2;
            +----------+
            | A        |
            +----------+
            | 10,20,30 |
            +----------+
            1 row in set (0.000 sec)
            

            MariaDB [j4]> select * from t3;
            +----------+
            | A        |
            +----------+
            | 30,20,10 |
            +----------+
            1 row in set (0.000 sec)
            

            in the binlog, both CREATE...SELECT queries are logged as statements, and no error is given.

            psergei Sergei Petrunia added a comment - This is not unique to JSON_TABLE. Here's an example with just GROUP_CONCAT: MariaDB [j4]> select @@log_bin, @@binlog_format; +-----------+-----------------+ | @@log_bin | @@binlog_format | +-----------+-----------------+ | 1 | STATEMENT | +-----------+-----------------+ 1 row in set (0.001 sec) create table t1 (a int , key (a)); insert into t1 values (30),(20),(10); create or replace table t2 as select group_concat(a) as A from t1; create or replace table t3 as select group_concat(a) as A from t1 ignore index (a); MariaDB [j4]> select * from t2; +----------+ | A | +----------+ | 10,20,30 | +----------+ 1 row in set (0.000 sec) MariaDB [j4]> select * from t3; +----------+ | A | +----------+ | 30,20,10 | +----------+ 1 row in set (0.000 sec) in the binlog, both CREATE...SELECT queries are logged as statements, and no error is given.

            I think the fix is just to do this:

              thd->lex->set_stmt_unsafe(LEX::BINLOG_STMT_UNSAFE_SYSTEM_FUNCTION);
            

            when creating the function

            psergei Sergei Petrunia added a comment - I think the fix is just to do this: thd->lex->set_stmt_unsafe(LEX::BINLOG_STMT_UNSAFE_SYSTEM_FUNCTION); when creating the function
            psergei Sergei Petrunia added a comment - - edited

            Take-away from the discussion: the source of "non-determinsm" wrt MySQL is that one can write constructs like so:

            JSON_TABLE(doc, 
              '$.*' columns (
                col1 for ordinality,
                col2 int path '$')
            ) 
            

            This will enumerate all object members in the doc and assign them numbers. The issue is that object members can appear either in the order they appeared in the doc (current MariaDB) or re-ordered by their name (MySQL 8).

            psergei Sergei Petrunia added a comment - - edited Take-away from the discussion: the source of "non-determinsm" wrt MySQL is that one can write constructs like so: JSON_TABLE(doc, '$.*' columns ( col1 for ordinality, col2 int path '$' ) ) This will enumerate all object members in the doc and assign them numbers. The issue is that object members can appear either in the order they appeared in the doc (current MariaDB) or re-ordered by their name (MySQL 8).

            Proposed solution: mark the statement SBR-unsafe if it is using JSON_TABLE() with a FOR ORDINALITY column. Need to discuss this.

            psergei Sergei Petrunia added a comment - Proposed solution: mark the statement SBR-unsafe if it is using JSON_TABLE() with a FOR ORDINALITY column. Need to discuss this.

            That was only a partial take away from the discussion.

            The main point is – neither the standard nor, currently, MariaDB specify any particular order in which the values should be returned, and strangely the standard does not even make any mention of it being implementation-specific in the context of ORDINALITY. It means that not only the difference with MySQL is in this case legitimate (and it wouldn't be a sufficient reason to worry about replication, because as far as JSON functions are involved, MariaDB and MySQL are too different to allow statement-based replication anyway), but also it can change any time within MariaDB itself.

            A totally different approach could have been to "fix" the order once for all: since the standard doesn't say anything at all about it, MariaDB could have declared in its own specification/documentation, on top of the standard requirements, that it always deterministically returns and will return values in <this particular order>. It would have limited possibilities for future refactoring or other internal changes which affect the order, but on the other hand, it would have probably eliminated the need to mark it as unsafe for binlog (still needs to be carefully considered). Another positive side of it would have been that, as we know from the past experience, whenever the outcome is not guaranteed, and even when it's explicitly documented as non-deterministic, users take the de-facto result as a baseline and vehemently complain when the outcome does change. The guaranteed order would have meant stricter approach to maintenance, but less variance for users.

            elenst Elena Stepanova added a comment - That was only a partial take away from the discussion. The main point is – neither the standard nor, currently, MariaDB specify any particular order in which the values should be returned, and strangely the standard does not even make any mention of it being implementation-specific in the context of ORDINALITY. It means that not only the difference with MySQL is in this case legitimate (and it wouldn't be a sufficient reason to worry about replication, because as far as JSON functions are involved, MariaDB and MySQL are too different to allow statement-based replication anyway), but also it can change any time within MariaDB itself. A totally different approach could have been to "fix" the order once for all: since the standard doesn't say anything at all about it, MariaDB could have declared in its own specification/documentation, on top of the standard requirements, that it always deterministically returns and will return values in <this particular order>. It would have limited possibilities for future refactoring or other internal changes which affect the order, but on the other hand, it would have probably eliminated the need to mark it as unsafe for binlog (still needs to be carefully considered). Another positive side of it would have been that, as we know from the past experience, whenever the outcome is not guaranteed, and even when it's explicitly documented as non-deterministic, users take the de-facto result as a baseline and vehemently complain when the outcome does change. The guaranteed order would have meant stricter approach to maintenance, but less variance for users.
            psergei Sergei Petrunia added a comment - http://lists.askmonty.org/pipermail/commits/2021-April/014565.html

            ok to push as discussed on slack.

            holyfoot Alexey Botchkov added a comment - ok to push as discussed on slack.

            People

              psergei Sergei Petrunia
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.