[MDEV-26678] JSON_EQUALS validates invalid JSON and valid JSON as being equal when using a backslash Created: 2021-09-25  Updated: 2021-09-27  Resolved: 2021-09-25

Status: Closed
Project: MariaDB Server
Component/s: Parser
Affects Version/s: 10.7
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Roel Van de Paar Assignee: Sergei Golubchik
Resolution: Not a Bug Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-23143 Add JSON_EQUALS function to check JSO... Closed
Relates
relates to MDEV-26679 JSON_VALID validates invalid JSON Closed

 Description   

JSON_EQUALS validates invalid JSON and valid JSON as being equal when using a backslash:

10.7.0 d552e092c9f3e20da078d1b62b976f629f73d3a4 (Debug)

10.7.0-dbg>SELECT JSON_EQUALS('{"A": 1}', '{"A": 1\}');
+--------------------------------------+
| JSON_EQUALS('{"A": 1}', '{"A": 1\}') |
+--------------------------------------+
|                                    1 |
+--------------------------------------+
1 row in set (0.001 sec)

A simplified version is this (though '{}' is not valid JSON by itself):

10.7.0 d552e092c9f3e20da078d1b62b976f629f73d3a4 (Debug)

10.7.0-dbg>SELECT JSON_EQUALS('{}', '{\}');
+--------------------------+
| json_equals('{}', '{\}') |
+--------------------------+
|                        1 |
+--------------------------+
1 row in set (0.000 sec)



 Comments   
Comment by Roel Van de Paar [ 2021-09-25 ]

Whilst there is (now) MDEV-26679, this cannot be cause here.

Comment by Roel Van de Paar [ 2021-09-25 ]

Same parsing issue

10.7.0 d552e092c9f3e20da078d1b62b976f629f73d3a4 (Debug)

10.7.0-dbg>SELECT JSON_EQUALS('\{"A": 1}', '{"A": 1}');
+--------------------------------------+
| JSON_EQUALS('\{"A": 1}', '{"A": 1}') |
+--------------------------------------+
|                                    1 |
+--------------------------------------+
1 row in set (0.001 sec)

Comment by Roel Van de Paar [ 2021-09-25 ]

Also consider this output

10.7.0 d552e092c9f3e20da078d1b62b976f629f73d3a4 (Debug)

10.7.0-dbg>SELECT JSON_EQUALS('\{"A": 1\}', '\{"A": 1\}');
+-----------------------------------------+
| JSON_EQUALS('\{"A": 1\}', '\{"A": 1\}') |
+-----------------------------------------+
|                                       1 |
+-----------------------------------------+
1 row in set (0.000 sec)
 
10.7.0-dbg>SELECT JSON_EQUALS('\{"A": 1\\}', '\{"A": 1\}');
+------------------------------------------+
| JSON_EQUALS('\{"A": 1\\}', '\{"A": 1\}') |
+------------------------------------------+
|                                     NULL |
+------------------------------------------+
1 row in set (0.000 sec)
 
10.7.0-dbg>SELECT JSON_EQUALS('\{"A": 1\\}', '\{"A": 1\\}');
+-------------------------------------------+
| JSON_EQUALS('\{"A": 1\\}', '\{"A": 1\\}') |
+-------------------------------------------+
|                                      NULL |
+-------------------------------------------+
1 row in set (0.000 sec)

Comment by Roel Van de Paar [ 2021-09-25 ]

10.7.0 d552e092c9f3e20da078d1b62b976f629f73d3a4 (Debug)

10.7.0-dbg>SELECT JSON_EQUALS('{}', '{ \ \ \ \ \ }');
+------------------------------------+
| JSON_EQUALS('{}', '{ \ \ \ \ \ }') |
+------------------------------------+
|                                  1 |
+------------------------------------+
1 row in set (0.000 sec)

Comment by Eric Herman [ 2021-09-25 ]

The JSON_EQUALS sql function relies upon JSON_NORMALIZE.

As you probably expect, in order for Item_func_json_equals::val_int() to return a valid result and not a NULL, the input result from each call to json_normalize() must return a 0 error code (like EXIT_SUCCESS).

Quickly hacking in a unit test for the underlying json_normalize library function showed me that, as expected, the "bad" string with the backslash before the closing curly-brace, like this:

 "{\"A\": 1\\}"

returns a non-zero return value, thus is an error.

Here is the diff of the test I hacked together:

diff --git a/unittest/json_lib/json_normalize-t.c b/unittest/json_lib/json_normalize-t.c
index f72e90175e2..dd9f3c88fe8 100644
--- a/unittest/json_lib/json_normalize-t.c
+++ b/unittest/json_lib/json_normalize-t.c
@@ -68,6 +68,11 @@ test_json_normalize_invalid(void)
   ok(json_normalize(&result, NULL, 0, cs) != 0,
      "expected normalized error");
   dynstr_free(&result);
+
+  init_dynamic_string(&result, NULL, 0, 0);
+  ok(json_normalize(&result, STRING_WITH_LEN("{\"A\": 1\\}"), cs) != 0,
+     "expected normalized error");
+  dynstr_free(&result);
 }
 
 
@@ -243,7 +248,7 @@ check_number_normalize(const char *in, const char *expected)
 int
 main(void)
 {
-  plan(88);
+  plan(89);
   diag("Testing json_normalization.");
 
   check_number_normalize("0", "0.0E0");

This makes me suspect that the "bad" string which is being sent in to the Item_func_json_equals method is not being properly escaped.

Comment by Eric Herman [ 2021-09-25 ]

Similarly, the underlying json_valid works as I would expect for these two strings:

diff --git a/unittest/json_lib/json_lib-t.c b/unittest/json_lib/json_lib-t.c
index 378ebe201f5..bcbee11c6b4 100644
--- a/unittest/json_lib/json_lib-t.c
+++ b/unittest/json_lib/json_lib-t.c
@@ -170,17 +170,32 @@ test_search()
   ok(n_matches == 3, "search");
 }
 
+static void test_json_valid(void)
+{
+  const char *str= NULL;
+  CHARSET_INFO *cs= &my_charset_utf8mb4_general_ci;
+  int valid= 0;
+
+  str= "{\"A\": 1}";
+  valid= json_valid(str, strlen(str), cs);
+  ok(valid, "expected valid '%s'", str);
+
+  str= "{\"A\": 1\\}";
+  valid= json_valid(str, strlen(str), cs);
+  ok(!valid, "expected not valid '%s'", str);
+}
 
 int main()
 {
   ci= &my_charset_utf8mb3_general_ci;
 
-  plan(6);
+  plan(8);
   diag("Testing json_lib functions.");
 
   test_json_parsing();
   test_path_parsing();
   test_search();
+  test_json_valid();
 
   return exit_status();
 }

Comment by Sergei Golubchik [ 2021-09-25 ]

JSON_EQUAL returns 1 because strings are not only json-equal, they're bytewise identical:

MariaDB [test]> SELECT JSON_EQUALS('{"A": 1}', '{"A": 1\}');
+--------------------------------------+
| JSON_EQUALS('{"A": 1}', '{"A": 1\}') |
+--------------------------------------+
|                                    1 |
+--------------------------------------+
1 row in set (0.001 sec)
 
MariaDB [test]> SELECT '{"A": 1}' = '{"A": 1\}';
+--------------------------+
| '{"A": 1}' = '{"A": 1\}' |
+--------------------------+
|                        1 |
+--------------------------+
1 row in set (0.001 sec)
 
MariaDB [test]> SELECT '{"A": 1\}';
+----------+
| {"A": 1} |
+----------+
| {"A": 1} |
+----------+
1 row in set (0.001 sec)

See https://mariadb.com/kb/en/string-literals/

Comment by Eric Herman [ 2021-09-27 ]

As was shown in MDEV-26679 this works as expected with sql_mode=no_backslash_escapes or if the slashes are escaped:

MariaDB [test]> SELECT JSON_EQUALS('{"A": 1}', '{"A": 1\}');
+--------------------------------------+
| JSON_EQUALS('{"A": 1}', '{"A": 1\}') |
+--------------------------------------+
|                                    1 |
+--------------------------------------+
1 row in set (0.000 sec)
 
MariaDB [test]> SELECT JSON_EQUALS('{"A": 1}', '{"A": 1\\}');
+---------------------------------------+
| JSON_EQUALS('{"A": 1}', '{"A": 1\\}') |
+---------------------------------------+
|                                  NULL |
+---------------------------------------+
1 row in set (0.000 sec)
 
MariaDB [test]> set sql_mode=no_backslash_escapes;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [test]> SELECT JSON_EQUALS('{"A": 1}', '{"A": 1\}');
+--------------------------------------+
| JSON_EQUALS('{"A": 1}', '{"A": 1\}') |
+--------------------------------------+
|                                 NULL |
+--------------------------------------+
1 row in set (0.000 sec)
 
MariaDB [test]>

Generated at Thu Feb 08 09:47:08 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.