Details
Description
From https://github.com/MariaDB/server/pull/1920
There is a scientific notation bug that uses the e notation which was discovered many years ago and was used to bypass some Web Application Firewall.
With more research, I found out more use case to this bug and that it is possible to bypass modern WAF as the query seems to be invalid but is valid.
So the following query that seems to be invalid:
MariaDB [test]> create table test (id int unsigned primary key not null);
|
Query OK, 0 rows affected (0.03 sec)
|
|
MariaDB [test]> insert into test select * from seq_1_to_3;
|
Query OK, 3 rows affected (0.01 sec)
|
Records: 3 Duplicates: 0 Warnings: 0
|
|
MariaDB [test]> select id 1.e, char 10.e(id 2.e), concat 3.e('a'12356.e,'b'1.e,'c'1.1234e)1.e, 12 1.e*2 1.e, 12 1.e/2 1.e, 12 1.e|2 1.e, 12 1.e^2 1.e, 12 1.e%2 1.e, 12 1.e&2 from test;
|
+----+------------------------+------------------------------------------+---------------+---------------+---------------+---------------+---------------+---------------+
|
| id | 1.e, char 10.e(id 2.e) | concat 3.e('a'12356.e,'b'1.e,'c'1.1234e) | 1.e, 12 1.e*2 | 1.e, 12 1.e/2 | 1.e, 12 1.e|2 | 1.e, 12 1.e^2 | 1.e, 12 1.e%2 | 1.e, 12 1.e&2 |
|
+----+------------------------+------------------------------------------+---------------+---------------+---------------+---------------+---------------+---------------+
|
| 1 | | abc | 24 | 6.0000 | 14 | 14 | 0 | 0 |
|
| 2 | | abc | 24 | 6.0000 | 14 | 14 | 0 | 0 |
|
| 3 | | abc | 24 | 6.0000 | 14 | 14 | 0 | 0 |
|
+----+------------------------+------------------------------------------+---------------+---------------+---------------+---------------+---------------+---------------+
|
3 rows in set (0.01 sec)
|
Is in fact valid and the same as this query:
mysql> select id, char(id), concat('a','b','c'), 12*2, 12/2, 12|2, 12^2, 12%2, 12&2 from test.test; |
+------+--------------------+---------------------+------+--------+------+------+------+------+ |
| id | char(id) | concat('a','b','c') | 12*2 | 12/2 | 12|2 | 12^2 | 12%2 | 12&2 | |
+------+--------------------+---------------------+------+--------+------+------+------+------+ |
| 1 | 0x01 | abc | 24 | 6.0000 | 14 | 14 | 0 | 0 |
|
| 2 | 0x02 | abc | 24 | 6.0000 | 14 | 14 | 0 | 0 |
|
| 3 | 0x03 | abc | 24 | 6.0000 | 14 | 14 | 0 | 0 |
|
+------+--------------------+---------------------+------+--------+------+------+------+------+ |
3 rows in set (0.00 sec) |
The bug is that when the code think that it is a float token containing a dot and the e notion, it does a check if the following character is a digit and if not, set the state to "MY_LEX_CHAR" which is the equivalent to "MY_LEX_SKIP" and drop the token completely. Therefore, all the "1.e" that you see in the previous query are dropped from the query completely. Note that that number in front or after the dot does not matter and that the dot is mandatory for it to work.
The bug works by following any of these characters to the notation:
( ) . , | & % * ^ /
|
The fix should be enough to simply abort the query if no digit follows the "dot e notation". I tested it locally and it worked, however, more tests should be done to make sure it does not break anything else.
How can this PR be tested?
Since the bug happens only with the "dot e notation" followed by any of these characters:
( ) . , | & % * ^ /
|
It would only require you to write a query with any of these about. Also, the query provided in the description could be used (or any part of it individually).