[MDEV-23359] Lex_input_stream: buggy code duplication in scan_ident_start() and scan_ident_sysvar()
Created: 2020-07-31  Updated: 2020-08-01

Status: Open
Project: MariaDB Server
Component/s: Parser
Affects Version/s: 10.3, 10.4, 10.5
Fix Version/s: 10.5

Type: Bug
Priority: Major
Reporter: Alexander Barkov
Assignee: Alexander Barkov
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Blocks
is blocked by MDEV-23037 Multibyte character sets parse identi... Open

 Description   

There is substantial code duplication between:

  • Lex_input_stream::scan_ident_start()
  • Lex_input_stream::scan_ident_sysvar()

The latter appears to be buggy: it does not handle bad bytes correctly.
The following queries demonstrate this:

EXECUTE IMMEDIATE CONCAT('SELECT @@session',0xC2,'.autocommit');

ERROR 1300 (HY000): Invalid utf8 character string: 'session\xC2'

EXECUTE IMMEDIATE CONCAT('SELECT @@session',0xFF,'.autocommit');

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '?.autocommit' at line 1

The former error message is wrong. The byte 0xC2 is erroneously scanned as part of the identifier, although it is not followed by a valid multi-byte tail.

The latter error message is correct. 0xFF cannot be part of a UTF-8 sequence, so the tokenizer scans 'session' as expected and then fails to get a token during the next lex_one_token() call.
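The expected behavior in both cases can be illustrated with a minimal standalone sketch. This is not the server's code: utf8_seq_len() and scan_ident_len() are hypothetical names, the identifier rule (ASCII alphanumerics, '_' and '$') is a simplifying assumption, and only 1- to 3-byte sequences are checked. The point is only that a lead byte such as 0xC2 without a valid continuation byte should end the identifier, exactly as 0xFF does.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if b is a UTF-8 continuation byte (10xxxxxx).
static bool is_cont(unsigned char b) { return (b & 0xC0) == 0x80; }

// Hypothetical helper: length of the well-formed UTF-8 sequence that
// starts at s[pos], or 0 if the bytes there are not well formed.
// Simplified: 4-byte sequences and surrogate ranges are not handled.
std::size_t utf8_seq_len(const std::string &s, std::size_t pos)
{
  if (pos >= s.size())
    return 0;
  const unsigned char b0 = static_cast<unsigned char>(s[pos]);
  if (b0 < 0x80)
    return 1;                                  // plain ASCII
  if (b0 >= 0xC2 && b0 <= 0xDF)                // 2-byte lead
  {
    if (pos + 1 >= s.size())
      return 0;
    return is_cont(static_cast<unsigned char>(s[pos + 1])) ? 2 : 0;
  }
  if (b0 >= 0xE0 && b0 <= 0xEF)                // 3-byte lead
  {
    if (pos + 2 >= s.size())
      return 0;
    const unsigned char b1 = static_cast<unsigned char>(s[pos + 1]);
    const unsigned char b2 = static_cast<unsigned char>(s[pos + 2]);
    if (!is_cont(b1) || !is_cont(b2))
      return 0;
    if (b0 == 0xE0 && b1 < 0xA0)               // overlong encoding
      return 0;
    return 3;
  }
  return 0;                                    // 0x80..0xC1, 0xF0..0xFF
}

// Hypothetical helper: number of bytes of the identifier starting at
// s[pos]. A bad byte ends the identifier instead of being consumed.
std::size_t scan_ident_len(const std::string &s, std::size_t pos)
{
  const std::size_t start = pos;
  while (pos < s.size())
  {
    const unsigned char b = static_cast<unsigned char>(s[pos]);
    if (b < 0x80)
    {
      if (!(std::isalnum(b) || b == '_' || b == '$'))
        break;                                 // ASCII delimiter such as '.'
      ++pos;
    }
    else
    {
      const std::size_t len = utf8_seq_len(s, pos);
      if (len == 0)
        break;                                 // bad byte: stop before it
      pos += len;
    }
  }
  return pos - start;
}
```

Under these assumptions, scanning "session" followed by either 0xC2 or 0xFF stops after the seven bytes of 'session', so both queries would fail in the same way on the next token.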

The buggy method scan_ident_sysvar() should be removed, and scan_ident_start() should be used instead.

