[MDEV-6845] Unable to specify escape character for FMT Created: 2014-10-06  Updated: 2014-10-10  Resolved: 2014-10-09

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Connect
Affects Version/s: 10.0.14
Fix Version/s: 10.0.15

Type: Bug Priority: Major
Reporter: Christian Rishøj Assignee: Olivier Bertrand
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

Ubuntu trusty



 Description   

The request path in the 2nd line of the following sample has a trailing quotation mark, which is escaped with a backslash:

imusic.dk:80 5.255.253.131 - - [20/Jul/2014:10:47:25 +0200] "GET /image.php?id=2090503593091&type=thumbnail HTTP/1.1" 302 266 "-" "Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)"
imusic.dk:80 157.55.39.105 - - [20/Jul/2014:10:47:25 +0200] "GET /cd/5053105444328/christoffer-hoeyer-2012-silent-songs-about-things-that-don-t-happen-cd\" HTTP/1.1" 301 335 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

Here's a CREATE string to query the log data using the CONNECT engine:

CREATE TABLE access_logs (
  server_name CHAR(30)          NOT NULL FIELD_FORMAT = ' %n%s%n',
  remote_host CHAR(15)          NOT NULL FIELD_FORMAT = ' %n%s%n',
  time        DATETIME          NOT NULL FIELD_FORMAT = ' - - [%n%s%n +0200]' 
    DATE_FORMAT = 'DD/MMM/YYYY:hh:mm:ss',
  request     VARCHAR(16384)    NOT NULL FIELD_FORMAT = ' "%n%[^"]%n"',
  status      SMALLINT UNSIGNED NOT NULL FIELD_FORMAT = ' %n%d%n',
  bytes_sent  INT UNSIGNED      NOT NULL FIELD_FORMAT = ' %n%d%n',
  referer     VARCHAR(16384)    NOT NULL FIELD_FORMAT = ' "%n%[^"]%m',
  user_agent  VARCHAR(512)      NOT NULL FIELD_FORMAT = '" "%n%[^"]%n"'
)
  ENGINE = CONNECT
  TABLE_TYPE = FMT
  FILE_NAME = '/tmp/text.txt';

Querying the first line works well:

MariaDB [test]> SELECT * FROM access_logs LIMIT 1;
+--------------+---------------+---------------------+---------------------------------------------------------+--------+------------+---------+---------------------------------------------------------------------+
| server_name  | remote_host   | time                | request                                                 | status | bytes_sent | referer | user_agent                                                          |
+--------------+---------------+---------------------+---------------------------------------------------------+--------+------------+---------+---------------------------------------------------------------------+
| imusic.dk:80 | 5.255.253.131 | 2014-07-20 10:47:02 | GET /image.php?id=2090503593091&type=thumbnail HTTP/1.1 |    302 |        266 | -       | Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots) |
+--------------+---------------+---------------------+---------------------------------------------------------+--------+------------+---------+---------------------------------------------------------------------+
1 row in set, 1 warning (0.01 sec)

For the second line, I get an error:

MariaDB [test]> SELECT * FROM access_logs LIMIT 2 OFFSET 1;
ERROR 1296 (HY000): Got error 122 'Bad format line 2 field 5 of access_logs' from CONNECT

I suppose the engine chokes on the escaped quotation mark (\"), and assume that this case requires the ability to specify an escape character (which in turn might prove troublesome, in case the parsing is sscanf-based).



 Comments   
Comment by Olivier Bertrand [ 2014-10-09 ]

Unfortunately the parsing of FMT tables is sscanf-based indeed. I see no solution for this problem so far.

However, if the number of such lines is not to big in your table, what you can do temporarily is:

ALTER TABLE access_logs OPTION_LIST='maxerr=5000,accept=1';

Where maxerr is set to a number larger than the number of bad lines. Doing so the query:

SELECT * FROM access_logs;

returns:

server_name remote_host time request status bytes_sent referer user_agent
imusic.dk:80 5.255.253.131 2014-07-20 10:47:02 GET /image.php?id=2090503593091&type=thumbnail HTTP/1.1 302 266 - Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)
imusic.dk:80 157.55.39.105 2014-07-20 10:47:02 GET /cd/5053105444328/christoffer-hoeyer-2012-silent-songs-about-things-that-don-t-happen-cd|0 0

NOTE: According to the documentation, it would have been enough to specify

ALTER TABLE access_logs OPTION_LIST='accept';

However this does not work. This bug will be fixed in next version.

By the way, when doing the SELECT command, I got two warnings:

Level Code Message
Warning 1105 Out of range value for column time at row 1
Warning 1105 Out of range value for column time at row 2

Indeed the result seems truncated.
This is another bug that I have fixed for new versions.

OEM:
Of course, another solution would be to write an OEM table type with parsing based on lex/yacc. It is what could be done for a customer having a need for using such files. It is precisely the purpose of the OEM type.

Generated at Thu Feb 08 07:15:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.