[MDEV-6217] CONNECT engine cannot handle varchar fields containing newline Created: 2014-05-07  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.10
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Minor
Reporter: Martin WaIte Assignee: Andrew Hutchings
Resolution: Unresolved Votes: 0
Labels: connect-engine
Environment:

windows 7 64 bit (actually v10.0.10 without galera)



 Description   

Given a CSV file with a varchar field containing a newline:

f1,f2
1,"simple ""text"" field"
2,"includes newline
in field"
3,"simple"

I can create a CONNECT CSV table:

CREATE TABLE `simple` (
  `f1` int(1) NOT NULL,
  `f2` varchar(200) NOT NULL
) ENGINE=CONNECT DEFAULT CHARSET=utf8 `TABLE_TYPE`='csv' `FILE_NAME`='c:/Program Files/MariaDB 10.0/data/mw/simple.csv' `SEP_CHAR`=',' `QCHAR`='"' `HEADER`=1;

But selecting from the table fails:

MariaDB [mw]> select * from simple;
ERROR 1296 (HY000): Got error 122 'Missing ending quote in simple field 2 line 2' from CONNECT

This makes the CONNECT CSV table unusable for my application.



 Comments   
Comment by Olivier Bertrand [ 2014-05-10 ]

Sure enough for most file table types (except BIN, VEC and XML) CONNECT regards a physical line of the file as a table row.
For it your CSV file has four rows, the second and third being wrong because not having the required number of fields.
Currently, if you have only a few fields containing newline characters, a bypass is to use the MAXERR option to skip them:

alter table simple option_list='maxerr=10';

Giving a MAXERR value big enough to skip all wrong lines (each newline character will make 2 wrong lines)
Another alternative is to use the CSV engine that handles field containing newline characters.

I will see if I can do the same in the special case of CSV tables but it is not in my list of priorities, sorry about that...

Generated at Thu Feb 08 07:10:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.