[MCOL-501] cpimport string with \ and \\ in the end Created: 2017-01-09 Updated: 2017-06-03 Resolved: 2017-06-03 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | cpimport |
| Affects Version/s: | None |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Major |
| Reporter: | Igor Kostyukovski | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Description |
|
/usr/local/mariadb/columnstore/bin/cpimport r8 test /var/spin_sync/test1 -E '|' -s '\t' -C '\' -e 4000 we can't load this file without errors. |
| Comments |
| Comment by Igor Kostyukovski [ 2017-01-09 ] |
|
i'm sorry |
| Comment by Daniel Lee (Inactive) [ 2017-04-21 ] |
|
In line 4 of the data, there is "php|". If the "|" is part of the data, then line 4 does not have enough data values for the row. If that "|" is meant to be a column separator, then the extra \ needs to be removed. If the "\" meant to be part of the data, then need to use " |
| Comment by David Thompson (Inactive) [ 2017-05-08 ] |
|
Hi Igor, can you confirm if the advice from Daniel addresses the issue or not? |
| Comment by Igor Kostyukovski [ 2017-05-09 ] |
|
but we load a millions(~200m) urls each minute to database, file just an example, that maria cpimport has a bug. i misunderstood suggestion from Daniel, i see only one two ways for now both of this ways are wrong, as i say problem exists only on columstore engine |
| Comment by Andrew Hutchings (Inactive) [ 2017-05-09 ] |
|
Hi Igor, What Daniel is basically saying is the options you supplied do not fit the file format use supplied as the example. You have specified a backslash as the escape character which means you are escaping the enclosed character in line 4 of your example. This will cause an error. If the intention was to have a backslash in the data then this needs to be escaped by having a double-backslash, as is standard in CSV style file formats. I'm not an Infobright user but if you are specifying the same parameters to it I can only imagine it is working around this by pattern matching the enclosed characters first before processing escape characters. This has its own set of problems, for example an unescaped enclosed character or two in the data can cause the pattern matching to pick the wrong data for a column. Unfortunately the problem is either in the data or the options supplied, not in ColumnStore in this instance. ColumnStore is behaving as intended / documented. To code a workaround to this specific use case would incur a performance hit for all imports and would still have other potential related issues. What you could do is pipe the data through sed and then through cpimport. Some of the performance hit of sed you observed would be gone because the extra file write step would not be needed. |
| Comment by Igor Kostyukovski [ 2017-05-09 ] |
|
Thanks for your answers! May be you can give advise how we can format csv to load this data to Columnstore without errors? For now we skip ~4% of collected urls with this problem. This ticket was created a long time ago, and we spent much time trying to load data and to be honest i don't remember details. |
| Comment by Andrew Hutchings (Inactive) [ 2017-05-09 ] |
|
Hi Igor, No problem. If your intention is to enclose with pipes and escape with backslashes then with the paths in column 8 if you could filter them so backslashes in the paths are themselves escaped with backslashes then this will load fine. Your paths look like URL paths, if they are you could consider URL encoding them under RFC 3986, then backslashes in the paths become "%5C". Alternatively if you don't ever have any non-printable ASCII characters in the paths you could use these as escape and enclose characters. |
| Comment by David Thompson (Inactive) [ 2017-06-03 ] |
|
System is working as designed. |