[MCOL-1774] mcsimport - enclose by character support and escape character for enclose by char Created: 2018-10-05  Updated: 2023-10-26  Resolved: 2018-11-09

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.2.0
Fix Version/s: 1.2.1

Type: New Feature Priority: Major
Reporter: Jens Röwekamp (Inactive) Assignee: Zdravelina Sokolovska (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Windows, Linux


Attachments: File MariaDB ColumnStore mcsimport-1.2.1-1-x64.msi    
Issue Links:
Duplicate
is duplicated by MCOL-1842 missing options escape character and ... Closed
Relates
relates to MCOL-1805 Remote mcsimport tool is trowing War... Closed
Sprint: 2018-19, 2018-20

 Description   

Add options for enclose-by characters, such as quotation marks around text fields, and for escape characters that escape these enclose-by characters.

Use the same command line parameters that are used in cpimport.



 Comments   
Comment by Jens Röwekamp (Inactive) [ 2018-11-01 ]

Done in this ticket:

  • support for enclose by and escape characters added
  • additional header command line parameter added; if set, the first line of the CSV file is ignored
  • CSV input compatibility with RFC 4180 achieved
  • internal documentation on GitHub updated
  • tests added to the regression suite
  • test suite successfully executed on Win 10 and CentOS 7

Used CSV input format definition:

mcsimport's CSV input file format, derived from RFC 4180 and cpimport
 
   1.  Each record is located on a separate line, delimited by a line
       break (LF or CRLF).  For example:
 
       aaa,bbb,ccc LF
       zzz,yyy,xxx LF
       aaa,bbb,ccc CRLF
       zzz,yyy,xxx CRLF
 
   2.  The last record in the file may or may not have an ending line
       break.  For example:
 
       aaa,bbb,ccc LF
       zzz,yyy,xxx
 
   3.  There may be an optional header line appearing as the first line
       of the file with the same format as normal record lines. This
       header will contain names corresponding to the fields in the file
       and should contain the same number of fields as the records in
       the rest of the file. The presence of the header line should be
       indicated via the optional "header" command line argument of
       mcsimport. If a header is specified, it can be used as a reference
       in mcsimport's mapping file. [Possible extension for the future]

       For example:
 
       field_name,field_name,field_name LF
       aaa,bbb,ccc LF
       zzz,yyy,xxx LF
 
   4.  Within the header and each record, there may be one or more
       fields, separated by a delimiter (default comma). Each line
       should contain the same number of fields throughout the file.
       Spaces are considered part of a field and should not be ignored.
       The last field in the record must not be followed by a delimiter.

       For example:
 
       aaa,bbb,ccc
 
   5.  Each field may or may not be enclosed by an enclosing character
       (default double quotes). If fields are not enclosed by an enclosing
       character, the enclosing character must not appear inside the fields.

       For example:
 
       "aaa","bbb","ccc" LF
       zzz,yyy,xxx
	
   6.  Fields containing line breaks (LF or CRLF), delimiters, or enclosing
       characters should be enclosed in enclosing characters.  For example:
 
       "aaa","b LF
       bb","ccc" LF
       zzz,yyy,xxx
	
   7.  If enclosing characters are used to enclose fields, then an enclosing
       character appearing inside a field must be escaped by preceding it with
       an escaping character (default double quotes).  For example:
 
       "aaa","b""bb","ccc"
 
   8.  An escaping character can escape itself when used in an enclosed field.
       For example, with \ as the escaping character:

       "aaa","b\\bb\\","c\cc"

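The rules above can be sketched as a small parser. The following Python is only an illustration of the format definition, not mcsimport's implementation. The function name parse_records and its parameters are hypothetical. One behaviour is an assumption, because the definition does not specify it: an escaping character followed by anything other than the enclosing or escaping character is kept literally (so c\cc stays c\cc).

```python
from typing import Iterator, List


def parse_records(text: str, delimiter: str = ",", enclose: str = '"',
                  escape: str = '"', header: bool = False) -> Iterator[List[str]]:
    """Yield one list of field strings per record (hypothetical helper)."""
    fields: List[str] = []   # completed fields of the current record
    buf: List[str] = []      # characters of the field being read
    in_enclosed = False      # True while inside an enclosed field
    pending = False          # True once the current record has any content
    skip = header            # rule 3: drop the first record if it is a header
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        nxt = text[i + 1] if i + 1 < n else ""
        if in_enclosed:
            if ch == escape and nxt in (enclose, escape):
                buf.append(nxt)          # rules 7 and 8: escaped character
                i += 2
                continue
            if ch == enclose:
                in_enclosed = False      # closing enclosing character
            else:
                buf.append(ch)           # rule 6: delimiters and line breaks kept
        elif ch == enclose and not buf:
            in_enclosed = True           # rule 5: field starts with enclosing char
            pending = True
        elif ch == delimiter:
            fields.append("".join(buf))  # rule 4: delimiter ends the field
            buf = []
            pending = True
        elif ch == "\n" or (ch == "\r" and nxt == "\n"):
            if ch == "\r":
                i += 1                   # rule 1: consume CRLF as one line break
            fields.append("".join(buf))
            if not skip:
                yield fields
            skip = False
            fields, buf, pending = [], [], False
        else:
            buf.append(ch)               # rule 4: spaces are part of the field
            pending = True
        i += 1
    if in_enclosed:
        raise ValueError("unterminated enclosed field")
    if pending and not skip:
        fields.append("".join(buf))      # rule 2: last record without line break
        yield fields


sample = '"aaa","b""bb","ccc"\nzzz,yyy,xxx'
for record in parse_records(sample):
    print(record)
```

With the defaults, the sample prints ['aaa', 'b"bb', 'ccc'] followed by ['zzz', 'yyy', 'xxx']. Passing escape="\\" switches to the backslash variant described in rule 8, and header=True drops the first record.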
For QA:

  • review added test cases and add additional test cases to the regression suite if you see fit
  • execute the regression test suite on Windows, CentOS 7, and one Debian/Ubuntu operating system