Done in this ticket:
- support for enclosing and escape characters added
- additional "header" command line parameter added; if set, the first line of the CSV file is ignored
- CSV input compatibility with RFC 4180 achieved
- internal documentation on GitHub updated
- tests added to the regression suite
- test suite successfully executed on Win 10 and CentOS 7
Used CSV input format definition:
mcsimport's CSV input file format, derived from RFC 4180 and cpimport:

1. Each record is located on a separate line, delimited by a line
   break (LF or CRLF). For example:

      aaa,bbb,ccc LF
      zzz,yyy,xxx LF

      aaa,bbb,ccc CRLF
      zzz,yyy,xxx CRLF

2. The last record in the file may or may not have an ending line
   break. For example:

      aaa,bbb,ccc LF
      zzz,yyy,xxx

3. There may be an optional header line appearing as the first line
   of the file, with the same format as normal record lines. This
   header will contain names corresponding to the fields in the file
   and should contain the same number of fields as the records in
   the rest of the file. The presence of the header line should be
   indicated via the optional "header" command line argument of
   mcsimport. If a header is specified, it can be used as a reference
   in mcsimport's mapping file. [Possible extension for the future]

   For example:

      field_name,field_name,field_name LF
      aaa,bbb,ccc LF
      zzz,yyy,xxx LF

4. Within the header and each record, there may be one or more
   fields, separated by a delimiter (default: comma). Each line
   should contain the same number of fields throughout the file.
   Spaces are considered part of a field and should not be ignored.
   The last field in the record must not be followed by a delimiter.

   For example:

      aaa,bbb,ccc

5. Each field may or may not be enclosed by an enclosing character
   (default: double quote). If a field is not enclosed by the enclosing
   character, the enclosing character must not appear inside that field.

   For example:

      "aaa","bbb","ccc" LF
      zzz,yyy,xxx

6. Fields containing line breaks (LF or CRLF), delimiters, or enclosing
   characters should be enclosed in enclosing characters. For example:

      "aaa","b LF
      bb","ccc" LF
      zzz,yyy,xxx

7. If enclosing characters are used to enclose fields, then an enclosing
   character appearing inside a field must be escaped by preceding it with
   an escaping character (default: double quote). For example:

      "aaa","b""bb","ccc"

8. An escaping character can escape itself when used in an enclosed field.
   For example, with \ as the escaping character:

      "aaa","b\\bb\\","c\cc"

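For reviewers who want to cross-check expected parse results, the rules above can be approximated with Python's standard csv module. This is only a sketch for comparison and not mcsimport's implementation: the function name read_records and its parameters (delimiter, enclose, escape, header) are made up for illustration, and Python's handling of corner cases may differ from mcsimport's.

```python
import csv
import io


def read_records(text, delimiter=",", enclose='"', escape=None, header=False):
    """Parse CSV text roughly along rules 1-8 (hypothetical helper)."""
    # Rule 7 default: the escaping character equals the enclosing character,
    # so an enclosing character inside a field is written twice ("" -> ").
    doubled = escape is None or escape == enclose
    reader = csv.reader(
        # newline="" keeps LF/CRLF untouched so the reader sees them as-is
        # (rules 1 and 6).
        io.StringIO(text, newline=""),
        delimiter=delimiter,
        quotechar=enclose,
        doublequote=doubled,
        escapechar=None if doubled else escape,  # rule 8: e.g. backslash
        skipinitialspace=False,  # rule 4: spaces are part of the field
        strict=True,  # raise csv.Error on malformed enclosed fields
    )
    rows = list(reader)
    # Rule 3: with header=True the first line is treated as a header and skipped.
    return rows[1:] if header else rows


if __name__ == "__main__":
    sample = 'name,comment\r\n"aaa","b""bb"\r\nzzz,"y\nyy"'
    for row in read_records(sample, header=True):
        print(row)
```

With the default settings the doubled enclosing character in "b""bb" is read as b"bb, and the line break inside "y LF yy" stays part of the field. Passing escape="\\" switches to backslash escaping as in rule 8.
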
For QA:
- review added test cases and add additional test cases to the regression suite if you see fit
- execute the regression test suite on Windows, CentOS 7, and one Debian/Ubuntu operating system