[MCOL-1805] Remote mcsimport tool is throwing Warning message: Column size of input file is higher than the column size of the target table. Remaining csv columns won't be injected. Created: 2018-10-15  Updated: 2023-10-26  Resolved: 2018-11-14

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.2.0
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Zdravelina Sokolovska (Inactive) Assignee: Jens Röwekamp (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

linux


Issue Links:
Relates
relates to MCOL-1774 mcsimport - enclose by character supp... Closed
relates to MCOL-1842 missing options escape character and ... Closed

 Description   

Remote mcsimport tool is throwing Warning message: Column size of input file is higher than the column size of the target table. Remaining csv columns won't be injected.

A.
file loaded with the local cpimport tool

 ./cpimport  tpcds_1000 catalog_page  /home/qa-user/MDC1/mariadb-columnstore-tpcds/mm1/QA/tpcds_1000/catalog_page.tbl
2018-10-15 17:42:28 (30383) INFO : Running distributed import (mode 1) on all PMs...
2018-10-15 17:42:31 (30383) INFO : For table tpcds_1000.catalog_page: 30000 rows processed and 30000 rows inserted.
2018-10-15 17:42:31 (30383) INFO : Bulk load completed, total run time : 2.76213 seconds

B.
the same file loaded with the remote mcsimport tool

./mcsimport tpcds_1000 catalog_page  dd/tpcds_1000/catalog_page.tbl -c Columnstore.xml -d "|"
Warning: Column size of input file is higher than the column size of the target table.
Remaining csv columns won't be injected.
Execution time: 1.62413s
Rows inserted: 30000
Truncation count: 0
Saturated count: 0
Invalid count: 0



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2018-10-15 ]

Can you please provide a test case for this so we can reproduce it?

Comment by Jens Röwekamp (Inactive) [ 2018-10-29 ]

I don't think that's a bug in mcsimport but a bug in tpcds. tpcds creates a table with 9 columns via:

create table catalog_page
(
    cp_catalog_page_sk        integer               not null,
    cp_catalog_page_id        char(16)              not null,
    cp_start_date_sk          integer                       ,
    cp_end_date_sk            integer                       ,
    cp_department             varchar(50)                   ,
    cp_catalog_number         integer                       ,
    cp_catalog_page_number    integer                       ,
    cp_description            varchar(100)                  ,
    cp_type                   varchar(100)                  ,
    primary key (cp_catalog_page_sk)
);

but wants to inject a csv file with 10 columns:

1|AAAAAAAABAAAAAAA|2450815|2450996|DEPARTMENT|1|1|In general basic characters welcome. Clearly lively friends conv|bi-annual|
2|AAAAAAAACAAAAAAA|2450815|2450996|DEPARTMENT|1|2|English areas will leave prisoners. Too public countries ought to become beneath the years. |bi-annual|
3|AAAAAAAADAAAAAAA|2450815|2450996|DEPARTMENT|1|3|Times could not address disabled indians. Effectively public ports c|bi-annual|
4|AAAAAAAAEAAAAAAA|2450815|||1|||bi-annual|
5|AAAAAAAAFAAAAAAA|2450815|2450996|DEPARTMENT|1|5|Classic buildings ensure in a tests. Real years may not receive open systems. Now broad m|bi-annual|
6|AAAAAAAAGAAAAAAA|2450815|2450996|DEPARTMENT|1|6|Exciting principles wish greatly only excellent women. Appropriate fortunes shall not|bi-annual|
7|AAAAAAAAHAAAAAAA|2450815|2450996|DEPARTMENT|1|7|National services must not come at least into a girls|bi-annual|
8|AAAAAAAAIAAAAAAA|2450815|2450996|DEPARTMENT|1|8|Areas see early for a pounds. New goods study too serious women. Unwittingly sorry incentives shall|bi-annual|
9|AAAAAAAAJAAAAAAA|2450815|2450996|DEPARTMENT|1|9|Intensive, economic changes resist bloody of course simple economies; |bi-annual|
10|AAAAAAAAKAAAAAAA|2450815|2450996|DEPARTMENT|1|10|Careful, intense funds balance perhaps boys. Romantic chips remove legs. Direct birds get |bi-annual|
11|AAAAAAAALAAAAAAA|2450815|2450996|DEPARTMENT|1|11|At least national countries live by an sales. Weap|bi-annual|
12|AAAAAAAAMAAAAAAA|2450815|2450996|DEPARTMENT|1|12|Girls indicate so in a countries. Natural, emotional weeks try a|bi-annual|
13|AAAAAAAANAAAAAAA|2450815|2450996|DEPARTMENT|1|13|Miles see mainly clear hands. Villages finish there blue figures. Moreover wide students travel poo|bi-annual|
14|AAAAAAAAOAAAAAAA|2450815|2450996|DEPARTMENT|1|14|Rooms would say ago economic sections. Essential properties might not support groups. Ago rare eye|bi-annual|
15|AAAAAAAAPAAAAAAA|2450815|2450996|DEPARTMENT|1|15|Legal, required ends may not improve in the pictures. Really social structur|bi-annual|
16|AAAAAAAAABAAAAAA|2450815|2450996|DEPARTMENT|1|16|Schools must know now empty legs; generally daily children use sharp, loca|bi-annual|
17|AAAAAAAABBAAAAAA|2450815|2450996|DEPARTMENT|1|17|More than true carers can ensure at a officers. Candidates s|bi-annual|

The last pipe is interpreted by mcsimport as a 10th column with only NULL values.

Therefore, it warns the customer that the input file has more columns than the target ColumnStore table and only injects the first 9 columns.
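
The count of 10 can be reproduced with a plain delimiter split. The following is a minimal Python sketch, not mcsimport's actual parser; `count_fields` is a hypothetical helper introduced only for illustration, and the sample row is line 4 of the file quoted above.

```python
def count_fields(line: str, delimiter: str = "|") -> int:
    """Count fields the way a plain delimiter split sees them (illustrative only)."""
    # Drop only the line terminator, never the delimiter itself.
    return len(line.rstrip("\r\n").split(delimiter))


# Line 4 of catalog_page.tbl as quoted above: 9 values followed by a trailing pipe.
row = "4|AAAAAAAAEAAAAAAA|2450815|||1|||bi-annual|\n"

fields = row.rstrip("\r\n").split("|")
print(count_fields(row))  # 10
print(repr(fields[-1]))   # '' (the empty field after the last pipe)
```

Under this reading the trailing pipe separates the 9th value from an empty 10th field, which is one more than the 9 columns of catalog_page.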

Comment by Zdravelina Sokolovska (Inactive) [ 2018-10-30 ]

When we use the local cpimport tool, i.e. the built-in cpimport in the MCS server, that issue is not observed; refer to point A.
Could you please check what the default escape character of the remote cpimport tool is?
In addition, there is no escape character option in the remote mcsimport:

# ./mcsimport --help
Usage: ./mcsimport database table input_file [-m mapping_file] [-c Columnstore.xml] [-d delimiter] [-df date_format] [-default_non_mapped]

Comment by Zdravelina Sokolovska (Inactive) [ 2018-10-30 ]

It's not clear what the default values of the escape character and enclosed-by character
of the remote mcsimport tool are.
In addition, no such options are provided:

# ./mcsimport --help

Usage: ./mcsimport database table input_file [-m mapping_file] [-c Columnstore.xml] [-d delimiter] [-df date_format] [-default_non_mapped]

The other thing is that when the same file is imported via the cpimport tool from the CS server, no such problem is observed.

Comment by Jens Röwekamp (Inactive) [ 2018-10-30 ]

Hi winstone,
please provide me with the CSV specification you are referring to (that clears up the CSV separator ending ambiguity) so that I can implement it accordingly.

In my opinion this case is ambiguous: cpimport treats it one way without warning and mcsimport the other. In both cases the CSV file gets injected successfully. But with an official specification stating otherwise I'm happy to change it, even though cpimport is itself inconsistent in its column counting.

If you try to import a file with one more column and end it with a separator like in the case above, it states 11 instead of 10 columns.

e.g. with this input data:

1|AAAAAAAABAAAAAAA|2450815|2450996|DEPARTMENT|1|1|In general basic characters welcome. Clearly lively friends conv|bi-annual|a|
2|AAAAAAAACAAAAAAA|2450815|2450996|DEPARTMENT|1|2|English areas will leave prisoners. Too public countries ought to become beneath the years. |bi-annual|b|
3|AAAAAAAADAAAAAAA|2450815|2450996|DEPARTMENT|1|3|Times could not address disabled indians. Effectively public ports c|bi-annual|c|
4|AAAAAAAAEAAAAAAA|2450815|||1|||bi-annual|d|
5|AAAAAAAAFAAAAAAA|2450815|2450996|DEPARTMENT|1|5|Classic buildings ensure in a tests. Real years may not receive open systems. Now broad m|bi-annual|e|

These error messages are reported in the input_file.Job.err file:

Line number 1;  Error: Data contains wrong number of columns; num fields expected-9; num fields found-11
Line number 2;  Error: Data contains wrong number of columns; num fields expected-9; num fields found-11
Line number 3;  Error: Data contains wrong number of columns; num fields expected-9; num fields found-11
Line number 4;  Error: Data contains wrong number of columns; num fields expected-9; num fields found-11
Line number 5;  Error: Data contains wrong number of columns; num fields expected-9; num fields found-11
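
The two readings of a trailing separator can be put side by side. This is a hedged Python sketch of the ambiguity only; it does not reproduce the internals of cpimport or mcsimport, and `split_separator` and `split_terminator` are hypothetical names. The sample rows are line 4 of each data set quoted above.

```python
def split_separator(line: str, delimiter: str = "|") -> list:
    """Treat the delimiter strictly as a separator: a trailing one opens an empty field."""
    return line.rstrip("\r\n").split(delimiter)


def split_terminator(line: str, delimiter: str = "|") -> list:
    """Treat a single trailing delimiter as a line terminator and drop the empty field."""
    fields = split_separator(line, delimiter)
    if fields and fields[-1] == "":
        fields.pop()
    return fields


nine_values = "4|AAAAAAAAEAAAAAAA|2450815|||1|||bi-annual|"
ten_values = "4|AAAAAAAAEAAAAAAA|2450815|||1|||bi-annual|d|"

print(len(split_separator(nine_values)), len(split_terminator(nine_values)))  # 10 9
print(len(split_separator(ten_values)), len(split_terminator(ten_values)))    # 11 10
```

The separator reading yields 11 fields for the second data set, which matches the "num fields found-11" messages, while the terminator reading yields the 9 fields that the table expects for the first data set. Whether either tool actually works this way is an assumption.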

The enclose-by character and escape character are missing and are specified as a feature request in MCOL-1774.
By default cpimport doesn't use any escape and an enclose-by character of '\'. Therefore, the absence of both in mcsimport shouldn't have an impact on this bug.

Comment by Zdravelina Sokolovska (Inactive) [ 2018-10-30 ]

The same behavior as with cpimport would be expected:

[root@um1 tpcds_1]# cpimport  tpcds_1  catalog_page  catalog_page.tbl
2018-10-30 15:26:23 (9246) INFO : Running distributed import (mode 1) on all PMs...
2018-10-30 15:26:36 (9246) INFO : For table tpcds_1.catalog_page: 11718 rows processed and 11718 rows inserted.
2018-10-30 15:26:36 (9246) INFO : Bulk load completed, total run time : 12.512 seconds
[root@um1 tpcds_1]#

Oct 30 15:26:23 pm1 writeengineserver[808]: 23.836464 |0|0|0| D 32 CAL0000: 8262 : onReceiveMode() Setting fMode = 1
Oct 30 15:26:23 pm1 writeengineserver[808]: 23.836751 |0|0|0| D 32 CAL0000: 8262 : onReceiveMode() DbRoot Count = 1
Oct 30 15:26:23 pm1 writeengineserver[808]: 23.836963 |0|0|0| D 32 CAL0000: 8262 : CMD LINE ARGS came in /usr/local/mariadb/columnstore/bin/cpimport.bin -R /tmp/columnstore_tmp_files/BrmRpt09301526239246.rpt -m 1 -P um1-9246 -u1beee861-6f92-41fd-bc4a-d99a9b3a853c tpcds_1 catalog_page
Oct 30 15:26:23 pm1 writeengineserver[808]: 23.837077 |0|0|0| D 32 CAL0000: 8262 : Brm Rpt Filename Arrived /tmp/columnstore_tmp_files/BrmRpt09301526239246.rpt
Oct 30 15:26:23 pm1 writeengineserver[808]: 23.837273 |0|0|0| D 32 CAL0000: 8262 : Start Cpimport command reached!!
Oct 30 15:26:23 pm1 cpimport.bin[29246]: 23.882564 |0|0|0| I 34 CAL0086: Initiating BulkLoad: -R /tmp/columnstore_tmp_files/BrmRpt09301526239246.rpt -m 1 -P um1-9246 -u1beee861-6f92-41fd-bc4a-d99a9b3a853c tpcds_1 catalog_page
Oct 30 15:26:23 pm1 writeengineserver[808]: 23.932407 |0|0|0| D 32 CAL0000: 8262 : onReceiveEOD : child ID = 29246
Oct 30 15:26:23 pm1 writeengineserver[808]: 23.932498 |0|0|0| D 32 CAL0000: 8262 : Message Queue is empty; Stopping CF Thread
Oct 30 15:26:23 pm1 cpimport.bin[29246]: 23.989741 |0|0|0| I 34 CAL0081: Start BulkLoad: JobId-39249; db-tpcds_1
Oct 30 15:26:24 pm1 cpimport.bin[29246]: 24.174473 |0|0|0| I 34 CAL0083: BulkLoad: JobId-39249; finished loading table tpcds_1.catalog_page; 10000 rows inserted
Oct 30 15:26:24 pm1 writeengine[29246]: 24.174540 |0|0|0| I 19 CAL0008: Bulkload |Job: /usr/local/mariadb/columnstore/data/bulk/tmpjob/39249_D20181030_T152623_S944712_Job_39249.xml |For table tpcds_1.catalog_page: 10000 rows processed and 10000 rows inserted.
Oct 30 15:26:24 pm1 cpimport.bin[29246]: 24.179151 |0|0|0| I 34 CAL0082: End BulkLoad: JobId-39249; status-SUCCESS
Oct 30 15:26:24 pm1 writeengineserver[808]: 24.189347 |0|0|0| I 32 CAL0000: 8262 : cpimport exit on success
Oct 30 15:26:24 pm1 writeengineserver[808]: 24.189491 |0|0|0| D 32 CAL0000: 8262 : onCpimportSuccess BrmReport Send
Oct 30 15:26:24 pm1 writeengineserver[808]: 24.189555 |0|0|0| D 32 CAL0000: 8262 : onReceiveEOD : child ID = 0
Oct 30 15:26:24 pm1 writeengineserver[808]: 24.189656 |0|0|0| D 32 CAL0000: 8262 : onReceiveEOD : child ID = 0
Oct 30 15:26:36 pm1 writeengineserver[808]: 36.269248 |0|0|0| D 32 CAL0000: 8262 : OnReceiveCleanup arrived

 
Oct 30 15:26:23 um1 writeenginesplit[9246]: 23.905976 |0|0|0| I 33 CAL0000: Send EOD message to All PMs
Oct 30 15:26:24 um1 writeenginesplit[9246]: 24.191706 |0|0|0| I 33 CAL0098: Received a Cpimport Pass from PM1.
Oct 30 15:26:36 um1 writeenginesplit[9246]: 36.268843 |0|0|0| I 33 CAL0098: Received a Cpimport Pass from PM2.
Oct 30 15:26:36 um1 writeenginesplit[9246]: 36.282240 |0|0|0| I 33 CAL0000: Released Table Lock

The import file used is one and the same for cpimport and the remote tool mcsimport:

[root@um1 tpcds_1]# less catalog_page.tbl
1|AAAAAAAABAAAAAAA|2450815|2450996|DEPARTMENT|1|1|In general basic characters welcome. Clearly lively friends conv|bi-annual|
2|AAAAAAAACAAAAAAA|2450815|2450996|DEPARTMENT|1|2|English areas will leave prisoners. Too public countries ought to become beneath the years. |bi-annual|
3|AAAAAAAADAAAAAAA|2450815|2450996|DEPARTMENT|1|3|Times could not address disabled indians. Effectively public ports c|bi-annual|
4|AAAAAAAAEAAAAAAA|2450815|||1|||bi-annual|
5|AAAAAAAAFAAAAAAA|2450815|2450996|DEPARTMENT|1|5|Classic buildings ensure in a tests. Real years may not receive open systems. Now broad m|bi-annual|
6|AAAAAAAAGAAAAAAA|2450815|2450996|DEPARTMENT|1|6|Exciting principles wish greatly only excellent women. Appropriate fortunes shall not|bi-annual|
7|AAAAAAAAHAAAAAAA|2450815|2450996|DEPARTMENT|1|7|National services must not come at least into a girls|bi-annual|
8|AAAAAAAAIAAAAAAA|2450815|2450996|DEPARTMENT|1|8|Areas see early for a pounds. New goods study too serious women. Unwittingly sorry incentives shall|bi-annual|
9|AAAAAAAAJAAAAAAA|2450815|2450996|DEPARTMENT|1|9|Intensive, economic changes resist bloody of course simple economies; |bi-annual|
10|AAAAAAAAKAAAAAAA|2450815|2450996|DEPARTMENT|1|10|Careful, intense funds balance perhaps boys. Romantic chips remove legs. Direct birds get |bi-annual|
11|AAAAAAAALAAAAAAA|2450815|2450996|DEPARTMENT|1|11|At least national countries live by an sales. Weap|bi-annual|
12|AAAAAAAAMAAAAAAA|2450815|2450996|DEPARTMENT|1|12|Girls indicate so in a countries. Natural, emotional weeks try a|bi-annual|
13|AAAAAAAANAAAAAAA|2450815|2450996|DEPARTMENT|1|13|Miles see mainly clear hands. Villages finish there blue figures. Moreover wide students travel poo|bi-annual|
14|AAAAAAAAOAAAAAAA|2450815|2450996|DEPARTMENT|1|14|Rooms would say ago economic sections. Essential properties might not support groups. Ago rare eye|bi-annual|
15|AAAAAAAAPAAAAAAA|2450815|2450996|DEPARTMENT|1|15|Legal, required ends may not improve in the pictures. Really social structur|bi-annual|
16|AAAAAAAAABAAAAAA|2450815|2450996|DEPARTMENT|1|16|Schools must know now empty legs; generally daily children use sharp, loca|bi-annual|
17|AAAAAAAABBAAAAAA|2450815|2450996|DEPARTMENT|1|17|More than true carers can ensure at a officers. Candidates s|bi-annual|
18|AAAAAAAACBAAAAAA|2450815|2450996|DEPARTMENT|1|18|Shops end problems. Urban experiences play new stores. Institutions order as residential places.|bi-annual|
19|AAAAAAAADBAAAAAA|2450815|2450996|DEPARTMENT|1|19|Poor, hostile guidelines could hope alone early things. Secret, |bi-annual|

Comment by David Thompson (Inactive) [ 2018-11-14 ]

I think it's better for mcsimport to follow documented standards; really, we should have a bug on cpimport instead.
