Sprint:
2021-16, 2021-17, 2022-22, 2022-23, 2023-4, 2023-5, 2023-6, 2023-7, 2023-8, 2023-10
Rewording
cpimport and LDIF (LOAD DATA INFILE) of the same file do not produce the same result. cpimport appears not to truncate strings.
cpimport test flights /tmp/flights.txt -m1 -s '\t'
versus
mariadb test -e "LOAD DATA INFILE '/tmp/flights.txt' IGNORE INTO TABLE flights2 FIELDS TERMINATED BY '\t';"
Expected:
With cpimport, strings longer than 255 characters are truncated to fit varchar(255), just as LDIF does.
Actual:
cpimport does not truncate strings even when the column is defined as varchar(255), unlike LDIF
Reproduction:
Copy flights.txt to the /tmp/ directory with scp, then follow the steps in reproduction.bash.
-----------------------------
It seems that cpimport can load more characters than the column width allows (up to the column width multiplied by the number of bytes per character in the charset) when loading data into varchar column(s).
For example, in the original case, data was loaded from a .tsv file into varchar(255) with
cpimport test flights_repro flights_repro.txt -m1 -e1 -s '\t' -n1
resulted in the following output (charset=utf8mb3), which does not look right:
select id, lengthb(notes),char_length(notes) from flights_repro;
| id | lengthb(notes) | char_length(notes) |
| 3 | 765 | 765 |
| 5199 | 765 | 765 |
| 7275 | 765 | 765 |
...
If the same data were loaded via LDIF as
LOAD DATA INFILE '/tmp/flights2.txt' INTO TABLE flights2_cs FIELDS TERMINATED BY '\t';
then the result looks correct:
select id, lengthb(notes),char_length(notes) from flights_repro;
| id | lengthb(notes) | char_length(notes) |
| 3 | 255 | 255 |
| 5199 | 255 | 255 |
| 7275 | 255 | 255 |
...
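The two outputs differ by exactly a factor of three: 765 = 255 × 3, and utf8mb3 uses at most 3 bytes per character. One possible reading (an assumption, not confirmed by this report) is that cpimport limits the value by the column's byte budget rather than by its character count. A minimal Python sketch of the two behaviours follows; the helper names are hypothetical and this is not ColumnStore code, and Python's UTF-8 codec stands in for utf8mb3.

```python
# Hypothetical helpers for illustration only; not taken from cpimport or LDI.
MAX_CHARS = 255          # declared width of varchar(255)
MAX_BYTES_PER_CHAR = 3   # utf8mb3 stores at most 3 bytes per character


def truncate_by_chars(value: str, max_chars: int = MAX_CHARS) -> str:
    """Truncate to the declared number of characters."""
    return value[:max_chars]


def truncate_by_byte_budget(value: str, max_chars: int = MAX_CHARS) -> str:
    """Truncate to max_chars * MAX_BYTES_PER_CHAR bytes (assumed behaviour)."""
    budget = max_chars * MAX_BYTES_PER_CHAR
    encoded = value.encode("utf-8")[:budget]
    # errors="ignore" drops a trailing partial multi-byte sequence, if any.
    return encoded.decode("utf-8", errors="ignore")


ascii_note = "x" * 1000  # made-up single-byte input longer than either limit
by_chars = truncate_by_chars(ascii_note)
by_bytes = truncate_by_byte_budget(ascii_note)
print(len(by_chars.encode("utf-8")), len(by_chars))  # 255 255
print(len(by_bytes.encode("utf-8")), len(by_bytes))  # 765 765
```

For single-byte input, the character-based limit reproduces the 255/255 figures and the byte-budget limit reproduces the 765/765 figures shown above.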
An attempted simplified repro is the following:
repro.tsv was produced as follows (in the same way and with the same options as in the original case):
mysql -Ns -B -D test --execute="select id,notes from flights_biu where id =3" > repro.txt
use test;
CREATE TABLE `repro` (
`id` int(11) NOT NULL,
`notes` varchar(255) DEFAULT NULL
) ENGINE=Columnstore DEFAULT CHARSET=utf8mb3
;
LOAD DATA INFILE '/tmp/repro.tsv' INTO TABLE repro FIELDS TERMINATED BY '\t';
...
Query OK, 1 row affected (1.186 sec)
Records: 1 Deleted: 0 Skipped: 0 Warnings: 0
...
select id, lengthb(notes),char_length(notes) from repro;
| id | lengthb(notes) | char_length(notes) |
\q
mysql -Ns -B -D test --execute="select id,notes from repro" > repro_ldif.tsv
truncate table repro;
cpimport test repro repro.tsv -m1 -e1 -s '\t' -n1
...
2021-11-22 16:19:25 (4607) INFO : Running distributed import (mode 1) on all PMs...
2021-11-22 16:19:25 (4607) INFO : For table test.repro: 1 rows processed and 1 rows inserted.
2021-11-22 16:19:25 (4607) INFO : Bulk load completed, total run time : 0.192545 seconds
...
select id, lengthb(notes),char_length(notes) from repro;
| id | lengthb(notes) | char_length(notes) |
1 row in set (0.037 sec)
\q
mysql -Ns -B -D test --execute="select id,notes from flights_biu where id =3" > repro_cpimp.tsv
A SELECT and a comparison of the dumps produced after cpimport and after LDIF show that cpimport loads 2 extra ' ' characters at the beginning of the line, while LDIF loads the data correctly, without prepending anything.
I am not sure whether the options are wrong or whether there is a problem with cpimport.
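The dump comparison described above can be automated with a small byte-level diff. This is only a sketch: the function names are hypothetical, the sample values are made up, and the file names passed to compare_dumps would be the repro_ldif.tsv and repro_cpimp.tsv dumps from the steps above.

```python
from pathlib import Path


def first_difference(left: bytes, right: bytes) -> int:
    """Return the index of the first differing byte, or -1 if identical."""
    for index, (a, b) in enumerate(zip(left, right)):
        if a != b:
            return index
    if len(left) != len(right):
        # One input is a prefix of the other; they diverge where it ends.
        return min(len(left), len(right))
    return -1


def compare_dumps(ldif_path: str, cpimport_path: str) -> int:
    """Compare two dump files byte for byte."""
    return first_difference(Path(ldif_path).read_bytes(),
                            Path(cpimport_path).read_bytes())


# Made-up sample values: the second one has two leading spaces.
ldif_value = "some note\n".encode("utf-8")
cpimport_value = "  some note\n".encode("utf-8")
print(first_difference(ldif_value, cpimport_value))  # 0
```

Comparing bytes rather than decoded text avoids hiding a difference behind charset handling in the client.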
For QA:
Here is a simplified test case to reproduce the issue. Below, /tmp/utf8_test.txt contains the following text:
Query OK, 1 row affected, 1 warning (1.365 sec)
Records: 1 Deleted: 0 Skipped: 0 Warnings: 1
Verify that LDI correctly loads and truncates the multi-byte string:
| a |
| König-\n\n-Stra |
| lengthb(a) | char_length(a) |
| 16 | 15 |
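The 16-byte, 15-character result can be checked by hand. The sketch below assumes that each \n in the displayed value is a literal backslash followed by the letter n (two characters), which is what makes the counts match; ö is the only multi-byte character and takes 2 bytes in UTF-8.

```python
# Raw string: each \n is kept as two characters, a backslash and an "n".
value = r"König-\n\n-Stra"

print(len(value))                  # 15 characters
print(len(value.encode("utf-8")))  # 16 bytes, because "ö" encodes to 2 bytes
```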
Now import the same data using cpimport:
Verify that the number of bytes imported by cpimport is incorrect:
| a |
| König-\n\n-Stra |
| König-\n\n-Stra |
| lengthb(a) | char_length(a) |
| 16 | 15 |
| 19 | 17 |
With the fix, rerun cpimport:
Now verify that cpimport correctly truncates the string (with the cpimport log showing truncation message) and loads the correct number of bytes:
| a |
| König-\n\n-Stra |
| König-\n\n-Stra |
| König-\n\n-Stra |
| lengthb(a) | char_length(a) |
| 16 | 15 |
| 19 | 17 | <- row imported using cpimport before the fix