[MCOL-1335] Data load through PDi adapter appears corrupted in the database Created: 2018-04-12 Updated: 2023-10-26 Resolved: 2023-10-25 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | None |
| Affects Version/s: | 1.1.4 |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Kotsinova (Inactive) | Assignee: | Leonid Fedorov |
| Resolution: | Won't Fix | Votes: | 7 |
| Labels: | None | ||
| Environment: |
CentOS 7 |
||
| Epic Link: | Consolidate & Redevelop All Columnstore Tools (SDK, Adapters, Backup, Restore, mcsimport) |
| Sprint: | 2018-14, 2018-15, 2018-16, 2018-17, 2018-18, 2018-19 |
| Description |
|
1. Start a bulk load of a large CSV file with the Pentaho bulk load adapter. The file contains 121 million rows (9 GB in size).
3. The Pentaho job finishes successfully and the data appears to be inserted into the database.
5. Run query:
Result: Expected: |
| Comments |
| Comment by Elena Kotsinova (Inactive) [ 2018-04-13 ] |
|
Executed more tests. The issue is related to the data volume loaded by the PDI bulk adapter. |
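Since the failure appears to depend on data volume, one way to narrow it down is to load progressively larger prefixes of the same CSV and search for the smallest row count that produces a broken table. The sketch below is only an illustration: `write_prefix`, `smallest_failing_count`, and the `is_healthy` callback are hypothetical names, not part of any ColumnStore tool. It assumes one record per line (no embedded newlines in quoted fields) and that the failure is monotonic in row count, which this ticket does not confirm.

```python
import itertools


def write_prefix(src_path, dst_path, n_rows):
    """Copy the first n_rows lines of src_path to dst_path; return lines written.

    Assumption: each CSV record occupies exactly one line.
    """
    written = 0
    with open(src_path, "r", newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        for line in itertools.islice(src, n_rows):
            dst.write(line)
            written += 1
    return written


def smallest_failing_count(lo, hi, is_healthy):
    """Binary search for the smallest n in (lo, hi] where is_healthy(n) is False.

    Assumptions: is_healthy(lo) is True, is_healthy(hi) is False, and once a
    load of n rows breaks the table, every larger load breaks it as well.
    is_healthy is a user-supplied callback that loads n rows into an empty
    table and runs the check queries.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_healthy(mid):
            lo = mid
        else:
            hi = mid
    return hi
```

In practice `is_healthy(n)` would call `write_prefix` to produce the test file, load it through the adapter into an empty table, and run the failing query.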
| Comment by GUIDI [ 2018-06-22 ] |
|
The Cache clearing (select calFlushCache() |
| Comment by Andrew Hutchings (Inactive) [ 2018-06-22 ] |
|
elena.kotsinova can you please re-test with 1.1.5? I think this could be related to |
| Comment by Elena Kotsinova (Inactive) [ 2018-07-05 ] |
|
The issue is not fixed in 1.1.5. calFlushCache() doesn't repair the table in this case. |
| Comment by Assen Totin (Inactive) [ 2019-04-09 ] |
|
Confirming, this is fully reproducible with MCS 1.2.2 and the corresponding Pentaho bulk load component (Version: 1.2.2, Revision: fddec8b). Importing 25M rows into an empty table using cpimport results in a fully functional table. Importing the same data into the same empty table over PDI results in a partially broken table: some queries work (consistently), others fail (consistently) with IDB-2035. Flushing the cache or restarting MCS does not help. No alarms are observed during or after the error. The only relevant entries down to the debug log are:
Apr 9 15:37:48 p2w5 ExeMgr[13750]: 48.454810 |22|0|0| D 16 CAL0041: Start SQL statement: select count
The PID of the process that raises the assertion corresponds to PrimProc. Enabling the per-process log for PrimProc does not give anything more; its log file contains the very same line as above. This is affecting a customer of ours. How can we help get this traced and resolved faster? I have a fully working setup and the broken data is available for inspection. |
| Comment by Assen Totin (Inactive) [ 2019-04-09 ] |
|
The same issue is present in mcsimport, so it is not limited to the Pentaho adapter, but rather is somewhere in the MCS API code. LinuxJedi, may I ask you to take a look? I can open a separate ticket if preferred. I have a dataset available for testing. |
| Comment by Andrew Hutchings (Inactive) [ 2019-04-10 ] |
|
assen.totin can you give me a test case using mcsimport? This will be a lot easier for me to debug. |
| Comment by Assen Totin (Inactive) [ 2019-04-10 ] |
|
Here is the sample table and the queries I run. When run in this order, the last one fails with IDB-2035. When the same data is loaded with cpimport, all queries succeed. The database schema name is 'ebi'.
create table if not exists etl_measures_actions
SELECT * FROM ebi.etl_measures_actions LIMIT 0, 1000;
I'll send you a link to download the data for this table. |
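Because the failure depends on the queries being run in a fixed order, a small harness that executes them one by one and records which succeed can make the cpimport and PDI/mcsimport loads easier to compare. This is a minimal sketch, not taken from the ticket: `run_in_order` is a hypothetical helper, and it assumes only a Python DB-API 2.0 style connection (an object with `cursor()`, whose cursors offer `execute()`, `fetchall()` and `close()`).

```python
def run_in_order(conn, queries):
    """Run each SQL statement in order on a DB-API 2.0 connection.

    Returns a list of (sql, error) pairs, where error is None on success
    and the exception text on failure. A failing query does not stop the
    run, so later queries are still attempted.
    """
    results = []
    for sql in queries:
        cur = conn.cursor()
        try:
            cur.execute(sql)
            cur.fetchall()  # force the result set to be read in full
            results.append((sql, None))
        except Exception as exc:  # keep going and record what failed
            results.append((sql, str(exc)))
        finally:
            cur.close()
    return results
```

Running the same query list against a table loaded with cpimport and one loaded through the adapter, then comparing the two result lists, would show exactly which statements start failing.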
| Comment by Abie Reifer [ 2019-12-25 ] |
|
Wondering if there is a fix or workaround for this issue. I seem to be running into it. Thanks |