[MCOL-3251] FILE001.cdf:No such file or directory - occurred when query, truncate and cpimport collided Created: 2019-04-12  Updated: 2020-11-12  Resolved: 2020-03-24

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.2.2
Fix Version/s: 1.2.6, 1.4.4, 1.5.1

Type: Bug Priority: Minor
Reporter: David Hill (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

2um 2pm with local query


Sprint: 2020-3, 2020-4, 2020-5

 Description   

Customer report errors on both PMs

Apr 11 19:11:05 usfit-scdb6 PrimProc[144321]: 05.309773 |0|0|0| C 28 CAL0053: PrimProc could not open file for OID 73844; /000.dir/001.dir/032.dir/116.dir/000.dir/FILE001.cdf:No such file or directory

No ExeMgr restarted, which has caused this issue in the past. Customer reported this is might have caused the problem via the Issue

There was a long running query.. and at the end of the query this uses table X. This query was running.. when at the same time a process ran that TRUNCATES and cpimports table X. It appears the data files were updated, leaving the long query to look in the previous place for the data that was not there. This is when our error returned and still presents a problem for this type of thing to be able to happen.

It is understandable that it can happen.. however the errors given seem to indicate a l detrimental loss of data



 Comments   
Comment by Patrick LeBlanc (Inactive) [ 2020-03-13 ]

The existing method to generate filenames is excessive for what PrimProc needs. Unclear whether other callers need that extra functionality or error handling. For now I've simply added a minimal method to PrimProc, because it just needs the filename. This introduces no new error paths, risks nothing outside of PrimProc breaking, and will be more informative to the user. When a dbroot goes offline during a query, the user will get a file-not-found error as before, with a filename that indicates which dbroot it looked in.

Comment by Patrick LeBlanc (Inactive) [ 2020-03-13 ]

Oops.

Comment by Gagan Goel (Inactive) [ 2020-03-17 ]

For QA: The fix might not be verifiable if the original bug cannot be reproduced. pleblanc ran some regression tests to ensure the filenames generated by the fix are correct.

Comment by Daniel Lee (Inactive) [ 2020-03-24 ]

From the information in the ticket, I am not sure exactly what the new behavior is and if this ticket is testable.

I performed the following test on these different releases:

1) cpimport 10gb lineitem table
2) in one mysql session, execute query
3) in another session, truncate table lineitem as soon as the query stated

query:

select sum(l_quantity), sum(l_extendedprice) from lineitem where l_comment > "A";

Depending on how soon I truncate the lineitem table after the query has been started, the behavior would be different.

1.2.2-1

MariaDB [mytest]> select sum(l_quantity), sum(l_extendedprice) from lineitem where l_comment > "A";
ERROR 1815 (HY000): Internal error: IDB-2039: Data file does not exist, please contact your system administrator for more information.

crit.log

[root@localhost columnstore]# cat crit.log
Mar 23 14:32:27 localhost PrimProc[4952]: 27.876992 |0|0|0| C 28 CAL0053: PrimProc could not open file for OID 3022; /usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/206.dir/000.dir/FILE001.cdf:No such file or directory
Mar 23 14:32:32 localhost PrimProc[4952]: 32.985273 |0|0|0| C 28 CAL0053: PrimProc could not open file for OID 3023; /usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/207.dir/000.dir/FILE001.cdf:No such file or directory

debug.log

Mar 23 14:32:31 localhost IDBFile[4952]: 31.983083 |0|0|0| D 35 CAL0002: Failed to open file: /usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/207.dir/000.dir/FILE001.cdf, exception: unable to open Unbuffered file
Mar 23 14:32:32 localhost PrimProc[4952]: 32.985273 |0|0|0| C 28 CAL0053: PrimProc could not open file for OID 3023; /usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/207.dir/000.dir/FILE001.cdf:No such file or directory
Mar 23 14:32:32 localhost PrimProc[4952]: 32.986084 |0|0|0| W 28 CAL0000: PrimProc AsyncLoader caught error: IDB-2039: Data file does not exist, please contact your system administrator for more information.
Mar 23 14:32:33 localhost PrimProc[4952]: 33.102656 |0|0|0| W 28 CAL0000: IDB-2031: Blocks are missing. Alter or drop table in progress?
Mar 23 14:32:33 localhost PrimProc[4952]: 33.102844 |0|0|0| W 28 CAL0000: PrimProc AsyncLoader caught error: IDB-2031: Blocks are missing. Alter or drop table in progress?
Mar 23 14:32:33 localhost PrimProc[4952]: 33.102612 |0|0|0| W 28 CAL0000: BRP::getBlock(): got a BRM lookup error. LBID=1055744 ver= SCN: 0#012 Txns: txn=0 vbFlg=0
Mar 23 14:32:33 localhost PrimProc[4952]: 33.102947 |0|0|0| W 28 CAL0000: PrimProc AsyncLoader caught error: IDB-2031: Blocks are missing. Alter or drop table in progress?
Mar 23 14:32:33 localhost PrimProc[4952]: 33.102974 |0|0|0| W 28 CAL0000: IDB-2031: Blocks are missing. Alter or drop table in progress?
Mar 23 14:32:33 localhost PrimProc[4952]: 33.102729 |0|0|0| W 28 CAL0000: BRP::getBlock(): got a BRM lookup error. LBID=645120 ver= SCN: 0#012 Txns: txn=0 vbFlg=0
Mar 23 14:32:33 localhost PrimProc[4952]: 33.103071 |0|0|0| W 28 CAL0000: PrimProc AsyncLoader caught error: IDB-2031: Blocks are missing. Alter or drop table in progress?
Mar 23 14:32:33 localhost PrimProc[4952]: 33.103739 |0|0|0| W 28 CAL0000: IDB-2031: Blocks are missing. Alter or drop table in progress?
Mar 23 14:32:33 localhost ddlpackageproc[5399]: 33.191456 |13|10|0| D 23 CAL0042: End SQL statement

1.2.6-1

MariaDB [mytest]> select sum(l_quantity), sum(l_extendedprice) from lineitem where l_comment > "A";
ERROR 1815 (HY000): Internal error: IDB-2039: Data file does not exist, please contact your system administrator for more information.

crit.log

Mar 24 18:24:50 localhost PrimProc[14561]: 50.303039 |0|0|0| C 28 CAL0053: PrimProc could not open file for OID 3033; /usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/217.dir/000.dir/FILE002.cdf:No such file or directory
Mar 24 18:24:52 localhost PrimProc[14561]: 52.564134 |0|0|0| C 28 CAL0061: PrimProc error reading file for OID 3033; Error reading compression header. rc=0, idx=31, ptr.size=1
Mar 24 18:24:52 localhost PrimProc[14561]: 52.565233 |0|0|0| C 28 CAL0000: Error reading compression header. rc=0, idx=31, ptr.size=1
Mar 24 18:24:52 localhost PrimProc[14561]: 52.566286 |0|0|0| C 28 CAL0061: PrimProc error reading file for OID 3033; Error reading compression header. rc=0, idx=23, ptr.size=1
Mar 24 18:24:52 localhost PrimProc[14561]: 52.567718 |0|0|0| C 28 CAL0000: Error reading compression header. rc=0, idx=23, ptr.size=1

debug.log

Mar 24 18:24:49 localhost IDBFile[14561]: 49.296938 |0|0|0| D 35 CAL0002: Failed to open file: /usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/217.dir/000.dir/FILE002.cdf, exception: unable to open Unbuffered file
Mar 24 18:24:50 localhost PrimProc[14561]: 50.303039 |0|0|0| C 28 CAL0053: PrimProc could not open file for OID 3033; /usr/local/mariadb/columnstore/data1/000.dir/000.dir/011.dir/217.dir/000.dir/FILE002.cdf:No such file or directory
Mar 24 18:24:50 localhost PrimProc[14561]: 50.304629 |0|0|0| W 28 CAL0000: IDB-2039: Data file does not exist, please contact your system administrator for more information.
Mar 24 18:24:52 localhost PrimProc[14561]: 52.564134 |0|0|0| C 28 CAL0061: PrimProc error reading file for OID 3033; Error reading compression header. rc=0, idx=31, ptr.size=1
Mar 24 18:24:52 localhost PrimProc[14561]: 52.565233 |0|0|0| C 28 CAL0000: Error reading compression header. rc=0, idx=31, ptr.size=1
Mar 24 18:24:52 localhost PrimProc[14561]: 52.566286 |0|0|0| C 28 CAL0061: PrimProc error reading file for OID 3033; Error reading compression header. rc=0, idx=23, ptr.size=1
Mar 24 18:24:52 localhost PrimProc[14561]: 52.567718 |0|0|0| C 28 CAL0000: Error reading compression header. rc=0, idx=23, ptr.size=1

1.4.4-1 and 1.5.0-1

Both have similar behavior.

Some times the truncate table statement seemed to be waiting for the query to finish. Both statements processed successfully.
Some times it the query failed as the following.

MariaDB [mytest]> select sum(l_quantity), sum(l_extendedprice) from lineitem where l_comment > "A";
ERROR 1815 (HY000): Internal error: An unexpected condition within the query caused an internal processing error within Columnstore. Please check the log files for more details. Additional Information: error in BatchPrimitivePro

crit.log

[root@localhost columnstore]# cat crit.log
Mar 24 18:32:34 localhost PrimProc[4072]: 34.443609 |0|0|0| C 28 CAL0061: PrimProc error reading file for OID 3036; Error decompressing block 46 code=-1 part=0 seg=0
Mar 24 18:32:34 localhost PrimProc[4072]: 34.444218 |0|0|0| C 28 CAL0000: Error decompressing block 46 code=-1 part=0 seg=0
Mar 24 18:32:37 localhost PrimProc[4072]: 37.697803 |0|0|0| C 28 CAL0061: PrimProc error reading file for OID 3036; Error decompressing block 46 code=-1 part=0 seg=0
Mar 24 18:32:37 localhost PrimProc[4072]: 37.699306 |0|0|0| C 28 CAL0000: Error decompressing block 46 code=-1 part=0 seg=0

debug.log

Mar 24 18:47:21 localhost ExeMgr[4173]: 21.795796 |14|0|0| D 16 CAL0041: Start SQL statement: select sum(l_quantity), sum(l_extendedprice) from lineitem where l_comment > "A"; |mytest|
Mar 24 18:47:36 localhost PrimProc[4072]: 36.857786 |0|0|0| I 28 CAL0061: PrimProc error reading file for OID 3022; decompress retry for /var/lib/columnstore/data1/000.dir/000.dir/011.dir/206.dir/000.dir/FILE001.cdf decompRetry chunk 18 code=-1
Mar 24 18:47:39 localhost PrimProc[4072]: 39.700203 |0|0|0| C 28 CAL0061: PrimProc error reading file for OID 3022; Error decompressing block 18 code=-1 part=0 seg=1
Mar 24 18:47:39 localhost PrimProc[4072]: 39.738179 |0|0|0| I 28 CAL0061: PrimProc error reading file for OID 3022; decompress retry for /var/lib/columnstore/data1/000.dir/000.dir/011.dir/206.dir/000.dir/FILE001.cdf decompRetry chunk 18 code=-1
Mar 24 18:47:39 localhost PrimProc[4072]: 39.738315 |0|0|0| C 28 CAL0000: Error decompressing block 18 code=-1 part=0 seg=1
Mar 24 18:47:39 localhost ExeMgr[4173]: 39.766078 |14|0|0| D 16 CAL0042: End SQL statement
Mar 24 18:47:39 localhost ddlpackageproc[4293]: 39.789453 |10|12|0| D 23 CAL0041: Start SQL statement: truncate table lineitem;|mytest|
Mar 24 18:47:42 localhost PrimProc[4072]: 42.338170 |0|0|0| C 28 CAL0061: PrimProc error reading file for OID 3022; Error decompressing block 18 code=-1 part=0 seg=1
Mar 24 18:47:42 localhost PrimProc[4072]: 42.338765 |0|0|0| W 28 CAL0000: PrimProc AsyncLoader caught error: Error decompressing block 18 code=-1 part=0 seg=1
Mar 24 18:47:42 localhost PrimProc[4072]: 42.367392 |0|0|0| I 28 CAL0061: PrimProc error reading file for OID 3022; decompress retry for /var/lib/columnstore/data1/000.dir/000.dir/011.dir/206.dir/000.dir/FILE001.cdf decompRetry chunk 18 code=-1
Mar 24 18:47:45 localhost PrimProc[4072]: 45.189166 |0|0|0| C 28 CAL0061: PrimProc error reading file for OID 3022; Error decompressing block 18 code=-1 part=0 seg=1
Mar 24 18:47:45 localhost PrimProc[4072]: 45.191152 |0|0|0| C 28 CAL0000: Error decompressing block 18 code=-1 part=0 seg=1
Mar 24 18:47:45 localhost ddlpackageproc[4293]: 45.768804 |10|12|0| D 23 CAL0042: End SQL statement
[root@localhost columnstore]#

Comment by Daniel Lee (Inactive) [ 2020-03-24 ]

Truncating a table while a long query on the table is inflight will cause a file-not-found error and the query would failed. This is the expected behavior. The change for this ticket is to make the messages more clear regarding the file names and paths.

Generated at Thu Feb 08 02:41:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.