[MCOL-4695] Need to increase resiliency in docker containers for skysql Created: 2021-04-24  Updated: 2021-05-03

Status: Open
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Isaac Venn (Inactive) Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None


Description

We encountered the following errors in a SkySQL ColumnStore (CS) installation:

Apr 23 22:24:22 cs-node-0 StorageManager[422]: synchronizeWithJournal(): no journal file found for data1/60da65f6-159c-475b-8659-a60f865168af_47185920_5242880_data1~000.dir~000.dir~066.dir~080.dir~000.dir~FILE001.cdf
Apr 23 22:24:22 cs-node-0 StorageManager[422]: synchronizeWithJournal(): data1/60da65f6-159c-475b-8659-a60f865168af_47185920_5242880_data1~000.dir~000.dir~066.dir~080.dir~000.dir~FILE001.cdf has no journal, but it does exist in the cloud. This indicates that an overlapping syncWithJournal() call handled it properly.
Apr 23 22:24:50 cs-node-0 writeengineserver[559]: 50.557593 |0|0|0| D 32 CAL0000: 9914 : onReceiveEOD : child ID = 169957
Apr 23 22:24:50 cs-node-0 writeengineserver[559]: 50.557897 |0|0|0| D 32 CAL0000: 9914 : Message Queue is empty; Stopping CF Thread
Apr 23 22:24:51 cs-node-0 writeengineserver[559]: 51.038989 |0|0|0| D 32 CAL0000: 9914 : onCpimportSuccess BrmReport Send
Apr 23 22:24:51 cs-node-0 writeengineserver[559]: 51.039084 |0|0|0| D 32 CAL0000: 9914 : onReceiveEOD : child ID = 0
Apr 23 22:24:51 cs-node-0 writeengineserver[559]: 51.039152 |0|0|0| D 32 CAL0000: 9914 : onReceiveEOD : child ID = 0
Apr 23 22:24:51 cs-node-0 writeengineserver[559]: 51.150294 |0|0|0| D 32 CAL0000: 9914 : OnReceiveCleanup arrived
Apr 23 22:24:51 cs-node-0 dbcon[328]: 51.710533 |21801|0|0| D 24 CAL0001: End SQL statement
Apr 23 22:24:51 cs-node-0 ExeMgr[551]: 51.787991 |2147505459|0|0| D 16 CAL0041: Start SQL statement: select objectid,columnname from syscolumn where schema='oquant' and tablename='es20180823' --columnRIDs/FE; ||
Apr 23 22:24:51 cs-node-0 ExeMgr[551]: 51.812633 |2147505459|0|0| D 16 CAL0042: End SQL statement
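
To gauge how often the StorageManager "no journal file found" message occurs, and for which objects, a small script along the following lines could be run over the syslog. This is only a sketch: it assumes the line layout seen in the excerpt above, and the function name `count_missing_journals` is hypothetical, not part of ColumnStore.

```python
import re
from collections import Counter

# Assumed syslog layout, based on the excerpt above:
# "<Mon> <day> <HH:MM:SS> <host> <process>[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<proc>\w+)\[(?P<pid>\d+)\]:\s(?P<msg>.*)$"
)

# Matches only the "no journal file found" variant of the message.
NO_JOURNAL_RE = re.compile(
    r"synchronizeWithJournal\(\): no journal file found for (?P<key>\S+)"
)


def count_missing_journals(lines):
    """Count 'no journal file found' messages per object key."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Skip unparseable lines and lines from other processes.
        if not m or m.group("proc") != "StorageManager":
            continue
        hit = NO_JOURNAL_RE.search(m.group("msg"))
        if hit:
            counts[hit.group("key")] += 1
    return counts
```

Passing an open log file (for example `count_missing_journals(open(path))`, where `path` is wherever the node's syslog is stored) returns a `Counter` keyed by object name, which should show whether the message is isolated or clusters around a restart.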

I discussed this with toddstoffel, who suspects a problem in how the service responds to either a restart or a loss of connectivity with the S3 bucket. At his request, I've opened this ticket.
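
As a point of reference for what "resiliency" to a dropped S3 connection could mean, a common pattern is to retry the failed operation with exponential backoff and jitter. The sketch below is a generic illustration only. It is not StorageManager's actual code, and the helper name `call_with_backoff`, its parameters, and the choice of exception types are all assumptions.

```python
import random
import time


def call_with_backoff(op, attempts=5, base_delay=0.5, max_delay=30.0,
                      retry_on=(ConnectionError, TimeoutError),
                      sleep=time.sleep):
    """Call op(), retrying transient failures with capped exponential backoff.

    Hypothetical helper: retries only the exception types in retry_on and
    re-raises the last error once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return op()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the failure to the caller
            # Delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time in [0, ceiling].
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists only so the wait can be stubbed out in tests. Whether a retry of this kind would actually address the behavior described here depends on where the failure originates, which has not yet been established.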

The original issue came up when a client was not able to delete from the table.


Generated at Thu Feb 08 02:52:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.