[MCOL-1853] ColumnStore breaks on S3 NFS mounts Created: 2018-11-03 Updated: 2021-02-20 Resolved: 2021-02-20 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | N/A |
| Affects Version/s: | None |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Martin Adamec | Assignee: | Unassigned |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Description |
|
When running the query: MariaDB [tradealert]> SELECT sio.`id` AS option_id, o.`date` AS market_date, o.`time`, o.`size`, o.`price`, o.`spot`, o.open_int, o.theo, o.ivol, o.delta, o.vega, o.theta, o.`secid` FROM onelots AS o LEFT JOIN securities_occ AS so ON o.secid = so.secid LEFT JOIN dds_main.`sec_instrument_option` AS sio ON so.`occ_identifier` = sio.`occ_identifier` WHERE o.date = '2018-01-19' limit 10; and this is what I got back: ERROR 1815 (HY000): Internal error: IDB-2039: Data file does not exist, please contact your system administrator for more information. MariaDB [tradealert]> SELECT count(1) FROM onelots AS o LEFT JOIN securities_occ AS so ON o.secid = so.secid LEFT JOIN dds_main.`sec_instrument_option` AS sio ON so.`occ_identifier` = sio.`occ_identifier` WHERE o.date = '2018-01-19';
----------
---------- In the stretch of all market dates of Q1 2018 it happened on Jan 19., Feb 5., Feb. 21. and from Mar 5. all the way through Mar 29. All other queries for all other days in this stretch (across the Q1) returned valid and full resultsets. Attached is zipped support report file. We also ran redistributeData but there was not much to distribute because of the configuration. So it ran for 10 ms and it did not help. |
| Comments |
| Comment by Andrew Hutchings (Inactive) [ 2018-11-03 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The problem here is that you are using NFS mounts and particularly S3 NFS mounts do something very odd with file times that break ColumnStore. I highly recommend you do not use NFS for any database storage backend. For eng team: Config::checkReload() is using mtime which S3 NFS mounts often set to zero. Therefore the config never gets loaded in and the DBRoot part is empty upon file access. We should maybe do a checksum based test instead? Or at least don't rely on mtime being correct on a shared/distributed FS. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by John Dutchover [ 2018-11-03 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
So that we may move forward, I have done away with the EFS file systems and setup EBS (io1) volumes for dbroots 1,3 (pm1) and dbroots 2,4 (pm2). | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Martin Adamec [ 2018-11-04 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Andrew, There was one "weirdness" I noticed at the very initial loading. mcsimport reported in few cases quite a lot of invalid records. here is a copy of the reported result: Execution time: 0.161977s Execution time: 4281.18s Execution time: 1043.61s Execution time: 0.481345s Execution time: 2718.69s Execution time: 91.0303s Execution time: 38.2951s Execution time: 26.1987s I was looking for some detail LOG similar to csimport but did not find any. could you point me to the place I should be able to find these detail logs? I tried to check loaded data and ran few queries that counted rows also did come conditional counting and it all looked OK to me. Obviously that was not enough to be 100% sure that everything loaded correctly. If there are no logs it would be probably enough to know what triggeres that invalid count to grow when loading using mcsimport. What is more scary is my next finding: {{MariaDB [tradealert]> SELECT sio.`id` AS option_id, o.`date` AS market_date, o.`time`, o.`size`, o.`price`, o.`spot`, o.open_int, o.theo, o.ivol, o.delta, o.vega, o.theta, o.`secid` FROM onelots AS o LEFT JOIN securities_occ AS so ON o.secid = so.secid LEFT JOIN dds_main.`sec_instrument_option` AS sio ON so.`occ_identifier` = sio.`occ_identifier` WHERE o.date = '2018-01-19'; MariaDB [tradealert]> SELECT sio.`id` AS option_id, o.`date` AS market_date, o.`time`, o.`size`, o.`price`, o.`spot`, o.open_int, o.theo, o.ivol, o.delta, o.vega, o.theta, o.`secid` FROM onelots AS o LEFT JOIN securities_occ AS so ON o.secid = so.secid LEFT JOIN dds_main.`sec_instrument_option` AS sio ON so.`occ_identifier` = sio.`occ_identifier` WHERE o.date = '2018-02-05'; MariaDB [tradealert]> SELECT sio.`id` AS option_id, o.`date` AS market_date, o.`time`, o.`size`, o.`price`, o.`spot`, o.open_int, o.theo, o.ivol, o.delta, o.vega, o.theta, o.`secid` FROM onelots AS o LEFT JOIN securities_occ AS so ON o.secid = so.secid LEFT JOIN dds_main.`sec_instrument_option` AS sio ON so.`occ_identifier` = sio.`occ_identifier` WHERE o.date = '2018-02-21'; MariaDB [tradealert]> SELECT sio.`id` AS option_id, o.`date` AS market_date, o.`time`, o.`size`, o.`price`, o.`spot`, o.open_int, o.theo, o.ivol, o.delta, o.vega, o.theta, o.`secid` FROM onelots AS o LEFT JOIN securities_occ AS so ON o.secid = so.secid LEFT JOIN dds_main.`sec_instrument_option` AS sio ON so.`occ_identifier` = sio.`occ_identifier` WHERE o.date = '2018-03-09' limit 10; MariaDB [tradealert]> SELECT sio.`id` AS option_id, o.`date` AS market_date, o.`time`, o.`size`, o.`price`, o.`spot`, o.open_int, o.theo, o.ivol, o.delta, o.vega, o.theta, o.`secid` FROM onelots AS o LEFT JOIN securities_occ AS so ON o.secid = so.secid LEFT JOIN dds_main.`sec_instrument_option` AS sio ON so.`occ_identifier` = sio.`occ_identifier` WHERE o.date = '2018-03-27' limit 10;
----------
---------- MariaDB [tradealert]> SELECT sio.`id` AS option_id, o.`date` AS market_date, o.`time`, o.`size`, o.`price`, o.`spot`, o.open_int, o.theo, o.ivol, o.delta, o.vega, o.theta, o.`secid` FROM onelots AS o LEFT JOIN securities_occ AS so ON o.secid = so.secid LEFT JOIN dds_main.`sec_instrument_option` AS sio ON so.`occ_identifier` = sio.`occ_identifier` WHERE o.date = '2018-03-27'; I ran the query one time with limit 10 and another time without and I had 2 different results for March 27. We need to know what is causing this issue to come back. Without knowing the reason it makes the whole environment unpredictable for us and that is very risky proposition. I attached a support report file (labeled with today's date for easier recognition) to document what we currently have configured. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrew Hutchings (Inactive) [ 2018-11-05 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
How did you move from one storage type to the other? It looks like you may have just moved the data files over to the new storage and started it? In which case you are trying to start with already damaged files. I highly recommend a clean install. As for the invalid count. Please file a new ticket for this and attach an example CSV file and schema for it so we can take a look. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by John Dutchover [ 2018-11-05 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The data had been retained when moved to EBS. It was a misunderstanding on my part. A clean installation has been completed. I generated a support file columnstoreSupportReport.cstore-pg1c.tar.gz |