Andrew,
Please help me figuring out what could be still wrong with our installation. We rebuilt the storage, I removed databases running drop database xyz from UM1, recreated them, recreated tables and started loading data from scratch.
There was one "weirdness" I noticed at the very initial loading. mcsimport reported in few cases quite a lot of invalid records. here is a copy of the reported result:
Execution time: 0.817828s
Rows inserted: 10064
Truncation count: 0
Saturated count: 0
Invalid count: 0
cat calendar-datadock_2018_Q1.csv | wc -l
10064
Execution time: 0.161977s
Rows inserted: 2077
Truncation count: 0
Saturated count: 0
Invalid count: 1
cat groups-datadock_2018_Q1.csv | wc -l
2077
Execution time: 4281.18s
Rows inserted: 87483118
Truncation count: 0
Saturated count: 0
Invalid count: 32809561
cat onelots-datadock_2018_Q1.csv | wc -l
87483118
Execution time: 1043.61s
Rows inserted: 53818130
Truncation count: 0
Saturated count: 0
Invalid count: 38551020
cat open_interest-datadock_2018_Q1.csv | wc -l
53818130
Execution time: 0.481345s
Rows inserted: 17387
Truncation count: 0
Saturated count: 0
Invalid count: 159
cat optimal_ex-datadock_2018_Q1.csv | wc -l
17387
Execution time: 2718.69s
Rows inserted: 101144576
Truncation count: 0
Saturated count: 0
Invalid count: 146538555
cat rates-datadock_2018_Q1.csv | wc -l
101144576
Execution time: 91.0303s
Rows inserted: 4278121
Truncation count: 0
Saturated count: 0
Invalid count: 34791
cat securities-datadock_2018_Q1.csv | wc -l
4278121
Execution time: 38.2951s
Rows inserted: 264387
Truncation count: 0
Saturated count: 0
Invalid count: 17
cat underlyings-datadock_2018_Q1.csv | wc -l
264387
Execution time: 26.1987s
Rows inserted: 494710
Truncation count: 61
Saturated count: 0
Invalid count: 1940652
cat usec_hist-datadock_2018_Q1.csv | wc -l
494710
I was looking for some detail LOG similar to csimport but did not find any. could you point me to the place I should be able to find these detail logs? I tried to check loaded data and ran few queries that counted rows also did come conditional counting and it all looked OK to me. Obviously that was not enough to be 100% sure that everything loaded correctly. If there are no logs it would be probably enough to know what triggeres that invalid count to grow when loading using mcsimport.
What is more scary is my next finding:
We had a case of missing files exactly the same as we had before when I opened this ticket for the first time. There is only a little difference between then and now. Not all dates between march 5 and 29 are with missing files. We have these files missing:
{{MariaDB [tradealert]> SELECT sio.`id` AS option_id, o.`date` AS market_date, o.`time`, o.`size`, o.`price`, o.`spot`, o.open_int, o.theo, o.ivol, o.delta, o.vega, o.theta, o.`secid` FROM onelots AS o LEFT JOIN securities_occ AS so ON o.secid = so.secid LEFT JOIN dds_main.`sec_instrument_option` AS sio ON so.`occ_identifier` = sio.`occ_identifier` WHERE o.date = '2018-01-19';
ERROR 1815 (HY000): Internal error: IDB-2039: Data file does not exist, please contact your system administrator for more information.
MariaDB [tradealert]> SELECT sio.`id` AS option_id, o.`date` AS market_date, o.`time`, o.`size`, o.`price`, o.`spot`, o.open_int, o.theo, o.ivol, o.delta, o.vega, o.theta, o.`secid` FROM onelots AS o LEFT JOIN securities_occ AS so ON o.secid = so.secid LEFT JOIN dds_main.`sec_instrument_option` AS sio ON so.`occ_identifier` = sio.`occ_identifier` WHERE o.date = '2018-02-05';
ERROR 1815 (HY000): Internal error: IDB-2039: Data file does not exist, please contact your system administrator for more information.
MariaDB [tradealert]> SELECT sio.`id` AS option_id, o.`date` AS market_date, o.`time`, o.`size`, o.`price`, o.`spot`, o.open_int, o.theo, o.ivol, o.delta, o.vega, o.theta, o.`secid` FROM onelots AS o LEFT JOIN securities_occ AS so ON o.secid = so.secid LEFT JOIN dds_main.`sec_instrument_option` AS sio ON so.`occ_identifier` = sio.`occ_identifier` WHERE o.date = '2018-02-21';
ERROR 1815 (HY000): Internal error: IDB-2039: Data file does not exist, please contact your system administrator for more information.
MariaDB [tradealert]> SELECT sio.`id` AS option_id, o.`date` AS market_date, o.`time`, o.`size`, o.`price`, o.`spot`, o.open_int, o.theo, o.ivol, o.delta, o.vega, o.theta, o.`secid` FROM onelots AS o LEFT JOIN securities_occ AS so ON o.secid = so.secid LEFT JOIN dds_main.`sec_instrument_option` AS sio ON so.`occ_identifier` = sio.`occ_identifier` WHERE o.date = '2018-03-09' limit 10;
ERROR 1815 (HY000): Internal error: IDB-2039: Data file does not exist, please contact your system administrator for more information.
MariaDB [tradealert]> SELECT sio.`id` AS option_id, o.`date` AS market_date, o.`time`, o.`size`, o.`price`, o.`spot`, o.open_int, o.theo, o.ivol, o.delta, o.vega, o.theta, o.`secid` FROM onelots AS o LEFT JOIN securities_occ AS so ON o.secid = so.secid LEFT JOIN dds_main.`sec_instrument_option` AS sio ON so.`occ_identifier` = sio.`occ_identifier` WHERE o.date = '2018-03-27' limit 10;
------------------------------------------------------------------------------------------------------------
option_id |
market_date |
time |
size |
price |
spot |
open_int |
theo |
ivol |
delta |
vega |
theta |
secid |
------------------------------------------------------------------------------------------------------------
55344 |
2018-03-27 |
15:45:41.313 |
1 |
87.7 |
1002.01 |
1180 |
88.8097 |
0.268411 |
-0.432299 |
3.50937 |
-0.141225 |
20393363 |
5897 |
2018-03-27 |
15:45:41.325 |
1 |
59.05 |
1489 |
3039 |
60.4389 |
0.35868 |
0.374017 |
2.64117 |
-0.617602 |
23051456 |
1270919 |
2018-03-27 |
15:45:41.331 |
1 |
0.26 |
220.71 |
4681 |
0.261891 |
0.828654 |
0.047409 |
0.016148 |
-0.233111 |
25047193 |
1493877 |
2018-03-27 |
15:45:41.333 |
1 |
12.25 |
220.71 |
1201 |
11.9327 |
0.631092 |
0.620184 |
0.139078 |
-0.453351 |
25269065 |
1653542 |
2018-03-27 |
15:45:41.375 |
1 |
2.55 |
277.91 |
17 |
2.44285 |
0.742268 |
0.277963 |
0.069004 |
-1.39039 |
25430466 |
1548574 |
2018-03-27 |
15:45:41.388 |
1 |
9.85 |
213.22 |
60 |
9.61333 |
0.316144 |
-0.662974 |
0.162157 |
-0.154413 |
25314047 |
228803 |
2018-03-27 |
15:45:41.401 |
2 |
6.61 |
1002.01 |
5553 |
6.70226 |
0.336513 |
0.152676 |
0.606164 |
-0.430907 |
23298935 |
1697220 |
2018-03-27 |
15:45:41.444 |
1 |
0.26 |
6.905 |
668 |
0.238412 |
0.594428 |
0.29706 |
0.009021 |
-0.005256 |
25507298 |
1697220 |
2018-03-27 |
15:45:41.445 |
2 |
0.26 |
6.905 |
668 |
0.238412 |
0.594428 |
0.29706 |
0.009021 |
-0.005256 |
25507298 |
1697220 |
2018-03-27 |
15:45:41.445 |
1 |
0.26 |
6.905 |
668 |
0.238412 |
0.594428 |
0.29706 |
0.009021 |
-0.005256 |
25507298 |
------------------------------------------------------------------------------------------------------------
10 rows in set (3.59 sec)
MariaDB [tradealert]> SELECT sio.`id` AS option_id, o.`date` AS market_date, o.`time`, o.`size`, o.`price`, o.`spot`, o.open_int, o.theo, o.ivol, o.delta, o.vega, o.theta, o.`secid` FROM onelots AS o LEFT JOIN securities_occ AS so ON o.secid = so.secid LEFT JOIN dds_main.`sec_instrument_option` AS sio ON so.`occ_identifier` = sio.`occ_identifier` WHERE o.date = '2018-03-27';
ERROR 1815 (HY000): Internal error: IDB-2039: Data file does not exist, please contact your system administrator for more information.
}}
I ran the query one time with limit 10 and another time without and I had 2 different results for March 27.
We need to know what is causing this issue to come back. Without knowing the reason it makes the whole environment unpredictable for us and that is very risky proposition.
I attached a support report file (labeled with today's date for easier recognition) to document what we currently have configured.
columnstoreSupportReport_20181104.csdb-pg1.tar.gz
The problem here is that you are using NFS mounts and particularly S3 NFS mounts do something very odd with file times that break ColumnStore. I highly recommend you do not use NFS for any database storage backend.
For eng team:
Config::checkReload() is using mtime which S3 NFS mounts often set to zero. Therefore the config never gets loaded in and the DBRoot part is empty upon file access. We should maybe do a checksum based test instead? Or at least don't rely on mtime being correct on a shared/distributed FS.