[MCOL-590] No extent information on Insert/Updates with table corruption Created: 2017-02-23 Updated: 2017-09-20 Resolved: 2017-09-20 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | DMLProc |
| Affects Version/s: | 1.0.7 |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Critical |
| Reporter: | Bernd Helm | Assignee: | David Hall (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 1 |
| Labels: | None | ||
| Environment: | Debian 8, kernel 4.9, 4-PM combined install |
| Attachments: | |
| Sprint: | 2017-4, 2017-5, 2017-6, 2017-7, 2017-8, 2017-9, 2017-10 |
| Description |
|
We have huge problems with ColumnStore breaking on a regular basis. This also happened (more rarely) on InfiniDB 4.6.7. This is what we do:
From the logs:
|
| Comments |
| Comment by David Thompson (Inactive) [ 2017-02-24 ] |
|
When you had this issue with InfiniDB, did it behave exactly the same or differently? Is the only difference the frequency? Looking at your code, one thing that stands out as having changed since InfiniDB is that INSERT ... SELECT statements are by default converted into a stream into the cpimport code. Whether the behavior is different or not, you can disable this: Thanks for all the detailed info; it should help triage. |
| Comment by Bernd Helm [ 2017-02-24 ] |
|
The "no extent information available" error definitely also occurred on InfiniDB, and the partitions were crashed afterwards. I currently do not know whether the getDbRootHWMInfo output also looked the same. The frequency is hard to judge, as it seems to depend on the data range being updated (e.g. stats_hour, which contains 24 times the data of stats_day, is more likely to crash). It may be impossible to reproduce with only 1000 rows; on our setup, we have ~100k rows on the source side and 12-30 million rows per day on the destination side. Regarding batch insert mode: that does not apply to this issue, as batch insert mode is only active for non-transactional inserts, and these crashes only happen on transactional inserts/updates. Inserts with cpimport are stable for me. |
| Comment by David Thompson (Inactive) [ 2017-02-25 ] |
|
You are correct that batch insert mode would not apply, since this is in a transaction. The error line: is likely the root cause. My guess is that the insert requires creation of a new extent because all extents across PMs are full, and that there is some race condition between the creation of the extent and the preparation for the specific insert. This would also explain why it is sporadic. We have made some performance improvements to reduce contention, so it is possible one of these is making this more likely, or it is pure coincidence that the volume/frequency of data makes this more likely to happen. |
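The class of check-then-act race described in the comment above can be sketched in a few lines. This is illustrative only, NOT ColumnStore code: all names, the toy capacity, and the scripted interleaving are assumptions; it only shows how two writers acting on the same stale "all extents are full" observation can leave a trailing extent that is never made available.

```python
# Illustrative sketch only -- NOT ColumnStore code. Two writers each observe
# "all extents full" before either acts, so both create a new extent; only
# one of the new extents is ever written to and marked AVAILABLE.

EXTENT_CAPACITY = 4  # blocks per extent (toy value, an assumption)

class Extent:
    def __init__(self):
        self.status = 1  # 1 = UNAVAILABLE; extents are created UNAVAILABLE
        self.hwm = 0     # high water mark, in blocks

extent_map = [Extent()]
extent_map[0].hwm = EXTENT_CAPACITY  # the existing extent is full
extent_map[0].status = 0             # 0 = AVAILABLE

def all_full(emap):
    return all(e.hwm >= EXTENT_CAPACITY for e in emap)

# Writer A checks; writer B checks before A acts -- same stale view.
a_saw_full = all_full(extent_map)
b_saw_full = all_full(extent_map)

if a_saw_full:
    extent_a = Extent()
    extent_map.append(extent_a)
if b_saw_full:
    extent_b = Extent()          # duplicate creation: the race
    extent_map.append(extent_b)

# Only A's extent gets written to and marked AVAILABLE.
extent_a.hwm = 1
extent_a.status = 0

# B's extent trails the map, UNAVAILABLE (status 1) and empty -- the shape
# of the inconsistent extent metadata discussed in this issue.
print([(e.status, e.hwm) for e in extent_map])
```

With the check made atomic with the creation (e.g. under one lock), the second writer would see the freshly created extent instead of creating its own.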
| Comment by Bernd Helm [ 2017-02-27 ] |
|
Thank you for your attention. We have installed InfiniDB 4.6.7 on the same server and ran the same data importer on the same data. It has been running fine since Friday, while ColumnStore crashed after a few minutes (and I replayed the tables and retried three times before giving up). So I am now certain that the table crash frequency of ColumnStore is much higher, because other factors have been almost eliminated. |
| Comment by David Hall (Inactive) [ 2017-02-28 ] |
|
There is a good possibility that running an UPDATE and then an INSERT in the same transaction is causing the problem. My theory is this: consider the following, where stats_day.hash is OID 3543 (I imported the dump of the extent map from calpontsupportreport).
---------
---------
You can see that the last segment has a block offset greater than the high water mark (HWM) of the previous segment, and that the segment status is 1. This is not allowed, hence the error. (Status 0 is AVAILABLE and 1 is UNAVAILABLE; segments are created UNAVAILABLE and updated by later operations.) Also note that the seq # for the next-to-last segment is 37, which implies that the extent was updated 37 times during this transaction.

I believe that if a COMMIT were issued before the INSERT, the problem would be resolved; this should be used as a workaround until the bug is fixed.

Earlier in the log, I see a number of errors of the following type: These are indications of a bad extent map. There is a lot of verbosity there, but basically, HWM cannot be less than lowfbo, which is the block offset from the start of the file to the beginning of the segment we are looking at. HWM should point to the offset from the start of the file to the logical block (LBID) of the last block in the segment. The 'range size' reported in the error is the difference between the two. It should never be negative, since that would place the last data block before the start of the segment.

Unfortunately, the reported extent map appears to no longer have these corruptions, so further analysis is difficult. My hope is that these anomalies are caused by the same forces at work in the main concern, and that these errors will evaporate when COMMIT before INSERT is tried. |
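The HWM/lowfbo sanity rule described in the comment above can be written down directly. This is an illustrative sketch, not actual ColumnStore code; the function and parameter names are assumptions.

```python
# Illustrative sketch of the extent-map sanity rule described above --
# not actual ColumnStore code; names are assumptions.

def segment_range_size(hwm, lowfbo):
    """Return the 'range size' for a segment, raising on corruption.

    hwm:    block offset from the start of the file to the logical block
            (LBID) of the last block written in the segment
    lowfbo: block offset from the start of the file to the beginning of
            the segment
    """
    range_size = hwm - lowfbo
    if range_size < 0:
        # A negative range size would place the last data block *before*
        # the start of the segment -- the corruption the log errors report.
        raise ValueError(f"bad extent map: HWM {hwm} < lowfbo {lowfbo}")
    return range_size

print(segment_range_size(hwm=4096, lowfbo=0))  # healthy segment
```

A corrupted entry such as `hwm=10, lowfbo=20` would raise, matching the "range size can't be negative" errors seen earlier in the log.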
| Comment by David Thompson (Inactive) [ 2017-03-07 ] |
|
Hi Bernd, have you had a chance to review David Hall's suggestion? It would be a possible workaround and would also help confirm that this is indeed the bug. |
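For reference, the COMMIT-before-INSERT workaround suggested earlier in this thread would take roughly the following shape. The table and column names are placeholders, not taken from the actual importer; only the transaction boundaries matter.

```sql
-- Illustrative only: split the UPDATE and the INSERT into separate
-- transactions, so the extent map changes from the UPDATE are committed
-- before the INSERT allocates extents. Names below are placeholders.
START TRANSACTION;
UPDATE stats_day SET total = total + 1 WHERE day = CURRENT_DATE;
COMMIT;

START TRANSACTION;
INSERT INTO stats_day SELECT * FROM stats_staging;
COMMIT;
```

The trade-off is losing atomicity across the two statements: if the INSERT fails, the UPDATE is already committed and must be compensated for by the importer.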
| Comment by David Thompson (Inactive) [ 2017-03-27 ] |
|
Hi Bernd, did you have a chance to review this? |
| Comment by Bernd Helm [ 2017-03-27 ] |
|
Sorry for making you ask twice; no, I have had no chance to test this. Anyway, it would be better if you could reproduce it yourselves. I will set up a VM to test this (I hope it is reproducible with a single-VM multi-DBRoot setup; we will see). |
| Comment by David Thompson (Inactive) [ 2017-03-27 ] |
|
It may be best to provide us with the data and scripts, as that makes the issue independently reproducible. Probably best if you can email the details offline. |