[MCOL-2044] Update CPRange max/min for extent doing DML Created: 2018-12-20 Updated: 2021-07-09 Resolved: 2021-06-25 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ? |
| Affects Version/s: | 1.0.0, 1.2 |
| Fix Version/s: | 6.1.1 |
| Type: | Bug | Priority: | Major |
| Reporter: | Richard Stracke | Assignee: | Sergey Zefirov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||||||
| Epic Link: | ColumnStore Performance Improvements | ||||||||||||||||||||||||||||||||
| Sprint: | 2021-1, 2021-2, 2021-3, 2021-4, 2021-5, 2021-6, 2021-7, 2021-8, 2021-9 | ||||||||||||||||||||||||||||||||
| Description |
|
An extent is marked as invalid if DELETE, INSERT, UPDATE touches this extent. The extent has to be reread by SELECT to become valid again. If one bulk inserts in the invalid extent then min/max values aren't updated in the EM. |
| Comments |
| Comment by Sergey Zefirov [ 2020-11-16 ] | |||||||||||||||||||||||||||||||||||
|
I will talk below about MAX operation and/or property. Same logic can be applied to MIN operation as well. The mix operation is saturating: if you add an element to a set, the max change if new element is greater than old min/max. Basically, INSERT operations can compute max on the fly. The DELETE operation may result in recomputation if element to delete is equal to MAX. The UPDATE operation can result in recomputation if element to be updated is equal to MAX and new value is less than MAX. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-11-16 ] | |||||||||||||||||||||||||||||||||||
|
Typical sequence of actions is "process INSERT/UPDATE/DELETE, collecting LBIDs; mark extents for LBIDs invalid". The collecting phase is done with method AddLBIDToList in writeengine/wrapper/writeengine.* - it is called from the processing of operations. The second part is to call DBRM instance's method markExtentsInvalid (note the "s" in Extent*s* - it operates over vector). At this point we are out of luck to change something. From the notes above and general structure of code it looks like we cannot write some general code. We must patch operations of INSERT/UPDATE/DELETE instead. The idea is to fetch extent ranges before calling AddLBIDToList and check them and not call AddLBIDToList if ranges do not change. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-11-17 ] | |||||||||||||||||||||||||||||||||||
|
CPRange contains int64_t (uint64_t) ranges. So we have to work with subset of all database types, namely, integers. The "invalid" state is when upper bound of range is smaller than lower bound. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-11-17 ] | |||||||||||||||||||||||||||||||||||
|
Thorough look through the code revealed that actuall calls for extent invalidaton live in writeengine/wrapper/writeengine.cpp:
So I think I will do experiment to see whether batch inserts do actually record information. And if so, I will use them as an example on how to proceed with other operations. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-11-17 ] | |||||||||||||||||||||||||||||||||||
|
No, bulk inserts processing does not store correct min/max values. For some reason it clears extent with explicitly created cprange values with invalid marks. It also set extents as invalid twice - in the beginning for new PM (what it is, I want to know?) and later if all of the processing was done without errors. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-11-18 ] | |||||||||||||||||||||||||||||||||||
|
From what I get, the number of rows that get inserted is bounded by extent size: the call to allocRowId sets number of rows to be added (by changing rowsLeft variable). | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-11-20 ] | |||||||||||||||||||||||||||||||||||
|
Relevant branch in my fork is here. Not much there yet. I am still gathering information. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-11-20 ] | |||||||||||||||||||||||||||||||||||
|
The proper handling of ranges seem to be more complex than simply setting them. We have to keep two copies of cpranges - original for rollback and current ones. At commit point we do nothing or may copy ranges into rollback copy immediately. At rollback we do the opposite - ranges for rolled back extents are copied into current ones. Because of this, we must remove bulk drop through calling AddLBIDToList (writeengine.cpp) and subsequent marking of extents in this list as invalid ones. Write Engine class contains a map of transactions (sparse array represented as map) to TxnLBIDRec's. These TxnLBIDRecs contain a map of LBIDs processed. We may record cprange changes there. I think I'll start with that. | |||||||||||||||||||||||||||||||||||
| Comment by Roman [ 2020-11-20 ] | |||||||||||||||||||||||||||||||||||
|
As I said we don't need to account for rollback doing this task. Mark the Regards, On Fri, Nov 20, 2020 at 6:15 PM Sergey Zefirov (Jira) <jira@mariadb.org> | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-11-25 ] | |||||||||||||||||||||||||||||||||||
|
Relevant code branch: https://github.com/mariadb-SergeyZefirov/mariadb-columnstore-engine/tree/MCOL-2044-update-ranges-during-DML Right now I am rewriting and fixing various things in the main write engine. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-11-26 ] | |||||||||||||||||||||||||||||||||||
|
Tests:
It is strange that problem arise in DDL part. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-11-27 ] | |||||||||||||||||||||||||||||||||||
|
Refactored "mark used extents as invalid" logic works. I have to review and test it for probable memory leaks, though. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-11-27 ] | |||||||||||||||||||||||||||||||||||
|
Branch relevant to refactoring done is here. No leaks reported or found during review. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-11-30 ] | |||||||||||||||||||||||||||||||||||
|
There is another structure for keeping map of LBIDs and their ranges, called CPMaxMinMergeMap_t (bmtypes.h). It is quite close but to completely same to what I am doing. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-12-01 ] | |||||||||||||||||||||||||||||||||||
|
There is a good new functionality of type handlers. It allows to find a type handler for a column type (which I have in inserts etc) and perform various tasks there. Also, there's clear need to make bridges with WriteEngine:
(it is from datatypes/mcs_datatypes.h) I guess I have to make at least one bridge then - convert boost::any that is in ColTuple to a value, | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-12-07 ] | |||||||||||||||||||||||||||||||||||
|
First tries show no core dumps but no extent information change either. Will review code again, most probble I forgot something. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-12-08 ] | |||||||||||||||||||||||||||||||||||
|
Currently I have an issue:
I can create table but CS does not recognize it or record it in its internal records. I am looking for what and who is the reason. | |||||||||||||||||||||||||||||||||||
| Comment by Roman [ 2020-12-09 ] | |||||||||||||||||||||||||||||||||||
|
I'd tested the case and it works for me in develop. On Tue, Dec 8, 2020 at 10:40 PM Sergey Zefirov (Jira) <jira@mariadb.org> | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2020-12-14 ] | |||||||||||||||||||||||||||||||||||
|
Porting my changes from So far, so good, but slow. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-11 ] | |||||||||||||||||||||||||||||||||||
|
The culprit: parameters with default values, bool fromList = true and my addition CPInfo* maxmins. And also my insufficient familiarity with the code base. The pointer (maxmins) was silently converted to boolean value, because I forgot to pass explicit "true" for the call to convertValArray. And result is:
So table gets added but DML (insertion) does not work due to error above. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-12 ] | |||||||||||||||||||||||||||||||||||
|
The "image inconsistency" error is due to different semantics of "int" return code in different parts of the system. It has nothing to do with the distributed work. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-12 ] | |||||||||||||||||||||||||||||||||||
|
The plan:
| |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-13 ] | |||||||||||||||||||||||||||||||||||
|
Some operation can drop ranges to "unknown range" value: as an example, deletion of value that is equal to maximum or minimum. Thus, there can be insertion into an extent that has "unknown range" range value. This insertion also must result in "unknown range". The code to test:
The code above does not test for cross-block or cross-extent operation. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-13 ] | |||||||||||||||||||||||||||||||||||
|
As a side note, immutable data strctures like in log structured merge trees sense (or any other static structure for that matter) does not suffer from proliferation of corner cases. A sequence of values has a range of values (including counting ranges where each max and min value has associated count, below an explanation). The range is computed for that concrete sequence of values and will not change because sequence will not change too. Instead of update we insert. Instead of delete we write deletion mark . The insertions form sequence of values which has range (possibly, counted). The deletion also form sequence of values and also has range, possiblly counted. We read sequences of values according to the ranges. If some sequence of values does not have values from the range we interested in, we will not read it. We may have a hierarchy of sequences just like B-tree index, with ranges associated with coarser levels. Now I stop - I have to work on concrete code for concrete problem. But I am not conceding. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-13 ] | |||||||||||||||||||||||||||||||||||
|
Okay, debugging the updates. So far I see no regressions but also ranges do not get updated. Investigating the latter. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-14 ] | |||||||||||||||||||||||||||||||||||
|
Now the culprit in the commit logic - commit does a call to bulk set of HW and CPInfo with the sequence number of -1 (magic sequence number indicating the need of reset). It does so for all extents modified. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-18 ] | |||||||||||||||||||||||||||||||||||
|
Last friday I succed with the range updates. Now I am working on the test case for cross-extent test case. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-19 ] | |||||||||||||||||||||||||||||||||||
|
I have strange behavior here:
I insert 8388600 (0x07ffff8 - 8 less than 2^23^) , which obviously uses cpimport. Then I insert three values and all three go into second extent - which extent should not exist in a first place. So I will work on the update and delete part now on. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-20 ] | |||||||||||||||||||||||||||||||||||
|
Partial success in cross-extent insertion. The test gen-cross-extent.py I've got correct result for first extent and incorrect one for second (below). But I see no exceptions, assertions or other regressions.
| |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-22 ] | |||||||||||||||||||||||||||||||||||
|
The results for gen-cross-extent-and-insert-again.py
I've added an insertion after cross-extent insertion. I expected the following:
The bright side: I have no regressions otherwise. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-22 ] | |||||||||||||||||||||||||||||||||||
|
The reason for not quite expected result in previous attempt was due to single wor insert. Single row and batch inserts have different code paths in the write engine (a problem in itself) and single row insertion does not have range update code (yet). I've changed test code from singular insert to batch insert ("insert into t ( x ) values (424242),(424243);" instead of insertion of just 424242) and got the following result:
The result of first select should be 43 and 44 - the values we inserted into second insert in our cross-extent insertion. These values were not recorded (the range for that extent was set to empty range with "updating" state). The values 424242 and 424243 were recorded into range and resulted in a wrong range and, consequently, wrong selection result. If we are updating ranges, we have to do that consistently. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-25 ] | |||||||||||||||||||||||||||||||||||
|
About consistency of updates. Terms:
Deletion of single element from block:
Please note that it is possible to reexamine the block from which element(s) are deleted and check whether it still has elements that define same range (e.g., equal to max and equal to min). It is like the rescan operation, but for block only. Update of an element:
It is clear that it is possible to have some extents with invalid ranges even if we try to keep ranges updated. This means that insertion can encounter extents with invalid ranges and not only in case of the new extent. For example, imagine we inserted enough values selected alternatively from 42 and 44 to cross block bundaries and now we have at least two blocks. Then, in some block, we update everything to 43. Now this block will not produce old range after block rescan and that forces us to drop whole extent. This, basically, forces us to adopt the following rule: on insertion, we update invalid extent range to valid range only if we are inserting into new extent.
| |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-25 ] | |||||||||||||||||||||||||||||||||||
|
There are two separate allocation paths in the WriteEngine::insert* code: one for isFirstExtentOnThisPM==true and another is for regular situation but when existing extent was overflown. This indicates a need for tests which invoke creation of "first extent on this partition manager" situation. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-25 ] | |||||||||||||||||||||||||||||||||||
|
https://jira.mariadb.org/browse/MCOL-2044?focusedCommentId=178041&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel - now whole test works allright:
Going to have tests for updates, etc, using MTR framework. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-26 ] | |||||||||||||||||||||||||||||||||||
|
Update and delete operations works the same and they operate on columns split into extents, quite fortunately. Column operation for those operations reads a block from extent (based on rowid), updates it and then writes back., "Write column records" operation of write engine also differ between bulk inserts and update/delete. This means we can more or less safely change logic of the specific routines. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-28 ] | |||||||||||||||||||||||||||||||||||
|
Degenerate corner case. Suppose we have an extent with just single value, 5, ont or more rows. Current logic of update checks are:
Updating row from 5 to 6 will trigger right subexpression of logical OR in condition above and updating from 5 to 4 will trigger left subexpression. In any case we will get extent range invalidated. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-29 ] | |||||||||||||||||||||||||||||||||||
|
Tests are here.
Range is dropped but update does not work correctly. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-29 ] | |||||||||||||||||||||||||||||||||||
|
Regression is in insert code - insertion should update invalid range only if that invalid range for completely empty extent. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-01-29 ] | |||||||||||||||||||||||||||||||||||
|
Regression was due to incorrect operation type set in WriteEngine. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-01 ] | |||||||||||||||||||||||||||||||||||
|
Regression is fixed, debugging update's failure to keep ranges. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-03 ] | |||||||||||||||||||||||||||||||||||
|
Fixed regression in "drop table" for columnstore-managed tables. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-03 ] | |||||||||||||||||||||||||||||||||||
|
First non-trivial test runs:
We have 4-line defense tier in the range keeping: not so long ago we dropped ranges in writeengine, in we_dml, in dmlproc and, finally, in commandpackageprocessor in the COMMIT/ROLLBACK part. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-05 ] | |||||||||||||||||||||||||||||||||||
|
It appears that my by-hand runs of tests indicate that most of cases work. Tests that should preserve or extend ranges do so, tests that should drop ranges also do so. I am putting tests into MTR and form a PR then. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-10 ] | |||||||||||||||||||||||||||||||||||
|
MTR tests successfully fail - I am working on providing proper results and validating MTR output. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-10 ] | |||||||||||||||||||||||||||||||||||
|
Current results:
| |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-12 ] | |||||||||||||||||||||||||||||||||||
|
Current state of tests: Failed 3/11 tests, 72.73% were successful. Deletes do not work properly - investigating. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-12 ] | |||||||||||||||||||||||||||||||||||
|
Okay, only one remained to fix - delete of element inside range. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-12 ] | |||||||||||||||||||||||||||||||||||
|
All current tests pass. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-15 ] | |||||||||||||||||||||||||||||||||||
|
Some tests for unsigned integer types fail due to invalid range being displayed as max=0, min=-1 instead of NULL, NULL. MTR output below:
Test expects two NULL values and get 0 and -1. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-15 ] | |||||||||||||||||||||||||||||||||||
|
Tests for signed and unsigned integers and big ints have successfully passed. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-16 ] | |||||||||||||||||||||||||||||||||||
|
Tests for tinyints, smallints, integers and bigints are passing, for both signed and unsigned variants. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-16 ] | |||||||||||||||||||||||||||||||||||
|
Relevant pull request: https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1771 | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-18 ] | |||||||||||||||||||||||||||||||||||
|
There are two regressions: mcs30_update_table and mcs81_self_join. msc30_update_table regression is due to SELECT with integer column comparison with constant wrongfully returning empty set. I suspect it is due to invalid range being in state CP_VALID and not CP_UPDATE. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-18 ] | |||||||||||||||||||||||||||||||||||
|
The code that reports on extent ranges does not take into account case for unsigned types (especially bigint). It is in function generate_result() in file is_columnstore_extents.cpp:
It may bite some curious souls. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-19 ] | |||||||||||||||||||||||||||||||||||
|
With recent changes (do not set invalid ranges to CP_VALID state) mcs30_update_table test passes. mcs81_self_join does not. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-19 ] | |||||||||||||||||||||||||||||||||||
|
The same thing (in different place at write engine code) was the reason mcs81_self_join was failing. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-24 ] | |||||||||||||||||||||||||||||||||||
|
I now have one test failing: mcs211_idbExtentId_function I suspect it is due to my tests not clearing database/tables they created. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-02-24 ] | |||||||||||||||||||||||||||||||||||
|
mcs211_idbExtentId_function is green now. I was right in my suspicion. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-03-01 ] | |||||||||||||||||||||||||||||||||||
|
I am investigating strange discrepance in results: select max_value, min_value from information_schema.columnstore_extents; The bold value is ALMOST NULL - NULL being 0x8000000000000000 and the above is 0x80000000000000c0. Logs for extent range set code indicate that we are setting everything right. But virtual table select result gives us something wrong. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-03-01 ] | |||||||||||||||||||||||||||||||||||
|
One more interesting discrepancy:
-1..0 is a valid invalid range for unsigned columns. Yet generate_result() in code of virtual table columnstore_extents fails to do so. Fixing that ay break other tests. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-03-03 ] | |||||||||||||||||||||||||||||||||||
|
I have genuine regressions in 8 tests - WE update operations sometimes fail and even BRM operations fail. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-03-03 ] | |||||||||||||||||||||||||||||||||||
|
Here: https://ci.columnstore.mariadb.net/mariadb-corporation/mariadb-columnstore-engine/1788/1/9 "mcol_2044_unsigned_.*" tests are fixed (spaces vs tabs) the remaining ones are in work. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-03-03 ] | |||||||||||||||||||||||||||||||||||
|
Failure of mcs12_alter_table test was due to inconsistent assignment of types in write engine. During operations on system catalog token columns aren't marked as WR_TOKEN, but as WR_ULONGLONG and for these columns a special flag (tokenColumn) is set in column information record. Some other tests (extent functions in particular) aer failing due to incomplete cleanup in failing tests. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-03-04 ] | |||||||||||||||||||||||||||||||||||
|
Due to extent rescan during, well, select operation in other DML operations, it is not possible to properly test that update and delete keep invalid range. This is so because we have to use selection operation (WHERE x=y) and that triggers extent rescan for invalid extents. So I deleted these tests (16 in total). MTR is now green. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-03-11 ] | |||||||||||||||||||||||||||||||||||
|
Fixed DECIMAL(38) tests, they should pass. Also will see whether mcs30_update-table failure will persist. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-03-16 ] | |||||||||||||||||||||||||||||||||||
|
The test001.sh from old regression tests (mariadb-columnstore-regression-tests) shows no difference in 10.6 and my PR behavior except for slightly better run times in my PR run. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-04-06 ] | |||||||||||||||||||||||||||||||||||
|
The concerns I have:
| |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-04-19 ] | |||||||||||||||||||||||||||||||||||
The algorithm for keeping ranges in update operationValues and their meaningold_value - old values corresponding to some RID to update; this value was overwritten. range_max - max bound of a range, all values in extent are <= range_max The condition to drop range to invalid state (empty range)if (old_value == range_min && new_value > range_min) || (old_value == range_max && new_value < range_min): mark range invalid because of possible contraction of range. Condition to testThe tests that demonstrate range dropping: https://github.com/mariadb-SergeyZefirov/mariadb-columnstore-engine/blob/develop/mysql-test/columnstore/basic/t/mcol2044_signed_bigint_update-drop-max.test and https://github.com/mariadb-SergeyZefirov/mariadb-columnstore-engine/blob/develop/mysql-test/columnstore/basic/t/mcol2044_signed_bigint_update-drop-min.test The test that demonstrate range keeping while doing update operations that do not touch bounds: https://github.com/mariadb-SergeyZefirov/mariadb-columnstore-engine/blob/develop/mysql-test/columnstore/basic/t/mcol2044_signed_bigint_update-within-range.test The tests that extends min and max: https://github.com/mariadb-SergeyZefirov/mariadb-columnstore-engine/blob/develop/mysql-test/columnstore/basic/t/mcol2044_signed_bigint_update-extends-min.test and https://github.com/mariadb-SergeyZefirov/mariadb-columnstore-engine/blob/develop/mysql-test/columnstore/basic/t/mcol2044_signed_bigint_update-extends-max.test They provide demonstration of how algorithm behaves. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-04-20 ] | |||||||||||||||||||||||||||||||||||
Range contraction problemThe problem is that we do not look at whole extent when doing update. We look at data on the block level. If we update a block and overwrite a value that lies on the minimum bound and new value is bigger than old minimum bound, what new minimum bound should we assign? We can have another block with exact same minimum bound value. Let me offer an example. Imagine there is an extent with three values 1, 1, and 10 and our blocks hold exactly one value. We overwrite first 1 with 2. From information that is accessible to us, we can set new range as 1..10 and 2..10. Second choice is clearly wrong. First choice (never shrink a range) is partially wrong and will bring us to an overextended ranges. E.g., we may end having ranges like -1e6..1e6 for actual values of -100..100 due to events happened long ago (one customer was rich, then took huge overdraft and now they are all middle class). Marking ranges as invalid and set them to the "empty set" value would trigger rescan which will set ranges properly. Probable cpimport problemsI was wrong. The case where extent set into valid state is here: https://github.com/mariadb-corporation/mariadb-columnstore-engine/blob/develop/versioning/BRM/extentmap.cpp#L1018 It does not trigger during normal operation. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-04-20 ] | |||||||||||||||||||||||||||||||||||
|
https://jira.mariadb.org/browse/MCOL-4621 - the ticket where I've fixed cpimport problems. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-04-21 ] | |||||||||||||||||||||||||||||||||||
|
Just in case, in | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-06-23 ] | |||||||||||||||||||||||||||||||||||
|
This is interesting - Write Engine treats such short varchar(s) as integers that are large enough and does not do any additional processing. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Zefirov [ 2021-06-24 ] | |||||||||||||||||||||||||||||||||||
|
tntnatbry what is bug 4594? JIRA says I cannot see this task because of insufficient permissions or task was deleted. As per problem with calshowpartitionsbyvalue, the extents after insert linger in UPDATING state. I will look into why. It is much like https://jira.mariadb.org/browse/MCOL-4673 only with character columns. |