|
For this task the file header should contain new information:
1. Column width.
2. Column data type.
We also need a function that maps a full filename to an OID, so that we can create a column extent.
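For illustration, the mapping can be sketched from the directory layout seen later in this ticket (e.g. `000.dir/000.dir/011.dir/185.dir/000.dir/FILE000.cdf` belongs to OID 3001): the first four directory names are the OID bytes, the fifth is the partition, and the file name carries the segment. A minimal sketch, not the actual `file2Oid` from the PR:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Sketch only -- the real file2Oid lives in the PR below. The directory
// layout encodes the OID bytes from most to least significant, then the
// partition, and the file name carries the segment number:
//   .../AAA.dir/BBB.dir/CCC.dir/DDD.dir/PPP.dir/FILESSS.cdf
//   OID = (AAA << 24) | (BBB << 16) | (CCC << 8) | DDD
struct FileId
{
    uint32_t oid = 0;
    uint32_t partition = 0;
    uint32_t segment = 0;
};

bool file2Oid(const std::string& path, FileId& out)
{
    // Locate the first "NNN.dir" component of the path.
    const size_t pos = path.find(".dir");
    if (pos == std::string::npos || pos < 3)
        return false;

    unsigned d0, d1, d2, d3, part, seg;
    if (std::sscanf(path.c_str() + pos - 3,
                    "%3u.dir/%3u.dir/%3u.dir/%3u.dir/%3u.dir/FILE%3u",
                    &d0, &d1, &d2, &d3, &part, &seg) != 6)
        return false;

    out.oid = (d0 << 24) | (d1 << 16) | (d2 << 8) | d3;
    out.partition = part;
    out.segment = seg;
    return true;
}
```

For example, `.../011.dir/185.dir/...` decodes to 11 * 256 + 185 = 3001.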
|
|
`file2Oid` has been added for review: https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1794
|
|
A patch that adds two new fields to `CompressedDBFileHeader` is up for review: https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1795
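The actual layout lives in the PR; purely as an illustration of the idea, two extra header fields carry the column metadata the rebuild tool cannot otherwise recover (all field names below are placeholders, not the shipped struct):

```cpp
#include <cassert>
#include <cstdint>

// Placeholder sketch of the idea behind the patch. All names here are
// illustrative, not the real CompressedDBFileHeader definition.
struct CompressedDBFileHeaderSketch
{
    uint64_t fMagicNumber;     // existing identification fields...
    uint64_t fVersionNum;
    uint64_t fCompressionType;
    uint64_t fHeaderSize;
    uint64_t fBlockCount;
    // Proposed additions for the rebuild tool:
    uint64_t fColumnWidth;     // column width in bytes (1, 2, 4, 8, ...)
    uint64_t fColumnDataType;  // numeric column data type id
};
```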
|
|
Added a patch with the rebuildEM tool for review: https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1808
|
|
Currently this tool works with compressed segment files created by the engine, but that is not enough to rebuild the full extent map.
The problem: system segment files are not compressed, which means we cannot simply restore the data needed (column type and width) to create a column extent. Those extents must be present in the extent map for the database to work properly.
Some ideas:
1. Keep the initial state of the extent map inside the rebuildEM tool and restore it before walking the dbroot.
a. Keep it as a binary blob, a global inside the rebuildEM tool (the initial size is about 4 KB, but judging by the hexdump it looks like a sparse matrix), and restore it using ExtentMap.load().
b. Keep it as (oid, partition, segment, width, column data type, and possibly other needed data) and try to restore it by calling createColumnExtent.
I will start experimenting with this next week.
Problems:
Currently I'm not sure about the structure of the system files. The main question: can the tables grow over time and create additional segment files to hold the data? In that case the stored initial state would become invalid.
2. Alternatively, keep a separate file with the system extent map.
Problems:
It seems this is the same as keeping the full extent map in a file, which is how it works now.
|
|
After initializing extents for the system tables I got two errors:
1. Related to extent status: with `createColumnExtent` the status defaults to `unavailable`, so we need to mark the extent available via setLocalHWM.
2. Related to tables with `varchar` and `char` (all columns that have an additional dictionary segment file). The current approach uses a greedy strategy to allocate LBIDs from the freelist, so after running rebuildEM we get extents whose `range.start` differs from the original whenever we start from a different OID than the original pass did. As a result, selecting from a `varchar` column produced various errors: `null` values, or values belonging to other tables.
Some solutions could be:
1. Extend the file header with a `range.start` field, walk all segment files in the dbroot, save all the needed data, and create the extent map.
2. Walk the dbroot, try to sort extents by OID, then create the extent map.
Currently I'm able to run rebuildEM and get a partially working database: all columns work except `varchar`.
|
|
Unfortunately we have to keep the start address of the freelist allocation in the segment files.
The following example:
create table t1 (a varchar (255)) engine = columnstore;
insert into t1 values("a");
create table t2 (a varchar (255), b varchar (255), c varchar (255)) engine=columnstore;
insert into t2 values("a", "b", "c");
will create the following extents:
`
range.start|range.size|fileId|blockOffset|HWM|partition|segment|dbroot|width|status|hiVal|loVal|seqNum|isValid|
234496|8|3001|0|0|0|0|1|8|0|0|-1|4|0|
242688|8|3002|0|0|0|0|1|0|0|-9223372036854775808|9223372036854775807|2|0|
250880|8|3004|0|0|0|0|1|8|0|0|-1|2|0|
259072|8|3007|0|0|0|0|1|0|0|-9223372036854775808|9223372036854775807|1|0|
267264|8|3005|0|0|0|0|1|8|0|0|-1|2|0|
275456|8|3008|0|0|0|0|1|0|0|-9223372036854775808|9223372036854775807|1|0|
283648|8|3006|0|0|0|0|1|8|0|0|-1|2|0|
291840|8|3009|0|0|0|0|1|0|0|-9223372036854775808|9223372036854775807|1|0|
`
To rebuild the extent map correctly (so that columns with char data can be accessed through the dictionary files) we have to restore `range.start` as it was originally, but currently we do not have enough information. I thought we could walk the dbroot, collect the needed data, keep it sorted by OID, then rebuild the extent map, but the example above shows that this is wrong.
|
|
Updated https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1808
Currently it works in the following way:
1. Initializes the extents for the system tables from the initial binary blob.
2. Walks the dbroot and collects (oid, part, segment, coltype, colwidth, isDict) for the segment files. (Keeps them sorted via map<FileId, OidComparator>.)
3. Rebuilds the extent map from the collected data, starting from the lowest OID.
The current version successfully rebuilds the initial tables; it works with all columns except varchar types.
This should be updated to keep the files sorted not by OID but by their offset in the freelist (i.e. by range.start); after that we can build the extent map in a greedy way.
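The re-ordering idea can be sketched like this (names are illustrative; `std::map` iterates in ascending key order, which reproduces the order in which the greedy freelist originally handed out LBID ranges):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Illustrative sketch: key the collected segment files by the starting LBID
// (range.start) recovered from the file header instead of by OID.
struct CollectedFile
{
    uint32_t oid;
    uint32_t partition;
    uint32_t segment;
};

using LbidOrderedFiles = std::map<int64_t /* range.start */, CollectedFile>;

// Iterating the map yields the files in original allocation order.
std::vector<uint32_t> rebuildOrder(const LbidOrderedFiles& files)
{
    std::vector<uint32_t> oids;
    for (const auto& kv : files)
        oids.push_back(kv.second.oid);
    return oids;
}
```

With the dump above, LBIDs 234496, 242688, 250880 map back to OIDs 3001, 3002, 3004 regardless of the order in which the files were collected.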
|
|
Updated patch is on review. The current solution creates extents in the order they were originally created.
Tested the solution on different tables; it seems to be working.
Needs more testing, e.g. adding a table with a million rows.
|
|
Ran some tests with more than 8M rows and found that we also need to support multiple extents per segment file. I think we can hard-code the limit to 2, as in the config file, and check the config: if it specifies more, simply refuse to run the tool.
Another open question: how do we set the HWM properly?
|
|
Currently it does not work after inserting data with `cpimport`, for example when inserting 17M rows:
create table t1(a int) engine=columnstore;
`$cpimport temp t1 content.tbl`.
This creates 3 segment files and 3 extents, but `lbid` does not change from its default value.
Running:
`
$rebuildEM -v
FileId is collected [OID: 3001, partition: 0, segment: 1, col width: 4, lbid:-1, isDict: 0]
Processing file: /var/lib/columnstore/data1/000.dir/000.dir/011.dir/185.dir/000.dir/FILE000.cdf [OID: 3001, partition: 0, segment: 0]
FileId is collected [OID: 3001, partition: 0, segment: 0, col width: 4, lbid:-1, isDict: 0]
Processing file: /var/lib/columnstore/data1/000.dir/000.dir/011.dir/185.dir/000.dir/FILE002.cdf [OID: 3001, partition: 0, segment: 2]
FileId is collected [OID: 3001, partition: 0, segment: 2, col width: 4, lbid:-1, isDict: 0]
Build extent map with size 1
Extent is created, allocated size 4096 actual LBID 234496 for [OID: 3001, partition: 0, segment: 1, col width: 4, lbid:-1, isDict: 0]
`
Another thing I don't understand yet: it seems that once an extent exceeds 8M rows, the engine should create a new extent in the same segment file. The number of extents per segment file is defined in the config, yet `cpimport` creates a new segment file for the new extent.
`
range.start|range.size|fileId|blockOffset|HWM|partition|segment|dbroot|width|status|hiVal|loVal|seqNum|isValid|
234496|4|3001|0|4095|0|0|1|4|0|999999|0|9|2|
238592|4|3001|0|4095|0|1|1|4|0|999999|0|9|2|
242688|4|3001|0|121|0|2|1|4|0|999999|751616|1|2|
`
`
./000.dir/011.dir/185.dir/000.dir/FILE001.cdf
./000.dir/011.dir/185.dir/000.dir/FILE000.cdf
./000.dir/011.dir/185.dir/000.dir/FILE002.cdf
`
|
|
Needed to add an HWM calculation such as (decompressed file size - header size * 2) / block size.
Update: this will not work; the file size does not map to the HWM. The HWM is incremented by one each time the extent needs another block.
I will try to calculate the HWM by searching for the first block with an empty value in the file.
`
for (j = 0, curVal = buf; j < totalRowPerBlock; j++, curVal += column.colWidth)
{
    if (isEmptyRow((uint64_t*)curVal, emptyVal, column.colWidth))
    {
        // first empty row: the data in this block ends here
        break;
    }
}
`
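A self-contained sketch of that search, assuming 8192-byte blocks and checking only the first row of each block (the empty-value magic below is an assumed parameter; the engine defines its own per column type and width):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr size_t kBlockSize = 8192;  // engine block size in bytes

// True when the row at `val` holds the empty-value magic for this width.
// Little-endian layout is assumed here.
bool isEmptyRow(const uint8_t* val, uint64_t emptyVal, uint32_t colWidth)
{
    uint64_t v = 0;
    std::memcpy(&v, val, colWidth);
    return v == emptyVal;
}

// HWM sketch: index of the last block that still holds real data. Only the
// first row of each block is inspected, assuming blocks fill front to back.
uint32_t findHwm(const std::vector<uint8_t>& data, uint64_t emptyVal, uint32_t colWidth)
{
    const size_t numBlocks = data.size() / kBlockSize;
    for (size_t blk = 0; blk < numBlocks; ++blk)
    {
        const uint8_t* buf = data.data() + blk * kBlockSize;
        if (isEmptyRow(buf, emptyVal, colWidth))
            return blk == 0 ? 0 : static_cast<uint32_t>(blk - 1);
    }
    return numBlocks == 0 ? 0 : static_cast<uint32_t>(numBlocks - 1);
}
```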
|
|
The last patch https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/1808
adds:
Proper HWM recovery from the segment file.
Support for bulk insertion via cpimport.
Current limitation: it does not work with multiple extents per segment file.
We can easily detect such files when the recovered hwm >= (columnWidth * numExtentRows) / blockSizeInBytes, but there is no straightforward way to create the extents in the same order as they were originally created, because the starting LBID of each extent is unknown.
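The detection condition above, as a sketch (8M rows per extent and 8192-byte blocks match the dumps earlier in this ticket, where a full 4-byte extent shows HWM 4095):

```cpp
#include <cassert>
#include <cstdint>

// One extent covers kRowsPerExtent rows, so a segment file spans more than
// one extent once the recovered HWM reaches the per-extent block count.
constexpr uint64_t kRowsPerExtent = 8ULL * 1024 * 1024;  // 8M rows
constexpr uint64_t kBlockSizeBytes = 8192;

bool hasMultipleExtents(uint64_t hwm, uint64_t columnWidth)
{
    const uint64_t blocksPerExtent = (columnWidth * kRowsPerExtent) / kBlockSizeBytes;
    return hwm >= blocksPerExtent;
}
```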
Tested with different table schemas, for example:
create table t1 (a int, b varchar (255), c int, d varchar(255), e int, f varchar(255)) engine=columnstore;
with 20M rows inserted via cpimport.
In this case bulk insert creates 3 segment files for each int column and 6 for each varchar column (a token segment file plus a dictionary file for each).
|
|
Found a bug related to the HWM calculation for dictionary files.
|
|
Final version is on review.
Currently it supports 2 extents per segment file. This could be extended if needed, but it would require adding one more field to the compressed header.
|
|
To QA: please message me or denis0x0D when you are ready to test the tool, so I can explain how it works.
|
|
dleeyh
You can use any recent build from the develop branch for testing rpm-based platforms. If you want to test others, you can use this one:
https://cspkg.s3.amazonaws.com/index.html?prefix=develop/pull_request/2141/amd64/
It should also be available from the regular cron builds starting tomorrow, I suppose.
|
|
Reopened; pending a requirements discussion with management.
|
|
ExtentsPerSegmentFile controls the number of extents per segment file, so it doesn't affect Dictionary files. We should just remove the ExtentsPerSegmentFile setting from the default Columnstore.xml shipped with the package.
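For reference, the entry to remove looks roughly like this in Columnstore.xml (its exact location in the file may differ between versions):

```xml
<!-- Fragment of the shipped Columnstore.xml (location may vary by version).
     Removing this element lets the engine use its built-in default. -->
<ExtentMap>
    <ExtentsPerSegmentFile>2</ExtentsPerSegmentFile>
</ExtentMap>
```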
BTW, all this activity is beyond the scope of this project and must be done outside of this issue.
|
|
gdorman, after a discussion with Denis I will answer the 4th question. The effort is minimal, since we just need to remove the option from the shipped default config file.
|
|
The next comment is a developer note on how to overcome the limitation Denis described previously.
Two fixed extent descriptors are added to every Segment and Dictionary file. A descriptor records the initial LBID and the number of blocks of an extent; its purpose is to map Segment files with Tokens to Dictionaries. The number of extents in a Dictionary is dynamic, though, so we extend the compressed Dictionary header with a dynamically sized section: after the two mandatory extent descriptors comes a count of additional descriptors, followed by the descriptors themselves.
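A rough sketch of that layout (all names and field sizes are illustrative, not the shipped on-disk format):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative extent descriptor: initial LBID plus block count.
struct ExtentDescriptor
{
    int64_t  startLbid;  // initial LBID of the extent
    uint64_t numBlocks;  // number of blocks in the extent
};

// Two mandatory descriptors, then a count of extra descriptors for
// Dictionary files that grow beyond two extents.
struct DictionaryExtentSectionSketch
{
    ExtentDescriptor fixed[2];  // the two mandatory descriptors
    uint64_t extraCount;        // how many additional descriptors follow
    // ...followed in the file by `extraCount` more ExtentDescriptor entries
};
```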
|
|
Then don’t. Keep for 6.1.
|
|
Build tested: 6.1.1 ( Drone #2576 )
Performed a test on a 3-node cluster with local storage and did not receive any error/warning indicating that such a configuration is not supported. I noticed there is a PR (#1884) that was declined.
Test #1, without -v option
[centos8:root~]# mcsRebuildEM
The launch of mcsRebuildEM tool must be sanctioned by MariaDB support.
Requirement: all DBRoots must be on this node.
Note: that the launch can break the cluster.
Do you want to continue Y/N?
Y
I have a few concerns about the maturity of the tool.
1. The message "Note: that the launch can break the cluster." is quite alarming. Breaking a cluster is so serious that the user or support team would not want to continue. The tool should be mature enough for the user to proceed with confidence.
2. "Requirement: all DBRoots must be on this node." The user should not need to find out whether all DBRoots are on this node. The tool should determine where the data is; if the requirement is not met, it should exit with an appropriate message.
3. "Do you want to continue Y/N?" The user's reply should be on the same line.
4. After answering "Y", the tool simply ended. It must return a message indicating whether the run was successful. If successful, it should print the BRM file that was generated, including the directory path.
5. For a large database the tool may take a while to run. For each dbroot, the tool should print a few steps of the BRM building process, prefixed with a system timestamp. Such information will be helpful for support engineers should the run fail.
6. The tool should check whether the ColumnStore cluster is active. If it is, it should exit immediately.
[centos8:root~]# mcsRebuildEM
The launch of mcsRebuildEM tool must be sanctioned by MariaDB support.
Requirement: all DBRoots must be on this node.
Note: that the launch can break the cluster.
Do you want to continue Y/N?
Y
/var/lib/columnstore/data1/systemFiles/dbrm/BRM_saves_em file exists.
Please note: this tool is only suitable in situations where there is no `BRM_saves_em` file.
If `BRM_saves_em` exists extent map will be restored from it.
7. The tool should check for the existence of the BRM_saves_em file at the beginning of the run. If it exists, the tool should exit immediately, without user interaction.
BRM_saves_em should only be written when the run is successful.
Test #2, with -v option
[centos8:root~]# mcsRebuildEM -v
The launch of mcsRebuildEM tool must be sanctioned by MariaDB support.
Requirement: all DBRoots must be on this node.
Note: that the launch can break the cluster.
Do you want to continue Y/N?
Y
Initialize system extents from the initial state
Collect extents for the DBRoot /var/lib/columnstore/data1
Cannot read file header from the file /var/lib/columnstore/data1/000.dir/000.dir/008.dir/019.dir/000.dir/FILE000.cdf, probably this file was created without compression.
Cannot read file header from the file /var/lib/columnstore/data1/000.dir/000.dir/008.dir/013.dir/000.dir/FILE000.cdf, probably this file was created without compression.
Cannot read file header from the file /var/lib/columnstore/data1/000.dir/000.dir/008.dir/016.dir/000.dir/FILE000.cdf, probably this file was created without compression.
Cannot read file header from the file /var/lib/columnstore/data1/000.dir/000.dir/008.dir/028.dir/000.dir/FILE000.cdf, probably this file was created without compression.
Cannot read file header from the file /var/lib/columnstore/data1/000.dir/000.dir/008.dir/025.dir/000.dir/FILE000.cdf, probably this file was created without compression.
Cannot read file header from the file /var/lib/columnstore/data1/000.dir/000.dir/008.dir/022.dir/000.dir/FILE000.cdf, probably this file was created without compression.
Cannot read file header from
The above messages are always printed. An engineer explained that these are system catalog files and they are not compressed. Since system catalog files are uncompressed by design, this is expected behavior and the tool should not print these messages; they should only appear for user data files that are not compressed. Please do not simply suppress all such messages, only the ones for the system catalog files.
|
|
This is a definite step towards a generic MCOL-312.
|