[MCOL-680] cpimport not creating clean (new) partitions Created: 2017-04-24  Updated: 2017-11-27  Resolved: 2017-11-27

Status: Closed
Project: MariaDB ColumnStore
Component/s: cpimport
Affects Version/s: 1.0.8
Fix Version/s: Icebox

Type: Bug Priority: Major
Reporter: Allan Assignee: Andrew Hutchings (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

centos 7; XFS file system


Attachments: Zip Archive PartitionIssue.zip    
Issue Links:
PartOf
is part of MCOL-685 support dropping extents in addition ... Closed
is part of MCOL-700 Failure of clientrotator.cpp causes D... Closed

 Description   

Under Centos 6 when loading data with cpimport the successfully loaded data occupied a fraction of the (apparently) last partition and added new partitions. Under Centos 7 with the XFS file system it appears to interleave the new data with the old. This make calDropPartitionByValue not work and also produces the error:

Internal error: IDB-2040: No partition found for the given value range

Yet when you show partitions, they are there.



 Comments   
Comment by David Thompson (Inactive) [ 2017-04-25 ]

Thanks, i think i can reproduce this. What i think is happening is that the partition id parts are not being made unique causing the partition procs to get confused when referencing partitions.

rm -f jan.tbl
for i in {1..31}; do 
   for j in {1..10}; do echo "2017-01-$(printf %02d $i)|$j"; done > t0.tbl
   for k in {1..5}; do  
     cat t0.tbl t0.tbl t0.tbl t0.tbl t0.tbl t0.tbl t0.tbl t0.tbl t0.tbl t0.tbl  > t1.tbl 
     rm -f t0.tbl; cp t1.tbl t0.tbl
   done
   cat t0.tbl >> jan.tbl; rm -f t0.tbl t1.tbl
done
 
rm -f feb.tbl
for i in {1..28}; do 
   for j in {1..10}; do echo "2017-02-$(printf %02d $i)|$j"; done > t0.tbl
   for k in {1..5}; do  
     cat t0.tbl t0.tbl t0.tbl t0.tbl t0.tbl t0.tbl t0.tbl t0.tbl t0.tbl t0.tbl  > t1.tbl 
     rm -f t0.tbl; cp t1.tbl t0.tbl
   done
   cat t0.tbl >> feb.tbl; rm -f t0.tbl t1.tbl
done
 
 
CREATE TABLE `t1` (
  `dt` date DEFAULT NULL,
  `i` int(11) DEFAULT NULL
) ENGINE=Columnstore;
 
cpimport test t1 jan.tbl
cpimport test t1 feb.tbl
 
MariaDB [test]> select calshowpartitions('t1', 'dt')\G
*************************** 1. row ***************************
calshowpartitions('t1', 'dt'): Part#     Min                           Max                           Status
  0.0.1     2017-01-01                    2017-02-11                    Enabled
  0.1.1     2017-01-09                    2017-02-20                    Enabled
  0.2.1     2017-01-17                    2017-02-28                    Enabled
  0.3.1     2017-01-26                    2017-02-28                    Enabled
1 row in set (0.01 sec)
 
MariaDB [test]> exit
Bye
[root@centos ~]# /usr/local/mariadb/columnstore/bin/editem -o 3060 -t
Col OID = 3060, NumExtents = 8, width = 4
1072128 - 1076223 (4096) min: 2017-01-01, max: 2017-01-09, seqNum: 1, state: valid, fbo: 0, DBRoot: 1, part#: 0, seg#: 0, HWM: 0; status: avail
1080320 - 1084415 (4096) min: 2017-01-09, max: 2017-01-17, seqNum: 1, state: valid, fbo: 0, DBRoot: 1, part#: 0, seg#: 1, HWM: 0; status: avail
1088512 - 1092607 (4096) min: 2017-01-17, max: 2017-01-26, seqNum: 1, state: valid, fbo: 0, DBRoot: 1, part#: 0, seg#: 2, HWM: 0; status: avail
1096704 - 1100799 (4096) min: 2017-01-26, max: 2017-02-03, seqNum: 2, state: valid, fbo: 0, DBRoot: 1, part#: 0, seg#: 3, HWM: 0; status: avail
1104896 - 1108991 (4096) min: 2017-02-03, max: 2017-02-11, seqNum: 1, state: valid, fbo: 4096, DBRoot: 1, part#: 0, seg#: 0, HWM: 8191; status: avail
1113088 - 1117183 (4096) min: 2017-02-11, max: 2017-02-20, seqNum: 1, state: valid, fbo: 4096, DBRoot: 1, part#: 0, seg#: 1, HWM: 8191; status: avail
1121280 - 1125375 (4096) min: 2017-02-20, max: 2017-02-28, seqNum: 1, state: valid, fbo: 4096, DBRoot: 1, part#: 0, seg#: 2, HWM: 8191; status: avail
1129472 - 1133567 (4096) min: 2017-02-28, max: 2017-02-28, seqNum: 1, state: valid, fbo: 4096, DBRoot: 1, part#: 0, seg#: 3, HWM: 4233; status: avail

as you can see there are 8 entries in the extent map but 4 (with overlapping date values).

Comment by Andrew Hutchings (Inactive) [ 2017-04-25 ]

OK. I think this behaviour is correct and I'll try my best to explain why (warning: some of the terms used here are pretty overloaded, particularly 'partition' and 'segment')...

ColumnStore's algorithm for filling extents goes as follows, I'm assuming single server here as it is easier to explain:

1. Make 4 segments for the extent
2. Make a second segment for each segment file from part 1
3. Make a new partition

Step 2 is the interesting part here, it adds a second extent segment to each file. When using information_schema.columnstore_extents you can see this effect:

MariaDB [test]> select * from information_schema.columnstore_extents where object_id=15435\G
*************************** 1. row ***************************
          OBJECT_ID: 15435
        OBJECT_TYPE: Column
LOGICAL_BLOCK_START: 20444160
  LOGICAL_BLOCK_END: 20448255
          MIN_VALUE: 132190334
          MAX_VALUE: 132190846
              WIDTH: 4
             DBROOT: 1
       PARTITION_ID: 0
         SEGMENT_ID: 0
       BLOCK_OFFSET: 0
         MAX_BLOCKS: 4096
    HIGH_WATER_MARK: 0
              STATE: Valid
             STATUS: Available
          DATA_SIZE: 8192
*************************** 2. row ***************************
          OBJECT_ID: 15435
        OBJECT_TYPE: Column
LOGICAL_BLOCK_START: 20452352
  LOGICAL_BLOCK_END: 20456447
          MIN_VALUE: 132190846
          MAX_VALUE: 132191358
              WIDTH: 4
             DBROOT: 1
       PARTITION_ID: 0
         SEGMENT_ID: 1
       BLOCK_OFFSET: 0
         MAX_BLOCKS: 4096
    HIGH_WATER_MARK: 0
              STATE: Valid
             STATUS: Available
          DATA_SIZE: 8192
*************************** 3. row ***************************
          OBJECT_ID: 15435
        OBJECT_TYPE: Column
LOGICAL_BLOCK_START: 20460544
  LOGICAL_BLOCK_END: 20464639
          MIN_VALUE: 132191358
          MAX_VALUE: 132191934
              WIDTH: 4
             DBROOT: 1
       PARTITION_ID: 0
         SEGMENT_ID: 2
       BLOCK_OFFSET: 0
         MAX_BLOCKS: 4096
    HIGH_WATER_MARK: 0
              STATE: Valid
             STATUS: Available
          DATA_SIZE: 8192
*************************** 4. row ***************************
          OBJECT_ID: 15435
        OBJECT_TYPE: Column
LOGICAL_BLOCK_START: 20468736
  LOGICAL_BLOCK_END: 20472831
          MIN_VALUE: 132191934
          MAX_VALUE: 132194558
              WIDTH: 4
             DBROOT: 1
       PARTITION_ID: 0
         SEGMENT_ID: 3
       BLOCK_OFFSET: 0
         MAX_BLOCKS: 4096
    HIGH_WATER_MARK: 0
              STATE: Valid
             STATUS: Available
          DATA_SIZE: 8192
*************************** 5. row ***************************
          OBJECT_ID: 15435
        OBJECT_TYPE: Column
LOGICAL_BLOCK_START: 20476928
  LOGICAL_BLOCK_END: 20481023
          MIN_VALUE: 132194558
          MAX_VALUE: 132195070
              WIDTH: 4
             DBROOT: 1
       PARTITION_ID: 0
         SEGMENT_ID: 0
       BLOCK_OFFSET: 4096
         MAX_BLOCKS: 4096
    HIGH_WATER_MARK: 8191
              STATE: Valid
             STATUS: Available
          DATA_SIZE: 33554432
*************************** 6. row ***************************
          OBJECT_ID: 15435
        OBJECT_TYPE: Column
LOGICAL_BLOCK_START: 20485120
  LOGICAL_BLOCK_END: 20489215
          MIN_VALUE: 132195070
          MAX_VALUE: 132195646
              WIDTH: 4
             DBROOT: 1
       PARTITION_ID: 0
         SEGMENT_ID: 1
       BLOCK_OFFSET: 4096
         MAX_BLOCKS: 4096
    HIGH_WATER_MARK: 8191
              STATE: Valid
             STATUS: Available
          DATA_SIZE: 33554432
*************************** 7. row ***************************
          OBJECT_ID: 15435
        OBJECT_TYPE: Column
LOGICAL_BLOCK_START: 20493312
  LOGICAL_BLOCK_END: 20497407
          MIN_VALUE: 132195646
          MAX_VALUE: 132196158
              WIDTH: 4
             DBROOT: 1
       PARTITION_ID: 0
         SEGMENT_ID: 2
       BLOCK_OFFSET: 4096
         MAX_BLOCKS: 4096
    HIGH_WATER_MARK: 8191
              STATE: Valid
             STATUS: Available
          DATA_SIZE: 33554432
*************************** 8. row ***************************
          OBJECT_ID: 15435
        OBJECT_TYPE: Column
LOGICAL_BLOCK_START: 20501504
  LOGICAL_BLOCK_END: 20505599
          MIN_VALUE: 132196158
          MAX_VALUE: 132196158
              WIDTH: 4
             DBROOT: 1
       PARTITION_ID: 0
         SEGMENT_ID: 3
       BLOCK_OFFSET: 4096
         MAX_BLOCKS: 4096
    HIGH_WATER_MARK: 4233
              STATE: Valid
             STATUS: Available
          DATA_SIZE: 1130496
8 rows in set (0.18 sec)

When the block offset is 4096 that means it starts at block 4096 on the previous file.

Now, the two segments for each segment ID are stored in the same file and the partition functions are based on segment files. The IDs in the partition functions are in the format partition_id.segment_id.dbroot. It cannot distinguish between the two segments in the segment file and effectively merges the min/max for the purposes of these functions.

So, to summarise the sequence when filling extents is:

segment 0
segment 1
segment 2
segment 3
segment 0 with offset
segment 1 with offset
segment 2 with offset
segment 3 with offset
partition 1 segment 0
...

As far as the partition functions go, this will look like a full partition gets appended because it works at the file level and a segment file can contain 2 segments.

I hope this helps. Please let me know if any of this needs explaining in more detail.

Comment by David Thompson (Inactive) [ 2017-04-25 ]

I've updated the partition document to clarify this: https://mariadb.com/kb/en/mariadb/columnstore-partition-management

The behavior still seems wrong to me but need to think through why it was done this way and if we are losing something with this.

Comment by Allan [ 2017-04-25 ]

I agree that the behavior is wrong and this is why it impacts us in a very large way.

We load 100 million records or more every day (for just one machine) and we also need to age out old records. Because the data loaded 'today' by cpimport is not distinguished by partition we can't do a drop partition to get rid of 'old' data. We have to result to a MySQL DELETE statement that is 3 orders of magnitude slower, literally. We did not have this problem under InfiniDB

Comment by Allan [ 2017-04-25 ]

The second issue in this report is the error message (which was never addressed):

Internal error: IDB-2040: No partition found for the given value range

What's going on with this.

Comment by Andrew Hutchings (Inactive) [ 2017-04-25 ]

Sorry, I missed that part with my initial message. I've noticed you have opened a new bug report for this (MCOL-685) and it will be addressed there.

Comment by Andrew Hutchings (Inactive) [ 2017-04-28 ]

Another suggestion by Allan in MCOL-685:

"I suggest that if bulk loading is going to fill the extents of partitions that some bulk delete/drop method be made available to function like calDropPartitionByValue. Perhaps calDropExtentByValue"

We could do this once MCOL-498 is implemented using ftruncate(). If we are removing the second extent in a segment file then mmap/memmove/munmap followed by ftruncate() is probably the fastest way.

Alternatively we could re-use extents marked as free.

Comment by David Thompson (Inactive) [ 2017-11-27 ]

Have changed MCOL-685 into in an improvement for dropping by extents

Generated at Thu Feb 08 02:23:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.