[MCOL-3574] EMPTY column for persistant storage Created: 2019-10-24  Updated: 2023-10-25  Resolved: 2023-10-25

Status: Closed
Project: MariaDB ColumnStore
Component/s: DDLProc
Affects Version/s: None
Fix Version/s: 23.10

Type: New Feature Priority: Major
Reporter: Andrew Hutchings (Inactive) Assignee: Gagan Goel (Inactive)
Resolution: Fixed Votes: 0
Labels: Compatibility

Issue Links:
Blocks
blocks MCOL-4648 CAST(UBIGINTNULL_inWideDecimal AS UNS... Closed
PartOf
is part of MCOL-4343 umbrella for tech debt issues Open
Problem/Incident
causes MCOL-4403 Trailing spaces not behaving as InnoDB Closed
causes MCOL-4577 VARCHAR(2) NOT NULL: explicit empty s... Closed
causes MCOL-4578 CHAR(2) NOT NULL: Empty string or SPA... Closed
causes MCOL-4579 CHAR(2) NULL: empty strings or SPACE(... Closed
Relates
relates to MCOL-5021 Implement an auxiliary (hidden) colum... Closed
Epic Link: ColumnStore Compatibility Improvements

 Description   

MCS 6.3.1 has a number(column type dependent) special column values(magic values) for empty records. This allows in-place DELETE-s but slows down DELETE-s a lot b/c MCS needs to write magics into all columns for a record deleted.
This project describes another approach with an additional auxiliary column(every table will have such column) that stores EMPTY attribute as its Nth bit. With such record attribute MCS can:

  • improve INSERT cache speed. MCS inserts/marks the record deleted until commit arrives.
  • improve DELETE speed/user experience. MCS journals/changes a single-bit per-record and not N * columns number per record.
  • improve UPDATE speed. MCS becomes append-only so that the whole UPDATE will become an INSERT + DELETE operation.

These are immediate benefits. There are some potential positive side-effects, e.g. MCS should be able to cope with medium OLTP workload that it can't do previously so that OLTP to OLAP replication can be done.

The first cut of the implementation plan consists of a number of steps:

  • existing installations migration planning(this will affect the feature implementation)
  • hidden aux column creation running CREATE TABLE in DDL or plugin. The aux column data type empty value signifies empty record so that EMPTY value code can be reused for this column.
  • basic scanning support in EM or PP
  • basic INSERT support(a record is on or off by default) in WE
  • basic bulk insert support in WE
  • basic DELETE support via an aux column) in WE

Here are some notes on the bullets above.
Migration can be seamless or with additional upgrade downtime. The first approach reduces upgrade downtime but the performance will be potentially degraded b/c MCS now needs to detect whether aux column exists or not and use specific code for reads/inserts/deletes/updates. Developer's needs also to maintain both processing paths that adds complexity. The second migration approach is to run a binary or update routine in the cluster that will add/populate aux column for every table. We should provide customers with ways to rollback update changes on per-table basis.
I presume that at any moment MCS knows table OID so we can make aux column OID so column OID is table OID + 1.
Basic scanning support can be implemented adding aux column filter for any disk-based scan so there is no need to change actual scanning code.
There are two options changing INSERT path. The most simple is to avoid any changes so that MCS WE paths will work with the table + additional column with its EMPTY magic. The more elaborate is to reduce the number of blocks we put into a VersionBuffer cache to one, namely aux column only. If ROLLBACK fires MCS needs to restore only aux column block but not all other columns' blocks. The best thing about this particular part is that VBBM and VSS will be used for aux column only but not for all table columns.
Bulk insert operation must extend aux column and populate it with EMPTY values.
DELETE operation will flip empty record attribute bit value.


Generated at Thu Feb 08 02:43:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.