[MCOL-5005] Add charset number to system catalog Created: 2022-03-03  Updated: 2023-09-22  Resolved: 2023-08-16

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: None
Fix Version/s: 23.10.0

Type: New Feature Priority: Major
Reporter: Sergey Zefirov Assignee: Gagan Goel (Inactive)
Resolution: Fixed Votes: 0
Labels: rm_infra

Issue Links:
Blocks
blocks MCOL-4580 Extent's approximate range keeping fo... In Testing
blocks MCOL-4931 Make cpimport charset aware Closed
Relates
relates to MCOL-5162 dbbuilder will support syscat upgrades Closed
Sprint: 2023-8
Assigned for Review: Roman Roman
Assigned for Testing: Daniel Lee Daniel Lee (Inactive)

 Description   

When MCOL-4580 was started it is believed that information given by MariaDB's server is sufficient for our purposes. But, it is not.

INSERT operations, right now, are given charset numbers but not actual collation. And if collations for database differ, we get wrong ranges in MCOL-4580 (linked) which leads to incorrect SELECT results.

The cpimport utility does not have collation information at all and it should receive it from system catalog queries.

All-in-all, we need to store in system catalog either pair (charset index, collation index) or just collation index (it appears it is possible to deduce charset index from it) and provide API to fetch these.

[Note for the documentation team]:
Upgrade instructions from previous versions of ColumnStore to a version with this feature:
dbbuilder should perform an automatic upgrade of the system catalog. When the package manager starts the ColumnStore services using systemd during installation, dbbuilder is executed.
After the upgrade, sanity testing should be performed on the installation to ensure the existing database is not impacted in any way.



 Comments   
Comment by Gagan Goel (Inactive) [ 2023-07-31 ]

For QA:

As part of MCOL-5021 last year, we added a new column to calpontsys.systable system catalog table: auxcolumnoid. That means, existing ColumnStore installations went through an automatic system catalog upgrade to support the newly added column. This automatic system catalog upgrade was added in MCOL-5162.

As part of this MCOL, we are adding a new column to calpontsys.syscolumn system catalog table: charsetnum. So, similar to MCOL-5162, we again need to perform upgrade testing when an existing installation of ColumnStore is upgraded to a version with the newly added charsetnum column.

Here are the steps:

In an existing installation of ColumnStore (you can use one of the nightly builds of develop branch such as build 8270: https://cspkg.s3.amazonaws.com/index.html?prefix=develop/cron/8270), perform the following operations:

DROP TABLE IF EXISTS t1, t2, t3;
CREATE TABLE t1 (
  a VARCHAR(15),
  b VARCHAR(15),
  c VARCHAR(15),
  d BLOB(15),
  e INT
) ENGINE=columnstore charset latin2;
 
ALTER TABLE t1 ADD COLUMN (f VARCHAR(15));
ALTER TABLE t1 ADD COLUMN (g VARCHAR(15));
 
insert into t1(a, b, c, e) values ("abc", "bcd", "cde", 123);
                                                                                                                                                                                     
CREATE TABLE t2 ENGINE=columnstore AS SELECT * FROM t1;
 
CREATE TABLE t3 LIKE t1;     
insert into t3(a, b, c, e) values ("abcd", "bcde", "cdef", 1234);
 
SELECT * FROM t1;
SELECT * FROM t2;
SELECT * FROM t3;
 
SHOW CREATE TABLE t1;
SHOW CREATE TABLE t2;
SHOW CREATE TABLE t3;
 
SELECT * FROM calpontsys.syscolumn;

You should notice the last column in calpontsys.syscolumn should be nextvalue.

Now, upgrade (remove the existing ColumnStore package and install the new package with the existing data as is) to the version of ColumnStore that contains the charsetnum column in calpontsys.syscolumn. The following PR build 8276 contains this feature: https://cspkg.s3.amazonaws.com/index.html?prefix=develop/pull_request/8276.

After the upgrade, make sure the old tables are not impacted:

SELECT * FROM t1;
SELECT * FROM t2;
SELECT * FROM t3;

Create new tables in the upgraded version of ColumnStore:

DROP TABLE IF EXISTS t1_2, t2_2, t3_2;
CREATE TABLE t1_2 (
  a VARCHAR(15) collate utf8mb4_romanian_ci,
  b VARCHAR(15) charset 'utf8mb3',
  c VARCHAR(15),
  d BLOB(15),
  e INT
) ENGINE=columnstore collate=latin2_croatian_ci;
 
ALTER TABLE t1_2 ADD COLUMN (f VARCHAR(15) collate 'utf8mb4_icelandic_ci');
ALTER TABLE t1_2 ADD COLUMN (g VARCHAR(15));
 
insert into t1_2(a, b, c, e) values ("abc", "bcd", "cde", 123);
 
CREATE TABLE t2_2 ENGINE=columnstore AS SELECT * FROM t1_2;
CREATE TABLE t3_2 LIKE t1_2;
insert into t3_2(a, b, c, e) values ("abcd", "bcde", "cdef", 1234);
 
SELECT * FROM t1_2;
SELECT * FROM t2_2;
SELECT * FROM t3_2;
 
SHOW CREATE TABLE t1_2;
SHOW CREATE TABLE t2_2;
SHOW CREATE TABLE t3_2;

Now run the following system catalog query:

SELECT * FROM calpontsys.syscolumn;

This system catalog query should now show a new column: charsetnum. The value of this column should be 0 for the old tables (t1, t2, t3). The value of this column for the new tables (t1_2, t2_2, t3_2) should be:
1. A positive integer for CHAR/VARCHAR/TEXT columns
2. 63 for BLOB/VARBINARY columns
3. 0 for all other column types (such as numeric columns, etc.)

Comment by Gagan Goel (Inactive) [ 2023-08-01 ]

dleeyh Please read my last comment for testing instructions.

Comment by Daniel Lee (Inactive) [ 2023-08-16 ]

Build verified: develop latest
engine: f55d41c0792c98c760fdb6e69a1ed0aff87aa42b
server: 62d6100a913699fec9ff48284a76bfe6226e70bc
buildNo: 8435

Repeated the above tests on the latest build.

MTR test case columnstore/basic.mcol-5005 failed because MCOL-5519 has been merged. The MTR test case will be updated separately.

columnstore/basic.mcol-5005              [ fail ]
        Test ended at 2023-08-16 14:26:36
 
--- /usr/share/mysql-test/suite/columnstore/basic/r/mcol-5005.result	2023-08-16 14:03:03.371323826 +0000
+++ /usr/share/mysql-test/suite/columnstore/basic/r/mcol-5005.reject	2023-08-16 14:26:36.391103227 +0000
@@ -34,13 +34,13 @@
 Table	Create Table
 t2	CREATE TABLE `t2` (
   `a` varchar(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_romanian_ci DEFAULT NULL,
-  `b` varchar(15) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci DEFAULT NULL,
+  `b` varchar(15) DEFAULT NULL,
   `c` varchar(15) CHARACTER SET latin2 COLLATE latin2_croatian_ci DEFAULT NULL,
   `d` tinyblob DEFAULT NULL,
   `e` int(11) DEFAULT NULL,
   `f` varchar(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_icelandic_ci DEFAULT NULL,
   `g` varchar(15) CHARACTER SET latin2 COLLATE latin2_croatian_ci DEFAULT NULL
-) ENGINE=Columnstore DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci
+) ENGINE=Columnstore DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci
 SHOW CREATE TABLE t3;
 Table	Create Table
 t3	CREATE TABLE `t3` (
 
mysqltest: Result length mismatch
 
 - saving '/usr/share/mysql-test/var/log/columnstore/basic.mcol-5005/' to '/usr/share/mysql-test/var/log/basic.mcol-5005/'

Generated at Thu Feb 08 02:54:38 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.