[MCOL-4218] Remove NFS/EFS/ Filestore prereq for S3 storage Created: 2020-07-29  Updated: 2023-12-21

Status: Stalled
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.5.3
Fix Version/s: 23.10

Type: New Feature Priority: Blocker
Reporter: Todd Stoffel (Inactive) Assignee: Denis Khalikov
Resolution: Unresolved Votes: 3
Labels: rm_stability

Issue Links:
Blocks
is blocked by MCOL-5201 Process crash recovery Closed
Relates
relates to MCOL-5157 Remove Unnecessary Message about Fail... Closed
relates to MCOL-5158 MCOL-4939 Causes Regression With Dock... Closed
Sub-Tasks:
Key
Summary
Type
Status
Assignee
MCOL-4544 Design and implement run-time sharing... Sub-Task Stalled Denis Khalikov  
MCOL-4545 Generate metadata from S3 data. Sub-Task Needs Feedback Denis Khalikov  
Epic Link: ColumnStore Failover Improvements
Sprint: 2021-3, 2021-4, 2021-5, 2021-6, 2021-7, 2021-8, 2021-9, 2021-10, 2021-11, 2021-12, 2021-16, 2021-17, 2022-22, 2022-23, 2023-4, 2023-5, 2023-6, 2023-7, 2023-8, 2023-10, 2023-11, 2023-12

 Description   

When using storagemanager/S3 with ColumnStore, we want to avoid requiring additional third-party infrastructure, such as NFS hardware or GlusterFS software, for HA.

The subject is touched upon (without proposing a resolution) in https://docs.google.com/document/d/1USO3iXosBIv-jFOQNd820KSXdNPkOnlGDOPcfRkw1rA/edit#heading=h.2bu0ywfefwgs

Previous discussions centered on run-time synchronization of this metadata over the plain network (as we already do for the Extent Map). Another idea was offered recently (see below).

An adjacent part of the effort has to be reconstruction of the metadata from the S3 data itself in case of a crash.

++++++++++++++++++++++++++++++++++ proposal from Todd +++++++++++++++++++++++++++++++++++++++++++++++++
Maybe we should consider storing the metadata in a system database table. The current system uses files containing simple JSON, and we already have JSON functionality within the server that we can take advantage of. For example:

CREATE TABLE `columnstore_info`.`columnstore_meta` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `node` varchar(6) NOT NULL DEFAULT '',
  `path` varchar(128) NOT NULL DEFAULT '',
  `name` varchar(128) NOT NULL DEFAULT '',
  `metadata` longtext NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=Aria AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4 PAGE_CHECKSUM=1;

Sample data here:

MariaDB [(none)]> select * from columnstore_info.columnstore_meta\G
*************************** 1. row ***************************
      id: 1
    node: data1
    path: systemFiles/dbrm
    name: tablelocks.meta
metadata: {
    "version": "1",
    "revision": "1",
    "objects":
    [
        {
            "offset": "0",
            "length": "4",
            "key": "c22873a7-77ef-4f95-b29e-de0a3a06145b_0_4_data1~systemFiles~dbrm~tablelocks"
        }
    ]
}
*************************** 2. row ***************************
      id: 2
    node: data1
    path: 000.dir/000.dir/003.dir/233.dir/000.dir
    name: FILE000.cdf.meta
metadata: {
    "version": "1",
    "revision": "1",
    "objects":
    [
        {
            "offset": "0",
            "length": "2097152",
            "key": "d1250ef6-c4ae-4efb-b1a3-784c98f1f230_0_2097152_data1~000.dir~000.dir~003.dir~233.dir~000.dir~FILE000.cdf"
        }
    ]
}
2 rows in set (0.000 sec)
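The object keys in the sample rows appear to encode everything needed to rebuild a metadata record: UUID, offset, length, and the node/path/filename joined with `~`. If that layout holds, the "reconstruction from data" mentioned above could start from nothing but an S3 key listing. A minimal sketch, assuming the key layout inferred from the two samples (the `metadata_from_key` helper is hypothetical, not part of storagemanager):

```python
# Sketch: rebuild one metadata record from an S3 object key alone.
# Assumed key layout (inferred from the sample rows above):
#   <uuid>_<offset>_<length>_<node>~<path component>~...~<filename>

def metadata_from_key(key: str) -> dict:
    # The UUID contains hyphens but no underscores, so three splits
    # peel off uuid, offset, and length; the remainder is the location.
    uuid, offset, length, location = key.split("_", 3)
    parts = location.split("~")
    node, path, name = parts[0], "/".join(parts[1:-1]), parts[-1]
    return {
        "node": node,
        "path": path,
        "name": name + ".meta",
        "metadata": {
            "version": "1",
            "revision": "1",
            "objects": [
                {"offset": offset, "length": length, "key": key},
            ],
        },
    }
```

Run against the first sample key, this yields the same node, path, name, and object entry as row 1 of `columnstore_meta`, which suggests a crash-recovery pass could repopulate the table by listing the bucket.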

MariaDB's JSON functions provide all the CRUD functionality we would need. For example:

MariaDB [(none)]> SELECT node, path, name, json_value(metadata,'$.objects[0].key') as `key` from columnstore_info.columnstore_meta;
+-------+-----------------------------------------+------------------+----------------------------------------------------------------------------------------------------------+
| node  | path                                    | name             | key                                                                                                      |
+-------+-----------------------------------------+------------------+----------------------------------------------------------------------------------------------------------+
| data1 | systemFiles/dbrm                        | tablelocks.meta  | c22873a7-77ef-4f95-b29e-de0a3a06145b_0_4_data1~systemFiles~dbrm~tablelocks                               |
| data1 | 000.dir/000.dir/003.dir/233.dir/000.dir | FILE000.cdf.meta | d1250ef6-c4ae-4efb-b1a3-784c98f1f230_0_2097152_data1~000.dir~000.dir~003.dir~233.dir~000.dir~FILE000.cdf |
+-------+-----------------------------------------+------------------+----------------------------------------------------------------------------------------------------------+
2 rows in set (0.000 sec)

Instead of writing metadata files to disk, we would write these records directly into the database. When written on the primary, they could be synced to the replicas via normal binlog traffic (the same as our DDL).


Generated at Thu Feb 08 02:48:41 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.