Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
-
None
Description
Use named MariaDB dynamic columns to provide access to Cassandra's columns.
Basic idea:
- A row in Cassandra's column family may have an arbitrary set of columns
- In case of what Cassandra calls "static column families", columns are mapped to SQL columns
- In case of what Cassandra calls "dynamic column families", there is no apparent way to access the columns.
The idea is to access them through dynamic columns. Table declaration should look like this:
create table cassandra_tbl
|
(
|
rowkey type PRIMARY KEY,
|
column1 type, -- a "static" column. Its name matches the name of Cassandra's static column
|
...
|
dynamic_cols VARBINARY(N) dynamic_column_blob -- a blob that contains all dynamic columns, packed.
|
) engine=cassandra
|
In the above definition
- dynamic_cols - is a name that can be arbitrary
- VARBINARY(N) - is a datatype that's appropriate for storing dynamic column blobs.
- dynamic_column_blob is a column property which specifies that this column is a container for all dynamic columns. Each ha_cassandra table may have only one such column
== Implementation ==
See http://kb.askmonty.org/en/hbase-storage-engine/#342-dynamic-columns-optimizations for basic ideas.
Comments from SergeiG: the sane approach to implement these optimizations is through query-wide item pushdown. That is, instead of handler->cond_push(), the server should try pushing down all parts of the query.
== Dynamic Columns and Datatypes ==
Dynamic columns store values together with their datatypes. Cassandra's
columns in dynamic family may, or may not have an assigned datatype.
=== Cassandra-to-MariaDB ===
When Cassandra columns are converted into DynamicColumnsBlob:
- If Cassandra column has a specified datatype, then Cassandra's datatype is
mapped into SQL datatype according to the table at
http://kb.askmonty.org/en/cassandra-storage-engine/#mapping-for-datatypes.
Then, SQL datatype is mapped to dynamic column type:
(todo: write the table)
- If Casandra column has no specified datatype, we assume it to be 'blob', and
then the conversion happens according to the rules specified above.
==== Conversion for timestamp type ====
Cassandra's timestamp type is a 8-byte integer specifying milliseconds-since-epoch.
Dynamic columns code supports "DATETIME" type, which is stored with microsecond-precision. However, this type represents a "local" time in unknown timezone, so there is no 1<->1 mapping between DATETIME values and TIMESTAMP time.
A suggestion to get out of this is to map Cassandra's timestamp to a BIGINT type. In order to get SQL datetime, one will have to use this syntax:
select FROM_UNIXTIME(COLUMN_GET(dynamic_blob, 'column_name') / 1000) from cassandra_table;
|
==== Conversion for decimal type ====
Cassandra's decimal type will be mapped to its string representation.
=== MariaDB-to-Cassandra ===
The task is:
Given a DynamicColumnsBlob value, store the columns into Cassandra
DynamicColumnsBlob specifies datatypes. Cassandra has its own set of datatypes.
Inside MySQL, there are three "primary" datatypes:
- double
- longlong
- String.
The idea is to extend DYNAMIC_COLUMN_VALUE with functions to get either of three types.
Then, Cassandra's destination type will invoke the most appropriate function, get a double/longlong/String value, and convert it to Cassandra's representation.
For example, Cassandra's BIGINT will invoke the val_int() function. Cassandra's UUID will invoke val_str() and then parse the uuid.
=== TODO ===
- Verify that we can get default validation class (datatype of dynamic columns) from Cassandra
- Write some example queries for the above and check if that is acceptable,