Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-506

Cassandra dynamic columns access

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • None
    • None
    • None
    • None

    Description

      Use named MariaDB dynamic columns to provide access to Cassandra's columns.

      Basic idea:

      • A row in Cassandra's column family may have an arbitrary set of columns
      • In case of what Cassandra calls "static column families", columns are mapped to SQL columns
      • In case of what Cassandra calls "dynamic column families", there is no apparent way to access the columns.

      The idea is to access them through dynamic columns. Table declaration should look like this:

      create table cassandra_tbl
      (
        rowkey  type PRIMARY KEY,
        column1 type,                -- a "static" column. Its name matches the name of Cassandra's static column
        ...
        dynamic_cols VARBINARY(N) dynamic_column_blob  -- a blob that contains all dynamic columns, packed.
      ) engine=cassandra

      In the above definition

      1. dynamic_cols - is a name that can be arbitrary
      2. VARBINARY(N) - is a datatype that's appropriate for storing dynamic column blobs.
      3. dynamic_column_blob is a column property which specifies that this column is a container for all dynamic columns. Each ha_cassandra table may have only one such column

      == Implementation ==
      See http://kb.askmonty.org/en/hbase-storage-engine/#342-dynamic-columns-optimizations for basic ideas.

      Comments from SergeiG: the sane approach to implement these optimizations is through query-wide item pushdown. That is, instead of handler->cond_push(), the server should try pushing down all parts of the query.

      == Dynamic Columns and Datatypes ==

      Dynamic columns store values together with their datatypes. Cassandra's
      columns in dynamic family may, or may not have an assigned datatype.

      === Cassandra-to-MariaDB ===

      When Cassandra columns are converted into DynamicColumnsBlob:

      • If Casandra column has no specified datatype, we assume it to be 'blob', and
        then the conversion happens according to the rules specified above.

      ==== Conversion for timestamp type ====
      Cassandra's timestamp type is a 8-byte integer specifying milliseconds-since-epoch.
      Dynamic columns code supports "DATETIME" type, which is stored with microsecond-precision. However, this type represents a "local" time in unknown timezone, so there is no 1<->1 mapping between DATETIME values and TIMESTAMP time.

      A suggestion to get out of this is to map Cassandra's timestamp to a BIGINT type. In order to get SQL datetime, one will have to use this syntax:

      select FROM_UNIXTIME(COLUMN_GET(dynamic_blob, 'column_name') / 1000) from cassandra_table;

      ==== Conversion for decimal type ====
      Cassandra's decimal type will be mapped to its string representation.

      === MariaDB-to-Cassandra ===

      The task is:

      Given a DynamicColumnsBlob value, store the columns into Cassandra

      DynamicColumnsBlob specifies datatypes. Cassandra has its own set of datatypes.
      Inside MySQL, there are three "primary" datatypes:

      • double
      • longlong
      • String.

      The idea is to extend DYNAMIC_COLUMN_VALUE with functions to get either of three types.
      Then, Cassandra's destination type will invoke the most appropriate function, get a double/longlong/String value, and convert it to Cassandra's representation.

      For example, Cassandra's BIGINT will invoke the val_int() function. Cassandra's UUID will invoke val_str() and then parse the uuid.

      === TODO ===

      • Verify that we can get default validation class (datatype of dynamic columns) from Cassandra
      • Write some example queries for the above and check if that is acceptable,

      Attachments

        Issue Links

          Activity

            People

              sanja Oleksandr Byelkin
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.