MDEV-122: Data mapping between HBase and SQL

Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix

Activity

          == Data mapping from HBase to SQL ==

          An HBase table consists of rows, which are identified by a row key. Each row has an
          arbitrary (potentially very large) number of columns. Looking at the HBase
          documentation, one can conclude that HBase specifically targets two use
          cases:

          • a column family has few columns
          • a column family has so many columns that the "get column by name" operation is
            specifically optimized for this case (TODO: exact link!)

          One can see two ways to map that to SQL tables:

          === Per-row mapping ===

          Let each row in the HBase table be mapped to one row from the SQL point of view:

          SELECT * FROM hbase_table;

          row-id  column1  column2
          ------  -------  -------
          row1    data     data
          row2    data     data
          row3    data     data

          This is the most straightforward mapping. However, accessing some of the HBase
          data will be difficult. For example, in HBase it is natural to ask:

          given a row-id, get a list of columns in the row (maybe with their values)

          In this mapping, the result of that query would be a single row, and there is no
          convenient way to return an arbitrarily long list of columns in one row.

          Table DDL could look like this:

          create table hbase_tbl_rows (
            row_id binary(MAX_HBASE_ROWID_LEN),
            columns blob,
            key (row_id)
          ) engine=hbase_row;

          TODO: Does HBase have a MAX_HBASE_ROWID_LEN limit? What is it?

          === Per-cell mapping ===

          The HBase shell has a 'scan' command; here is an example of its output:

          hbase(main):007:0> scan 'testtable'
          ROW        COLUMN+CELL
          myrow-1    column=colfam1:q1, timestamp=1297345476469, value=value-1
          myrow-2    column=colfam1:q2, timestamp=1297345495663, value=value-2
          myrow-2    column=colfam1:q3, timestamp=1297345508999, value=value-3

          Here, one HBase row produces multiple rows in the output. Each output row
          represents one (row_id, column) combination, so rows with multiple columns
          (and multiple revisions of column data) are easily represented.

          create table hbase_tbl_cells (
            row_id binary(MAX_HBASE_ROWID_LEN),
            column_name binary(N),
            timestamp TIMESTAMP,
            value binary(N)
          ) engine=hbase_cell;

          TODO: Does HBase have a MAX_HBASE_ROWID_LEN limit? What is it?
          • NOTE: It is nice to have the table DDL independent of the HBase table. This
            saves us from the need to synchronize table DDLs between HBase and MySQL (which
            NDB cluster had to do, and they ended up implementing a rather complex
            subsystem for it).

          == Consistency and transactionality ==

          • HBase has single-record transactions. Does this mean that the HBase engine
            will have MyISAM-like characteristics, i.e. if we fail in the middle of
            an UPDATE, there is no way to go back?
          psergei Sergei Petrunia added a comment

          Finalize the description and any questions we might have. Let's then send it for review.

          ratzpo Rasmus Johansson (Inactive) added a comment (edited)

          == Data mapping from HBase to SQL ==

          An HBase table consists of rows, which are identified by a row key. Each row has an
          arbitrary (potentially very large) number of columns. Looking at the HBase
          documentation, one can conclude that HBase specifically targets two use
          cases:

          • a column family has few columns
          • a column family has so many columns that the "get column by name" operation is
            specifically optimized for this case (TODO: exact link!)

          One can see two ways to map that to SQL tables:

          === Per-row mapping ===

          Let each row in the HBase table be mapped to one row from the SQL point of view:

          SELECT * FROM hbase_table;

          row-id  column1  column2
          ------  -------  -------
          row1    data     data
          row2    data     data
          row3    data     data

          This is the most straightforward mapping. However, accessing some of the HBase
          data will be difficult. For example, in HBase it is natural to ask:

          given a row-id, get a list of columns in the row (maybe with their values)

          In this mapping, the result of that query would be a single row, and there is no
          convenient way to return an arbitrarily long list of columns in one row.

          Table DDL could look like this:

          CREATE TABLE hbase_tbl_rows (
            row_id BINARY(MAX_HBASE_ROWID_LEN),
            columns BLOB,
            PRIMARY KEY (row_id)
          ) ENGINE=hbase_row;
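
          For illustration, a minimal sketch of what the "list the columns of a row" request
          from above would look like against this mapping (hbase_tbl_rows and its columns
          blob are the hypothetical definitions from this task; 'myrow-1' is just the example
          row key used in the scan output later in this comment):

          -- Returns a single row: all columns of the HBase row come back packed into one
          -- blob, so the client still has to unpack the blob itself to get the column list.
          SELECT columns
          FROM hbase_tbl_rows
          WHERE row_id = 'myrow-1';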

          TODO: Does HBase have a MAX_HBASE_ROWID_LEN limit? What is it?

          === Per-cell mapping ===

          The HBase shell has a 'scan' command; here is an example of its output:

          hbase(main):007:0> scan 'testtable'
          ROW        COLUMN+CELL
          myrow-1    column=colfam1:q1, timestamp=1297345476469, value=value-1
          myrow-2    column=colfam1:q2, timestamp=1297345495663, value=value-2
          myrow-2    column=colfam1:q3, timestamp=1297345508999, value=value-3

          Here, one HBase row produces multiple rows in the output. Each output row
          represents one (row_id, column) combination, so rows with multiple columns
          (and multiple revisions of column data) are easily represented.

          CREATE TABLE hbase_tbl_cells (
            row_id BINARY(MAX_HBASE_ROWID_LEN),
            column_name BINARY(MAX_HBASE_NAME_LEN),
            timestamp TIMESTAMP,
            value BLOB,
            PRIMARY KEY (row_id, column_name)
          ) ENGINE=hbase_cell;
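
          With this mapping, the "list the columns of a row" request becomes a plain SQL
          query. A sketch against the hypothetical hbase_tbl_cells table above, using the
          row key from the scan example:

          -- Returns one row per (row_id, column) combination, so an arbitrarily long
          -- list of columns comes back as an ordinary result set.
          SELECT column_name, `timestamp`, `value`
          FROM hbase_tbl_cells
          WHERE row_id = 'myrow-2';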

          TODO: Does HBase have a MAX_HBASE_ROWID_LEN limit? What is it?
          • NOTE: It is nice to have the SQL table DDLs independent of the content of
            the backend HBase table. This saves us from the need to synchronize
            table DDLs between HBase and MySQL (NDB cluster had to do this, and they
            ended up implementing a rather complex system for it).

          == Consistency, transactions, etc ==

          • HBase has single-record transactions. Does this mean that the HBase storage
            engine will have MyISAM-like characteristics, i.e. if we fail in the
            middle of a multi-row UPDATE, there is no way to go back?
          • How should writes be made? Should UPDATEs/INSERTs use the checkAndPut operation
            so that they don't overwrite data that is already there?
          • Are writes important at all? (e.g. if the first version provided only
            read-only access, would that be useful?)
          psergei Sergei Petrunia added a comment

          == Results of discussion with Monty ==

          As the first milestone, implement only the HBase row <--> MySQL row mapping (see above
          for the DDL of hbase_tbl_rows). The `columns` blob will contain all of the columns.
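
          The exact format of the `columns` blob is still an open question (see the dynamic
          columns section below). As a minimal sketch, assuming MariaDB's dynamic columns
          functions are used for the blob, packing and reading it could look like this
          (table, row keys and values are the hypothetical examples from this task):

          -- Writing a row: pack two HBase columns into the single `columns` blob.
          -- Dynamic columns are currently addressed by number, so 1 and 2 stand in
          -- for the HBase column qualifiers here.
          INSERT INTO hbase_tbl_rows (row_id, columns)
          VALUES ('myrow-1', COLUMN_CREATE(1, 'value-1', 2, 'value-2'));

          -- Reading a single HBase column back out of the blob.
          SELECT COLUMN_GET(columns, 1 AS CHAR)
          FROM hbase_tbl_rows
          WHERE row_id = 'myrow-1';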

          === Partial blob writes/reads ===
          Currently, MySQL uses the following approach to work with blobs: if any part
          of the query accesses a blob field, the storage engine retrieves
          the entire blob and provides it to the SQL layer as a malloc'ed char*. It is not
          possible to read/write only a part of the blob.

          We'll need to extend the storage engine API somehow to accommodate working on
          parts of BLOBs.

          TODO: how exactly will the optimizer/hbase-se recognize that we need to read
          only some particular column? Hook into table condition pushdown?
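
          As an illustration of the kind of query that would need to be recognized (again a
          hypothetical sketch, assuming dynamic columns are used for the `columns` blob):

          -- Only one HBase column (mapped to dynamic column 1 here) is actually needed,
          -- so ideally the engine would fetch just that column from HBase instead of the
          -- whole row. Condition pushdown would have to hand the engine the predicate
          -- below so it can see which column is referenced.
          SELECT row_id
          FROM hbase_tbl_rows
          WHERE COLUMN_GET(columns, 1 AS CHAR) = 'value-1';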

          === Need column names for Dynamic columns ===

          Dynamic columns currently identify columns by numbers. HBase identifies them by
          string names.

          Two possible approaches (the sketch after this list illustrates the difference):

          • Genuinely support arbitrary column names
          • Store the names in the .frm file << TODO: Monty mentioned this has some
            advantage, but I don't have access to where I've written this down.
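
          A hedged sketch of the difference: COLUMN_GET with a numeric key is how dynamic
          columns work today; the by-name variant is hypothetical and only shows what genuine
          name support would have to allow.

          -- Today: columns are addressed by number.
          SELECT COLUMN_GET(columns, 1 AS CHAR) FROM hbase_tbl_rows;

          -- What HBase needs: columns addressed by string name, e.g. 'colfam1:q1'.
          -- (Hypothetical syntax; not supported by dynamic columns as they are now.)
          SELECT COLUMN_GET(columns, 'colfam1:q1' AS CHAR) FROM hbase_tbl_rows;
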
          psergei Sergei Petrunia added a comment

          Per Monty's request, I investigated whether Cassandra provides any C/C++ API, and whether such an API would be easier to program against than HBase's.

          Cassandra provides the following three levels of APIs, ordered by ease of use:

          • High-level clients - the API most recommended by Cassandra. Clients are provided for many different environments/languages.
            The one for C++ is: http://snapwebsites.org/project/libqtcassandra. It is used in only one project AFAIU, therefore I would not rely on this one much.
          • Thrift API (http://wiki.apache.org/cassandra/API). Thrift can generate a C++ client API (same as HBase).
          • Internal API "StorageProxy". This is a native Java API; as such it would have to be accessed via JNI. The Cassandra documentation advises strongly against it because it is unstable and low-level.
            (source: http://wiki.apache.org/cassandra/ClientOptions)

          In summary, I see no advantage in using Cassandra with respect to its API. The most reasonable choice seems to be Thrift; however, HBase provides a Thrift API as well.
          It may be that Cassandra has other advantages, such as a richer set of data access primitives, but I did not compare Cassandra and HBase in this respect.

          timour Timour Katchaounov (Inactive) added a comment
          larsgeorge Lars George added a comment -

          Please note that HBase trunk (termed "singularity") is changing the RPC to ProtoBufs, just like Hadoop Common has done. That way it will be really easy to talk straight to the RPC natively. Obviously, this is not yet released, but it seems like a good place to start given that work here still seems pending. Maybe one storage driver for ThriftHBase now, and one later on for ProtoBufHBase, if you do not want to wait?


          Lars, thanks for the note.

          Alas, it seems that for now, MDEV-431 "Cassandra storage engine" satisfies all the demand there was for MySQL -> cloud NoSQL connectivity.

          psergei Sergei Petrunia added a comment

          Right now, nobody has this work in their plans. Feel free to reopen if/when that changes.

          psergei Sergei Petrunia added a comment

          People

            psergei Sergei Petrunia
            ratzpo Rasmus Johansson (Inactive)