[MDEV-122] Data mapping between HBase and SQL Created: 2012-01-27 Updated: 2021-01-18 Resolved: 2013-03-21 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major |
| Reporter: | Rasmus Johansson (Inactive) | Assignee: | Sergei Petrunia |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
The spec is at http://kb.askmonty.org/en/hbase-storage-engine/ |
| Comments |
| Comment by Sergei Petrunia [ 2012-01-30 ] |
|
== Data mapping from HBase to SQL ==
An HBase table consists of rows, which are identified by row key. Each row has an
One can see two ways to map that to SQL tables:
=== Per-row mapping ===
Let each row in the HBase table be mapped into a row from the SQL point of view:
SELECT * FROM hbase_table;
row-id column1 column2
This is the most straightforward mapping. However, accessing some of the hbase
given a row-id, get a list of columns in the row (maybe, with their values)
in this mapping, result of this query will be one row, and there is no
Table DDL could look like this:
create table hbase_tbl_rows (
=== Per-cell mapping ===
HBase shell has a 'scan' command; here's an example of its output:
hbase(main):007:0> scan 'testtable'
Here, one HBase row produces multiple rows in the output. Each output row
create table hbase_tbl_cells (
== Consistency and transactionality ==
|
| Comment by Rasmus Johansson (Inactive) [ 2012-01-30 ] |
|
Finalize the description and any questions we might have. Let's then send it for review. |
| Comment by Sergei Petrunia [ 2012-01-30 ] |
|
== Data mapping from HBase to SQL ==
An HBase table consists of rows, which are identified by row key. Each row has an
One can see two ways to map that to SQL tables:
=== Per-row mapping ===
Let each row in the HBase table be mapped into a row from the SQL point of view:
SELECT * FROM hbase_table;
row-id column1 column2
This is the most straightforward mapping. However, accessing some of the hbase
given a row-id, get a list of columns in the row (maybe, with their values)
in this mapping, result of this query will be one row, and there is no
Table DDL could look like this:
CREATE TABLE hbase_tbl_rows (
=== Per-cell mapping ===
HBase shell has a 'scan' command; here's an example of its output:
hbase(main):007:0> scan 'testtable'
Here, one HBase row produces multiple rows in the output. Each output row
CREATE TABLE hbase_tbl_cells (
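Illustration of the two mappings (a minimal sketch, not part of the spec): the snippet below uses a plain Python dict as a stand-in for an HBase table and derives both the per-row and the per-cell view from it. The table contents, the column names such as cf:a, and the function names are all hypothetical, and cell timestamps/versions are ignored for brevity.

```python
# Hypothetical in-memory stand-in for an HBase table:
# row key -> {column name -> value}. No HBase client is involved.
hbase_table = {
    "row1": {"cf:a": "1", "cf:b": "2"},
    "row2": {"cf:a": "3"},
}


def per_row_mapping(table):
    """One SQL-style row per HBase row; absent columns become None (NULL)."""
    # The column set has to be known up front; here it is found by a full scan.
    columns = sorted({col for cells in table.values() for col in cells})
    header = ["row_id"] + columns
    rows = [
        [key] + [cells.get(col) for col in columns]
        for key, cells in sorted(table.items())
    ]
    return header, rows


def per_cell_mapping(table):
    """One SQL-style row per HBase cell: (row_id, column_name, value)."""
    return [
        (key, col, value)
        for key, cells in sorted(table.items())
        for col, value in sorted(cells.items())
    ]


header, rows = per_row_mapping(hbase_table)
cells = per_cell_mapping(hbase_table)
```

In this sketch the per-row view needs a fixed column list and pads missing cells with None, while the per-cell view returns one tuple per stored cell, so listing the columns of a given row-id becomes an ordinary multi-row result.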
== Consistency, transactions, etc ==
|
| Comment by Sergei Petrunia [ 2012-04-12 ] |
|
== Results of discussion with Monty ==
As a first milestone, implement only Hbase_row<-->MySQL row (see above for DDL of
=== Partial blob writes/reads ===
We'll need to extend the storage engine API somehow to accommodate working on
TODO: how exactly will the optimizer/hbase-se recognize that we need to readd
=== Need column names for Dynamic columns ===
Dynamic columns currently identify columns by numbers. HBase identifies them by
Two possible approaches
|
| Comment by Timour Katchaounov (Inactive) [ 2012-06-13 ] |
|
Per Monty's request, I investigated whether Cassandra provides any C/C++ API, and whether such an API would be easier to program against than HBase's. Cassandra provides the following three levels of APIs, ordered by ease of use:
In summary, I see no advantages in using Cassandra with respect to its API. The most reasonable choice seems to be Thrift; however, HBase provides a Thrift API as well. |
| Comment by Lars George [ 2012-11-03 ] |
|
Please note that HBase trunk (termed "singularity") is changing the RPC to ProtoBufs, just like Hadoop Common has done. That way it will be really easy to talk straight to the RPC natively. Obviously, this is not yet released, but it seems like a good place to start, given that work here still seems pending. Maybe a storage driver, one for ThriftHBase, and one later on for ProtoBufHBase, if you do not want to wait? |
| Comment by Sergei Petrunia [ 2013-03-21 ] |
|
Lars, thanks for the note. Alas, it seems that for now, |
| Comment by Sergei Petrunia [ 2013-03-21 ] |
|
Right now, nobody has this work in their plans. Feel free to reopen if/when that changes. |