[MDEV-24001] Implement hidden PK for RBR of no-unique key table Created: 2020-10-21  Updated: 2023-12-22

Status: Open
Project: MariaDB Server
Component/s: Replication
Fix Version/s: 11.5

Type: Task Priority: Major
Reporter: Andrei Elkin Assignee: Andrei Elkin
Resolution: Unresolved Votes: 1
Labels: beginner-friendly

Issue Links:
Relates
relates to MDEV-21181 Automatic invisible primary key Open
relates to MDEV-28412 Introduce sql_require_primary_key Open
relates to MDEV-30557 Adding invisible SERIAL column primar... Closed
relates to MDEV-19506 Remove the global sequence DICT_HDR_R... Closed
relates to MDEV-24830 Write a warning to error log if Galer... Closed
relates to MDEV-25551 Assertion `mode_ == m_local || transa... Closed
relates to MDEV-30430 Enabling system versioning on tables ... Closed

 Description   

Performance of row event applying on no-unique key table is inferior and degrades
with growing size of the table.
This task aims at replacement of table scanning with key lookup, and such key - or a hidden "PK" - would be created
automatically. It may not be visible to the user as a part of the table def like the row id of Innodb is never exposed.



 Comments   
Comment by Sergei Golubchik [ 2020-11-06 ]

what will be a user interface for that?

Comment by Marko Mäkelä [ 2021-01-04 ]

I remember first hearing of this problem in 2012 from a cloud service provider. I understood that they had an internal patch for this.

This ticket was filed on my request. In the discussion thread, I had written my suggestion for the interface:

When it comes to tables without primary key, I think that it would be a good idea to make use of the MariaDB hidden columns and indexes, to actually expose a ‘rowid’ to the SQL and replication layer. In that way, we could avoid hacks around this, and the InnoDB internal DB_ROW_ID would never be materialized or used.

The above is not a user interface, which serg asked about. I think that we could introduce an SQL_mode for disabling this feature on DDL time, if someone really wants to create or alter a table that does not have any primary key, not even a hidden one. I note that we missed the opportunity to introduce such modes earlier, for example in MDEV-371, so maybe no interface change would be needed here at all?

Comment by Andrei Elkin [ 2021-01-05 ]

marko, sachin.setiya.007: to the user interfaces part, maybe something like
add up a hidden "ROW_ID" attribute at any unique-less attribute table creation, if the engine
is capable for that? It makes to do so automatically when binlogging is enabled.

Comment by Marko Mäkelä [ 2021-01-05 ]

Elkin, I do not think that storage engines should have any say in whether the SQL layer considers something to be a hidden column. A hidden (virtual) column was already implemented in MDEV-371. It should not be that much more complex to implement a hidden column and a hidden primary key. I do not think that it is feasible to try to expose the InnoDB-internal DB_ROW_ID in the SQL layer.

As far as I understand, O(log n) operations can become O(n) or worse even when using a heap-organized table, such as ENGINE=MyISAM. InnoDB is internally creating a hidden primary key (GEN_CLUST_INDEX(DB_ROW_ID)) that is almost entirely hidden from the SQL layer. Such InnoDB-internal hidden primary key would never be created if the SQL layer automatically created a hidden primary key (in the .frm file and in TABLE_SHARE). I think that it could be feasible to unconditionally and automatically create such a hidden primary key.

But, I have no idea how this would affect cross-version replication (replicating from a table that lacks a primary key to one that has a hidden primary key in TABLE_SHARE, or vice versa). For that, we might want to have some configuration parameter.

Comment by Marko Mäkelä [ 2021-01-23 ]

A blog post that appeared soon after the MySQL 8.0.23 release suggests that one could explicitly create a hidden primary key:

alter table table2 
     add column id int unsigned auto_increment 
     primary key invisible first;

I think that we should aim for something fool-proof, automatically creating such hidden primary keys based on a SET variable, such as SQL_mode. As far as I understand, the original problem statement was that a DBaaS provider gets severe replication performance problems when a user does something like this:

CREATE TABLE t (a INT, b INT);
INSERT INTO t(a) SELECT * FROM seq_1_to_1000000;
UPDATE t SET b=a;

On the primary server, the UPDATE would be a table scan. But, on replicas, due to a missing primary key (or UNIQUE INDEX on NOT NULL columns), in row-based replication, applying each change would require a table scan (visiting n² rows instead of n rows). We would like the hidden primary key to be created automatically, so that row-based replication can take advantage of it.

Comment by Andrei Elkin [ 2022-04-26 ]

serg to your question, MDEV-21181 reporter suggests to this matter.

Generated at Thu Feb 08 09:26:41 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.