Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27385

Adding constraint sometimes (non-repeatably) checks historical rows

Details

    Description

      There is an error that I am running across but am unable to repeat on a simple schema. If anyone can point me in the right direction to collect more details, I would be happy to look.

      The problem arises when attempting to add a unique key on two columns with the following:

      set @@system_versioning_alter_history=1;
      ALTER TABLE manufacturer_pn
      ADD CONSTRAINT manufacturer_pn_UN
      UNIQUE KEY (manufacturer, manufacturer_pn);
      

      This results in a duplicate entry error:
      `Duplicate entry 'Yageo-RC0805FR-071ML' for key 'manufacturer_pn_UN'

      In an effort to find the problem, I searched for everything that matched the problematic key:

      SELECT id, manufacturer, manufacturer_pn, ROW_START, ROW_END
      FROM manufacturer_pn
      FOR SYSTEM_TIME BETWEEN (NOW() - INTERVAL 1 YEAR) AND NOW()
      WHERE manufacturer_pn LIKE 'RC0805FR-071ML'
      ORDER BY ROW_END;
      

      Results of the command:

      id  |manufacturer|manufacturer_pn|ROW_START          |ROW_END            |
      ----+------------+---------------+-------------------+-------------------+
      5407|Yageo       |RC0805FR-071ML |2021-12-06 10:05:11|2021-12-16 00:13:34|
      5363|Yageo       |RC0805FR-071ML |2021-12-06 10:05:11|2021-12-16 00:13:34|
      5407|Yageo       |RC0805FR-071ML |2021-12-16 00:13:34|2021-12-29 12:03:47|
      5363|Yageo       |RC0805FR-071ML |2021-12-16 00:13:34|2021-12-29 12:22:30|
      5363|Yageo Group |RC0805FR-071ML |2021-12-29 12:22:30|2021-12-29 23:41:56|
      5363|Yageo       |RC0805FR-071ML |2021-12-29 23:41:56|2021-12-29 23:57:01|
      5363|Yageo Test  |RC0805FR-071ML |2021-12-29 23:57:01|2021-12-29 23:57:31|
      

      Note that none of these rows are active, so there should be no violation. This leads me to think that adding the unique constraint is running a check on historical rows, which it shouldn't. Setting `check_constraint_checks` to 0 also has no effect, and adding the key still fails.

      As mentioned, I would be happy to dig deeper to try to find the cause since I can't seem to replicate this issue on table from scratch.

      Attachments

        Activity

          This does not look like a problem of the InnoDB storage engine, but rather a problem with the implementation of versioned tables in MariaDB.

          I would expect the ROW_START column to be appended to the PRIMARY KEY (I will assume it is id) and therefore to each index. So, we would have PRIMARY KEY(id,ROW_START). If the SQL layer (including the versioning related changes) told InnoDB that the secondary index definition is UNIQUE KEY manufacturer_pn_UN(manufacturer,manufacturer_pn), then the internal representation of the UNIQUE KEY manufacturer_pn_UN in InnoDB should be (manufacturer, manufacturer_pn, id, ROW_START). Because the dict_index_t::n_uniq would be 2, InnoDB would flag a uniqueness violations. InnoDB only tolerates ‘duplicates’ for NULL values, and in that case the internal uniqueness will be guaranteed by the PRIMARY KEY values. Example:

          CREATE TABLE t(a INT PRIMARY KEY, b INT UNIQUE) ENGINE=InnoDB;
          INSERT INTO t VALUES(1,NULL),(2,NULL);
          

          Here, in the index tree of UNIQUE b(b,a) we would have the values (NULL,1),(NULL,2).

          I don’t know if this really is a code bug or a user error. A possible work-around could be to explicitly include the ROW_START column in the index:

          set @@system_versioning_alter_history=1;
          ALTER TABLE manufacturer_pn
          ADD CONSTRAINT manufacturer_pn_UN
          UNIQUE KEY (manufacturer, manufacturer_pn, ROW_START);
          

          In that way, the internal representation would be (manufacturer,manufacturer_pn,ROW_START,id), and the uniqueness violation should disappear.

          tgross35, can you try this? I did not try it myself, because your SQL example is not self-contained.

          marko Marko Mäkelä added a comment - This does not look like a problem of the InnoDB storage engine, but rather a problem with the implementation of versioned tables in MariaDB. I would expect the ROW_START column to be appended to the PRIMARY KEY (I will assume it is id ) and therefore to each index. So, we would have PRIMARY KEY(id,ROW_START) . If the SQL layer (including the versioning related changes) told InnoDB that the secondary index definition is UNIQUE KEY manufacturer_pn_UN(manufacturer,manufacturer_pn) , then the internal representation of the UNIQUE KEY manufacturer_pn_UN in InnoDB should be (manufacturer, manufacturer_pn, id, ROW_START) . Because the dict_index_t::n_uniq would be 2, InnoDB would flag a uniqueness violations. InnoDB only tolerates ‘duplicates’ for NULL values, and in that case the internal uniqueness will be guaranteed by the PRIMARY KEY values. Example: CREATE TABLE t(a INT PRIMARY KEY , b INT UNIQUE ) ENGINE=InnoDB; INSERT INTO t VALUES (1, NULL ),(2, NULL ); Here, in the index tree of UNIQUE b(b,a) we would have the values (NULL,1),(NULL,2) . I don’t know if this really is a code bug or a user error. A possible work-around could be to explicitly include the ROW_START column in the index: set @@system_versioning_alter_history=1; ALTER TABLE manufacturer_pn ADD CONSTRAINT manufacturer_pn_UN UNIQUE KEY (manufacturer, manufacturer_pn, ROW_START); In that way, the internal representation would be (manufacturer,manufacturer_pn,ROW_START,id) , and the uniqueness violation should disappear. tgross35 , can you try this? I did not try it myself, because your SQL example is not self-contained.
          serg Sergei Golubchik added a comment - - edited

          ROW_START shouldn't be part of a PRIMARY KEY, why would it? It'd mean you can do

          create table t1 (id int primary key, b varchar(100)) with system versioning;
          insert t1 values (1, "first row");
          insert t1 values (1, "another first row o_O");
          

          but ROW_END is, indeed, implicitly appended to unique keys.

          See first two rows, at (or just before) 2021-12-16 00:13:34 you had two rows with the value Yageo-RC0805FR-071ML.

          As documentation says, table metadata is not versioned, after ALTER TABLE the new table looks as if it had new structure from the very beginning, only the data is versioned.
          https://mariadb.com/kb/en/system-versioned-tables/#system_versioning_alter_history

          And because you had conflicting rows earlier, you cannot add a UNIQUE constraint that was violated somewhere in the table's past.

          serg Sergei Golubchik added a comment - - edited ROW_START shouldn't be part of a PRIMARY KEY, why would it? It'd mean you can do create table t1 (id int primary key , b varchar (100)) with system versioning; insert t1 values (1, "first row" ); insert t1 values (1, "another first row o_O" ); but ROW_END is, indeed, implicitly appended to unique keys. See first two rows, at (or just before) 2021-12-16 00:13:34 you had two rows with the value Yageo-RC0805FR-071ML . As documentation says, table metadata is not versioned, after ALTER TABLE the new table looks as if it had new structure from the very beginning, only the data is versioned. https://mariadb.com/kb/en/system-versioned-tables/#system_versioning_alter_history And because you had conflicting rows earlier, you cannot add a UNIQUE constraint that was violated somewhere in the table's past.
          tgross35 Trevor Gross added a comment -

          I have since upgraded this server to 11.1, but the problem still seems to exist. @marko that query claims ROW_START does not exist, probably because it is anonymous.

          But I think Sergei found the issue, I didn't even realize that two of those rows must have gotten inserted in the same transaction. That makes sense, this is easy enough to repeat knowing that:

          MariaDB [db]> create table t1 (id serial, a int, b int) with system versioning;
          MariaDB [db]> insert into t1 (id, a, b) values (1, 1, 1), (2, 1, 1);
          MariaDB [db]> delete from t1;
          MariaDB [db]> select * from t1;
          Empty set (0.000 sec)
           
          MariaDB [db]> select *, ROW_START, ROW_END from t1 for system_time all;
          +----+------+------+----------------------------+----------------------------+
          | id | a    | b    | ROW_START                  | ROW_END                    |
          +----+------+------+----------------------------+----------------------------+
          |  1 |    1 |    1 | 2024-01-23 23:54:16.065755 | 2024-01-23 23:56:09.719628 |
          |  2 |    1 |    1 | 2024-01-23 23:54:16.065755 | 2024-01-23 23:56:09.719628 |
          +----+------+------+----------------------------+----------------------------+
          2 rows in set (0.000 sec)
           
          MariaDB [db]> set @@system_versioning_alter_history=1;
          MariaDB [db]> alter table t1 add constraint foo unique key (a, b);
          ERROR 1062 (23000): Duplicate entry '1-1' for key 'foo'
          

          It threw me off that the time isn't included in the error message, I couldn't understand why adding other keys with duplicate history worked but this specific one failed.

          Thanks all for the help. Wish that `DELETE HISTORY` took a where clause

          tgross35 Trevor Gross added a comment - I have since upgraded this server to 11.1, but the problem still seems to exist. @marko that query claims ROW_START does not exist, probably because it is anonymous. But I think Sergei found the issue, I didn't even realize that two of those rows must have gotten inserted in the same transaction. That makes sense, this is easy enough to repeat knowing that: MariaDB [db]> create table t1 (id serial, a int , b int ) with system versioning; MariaDB [db]> insert into t1 (id, a, b) values (1, 1, 1), (2, 1, 1); MariaDB [db]> delete from t1; MariaDB [db]> select * from t1; Empty set (0.000 sec)   MariaDB [db]> select *, ROW_START, ROW_END from t1 for system_time all ; + ----+------+------+----------------------------+----------------------------+ | id | a | b | ROW_START | ROW_END | + ----+------+------+----------------------------+----------------------------+ | 1 | 1 | 1 | 2024-01-23 23:54:16.065755 | 2024-01-23 23:56:09.719628 | | 2 | 1 | 1 | 2024-01-23 23:54:16.065755 | 2024-01-23 23:56:09.719628 | + ----+------+------+----------------------------+----------------------------+ 2 rows in set (0.000 sec)   MariaDB [db]> set @@system_versioning_alter_history=1; MariaDB [db]> alter table t1 add constraint foo unique key (a, b); ERROR 1062 (23000): Duplicate entry '1-1' for key 'foo' It threw me off that the time isn't included in the error message, I couldn't understand why adding other keys with duplicate history worked but this specific one failed. Thanks all for the help. Wish that `DELETE HISTORY` took a where clause

          I stand corrected on how the PRIMARY KEY is being padded with version-related metadata. I think that for InnoDB and MyRocks, ROW_START would have been a better choice than ROW_END. In heap-organized storage engines it might matter less.

          A DELETE or UPDATE operation would have to update ROW_END of the original record. In InnoDB, updating a PRIMARY KEY column will require delete-marking the record in all indexes, inserting a new record in all indexes, and eventually, purging the delete-marked record. In MyRocks, I understand that this operation would involve inserting a tombstone record for the record that may have carried an infinite ROW_END, followed by inserting a new record with the ROW_END set to the current time. As far as I can tell, ROW_START is ‘immutable’ after the record has been created, and therefore it would have been a better choice in this regard. That is, a DELETE (or the DELETE part of a versioned UPDATE) would only update a non-indexed column in the existing record.

          marko Marko Mäkelä added a comment - I stand corrected on how the PRIMARY KEY is being padded with version-related metadata. I think that for InnoDB and MyRocks, ROW_START would have been a better choice than ROW_END . In heap-organized storage engines it might matter less. A DELETE or UPDATE operation would have to update ROW_END of the original record. In InnoDB, updating a PRIMARY KEY column will require delete-marking the record in all indexes, inserting a new record in all indexes, and eventually, purging the delete-marked record. In MyRocks, I understand that this operation would involve inserting a tombstone record for the record that may have carried an infinite ROW_END , followed by inserting a new record with the ROW_END set to the current time. As far as I can tell, ROW_START is ‘immutable’ after the record has been created, and therefore it would have been a better choice in this regard. That is, a DELETE (or the DELETE part of a versioned UPDATE ) would only update a non-indexed column in the existing record.

          People

            Unassigned Unassigned
            tgross35 Trevor Gross
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.