[MDEV-10075] Provide index of error causing error in array INSERT Created: 2016-05-16  Updated: 2023-03-21  Due: 2021-09-14  Resolved: 2021-10-26

Status: Closed
Project: MariaDB Server
Component/s: Admin statements
Fix Version/s: 10.7.1

Type: Task Priority: Blocker
Reporter: Anders Karlsson Assignee: Rucha Deodhar
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Problem/Incident
causes MDEV-26606 ROW_NUMBER property value isn't passe... Closed
causes MDEV-26610 ERROR_INDEX isn't seen in the error m... Closed
causes MDEV-26611 ERROR_INDEX isn't intuitively clear Closed
causes MDEV-26635 ROW_NUMBER is not 0 for errors not ca... Closed
causes MDEV-26654 ROW_NUMBER is wrong upon INSERT into ... Closed
causes MDEV-26681 ROW_NUMBER is not available within co... Closed
causes MDEV-26684 Unexpected ROW_NUMBER in a condition ... Closed
causes MDEV-26693 ROW_NUMBER is wrong upon INSERT or UP... Closed
causes MDEV-26695 Number of an invalid row is not calcu... Closed
causes MDEV-26698 Incorrect row number upon INSERT .. S... Closed
causes MDEV-26699 Row counter does not work consistentl... Open
causes MDEV-26702 Unexpected ROW_NUMBER upon SELECT fro... Open
causes MDEV-26830 Wrong ROW_NUMBER in diagnostics upon ... Closed
causes MDEV-26832 ROW_NUMBER in SIGNAL/RESIGNAL causes ... Closed
causes MDEV-26836 ROW_NUMBER differs from the number in... Closed
causes MDEV-26841 ROW_NUMBER is not set and differs fro... Closed
causes MDEV-26842 ROW_NUMBER is not set and differs fro... Closed
causes MDEV-26843 Inconsistent behavior of ROW_NUMBER u... In Review
causes MDEV-26844 DELETE returns ROW_NUMBER=1 for every... Closed
causes MDEV-26845 For unsupported statements ROW_NUMBER... Open
causes MDEV-26848 ROW_NUMBER is wrong and differs from ... Open
causes MDEV-26878 Query failing with syntax error sets ... Closed
causes MDEV-26909 ROW_NUMBER and DELETE/UPDATE with ORD... Open
Relates

 Description   

When executing a multi-row INSERT using array notation, i.e.:

INSERT INTO sometable VALUES(1,'a'),(2,'b'),(3,'c);

If there is an error in this statement for example a PRIMARY KEY error, then there is no way of knowing which element in the array caused the error. This really limits the usefulness of array INSERTs which is an issue as this is a really good way to increase INSERT performance.

The suggestion is to add the ability to find the index of the value with an issue by adding a new function to the API. This also need to be made available through Stored Procedures so maybe it should be complemented with a SQL-function. In addition, one such function will also be necessary, for the same reason, for prepared statements.



 Comments   
Comment by Anders Karlsson [ 2016-05-19 ]

I have had a closer look at what an implementation could look like. A API function might not be necessary or even desired, rather a SQL function should work well and this has the advantage of being accessible independent of the client. I created an experimental patch for this, where I added a variable for the index of the current and previous statement to Diagnostics_area in sql_error.h, set this in write_record in sql_insert.cc and then a SQL function to return the value of the previous statement value in item_func.cc and item_create.cc. There is more to it than this for a complete implementation, if for no other reason so because I'm don't know the MariaDB source code that well anymore, but this is enough to prove that this works, which it does:

MariaDB [test]> create table test10075(c1 int not null primary key, c2 char(10) not null unique);
Query OK, 0 rows affected (0.97 sec)
 
MariaDB [test]> insert into test10075 values(1, 'one'),(2, 'two'),(3, 'three');
Query OK, 3 rows affected (0.09 sec)
Records: 3  Duplicates: 0  Warnings: 0
 
MariaDB [test]> insert into test10075 values(4, 'four'),(3, 'three');
ERROR 1062 (23000): Duplicate entry '3' for key 'PRIMARY'
MariaDB [test]> select error_array_index();
+---------------------+
| error_array_index() |
+---------------------+
|                   2 |
+---------------------+
1 row in set (0.00 sec)
MariaDB [test]> insert into test10075 values(4, 'four'),(5, 'five'),(6, 'three');
ERROR 1062 (23000): Duplicate entry 'three' for key 'c2'
MariaDB [test]> select error_array_index();
+---------------------+
| error_array_index() |
+---------------------+
|                   3 |
+---------------------+
1 row in set (0.00 sec)

Comment by Sergei Golubchik [ 2017-05-29 ]

I'd say, it'll be more consistent with the current implementation to use GET DIAGNOSTICS:

MariaDB [test]> create table test10075(c1 int not null primary key, c2 char(10) not null unique);
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> insert into test10075 values(1, 'one'),(2, 'two'),(3, 'three');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0
 
MariaDB [test]> insert into test10075 values(4, 'four'),(3, 'three');
ERROR 1062 (23000): Duplicate entry '3' for key 'PRIMARY'
 
MariaDB [test]> get diagnostics condition 1 @var = MESSAGE_TEXT;
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> select @var;
+---------------------------------------+
| @var                                  |
+---------------------------------------+
| Duplicate entry '3' for key 'PRIMARY' |
+---------------------------------------+
1 row in set (0.00 sec)

Similarly we can implement

MariaDB [test]> get diagnostics condition 1 @var = ROW_NUMBER;
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> select @var;
+------+
| @var |
+------+
| 2    |
+------+
1 row in set (0.00 sec)

Comment by Rucha Deodhar [ 2021-08-16 ]

Patch:
https://github.com/MariaDB/server/commit/9295921e8740335e12b2697ffa8e444cc942b0a2

Comment by Oleksandr Byelkin [ 2021-09-08 ]

All is OK, just fix 2 things I mentioned about test suite.

Comment by Elena Stepanova [ 2021-09-14 ]

Is there a reason this row number cannot be a part of the error message itself (in addition to diagnostics property)?
It wouldn't be anything new, we already have it for some errors, e.g.

MariaDB [test]> insert into t1 values (1,'foo'),(2,'foobar');
ERROR 1406 (22001): Data too long for column 'c' at row 2

While diagnostics property is useful to detect the number of the flawed row programmatically, in interactive execution having it in the error text is clearly more user-friendly, especially when many values are involved:

MariaDB [test]> insert ignore into t1 values (1,'foo'),(2,'foobar'),(2,'bar'),(3,'a'),(4,'qux'),(4,'quux'),(5,'baz');
Query OK, 5 rows affected, 4 warnings (0.014 sec)
Records: 7  Duplicates: 2  Warnings: 4
 
MariaDB [test]> show warnings;
+---------+------+----------------------------------------+
| Level   | Code | Message                                |
+---------+------+----------------------------------------+
| Warning | 1265 | Data truncated for column 'c' at row 2 |
| Warning | 1062 | Duplicate entry '2' for key 'PRIMARY'  |
| Warning | 1265 | Data truncated for column 'c' at row 6 |
| Warning | 1062 | Duplicate entry '4' for key 'PRIMARY'  |
+---------+------+----------------------------------------+
4 rows in set (0.000 sec)

Comment by Elena Stepanova [ 2021-09-14 ]

A note unrelated to the previous one
The initially suggested ROW_NUMBER was intuitively understandable, but ERROR_INDEX in the final implementation is not. It is not necessarily an error, and it is certainly not its index.

MariaDB [test]> insert ignore into t1 (pk) values (1),(1);
Query OK, 1 row affected, 1 warning (0.017 sec)
Records: 2  Duplicates: 1  Warnings: 1
 
MariaDB [test]> show warnings;
+---------+------+---------------------------------------+
| Level   | Code | Message                               |
+---------+------+---------------------------------------+
| Warning | 1062 | Duplicate entry '1' for key 'PRIMARY' |
+---------+------+---------------------------------------+
1 row in set (0.000 sec)
 
MariaDB [test]>  get diagnostics condition 1 @num = ERROR_INDEX, @msg = MESSAGE_TEXT; select @num, @msg;
Query OK, 0 rows affected (0.000 sec)
 
+------+---------------------------------------+
| @num | @msg                                  |
+------+---------------------------------------+
|    2 | Duplicate entry '1' for key 'PRIMARY' |
+------+---------------------------------------+
1 row in set (0.000 sec)

If something is an "index" here, it's the condition number (1), but the failing row number is 2. And it's a warning.

Comment by Rucha Deodhar [ 2021-09-14 ]

Hi elenst , Error messages are not very specific for every statement. Example we can have Data truncated for col 'a' at row i for ALTER too and not just INSERT. Plus not all error messages reflect which row gave error or warning. ERROR_INDEX works only for INSERT and also makes it possible to know which row has problem for all INSERT-related errors/warnings.

ROW_NUMBER looked a little non-specific to me, so I named it ERROR_INDEX from the task title. I agree it isn't really an "index", but it indicates the i-th row, so it made sense to start from 1 instead of 0. (But I can change it if it is not too late, the closer it is to its actual function the better). It seemed better to also have warning along with errors if there are problematic rows especially in case of INSERT...IGNORE where errors are ignored but user might still want to know the row that gave warning.

Comment by Sergei Golubchik [ 2021-09-15 ]

This issue was closed too early, before all comments were addressed. So they were moved to separate, linked, issues.

Comment by Sergei Golubchik [ 2021-10-16 ]

let's keep it open until bb-10.7-row_number is pushed

Comment by Elena Stepanova [ 2021-10-21 ]

In my opinion the functionality as of bb-10.7-row_number d555ae3 can be merged into 10.7 main branch and released with 10.7.1.

It is not perfect due to inherited legacy issues, but

  • the mechanism for extracting the row number has been introduced and it is consistent with already existing functionality;
  • for common and presumably most important use cases it returns reasonable values;
  • there are no known major discrepancies between the new value and the existing warning/error messages reporting a row number (remaining open issues of this kind are corner cases).

There will be a fair amount of wrong or questionable results for SQL statements which weren't the target of this task. Some are already reported as bugs, and there are more to come. In many cases it is not even obvious what the result should be. I expect that depending on the feedback we will be able to determine whether the functionality is employed by a sufficient part of users, and if it is, in future releases it will be adjusted to a more predictable behavior with a wider range of statements.

Generated at Thu Feb 08 07:39:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.