If there is an error in this statement for example a PRIMARY KEY error, then there is no way of knowing which element in the array caused the error. This really limits the usefulness of array INSERTs which is an issue as this is a really good way to increase INSERT performance.
The suggestion is to add the ability to find the index of the value with an issue by adding a new function to the API. This also need to be made available through Stored Procedures so maybe it should be complemented with a SQL-function. In addition, one such function will also be necessary, for the same reason, for prepared statements.
Attachments
Issue Links
causes
MDEV-26606ROW_NUMBER property value isn't passed from inside a stored procedure
Closed
MDEV-26610ERROR_INDEX isn't seen in the error message
When executing a multi-row INSERT using array notation, i.e.:
INSERT INTO sometable VALUES(1,'a'),(2,'b'),(3,'c);
If there is an error in this statement for example a PRIMARY KEY error, then there is no way of knowing which element in the array caused the error. This really limits the usefulness of array INSERTs which is an issue as this is a really good way to increase INSERT performance.
When executing a multi-row INSERT using array notation, i.e.:
INSERT INTO sometable VALUES(1,'a'),(2,'b'),(3,'c);
If there is an error in this statement for example a PRIMARY KEY error, then there is no way of knowing which element in the array caused the error. This really limits the usefulness of array INSERTs which is an issue as this is a really good way to increase INSERT performance.
The suggestion is to add the ability to find the inxed of the value with an issue by adding a new function to the API. This also need to be made available through Stored Procedures so maybe it should be complemented with a SQL-function. In addition, one such function will also be necessary, for the same reason, for prepared statements.
When executing a multi-row INSERT using array notation, i.e.:
INSERT INTO sometable VALUES(1,'a'),(2,'b'),(3,'c);
If there is an error in this statement for example a PRIMARY KEY error, then there is no way of knowing which element in the array caused the error. This really limits the usefulness of array INSERTs which is an issue as this is a really good way to increase INSERT performance.
The suggestion is to add the ability to find the inxed of the value with an issue by adding a new function to the API. This also need to be made available through Stored Procedures so maybe it should be complemented with a SQL-function. In addition, one such function will also be necessary, for the same reason, for prepared statements.
When executing a multi-row INSERT using array notation, i.e.:
{code:sql}
INSERT INTO sometable VALUES(1,'a'),(2,'b'),(3,'c);
{code}
If there is an error in this statement for example a PRIMARY KEY error, then there is no way of knowing which element in the array caused the error. This really limits the usefulness of array INSERTs which is an issue as this is a really good way to increase INSERT performance.
The suggestion is to add the ability to find the inxed of the value with an issue by adding a new function to the API. This also need to be made available through Stored Procedures so maybe it should be complemented with a SQL-function. In addition, one such function will also be necessary, for the same reason, for prepared statements.
When executing a multi-row INSERT using array notation, i.e.:
{code:sql}
INSERT INTO sometable VALUES(1,'a'),(2,'b'),(3,'c);
{code}
If there is an error in this statement for example a PRIMARY KEY error, then there is no way of knowing which element in the array caused the error. This really limits the usefulness of array INSERTs which is an issue as this is a really good way to increase INSERT performance.
The suggestion is to add the ability to find the inxed of the value with an issue by adding a new function to the API. This also need to be made available through Stored Procedures so maybe it should be complemented with a SQL-function. In addition, one such function will also be necessary, for the same reason, for prepared statements.
When executing a multi-row INSERT using array notation, i.e.:
{code:sql}
INSERT INTO sometable VALUES(1,'a'),(2,'b'),(3,'c);
{code}
If there is an error in this statement for example a PRIMARY KEY error, then there is no way of knowing which element in the array caused the error. This really limits the usefulness of array INSERTs which is an issue as this is a really good way to increase INSERT performance.
The suggestion is to add the ability to find the index of the value with an issue by adding a new function to the API. This also need to be made available through Stored Procedures so maybe it should be complemented with a SQL-function. In addition, one such function will also be necessary, for the same reason, for prepared statements.
I have had a closer look at what an implementation could look like. A API function might not be necessary or even desired, rather a SQL function should work well and this has the advantage of being accessible independent of the client. I created an experimental patch for this, where I added a variable for the index of the current and previous statement to Diagnostics_area in sql_error.h, set this in write_record in sql_insert.cc and then a SQL function to return the value of the previous statement value in item_func.cc and item_create.cc. There is more to it than this for a complete implementation, if for no other reason so because I'm don't know the MariaDB source code that well anymore, but this is enough to prove that this works, which it does:
Anders Karlsson
added a comment - I have had a closer look at what an implementation could look like. A API function might not be necessary or even desired, rather a SQL function should work well and this has the advantage of being accessible independent of the client. I created an experimental patch for this, where I added a variable for the index of the current and previous statement to Diagnostics_area in sql_error.h, set this in write_record in sql_insert.cc and then a SQL function to return the value of the previous statement value in item_func.cc and item_create.cc. There is more to it than this for a complete implementation, if for no other reason so because I'm don't know the MariaDB source code that well anymore, but this is enough to prove that this works, which it does:
MariaDB [test]> create table test10075(c1 int not null primary key , c2 char (10) not null unique );
Query OK, 0 rows affected (0.97 sec)
MariaDB [test]> insert into test10075 values (1, 'one' ),(2, 'two' ),(3, 'three' );
Query OK, 3 rows affected (0.09 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [test]> insert into test10075 values (4, 'four' ),(3, 'three' );
ERROR 1062 (23000): Duplicate entry '3' for key 'PRIMARY'
MariaDB [test]> select error_array_index();
+ ---------------------+
| error_array_index() |
+ ---------------------+
| 2 |
+ ---------------------+
1 row in set (0.00 sec)
MariaDB [test]> insert into test10075 values (4, 'four' ),(5, 'five' ),(6, 'three' );
ERROR 1062 (23000): Duplicate entry 'three' for key 'c2'
MariaDB [test]> select error_array_index();
+ ---------------------+
| error_array_index() |
+ ---------------------+
| 3 |
+ ---------------------+
1 row in set (0.00 sec)
Is there a reason this row number cannot be a part of the error message itself (in addition to diagnostics property)?
It wouldn't be anything new, we already have it for some errors, e.g.
ERROR 1406 (22001): Data too long forcolumn'c'at row 2
While diagnostics property is useful to detect the number of the flawed row programmatically, in interactive execution having it in the error text is clearly more user-friendly, especially when many values are involved:
Elena Stepanova
added a comment - Is there a reason this row number cannot be a part of the error message itself (in addition to diagnostics property)?
It wouldn't be anything new, we already have it for some errors, e.g.
MariaDB [test]> insert into t1 values (1, 'foo' ),(2, 'foobar' );
ERROR 1406 (22001): Data too long for column 'c' at row 2
While diagnostics property is useful to detect the number of the flawed row programmatically, in interactive execution having it in the error text is clearly more user-friendly, especially when many values are involved:
MariaDB [test]> insert ignore into t1 values (1, 'foo' ),(2, 'foobar' ),(2, 'bar' ),(3, 'a' ),(4, 'qux' ),(4, 'quux' ),(5, 'baz' );
Query OK, 5 rows affected, 4 warnings (0.014 sec)
Records: 7 Duplicates: 2 Warnings: 4
MariaDB [test]> show warnings;
+ ---------+------+----------------------------------------+
| Level | Code | Message |
+ ---------+------+----------------------------------------+
| Warning | 1265 | Data truncated for column 'c' at row 2 |
| Warning | 1062 | Duplicate entry '2' for key 'PRIMARY' |
| Warning | 1265 | Data truncated for column 'c' at row 6 |
| Warning | 1062 | Duplicate entry '4' for key 'PRIMARY' |
+ ---------+------+----------------------------------------+
4 rows in set (0.000 sec)
A note unrelated to the previous one
The initially suggested ROW_NUMBER was intuitively understandable, but ERROR_INDEX in the final implementation is not. It is not necessarily an error, and it is certainly not its index.
If something is an "index" here, it's the condition number (1), but the failing row number is 2. And it's a warning.
Elena Stepanova
added a comment - A note unrelated to the previous one
The initially suggested ROW_NUMBER was intuitively understandable, but ERROR_INDEX in the final implementation is not. It is not necessarily an error, and it is certainly not its index.
MariaDB [test]> insert ignore into t1 (pk) values (1),(1);
Query OK, 1 row affected, 1 warning (0.017 sec)
Records: 2 Duplicates: 1 Warnings: 1
MariaDB [test]> show warnings;
+ ---------+------+---------------------------------------+
| Level | Code | Message |
+ ---------+------+---------------------------------------+
| Warning | 1062 | Duplicate entry '1' for key 'PRIMARY' |
+ ---------+------+---------------------------------------+
1 row in set (0.000 sec)
MariaDB [test]> get diagnostics condition 1 @num = ERROR_INDEX, @msg = MESSAGE_TEXT; select @num, @msg;
Query OK, 0 rows affected (0.000 sec)
+ ------+---------------------------------------+
| @num | @msg |
+ ------+---------------------------------------+
| 2 | Duplicate entry '1' for key 'PRIMARY' |
+ ------+---------------------------------------+
1 row in set (0.000 sec)
If something is an "index" here, it's the condition number (1), but the failing row number is 2. And it's a warning.
Hi elenst , Error messages are not very specific for every statement. Example we can have Data truncated for col 'a' at row i for ALTER too and not just INSERT. Plus not all error messages reflect which row gave error or warning. ERROR_INDEX works only for INSERT and also makes it possible to know which row has problem for all INSERT-related errors/warnings.
ROW_NUMBER looked a little non-specific to me, so I named it ERROR_INDEX from the task title. I agree it isn't really an "index", but it indicates the i-th row, so it made sense to start from 1 instead of 0. (But I can change it if it is not too late, the closer it is to its actual function the better). It seemed better to also have warning along with errors if there are problematic rows especially in case of INSERT...IGNORE where errors are ignored but user might still want to know the row that gave warning.
Rucha Deodhar
added a comment - Hi elenst , Error messages are not very specific for every statement. Example we can have Data truncated for col 'a' at row i for ALTER too and not just INSERT. Plus not all error messages reflect which row gave error or warning. ERROR_INDEX works only for INSERT and also makes it possible to know which row has problem for all INSERT-related errors/warnings.
ROW_NUMBER looked a little non-specific to me, so I named it ERROR_INDEX from the task title. I agree it isn't really an "index", but it indicates the i-th row, so it made sense to start from 1 instead of 0. (But I can change it if it is not too late, the closer it is to its actual function the better). It seemed better to also have warning along with errors if there are problematic rows especially in case of INSERT...IGNORE where errors are ignored but user might still want to know the row that gave warning.
This issue was closed too early, before all comments were addressed. So they were moved to separate, linked, issues.
Sergei Golubchik
added a comment - This issue was closed too early, before all comments were addressed. So they were moved to separate, linked, issues.
In my opinion the functionality as of bb-10.7-row_number d555ae3 can be merged into 10.7 main branch and released with 10.7.1.
It is not perfect due to inherited legacy issues, but
the mechanism for extracting the row number has been introduced and it is consistent with already existing functionality;
for common and presumably most important use cases it returns reasonable values;
there are no known major discrepancies between the new value and the existing warning/error messages reporting a row number (remaining open issues of this kind are corner cases).
There will be a fair amount of wrong or questionable results for SQL statements which weren't the target of this task. Some are already reported as bugs, and there are more to come. In many cases it is not even obvious what the result should be. I expect that depending on the feedback we will be able to determine whether the functionality is employed by a sufficient part of users, and if it is, in future releases it will be adjusted to a more predictable behavior with a wider range of statements.
Elena Stepanova
added a comment - In my opinion the functionality as of bb-10.7-row_number d555ae3 can be merged into 10.7 main branch and released with 10.7.1.
It is not perfect due to inherited legacy issues, but
the mechanism for extracting the row number has been introduced and it is consistent with already existing functionality;
for common and presumably most important use cases it returns reasonable values;
there are no known major discrepancies between the new value and the existing warning/error messages reporting a row number (remaining open issues of this kind are corner cases).
There will be a fair amount of wrong or questionable results for SQL statements which weren't the target of this task. Some are already reported as bugs, and there are more to come. In many cases it is not even obvious what the result should be. I expect that depending on the feedback we will be able to determine whether the functionality is employed by a sufficient part of users, and if it is, in future releases it will be adjusted to a more predictable behavior with a wider range of statements.
I have had a closer look at what an implementation could look like. A API function might not be necessary or even desired, rather a SQL function should work well and this has the advantage of being accessible independent of the client. I created an experimental patch for this, where I added a variable for the index of the current and previous statement to Diagnostics_area in sql_error.h, set this in write_record in sql_insert.cc and then a SQL function to return the value of the previous statement value in item_func.cc and item_create.cc. There is more to it than this for a complete implementation, if for no other reason so because I'm don't know the MariaDB source code that well anymore, but this is enough to prove that this works, which it does:
Records: 3 Duplicates: 0 Warnings: 0
| error_array_index() |
| 2 |
| error_array_index() |
| 3 |