[CONC-608] mysql_stmt_fetch() returns 1, but mysql_stmt_errno() returns 0 - Jira

Details

Type: Bug
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.3.1
Fix Version/s: 3.1, 3.3
Component/s: Prepared Statements
Labels:
None
Environment:
Linux

Description

It appears some SELECT ... FOR UPDATE statements sometimes fail during mysql_stmt_fetch() but no mysql_stmt_errno() is set and mysql_stmt_error() is blank. It looks like we don't need to restart the connection or anything so the connection didn't fail or similar.

I am using Percona XtraDB Cluster 5.7 (MySQL 5.7 + Galera) that is geo-distributed with a max latency between nodes of 10ms. The requests it occurs on are likely to be run on different nodes simultaneously, though I'm not sure if this is relevant to the issue or not, just a guess.

I haven't seen if I could write a specific test case to reproduce this.

The error is output from:
https://github.com/Monetra/mstdlib/blob/master/sql/mysql/mstdlib_sql_mysql.c#L864

Looking at the MariaDB Connector/C code, I can see mysql_stmt_fetch() calls out to methods->db_stmt_fetch (mthd_stmt_fetch_row) and methods->db_stmt_fetch_to_bind (mthd_stmt_fetch_to_bind).

mthd_stmt_fetch_to_bind() can only return 0 or MYSQL_DATA_TRUNCATED(101), so the problem isn't there.

mthd_stmt_fetch_row points to stmt->fetch_row_func, which appears to be either stmt_unbuffered_fetch() or stmt_cursor_fetch(). As far as I know, we aren't requesting cursors (and don't use STMT_ATTR_PREFETCH_ROWS which from what I know would create a cursor), so I'm pretty sure that means we're taking the stmt_unbuffered_fetch() path.

In stmt_unbuffered_fetch() I do see a check for pkt_len == packet_error where it will return (1) as an error but never sets an errno or error message, so I'm pretty sure this is what we are hitting.

This then means why are we getting a packet error?

Attachments

Issue Links

relates to

MDEV-35953 mysql_stmt_errno() returns 0 after an error in mysql_stmt_execute()

Closed

Activity

Ascending order - Click to sort in descending order

Georg Richter added a comment - 2022-08-09 22:17

Hi Brad,

That no error was set is a bug, but it would be more interesting to know why a packet error shows up while fetching a result set. Can you reproduce this issue? If yes, would it be possible to attach a tcp dump?

Georg Richter added a comment - 2022-08-09 22:17 Hi Brad, That no error was set is a bug, but it would be more interesting to know why a packet error shows up while fetching a result set. Can you reproduce this issue? If yes, would it be possible to attach a tcp dump?

Brad House added a comment - 2022-08-10 11:58

I'll see if I can write a test case for this and reproduce at any consistent rate. If so, I should be able to get a packet capture. Right now the issue happens extremely infrequently (less than once per day), but I've seen it happen across multiple tables that have similar usage patterns.

The gist of the usage pattern is there are multiple nodes querying a task list table periodically (30s - 1min) in a SERIALIZABLE transaction, and it claims the tasks by either updating the records or deleting the records after the "select ... for update". The task list can contain 0 or more records, but can return thousands of records during times of high load.

I've never seen this occur in any other situation, such as a "select ... for update" with a single row. Nor a multi-row select outside of a transaction.

Is it possible that in a serializable transaction with "select ... for update" that the select can be aborted before all rows are output on the server side? Basically, might it detect its resultset has already been invalidated by another transaction and stop sending records and instead send some sort of error code indicating we need to rollback?

Brad House added a comment - 2022-08-10 11:58 I'll see if I can write a test case for this and reproduce at any consistent rate. If so, I should be able to get a packet capture. Right now the issue happens extremely infrequently (less than once per day), but I've seen it happen across multiple tables that have similar usage patterns. The gist of the usage pattern is there are multiple nodes querying a task list table periodically (30s - 1min) in a SERIALIZABLE transaction, and it claims the tasks by either updating the records or deleting the records after the "select ... for update". The task list can contain 0 or more records, but can return thousands of records during times of high load. I've never seen this occur in any other situation, such as a "select ... for update" with a single row. Nor a multi-row select outside of a transaction. Is it possible that in a serializable transaction with "select ... for update" that the select can be aborted before all rows are output on the server side? Basically, might it detect its resultset has already been invalidated by another transaction and stop sending records and instead send some sort of error code indicating we need to rollback?

Georg Richter added a comment - 2025-01-28 10:56

I dont know if it fails for same reason, but it is another example where the server returns an error without error code.

Georg Richter added a comment - 2025-01-28 10:56 I dont know if it fails for same reason, but it is another example where the server returns an error without error code.

People

Assignee:: Georg Richter

Reporter:: Brad House

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2022-08-09 18:56

Updated:: 2025-01-28 10:56

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.