Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33207

Killing application cannot remove DB locked session

Details

    Description

      When a DB session running an UPDATE on locked row(s), killing the corresponding application (e.g. MySQL client) does not kill/remove that session in DB. The session is gone when the innodb_lock_wait_timeout is reached.

      Scenario:
      1. Create a simple table and insert a row:
      CREATE TABLE `t1` (
      `idx` int(11) NOT NULL,
      `name` varchar(100) DEFAULT NULL,
      PRIMARY KEY (`idx`)
      ) ;
      INSERT INTO t1 VALUES (1, 'abc'); COMMIT;

      2. Run MySQL client to create session A to lock the row:
      > SET AUTOCOMMIT=OFF;
      > UPDATE t1 SET name='sess A' WHERE idx=1;

      3. Run another MySQL client to create sesssion B, update the locked row by session A:
      > SET AUTOCOMMIT=ON;
      > UPDATE t1 SET name='sess B' WHERE idx=1;

      4. Run SHOW FULL PROCESSLIST on another MySQL client, it shows session B is running running the UPDATE statement (waiting for the row to be unlocked).

      5. On the OS shell, run "kill -9 <Session B MySQL client PID>" to kill the MySQL client of session B.

      6. Run SHOW FULL PROCESSLIST on another MySQL client again, it shows session B still remains in the DB, running the UPDATE statement. Session B is gone until innodb_lock_wait_timeout is reached.

      7. If session A commited the UPDATE before session B's innodb_lock_wait_timeout is reached, the row will be updated by session B.

      I would suppose that killing application can kill the session in DB but it doesn't in the above scenario. Please advise the above behavior is normal or not.

      Thanks.

      Attachments

        Activity

          danblack Daniel Black added a comment -

          When a connected process is killed, the mariadb server doesn't receive an indication that the socket session has been removed.

          I tested this an in a strace

          1039777 <... poll resumed>)             = 1 ([{fd=47, revents=POLLIN}])
          1039777 recvfrom(47, "(\0\0\0", 4, MSG_DONTWAIT, NULL, NULL) = 4 
          1039777 recvfrom(47, "\3UPDATE t1 SET name='sess B' WHERE idx=1", 40, MSG_DONTWAIT, NULL, NULL) = 40
          

          No error on file descriptor 47 or the poll of it return any error after the process was killed.

          so normal behaviour it is.

          With tcp keep alives you should be able to detect this earlier - https://mariadb.com/kb/en/server-system-variables/#tcp_keepalive_interval

          danblack Daniel Black added a comment - When a connected process is killed, the mariadb server doesn't receive an indication that the socket session has been removed. I tested this an in a strace 1039777 <... poll resumed>) = 1 ([{fd=47, revents=POLLIN}]) 1039777 recvfrom(47, "(\0\0\0", 4, MSG_DONTWAIT, NULL, NULL) = 4 1039777 recvfrom(47, "\3UPDATE t1 SET name='sess B' WHERE idx=1", 40, MSG_DONTWAIT, NULL, NULL) = 40 No error on file descriptor 47 or the poll of it return any error after the process was killed. so normal behaviour it is. With tcp keep alives you should be able to detect this earlier - https://mariadb.com/kb/en/server-system-variables/#tcp_keepalive_interval
          wlad Vladislav Vaintroub added a comment - - edited

          I think the suspected bug is that server thread is stuck in innodb lock, while executing UPDATE, and while it is stuck, it does not read or write, or check the status of the socket.
          For example , a SLEEP() wait is implemented such that it wakes up every 5 seconds and checks a) whether connection was killed and b) THD::is_connected(), which ends up in vio_is_connected.
          (s. Interruptible_wait::wait() in item_func.cc) . If network connection is broken, SLEEP() is interrupted.

          Currently, only SLEEP() waits are aware of THD::is_connected, as far as I can see, Would be an improvement if other long-ish waits (table, user, row locks) were aware, too, so they do not hog resources, and disappear faster.

          wlad Vladislav Vaintroub added a comment - - edited I think the suspected bug is that server thread is stuck in innodb lock, while executing UPDATE, and while it is stuck, it does not read or write, or check the status of the socket. For example , a SLEEP() wait is implemented such that it wakes up every 5 seconds and checks a) whether connection was killed and b) THD::is_connected(), which ends up in vio_is_connected. (s. Interruptible_wait::wait() in item_func.cc) . If network connection is broken, SLEEP() is interrupted. Currently, only SLEEP() waits are aware of THD::is_connected, as far as I can see, Would be an improvement if other long-ish waits (table, user, row locks) were aware, too, so they do not hog resources, and disappear faster.
          danblack Daniel Black added a comment -

          Nice! There's one other use in MDL_context::acquire_lock too.

          danblack Daniel Black added a comment - Nice! There's one other use in MDL_context::acquire_lock too.
          wlad Vladislav Vaintroub added a comment - - edited

          Right. I had a vague idea of having something like poll() loop that would only wait for POLLERR (polldf.events=0, if this works) for active connections, and once error is returned, initiate KILL_CONNECTION with new kill_errno = ER_NETWORK_DISCONNECT .

          That's because "kill connection" works much better generally, to get rid of a connection. THD::killed is getting periodically checked, and KILL interrupts lock waits as well, at least usually it does usually, so converting network error to KILL might be a good idea.

          Needs something on Windows, too, There is WSAEventSelect to detect socket closure,. for named pipes I'm not sure yet. perhaps just periodically check health with PeekNamedPipe.

          wlad Vladislav Vaintroub added a comment - - edited Right. I had a vague idea of having something like poll() loop that would only wait for POLLERR (polldf.events=0, if this works) for active connections, and once error is returned, initiate KILL_CONNECTION with new kill_errno = ER_NETWORK_DISCONNECT . That's because "kill connection" works much better generally, to get rid of a connection. THD::killed is getting periodically checked, and KILL interrupts lock waits as well, at least usually it does usually, so converting network error to KILL might be a good idea. Needs something on Windows, too, There is WSAEventSelect to detect socket closure,. for named pipes I'm not sure yet. perhaps just periodically check health with PeekNamedPipe.
          danblack Daniel Black added a comment - - edited

          Yes, I like that too. Its almost sounding 1/2 way to async programming.

          danblack Daniel Black added a comment - - edited Yes, I like that too. Its almost sounding 1/2 way to async programming.

          People

            Unassigned Unassigned
            ktchow_ogcio Chow King Tak
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.