Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21345

Deadlocks when two transactions update the same row at the same time

    XMLWordPrintable

    Details

      Description

      Hi guys,

      I encountered a pretty weird problem that is hard to interpret. I tried to reproduce it but with no luck.

      Definitions of tables involved

      CREATE TABLE `task_instance` (
        `task_id` varchar(250) NOT NULL,
        `dag_id` varchar(250) NOT NULL,
        `execution_date` datetime(6) NOT NULL,
        `start_date` datetime(6) DEFAULT NULL,
        `end_date` datetime(6) DEFAULT NULL,
        `duration` float DEFAULT NULL,
        `state` varchar(20) DEFAULT NULL,
        `try_number` int(11) DEFAULT NULL,
        `hostname` varchar(1000) DEFAULT NULL,
        `unixname` varchar(1000) DEFAULT NULL,
        `job_id` int(11) DEFAULT NULL,
        `pool` varchar(50) DEFAULT NULL,
        `queue` varchar(50) DEFAULT NULL,
        `priority_weight` int(11) DEFAULT NULL,
        `operator` varchar(1000) DEFAULT NULL,
        `queued_dttm` datetime(6) DEFAULT NULL,
        `pid` int(11) DEFAULT NULL,
        `max_tries` int(11) DEFAULT '-1',
        PRIMARY KEY (`task_id`,`dag_id`,`execution_date`),
        KEY `ti_dag_state` (`dag_id`,`state`),
        KEY `ti_pool` (`pool`,`state`,`priority_weight`),
        KEY `ti_state_lkp` (`dag_id`,`task_id`,`execution_date`,`state`),
        KEY `ti_state` (`state`),
        KEY `ti_job_id` (`job_id`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8
       
      CREATE TABLE `dag_run` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `dag_id` varchar(250) DEFAULT NULL,
        `execution_date` datetime(6) DEFAULT NULL,
        `state` varchar(50) DEFAULT NULL,
        `run_id` varchar(250) DEFAULT NULL,
        `external_trigger` tinyint(1) DEFAULT NULL,
        `conf` blob,
        `end_date` datetime(6) DEFAULT NULL,
        `start_date` datetime(6) DEFAULT NULL,
        PRIMARY KEY (`id`),
        UNIQUE KEY `dag_id` (`dag_id`,`execution_date`),
        UNIQUE KEY `dag_id_2` (`dag_id`,`run_id`),
        KEY `dag_id_state` (`dag_id`,`state`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8
       
      CREATE TABLE `log` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `dttm` datetime(6) DEFAULT NULL,
        `dag_id` varchar(250) DEFAULT NULL,
        `task_id` varchar(250) DEFAULT NULL,
        `event` varchar(30) DEFAULT NULL,
        `execution_date` datetime(6) DEFAULT NULL,
        `owner` varchar(500) DEFAULT NULL,
        `extra` text,
        PRIMARY KEY (`id`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8
      

      ------------------------
      LATEST DETECTED DEADLOCK
      ------------------------
      191216  8:52:09
      *** (1) TRANSACTION:
      TRANSACTION 4AB6056BF, ACTIVE 0 sec fetching rows
      mysql tables in use 2, locked 2
      LOCK WAIT 5 lock struct(s), heap size 1248, 5 row lock(s)
      MySQL thread id 256467507, OS thread handle 0x7f973d815700, query id 14205100829 127.0.0.1 airflow Sending data
      UPDATE task_instance, dag_run SET task_instance.state=NULL WHERE task_instance.dag_id IN ('A_Dag', 'B_Dag') AND task_instance.state IN ('queued', 'scheduled') AND dag_run.dag_id = task_instance.dag_id AND dag_run.execution_date = task_instance.execution_date AND dag_run.state != 'running'
      *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
      RECORD LOCKS space id 0 page no 112745 n bits 128 index `PRIMARY` of table `airflow`.`task_instance` trx id 4AB6056BF lock_mode X locks rec but not gap waiting
      *** (2) TRANSACTION:
      TRANSACTION 4AB6056BC, ACTIVE 0 sec updating or deleting
      mysql tables in use 1, locked 1
      4 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 2
      MySQL thread id 256620571, OS thread handle 0x7f973d531700, query id 14205100833 192.168.1.1 airflow Updating
      UPDATE task_instance SET start_date='2019-12-16 08:51:35.486074', state='running', try_number=1, hostname='192.168.1.1', unixname='airflow', job_id=119942, operator='PythonOperator', pid=60059 WHERE task_instance.task_id = 'X_Task' AND task_instance.dag_id = 'X_Dag' AND task_instance.execution_date = '2019-12-16 08:50:00'
      *** (2) HOLDS THE LOCK(S):
      RECORD LOCKS space id 0 page no 112745 n bits 128 index `PRIMARY` of table `airflow`.`task_instance` trx id 4AB6056BC lock_mode X locks rec but not gap
      *** (2) WAITING FOR THIS LOCK TO BE GRANTED:
      RECORD LOCKS space id 0 page no 112079 n bits 232 index `ti_state` of table `airflow`.`task_instance` trx id 4AB6056BC lock_mode X locks rec but not gap waiting
      *** WE ROLL BACK TRANSACTION (1)
      

      general.log

      191216  8:52:09
                      256467507 Query SELECT 1
                      256620571 Connect       airflow@192.168.1.1 as anonymous on airflow
                      256620571 Query set autocommit=0
                      256467507 Query UPDATE task_instance, dag_run SET task_instance.state='failed' WHERE task_instance.dag_id IN ('A_Dag', 'B_Dag') AND task_instance.state IN ('up_for_retry') AND dag_run.dag_id = task_instance.dag_id AND dag_run.execution_date = task_instance.execution_date AND dag_run.state != 'running'
                      256620571 Query SELECT 1
                      256620571 Query SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.execution_date AS task_instance_execution_date, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.pid AS task_instance_pid FROM task_instance WHERE task_instance.task_id = 'X_Task' AND task_instance.dag_id = 'X_Dag' AND task_instance.execution_date = '2019-12-16 08:50:00'
                      256467507 Query commit
                      256467507 Query rollback
                      256467507 Query SELECT 1
                      256620571 Query INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES ('2019-12-16 08:51:35.491292', 'X_Dag', 'X_Task', 'running', '2019-12-16 08:50:00', 'Airflow', NULL)
                      256467507 Query UPDATE task_instance, dag_run SET task_instance.state=NULL WHERE task_instance.dag_id IN ('A_Dag', 'B_Dag') AND task_instance.state IN ('queued', 'scheduled') AND dag_run.dag_id = task_instance.dag_id AND dag_run.execution_date = task_instance.execution_date AND dag_run.state != 'running'
                      256620571 Query UPDATE task_instance SET start_date='2019-12-16 08:51:35.486074', state='running', try_number=1, hostname='192.168.1.1', unixname='airflow', job_id=119942, operator='PythonOperator', pid=60059 WHERE task_instance.task_id = 'X_Task' AND task_instance.dag_id = 'X_Dag' AND task_instance.execution_date = '2019-12-16 08:50:00'
                      256620571 Query commit
                      256620571 Query rollback
                      256620571 Quit
      

      BTW: The problem happens more frequent if task_instance is large.

      BR//Zhidong

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              marko Marko Mäkelä
              Reporter:
              0x4ec7 Zhidong
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Dates

                Created:
                Updated: