[MDEV-25720] LOAD XML performance with indented XML / many whitespaces Created: 2021-05-18  Updated: 2021-05-18

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.2.38, 10.3.29, 10.4.19, 10.5.10, 10.6.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Julian Jacobsen
Assignee: Unassigned
Resolution: Unresolved
Votes: 0
Labels: None

Attachments: dummy_data.xml.gz, dummy_data_no_whitespaces.xml.gz

 Description   

LOAD XML performance degrades disproportionately (seemingly exponentially) with data set size for XML files that contain a lot of whitespace, indentation, or pretty-printing, such as XML generated by mysql --xml:

CREATE TABLE `employees` (
  `emp_no` int(11) NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
 
TRUNCATE TABLE employees; LOAD XML LOCAL INFILE 'dummy_data_no_whitespaces.xml' INTO TABLE employees;
-- takes a few seconds
 
TRUNCATE TABLE employees; LOAD XML LOCAL INFILE 'dummy_data.xml' INTO TABLE employees;
-- takes > 5 minutes, maybe hours
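
For readers without access to the attachments, the following is a minimal sketch of a generator for comparable test files. It is not the script used to create the attached files. The row count, the value ranges, the `<resultset>`/`<row>`/`<field name="...">` layout (modelled loosely on mysql --xml output), and the function names make_rows, render_xml and write_files are all assumptions made for illustration.

```python
import random
from datetime import date, timedelta
from xml.sax.saxutils import escape

# Column order of the employees table from the report.
FIELDS = ("emp_no", "birth_date", "first_name", "last_name", "gender", "hire_date")


def make_rows(count, seed=0):
    """Build `count` synthetic employee rows (made-up values, fixed seed)."""
    rng = random.Random(seed)
    rows = []
    for emp_no in range(1, count + 1):
        birth = date(1950, 1, 1) + timedelta(days=rng.randrange(7000))
        hire = date(1985, 1, 1) + timedelta(days=rng.randrange(7000))
        rows.append((
            str(emp_no),
            birth.isoformat(),
            "First%d" % rng.randrange(1000),   # fits varchar(14)
            "Last%d" % rng.randrange(1000),    # fits varchar(16)
            rng.choice("MF"),
            hire.isoformat(),
        ))
    return rows


def render_xml(rows, pretty):
    """Render rows as XML; `pretty` adds newlines and indentation between tags."""
    nl, row_indent, field_indent = ("\n", "  ", "    ") if pretty else ("", "", "")
    parts = ['<?xml version="1.0"?>', nl, "<resultset>", nl]
    for row in rows:
        parts += [row_indent, "<row>", nl]
        for name, value in zip(FIELDS, row):
            field = '<field name="%s">%s</field>' % (name, escape(value))
            parts += [field_indent, field, nl]
        parts += [row_indent, "</row>", nl]
    parts += ["</resultset>", nl]
    return "".join(parts)


def write_files(count,
                pretty_path="dummy_data.xml",
                compact_path="dummy_data_no_whitespaces.xml"):
    """Write the same rows twice: once indented, once without any whitespace."""
    rows = make_rows(count)
    for path, pretty in ((pretty_path, True), (compact_path, False)):
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(render_xml(rows, pretty))
```

Calling write_files with increasing row counts (for example 10000, then 100000) and timing both LOAD XML statements above for each size should show whether load time for the indented file grows faster than for the compact one.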

This was already fixed once in 2017:
https://github.com/MariaDB/server/commit/3b562dcf6e5423d41d41ef416c18187c3a946d9e
https://github.com/MariaDB/server/commit/8c7e9aab054360ec192ce3cffb2c25aa16e25f10

However, the fix seems to have been (accidentally?) reverted since.

