MariaDB Server / MDEV-25720

LOAD XML performance with indented XML / many whitespaces

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 10.2.38, 10.3.29, 10.6.0, 10.4.19, 10.5.10
    • None
    • None
    • None

    Description

      LOAD XML performance gets exponentially worse as the data set grows when the XML file contains a lot of whitespace (indentation, pretty-printing), as in XML generated by mysql --xml:

      CREATE TABLE `employees` (
        `emp_no` int(11) NOT NULL,
        `birth_date` date NOT NULL,
        `first_name` varchar(14) NOT NULL,
        `last_name` varchar(16) NOT NULL,
        `gender` enum('M','F') NOT NULL,
        `hire_date` date NOT NULL,
        PRIMARY KEY (`emp_no`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
       
      TRUNCATE TABLE employees; LOAD XML LOCAL INFILE 'dummy_data_no_whitespaces.xml' INTO TABLE employees;
      -- takes a few seconds
       
      TRUNCATE TABLE employees; LOAD XML LOCAL INFILE 'dummy_data.xml' INTO TABLE employees;
      -- takes > 5 minutes, maybe hours
      

      This had already been fixed once in 2017:
      https://github.com/MariaDB/server/commit/3b562dcf6e5423d41d41ef416c18187c3a946d9e
      https://github.com/MariaDB/server/commit/8c7e9aab054360ec192ce3cffb2c25aa16e25f10

      But the fix seems to have been (accidentally?) reverted since.
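
      To reproduce the two LOAD XML runs above without the attachments, a pair of equivalent files can be generated locally. The following is only a sketch: the exact layout of the attached files is not shown in this report, so it assumes the `<resultset>/<row>/<field name="...">` layout that mysql --xml prints (row indented by two spaces, fields by a tab, xmlns attribute omitted). All values are synthetic, and the helper names (make_rows, to_xml, write_files) are made up for illustration.

```python
import random
from datetime import date, timedelta
from xml.sax.saxutils import escape

# Column order of the `employees` table from the report.
COLUMNS = ("emp_no", "birth_date", "first_name", "last_name", "gender", "hire_date")


def make_rows(n, seed=0):
    """Return n synthetic rows as tuples of strings (values are invented)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        birth = date(1950, 1, 1) + timedelta(days=rng.randrange(15000))
        hire = date(1985, 1, 1) + timedelta(days=rng.randrange(10000))
        rows.append((
            str(10001 + i),        # emp_no, unique primary key
            birth.isoformat(),     # birth_date
            f"First{i}"[:14],      # first_name, fits varchar(14)
            f"Last{i}"[:16],       # last_name, fits varchar(16)
            rng.choice("MF"),      # gender
            hire.isoformat(),      # hire_date
        ))
    return rows


def to_xml(rows, indent=True):
    """Serialize rows in the assumed mysql --xml layout.

    indent=True  -> one element per line with leading whitespace
    indent=False -> same elements, no whitespace between tags
    """
    nl = "\n" if indent else ""
    row_pad = "  " if indent else ""
    field_pad = "\t" if indent else ""
    parts = ['<?xml version="1.0"?>\n',
             '<resultset statement="SELECT * FROM employees">' + nl]
    for row in rows:
        parts.append(f"{row_pad}<row>{nl}")
        for name, value in zip(COLUMNS, row):
            parts.append(f'{field_pad}<field name="{name}">{escape(value)}</field>{nl}')
        parts.append(f"{row_pad}</row>{nl}")
    parts.append("</resultset>\n")
    return "".join(parts)


def write_files(n):
    """Write both variants using the file names from the report."""
    rows = make_rows(n)
    with open("dummy_data.xml", "w", encoding="utf-8") as f:
        f.write(to_xml(rows, indent=True))
    with open("dummy_data_no_whitespaces.xml", "w", encoding="utf-8") as f:
        f.write(to_xml(rows, indent=False))
```

      Calling write_files(100000) and then repeating with larger values should show whether the load time of the indented file grows faster than the row count; the row count here is an arbitrary choice, not the size of the attached data set.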

      Attachments

        1. dummy_data.xml.gz
          7.13 MB
          Julian Jacobsen
        2. dummy_data_no_whitespaces.xml.gz
          7.03 MB
          Julian Jacobsen

          People

            Assignee: Unassigned
            Reporter: Julian Jacobsen (jj_compositiv)