Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6166

Store all of LOAD DATA INFILE's warnings

    XMLWordPrintable

Details

    • Task
    • Status: Open (View Workflow)
    • Minor
    • Resolution: Unresolved
    • None
    • None

    Description

      When we do bulk imports, we need the full list of warnings to:

      • log the data modifications done by MariaDB
      • get statistics about the data quality.

      The data is from external sources and often has quality problems, but we want to load it into the database as far as possible with sane types, we don't want to impose strict format checks. But we want to know what got lost.

      And we need the speed of LOAD DATA INFILE.

      Of course we could try to mimic MariaDB's type and range checks etc., but that is too error prone, the target is too big and moving, so we can only do some basic checks.

      show warnings or get diagnostic can only show up to 65535 warnings, and because there can be many warnings per row, we don't know in advance, if we can load the whole file at once or just blocks of 100 lines or so (which is much slower).

      So I implemented an algorithm:-
      1) block_size = all lines of file
      2) load block_size lines
      3) if @@warning_count <= 65535 go to next block, else delete block from table, determine new, smaller, block_size using an estimation of warnings per line, goto 2)
      ...

      Why can't LOAD DATA INFILE just log the warnings into a table with a predefined structure or stop loading the file an return the file position when max_error_count warnings occured?

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              RalfNeubauer Ralf Neubauer
              Votes:
              1 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.