When we do bulk imports, we need the full list of warnings to:
log the data modifications done by MariaDB
get statistics about the data quality.
The data is from external sources and often has quality problems, but we want to load it into the database as far as possible with sane types, we don't want to impose strict format checks. But we want to know what got lost.
And we need the speed of LOAD DATA INFILE.
Of course we could try to mimic MariaDB's type and range checks etc., but that is too error prone, the target is too big and moving, so we can only do some basic checks.
show warnings or get diagnostic can only show up to 65535 warnings, and because there can be many warnings per row, we don't know in advance, if we can load the whole file at once or just blocks of 100 lines or so (which is much slower).
So I implemented an algorithm:-
1) block_size = all lines of file
2) load block_size lines
3) if @@warning_count <= 65535 go to next block, else delete block from table, determine new, smaller, block_size using an estimation of warnings per line, goto 2)
...
Why can't LOAD DATA INFILE just log the warnings into a table with a predefined structure or stop loading the file an return the file position when max_error_count warnings occured?
There is another thing: We parse the messages for information like affected columns, line numbers, and incriminating values, the rest of the messages (possibly together with the column) is nice as some kind of message class for counting similar messages. Of course this depends on the message language... And not every message contains affected column and line number.
Even with GET DIAGNOSTICS one has to parse the message, and it seems to be too slow to query millions of messages.
Ralf Neubauer
added a comment - There is another thing: We parse the messages for information like affected columns, line numbers, and incriminating values, the rest of the messages (possibly together with the column) is nice as some kind of message class for counting similar messages. Of course this depends on the message language... And not every message contains affected column and line number.
Even with GET DIAGNOSTICS one has to parse the message, and it seems to be too slow to query millions of messages.
Hi, the issue seems interesting, I would like to work on it, please suggest me on how to get started with it, I have build the server code and have tried to implement the LOAD DATA INFILE statement for imports to mimic the issue.
Debjyoti Ghosh
added a comment - Hi, the issue seems interesting, I would like to work on it, please suggest me on how to get started with it, I have build the server code and have tried to implement the LOAD DATA INFILE statement for imports to mimic the issue.
Storing the warnings is not a problem. The problem is actually telling the client the warning count. This is done using two bytes in the protocol layer. Increasing this cause compatibility issues. Changing this will require some thought and design work.
Andrew Hutchings (Inactive)
added a comment - Storing the warnings is not a problem. The problem is actually telling the client the warning count. This is done using two bytes in the protocol layer. Increasing this cause compatibility issues. Changing this will require some thought and design work.
I partly understand this, but if it would be possible to set max_error_count to a value above 65535, it would – as it always does – just change the behaviour of SHOW WARNINGS, SHOW ERRORS and SHOW COUNT WARNINGS – the two bytes in the protocol could just be capped to 65535 as they always have been – and they can already differ from SHOW COUNT WARNINGS depending on the value of error_count (just in the other direction). warning_count could behave either way, it is not limited to two bytes, I assume.
If a new way of returning the warnings would be introduced, like a table in INFORMATION_SCHEMA or a table you have to create yourself whose name you can set (e.g. LOAD DATA LOCAL INFILE 'file_name' INTO TABLE tbl_name LOG ERRORS INTO TABLE warning_tbl_name or with a session variable error_log_table) then that would be completely independent of said bytes in the protocol and warning_count – the only decision to make would be if SHOW WARNINGS etc. would return an empty result if you redirected the warnings and errors to an explicitly set table.
Mit freundlichen Grüßen
Ralf Neubauer
Referent
Forschungsbereich Arzneimittelinformationssysteme und Analysen
Wissenschaftliches Institut der AOK
AOK-Bundesverband
AOK-Bundesverband
Rosenthaler Straße 31
10178 Berlin
www.wido.de
AOK - Die Gesundheitskasse. Gesundheit in besten Händen.
Ralf Neubauer
added a comment - Hi,
I partly understand this, but if it would be possible to set max_error_count to a value above 65535, it would – as it always does – just change the behaviour of SHOW WARNINGS, SHOW ERRORS and SHOW COUNT WARNINGS – the two bytes in the protocol could just be capped to 65535 as they always have been – and they can already differ from SHOW COUNT WARNINGS depending on the value of error_count (just in the other direction). warning_count could behave either way, it is not limited to two bytes, I assume.
If a new way of returning the warnings would be introduced, like a table in INFORMATION_SCHEMA or a table you have to create yourself whose name you can set (e.g. LOAD DATA LOCAL INFILE 'file_name' INTO TABLE tbl_name LOG ERRORS INTO TABLE warning_tbl_name or with a session variable error_log_table) then that would be completely independent of said bytes in the protocol and warning_count – the only decision to make would be if SHOW WARNINGS etc. would return an empty result if you redirected the warnings and errors to an explicitly set table.
Mit freundlichen Grüßen
Ralf Neubauer
Referent
Forschungsbereich Arzneimittelinformationssysteme und Analysen
Wissenschaftliches Institut der AOK
AOK-Bundesverband
Tel.: 030 34646-2583
Fax: 030 34646-332583
E-Mail: ralf.neubauer@wido.bv.aok.de
AOK-Bundesverband
Rosenthaler Straße 31
10178 Berlin
www.wido.de
AOK - Die Gesundheitskasse. Gesundheit in besten Händen.
People
Unassigned
Ralf Neubauer
Votes:
1Vote for this issue
Watchers:
5Start watching this issue
Dates
Created:
Updated:
Git Integration
Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.
{"report":{"fcp":892.3999996185303,"ttfb":207.10000038146973,"pageVisibility":"visible","entityId":36012,"key":"jira.project.issue.view-issue","isInitial":true,"threshold":1000,"elementTimings":{},"userDeviceMemory":8,"userDeviceProcessors":32,"apdex":1,"journeyId":"28c8e385-79df-4115-a987-04ca94cd6d87","navigationType":0,"readyForUser":988.3999996185303,"redirectCount":0,"resourceLoadedEnd":1025.6999998092651,"resourceLoadedStart":213.39999961853027,"resourceTiming":[{"duration":208.20000076293945,"initiatorType":"link","name":"https://jira.mariadb.org/s/2c21342762a6a02add1c328bed317ffd-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/css/_super/batch.css","startTime":213.39999961853027,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":213.39999961853027,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":421.6000003814697,"responseStart":0,"secureConnectionStart":0},{"duration":208.30000019073486,"initiatorType":"link","name":"https://jira.mariadb.org/s/7ebd35e77e471bc30ff0eba799ebc151-CDN/lu2cib/820016/12ta74/2bf333562ca6724060a9d5f1535471f6/_/download/contextbatch/css/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true","startTime":213.69999980926514,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":213.69999980926514,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":422,"responseStart":0,"secureConnectionStart":0},{"duration":216.80000019073486,"initiatorType":"script","name":"https://jira.mariadb.org/s/0917945aaa57108d00c5076fea35e069-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/js/_super/batch.js?locale=en","startTime":213.80000019073486,"connectEnd":213.80000019073486,"connectStart":213.80000019073486,"domainLookupEnd":213.80000019073486,"domainLookupStart":213.80000019073486,"fetchStart":213.80000019073486,"redirectEnd":0,"redirectStart":0,"requestStart":213.80000019073486,"responseEnd":430.6000003814697,"responseStart":430.6000003814697,"secureConnectionStart":213.80000019073486},{"duration":297.3999996185303,"initiatorType":"script","name":"https://jira.mariadb.org/s/2d8175ec2fa4c816e8023260bd8c1786-CDN/lu2cib/820016/12ta74/2bf333562ca6724060a9d5f1535471f6/_/download/contextbatch/js/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true","startTime":214,"connectEnd":214,"connectStart":214,"domainLookupEnd":214,"domainLookupStart":214,"fetchStart":214,"redirectEnd":0,"redirectStart":0,"requestStart":214,"responseEnd":511.3999996185303,"responseStart":511.3999996185303,"secureConnectionStart":214},{"duration":301.30000019073486,"initiatorType":"script","name":"https://jira.mariadb.org/s/a9324d6758d385eb45c462685ad88f1d-CDN/lu2cib/820016/12ta74/c92c0caa9a024ae85b0ebdbed7fb4bd7/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=en","startTime":214.30000019073486,"connectEnd":214.30000019073486,"connectStart":214.30000019073486,"domainLookupEnd":214.30000019073486,"domainLookupStart":214.30000019073486,"fetchStart":214.30000019073486,"redirectEnd":0,"redirectStart":0,"requestStart":214.30000019073486,"responseEnd":515.6000003814697,"responseStart":515.6000003814697,"secureConnectionStart":214.30000019073486},{"duration":301.6000003814697,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-en/jira.webresources:calendar-en.js","startTime":214.39999961853027,"connectEnd":214.39999961853027,"connectStart":214.39999961853027,"domainLookupEnd":214.39999961853027,"domainLookupStart":214.39999961853027,"fetchStart":214.39999961853027,"redirectEnd":0,"redirectStart":0,"requestStart":214.39999961853027,"responseEnd":516,"responseStart":516,"secureConnectionStart":214.39999961853027},{"duration":301.79999923706055,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-localisation-moment/jira.webresources:calendar-localisation-moment.js","startTime":214.60000038146973,"connectEnd":214.60000038146973,"connectStart":214.60000038146973,"domainLookupEnd":214.60000038146973,"domainLookupStart":214.60000038146973,"fetchStart":214.60000038146973,"redirectEnd":0,"redirectStart":0,"requestStart":214.60000038146973,"responseEnd":516.3999996185303,"responseStart":516.3999996185303,"secureConnectionStart":214.60000038146973},{"duration":358,"initiatorType":"link","name":"https://jira.mariadb.org/s/b04b06a02d1959df322d9cded3aeecc1-CDN/lu2cib/820016/12ta74/a2ff6aa845ffc9a1d22fe23d9ee791fc/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css","startTime":214.89999961853027,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":214.89999961853027,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":572.8999996185303,"responseStart":0,"secureConnectionStart":0},{"duration":301.80000019073486,"initiatorType":"script","name":"https://jira.mariadb.org/rest/api/1.0/shortcuts/820016/47140b6e0a9bc2e4913da06536125810/shortcuts.js?context=issuenavigation&context=issueaction","startTime":215,"connectEnd":215,"connectStart":215,"domainLookupEnd":215,"domainLookupStart":215,"fetchStart":215,"redirectEnd":0,"redirectStart":0,"requestStart":215,"responseEnd":516.8000001907349,"responseStart":516.8000001907349,"secureConnectionStart":215},{"duration":357.9000005722046,"initiatorType":"link","name":"https://jira.mariadb.org/s/3ac36323ba5e4eb0af2aa7ac7211b4bb-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/css/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.css?jira.create.linked.issue=true","startTime":215.19999980926514,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":215.19999980926514,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":573.1000003814697,"responseStart":0,"secureConnectionStart":0},{"duration":302,"initiatorType":"script","name":"https://jira.mariadb.org/s/5d5e8fe91fbc506585e83ea3b62ccc4b-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/js/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.js?jira.create.linked.issue=true&locale=en","startTime":215.39999961853027,"connectEnd":215.39999961853027,"connectStart":215.39999961853027,"domainLookupEnd":215.39999961853027,"domainLookupStart":215.39999961853027,"fetchStart":215.39999961853027,"redirectEnd":0,"redirectStart":0,"requestStart":215.39999961853027,"responseEnd":517.3999996185303,"responseStart":517.3999996185303,"secureConnectionStart":215.39999961853027},{"duration":789.8000001907349,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-js/jira.webresources:bigpipe-js.js","startTime":221.19999980926514,"connectEnd":221.19999980926514,"connectStart":221.19999980926514,"domainLookupEnd":221.19999980926514,"domainLookupStart":221.19999980926514,"fetchStart":221.19999980926514,"redirectEnd":0,"redirectStart":0,"requestStart":221.19999980926514,"responseEnd":1011,"responseStart":1011,"secureConnectionStart":221.19999980926514},{"duration":790,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-init/jira.webresources:bigpipe-init.js","startTime":221.39999961853027,"connectEnd":221.39999961853027,"connectStart":221.39999961853027,"domainLookupEnd":221.39999961853027,"domainLookupStart":221.39999961853027,"fetchStart":221.39999961853027,"redirectEnd":0,"redirectStart":0,"requestStart":221.39999961853027,"responseEnd":1011.3999996185303,"responseStart":1011.3999996185303,"secureConnectionStart":221.39999961853027},{"duration":95.69999980926514,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":598.3000001907349,"connectEnd":598.3000001907349,"connectStart":598.3000001907349,"domainLookupEnd":598.3000001907349,"domainLookupStart":598.3000001907349,"fetchStart":598.3000001907349,"redirectEnd":0,"redirectStart":0,"requestStart":598.3000001907349,"responseEnd":694,"responseStart":694,"secureConnectionStart":598.3000001907349},{"duration":180.69999980926514,"initiatorType":"link","name":"https://jira.mariadb.org/s/50bc9be5bfead1a25e72c1a9338c94f6-CDN/lu2cib/820016/12ta74/e108c7645258ccb43280ed3404e3e949/_/download/contextbatch/css/com.atlassian.jira.plugins.jira-development-integration-plugin:0,-_super,-jira.view.issue,-jira.global,-jira.general,-jira.browse.project,-project.issue.navigator,-atl.general/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true","startTime":844.8000001907349,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":844.8000001907349,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1025.5,"responseStart":0,"secureConnectionStart":0},{"duration":180.30000019073486,"initiatorType":"link","name":"https://jira.mariadb.org/s/d5715adaadd168a9002b108b2b039b50-CDN/lu2cib/820016/12ta74/be4b45e9cec53099498fa61c8b7acba4/_/download/contextbatch/css/jira.project.sidebar,-_super,-project.issue.navigator,-jira.general,-jira.browse.project,-jira.view.issue,-jira.global,-atl.general,-com.atlassian.jira.projects.sidebar.init/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true","startTime":845.3999996185303,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":845.3999996185303,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1025.6999998092651,"responseStart":0,"secureConnectionStart":0},{"duration":166.19999980926514,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/e65b778d185daf5aee24936755b43da6/_/download/contextbatch/js/browser-metrics-plugin.contrib,-_super,-project.issue.navigator,-jira.view.issue,-atl.general/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true","startTime":846.1999998092651,"connectEnd":846.1999998092651,"connectStart":846.1999998092651,"domainLookupEnd":846.1999998092651,"domainLookupStart":846.1999998092651,"fetchStart":846.1999998092651,"redirectEnd":0,"redirectStart":0,"requestStart":846.1999998092651,"responseEnd":1012.3999996185303,"responseStart":1012.3999996185303,"secureConnectionStart":846.1999998092651},{"duration":166.89999961853027,"initiatorType":"script","name":"https://jira.mariadb.org/s/e0bf5781d46ea69fb123572974cf39de-CDN/lu2cib/820016/12ta74/e108c7645258ccb43280ed3404e3e949/_/download/contextbatch/js/com.atlassian.jira.plugins.jira-development-integration-plugin:0,-_super,-jira.view.issue,-jira.global,-jira.general,-jira.browse.project,-project.issue.navigator,-atl.general/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true","startTime":846.6000003814697,"connectEnd":846.6000003814697,"connectStart":846.6000003814697,"domainLookupEnd":846.6000003814697,"domainLookupStart":846.6000003814697,"fetchStart":846.6000003814697,"redirectEnd":0,"redirectStart":0,"requestStart":846.6000003814697,"responseEnd":1013.5,"responseStart":1013.5,"secureConnectionStart":846.6000003814697},{"duration":174.30000019073486,"initiatorType":"script","name":"https://jira.mariadb.org/s/097ae97cb8fbec7d6ea4bbb1f26955b9-CDN/lu2cib/820016/12ta74/be4b45e9cec53099498fa61c8b7acba4/_/download/contextbatch/js/jira.project.sidebar,-_super,-project.issue.navigator,-jira.general,-jira.browse.project,-jira.view.issue,-jira.global,-atl.general,-com.atlassian.jira.projects.sidebar.init/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true","startTime":846.8000001907349,"connectEnd":846.8000001907349,"connectStart":846.8000001907349,"domainLookupEnd":846.8000001907349,"domainLookupStart":846.8000001907349,"fetchStart":846.8000001907349,"redirectEnd":0,"redirectStart":0,"requestStart":846.8000001907349,"responseEnd":1021.1000003814697,"responseStart":1021.1000003814697,"secureConnectionStart":846.8000001907349},{"duration":254.10000038146973,"initiatorType":"script","name":"https://www.google-analytics.com/analytics.js","startTime":885.1999998092651,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":885.1999998092651,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1139.3000001907349,"responseStart":0,"secureConnectionStart":0}],"fetchStart":0,"domainLookupStart":0,"domainLookupEnd":0,"connectStart":0,"connectEnd":0,"requestStart":60,"responseStart":207,"responseEnd":218,"domLoading":211,"domInteractive":1083,"domContentLoadedEventStart":1083,"domContentLoadedEventEnd":1134,"domComplete":1859,"loadEventStart":1859,"loadEventEnd":1862,"userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","marks":[{"name":"bigPipe.sidebar-id.start","time":1047.8999996185303},{"name":"bigPipe.sidebar-id.end","time":1048.6999998092651},{"name":"bigPipe.activity-panel-pipe-id.start","time":1048.8000001907349},{"name":"bigPipe.activity-panel-pipe-id.end","time":1050.1999998092651},{"name":"activityTabFullyLoaded","time":1153.6000003814697}],"measures":[],"correlationId":"194616b9e71c68","effectiveType":"4g","downlink":10,"rtt":0,"serverDuration":94,"dbReadsTimeInMs":11,"dbConnsTimeInMs":19,"applicationHash":"9d11dbea5f4be3d4cc21f03a88dd11d8c8687422","experiments":[]}}
There is another thing: We parse the messages for information like affected columns, line numbers, and incriminating values, the rest of the messages (possibly together with the column) is nice as some kind of message class for counting similar messages. Of course this depends on the message language... And not every message contains affected column and line number.
Even with GET DIAGNOSTICS one has to parse the message, and it seems to be too slow to query millions of messages.