Hi guys, could be nice a natural sorting inside mariadb
today some guys use SQL procedures/functions to execute this cpu intensive task
php released a nice function to do this job, maybe we could port part of it and implement in udf function, and release as a default udf function?
functions:
1) natural sort, used in ORDER BY / GROUP BY
2) natural sort compare, could be used as an operator, like the "SOUNDS LIKE" operator, in other words, the function that return a canonical form of the string could be used to compare, something that could be rewrite "field NATURAL LIKE value" to "natual(field)=natural(value)"
While I understand the intention behind it and thus don't want to file a bug, I think there is a distinct possibility that some users will find the natural sort of float numbers rather unnatural.
Especially when the returned values are FLOAT, and the client aligns them as such, it is quite difficult to see the result as natural.
MariaDB [test]> createorreplacetable t (f float);
Query OK, 0 rows affected (0.055 sec)
MariaDB [test]> insertinto t (f) values (0.5), (0.33), (0.044);
Query OK, 3 rows affected (0.013 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [test]> select * from t orderby natural_sort_key(f);
+-------+
| f |
+-------+
| 0.5 |
| 0.33 |
| 0.044 |
+-------+
3 rowsinset (0.001 sec)
It is slightly better when the values are actually strings, at least they are formatted as such, and with some effort one can see the logic:
MariaDB [test]> createorreplacetable t (f varchar(16));
Query OK, 0 rows affected (0.053 sec)
MariaDB [test]> insertinto t (f) values (0.5), (0.33), (0.044);
Query OK, 3 rows affected (0.012 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [test]> select * from t orderby natural_sort_key(f);
+-------+
| f |
+-------+
| 0.5 |
| 0.33 |
| 0.044 |
+-------+
3 rowsinset (0.001 sec)
Same for negative numbers.
I wonder what existing best practices say about it, whether the natural sort, when applied to numbers, should consider the numeric order being natural.
Elena Stepanova
added a comment - While I understand the intention behind it and thus don't want to file a bug, I think there is a distinct possibility that some users will find the natural sort of float numbers rather unnatural.
Especially when the returned values are FLOAT, and the client aligns them as such, it is quite difficult to see the result as natural.
MariaDB [test]> create or replace table t (f float );
Query OK, 0 rows affected (0.055 sec)
MariaDB [test]> insert into t (f) values (0.5), (0.33), (0.044);
Query OK, 3 rows affected (0.013 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [test]> select * from t order by natural_sort_key(f);
+ -------+
| f |
+ -------+
| 0.5 |
| 0.33 |
| 0.044 |
+ -------+
3 rows in set (0.001 sec)
It is slightly better when the values are actually strings, at least they are formatted as such, and with some effort one can see the logic:
MariaDB [test]> create or replace table t (f varchar (16));
Query OK, 0 rows affected (0.053 sec)
MariaDB [test]> insert into t (f) values (0.5), (0.33), (0.044);
Query OK, 3 rows affected (0.012 sec)
Records: 3 Duplicates: 0 Warnings: 0
MariaDB [test]> select * from t order by natural_sort_key(f);
+ -------+
| f |
+ -------+
| 0.5 |
| 0.33 |
| 0.044 |
+ -------+
3 rows in set (0.001 sec)
Same for negative numbers.
I wonder what existing best practices say about it, whether the natural sort, when applied to numbers , should consider the numeric order being natural.
It is certainly possible to design a transformation scheme for fractions, and negatives, it is just not done currently. Some of that, sorting leading zeros in predictable way, and fractions( with either dot or comma as separators) were in some earlier versions of the patch, and were taken out, for the sake of simplicity.
We currently sort like ICU - positive numbers only, no fractions, ignore leading zeros, no other customizations. We sort slightly better than ICU in one aspect, because they have some in-built limitations about how long a numeric string can be, 256 digits or something like that, I don't recall exactly. We sort slightly worse than ICU in i18n department, because we only consider 0-9 to be digits, and they handle more decimal digits from different scripts.
But there is no best practice, as far as I understand, and no common definition of natural sort, except that "we intermix strings and numbers, and expect numbers to be sorted as numbers, not ASCII".
There are a couple of aspects when people want to sort strings as numbers
i18n decimal numbers : 0-9 vs decimal numbers in non-latin scripts
fractions separators- dot vs comma vs whatever is used in some arabic scripts. Sometimes both dot and comma are used, e.g in Switzerland.
fractions - whether to support scientific notation with exponent ?
thousands separators - 20'000'000 - is this the same as 20000000? Is 20 000 000 the same as 20000000,00? Those are tricky, as they are can be locale dependent, and alternating, with comma and dot, but do not have to, and people would just separate with space.
negatives
decimal numbers - should they sort as-is ( bigger than punctuation, less that letters), or less than anything else, including punctuations, or larger than anything else? This is related to script ordering in UCA algorithm.
Vladislav Vaintroub
added a comment - - edited It is certainly possible to design a transformation scheme for fractions, and negatives, it is just not done currently. Some of that, sorting leading zeros in predictable way, and fractions( with either dot or comma as separators) were in some earlier versions of the patch, and were taken out, for the sake of simplicity.
We currently sort like ICU - positive numbers only, no fractions, ignore leading zeros, no other customizations. We sort slightly better than ICU in one aspect, because they have some in-built limitations about how long a numeric string can be, 256 digits or something like that, I don't recall exactly. We sort slightly worse than ICU in i18n department, because we only consider 0-9 to be digits, and they handle more decimal digits from different scripts.
What you can be interested in , Python's natsort has some customizations to sort, and allegedly handles fractions, and negatives , when told to handle it, via parameter https://natsort.readthedocs.io/en/master/api.html#the-ns-enum
But there is no best practice, as far as I understand, and no common definition of natural sort, except that "we intermix strings and numbers, and expect numbers to be sorted as numbers, not ASCII".
There are a couple of aspects when people want to sort strings as numbers
i18n decimal numbers : 0-9 vs decimal numbers in non-latin scripts
fractions separators- dot vs comma vs whatever is used in some arabic scripts. Sometimes both dot and comma are used, e.g in Switzerland.
fractions - whether to support scientific notation with exponent ?
thousands separators - 20'000'000 - is this the same as 20000000? Is 20 000 000 the same as 20000000,00? Those are tricky, as they are can be locale dependent, and alternating, with comma and dot, but do not have to, and people would just separate with space.
negatives
decimal numbers - should they sort as-is ( bigger than punctuation, less that letters), or less than anything else, including punctuations, or larger than anything else? This is related to script ordering in UCA algorithm.
And of course, natural sort sorts strings. Anything can be converted to string, but it does not mean people should be trying to convert numbers to string, and to natural_sort_key, to sort numbers.
Vladislav Vaintroub
added a comment - And of course, natural sort sorts strings. Anything can be converted to string, but it does not mean people should be trying to convert numbers to string, and to natural_sort_key, to sort numbers.
As far as testing is concerned, it can be pushed into 10.7 (as of preview-10.7-MDEV-4742-natural-sort cfec49f1dd7241582f03cbcd395e65a9c2ecf4ce).
Documentation in the KB needs to be updated/enhanced, but it shouldn't be an obstacle to pushing into main.
As explained earlier in the comments, some decisions remain a subject to further consideration based on users' feedback.
Elena Stepanova
added a comment - - edited As far as testing is concerned, it can be pushed into 10.7 (as of preview-10.7- MDEV-4742 -natural-sort cfec49f1dd7241582f03cbcd395e65a9c2ecf4ce).
Documentation in the KB needs to be updated/enhanced, but it shouldn't be an obstacle to pushing into main.
As explained earlier in the comments, some decisions remain a subject to further consideration based on users' feedback.
It produces a key strings based on an input string, such that sorting by the key string results in a natural sort. They key strings can be stored for re-use, making the process efficient.
I think it might have more "features" than the one you describe above: It handles negatives (though this could be improved), leading zeros, leading +, version numbers or IP addresses, decimals, and use of commas as thousand separators, and also preserves total sort ordering (i.e. no two different strings nat-sort as equivalent).
Feel free to borrow it, if you like.
Dean Trower
added a comment - - edited I wrote a nat-sort function for MariaDB/MySQL a few years ago, documented here: https://stackoverflow.com/a/58154535/999120
It produces a key strings based on an input string, such that sorting by the key string results in a natural sort. They key strings can be stored for re-use, making the process efficient.
I think it might have more "features" than the one you describe above: It handles negatives (though this could be improved), leading zeros, leading +, version numbers or IP addresses, decimals, and use of commas as thousand separators, and also preserves total sort ordering (i.e. no two different strings nat-sort as equivalent).
Feel free to borrow it, if you like.
People
Vladislav Vaintroub
roberto spadim
Votes:
2Vote for this issue
Watchers:
13Start watching this issue
Dates
Created:
Updated:
Resolved:
Git Integration
Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.
{"report":{"fcp":1199.7999997138977,"ttfb":238.09999990463257,"pageVisibility":"visible","entityId":24500,"key":"jira.project.issue.view-issue","isInitial":true,"threshold":1000,"elementTimings":{},"userDeviceMemory":8,"userDeviceProcessors":64,"apdex":0.5,"journeyId":"d374441c-ac4a-43d8-90d1-f620269a4710","navigationType":0,"readyForUser":1419.7999997138977,"redirectCount":0,"resourceLoadedEnd":1487,"resourceLoadedStart":256.59999990463257,"resourceTiming":[{"duration":113.30000019073486,"initiatorType":"link","name":"https://jira.mariadb.org/s/2c21342762a6a02add1c328bed317ffd-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/css/_super/batch.css","startTime":256.59999990463257,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":256.59999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":369.90000009536743,"responseStart":0,"secureConnectionStart":0},{"duration":113.40000009536743,"initiatorType":"link","name":"https://jira.mariadb.org/s/7ebd35e77e471bc30ff0eba799ebc151-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/css/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true&whisper-enabled=true","startTime":256.7999997138977,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":256.7999997138977,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":370.19999980926514,"responseStart":0,"secureConnectionStart":0},{"duration":177,"initiatorType":"script","name":"https://jira.mariadb.org/s/0917945aaa57108d00c5076fea35e069-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/js/_super/batch.js?locale=en","startTime":257.09999990463257,"connectEnd":257.09999990463257,"connectStart":257.09999990463257,"domainLookupEnd":257.09999990463257,"domainLookupStart":257.09999990463257,"fetchStart":257.09999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":257.09999990463257,"responseEnd":434.09999990463257,"responseStart":434.09999990463257,"secureConnectionStart":257.09999990463257},{"duration":314.7000002861023,"initiatorType":"script","name":"https://jira.mariadb.org/s/2d8175ec2fa4c816e8023260bd8c1786-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/js/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true&whisper-enabled=true","startTime":257.19999980926514,"connectEnd":257.19999980926514,"connectStart":257.19999980926514,"domainLookupEnd":257.19999980926514,"domainLookupStart":257.19999980926514,"fetchStart":257.19999980926514,"redirectEnd":0,"redirectStart":0,"requestStart":257.19999980926514,"responseEnd":571.9000000953674,"responseStart":571.9000000953674,"secureConnectionStart":257.19999980926514},{"duration":320,"initiatorType":"script","name":"https://jira.mariadb.org/s/a9324d6758d385eb45c462685ad88f1d-CDN/lu2cib/820016/12ta74/c92c0caa9a024ae85b0ebdbed7fb4bd7/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=en","startTime":257.40000009536743,"connectEnd":257.40000009536743,"connectStart":257.40000009536743,"domainLookupEnd":257.40000009536743,"domainLookupStart":257.40000009536743,"fetchStart":257.40000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":257.40000009536743,"responseEnd":577.4000000953674,"responseStart":577.4000000953674,"secureConnectionStart":257.40000009536743},{"duration":320.69999980926514,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-en/jira.webresources:calendar-en.js","startTime":257.59999990463257,"connectEnd":257.59999990463257,"connectStart":257.59999990463257,"domainLookupEnd":257.59999990463257,"domainLookupStart":257.59999990463257,"fetchStart":257.59999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":257.59999990463257,"responseEnd":578.2999997138977,"responseStart":578.2999997138977,"secureConnectionStart":257.59999990463257},{"duration":320.90000009536743,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-localisation-moment/jira.webresources:calendar-localisation-moment.js","startTime":257.7999997138977,"connectEnd":257.7999997138977,"connectStart":257.7999997138977,"domainLookupEnd":257.7999997138977,"domainLookupStart":257.7999997138977,"fetchStart":257.7999997138977,"redirectEnd":0,"redirectStart":0,"requestStart":257.7999997138977,"responseEnd":578.6999998092651,"responseStart":578.6999998092651,"secureConnectionStart":257.7999997138977},{"duration":392,"initiatorType":"link","name":"https://jira.mariadb.org/s/b04b06a02d1959df322d9cded3aeecc1-CDN/lu2cib/820016/12ta74/a2ff6aa845ffc9a1d22fe23d9ee791fc/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css","startTime":258,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":258,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":650,"responseStart":0,"secureConnectionStart":0},{"duration":321.09999990463257,"initiatorType":"script","name":"https://jira.mariadb.org/rest/api/1.0/shortcuts/820016/47140b6e0a9bc2e4913da06536125810/shortcuts.js?context=issuenavigation&context=issueaction","startTime":258.09999990463257,"connectEnd":258.09999990463257,"connectStart":258.09999990463257,"domainLookupEnd":258.09999990463257,"domainLookupStart":258.09999990463257,"fetchStart":258.09999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":258.09999990463257,"responseEnd":579.1999998092651,"responseStart":579.1999998092651,"secureConnectionStart":258.09999990463257},{"duration":391.7999997138977,"initiatorType":"link","name":"https://jira.mariadb.org/s/3ac36323ba5e4eb0af2aa7ac7211b4bb-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/css/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.css?jira.create.linked.issue=true","startTime":258.40000009536743,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":258.40000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":650.1999998092651,"responseStart":0,"secureConnectionStart":0},{"duration":321.2999997138977,"initiatorType":"script","name":"https://jira.mariadb.org/s/5d5e8fe91fbc506585e83ea3b62ccc4b-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/js/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.js?jira.create.linked.issue=true&locale=en","startTime":258.5,"connectEnd":258.5,"connectStart":258.5,"domainLookupEnd":258.5,"domainLookupStart":258.5,"fetchStart":258.5,"redirectEnd":0,"redirectStart":0,"requestStart":258.5,"responseEnd":579.7999997138977,"responseStart":579.7999997138977,"secureConnectionStart":258.5},{"duration":1094.7000002861023,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-js/jira.webresources:bigpipe-js.js","startTime":267.2999997138977,"connectEnd":267.2999997138977,"connectStart":267.2999997138977,"domainLookupEnd":267.2999997138977,"domainLookupStart":267.2999997138977,"fetchStart":267.2999997138977,"redirectEnd":0,"redirectStart":0,"requestStart":267.2999997138977,"responseEnd":1362,"responseStart":1362,"secureConnectionStart":267.2999997138977},{"duration":1106.4000000953674,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-init/jira.webresources:bigpipe-init.js","startTime":267.2999997138977,"connectEnd":267.2999997138977,"connectStart":267.2999997138977,"domainLookupEnd":267.2999997138977,"domainLookupStart":267.2999997138977,"fetchStart":267.2999997138977,"redirectEnd":0,"redirectStart":0,"requestStart":267.2999997138977,"responseEnd":1373.6999998092651,"responseStart":1373.6999998092651,"secureConnectionStart":267.2999997138977},{"duration":169.40000009536743,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":682.5999999046326,"connectEnd":682.5999999046326,"connectStart":682.5999999046326,"domainLookupEnd":682.5999999046326,"domainLookupStart":682.5999999046326,"fetchStart":682.5999999046326,"redirectEnd":0,"redirectStart":0,"requestStart":682.5999999046326,"responseEnd":852,"responseStart":852,"secureConnectionStart":682.5999999046326},{"duration":372.09999990463257,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/e65b778d185daf5aee24936755b43da6/_/download/contextbatch/js/browser-metrics-plugin.contrib,-_super,-project.issue.navigator,-jira.view.issue,-atl.general/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true&whisper-enabled=true","startTime":1114.9000000953674,"connectEnd":1114.9000000953674,"connectStart":1114.9000000953674,"domainLookupEnd":1114.9000000953674,"domainLookupStart":1114.9000000953674,"fetchStart":1114.9000000953674,"redirectEnd":0,"redirectStart":0,"requestStart":1114.9000000953674,"responseEnd":1487,"responseStart":1487,"secureConnectionStart":1114.9000000953674},{"duration":596.9000000953674,"initiatorType":"script","name":"https://www.google-analytics.com/analytics.js","startTime":1192,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":1192,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1788.9000000953674,"responseStart":0,"secureConnectionStart":0}],"fetchStart":0,"domainLookupStart":0,"domainLookupEnd":0,"connectStart":0,"connectEnd":0,"requestStart":32,"responseStart":238,"responseEnd":260,"domLoading":255,"domInteractive":1675,"domContentLoadedEventStart":1675,"domContentLoadedEventEnd":1779,"domComplete":2287,"loadEventStart":2287,"loadEventEnd":2288,"userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","marks":[{"name":"bigPipe.sidebar-id.start","time":1560.5999999046326},{"name":"bigPipe.sidebar-id.end","time":1576.2999997138977},{"name":"bigPipe.activity-panel-pipe-id.start","time":1577.4000000953674},{"name":"bigPipe.activity-panel-pipe-id.end","time":1592.0999999046326},{"name":"activityTabFullyLoaded","time":1814.0999999046326}],"measures":[],"correlationId":"9ff8436c519ac8","effectiveType":"4g","downlink":9.4,"rtt":0,"serverDuration":123,"dbReadsTimeInMs":19,"dbConnsTimeInMs":28,"applicationHash":"9d11dbea5f4be3d4cc21f03a88dd11d8c8687422","experiments":[]}}
While I understand the intention behind it and thus don't want to file a bug, I think there is a distinct possibility that some users will find the natural sort of float numbers rather unnatural.
Especially when the returned values are FLOAT, and the client aligns them as such, it is quite difficult to see the result as natural.
Records: 3 Duplicates: 0 Warnings: 0
| f |
| 0.5 |
| 0.33 |
| 0.044 |
It is slightly better when the values are actually strings, at least they are formatted as such, and with some effort one can see the logic:
Records: 3 Duplicates: 0 Warnings: 0
| f |
| 0.5 |
| 0.33 |
| 0.044 |
Same for negative numbers.
I wonder what existing best practices say about it, whether the natural sort, when applied to numbers, should consider the numeric order being natural.