Details
-
Task
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
Description
Hi guys, could be nice a natural sorting inside mariadb
today some guys use SQL procedures/functions to execute this cpu intensive task
php released a nice function to do this job, maybe we could port part of it and implement in udf function, and release as a default udf function?
functions:
1) natural sort, used in ORDER BY / GROUP BY
2) natural sort compare, could be used as an operator, like the "SOUNDS LIKE" operator, in other words, the function that return a canonical form of the string could be used to compare, something that could be rewrite "field NATURAL LIKE value" to "natual(field)=natural(value)"
i didn't found (2) in internet, but it's used in php, and the (1) should use it in some place to order/group correctly
--------------
example of (2):
http://www.php.net/manual/en/function.strnatcmp.php
https://github.com/php/php-src/blob/master/ext/standard/strnatcmp.c
--------------
example of (1)
Here a example in STORED PROCEDURE/FUNCTIONS for ORDER BY, but not exaust tested:
source: http://stackoverflow.com/questions/153633/natural-sort-in-mysql
DROP FUNCTION IF EXISTS `udf_FirstNumberPos`;
|
DELIMITER ;;
|
CREATE FUNCTION `udf_FirstNumberPos` (`instring` varchar(4000))
|
RETURNS int
|
LANGUAGE SQL
|
DETERMINISTIC
|
NO SQL
|
SQL SECURITY INVOKER
|
BEGIN
|
DECLARE position int;
|
DECLARE tmp_position int;
|
SET position = 5000;
|
SET tmp_position = LOCATE('0', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
|
SET tmp_position = LOCATE('1', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
|
SET tmp_position = LOCATE('2', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
|
SET tmp_position = LOCATE('3', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
|
SET tmp_position = LOCATE('4', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
|
SET tmp_position = LOCATE('5', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
|
SET tmp_position = LOCATE('6', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
|
SET tmp_position = LOCATE('7', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
|
SET tmp_position = LOCATE('8', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
|
SET tmp_position = LOCATE('9', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
|
|
IF (position = 5000) THEN RETURN 0; END IF;
|
RETURN position;
|
END
|
;;
|
|
DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
|
DELIMITER ;;
|
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50))
|
RETURNS varchar(4000)
|
LANGUAGE SQL
|
DETERMINISTIC
|
NO SQL
|
SQL SECURITY INVOKER
|
BEGIN
|
DECLARE sortString varchar(4000);
|
DECLARE numStartIndex int;
|
DECLARE numEndIndex int;
|
DECLARE padLength int;
|
DECLARE totalPadLength int;
|
DECLARE i int;
|
DECLARE sameOrderCharsLen int;
|
|
SET totalPadLength = 0;
|
SET instring = TRIM(instring);
|
SET sortString = instring;
|
SET numStartIndex = udf_FirstNumberPos(instring);
|
SET numEndIndex = 0;
|
SET i = 1;
|
SET sameOrderCharsLen = LENGTH(sameOrderChars);
|
|
WHILE (i <= sameOrderCharsLen) DO
|
SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
|
SET i = i + 1;
|
END WHILE;
|
|
WHILE (numStartIndex <> 0) DO
|
SET numStartIndex = numStartIndex + numEndIndex;
|
SET numEndIndex = numStartIndex;
|
|
WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
|
SET numEndIndex = numEndIndex + 1;
|
END WHILE;
|
|
SET numEndIndex = numEndIndex - 1;
|
|
SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);
|
|
IF padLength < 0 THEN
|
SET padLength = 0;
|
END IF;
|
|
SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));
|
|
SET totalPadLength = totalPadLength + padLength;
|
SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, LENGTH(instring) - numEndIndex));
|
END WHILE;
|
|
RETURN sortString;
|
END
|
;;
|
Attachments
Issue Links
- causes
-
MDEV-26786 Inserting NULL into base column breaks NATURAL_SORT_KEY column
-
- Closed
-
-
MDEV-26787 Document that NATURAL_SORT_KEY virtual column should be longer than the base column (and other notes)
-
- Closed
-
-
MDEV-26796 Natural sort does not work for utf32/utf16/ucs2
-
- Closed
-
-
MDEV-26806 Server crash in Charset::charset / Item_func_natural_sort_key::val_str
-
- Closed
-
- relates to
-
MDEV-23400 Add UCA case sensitive accent sensitive collations for Unicode character sets
-
- Closed
-
- links to
While I understand the intention behind it and thus don't want to file a bug, I think there is a distinct possibility that some users will find the natural sort of float numbers rather unnatural.
Especially when the returned values are FLOAT, and the client aligns them as such, it is quite difficult to see the result as natural.
Records: 3 Duplicates: 0 Warnings: 0
| f |
| 0.5 |
| 0.33 |
| 0.044 |
It is slightly better when the values are actually strings, at least they are formatted as such, and with some effort one can see the logic:
Records: 3 Duplicates: 0 Warnings: 0
| f |
| 0.5 |
| 0.33 |
| 0.044 |
Same for negative numbers.
I wonder what existing best practices say about it, whether the natural sort, when applied to numbers, should consider the numeric order being natural.