[MDEV-25678] SOUNDEX() Algorithm Returns Incorrect Results for Letters With The Same Soundex Code and Vowel Separators
Created: 2021-05-14  Updated: 2021-05-20  Resolved: 2021-05-20

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Venkata Sai Dhakshesh Kolli Assignee: Sergei Golubchik
Resolution: Not a Bug Votes: 0
Labels: soundex


 Description   

MariaDB's SOUNDEX() returns incorrect results for strings in which one consonant code repeats with vowels between the occurrences, and the first letter has that same code:

For example, the correct result for SOUNDEX("Popoff") is P110: Popoff -> P01011 -> P110 (adjacent identical codes are merged first, then the zeros produced by the vowels are dropped, and the result is padded to four characters). Instead, your system returns P000, the same result as SOUNDEX("P"), which is clearly ridiculous.
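The hand computation above can be sketched in Python. This is a minimal illustration of the enhanced (American) ordering, assuming the usual letter-to-digit table, with H and W treated as non-separators; the names `CODES` and `soundex_enhanced` are hypothetical and this is not MariaDB's implementation.

```python
# Hypothetical letter-to-digit table (standard American Soundex coding).
CODES = {ch: digit
         for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                "4": "L", "5": "MN", "6": "R"}.items()
         for ch in letters}


def soundex_enhanced(word: str) -> str:
    """Sketch of the enhanced ordering: merge duplicates first, drop vowels second."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    # Code every letter. Vowels and Y become "0" and act as separators;
    # H and W after the first letter are skipped, so they do not separate codes.
    digits = [CODES.get(first, "0")]
    digits += [CODES.get(c, "0") for c in letters[1:] if c not in "HW"]
    # Merge runs of identical digits; the first letter's own digit takes part.
    merged = [digits[0]]
    for d in digits[1:]:
        if d != merged[-1]:
            merged.append(d)
    # Drop the first letter's digit and all zeros, then pad/truncate to 4 chars.
    tail = "".join(d for d in merged[1:] if d != "0")
    return (first + tail + "000")[:4]


print(soundex_enhanced("Popoff"))  # P01011 -> P0101 -> P11 -> P110
```

Because the vowels are still present as zeros while duplicates are merged, the two 1s separated by an "o" survive, which is what yields P110 rather than P000.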

I have not seen any proper Soundex() implementation that behaves the way yours currently does.



 Comments   
Comment by Sergei Golubchik [ 2021-05-20 ]

This is not a bug.
https://mariadb.com/kb/en/soundex/

  • This function implements the original Soundex algorithm, not the more popular enhanced version (also described by D. Knuth). The difference is that the original version discards vowels first and duplicates second, whereas the enhanced version discards duplicates first and vowels second.

See also https://en.wikipedia.org/wiki/Soundex, which describes the difference between the two SOUNDEX variants.
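The ordering described in the quoted note (vowels discarded first, duplicates second) can be sketched the same way. This is a minimal illustration assuming the American letter-to-digit table, with vowels, H, W and Y all discarded before any duplicate check; `CODES` and `soundex_original` are hypothetical names, not MariaDB's source, and padding or truncating to four characters is an assumption of the sketch that need not match the server's output length.

```python
# Hypothetical letter-to-digit table (standard American Soundex coding).
CODES = {ch: digit
         for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                "4": "L", "5": "MN", "6": "R"}.items()
         for ch in letters}


def soundex_original(word: str) -> str:
    """Sketch of the original ordering: drop vowels first, merge duplicates second."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    # Step 1: discard every uncoded letter (vowels, H, W, Y) after the first.
    consonants = [c for c in letters[1:] if c in CODES]
    # Step 2: merge identical neighbouring codes. Vowels can no longer separate
    # them, and a run continuing the first letter's code is absorbed by it.
    prev = CODES.get(first)
    tail = []
    for c in consonants:
        d = CODES[c]
        if d != prev:
            tail.append(d)
        prev = d
    return (first + "".join(tail) + "000")[:4]


print(soundex_original("Popoff"))   # P000
print(soundex_original("Tymczak"))  # T520
```

Under these assumptions "Popoff" reduces to P000 because, once the vowels are gone, P, P, F, F all share code 1 and collapse into the initial letter. "Tymczak" likewise gives T520, whereas the enhanced ordering gives T522.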

Generated at Thu Feb 08 09:39:31 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.