Soundex dm
From Ingres Community Wiki
References
Design Document is available here.
Unit test is available ...
SIR Reference: 122320
Description
An implementation of the Daitch-Mokotoff Soundex algorithm (see http://www.avotaynu.com/soundex.html)
Advantages over soundex()
The standard soundex() function is an implementation of the Russell Soundex (patented in 1918). It has two problems:
- It returns a 4-character string.
This leads to false positives when matching long words that share the same root. For example, the words 'Nichols' and 'Nicholson' both return a soundex code of N242. The Daitch-Mokotoff soundex returns codes of 6 digits, which covers more of each word and so separates cases such as the one above.
- soundex_dm('Nichols') == 658400,648400
- soundex_dm('Nicholson') == 658460,648460
The strings have no matching elements.
- It returns only a single code for each sound pattern.
The Daitch-Mokotoff algorithm recognises that a letter combination in a word may have several possible sounds. Hence it may return more than one code, as seen in the above example.
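The collision described above can be reproduced with a minimal sketch of the classic Russell Soundex. This is an illustrative Python model, not the Ingres soundex() source; the function name `russell_soundex` is hypothetical, and edge-case handling in Ingres may differ.

```python
def russell_soundex(word: str) -> str:
    """Illustrative Russell Soundex: first letter plus three digits."""
    # Letter groups and their digits; vowels, H, W and Y have no code.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code is kept

    # Pad with zeroes, then truncate to 4 characters.
    return (result + "000")[:4]


print(russell_soundex("Nichols"))    # N242
print(russell_soundex("Nicholson"))  # N242
```

The truncation to four characters is what discards the trailing "-on" of 'Nicholson', so both names collapse to the same code.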
Return
It returns one or more 6-digit codes in a comma-separated list. Each code may have leading zeroes. The list is not sorted, but each code in it is unique.
The function may return up to 16 such codes in a varchar string.
For example:
| Word | soundex_dm(Word) |
|---|---|
| Moskowitz | 645740 |
| Peterson | 739460,734600 |
| Jackson | 154600,454600,145460,445460 |
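Because the result is an unsorted, comma-separated list, comparing two results as whole strings is not reliable. A minimal Python sketch of comparing them element by element follows; the helper names `dm_codes` and `dm_match` are hypothetical, and the rule that two names match when they share at least one code is an assumption based on the 'Nichols' and 'Nicholson' example.

```python
def dm_codes(value: str) -> set:
    """Split a soundex_dm result string into its set of 6-digit codes."""
    return {code for code in value.split(",") if code}


def dm_match(a: str, b: str) -> bool:
    """Assumed matching rule: true when the two code lists share a code."""
    return not dm_codes(a).isdisjoint(dm_codes(b))


# Values taken from the examples in this page.
print(dm_match("658400,648400", "658460,648460"))  # False
print(dm_match("739460,734600", "734600,739460"))  # True, order is irrelevant
```

Using sets makes the comparison independent of the order in which the codes are returned.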
Rules on words
- The words are converted to uppercase for the soundex generation.
- Leading and embedded whitespace is ignored. This allows for multiple word surnames such as 'De Souza', which would be treated as 'desouza'.
- With the exception of the hyphen '-', apostrophe ''' and period '.' characters, the word(s) are terminated by the first non-alphabetic character encountered.
These exceptions allow for common punctuation encountered in many names and place names.
For example:
- smyth-brown would be treated as smythbrown.
- O'brien would be treated as obrien
- St.Kilda would be treated as Stkilda
Note that the first occurrence of any of these characters is ignored, but a subsequent occurrence would terminate the word.
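The rules above can be modelled with a short Python sketch of the input clean-up that happens before code generation. It is not the Ingres implementation. The function name `normalise_for_soundex_dm` is hypothetical, and two points are assumptions: only the letters A to Z count as alphabetic, and the "first occurrence" allowance is tracked separately for each of the three punctuation characters.

```python
def normalise_for_soundex_dm(text: str) -> str:
    """Model of the word rules: uppercase, skip whitespace, stop at junk."""
    allowed_once = {"-", "'", "."}
    seen = set()   # punctuation characters already skipped once (assumption)
    out = []
    for ch in text.upper():
        if ch.isspace():
            continue                # whitespace is ignored
        if "A" <= ch <= "Z":
            out.append(ch)          # assumed: only A-Z are letters
        elif ch in allowed_once and ch not in seen:
            seen.add(ch)            # first occurrence is ignored
        else:
            break                   # anything else terminates the word
    return "".join(out)


for name in ("De Souza", "smyth-brown", "O'brien", "St.Kilda", "a-b-c"):
    print(name, "->", normalise_for_soundex_dm(name))
```

With these assumptions the page's own examples come out as DESOUZA, SMYTHBROWN, OBRIEN and STKILDA, while 'a-b-c' stops at the second hyphen and yields AB.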
Ingres Enhancement Number
- Trak # 385
- SIR number 122320
Notes
Examples
Sample test is available here.
Documentation
Daitch-Mokotoff Soundex Function
The SOUNDEX_DM function uses the Daitch-Mokotoff phonetic algorithm for finding similar sounding strings.
The Daitch-Mokotoff soundex function can return multiple soundex codes for a name. It returns one or more six-character codes in a comma-separated varchar string, up to a maximum of 16 codes. For example:
| Name | Codes Returned |
|---|---|
| Moskowitz | 645740 |
| Peterson | 739460,734600 |
| Jackson | 154600,454600,145460,445460 |
The function syntax is:
- SOUNDEX_DM('string')

