
Actian Community Wiki


Soundex dm

From Ingres Community Wiki



References

Design Document is available here.

Unit test is available ...

SIR Reference: 122320

Description

An implementation of the Daitch-Mokotoff Soundex algorithm (see http://www.avotaynu.com/soundex.html)

Advantages over soundex()

The standard soundex() function implements the Russell Soundex (patented 1917), which has two problems:

  • It returns a four-character string.

This leads to false positives when matching long words that share a root. For example, the words 'Nichols' and 'Nicholson' both return the soundex code N242. Each Daitch-Mokotoff code is six digits long, so more of a long word is encoded and cases such as the example above are easily separated:

soundex_dm('Nichols') == 658400,648400
soundex_dm('Nicholson') == 658460,648460

The strings have no matching elements.

  • It returns only a single code for each sound pattern.

The Daitch-Mokotoff algorithm recognises that a letter combination in a word may have several possible sounds, so it may return more than one code, as in the example above.
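To make the first problem concrete, here is a minimal Python sketch of the common American (Russell) Soundex rules. It keeps the first letter and maps the remaining consonants to digits. It drops vowels and repeated digits, then pads or truncates the result to four characters. This is an illustration only, not the Ingres soundex() source. The H/W handling follows the usual American Soundex convention and is an assumption about what soundex() does.

```python
# Illustrative American (Russell) Soundex; not the Ingres soundex() source.
_GROUPS = {"BFPV": "1", "CGJKQSXZ": "2", "DT": "3", "L": "4", "MN": "5", "R": "6"}
_CODES = {letter: digit for letters, digit in _GROUPS.items() for letter in letters}


def soundex(name: str) -> str:
    """Return the first letter followed by three digits."""
    word = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not word:
        return ""
    result = word[0]
    previous = _CODES.get(word[0], "")
    for ch in word[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != previous:
            result += digit  # code a consonant unless it repeats the previous digit
        if ch not in "HW":  # H and W do not separate letters with the same digit
            previous = digit  # vowels reset previous to "" so a repeat is coded again
    return (result + "000")[:4]  # pad or truncate to four characters


print(soundex("Nichols"), soundex("Nicholson"))  # N242 N242
```

Only four characters are kept, so the final N of 'Nicholson' would have to be a fifth character (digit 5). That character is truncated away, and both names collapse to N242.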

Return

It returns one or more six-digit codes in a comma-separated list. Each code may have leading zeroes. The list is not sorted, but each code is unique.

The function returns at most 16 such codes, in a varchar string.

For example:

Word        soundex_dm(Word)
Moskowitz   645740
Peterson    739460,734600
Jackson     154600,454600,145460,445460
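The codes in the example above can be reproduced with a compact Python sketch of the Daitch-Mokotoff rules. This is not the Ingres implementation. The rule table is transcribed from the published Daitch-Mokotoff chart and covers ASCII letters only. The vowel set, the duplicate-collapsing rule and the letters-only cleanup (in place of the word rules below) are assumptions. At each position the sketch takes the longest matching letter pattern. It then chooses the code for the start of the word, for a position before a vowel, or for anywhere else. When a pattern has alternative sounds, every partial code is forked.

```python
# Hypothetical sketch of Daitch-Mokotoff Soundex; not the Ingres source.
# Row format: "PATTERNS:start,before_vowel,other". "|" separates alternative
# codes; an empty field means the pattern is not coded in that position.
_TABLE = """
AI AJ AY:0,1,
AU:0,7,
A:0,,
B FB F P PF PH V W:7,7,7
CHS:5,54,54
CH:5|4,5|4,5|4
CK:5|45,5|45,5|45
CZ CS CSZ CZS:4,4,4
C:5|4,5|4,5|4
DRZ DRS DS DSH DSZ DZ DZH DZS:4,4,4
D DT TH T:3,3,3
EI EJ EY:0,1,
EU:1,1,
E:0,,
G KH K Q:5,5,5
H:5,5,
IA IE IO IU:1,,
I:0,,
J:1|4,1|4,1|4
KS X:5,54,54
L:8,8,8
MN NM:66,66,66
M N:6,6,6
OI OJ OY:0,1,
O:0,,
R:9,9,9
RZ RS:94|4,94|4,94|4
SCHTSCH SCHTSH SCHTCH SHTCH SHCH SHTSH STCH STSCH SC STRZ STRS STSH SZCZ SZCS:2,4,4
SHT SCHT SCHD ST SZT SHD SZD SD:2,43,43
SCH SH SZ S:4,4,4
TCH TTCH TTSCH TRZ TRS TSCH TSH TS TTS TTSZ TC TZ TTZ TZS TSZ:4,4,4
UI UJ UY:0,1,
U UE:0,,
Y:1,,
ZDZ ZDZH ZHDZH:2,4,4
ZD ZHD:2,43,43
ZH ZS ZSCH ZSH Z:4,4,4
"""

VOWELS = "AEIOU"  # assumption: letters that count as "a vowel"
MAX_CODES = 16    # mirrors the documented limit of 16 codes


def _load_rules(table):
    """Map every pattern to its (start, before_vowel, other) code lists."""
    rules = {}
    for line in table.strip().splitlines():
        patterns, codes = line.split(":")
        columns = tuple(field.split("|") for field in codes.split(","))
        for pattern in patterns.split():
            rules[pattern] = columns
    return rules


RULES = _load_rules(_TABLE)
LONGEST = max(len(pattern) for pattern in RULES)


def soundex_dm(name):
    """Return the distinct six-digit Daitch-Mokotoff codes for a name."""
    # Simplification: keep ASCII letters only instead of the documented word rules.
    word = "".join(ch for ch in name.upper() if "A" <= ch <= "Z")
    branches = [("", "")]  # (digits so far, last code emitted on that branch)
    i = 0
    while i < len(word):
        # Longest table pattern starting at i (every letter A-Z has a row).
        size = next(n for n in range(min(LONGEST, len(word) - i), 0, -1)
                    if word[i:i + n] in RULES)
        start, before_vowel, other = RULES[word[i:i + size]]
        following = word[i + size:i + size + 1]
        if i == 0:
            options = start
        elif following and following in VOWELS:
            options = before_vowel
        else:
            options = other
        # Each alternative code forks every branch; a code equal to the
        # branch's previous code is not repeated.
        branches = [
            (digits if code and code == last else digits + code, code)
            for code in options
            for digits, last in branches
        ]
        i += size
    results = []
    for digits, _ in branches:
        code = (digits + "000000")[:6]  # pad with zeroes, keep six digits
        if code not in results:
            results.append(code)
    return results[:MAX_CODES]
```

For example, `",".join(soundex_dm("Peterson"))` gives `739460,734600`, the same string shown for Peterson above. Each branch remembers its own last code, so repeated sounds are collapsed independently along each alternative.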

Rules on words

  • The words are converted to uppercase for the soundex generation.
  • Leading and embedded whitespace is ignored. This allows for multiple word surnames such as 'De Souza', which would be treated as 'desouza'.
  • Apart from whitespace and the hyphen '-', apostrophe ''' and period '.' characters, the word(s) are terminated by the first non-alphabetic character encountered.

These exceptions allow for common punctuation encountered in many names and place names.

For example:

smyth-brown would be treated as smythbrown.
O'brien would be treated as obrien.
St.Kilda would be treated as stkilda.

Note that only the first occurrence of any of these characters is ignored; a subsequent occurrence terminates the word.
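The word rules above can be sketched as a small Python pre-processing step. The name `normalize_name` is hypothetical and this is not the Ingres code. The sketch also assumes one reading of the note above: a single hyphen, apostrophe or period in total is skipped, and the next one ends the word like any other non-letter.

```python
PUNCTUATION = "-'."  # hyphen, apostrophe and period


def normalize_name(text: str) -> str:
    """Reduce a name to the uppercase letters that would be coded (a sketch).

    Assumption: one punctuation mark in total is skipped; a second one
    terminates the word, like any other non-alphabetic character.
    """
    letters = []
    skipped_punctuation = False
    for ch in text:
        if ch.isspace():
            continue  # leading and embedded whitespace is ignored
        if ch in PUNCTUATION and not skipped_punctuation:
            skipped_punctuation = True  # the first hyphen/apostrophe/period is dropped
            continue
        if not (ch.isascii() and ch.isalpha()):
            break  # any other non-letter, or a second punctuation mark, ends the word
        letters.append(ch.upper())
    return "".join(letters)


print(normalize_name("De Souza"))  # DESOUZA
```

Under this reading, "Smith-Jones-Brown" would be reduced to SMITHJONES. If Ingres instead allows one occurrence of each punctuation character, the flag would become a set of characters already seen.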

Ingres Enhancement Number

  • Trak # 385
  • SIR number 122320

Notes

Examples

Sample test is available here.

Documentation

Daitch-Mokotoff Soundex Function

The SOUNDEX_DM function uses the Daitch-Mokotoff phonetic algorithm to find similar-sounding strings.

The Daitch-Mokotoff soundex function can return multiple soundex codes for a name. It returns one or more six-character codes in a comma-separated varchar string, up to a maximum of 16 codes. For example:

Name        Codes returned
Moskowitz   645740
Peterson    739460,734600
Jackson     154600,454600,145460,445460
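Each code may have leading zeroes, so a client that post-processes the returned varchar should keep the codes as strings rather than converting them to integers. The Python sketch below splits a result into its codes. It treats two names as matching when their lists share at least one code, which is the sense in which 'Nichols' and 'Nicholson' have no matching elements. The helper names `parse_codes` and `codes_match` and this matching rule are illustrative assumptions, not part of the Ingres API.

```python
def parse_codes(result: str) -> list[str]:
    """Split a SOUNDEX_DM result such as '739460,734600' into its codes.

    Codes stay strings so that leading zeroes are preserved.
    """
    return [code.strip() for code in result.split(",") if code.strip()]


def codes_match(a: str, b: str) -> bool:
    """Hypothetical rule: two names match if their code lists share an element."""
    return bool(set(parse_codes(a)) & set(parse_codes(b)))


print(codes_match("658400,648400", "658460,648460"))  # False (Nichols vs Nicholson)
```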

The function syntax is:

SOUNDEX_DM('string')
© 2011 Actian Corporation. All Rights Reserved