Soundex (Rexx)

From LiteratePrograms
Jump to: navigation, search
Other implementations: Rexx | Sed

This is an implementation of the SOUNDEX phonetic-matching algorithm. The goal of the algorithm is to encode names with the same pronunciation to the same value, allowing them to be compared without regard to spelling variations. It works best for English surnames — there are other algorithms (e.g., NYSIIS) and variations on SOUNDEX that work better for other names.

While true to the definition of SOUNDEX, this version is not the exact algorithm published so many years ago in Knuth or patented by Russell and Odell. That algorithm was heavily loop-based and didn't perform well in an interpreted language. This version produces the same results much more quickly, relying on some REXX-isms like Space() and Translate().

[edit] Algorithm

1. We begin by accepting the first letter of the name as it is. This is the only letter in the input that will appear as a letter in the output.

<<Take first letter>>=
Result = Left(Source, 1)

2. Next, we convert each character in the name into a numeral that groups similar-sounding letters together. The groupings are as follows. Note that the original Soundex algorithm doesn't use "0" or "7". We use "0" to mark letters that we will eventually delete but must retain for the moment as separators. Anything which must be deleted but is not a separator is marked with a "7" then immediately removed. We do this using the REXX Translate(Input, ToTable, FromTable, PadChar) function, which will replace all characters in Input that are in FromTable with the corresponding characters in ToTable, and will replace any characters not in FromTable with PadChar.

Numeral Letters
0 A, E, I, O, U, Y
1 B, F, P, V
2 C, G, J, K, Q, S, X, Z
3 D, T
4 L
5 M, N
6 R
7 H, W
<<Convert letters to numerals>>=
SoundexNum = "01230127022455012623017202"

Source = Translate(Source, SoundexNum, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "7")

Now take out the sevens.

<<Remove all 7s>>=
Source = Space(Translate(Source, " ", "7"), 0)

3. We then replace sequences of the same numeral with a single instance followed by the requisite number of zeroes.

<<Make multiple digits single>>=
Do I = 1 to Length(Source)
   Do J = I+1 to Length(Source) While Substr(Source, I, 1) = Substr(Source, J, 1)
      Source = Left(Source, J-1) || "0" || Substr(Source, J+1)

4. Almost done, we now drop the first numeral (i.e., the first letter which we handled at the start) and remove any zeroes in the rest. The Translate(Source, " ", "0") function will return Source with any zeroes replaced with spaces, and the Space(..., 0) function will delete any spaces in that result.

<<Remove zeroes after first position>>=
Result = Result || Space(Translate(Substr(Source, 2), " ", "0"), 0)

5. Lastly, we pad the result on the right to six characters in length using zeroes. Unlike previous zeroes, these will survive into the result.

<<Fill on right with zeroes>>=
Result = Left(Result, 6, "0")

[edit] As a function

With a little code to pull this all together into a nice function, we're done:

<<Soundex Function>>=
/* ------------------------------------------------------------------------ */
/* code = Soundex(name)                                                     */
/*                                                                          */
/* Compute and return the SOUNDEX code corresponding to the specified name. */
/* ------------------------------------------------------------------------ */
Soundex: Procedure
   Source = Arg(1)

   Take first letter
   Convert letters to numerals
   Remove all 7s
   Make multiple digits single
   Remove zeroes after first position
   Fill on right with zeroes

Return Result

[edit] Main program for testing

The following main program will repeatedly prompt the user to supply a name and compute and display its SOUNDEX equivalent until cancelled by the user.

Do Forever
   Call LineOut , "Enter the name: "
   Name = LineIn()
   If Name = "" then Leave
   Say "SOUNDEX value = " || Soundex(Name)
Exit 0

Soundex Function
Download code