New York State Identification and Intelligence System (Rexx)

From LiteratePrograms

This is an implementation of the NYSIIS phonetic-matching algorithm. The goal of the algorithm is to encode names with the same pronunciation to the same value, allowing them to be compared without regard to spelling variations. Unlike SOUNDEX, it works well for a variety of European and Hispanic surnames.

The algorithm is taken from Robert L. Taft, "Name Search Techniques", New York State Identification and Intelligence System. It yields an alphabetic key that is padded or truncated to 10 characters.

Algorithm

1. Inspect the first characters of the name and replace certain sequences. Note that the K to C replacement is not performed if the KN to NN replacement is.

From      To
MAC       MCC
KN        NN
K         C
PH, PF    FF
SCH       SSS
<<Translate first characters of name>>=
Select
   When Left(Source, 3) = "MAC" then Source = "MCC" || Substr(Source, 4)
   When Left(Source, 3) = "SCH" then Source = "SSS" || Substr(Source, 4)
   When Left(Source, 2) = "KN" then  Source = "NN"  || Substr(Source, 3)
   When Left(Source, 2) = "PH" | ,
        Left(Source, 2) = "PF" then  Source = "FF"  || Substr(Source, 3)
   When Left(Source, 1) = "K" then   Source = "C"   || Substr(Source, 2)
   Otherwise Nop
End
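
As a quick check of the table above, the following standalone sketch wraps the same Select in a helper and prints the result for a few sample names. TranslatePrefix and the sample names are assumptions made for illustration; they are not part of the original program, and the names are given in capitals because the comparisons are made against capital letters.

```rexx
/* Sketch only: TranslatePrefix is a hypothetical helper, not part of NYSIIS() itself. */
Names = "MACDONALD KNIGHT KELLY PHILLIPS PFISTER SCHMIDT SMITH"
Do I = 1 to Words(Names)
   Name = Word(Names, I)
   Say Name "->" TranslatePrefix(Name)
End
Exit 0

TranslatePrefix: Procedure
   Source = Arg(1)
   Select
      /* Three-letter prefixes are tested first, then KN before the plain K rule */
      When Left(Source, 3) = "MAC" then Source = "MCC" || Substr(Source, 4)
      When Left(Source, 3) = "SCH" then Source = "SSS" || Substr(Source, 4)
      When Left(Source, 2) = "KN" then  Source = "NN"  || Substr(Source, 3)
      When Left(Source, 2) = "PH" | ,
           Left(Source, 2) = "PF" then  Source = "FF"  || Substr(Source, 3)
      When Left(Source, 1) = "K" then   Source = "C"   || Substr(Source, 2)
      Otherwise Nop
   End
Return Source
```

KNIGHT becomes NNIGHT rather than CNIGHT because the KN branch is reached before the K branch, and SMITH passes through unchanged.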

2. Replace certain sequences at the end of the name.

From                  To
EE                    Y
IE                    Y
DT, RT, RD, NT, ND    D
<<Translate last characters of name>>=
Ending = Right(Source, 2)
If Ending = "EE" | Ending = "IE" then ,
   Source = Left(Source, Length(Source)-2) || "Y"
If Ending = "DT" | Ending = "RT" | Ending = "RD" | Ending = "NT" | ,
      Ending = "ND" then ,
   Source = Left(Source, Length(Source)-2) || "D"
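
The ending rules can be tried in isolation in the same way. In this sketch, TranslateSuffix is a hypothetical helper introduced only for illustration, and the sample names are arbitrary capitalised inputs.

```rexx
/* Sketch only: TranslateSuffix is a hypothetical helper, not part of NYSIIS() itself. */
Names = "LEE BOWIE HART LEONARD GRANT HOLLAND SMITH"
Do I = 1 to Words(Names)
   Name = Word(Names, I)
   Say Name "->" TranslateSuffix(Name)
End
Exit 0

TranslateSuffix: Procedure
   Source = Arg(1)
   /* The ending is captured once, so at most one of the two rules can fire */
   Ending = Right(Source, 2)
   If Ending = "EE" | Ending = "IE" then ,
      Source = Left(Source, Length(Source)-2) || "Y"
   If Ending = "DT" | Ending = "RT" | Ending = "RD" | Ending = "NT" | ,
         Ending = "ND" then ,
      Source = Left(Source, Length(Source)-2) || "D"
Return Source
```

With these inputs LEE becomes LY, BOWIE becomes BOWY, HART becomes HAD, and SMITH is left as it is.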

3. Begin accumulating the result by accepting the first letter of the name as it is.

<<Accept first letter>>=
Result = Left(Source, 1)

4. Translate the remaining letters according to a set of rules, processing one letter at a time:

<<Translate remaining characters by rules>>=
Do Cursor = 2 to Length(Source)
   Apply rules to one letter group
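   /* Char may hold several letters; only the first is appended here, the */
   /* others are read back from Source on later passes of the loop        */
   If Left(Char, 1) <> Right(Result, 1) then ,
      Result = Result || Left(Char, 1)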
End
<<Define constants>>=
Vowels = "AEIOU"
<<Apply rules to one letter group>>=
Chars = Substr(Source, Cursor, 3)
Char = Left(Chars, 1)
Select

a. Replace EV with AF and replace all other A, E, I, O, U with A.

<<Apply rules to one letter group>>=
   When Left(Chars, 2) = "EV" then  Char = "AF"
   When Pos(Char, Vowels) > 0 then  Char = "A"

b. Replace Q with G, Z with S, and M with N.

<<Apply rules to one letter group>>=
   When Char = "Q" then             Char = "G"
   When Char = "Z" then             Char = "S"
   When Char = "M" then             Char = "N"

c. Replace KN with N and replace all other K with C.

<<Apply rules to one letter group>>=
   When Left(Chars, 2) = "KN" then  Char = "N"
   When Char = "K" then             Char = "C"

d. Replace SCH with SSS and PH with FF.

<<Apply rules to one letter group>>=
   When Left(Chars, 3) = "SCH" then Char = "SSS"
   When Left(Chars, 2) = "PH" then  Char = "FF"

e. Replace H with the previous letter if the previous or next letter is not a vowel.

<<Apply rules to one letter group>>=
   When Char = "H" then Do
      /* Pos, not Find: Find looks for whole words, not single characters */
      Previous = Substr(Source, Cursor-1, 1)
      If Pos(Previous, Vowels) = 0 | ,
         Pos(Substr(Chars, 2, 1), Vowels) = 0 then ,
         Char = Previous
   End

f. Replace W with the previous letter if that letter is a vowel.

<<Apply rules to one letter group>>=
   When Char = "W" then ,
      If Pos(Substr(Source, Cursor-1, 1), Vowels) > 0 then ,
         Char = Substr(Source, Cursor-1, 1)

g. Append the resulting letter to the key if it differs from the last letter.

<<Apply rules to one letter group>>=
   Otherwise Nop
End /* Select */
/* Overwrite as many letters as Char holds, so that Source keeps its length */
Source = Left(Source, Cursor-1) || Char || Substr(Source, Cursor+Length(Char))
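
Rule g is what collapses runs of identical letters in the key. The following standalone sketch isolates just that behaviour. CollapseRuns is a hypothetical helper introduced for illustration, and the inputs are made-up strings standing in for already-translated letters.

```rexx
/* Sketch only: CollapseRuns is a hypothetical helper for illustration. */
Say CollapseRuns("NNAGT")
Say CollapseRuns("FFALAP")
Say CollapseRuns("BALLAAN")
Exit 0

CollapseRuns: Procedure
   Text = Arg(1)
   /* The first letter is always kept */
   Key = Left(Text, 1)
   Do Cursor = 2 to Length(Text)
      Char = Substr(Text, Cursor, 1)
      /* Append only when the letter differs from the last one in the key */
      If Char \== Right(Key, 1) then ,
         Key = Key || Char
   End
Return Key
```

The three calls yield NAGT, FALAP and BALAN. Only adjacent repeats are dropped, so a letter may still occur several times in a key.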

5. If the last letter is S, remove it.

<<Remove a trailing S>>=
If Right(Result, 1) = "S" then ,
   Result = Left(Result, Length(Result)-1)

6. If the last letters are AY, replace them with Y.

<<Replace trailing AY with Y>>=
If Right(Result, 2) = "AY" then ,
   Result = Left(Result, Length(Result)-2) || "Y"

7. If the last letter is A, remove it.

<<Remove trailing A>>=
If Right(Result, 1) = "A" then ,
   Result = Left(Result, Length(Result)-1)

8. Lastly, truncate the result to 10 characters in length.

<<Truncate to 10 characters>>=
Result = Left(Result, 10)
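
One detail worth knowing about this step: the Rexx Left function pads with blanks as well as truncating, so a key shorter than 10 characters comes back blank-filled. The short sketch below shows both cases with made-up values. Using Strip to drop the padding is a suggestion for callers who want the bare key, not something the original program does.

```rexx
/* Left() both truncates and pads, so the result is always 10 characters long */
Long  = Left("ABCDEFGHIJKLMNOP", 10)
Short = Left("SNAT", 10)
Say "[" || Long  || "]"          /* [ABCDEFGHIJ] */
Say "[" || Short || "]"          /* [SNAT      ] */
Say "[" || Strip(Short) || "]"   /* [SNAT]       */
```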

As a function

With a little code to pull this all together into a nice function, we're done:

<<NYSIIS Function>>=
/* ---------------------------------------------------------------------- */
/* code = NYSIIS(name)                                                    */
/*                                                                        */
/* Compute and return the NYSIIS code for the specified name.             */
/* ---------------------------------------------------------------------- */
NYSIIS: Procedure
   /* The rules compare against capital letters, so upper-case the name first */
   Source = Translate(Strip(Arg(1)))

   Define constants
   Translate first characters of name
   Translate last characters of name
   Accept first letter
   Translate remaining characters by rules
   Remove a trailing S
   Replace trailing AY with Y
   Remove trailing A
   Truncate to 10 characters

Return Result

Main program for testing

The following main program will repeatedly prompt the user to supply a name and compute and display its NYSIIS equivalent until presented with an empty line.

<<nysiis_test.rexx>>=
Do Forever
   Call LineOut , "Enter the name: "
   Name = LineIn()
   If Name = "" then Leave
   Say "NYSIIS value = " || NYSIIS(Name)
End
Exit 0

NYSIIS Function