Cory O'Daniel – These are just words: software development, thoughts, and randomness

30 Dec 2009

Ruby NYSIIS Implementation

This is another piece of my Ruby core extensions. NYSIIS (New York State Identification and Intelligence System) is a phonetic algorithm that is somewhat more accurate than the traditional Soundex algorithm. (Note: if you need a Soundex algorithm for Ruby, look here.)
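To make the Soundex comparison concrete, here is a minimal sketch of the standard American Soundex rules. This is my own illustration, not the implementation referred to above, and the names `soundex` and `SOUNDEX_DIGITS` are made up for this example. Soundex keeps the first letter verbatim, so "Cory" and "Kory" get different codes, while NYSIIS rewrites a leading K to C and gives both the same key.

```ruby
# Minimal American Soundex sketch, for comparison with NYSIIS only.
# Letter groups follow the standard American Soundex table.
SOUNDEX_DIGITS = {
  "BFPV" => "1", "CGJKQSXZ" => "2", "DT" => "3",
  "L" => "4", "MN" => "5", "R" => "6"
}.each_with_object({}) { |(letters, digit), map|
  letters.each_char { |letter| map[letter] = digit }
}

def soundex(name)
  letters = name.upcase.delete("^A-Z")   # keep only A-Z
  return "" if letters.empty?

  code = letters[0].dup                  # the first letter is kept as-is
  last_digit = SOUNDEX_DIGITS[letters[0]]

  letters[1..-1].each_char do |letter|
    digit = SOUNDEX_DIGITS[letter]
    if digit
      # adjacent letters with the same digit are coded only once
      code << digit unless digit == last_digit
      last_digit = digit
    elsif letter != "H" && letter != "W"
      # vowels (and Y) separate letters that share a digit; H and W do not
      last_digit = nil
    end
  end

  code.ljust(4, "0")[0, 4]               # pad or truncate to letter + 3 digits
end
```

For example, `soundex("Cory")` is "C600" and `soundex("Kory")` is "K600", whereas both names come out as CARY under NYSIIS.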

I use this a lot in my ActiveRecord and DataMapper models for people or users. I store a NYSIIS key for each user's first and last name, so that searches still find people when the searcher misspells their name.

Examples:
O'Daniel → ODANAL
O'Donnel → ODANAL
Cory → CARY
Corey → CARY
Kory → CARY

So if you were searching for me and spelled my name "Corey ODonnel", I ("Cory O'Daniel") would still show up in your result set.
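A minimal sketch of that lookup, assuming you want something that runs without a database. `PhoneticIndex`, `add`, and `search` are hypothetical names; in a real ActiveRecord or DataMapper model the two keys would live in indexed columns (named however you like) filled in before save. The key function is passed in, so the sketch does not depend on the extension being loaded.

```ruby
# Hypothetical in-memory stand-in for "store phonetic keys on the model":
# each person is filed under the keys of their first and last names, and a
# search looks up the keys of the query instead of its raw spelling.
class PhoneticIndex
  Person = Struct.new(:first_name, :last_name)

  # key_fn maps a name to its phonetic key, e.g. ->(s) { s.nysiis }
  def initialize(key_fn)
    @key_fn = key_fn
    @people = {}   # [first_key, last_key] => [Person, ...]
  end

  def add(first_name, last_name)
    person = Person.new(first_name, last_name)
    (@people[keys_for(first_name, last_name)] ||= []) << person
    person
  end

  # Everyone whose first AND last name keys match the query's keys.
  def search(first_name, last_name)
    @people.fetch(keys_for(first_name, last_name), [])
  end

  private

  def keys_for(first_name, last_name)
    [@key_fn.call(first_name), @key_fn.call(last_name)]
  end
end
```

With the String extension below loaded, `PhoneticIndex.new(->(s) { s.nysiis })` files "Cory O'Daniel" under CARY / ODANAL, and a search for "Corey" / "O'Donnel" computes that same key pair.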

class String
  # NYSIIS phonetic key, e.g. "Cory".nysiis #=> "CARY".
  # Needs Ruby 1.9+ (the H rule below uses regex lookbehind).
  def nysiis
    str = self.upcase
    str.strip!
    str.gsub!(/[^A-Z ]/, "")              # drop apostrophes, hyphens, digits, etc.
    # Drop a trailing generational suffix (JR, SR, III, IV, ...). The Roman
    # numeral pattern also drops a final word made only of I/V/X/L/C/D/M, e.g. "LI".
    str.gsub!(/ +(JR|SR)$/, "")
    str.gsub!(/ +(I|V|X|L|C|D|M)+$/, "")
    str.delete!(" ")
    return "" if str.empty?

    # 1. Translate first characters of name:
    # => MAC → MCC, KN → NN, K → C, PH → FF, PF → FF, SCH → SSS
    #    An array of pairs (not a Hash) keeps the order explicit, so KN is
    #    tried before K, and only the first matching rule is applied.
    [
      [/^MAC/,     "MCC"],
      [/^KN/,      "NN"],
      [/^K/,       "C"],
      [/^(PH|PF)/, "FF"],
      [/^SCH/,     "SSS"]
    ].each do |pattern, replacement|
      break if str.sub!(pattern, replacement)
    end

    # 2. Translate last characters of name:
    # => EE → Y, IE → Y, DT, RT, RD, NT, ND → D
    str.sub!(/(EE|IE)$/, "Y")
    str.sub!(/(DT|RT|RD|NT|ND)$/, "D")

    # 3. First character of key = first character of name.
    first_char = str[0, 1]
    str = str[1..-1]

    # 4. Translate remaining characters by following rules,
    #    incrementing by one character each time:
    # => EV → AF else A, E, I, O, U → A
    # => Q → G, Z → S, M → N
    # => KN → NN else K → C
    # => SCH → SSS, PH → FF
    # => H → If previous or next is nonvowel, previous.
    # => W → If previous is vowel, previous. (A is the only vowel left)
    str.gsub!(/EV/, "AF")
    str.gsub!(/[AEIOU]/, "A")
    str.gsub!(/Q/, "G")
    str.gsub!(/Z/, "S")
    str.gsub!(/M/, "N")
    str.gsub!(/KN/, "NN")
    str.gsub!(/K/, "C")
    str.gsub!(/SCH/, "SSS")
    str.gsub!(/PH/, "FF")
    # H becomes the previous character, which the squeeze below collapses,
    # so deleting it is equivalent. first_char is prepended for this one
    # step so an H in second position sees its real predecessor.
    str = (first_char + str).gsub(/(?<=[^AEIOU])H|(?<=.)H(?=[^AEIOU]|\z)/, "")[1..-1]
    str.gsub!(/AW/, "A")

    # 4. CONTINUED
    # => Add current to key if current is not same as the last key character.
    str.squeeze!   # every rule ran in place, so collapse the repeated letters
    str = first_char + str

    # 5. If last character is S, remove it.
    # 6. If last characters are AY, replace with Y.
    # 7. If last character is A, remove it.
    str.sub!(/S$/, "")
    str.sub!(/AY$/, "Y")
    str.sub!(/A$/, "")

    str
  end
end

Wanna use it?

# require the file containing the String extension above
"Cory".nysiis #=> "CARY"
"O'Daniel".nysiis #=> "ODANAL"

Yay, now you are NYSIIS. Congrats.
