30 Dec 09
Ruby NYSIIS Implementation
This is another item from my bag of Ruby core extensions. NYSIIS (New York State Identification and Intelligence System) is a phonetic algorithm that is a little more accurate than the traditional Soundex algorithm. (Note: if you need a Soundex algorithm for Ruby, look here.)
I frequently use this in my ActiveRecord and DataMapper models of people or users. I store the NYSIIS key of each user's first and last name, so a search still finds the right person when someone misspells the name.
Examples:
O'Daniel → ODANAL
O'Donnel → ODANAL
Cory → CARY
Corey → CARY
Kory → CARY
So if you searched for me and spelled my name "Corey ODonnel", I ("Cory ODaniel") would still be in your result set.
    class String
      # Returns the NYSIIS phonetic key of the string.
      def nysiis
        str = upcase
        str.strip!
        str.gsub!(/[^A-Z ]/, "")
        str.gsub!(/ +(JR|SR)$/, "")           # drop generational suffixes
        str.gsub!(/ +(I|V|X|L|C|D|M)+$/, "")  # drop trailing Roman numerals
        str.gsub!(/ /, "")
        return str if str.empty?              # nothing left to encode

        # 1. Translate first characters of name:
        # => MAC → MCC, KN → NN, K → C, PH → FF, PF → FF, SCH → SSS
        # An array of pairs keeps the order fixed (KN must be tried before K),
        # and every pattern is anchored to the start of the name.
        [
          [/^MAC/, "MCC"],
          [/^KN/, "NN"],
          [/^K/, "C"],
          [/^(PH|PF)/, "FF"],
          [/^SCH/, "SSS"]
        ].each do |pattern, replacement|
          break if str.sub!(pattern, replacement)
        end

        # 2. Translate last characters of name:
        # => EE → Y, IE → Y, DT, RT, RD, NT, ND → D
        str.sub!(/(EE|IE)$/, "Y")
        str.sub!(/(DT|RT|RD|NT|ND)$/, "D")

        # 3. First character of key = first character of name.
        first_char = str[0, 1]
        str = str[1, str.length]

        # 4. Translate remaining characters by following rules,
        # incrementing by one character each time:
        # => EV → AF else A, E, I, O, U → A
        # => Q → G, Z → S, M → N
        # => KN → NN else K → C
        # => SCH → SSS, PH → FF
        # => H → If previous or next is nonvowel, previous.
        # => W → If previous is vowel, previous. (A is the only vowel left)
        str.gsub!(/EV/, "AF")
        str.gsub!(/[AEIOU]/, "A")
        str.gsub!(/Q/, "G")
        str.gsub!(/Z/, "S")
        str.gsub!(/M/, "N")
        str.gsub!(/KN/, "NN")
        str.gsub!(/K/, "C")
        str.gsub!(/SCH/, "SSS")
        str.gsub!(/PH/, "FF")
        # Backreferences are written as '\1'; $1 would be read before gsub! runs.
        str.gsub!(/([^AEIOU])H/, '\1')
        # The lookahead keeps the character after H instead of consuming it.
        str.gsub!(/(.)H(?=[^AEIOU])/, '\1')
        str.gsub!(/AW/, "A")

        # 4. CONTINUED
        # => Add current to key if current is not same as the last key character.
        str.squeeze! # everything was done in place, so squeeze out the duplicates
        str = first_char + str

        # 5. If last character is S, remove it.
        # 6. If last characters are AY, replace with Y.
        # 7. If last character is A, remove it.
        str.sub!(/S$/, "")
        str.sub!(/AY$/, "Y")
        str.sub!(/A$/, "")

        str
      end
    end
Wanna use it?
    # include file from above
    "Cory".nysiis     #=> "CARY"
    "O'Daniel".nysiis #=> "ODANAL"
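To make the search idea concrete, here is a minimal sketch of matching people by their stored keys. `PhoneticIndex` is a hypothetical, in-memory stand-in for a model with two key columns, not ActiveRecord or DataMapper code. It assumes the `String#nysiis` method from the class above is loaded, and the demo at the bottom only runs when it is.

```ruby
# Hypothetical in-memory stand-in for a people table with stored phonetic keys.
# The encoder is any callable that maps a name to its key; by default it calls
# String#nysiis, which is assumed to be loaded from the code in this post.
class PhoneticIndex
  def initialize(encoder = ->(name) { name.nysiis })
    @encoder = encoder
    @by_key = Hash.new { |hash, key| hash[key] = [] }
  end

  # Store a person under the keys of their first and last name,
  # much like a model would fill two key columns before saving.
  def add(first_name, last_name)
    @by_key[key_for(first_name, last_name)] << "#{first_name} #{last_name}"
    self
  end

  # Return everyone whose stored keys equal the keys of the query.
  def search(first_name, last_name)
    @by_key.fetch(key_for(first_name, last_name), [])
  end

  private

  def key_for(first_name, last_name)
    [@encoder.call(first_name), @encoder.call(last_name)]
  end
end

# Demo: runs only when String#nysiis has been defined.
if String.method_defined?(:nysiis)
  index = PhoneticIndex.new.add("Cory", "ODaniel")
  p index.search("Corey", "ODonnel") # prints ["Cory ODaniel"]
end
```

In a real model the same idea would probably mean computing the two keys in a before-save hook and querying on those columns, so the encoding runs once per record rather than on every search.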
Yay, now you are NYSIIS. Congrats.