How to check for similar records?

Hello, i want to create an Table with Business Partners. To prevent that a Business Partner is not created a second time with a different spelled name, i want to programm a validate trigger that checks the database for similar records. Does anyone have an idea how to programm this in c/side? thanx sven

Use of a name encoding algorithm called SOUNDEX is often sufficient for identifying a misspelled duplicate name entry. An exerpt from a 1999 paper by Andrew Coates entitled “Matchmaker, Mitchmoker - is that a Match?” describing the SOUNDEX algorithm follows: SOUNDEX SOUNDEX is a phonetic coding algorithm that ignores many of the unreliable components of names, but by doing so reports more matches. The rules for coding a name are (from Newcombe): The first letter of the name is used in its un-coded form to serve as the prefix character of the code. (The rest of the code is numerical.) Thereafter, W and H are ignored entirely. A, E, I, 0, U, Y are not assigned a code number, but do serve as “separators” (see Step 5). Other letters of the name are converted to a numerical equivalent: B, P, F, V 1 D, T 3 L 4 M, N 5 R 6 All other consonants C, G, J, K, Q, S, X, Z) 2 B, P, F, V 1 There are two exceptions: letters that follow prefix letters which would, if coded, have the same numerical code, are ignored in all cases unless a ‘separator’ (see Step 3) precedes them. The second letter of any pair of consonants having the same code number is likewise ignored, i.e. unless there is a ‘separator’ between them in the name. The final SOUNDEX code consists of the prefix letter plus three numerical characters. Longer codes are truncated to this length, and shorter codes are extended to it by adding zeros. Examples of names with the same SOUNDEX code are shown in Table 1. ANDERSON, ANDERSEN A536 BERGMANS, BRIGHAM B625 BIRK, BERQUE, BIRCK B620 FISHER, FISCHER F260 LAVOIE, LEVOY L100 LLWELLYN L450 Dave Studebaker das@libertyforever.com Liberty Grove Software A Navision Services Partner

The topic of soundex and duplicate adresses is a very complex one. As Dave pointed out replacing a text-string (name) by its phonetic equivalent is the best to find same names typed differently. However it does not solve the problem of abbreviations such as “General Insurance” <-> “Gen. Insurance”. I had developed a tool for finding duplicate addresses about 13 years ago in our DOS-Based contact management. During the following 10 years the rountines have been refined and been implemented at 140 customers sites. Later in 1999 I rewrote it to C/SIDE. So far there are about 10 Navision customers using my tool. I even offered it to Navision DK but they refused as they had already implemented their own (primitive) functionality in Attain Contact management. The whole functionality to check for a duplicate adress has to go over 3 steps. Dave explained step 1: The comarison of two strings (Adress1.Name <-> Adress2.Name). However, the problem goes deeper: Step 1 is a soundex function which only compares two text strings and returns a result between 0.0 and 1.0 where 1 is a complete match. Apple/Strawberry → 0.0 Meier/Mayr → 1.0 Step 2: If you need to check for duplicate adresses, you need not only to check one field (name) but several fields (name, street, Zip, Phone). The reason is simple: You might have the same name but different places and therefore legally two different addresses. Examples: Citibank, New York Citibank, Los Angeles … have a 100% match in the name but are two different addresses. Robert Smith, Penny Lane 25, 54321, Metropolis R. Smith, Penny Lane 25, 54321, Metropolis … is the same address even though the Soundex for the name would only give about 55% match. 3.) Step 2 provides a soundex for every relevant field. To get an overall soundex for all the fields compared, you need to set a weight for all fields. Such as:


AdressSoundex :=
(
  3*Soundex (Address1.Name, Address2.Name) +
  2*Soundex (Address1.Street, Address2.Street) +
  1*Soundex (Address1.ZIP, Address2.ZIP) +
  1*Soundex (Address1.Place, Address2.Place) +
  5*Soundex (Address1.Phone, Address2.Phone)
) / 12

In this example you can clearly see that a Phone number is a much better indicator for a duplicate address than the Zip-Code. After all the above routine will give you a soundex-Value between 0.0 and 1.0 for the comparison of two addresses. The last point is that the factors set to weight the soundex’ of each field have to be parametrised as they depend of the quality of adresses. Some examples: Company A is renting appartments in New York. Therefore in 99% of the cases the Place will be “New York”. The Zip-Code will be similar and the phone-numbers (if area-code is included) will be very similar as well. Besides, we have many people (= different addresses) which live in the same street. Therefore in this case you would put the greatest weight on the name. Company B is mainly working for Banks all over the States: The Name-field will often read Citibank or Chase Manhattan as these banks are main customers of Company B. In this case you would only give a very low weight to the name and a greater weight to Zip, Place and Phone. — Am I allowed to advertise at this place? Yes, my soundex routines (providing that you mean duplicate addresses) are for sale. So feel free to contact me if you need them. With best regards from Switzerland Marcus Fabian

quote:


Originally posted by fabian … Am I allowed to advertise at this place? Yes, my soundex routines (providing that you mean duplicate addresses) are for sale. So feel free to contact me if you need them. …


Hello Marcus, i’m interested in your soundex routines. please contact me at hekleru@einhell.de for the conditions.

Uwe, I would just like to point out that Marcus’ post is from October 2001 and to my knowledge, Marcus hasn’t posted here for the last couple of months (don’t know if he still visits this place from time to time…). May I suggest that you try to contact him personally by email? [;)]

quote:


Originally posted by xorph May I suggest that you try to contact him personally by email? [;)]


i already tryed to contact marcus by email a few weeks ago. but until now i get no answer.