Fuzzy String Matching in Excel and Access: Longest Common Subsequence

You’ve already seen a post on fuzzy string matching. As is intuitively clear, there is no right or wrong solution for this problem, because there is no universal definition on how to account for string similarity.

Computer scientists often speak of subsequences of a sequence: take a subset of all indices of a sequence, sort them in ascending order and map to corresponding elements of the sequence – you got yourself a subsequence (if you qualify this further – proper subset of indices would mean proper subsequence). How can we use this in name matching, (for example)?

Suppose we have two cells in Excel, one with value ‘John’ and one with ‘Joohan’ (two erroneous insertions). The longest common subsequence is then ‘John’ and has length 4. This length equals to the length of first sequence but not the second, so it’s not a perfect match. One possible way to account for disparity is to divide the length of the common subsequence by the length of the first sequence and add to this ratio the same ratio for the second sequence. So, in case of perfect match, both ratios are 1 and the resultant metric is 2. We need to normalize the metric so that, in case of perfect match, the metric is 1. Straightforward way to normalize is to weigh each component with respect to its total contribution, i.e. total length of both sequences.  The following manipulations show what the algorithm reduces to:

lgs / len1 * len1 / (len1 + len2) + lgs / len2 * len2 / (len1 + len2) = 2 * lgs / (len1 + len2)

The algorithm below is directly from Excel’s VBA editor but could be used in Access (as usual):

' ======================
' Provides length of longest common subsequence
'
' Parameters:
' s1, s2 - sequences to be compared
'
' ======================
Public Function getLongestCommonSubsequenceLength(ByVal s1 As String, ByVal s2 As String) As Integer
	Dim m As Integer
	Dim n As Integer
	m = Len(s1)
	n = Len(s2)
	Dim b(0 To 100, 0 To 100) As Integer
	Dim c(0 To 100, 0 To 100) As Integer
	Dim i As Integer
	Dim j As Integer
	For i = 1 To m
		c(i, 0) = 0
	Next i
	For j = 1 To n
		c(0, j) = 0
	Next j
	For i = 1 To m
		For j = 1 To n
		  If Mid(s1, i, 1) = Mid(s2, j, 1) Then
			 c(i, j) = c(i - 1, j - 1) + 1
			 b(i, j) = 1

		  ElseIf c(i - 1, j) >= c(i, j - 1) Then
			 c(i, j) = c(i - 1, j)
			 b(i, j) = 2
		  Else
			 c(i, j) = c(i, j - 1)
			 b(i, j) = 3
		  End If
		Next j
	Next i
'
	getLongestCommonSubsequenceLength = c(m, n)
End Function
' ======================
' Provides fuzzy comparison score for string matching
' using longest common subsequence
'
' Parameters:
' s1, s2 - sequences to be compared
' ======================
Public Function compareTwoSequences(ByVal s1 As String, ByVal s2 As String) As Double
	Dim l1 As Integer
	Dim l2 As Integer
	Dim l3 As Integer
	l1 = Len(s1)
	l2 = Len(s2)
'	
	l3 = getLongestCommonSubsequenceLength(s1, s2)
'	
'	
	compareTwoSequences = 2 * l3 / (l1 + l2)
End Function

LongestCommonSubsequence.bas
Advertisements

~ by Monsi.Terdex on August 29, 2013.

4 Responses to “Fuzzy String Matching in Excel and Access: Longest Common Subsequence”

  1. Monsi.Terdex, thank you very much for posting this and also your other post on the other fuzzy compare algorithm.

    Your code worked great.

    We combined your two algorithms into one function and utlized it to do fuzzy matching on data for one of our clients with close to 100% accuracy.

    Public Function CombinedFuzzyCompare(s1 As String, s2 As String) As Double
    Dim FuzzyCompareScore As Double
    Dim LongestCommonSubstringScore As Double

    FuzzyCompareScore = FuzzyCompare(s1, s2)
    LongestCommonSubstringScore = compareTwoSequences(s1, s2)

    CombinedFuzzyCompare = (FuzzyCompareScore + LongestCommonSubstringScore) / 2
    End Function

    Thanks again, and keep up the great posts.

    • Stephan,

      Thanks for such a warm praise — indeed nice to hear that my solutions work. I will soon post about clustering address records using fuzzy matching algorithm (same as the one you refer to below).

      Monsi

      • Hello Thanks for great post. These are really helping me.

        I have a standard list of product/services. It’s a list of Federal contract product/services list. For Federal contracts there are several vendors/suppliers who are giving these product/services. Now the problem is I am trying to match the product/services for vendor product/services available from the website of the vendor.
        Eg: Vendor Name : Advantech
        Services which they are offering downloaded from internet.

        Economic/Business Analysis
        Integrated Financial Planning and Management – all phases PPBE
        Acquisition and Contract Strategy and Support
        Scheduling and Cost Management
        Systems/Software Planning, Research and Development
        Testing and Evaluation Services
        Technical Documentation
        Production and Installation Management
        Enterprise Architecture and Integration
        Network Operations and Security
        Database Management, Engineering and Mining
        Configuration Management
        Technical Support Services
        Training Services and Support
        Financial Management
        Life Cycle Services
        Systems Engineering
        Rapid Protoying
        Program Integration
        Decision Analytics
        C4ISR
        Risk Management
        Enterprise Architecture and System Integration
        Environmental Engineering
        Construction Management
        Engineering Support Services
        Program Management
        Project Management
        Information Security Audits, Testing and Assessments
        Information Assurance and Accreditation
        Cyber Software Integration
        Cyber Analysis and Operations
        Cyberwarfare Support
        Cryptology
        Threat Analysis and Vulnerability Assessment
        Integrated Remediation
        Network Security Design and Architecture
        Intelligence Training Support
        Unmanned Aerial Surveillance Operations
        Systems Engineering
        Information Operations
        Command Center Staffing

        I need to match the services with the standard product/services master list.
        FPDS contract product/services master list.

        ADP SUPPORT EQUIPMENT
        ADPE SYSTEM CONFIGURATION
        MINI AND MICRO COMPUTER CONTROL DEVICES
        PUNCHED CARD EQUIPMENT
        BEARINGS, ANTIFRICTION, UNMOUNTED
        RESIDENTIAL BUILDINGS
        RESTORATION
        UTILITIES
        WAREHOUSE BUILDINGS
        ADP CENTRAL PROCESSING UNIT (CPU, COMPUTER), ANALOG
        ARCHITECT AND ENGINEERING- GENERAL: LANDSCAPING, INTERIOR LAYOUT, AND DESIGNING
        ARCHITECT AND ENGINEERING- GENERAL: MANAGEMENT ENGINEERING
        ARCHITECT AND ENGINEERING- GENERAL: MARINE ENGINEERING
        ARCHITECT AND ENGINEERING- GENERAL: MECHANICAL SYSTEMS
        ARCHITECT AND ENGINEERING- GENERAL: OTHER
        ARCHITECT AND ENGINEERING- GENERAL: PLUMBING SYSTEMS
        ARCHITECT AND ENGINEERING- GENERAL: PRODUCTION ENGINEERING
        ARCHITECT AND ENGINEERING- GENERAL: STRUCTURAL ENGINEERING
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER CONSERVATION AND DEVELOPMENT FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER EDUCATIONAL BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER HOSPITAL BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER INDUSTRIAL BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER NON-BUILDING FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER RESIDENTIAL BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER UTILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER WAREHOUSE BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: PARKING FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: PENAL FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: EPG FACILITIES – SOLAR
        ARCHITECT AND ENGINEERING- CONSTRUCTION: EPG FACILITIES – WIND
        ARCHITECT AND ENGINEERING- CONSTRUCTION: EXHIBIT DESIGN (NON-BUILDING)
        ARCHITECT AND ENGINEERING- CONSTRUCTION: FAMILY HOUSING FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: FOOD OR GRAIN STORAGE BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: FUEL STORAGE BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: FUEL SUPPLY FACILITIES
        ADP SOFTWARE
        IT AND TELECOM- ANNUAL SOFTWARE MAINTENANCE SERVICE PLANS
        IT AND TELECOM- INTEGRATED HARDWARE/SOFTWARE/SERVICES SOLUTIONS, PREDOMINANTLY SERVICES
        INSTALLATION OF EQUIPMENT- ADP EQUIPMENT/SOFTWARE/SUPPLIES/SUPPORT EQUIPMENT
        LEASE OR RENTAL OF EQUIPMENT- ADP EQUIPMENT/SOFTWARE/SUPPLIES/SUPPORT EQUIPMENT
        MAINT/REPAIR/REBUILD OF EQUIPMENT- ADP EQUIPMENT/SOFTWARE/SUPPLIES/SUPPORT EQUIPMENT
        MODIFICATION OF EQUIPMENT- ADP EQUIPMENT/SOFTWARE/SUPPLIES/SUPPORT EQUIPMENT
        EQUIPMENT AND MATERIALS TESTING- ADP EQUIPMENT/SOFTWARE/SUPPLIES/SUPPORT EQUIPMENT
        INSPECTION- ADP EQUIPMENT/SOFTWARE/SUPPLIES/SUPPORT EQUIPMENT
        OTHER QC/TEST/INSPECT- ADP EQUIPMENT/SOFTWARE/SUPPLIES/SUPPORT EQUIPMENT
        QUALITY CONTROL- ADP EQUIPMENT/SOFTWARE/SUPPLIES/SUPPORT EQUIPMENT
        ARCHITECT AND ENGINEERING- CONSTRUCTION: AIR TRAFFIC CONTROL TOWERS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: AIRPORT RUNWAYS AND TAXIWAYS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: AIRPORT SERVICE ROADS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: AIRPORT TERMINALS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: AMMUNITION FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: AMMUNITION STORAGE BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: CANALS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: CONFERENCE SPACE AND FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: DAMS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: DINING FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: DREDGING FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: ELECTRONIC AND COMMUNICATIONS FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: EPG FACILITIES – COAL
        ARCHITECT AND ENGINEERING- CONSTRUCTION: EPG FACILITIES – GAS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: EPG FACILITIES – GEOTHERMAL
        ARCHITECT AND ENGINEERING- CONSTRUCTION: EPG FACILITIES – HYDRO
        ARCHITECT AND ENGINEERING- CONSTRUCTION: EPG FACILITIES – OTHER, INCLUDING TRANSMISSION
        ARCHITECT AND ENGINEERING- CONSTRUCTION: EPG FACILITIES – SOLAR
        ARCHITECT AND ENGINEERING- CONSTRUCTION: EPG FACILITIES – WIND
        ARCHITECT AND ENGINEERING- CONSTRUCTION: EXHIBIT DESIGN (NON-BUILDING)
        ARCHITECT AND ENGINEERING- CONSTRUCTION: FAMILY HOUSING FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: FOOD OR GRAIN STORAGE BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: FUEL STORAGE BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: FUEL SUPPLY FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: GOVERNMENT-OWNED CONTRACTOR-OPERATED (GOCO) R&D FACS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: GOVT-OWNED CTR-OPERATED (GOCO) ENVIRONMENTAL LABORATORIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: GOVT-OWNED GOVT-OPERATED (GOGO) ENVIRONMENTAL LABORATORIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: HEATING AND COOLING PLANTS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: HIGHWAYS, ROADS, STREETS, BRIDGES, AND RAILWAYS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: HOSPITALS AND INFIRMARIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: LABORATORIES AND CLINICS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: MAINTENANCE BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: MISCELLANEOUS BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: MISSILE SYSTEM FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: MUSEUMS AND EXHIBITION BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OFFICE BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OPEN STORAGE FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER ADMINISTRATIVE FACILITIES/SERVICE BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER AIRFIELD STRUCTURES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER CONSERVATION AND DEVELOPMENT FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER EDUCATIONAL BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER HOSPITAL BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER INDUSTRIAL BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER NON-BUILDING FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER RESIDENTIAL BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER UTILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: OTHER WAREHOUSE BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: PARKING FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: PENAL FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: POLLUTION ABATEMENT AND CONTROL FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: PRODUCTION BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: RADAR AND NAVIGATIONAL FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: RECREATIONAL BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: RECREATIONAL FACILITIES (NON-BUILDING)
        ARCHITECT AND ENGINEERING- CONSTRUCTION: RELIGIOUS FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: RESTORATION OF REAL PROPERTY (PUBLIC OR PRIVATE)
        ARCHITECT AND ENGINEERING- CONSTRUCTION: SCHOOLS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: SEWAGE AND WASTE FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: SHIP CONSTRUCTION AND REPAIR FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: TANK AUTOMOTIVE FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: TESTING AND MEASUREMENT BUILDINGS
        ARCHITECT AND ENGINEERING- CONSTRUCTION: TROOP HOUSING FACILITIES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: TUNNELS AND SUBSURFACE STRUCTURES
        ARCHITECT AND ENGINEERING- CONSTRUCTION: UNIMPROVED REAL PROPERTY (LAND)
        ARCHITECT AND ENGINEERING- GENERAL: MANAGEMENT ENGINEERING
        ARCHITECT AND ENGINEERING- GENERAL: MARINE ENGINEERING
        ARCHITECT AND ENGINEERING- GENERAL: MECHANICAL SYSTEMS
        ARCHITECT AND ENGINEERING- GENERAL: OTHER
        ARCHITECT AND ENGINEERING- GENERAL: PLUMBING SYSTEMS
        ARCHITECT AND ENGINEERING- GENERAL: PRODUCTION ENGINEERING
        ARCHITECT AND ENGINEERING- GENERAL: STRUCTURAL ENGINEERING
        R&D- AGRICULTURE: INSECT AND DISEASE CONTROL (ENGINEERING DEVELOPMENT)
        R&D- AGRICULTURE: MARKETING (ENGINEERING DEVELOPMENT)
        R&D- AGRICULTURE: OTHER (ENGINEERING DEVELOPMENT)
        R&D- AGRICULTURE: PRODUCTION (ENGINEERING DEVELOPMENT)
        R&D- COMMUNITY SERVICE/DEVELOPMENT: OTHER (ENGINEERING DEVELOPMENT)
        R&D- COMMUNITY SERVICE/DEVELOPMENT: URBAN (ENGINEERING DEVELOPMENT)
        R&D- COMMUNITY SVC/DEVELOP: FIRE PREVENTION/CONTROL (ENGINEERING DEVELOPMENT)
        R&D- DEFENSE OTHER: AMMUNITION (ENGINEERING DEVELOPMENT)
        R&D- DEFENSE OTHER: CONSTRUCTION (ENGINEERING DEVELOPMENT)
        R&D- DEFENSE OTHER: OTHER (ENGINEERING DEVELOPMENT)
        R&D- DEFENSE OTHER: SERVICES (ENGINEERING DEVELOPMENT)
        R&D- DEFENSE OTHER: SUBSISTENCE (ENGINEERING DEVELOPMENT)
        R&D- DEFENSE OTHER: TEXTILES/CLOTHING/EQUIPAGE (ENGINEERING DEVELOPMENT)
        R&D- DEFENSE SYSTEM: AIRCRAFT (ENGINEERING DEVELOPMENT)
        R&D- DEFENSE SYSTEM: ELECTRONICS/COMMUNICATION EQUIPMENT (ENGINEERING DEVELOPMENT)
        R&D- DEFENSE SYSTEM: MISCELLANEOUS HARD GOODS (ENGINEERING DEVELOPMENT)
        R&D- DEFENSE SYSTEM: MISSILE/SPACE SYSTEMS (ENGINEERING DEVELOPMENT)
        R&D- DEFENSE SYSTEM: SHIPS (ENGINEERING DEVELOPMENT)
        R&D- DEFENSE SYSTEM: TANK/AUTOMOTIVE (ENGINEERING DEVELOPMENT)
        R&D- DEFENSE SYSTEM: WEAPONS (ENGINEERING DEVELOPMENT)
        R&D- ECONOMIC GROWTH: MANUFACTURING TECHNOLOGY (ENGINEERING DEVELOPMENT)
        R&D- ECONOMIC GROWTH: PRODUCT/SERVICE IMPROVEMENT (ENGINEERING DEVELOPMENT)
        R&D- EDUCATION: EDUCATIONAL (ENGINEERING DEVELOPMENT)
        R&D- ENERGY: CONSERVATION (ENGINEERING DEVELOPMENT)
        R&D- ENERGY: NUCLEAR (ENGINEERING DEVELOPMENT)
        R&D- ENERGY: OTHER (ENGINEERING DEVELOPMENT)
        R&D- ENERGY: PETROLEUM (ENGINEERING DEVELOPMENT)
        R&D- ENERGY: SOLAR/PHOTOVOLTAIC (ENGINEERING DEVELOPMENT)
        R&D- ENERGY: WIND (ENGINEERING DEVELOPMENT)
        R&D- ENVIRONMENTAL PROTECTION: AIR POLLUTION (ENGINEERING DEVELOPMENT)
        R&D- ENVIRONMENTAL PROTECTION: OTHER (ENGINEERING DEVELOPMENT)
        R&D- ENVIRONMENTAL PROTECTION: WATER POLLUTION (ENGINEERING DEVELOPMENT)
        R&D- GENERAL SCIENCE/TECHNOLOGY: ENGINEERING (ADVANCED DEVELOPMENT)
        R&D- GENERAL SCIENCE/TECHNOLOGY: ENGINEERING (APPLIED RESEARCH/EXPLORATORY DEVELOPMENT)
        R&D- GENERAL SCIENCE/TECHNOLOGY: ENGINEERING (BASIC RESEARCH)
        R&D- GENERAL SCIENCE/TECHNOLOGY: ENGINEERING (COMMERCIALIZED)
        R&D- GENERAL SCIENCE/TECHNOLOGY: ENGINEERING (ENGINEERING DEVELOPMENT)
        R&D- GENERAL SCIENCE/TECHNOLOGY: ENGINEERING (MANAGEMENT/SUPPORT)
        R&D- GENERAL SCIENCE/TECHNOLOGY: ENGINEERING (OPERATIONAL SYSTEMS DEVELOPMENT)
        R&D- GENERAL SCIENCE/TECHNOLOGY: ENVIRONMENTAL SCIENCES (ENGINEERING DEVELOPMENT)
        R&D- GENERAL SCIENCE/TECHNOLOGY: LIFE SCIENCES (ENGINEERING DEVELOPMENT)
        R&D- GENERAL SCIENCE/TECHNOLOGY: MATHEMATICAL/COMPUTER SCIENCES (ENGINEERING DEVELOPMENT)
        R&D- GENERAL SCIENCE/TECHNOLOGY: OTHER (ENGINEERING DEVELOPMENT)
        R&D- GENERAL SCIENCE/TECHNOLOGY: PHYSICAL SCIENCES (ENGINEERING DEVELOPMENT)
        R&D- HOUSING: HOUSING (ENGINEERING DEVELOPMENT)
        R&D- INCOME SECURITY: EMPLOYMENT (ENGINEERING DEVELOPMENT)
        R&D- INCOME SECURITY: INCOME MAINTENANCE (ENGINEERING DEVELOPMENT)
        R&D- MEDICAL: BIOMEDICAL (ENGINEERING DEVELOPMENT)
        R&D- MEDICAL: HEALTH SERVICES (ENGINEERING DEVELOPMENT)
        R&D- MEDICAL: OTHER (ENGINEERING DEVELOPMENT)
        R&D- MEDICAL: REHABILITATIVE ENGINEERING (ADVANCED DEVELOPMENT)
        R&D- MEDICAL: REHABILITATIVE ENGINEERING (APPLIED RESEARCH/EXPLORATORY DEVELOPMENT)
        R&D- MEDICAL: REHABILITATIVE ENGINEERING (BASIC RESEARCH)
        R&D- MEDICAL: REHABILITATIVE ENGINEERING (ENGINEERING DEVELOPMENT)
        R&D- MEDICAL: REHABILITATIVE ENGINEERING (MANAGEMENT/SUPPORT)
        R&D- MEDICAL: REHABILITATIVE ENGINEERING (OPERATIONAL SYSTEMS DEVELOPMENT)
        R&D- MEDICAL: SPECIALIZED MEDICAL SERVICES (ENGINEERING DEVELOPMENT)
        R&D- MINING: MINING SAFETY (ENGINEERING DEVELOPMENT)
        R&D- MINING: SUBSURFACE MINING EQUIPMENT (ENGINEERING DEVELOPMENT)
        R&D- MODAL TRANSPORTATION: AIR (ENGINEERING DEVELOPMENT)
        R&D- MODAL TRANSPORTATION: RAIL (ENGINEERING DEVELOPMENT)
        R&D- NATURAL RESOURCE: ATMOSPHERIC (ENGINEERING DEVELOPMENT)
        R&D- NATURAL RESOURCE: MARINE AND OCEANOGRAPHIC (ENGINEERING DEVELOPMENT)
        R&D- NATURAL RESOURCE: MARINE FISHERIES (ENGINEERING DEVELOPMENT)
        R&D- OTHER RESEARCH AND DEVELOPMENT (ENGINEERING DEVELOPMENT)
        R&D- OTHER TRANSPORTATION: HIGHWAYS, ROADS, AND BRIDGES (ENGINEERING DEVELOPMENT)
        R&D- OTHER TRANSPORTATION: NAVIGATION AND NAVIGATIONAL AIDS (ENGINEERING DEVELOPMENT)
        R&D- OTHER TRANSPORTATION: OTHER GENERAL (ENGINEERING DEVELOPMENT)
        R&D- SOCIAL SERVICES: OTHER (ENGINEERING DEVELOPMENT)
        R&D- SPACE: AERONAUTICS/SPACE TECHNOLOGY (ENGINEERING DEVELOPMENT)
        R&D- SPACE: COMMERCIAL PROGRAMS (ENGINEERING DEVELOPMENT)
        R&D- SPACE: FLIGHT (ENGINEERING DEVELOPMENT)
        R&D- SPACE: OPERATIONS, TRACKING AND DATA ACQUISITION (ENGINEERING DEVELOPMENT)
        R&D- SPACE: OTHER (ENGINEERING DEVELOPMENT)
        R&D- SPACE: SCIENCE/APPLICATIONS (ENGINEERING DEVELOPMENT)
        R&D- SPACE: STATION (ENGINEERING DEVELOPMENT)
        SUPPORT- PROFESSIONAL: ENGINEERING/TECHNICAL
        SYSTEMS ENGINEERING SERVICES

        How can i process this? Can u help me in processing through excel/R.

        Thanks in advance.

  2. Sandeep HC,

    Use the CombinedFuzzyCompare function which utilizes both of Monsi.Terdex’s fuzzy compare algorithms. Use it to assign a score to the Vendor’s product/service as compared to the master list. For each Vendor product/service, return the item with the highest score — and the score also — from the master list. Determine what the threshold score should be. In other words, anything under a certain score (even if it’s the highest score) would be ruled out as a match.

    You’d also have to write some additional Excel VBA to loop through calling the CombinedFuzzyCompare function.

    Regards,

    –Stephan Ip

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Normal Boy

Nothing out of the ordinary

Data Engineering Blog

Compare different philosophies, approaches and tools for Analytics.

%d bloggers like this: