On a recent lead generation assignment, we took on an existing customer database to build a statistical model for scoring a leads database. The client does business in 7 countries, or so they said, and I believed them. But they quickly added a caveat: “no one has looked at our database in a while.”

First, we looked at their billing country field. This had been an open text field in their Salesforce.com system that could be edited by just about anyone. What we found was amazing. There were so many variations for each country that the number of unique values had proliferated to 107. Misspellings, case differences, punctuation, and abbreviations all conspired to create many versions of the same country! Our first order of the day was to identify the obvious countries and group them, then correct the remaining data by hand. This took 107 countries down to the correct 7. The country field was then locked, and going forward a drop-down menu of countries is being used to prevent this proliferation from happening again.

For statistical modelers, why is cleaning data important for sales intelligence? First of all, we want to point out that even a simple field can pose a challenge. In the absence …
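The grouping step described above (collapse the obvious variants first, then correct what is left) can be sketched in a few lines of Python. This is only an illustration, not the process we actually ran: the country names, the alias table, the `normalize_country` function, and the 0.85 similarity cutoff are all assumptions made for the example, since the client's seven countries are not listed here.

```python
import difflib
import re

# Hypothetical canonical list; the client's actual seven countries are not named.
CANONICAL = [
    "United States", "Canada", "United Kingdom", "Germany",
    "France", "Australia", "Japan",
]

# Hypothetical abbreviations that fuzzy matching alone would not catch.
ALIASES = {
    "us": "United States",
    "usa": "United States",
    "uk": "United Kingdom",
}


def _key(raw):
    """Lowercase, drop periods, turn other non-letters into single spaces."""
    text = raw.lower().replace(".", "")
    text = re.sub(r"[^a-z]+", " ", text)
    return text.strip()


# Lookup from cleaned key back to the canonical spelling.
BY_KEY = {_key(name): name for name in CANONICAL}


def normalize_country(raw, cutoff=0.85):
    """Return the canonical country for a free-text value, or None.

    None means the value could not be resolved and needs a manual look.
    """
    key = _key(raw)
    if key in BY_KEY:        # differs only in case, punctuation, or spacing
        return BY_KEY[key]
    if key in ALIASES:       # known abbreviation
        return ALIASES[key]
    # Likely misspelling: take the closest canonical key above the cutoff.
    close = difflib.get_close_matches(key, list(BY_KEY), n=1, cutoff=cutoff)
    return BY_KEY[close[0]] if close else None


if __name__ == "__main__":
    samples = ["U.S.A.", " CANADA ", "Untied States", "Germny", "u.k", "N/A"]
    for value in samples:
        print(repr(value), "->", normalize_country(value))
```

Values that come back as `None` are the ones left for manual correction. The cutoff is deliberately strict so that an unfamiliar entry is flagged for review rather than silently folded into the wrong country.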