Do you sell in 7 or 107 countries? Data cleansing & predictive analytics

Are you doing business in 107 countries? Or 7? Data cleansing matters

Data Hygiene for Predictive Analytics and AI

On a recent assignment where we set up lead generation in Salesforce, we used an existing customer database to build a statistical model to score a leads database.

The client does business in 7 countries. Or so they said, and I believed them.

But they quickly added a caveat: “no one has looked at our database in a while.”

First, we looked at their billing country field. This was an open text field in Salesforce that could be edited by anyone. What we found was amazing. There were so many variations of each country that the number of unique values came to 107. Misspellings, case differences, punctuation, and abbreviations all added up to create many versions of the same country. Data chaos!

Our first task was to identify the values that obviously referred to the same country and group them. Then we corrected the remaining values by hand. This took 107 countries down to the correct 7.
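The group-then-correct step can be sketched in a few lines of Python. This is a minimal illustration, not the process we ran for the client: the alias table, the sample values, and the function names are all made up for the example. The idea is to strip case, punctuation, and extra spaces first, map what is left to one canonical name, and set aside anything that still does not match for manual review.

```python
import re

# Hypothetical alias table, built by hand after reviewing the distinct values.
COUNTRY_ALIASES = {
    "united states": "United States",
    "usa": "United States",
    "us": "United States",
    "united kingdom": "United Kingdom",
    "uk": "United Kingdom",
    "canada": "Canada",
}


def normalize_key(raw):
    """Lowercase, drop punctuation, and collapse whitespace."""
    key = re.sub(r"[^\w\s]", "", raw.lower())
    return re.sub(r"\s+", " ", key).strip()


def clean_country(raw):
    """Return the canonical country name, or None if it needs manual review."""
    return COUNTRY_ALIASES.get(normalize_key(raw))


# Made-up sample of what an open text field can contain.
raw_values = ["USA", "U.S.A.", "usa ", "United States",
              "Untied States", "UK", "U.K.", "Canada"]

cleaned = [clean_country(value) for value in raw_values]
needs_review = [v for v, c in zip(raw_values, cleaned) if c is None]

distinct_before = len(set(raw_values))                     # 8
distinct_after = len({c for c in cleaned if c is not None})  # 3
```

In this sample, eight distinct spellings collapse to three countries, and the misspelled “Untied States” lands in needs_review instead of being guessed at.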

Controls reduce chaos.

How do you prevent this from happening again? You need to add controls to reduce the chaos.

Our client locked the country field and set up a picklist of countries. (Use a picklist for any field with standard, predictable values.)
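In Salesforce the picklist is set up in the field configuration, not in code. If data also arrives from outside the CRM (an import file, a web form), the same control can be approximated with a check against an allowed list. The sketch below is an assumption about how that might look; the country list and the function name are hypothetical.

```python
# Hypothetical allowed list; in practice this mirrors the picklist values.
ALLOWED_COUNTRIES = frozenset({
    "Canada",
    "United Kingdom",
    "United States",
})


def validate_country(value):
    """Accept only exact picklist values; reject everything else."""
    if value not in ALLOWED_COUNTRIES:
        raise ValueError(f"{value!r} is not an allowed country value")
    return value


def split_valid_invalid(values):
    """Separate incoming values into accepted and rejected lists."""
    valid, invalid = [], []
    for value in values:
        (valid if value in ALLOWED_COUNTRIES else invalid).append(value)
    return valid, invalid


valid, invalid = split_valid_invalid(["Canada", "canada", "U.S.", "United States"])
```

The check is deliberately strict: “canada” in lowercase is rejected, because the point of the control is that only one spelling ever reaches the database.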

Now you might think: they worked with that situation for a while, and it didn’t affect Salesforce users too much. So why was it a problem?

Well, clean data is necessary for sales intelligence, especially if you plan to model or score your data.

Even a simple field can pose a challenge.

A field should not have multiple values that mean the same thing. For example, the state field should not have “Illinois,” “IL” and “Ill.” This could lead to one of the values being missed when grouping states for analysis. Important data could be left out, and the analytic results would then be misleading.
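A small example shows how this distorts results. The figures below are invented for illustration, and the mapping table is an assumption. When revenue is grouped on the raw state field, Illinois is split three ways and Ohio looks like the top state. Once the variants are mapped to one code, the ranking flips.

```python
from collections import Counter

# Made-up (state, revenue) rows for illustration only.
accounts = [
    ("Illinois", 1200),
    ("IL", 800),
    ("Ill.", 500),
    ("Ohio", 1500),
]

# Grouping on the raw field treats each spelling as a separate state.
raw_totals = Counter()
for state, revenue in accounts:
    raw_totals[state] += revenue

# Hypothetical mapping from each variant to one canonical code.
STATE_CANONICAL = {"Illinois": "IL", "IL": "IL", "Ill.": "IL", "Ohio": "OH"}

clean_totals = Counter()
for state, revenue in accounts:
    clean_totals[STATE_CANONICAL[state]] += revenue

top_raw = raw_totals.most_common(1)[0]      # ("Ohio", 1500)
top_clean = clean_totals.most_common(1)[0]  # ("IL", 2500)
```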

As another example, if there is a shipping country and a billing country, chances are both should be the same. So correct the data, then carry over the values into the other field so you have fewer blank and invalid rows.
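Here is one cautious way to do the carry-over, as a sketch. The field names billing_country and shipping_country are hypothetical, and the sketch assumes both fields were already corrected. It only fills a blank field from the other one. It never overwrites a value that is present, since “chances are” the two match, but not always.

```python
def fill_missing_country(record):
    """Copy a country into the other field only when that field is blank."""
    billing = (record.get("billing_country") or "").strip()
    shipping = (record.get("shipping_country") or "").strip()

    filled = dict(record)  # leave the original record untouched
    filled["billing_country"] = billing or shipping
    filled["shipping_country"] = shipping or billing
    return filled


# Made-up records for illustration.
records = [
    {"name": "Acme", "billing_country": "Canada", "shipping_country": ""},
    {"name": "Globex", "billing_country": None, "shipping_country": "United States"},
    {"name": "Initech", "billing_country": "Canada", "shipping_country": "United States"},
]

filled_records = [fill_missing_country(r) for r in records]
```

The third record keeps both of its values. A mismatch like that is worth a second look by a person, not an automatic overwrite.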

Poor quality data can be a clue that processes are broken.

When you see a data problem, it’s time to be a detective. Explore whether the same problematic logic is being used elsewhere, whether an algorithm was coded incorrectly in another process, or whether the wrong data was fed in. Assume that the problem might lie in more than one location.

Instill a proactive, ongoing culture in which data cleanliness is everyone’s responsibility.

People who create data, process data, and consume data all have a responsibility to keep an eye out for data quality. As the TSA sign in the airport says: “If you see something, say something.”

If something doesn’t look right, escalate it to your Salesforce or CRM admin, IT, or analytics team to review it.

Also, do not assume anything “should be obvious,” and do not lay blame for bad data. As you can see from the simple example above about the country field, even the most basic data hygiene should be approached with respect. As my friend’s father, an electrician, said: “The day I stop fearing electricity is the day I will stop working.”
