When it Comes to Demographic Marketing Segmentation, Simple is Beautiful
Thursday, July 10th, 2008
A few days ago, in my post Restaurant Loyalty Programs: 10 steps to understanding your most valuable customers, I talked about how to do a rudimentary best customer analysis using census demographic data. One thing I did not mention, however, is the hidden danger in using census data: the temptation to overdo it.
If you followed the steps I outlined, you may have seen that the American FactFinder site offers mountains of demographic data for every zip code in the country. You might be tempted to try to use all of it in your analysis. Resist the temptation and strive to perform your demographic segmentation using as few pieces of data as possible. Why?
1) Smaller data sets are easier to manage
Data integration is 90% of modeling. The more integration work you can cut by ignoring inconsequential data, the less time it will take to get actionable results.
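To make the point concrete, here is a minimal sketch of pulling only a couple of census fields into a customer file instead of integrating everything. The column names, zip codes, and numbers are all made up for illustration; they are not taken from American FactFinder or from any real customer list.

```python
import csv
import io

# Hypothetical census extract: a real one would have far more columns.
CENSUS_CSV = """\
zip,median_income,median_age,pct_male,pct_public_transit,avg_commute_min
11111,85000,34.1,49.8,41.2,28.5
22222,61000,39.7,50.3,12.4,22.0
"""

# Hypothetical customer file keyed by zip code.
CUSTOMER_CSV = """\
customer_id,zip,annual_spend
1,11111,420
2,22222,310
3,11111,150
"""

# The only census fields we choose to integrate (an assumption for this sketch).
KEEP = ("median_income", "pct_public_transit")


def load_census(text, keep):
    """Read the census extract, keeping only the chosen columns per zip code."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["zip"]: {name: float(row[name]) for name in keep} for row in reader}


def join_customers(customers_text, census):
    """Attach the kept census fields to each customer by zip code."""
    joined = []
    for row in csv.DictReader(io.StringIO(customers_text)):
        demographics = census.get(row["zip"])
        if demographics is None:
            continue  # skip customers whose zip code is not in the extract
        joined.append({
            "customer_id": row["customer_id"],
            "annual_spend": float(row["annual_spend"]),
            **demographics,
        })
    return joined


for record in join_customers(CUSTOMER_CSV, load_census(CENSUS_CSV, KEEP)):
    print(record)
```

Every field left out of KEEP is one less column to clean, match, and explain later.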
2) A kitchen sink approach can lead to overfitting
Overfitting simply means that the data is being sliced so thinly that randomness is causing you to see things that aren’t really there. While this isn’t quite as important if you’re doing the quick and dirty analysis I outlined, it becomes quite important when using more quantitative techniques.
How do you know if you’re seeing overfitting? If weird “pockets” of data exist, you may be a victim. For example, if the data show that customers who live in zip codes where males comprise between 50.31% and 50.35% of the population are twice as valuable as customers in zip codes outside that range, you have probably overfit the data.
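If you want to see how easily these pockets appear, the following is a rough simulation, not a real analysis. It invents zip codes whose percent male and customer value are drawn independently at random, so by construction there is no relationship to find. It then slices percent male into a few wide ranges and into many thin ones, and reports how valuable the "best" slice looks relative to the average. The function name, the 48% to 52% range, and the value distribution are all assumptions chosen for illustration.

```python
import random
import statistics


def best_pocket_ratio(n_zips, n_bins, seed=0):
    """Slice pure-noise data into n_bins ranges of percent male and return
    the best slice's average customer value divided by the overall average."""
    rng = random.Random(seed)
    lo, hi = 48.0, 52.0  # assumed range of percent male across zip codes
    # Percent male and customer value are independent: any pattern is noise.
    zips = [(rng.uniform(lo, hi), rng.expovariate(1 / 100.0)) for _ in range(n_zips)]
    overall = statistics.mean(value for _, value in zips)

    width = (hi - lo) / n_bins
    bins = {}
    for pct_male, value in zips:
        # Clamp so a value exactly at the upper edge lands in the last slice.
        idx = min(int((pct_male - lo) / width), n_bins - 1)
        bins.setdefault(idx, []).append(value)

    return max(statistics.mean(values) / overall for values in bins.values())


print("4 wide slices:   best slice is", round(best_pocket_ratio(2000, 4), 2), "x average")
print("400 thin slices: best slice is", round(best_pocket_ratio(2000, 400), 2), "x average")
```

With wide slices, the best one should sit close to the average. With thin slices, each holding only a handful of zip codes, some slice will typically look far more valuable than the rest even though nothing real is going on.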
3) Smaller models are easier to act upon
When deciding whether to keep a variable, consider what action you would recommend a company take if the variable turned out to be meaningful. For example, if you have a data set that can tell you that your best customers live in areas where public transport usage is high, you could consider advertising on subways and buses to attract more high value customers. On the other hand, if your data set tells you that your best customers have commutes of 15 to 30 minutes, could you act on that? If not, drop that piece of data.
Simple is beautiful when it comes to demographic modeling. Try to resist the temptation to throw all the data available at a segmentation model. You, your database guy, and your marketing staff will be glad you did.