Archive for the ‘Predictive Analytics’ Category

Measuring a Predictive Model’s Email Marketing Results - Part II

Thursday, July 24th, 2008

On Monday, I began a discussion about how Istobe evaluates the ROI from email marketing campaigns based on our predictive models. At the end of my post, I promised a discussion about other factors that we take into account when evaluating the lift. And…voila. Today we unveil those factors: the email influence zone and opt-outs, and we discuss how Istobe accounts for them in our lift calculations.

Email influence zone
Sometimes referred to as decay rate in the catalog industry, the email influence zone (EIZ) - not unlike the catalog influence zone (CIZ) - is essentially the time period after an email is sent during which the email can still influence a purchase. We assume that each succeeding day after the email is received has less effect than the day before; thus the moniker decay rate. Catalogers have believed for years that their catalogs have a carry-over influence: the catalog accounts for many web purchases. In fact, this is the very reason that catalogers are loath to cut the number of catalogs that they ship, even to those customers who have never purchased from the catalog itself. We believe this is also true of email marketing.

Basically, the idea behind the EIZ is that an email offer has an effect on online purchases that have no other obvious origin and which relate to the product that we predicted. For example, if our models predict that shoes are the likely next product for a particular customer and that customer purchases shoes online five days after receiving an email that advertises shoes, then we can assume that the email - and our product recommendation - influenced the customer’s purchase. Our model gets credit for a small percentage of this purchase even though the purchase didn’t come directly from an email click-through. The EIZ period that we calculate differs per client depending on the frequency with which our clients send emails.
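To make the idea concrete, here is a minimal sketch of how a decaying EIZ credit could be computed. It is an illustration only, not Istobe’s actual formula: the linear decay shape, the function name eiz_credit, and the eiz_days and max_credit parameters are all assumptions made for the example.

```python
from datetime import date


def eiz_credit(send_date, purchase_date, predicted_product, purchased_product,
               eiz_days=7, max_credit=0.2):
    """Return the fraction of a purchase credited to the email.

    Assumptions (not from the original post): credit decays linearly from
    max_credit on the send date to zero at the end of the influence zone.
    """
    # Only purchases of the product we predicted earn any credit.
    if purchased_product != predicted_product:
        return 0.0
    days_elapsed = (purchase_date - send_date).days
    # Purchases before the send, or after the zone closes, earn nothing.
    if days_elapsed < 0 or days_elapsed >= eiz_days:
        return 0.0
    # Each succeeding day is worth less than the day before.
    return max_credit * (1 - days_elapsed / eiz_days)


# Shoes were predicted and emailed on July 1; the customer bought shoes online on July 6.
sent = date(2008, 7, 1)
bought = date(2008, 7, 6)
credit = eiz_credit(sent, bought, "shoes", "shoes", eiz_days=10, max_credit=0.2)
print(f"Share of the purchase credited to the model: {credit:.2f}")
```

A client that emails more often would get a shorter eiz_days, which is one way to reflect the per-client EIZ period described above.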

Opt-outs on the Istobe watch
If we’re going to give ourselves some of the credit for purchases that occur in non-email channels, we also have to take a hit for bad events that occur during our watch. The bad event that Istobe tracks carefully is email opt-out. We track whether the opt-out rate goes up during our watch. If it does, we have to assume that our next-best offer has somehow turned customers off, so we deduct a portion of our lift because we believe that we were responsible for that increase in opt-out rate. We’re responsible for that small piece of customer attrition.
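As a rough illustration of that deduction, the sketch below subtracts a penalty from the measured lift when the opt-out rate rises. The penalty formula, the function name optout_adjusted_lift, and the value_per_subscriber parameter are hypothetical; the post does not say how large the deducted portion actually is.

```python
def optout_adjusted_lift(measured_lift, baseline_optout_rate, campaign_optout_rate,
                         emails_sent, value_per_subscriber):
    """Deduct a penalty from the lift when opt-outs rise on our watch.

    Assumption (not from the original post): each opt-out above the
    baseline costs a fixed value_per_subscriber in future revenue.
    """
    # Only an increase over the baseline counts against us.
    extra_rate = max(0.0, campaign_optout_rate - baseline_optout_rate)
    extra_optouts = extra_rate * emails_sent
    return measured_lift - extra_optouts * value_per_subscriber


# Hypothetical campaign: the opt-out rate rose from 0.2% to 0.3% across 10,000 emails.
adjusted = optout_adjusted_lift(
    measured_lift=5000.0,
    baseline_optout_rate=0.002,
    campaign_optout_rate=0.003,
    emails_sent=10000,
    value_per_subscriber=20.0,
)
print(f"Lift after the opt-out deduction: {adjusted:.2f}")
```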

Taken together with the variables I spoke about last time, these are just four factors that we constantly adjust in determining how successful we are on behalf of clients. And we’re always looking for new ways to perceive actual lift. If you have new ideas for evaluating predictive-model efficacy, please email me. I’d love to talk about them.

Defining Success: Lift, Support, and Confidence

Tuesday, July 22nd, 2008

 

I want to take a minute and build on Matt’s post from yesterday. While lift, confidence, and support may sound like terms that are more applicable to therapy sessions, they are actually the metrics that we use to relay the trust we have in our models. Many people we talk with are familiar with these terms on a certain level, but when pressed, their understanding boils down to the following: higher is usually good and lower is usually bad. I wanted to use this post to define these metrics a little more thoroughly and talk about how they’re calculated. Hopefully, readers will come away with a better understanding of what they mean and exactly how they’re used.

Support

In order to talk about more complex terms like lift and gain, we need to first start with the basics: support.  Support, sometimes referred to as the cover, is the number of data points (customers, transactions, etc.) that meet a set of rules and/or assumptions.  If I do a market basket analysis and find that customers who buy milk also buy cereal, the support would be the number of customers in the sample set where this holds true. Obviously, you can only estimate the value of the support number when given the size of the total sample population which is why we have our next metric: confidence.

Confidence

Since a rule with a support of 900 looks good when the sample size is 1,000 and not so good when the sample size is 1,000,000, we need a way to easily figure out whether or not our support is significant. Confidence is a ratio that takes the support number and divides it by the number of instances where the rule may hold true (or to be more exact - where the antecedent of our rule holds true). For instance, in our milk/cereal example above, confidence would be the total number of customers who bought milk and cereal divided by the total number of customers who bought milk. While it’s true that the higher the confidence the more reliable the rule, it is important to note that you need the total sample size and the support value as well as the confidence to get an accurate picture of the rule’s significance with regard to the total population.
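Here is a small sketch of support and confidence for the milk/cereal rule, using the definitions above (support as a raw count, confidence as that count divided by the antecedent count). The eight baskets and the function names are made up for illustration.

```python
def support(transactions, items):
    """Count the transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in transactions if items <= set(basket))


def confidence(transactions, antecedent, consequent):
    """Support of the full rule divided by support of the antecedent."""
    antecedent_count = support(transactions, antecedent)
    if antecedent_count == 0:
        return 0.0  # the rule never applies, so there is nothing to measure
    rule_count = support(transactions, set(antecedent) | set(consequent))
    return rule_count / antecedent_count


# Hypothetical baskets, one set of products per customer.
baskets = [
    {"milk", "cereal"},
    {"milk", "cereal", "bread"},
    {"milk"},
    {"cereal"},
    {"bread"},
    {"milk", "bread"},
    {"milk", "cereal"},
    {"eggs"},
]

rule_support = support(baskets, {"milk", "cereal"})
rule_confidence = confidence(baskets, {"milk"}, {"cereal"})
print(f"support={rule_support}, confidence={rule_confidence:.2f}, sample size={len(baskets)}")
```

Printing the sample size alongside the other two numbers reflects the point above: confidence alone does not tell you how much of the population the rule covers.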

Benchmark

I define benchmark here because it makes it easier to explain both gain and lift.  Benchmark is the total number of items (customers, transactions, etc.) that meet an outcome divided by the total number of items in the database. Let’s go back to the milk/cereal example. Since cereal is the outcome that we are trying to predict, the benchmark would be the total number of transactions where cereal was purchased over the total number of transactions in the database. In layman’s terms, if we were randomly picking 100 items out of the database, it is the percentage of those items where the outcome would hold true. Benchmark is valuable because it puts a lower bound on the value of a model. If a model can do better than the benchmark value, then it provides real value to the customer.
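As a quick sketch of the benchmark calculation, the function below divides the number of transactions containing the outcome by the total number of transactions. The baskets and the function name are hypothetical.

```python
def benchmark(transactions, outcome):
    """Share of all transactions in which the outcome item appears."""
    if not transactions:
        return 0.0
    hits = sum(1 for basket in transactions if outcome in basket)
    return hits / len(transactions)


# Hypothetical baskets; cereal is the outcome we are trying to predict.
baskets = [
    {"milk", "cereal"},
    {"milk", "cereal", "bread"},
    {"milk"},
    {"cereal"},
    {"bread"},
    {"milk", "bread"},
    {"milk", "cereal"},
    {"eggs"},
]

cereal_benchmark = benchmark(baskets, "cereal")
print(f"Benchmark for cereal: {cereal_benchmark:.2f}")
```

A model whose hit rate on the customers it targets is above this number is doing better than picking customers at random.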

Lift

One of the most common terms used in statistics, and especially in analytics, is lift. Lift is a way to measure how much better a model is than the benchmark. It is defined as the confidence divided by the benchmark, and any value that is greater than one suggests that there is some usefulness to the rule. Many applications show lift in a chart. In these instances, the total population is divided into deciles - ten even groups - into which members are placed based on their predicted probability of response. The members with the highest predicted probability are put into decile 1, and so on down to decile 10. Lift is then calculated for each of these deciles and plotted on a line chart.
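The sketch below shows both uses of lift: the single-rule ratio and the per-decile values that would feed a lift chart. The function names, the scores, and the response flags are invented for illustration, and the decile version assumes at least ten members and at least one responder.

```python
def lift(confidence, benchmark):
    """Confidence divided by benchmark; above 1.0 suggests a useful rule."""
    if benchmark == 0:
        raise ValueError("benchmark must be greater than zero")
    return confidence / benchmark


def decile_lift(scores, responded):
    """Return ten lift values, one per decile, best-scored decile first."""
    # Rank members from highest to lowest predicted probability.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    if n < 10:
        raise ValueError("need at least ten members to form deciles")
    overall_rate = sum(flag for _, flag in ranked) / n
    if overall_rate == 0:
        raise ValueError("need at least one responder")
    lifts = []
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        group_rate = sum(flag for _, flag in group) / len(group)
        lifts.append(group_rate / overall_rate)
    return lifts


# Hypothetical rule with confidence 0.6 against a benchmark of 0.5.
rule_lift = lift(0.6, 0.5)

# Twenty hypothetical customers: model scores and whether each one responded.
scores = [i / 20 for i in range(20, 0, -1)]
responded = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

for decile, value in enumerate(decile_lift(scores, responded), start=1):
    print(f"decile {decile}: lift {value:.1f}")
```

With equal-sized deciles the ten values average to 1.0, so a good model shows values well above 1.0 in the first deciles and below 1.0 in the last ones.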

Hopefully, this provided a little more insight into how we calculate the value of a model. Next time, I’ll run through a complete example to show how these are calculated in practice.

Measuring a Predictive Model’s Email Marketing Results - Part I

Monday, July 21st, 2008

Istobe develops predictive models that recommend which products to market to customers via email and which are the best times to market those products. But how does Istobe measure the actual ROI returned by these models? The Istobe team burns many cycles discussing measurement techniques for the lift that we are delivering to our clients. And we’re constantly updating the formulae that we use to evaluate how our predictive models actually perform in production. Ultimately, the measured lift that we generate is the result of another model where we tie in the relevant factors according to different weights. What are the relevant factors? Read on.

Our model vs. current practice or our model vs. the naive approach
This actually isn’t a debate among us but it’s the most important part of understanding what kind of monetary benefit we’re actually delivering to the customer. Oftentimes, a model’s output will simply deliver lift in contrast with the naive approach. That is, the model will assume that our client is, at worst, merely flipping a coin in terms of the next-best product for their customer. Or, at best, the model assumes that the client’s customers will likely want the most popular product. So our models self-reflexively examine their benefit against these two benchmarks. However, when it comes time to actually measure how much better our model is, we always measure against our clients’ current practices. The assumption is that our clients already have a smart strategy for targeting their customers. So we get their rules for targeting their customers and then figure out how much better our models are at generating the right type of product offering.

Our model’s email timing vs. typical email timing
Email timing is starting to get a lot of traction at Istobe these days. After all, if the email is never opened then it doesn’t matter if the product that our clients are offering is a better fit for a set of customers or not. And there are better and worse times to send emails if you want them to be opened. So we take into account the timing that we suggest vs. the normal send times of these emails. Basically, timing is just another part of our models’ output. The models take into account the whole path for purchasing a product and getting an email to the right person at the right time is the first step in that process. When we track the Istobe improvement, we build email open rate into our evaluation and track how much lift we give our clients by understanding how many more opens and click-throughs our models were responsible for.

That’s about enough for today but I’ll talk about two other evaluation factors on Thursday that are a little more arcane: Email influence zone and opt-out rate.

Does your email response rate depend on how many emails you send?

Thursday, July 17th, 2008

Maybe. But I can guarantee your revenue per customer does. And not in the way that you might believe.  There is strong evidence that reducing email in an intelligent way actually increases your revenue per customer.

Just yesterday one of my colleagues asked me whether, in addition to the weekly timing of an email send, the quantity of emails sent to one person mattered. In other words, is there a limit to the email offers that a marketer should send? The intuitive answer is: of course. If we look at catalogs alone, consumer dissatisfaction with this method of direct marketing is at an all-time high. After all, no fewer than six websites have sprung up that allow consumers to opt out of catalogs. You’d have to have a powerful argument for me to believe that overzealous emailers are perceived any differently than overzealous catalogers.

My partner Doug Bright has already spent some time fleshing out this hidden cost of excessive email. So I’ll just add some more beef to his already meaty argument. In March 2006, noted marketing researcher Dr. V. Kumar, along with Rajkumar Venkatesan and Werner Reinartz, came out with an article entitled “Knowing What to Sell, When, and to Whom.” The abstract is available at the Harvard Business Review. The article is utterly fantastic; you should get hold of it.

What does this have to do with overemailing? Well, at the end of the article, the authors reveal an interesting, yet tangential, finding about email in their research. They found that purchase increases were tied to marketing communication in a strange way. It was not linear. In other words, more communication did not continually yield more purchasing. Instead, the authors found that above a certain threshold of communication, customers were put off. To quote the authors, “Clearly, many companies may be actively damaging their customer revenues in attempts to make sure that no opportunity for a sale is missed.”

The upshot is that they found that a data-driven approach to reducing marketing communication leads to “not only lower costs but to a revenue increase per customer.” When they then tested this hypothesis using data-driven models and A/B testing at two client sites, the reduced communication strategy outperformed the traditional “blast ‘em” approach on both occasions. How much did it outperform the “blast ‘em” approach? I’m glad you asked, because these are the truly staggering numbers. For the B2B firm they worked with, $1600 of additional revenue per customer translated into a potential $320 million in additional profit. Now the cynical might say that this was mostly a reduction in cost. And I would have to admit that’s true. However, what the authors found was that the revenues for all product groups still increased, meaning that customers were spending, on average, $365 more with the reduced communication schedule. Similarly, at the financial services firm they worked with, the authors found an increase of $400 per customer using this data-based communication schedule.

To me, these results are unequivocal: sending too many emails not only is a waste of time and labor, it also hampers your sales. We all know it’s tempting to equate activity with results. But it may be better to turn your attention toward an intelligent use of your data to figure out who you really need to email and how many times you should email them.

How Day to Day Data Becomes Predictive Intelligence

Wednesday, June 11th, 2008

Although predictive analytics systems have become more popular in the last couple of years, the term and the systems themselves still have a great deal of mystery surrounding their definitions and operations. In this post I will reveal the best-practices-based process we follow when delivering our predictive analytics solution in an effort to remove some of that mystery.

Let me set the stage by defining what predictive analytics is and what information is needed. As the name suggests, predictive analytics systems attempt to forecast trends and behavior based on historical information. Essentially, they predict what will happen given past experience. A good marketing example is product bundling or cross-selling. If many customers are buying both a Blu-ray player and a Spider-Man DVD, then the predictive analytics system will report the correlation and possibly drive a new campaign to offer a movie-player bundle.

Not surprisingly, a predictive analytics solution is built on a foundation of data, specifically operational data. Operational data is a collective term having several definitions but for now we will define it as any data originating from a business operations system. Customer order information, on-line shopping activity and direct-mail responses are all examples of operational data.

There you have it. Predictive systems use your operational data to prophesy the future. In our case we are forecasting customer-specific trends and predicting how your customers will behave in various marketing scenarios. Now that we know what we are dealing with, let’s get into how your data is turned into a valuable analytics solution. The first step involves finding the right data to work with.

Data Selection and Retrieval

As you can imagine, even a small business generates vast amounts of operational data, so we must filter out the noise by locating and identifying the data relevant to our predictive analytics solution. Just like preparing to buy groceries, this step requires a human to review the available data sources (on-line traffic logs, historical orders, and customer portfolios) and then grade each data source by fidelity and quality. The data grading checklist used by Istobe is too comprehensive to discuss in this post, but here are some example questions to help you do the same:

  • Is the data redundant (e.g., do multiple account or customer numbers exist)?
  • Is the data updated by a human or a machine?
  • What is the data’s lifetime? Or how long does the data stay intact?
  • If the data is related to another source how is the relation made?
  • Does the data drive any business decisions or is it directly used in any reports?

After each data source is graded we can start to figure out what to keep and how to improve it. For the data sources that we want to keep, it is usually necessary to filter out dirty data by running it through a cleansing process. You may be surprised to hear that your data is probably very dirty, but dirty data exists even in third-party systems. Imagine these scenarios and you should get a feel for the hundreds of other ways dirty data can get inserted into your data sources:

  • Users trying out new features in a CRM system
  • Test data inserted for quality control
  • Data entry errors
  • Historical data that was updated but never removed
  • System upgrades or merges

In its most basic form the cleansing process sets out to eliminate the dirt by:

  • Standardizing specific values (e.g., date and time formats)
  • Removing duplicate information
  • Removing inconsistent data (e.g., orders which were never completed)
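The three steps above can be sketched in a few lines. This is a simplified illustration, not Istobe’s cleansing process: the record fields (order_id, customer_id, order_date, status), the accepted date formats, and the choice to drop rows with unreadable dates are all assumptions.

```python
from datetime import datetime

# Hypothetical date formats found across the source systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def standardize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def cleanse(orders):
    """Standardize dates, drop incomplete orders, and remove duplicates."""
    seen = set()
    clean = []
    for order in orders:
        # Inconsistent data: orders that were never completed.
        if order.get("status") != "completed":
            continue
        order_date = standardize_date(order["order_date"])
        if order_date is None:
            continue  # assumption: unreadable dates are dropped, not repaired
        # Duplicate information: keep the first copy of each order.
        key = (order["order_id"], order["customer_id"])
        if key in seen:
            continue
        seen.add(key)
        clean.append({**order, "order_date": order_date})
    return clean


raw_orders = [
    {"order_id": 1, "customer_id": "C1", "order_date": "2008-06-11", "status": "completed"},
    {"order_id": 1, "customer_id": "C1", "order_date": "06/11/2008", "status": "completed"},
    {"order_id": 2, "customer_id": "C2", "order_date": "06/10/2008", "status": "abandoned"},
    {"order_id": 3, "customer_id": "C3", "order_date": "06/09/2008", "status": "completed"},
]

cleaned = cleanse(raw_orders)
for row in cleaned:
    print(row["order_id"], row["order_date"])
```

In practice the duplicate key and the definition of an incomplete order would come out of the data grading step, since they differ from one source system to the next.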

The Data Selection and Retrieval phase is the most intrusive (as it requires collaboration between the custodian(s) of the data and the group building the predictive analytics system) but it is also the most important as it sets the foundation from which everything else is built.

In the next post I will discuss how the cleansed data is used in the Knowledge Creation phase.