Loyalty Program Metrics

a.k.a. Membership Program Metrics. I needed inspiration for some analysis I’m doing around member retention right now, and thought I’d start by glancing at some good ol’ fashioned listicles.

Love this list of metrics to track around loyalty programs, which I’m adapting to member retention (source: http://blog.cxloyalty.com/9-rewards-program-metrics-you-should-be-tracking). My comments follow the quoted text in each item:

  1. Activation rate (a.k.a. acquisition rate): the number of new members joining your loyalty program. Even longstanding rewards programs should be adding new members periodically. Low or drastically declining activation rates indicate a troubled program.
  2. Usage rate (earning velocity): Are members using the program? Keep track of how often they make purchases or perform other activities to earn points[…]. My take: for us, this would translate into two questions: are members using their discounts, and are they opening member e-zines?
  3. Time to first spend: how long it takes new members to spend their points (hint: the shorter the better). New members spending points quickly is a good indicator of high purchase frequency (more purchases = more points to spend) and that they see value in the program. My take: again, a pretty direct correlation here. While our members don’t accrue points, we do have a time to first spend, measured from their membership purchase or renewal. I would also be interested in a time to second spend, since a lot of people join or renew specifically to get discounts on a particular performance, so that first spend is automatic. The real test is how long it takes them to use the membership again.
  4. Frequency: refers to both current and desired member behaviors and is used to set program benchmarks and goals. Current behaviors are what your customers are doing now, such as frequency of purchase or frequency of interaction. Desired behaviors are what you want members to do, like “increase frequency of visits, spend or engagement by 20%.” My take: How often are members buying tickets? How often are they opening their emails?
  5. Breakage rate: the percentage of points issued that are unspent (divide the total number of points unspent by the total points ever issued). Breakage is inevitable, but a rate higher than 50% is a red flag that your members aren’t using the program (read: aren’t engaged). My take: I don’t think this one is directly applicable to us, but I’m interested in creating a proxy for breakage rate: a check-in at 3, 6, 9, and 12 months after the membership is purchased, at which we calculate what percentage of the membership’s “worth” the member has used. If they bought a $65 membership, have they gotten their money’s worth in discounts and free admission yet? Filing this away in a “maybe set this up later” ideas bank. 🙂
  6. Churn rate (or attrition rate): the rate at which customers leave a business, or in this case, a rewards program. Long periods of inactivity (usually 18 months) or cancellations are measures of negative churn. My take: Yep.
  7. Inactivity rate: a measure of the time elapsed since a member’s last interaction with the program. Keep an eye on inactivity rates; prolonged inactivity leads to negative churn. My take: similar to frequency, but measured in time rather than number of events. This is pretty novel, and something I’d like to set up.
  8. Redemption: Usage rate is important, but if your members are holding onto all their points (read: not redeeming), they’re not engaging. Redemption is the “aha” moment when a member cashes in hard-earned points for something they want; it’s when they experience the value in your program and your brand. My take: not sure this is applicable, since again, we don’t use points.
  9. Advocacy: Beyond your Net Promoter Score (which measures whether someone would recommend your business), social sites like Yelp and Foursquare let you go a step further and see whether someone actually did recommend your business and what they had to say about it. You can also monitor what people are saying about you and track brand engagement with the analytics tools built into social sites like Facebook, Twitter, and LinkedIn. These candid comments can help keep positive momentum going among your fans’ networks, or help you understand where improvements can be made. My take: great idea. I’d love to work with our social media department to measure whether our members are advocating for us on Facebook and Twitter.
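To make a few of these concrete for a points-free membership, here’s a minimal Python sketch. Breakage follows the post’s formula; everything else is my own assumption, not from CX Loyalty: the function names, the sample dates, the hypothetical `worth_used` proxy from my breakage note, and approximating “18 months” of inactivity as 548 days.

```python
from datetime import date
from typing import List, Optional


def breakage_rate(points_unspent: int, points_issued: int) -> float:
    """Unspent points divided by total points ever issued."""
    if points_issued == 0:
        raise ValueError("no points issued yet")
    return points_unspent / points_issued


def worth_used(value_redeemed: float, membership_price: float) -> float:
    """Hypothetical breakage proxy: share of the membership price 'earned back'
    so far in discounts and free admission."""
    return value_redeemed / membership_price


def days_to_nth_spend(joined: date, spend_dates: List[date], n: int) -> Optional[int]:
    """Days from purchase/renewal to the member's n-th use (1 = first, 2 = second)."""
    later = sorted(d for d in spend_dates if d >= joined)
    if len(later) < n:
        return None  # they haven't used the membership n times yet
    return (later[n - 1] - joined).days


def days_inactive(last_interaction: date, today: date) -> int:
    """Time elapsed since the member's last interaction, in days."""
    return (today - last_interaction).days


def is_churned(last_interaction: date, today: date, threshold_days: int = 548) -> bool:
    """Flag a member as churned after ~18 months (assumed: 548 days) of inactivity."""
    return days_inactive(last_interaction, today) > threshold_days


# Usage with made-up numbers:
joined = date(2018, 1, 10)
spends = [date(2018, 1, 12), date(2018, 4, 3)]
print(breakage_rate(4_000, 10_000))                 # 0.4
print(round(worth_used(40.0, 65.0), 3))             # 0.615 of a $65 membership used
print(days_to_nth_spend(joined, spends, 1))         # 2
print(days_to_nth_spend(joined, spends, 2))         # 83
print(days_inactive(spends[-1], date(2018, 6, 1)))  # 59
```

Returning `None` when there’s no second spend yet keeps “hasn’t come back” separate from “came back on day 0.”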

Aim for the “bell curve”
The best way for loyalty program managers to track program success is to use all the program data and loyalty metrics listed above. A quick snapshot of a successful rewards program should look like a bell curve: the left (lower) end of the curve represents low-usage or inactive members, the middle is engaged users, and the right end is super-users. If most of your users are on the left end of the curve, your program is in trouble.
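A rough way to check where members sit on that curve is to bucket them by how often they interact and see what share lands at the low end. This is a Python sketch under loud assumptions: the cutoffs (0 to 1 interactions = low, 2 to 9 = engaged, 10 or more = super-user) are made-up placeholders, and “most of your users” is read as more than half.

```python
from collections import Counter
from typing import Dict, List


def usage_segment(interactions: int) -> str:
    """Hypothetical cutoffs; tune these to your own program."""
    if interactions <= 1:
        return "low"        # left end: low-usage or inactive
    if interactions < 10:
        return "engaged"    # middle of the curve
    return "super"          # right end: super-users


def segment_shares(interactions_per_member: List[int]) -> Dict[str, float]:
    """Fraction of members in each segment."""
    total = len(interactions_per_member)
    if total == 0:
        raise ValueError("no members to segment")
    counts = Counter(usage_segment(n) for n in interactions_per_member)
    return {seg: counts[seg] / total for seg in ("low", "engaged", "super")}


def program_in_trouble(interactions_per_member: List[int]) -> bool:
    """'Most of your users are on the left end', read as more than half."""
    return segment_shares(interactions_per_member)["low"] > 0.5


members = [0, 0, 1, 3, 4, 5, 6, 8, 12, 15]  # made-up yearly interactions per member
print(segment_shares(members))      # {'low': 0.3, 'engaged': 0.5, 'super': 0.2}
print(program_in_trouble(members))  # False
```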


Coursera Data Science Math Skills

Today in my data math class, we reviewed the Pythagorean theorem (an old favorite from middle school). In a right triangle, the length of the hypotenuse z (the long, diagonal side) can be found from the lengths of the other two sides, x and y, with this formula:

$$z^2 = x^2 + y^2$$
also known as:
$$z = \sqrt{x^2 + y^2}$$

For use between two points $A(x_1, y_1)$ and $B(x_2, y_2)$:
$$\text{distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

This can be used to find the distance between two points on a graph. For example, take PointA(4,8) and PointB(6,9). The distance between A and B is $\sqrt{(6-4)^2 + (9-8)^2} = \sqrt{4 + 1} = \sqrt{5} \approx 2.24$.
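The same calculation in Python, written out from the formula and checked against `math.dist` from the standard library (which needs Python 3.8 or later):

```python
import math


def distance(a, b):
    """Euclidean distance between points a = (x1, y1) and b = (x2, y2)."""
    (x1, y1), (x2, y2) = a, b
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


point_a = (4, 8)
point_b = (6, 9)
print(distance(point_a, point_b))   # sqrt(5), about 2.236
print(math.dist(point_a, point_b))  # same result from the standard library
```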

This is very applicable to data science, because of what you can do with those distances:

Nearest Neighbors:

When you have multiple points on a graph, you can calculate the distances between all of them, then put the other points in order of nearest neighbors. Say you have 4 points A, B, C, D, and A is closer to B than to C, and closer to C than to D. The nearest neighbor to A is B, the next is C, and the next is D. This idea is used a lot in machine learning (it’s one of the main methods in supervised learning): “If A had to be most like one of these, it would be B.”


Clustering:

When you have a lot of points near each other on a graph, you can group them loosely into a cluster. Distance is a good way of expressing membership in a cluster: points in the same cluster have small distances from each other, relative to their distances from points in other clusters.
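Here’s a sketch of “distance as cluster membership” in Python: each point joins the cluster whose center is nearest. This is the assignment step of k-means, a technique I’m naming here (my notes don’t); in real k-means the centers are learned by repeating this step and re-averaging, whereas here the two centers and the points are simply made up.

```python
import math

# Made-up cluster centers and points.
centers = {"cluster 1": (1, 1), "cluster 2": (8, 8)}
points = [(0, 1), (2, 2), (7, 9), (9, 8), (1, 0)]


def assign(point, centers):
    """Return the name of the cluster whose center is nearest to the point."""
    return min(centers, key=lambda name: math.dist(point, centers[name]))


for p in points:
    print(p, "->", assign(p, centers))
```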


Calculating slope is different from calculating distance. The formula for two points $A(x_1, y_1)$ and $B(x_2, y_2)$:

$$m = \frac{y_2 - y_1}{x_2 - x_1}$$
slope = (change along the y-axis) / (change along the x-axis)
slope = RISE/RUN

Handy for determining whether a given point (other than the two you started with) will lie on the line if the line continues along its current trajectory.

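Slope-Intercept formula:

Finding the unique point where the line intercepts the y-axis, (0, b)

slope-intercept formula (equation to determine a line):
$$y = mx + b$$
(x, y) = coordinates on the line
m = the slope
b = the y-value of the y-intercept

The related point-slope form, $y - y_1 = m(x - x_1)$, describes the same line using any known point $(x_1, y_1)$ on it.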
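Putting slope and y = mx + b together gives the “will this point be on the line?” check. A minimal Python sketch, assuming integer coordinates so `Fraction` keeps the arithmetic exact (my choice, to avoid floating-point surprises); it reuses PointA(4,8) and PointB(6,9) from the distance example.

```python
from fractions import Fraction


def slope(a, b):
    """m = (y2 - y1) / (x2 - x1); assumes integer coordinates."""
    (x1, y1), (x2, y2) = a, b
    if x2 == x1:
        raise ValueError("vertical line: slope is undefined")
    return Fraction(y2 - y1, x2 - x1)


def intercept(point, m):
    """Solve y = m*x + b for b, using a known point on the line."""
    x, y = point
    return y - m * x


def on_line(point, m, b):
    """True if the point satisfies y = m*x + b exactly."""
    x, y = point
    return y == m * x + b


A, B = (4, 8), (6, 9)
m = slope(A, B)       # 1/2: rise of 1 over a run of 2
b = intercept(A, m)   # 6, so the line is y = x/2 + 6
print(on_line((10, 11), m, b))  # True
print(on_line((8, 9), m, b))    # False
```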

Supervised learning is all about figuring out what the function is, given a series of input/output pairs, so you can then use that function to predict the outputs for new, unseen inputs. So functions are incredibly powerful in machine learning.
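One way to “figure out the function” from input/output pairs is ordinary least squares for a straight line, y = mx + b. That’s my choice of technique for illustration; the course notes don’t say how the function is found. The training pairs below are invented so they fall exactly on y = 2x + 1.

```python
def fit_line(pairs):
    """Least-squares slope m and intercept b for (input, output) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


training = [(1, 3), (2, 5), (3, 7), (4, 9)]  # invented input/output pairs
m, b = fit_line(training)
print(m, b)       # 2.0 1.0
print(m * 5 + b)  # prediction for the unseen input 5: 11.0
```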


I’m taking this Coursera course right now; it was recommended to me by a peer at a CRM conference I went to recently. The class is a refresher on probability, statistics, and other basic math skills needed to work in data science.


I got a lot of value out of the end of the Week 1 lessons, when they visualized variance. Variance measures how spread out the data points are. You calculate it by figuring out how far each number is from the mean, squaring those differences, and averaging them. This is important because you can have two data sets with exactly the same mean but very different inner relationships.

For example, x = {1, 5, 12} and y = {5, 6, 7} both have a mean of 6. But if you plot them on a number line, you see that all the data points in y are clustered together, while the data points in x are spread far apart from each other.
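The x and y example in Python: same mean, very different variance. `population_variance` is my own helper that spells out the “average squared distance from the mean” recipe; the standard library’s `statistics.pvariance` computes the same thing, while `statistics.variance` is the sample version that divides by n − 1.

```python
import statistics


def population_variance(values):
    """Average squared distance from the mean (divide by n)."""
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


x = [1, 5, 12]
y = [5, 6, 7]
print(statistics.mean(x), statistics.mean(y))  # both means are 6
print(population_variance(x))  # (25 + 1 + 36) / 3, about 20.67
print(population_variance(y))  # (1 + 0 + 1) / 3, about 0.67
print(statistics.pvariance(x), statistics.variance(x))  # population vs. sample version
```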

Loved that visualization. It really helps cement a concept I knew was important but couldn’t quite put my finger on (yes, this makes me sound like an idiot; I’m okay with sounding like an idiot in the name of feeling joy from learning).

Naive Bayes Intro to Python Class

Took a class at Galvanize tonight.

source: https://www.analyticsvidhya.com/blog/2015/09/naive-bayes-explained/

P(a): the probability event “a” occurs
P(a,b): probability events “a” and “b” both occur
P(a|b): probability event “a” occurs, given that event “b” occurs

P(a,b) = P(b,a)
P(a|b) ≠ P(b|a) (in general)

Naive Bayes: in this class, the example is “Given a section of text, what is the probability that it came from BuzzFeed, the New York Times, or Fox?” It only works one way: you can’t use the same classifier to determine which words are in a piece of text given the news source.



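Using the definitions above and simple algebra (both sides of the first line equal P(a,b)):

$$P(a \mid b)\,P(b) = P(b \mid a)\,P(a)$$

$$P(a \mid b) = \frac{P(b \mid a)\,P(a)}{P(b)}$$

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$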
Prior belief (prior knowledge about the world): this is what sets Bayesian probability apart.

I’m working with open-answer survey data right now and am trying to figure out how to tag each response. This could help with that (given the words they used, how likely is this person to have given a “9” in the NPS?). Still to determine: what kinds of things does Naive Bayes work for, and what does it not work for?

You can use this in medical testing (I was just going over this in the Coursera math class yesterday!). The idea of false negatives and false positives is strongly at work:

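Probability that you have a condition, given that you tested positive for it:

$$P(\text{cond} \mid \text{pos}) = \frac{P(\text{pos} \mid \text{cond})\,P(\text{cond})}{P(\text{pos})} = \frac{P(\text{pos} \mid \text{cond})\,P(\text{cond})}{P(\text{pos} \mid \text{cond})\,P(\text{cond}) + P(\text{pos} \mid \text{no-cond})\,P(\text{no-cond})}$$

The second form expands P(pos): you can test positive either because you have the condition (a true positive) or because you don’t (a false positive).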
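Plugging numbers into that formula shows why false positives matter so much. The prevalence (1%), sensitivity (90%), and false positive rate (5%) below are made-up illustration values, not from the class:

```python
def p_condition_given_positive(prior, sensitivity, false_positive_rate):
    """
    Bayes' rule for a diagnostic test.
    prior               = P(cond)
    sensitivity         = P(pos | cond)
    false_positive_rate = P(pos | no-cond)
    """
    numerator = sensitivity * prior
    evidence = numerator + false_positive_rate * (1 - prior)  # P(pos)
    return numerator / evidence


print(round(p_condition_given_positive(0.01, 0.90, 0.05), 3))  # 0.154
```

Even with a fairly accurate test, a positive result here means only about a 15% chance of actually having the condition, because the condition is rare to begin with.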

Machine learning! What we’re learning tonight is an example of supervised learning (as opposed to unsupervised learning). Unsupervised learning is the kind of clustering I’ll want to do (given all this customer data, let’s have the machine create clusters of types of users). This class is about tagged data. I’m thinking it would be very helpful to the arts field to build an unsupervised ML algorithm that takes untagged data and makes predictions about it, given ticket history, etc.

Naive Bayes: P(Class | Feature)

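So, for one feature (e.g., the word “quinoa”):

$$P(\text{class} \mid \text{feature}) = \frac{P(\text{feature} \mid \text{class})\,P(\text{class})}{P(\text{feature})}$$

Ignoring the denominator for a minute, the numerator is estimated from counts:

$$P(\text{feature} \mid \text{class})\,P(\text{class}) \approx \frac{\#\,\text{feature in class}}{\#\,\text{total in class}} \times \frac{\#\,\text{in class}}{\#\,\text{total data points}}$$

In words: (number of “quinoa”s in NYT articles / total words in NYT articles) × (number of NYT articles / total number of articles).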

After you run all of your classes through this, your classifier chooses a “winner.”

For multiple features (e.g., has the word “quinoa,” was posted after midnight, mentions Obama 3 times, has 2 or more paragraphs, was shared 30 times, etc.):

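The numerator, P(features | class) × P(class), is the “joint” probability $P(c, F_1, F_2, \ldots, F_n)$, which the chain rule expands as:

$$P(c, F_1, F_2, \ldots, F_n) = P(c)\,P(F_1 \mid c)\,P(F_2 \mid c, F_1)\,P(F_3 \mid c, F_1, F_2) \cdots$$

This could go on forever and ever and ever! Such a pain…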

But if we assume the features are independent given the class (e.g., the number of shares doesn’t correlate with the number of “quinoa”s, given NYT), which is the Naive part:

$$P(F_i \mid c, F_j) = P(F_i \mid c)$$

then the joint probability simplifies to as many factors as you need:

$$P(c, F_1, \ldots, F_n) = P(c)\,P(F_1 \mid c)\,P(F_2 \mid c)\,P(F_3 \mid c) \cdots$$

In words: probability of the class × probability of “quinoa,” given NYT × probability of “Obama,” given NYT × …

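Now that we can calculate the numerator for one class, we calculate it for every class and compare them. The denominator P(features) is the same for every class, so it doesn’t change the ranking and we can skip it. The largest numerator belongs to the most likely class, and therefore the class we should predict.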

For each class c:
pseudo_prob = P(c) * P(F1|c) * P(F2|c) * P(F3|c) * …

“Training” is the process of counting up occurrences (all the little terms in “(#feat in class / #total in class) * (#class / #total data points)”) so we can use them to predict. In many other models and algorithms, training is much, much more complicated.
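Here’s a minimal sketch of that count-then-compare idea in Python. The toy corpus is invented, and two details are my additions rather than anything from the class: I add 1 to every word count (Laplace smoothing) so a word never seen in a class doesn’t zero out the whole product, and I sum logs of the probabilities instead of multiplying them, which picks the same winner but keeps the numbers from getting too tiny to store.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (class_label, list_of_words). Training is just counting."""
    class_counts = Counter()            # number of articles per class
    word_counts = defaultdict(Counter)  # how often each word appears in each class
    for label, words in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab


def predict(words, class_counts, word_counts, vocab):
    """Score every class and return the one with the largest (log) numerator."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log P(c)
        for w in words:
            # log P(w | c), with add-one smoothing (my addition, not from the class)
            p = (word_counts[label][w] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy corpus: (source, words in the article)
docs = [
    ("NYT", "quinoa recipe farm policy".split()),
    ("NYT", "senate policy quinoa".split()),
    ("Buzzfeed", "quiz cats listicle".split()),
    ("Buzzfeed", "cats quiz quinoa".split()),
]
model = train(docs)
print(predict("quinoa policy".split(), *model))  # NYT
print(predict("cats quiz".split(), *model))      # Buzzfeed
```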




My Current Education Path

R Programming in Datacamp — because it’s highly rated on datasciguide, there’s a lot of free content, and I got the first month for $9.99. I like the format, but the teaching style leaves a little to be desired. Still, I’m going to get through the Data Analyst track, and gain what I can from it, and then dive back in elsewhere.

Data Science Math Skills on Coursera — because it was recommended to me by a peer at the Tessitura conference, because it’s free, and because I keep running into the question “I think I understand the theory behind this correlation equation, but what if I’m doing it wrong?”

I’m also listening to data science (and R specifically) legend Roger Peng’s podcast about data science, called Not So Standard Deviations. I’m a big fan of podcasts like this because it helps to hear the vocabulary of this field, which is relatively new to me, used in context. So when someone casually drops the idea of “K Nearest Neighbor” (a machine learning algorithm I’ve been introduced to but don’t quite understand yet) while discussing AI technologies currently on the market, it helps me contextualize the world a little better. Other data podcasts I’ve listened to and can recommend include:

  • Partially Derivative — casual shop talk about data science ideas popping up in the media
  • Becoming a Data Scientist — interviews with data experts in various fields about how they got where they are today, what tools they prefer to use, and what tips they have for newbies in the field
  • Heap Analytics — similar to the above. interviews with tech industry data people about their work

What’s Next

  • read the book The Cartoon Guide to Statistics, which was recommended to me several times over at the conference I attended last week
  • take the Udemy course “Machine Learning A-Z: Hands on Python and R in Data Science.” Again, I jumped on a sale and purchased this class on the cheap. But it also combines my 3 major interests right now: firming up my foundational R skills, getting introduced to Python, and dipping my toes into machine learning.
  • attend various data meetups, on topics ranging from programming with the Alexa API to how to get started in R.


In Progress DA Starter Guide

Summer’s Recommended Data Analytics Starter Guide
–free course to figure out how to install R on your computer
–swirl package to get started
–Datacamp for basic/intermediate
——find a good tidyverse lesson
–Becoming a Data Scientist (podcast)
–Heap Analytics (podcast)
–Roger Peng’s podcast…
–find data events near you!

9 Social Media Metrics to Monitor

I attended a webinar today through AMA on social media metrics to monitor, since we’re always trying to figure out what to cull from the giant data resources that are Facebook Ads and Google Analytics. Some notes:

  • you must understand the WHY behind the WHAT
    • what = sharing, net sentiment, passion, over time, vs. competition
    • why = understand consumers, opinions, emotions, behaviors, themes

They propose 9 metrics:

  1. Mentions
    • # of mentions (total and by channel)
    • whether they are positive or negative (I just had a vendor call today about a BI tool that uses its own algorithm to determine the tone of each mention!)
    • how passionate the mentions are
  2. Engagement
    • where and when are consumers TALKING TO and ABOUT your organization on social media? Why? Because you want to respond
    • different types of engagement:
      1. owned = people responding to your posts;
      2. partnered = artists + clients talking about you;
      3. earned = people talking about you independent of any posts you’ve made (the most important channel)
  3. Sentiment
    • what your consumers like and dislike about you, which lets you concur or commiserate with their opinions. If people love you, you want to know why, and vice versa
    • lots to measure/pay attention to here: volume of opinions, what’s the attitude? who is driving the conversation (are they an influencer? where are they posting?), how strongly do they feel? how big is it in relation to other conversations/topics?
    • love some of the methods they suggest for measuring sentiment:
      • word clouds
  4. Brand Passion
    • how people describe you, and the energy behind the words they’re using: is your organization “ok” or “good” instead of “awesome”?
    • learn what excites your customers most and leverage those insights to increase their loyalty (and find the right pricepoint)
  5. Detractors
    • spot detractors early on. mitigate issues proactively by engaging with detractors
  6. Influencers
    • who are your biggest fans? they deserve to be recognized.
    • how passionate are they? which channels are they on? how influential are they (how many followers do they have?)
  7. Content
    • not all content will resonate with your audience, which is why you need multiple approaches to content: segment your audiences and provide content specific to each segment
    • are informational articles, funny memes, or Instagram stories important to your audience? which content drives the most engagement, and where?
    • segment, segment, segment
    • which content is doing best, how to market it to the segments it’s doing best with
  8. Channel
    • which channel is making the most impact.
    • also look at quality of traffic — if visitors “bounce” quickly or don’t convert or aren’t as relevant to you, that channel may not be worthwhile
  9. Share of Voice
    • are you being mentioned more than your competitors?
    • it’s not enough to know how your brand is doing; you’ve got to know where you stand against other brands in your industry. Monitoring has to include mentions of your top competitors so you understand where you fall in the hierarchy
    • look at what people are saying about other organizations, since those same opinions might be actionable for your organization. E.g., is a museum next door being blasted for having no gender-neutral bathrooms or for rude guards? Evaluate your own bathrooms and guards so that if that same person came to you, they wouldn’t have the same problem
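Mentions by channel and share of voice are simple tallies once you have a log of mentions. Here’s a Python sketch, assuming a hypothetical list of (brand, channel) pairs exported from whatever monitoring tool you use; the brands and counts are invented, and share of voice is taken as our mentions divided by all tracked mentions (ours plus competitors’).

```python
from collections import Counter

# Hypothetical mention log: (brand, channel) pairs from a monitoring tool.
mentions = [
    ("us", "Facebook"), ("us", "Twitter"), ("us", "Facebook"),
    ("museum next door", "Twitter"), ("museum next door", "Facebook"),
]


def mentions_by_channel(mentions, brand):
    """Count a brand's mentions per channel."""
    return Counter(channel for b, channel in mentions if b == brand)


def share_of_voice(mentions, brand):
    """A brand's mentions as a fraction of all tracked mentions."""
    total = len(mentions)
    ours = sum(1 for b, _ in mentions if b == brand)
    return ours / total if total else 0.0


print(mentions_by_channel(mentions, "us"))  # Counter({'Facebook': 2, 'Twitter': 1})
print(share_of_voice(mentions, "us"))       # 0.6
```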


For Loops vs. While Loops

I’m taking the Intermediate R class in Datacamp right now, and it feels like while loops and for loops do pretty much the same thing; it’s just that their syntax is a little different.

So I looked around a bit on Google to see if anyone could explain the difference and I like this example: https://www.quora.com/What-is-the-difference-between-while-loop-and-for-loop-What-is-an-example-of-this

  • A while loop is used in situations where we do not know beforehand how many times the loop needs to be executed.
  • A for loop is used where we already know the number of times the loop needs to be executed, typically with an index used in iteration.
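Here’s how that distinction might look in code. My class is in R, but this sketch is in Python (R’s `for` and `while` follow the same logic); the ticket counts and the 10% growth rate are made up.

```python
# for loop: we know up front how many passes we need (one per item).
ticket_counts = [3, 0, 5, 2]  # hypothetical tickets bought per member
total = 0
for count in ticket_counts:
    total += count
print(total)  # 10

# while loop: we repeat until a condition changes, without knowing
# beforehand how many passes it will take.
balance = 100.0
years = 0
while balance < 200:
    balance *= 1.1  # hypothetical 10% growth per year
    years += 1
print(years)  # 8
```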

I feel like programming teachers always gloss over really crucial information like this. It can be hard to take the step outside of the practice examples given and figure out how to apply the program to your own real data and questions. Thank goodness for other curious programmers on Quora, Reddit, and other public forums.
