Codeflix, a streaming video startup, is interested in measuring their user churn rate. In this project, we'll be helping them answer these questions about their churn:
- How many months has the company been operating?
- Which months do you have enough information to calculate a churn rate?
- What segments of users exist?
- What is the overall churn trend since the company started?
- Which segment of users should the company focus on expanding?
First, let's determine the range of months of data provided.
Let's run this query.
SELECT MIN(subscription_start), MAX(subscription_end)
FROM subscriptions;
We'll be calculating the churn rate for both segments (87 and 30) over the first 3 months of 2017 (we can't calculate it for December, since there are no subscription_end values yet).
To get started, create a temporary table of months.
WITH months AS (
SELECT
'2017-01-01' AS first_day,
'2017-01-31' AS last_day
UNION
SELECT
'2017-02-01' AS first_day,
'2017-02-29' AS last_day
UNION
SELECT
'2017-03-01' AS first_day,
'2017-03-31' AS last_day
)
SELECT *
FROM months;
This is what our temporary table looks like.
We will then join this with our existing table
subscriptions.
WITH months AS (
...
),
cross_join AS (
SELECT *
FROM subscriptions
CROSS JOIN months
)
SELECT *
FROM cross_join
LIMIT 5;
This is the joined table looks like. The
subscriptions table on the left, two more columns added for the
months.
Let's check the active users for segments
87 and
30 by creating another temporary table. We'll also create one for cancelled subscriptions for both segment.
WITH months AS (
...
),
cross_join AS (
...
),
status AS (
SELECT id,
first_day AS month,
CASE
WHEN (segment = 87
AND (subscription_start < first_day)
AND(subscription_end > first_day
OR subscription_end IS NULL)
) THEN 1
ELSE 0
END AS is_active_87,
CASE
WHEN (segment = 30
AND (subscription_start < first_day)
AND(subscription_end > first_day
OR subscription_end IS NULL)
) THEN 1
ELSE 0
END AS is_active_30,
CASE
WHEN (segment = 87
AND (subscription_end BETWEEN first_day
AND last_day)) THEN 1
ELSE 0
END AS is_canceled_87,
CASE
WHEN (segment = 30
AND (subscription_end BETWEEN first_day
AND last_day)) THEN 1
ELSE 0
END AS is_canceled_30
FROM cross_join
)
SELECT *
FROM status
LIMIT 5;
Here's the result for this query,
0 means
false AND
1 means
true. This will be useful later in calculating the churn rate.
Let's create another temporary table to calculate the
SUM of active and canceled subscriptions for each segment, for each month.
WITH months AS (
...
),
cross_join AS (
...
),
status AS (
...
)
status_aggregate AS(
SELECT month,
SUM(is_active_87) AS sum_active_87,
SUM(is_active_30) AS sum_active_30,
SUM(is_canceled_87) AS sum_canceled_87,
SUM(is_canceled_30) AS sum_canceled_30
FROM status
GROUP BY month
)
It's now time to find out their churn rates.
WITH months AS (
...
),
cross_join AS (
...
),
status AS (
...
)
status_aggregate AS(
...
)
SELECT month,
(1.0 * sum_canceled_87 / sum_active_87)
AS churn_rate_87,
(1.0 * sum_canceled_30 / sum_active_30)
AS churn_rate_30
FROM status_aggregate;
Here's the result.
Segment 30 has the lowest churn rates for the first three months. That means less people are unsubscribing and that is better because that's how we generate sales. We want people to continue their subscription.
Show Entire Code