SQL大厂面试真题笔记(牛客网)
01 某音短视频
SQL1 各视频平均完播率
1、明确问题
有两张表:用户-视频互动表tb_user_video_log和短视频信息表tb_video_info
计算2021年里有播放记录的每个视频的完播率(结果保留三位小数),并按完播率降序排序
-- ①2021年
--② 每个视频的完播率(保留三位小数)
-- 完播视频即(duration>=end_time-start_time)
-- 完播率 = 完播视频数/视频播放总次数
-- ③保留三位小数
方式1:round(number,3)
-- ④按完播率降序排序(ASC)
2、解题思路
第一步: 表连接,查找出基本信息
第二步: 计算每个视频的完播率
3、SQL语句
我查询时一般习惯一步一步的来,查看得出的结果是否有问题,再进行下一步。
WITH T
SELECT v1.id,v1.video_id,TIMESTAMPDIFF(SECOND,start_time,end_time) "play_time",duration
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE YEAR(release_time) = 2021)
SELECT video_id,
ROUND(SUM(CASE WHEN duration <= play_time THEN 1 ELSE 0 END)/COUNT(id),3) AS "avg_comp_play_rate"
FROM T
GROUP BY video_id
ORDER BY avg_comp_play_rate DESC;
SQL语句优化之后,也可以一步到位,不需要用with...as
SELECT v1.video_id,
ROUND(SUM(IF(TIMESTAMPDIFF(SECOND,start_time,end_time)>=duration,1,0))/COUNT(v1.id),3) AS "avg_comp_play_rate"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE YEAR(release_time) = 2021
GROUP BY v1.video_id
ORDER BY avg_comp_play_rate DESC;
4、题目总结
1)表连接left join
2)case...when...then...end
这里也可以用if
3)时间差函数TIMESTAMPDIFF(interval,start_time,end_time)
interval可以选{second,minute,day,month,year}
4)round(m,d)m指具体的数,d指保留几位。round会四舍五入。
5)group by分组
6)order by降序
7)with new_name as
SQL2 平均播放进度大于60%的视频类别
1、明确问题
有两张表:用户-视频互动表tb_user_video_log和短视频信息表tb_video_info
计算各类视频的平均播放进度,将进度大于60%的类别输出。
-- 播放进度 = 播放时长/视频时长*100%
-- ①按tag分组
-- ②不同tag的平均播放进度
-- 平均播放进度 = avg(播放进度)按tag分组后求平均
-- ③输出平均播放进度大于60%的类别
2、解题思路
第一步 :表连接,查找出基本信息
第二步 :计算平均播放进度
3、SQL语句
解法1:
WITH T AS(
SELECT v1.uid,v2.tag,TIMESTAMPDIFF(SECOND,v1.start_time,v1.end_time)/v2.duration AS play_rate
FROM tb_user_video_log AS v1
LEFT JOIN tb_video_info v2 ON v1.video_id = v2.video_id)
SELECT tag,CONCAT(ROUND(AVG(play_rate)*100,2),"%") AS playtime_rate FROM T
GROUP BY tag
HAVING AVG(play_rate) >0.6;
解法2:
SELECT v2.tag,CONCAT(ROUND(AVG(TIMESTAMPDIFF(SECOND,v1.start_time,v1.end_time)/v2.duration),2)*100,"%") AS playtime_rate
FROM tb_user_video_log AS v1
LEFT JOIN tb_video_info v2 ON v1.video_id = v2.video_id
GROUP BY v2.`tag`
HAVING AVG(TIMESTAMPDIFF(SECOND,v1.start_time,v1.end_time)/v2.duration) >0.6;
4、题目总结
有了SQL1的经验,SQL2做起来,思路会更清晰一些。
1)表连接-- left join
2)concat(str1,str2,....)字符串连接函数,可以将字符串连接成一个字符串。MySQL中是不能直接生成百分数格式的,所以需要用到concat函数。
3)round()四舍五入取小数位
4)TIMESTAMPDIFF时间差函数
5)group by
6)having
7)替换函数
REPLACE(str,from_str,to_str)
substring_index(str,delim,count)
substring_index(“待截取有用部分的字符串”,“截取数据依据的字符”,截取字符的位置N)
SQL3 每类视频近一个月的转发量/率
1、明确问题
有两张表:用户-视频互动表tb_user_video_log和短视频信息表tb_video_info
问题:统计在有用户互动的最近一个月(按包含当天在内的近30天算,比如10月31日的近30天为10.2~10.31之间的数据)中,每类视频的转发量和转发率(保留3位小数)。
PS:这道题的问法本身就存在问题,这里根据解释,理解成最大时间(当天)在内的近三十天算起。
-- ①最近一个月(按30算)
-- ②每类视频(按tag分组)——>group by tag
-- ③每类视频的转发量
-- ④每类视频的转发率(保留3位小数)
转发率 = 转发量/播放量
2、解题思路
第一步 :表连接,筛选所需字段
SELECT v1.video_id,v2.tag,v1.uid,v1.if_retweet
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE TIMESTAMPDIFF(SECOND,v1.start_time,v1.end_time) > 0#筛选出播放的
AND DATEDIFF((SELECT MAX(DATE(end_time)) FROM tb_user_video_log),DATE(end_time)) <= 30#筛选一月内的数据
第二步 :计算转发量和转发率
3、SQL语句
SELECT v2.tag,SUM(if_retweet) AS "retweet_cut",
ROUND(SUM(if_retweet)/COUNT(uid),3) AS "retweet_rate"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE TIMESTAMPDIFF(SECOND,v1.start_time,v1.end_time) > 0#筛选出播放的
AND DATEDIFF((SELECT MAX(DATE(end_time)) FROM tb_user_video_log),DATE(end_time)) <= 30#筛选一月内的数据
GROUP BY v2.tag;
4、题目总结
这道题相对重要的点是WHERE条件子句。
1)时间差函数DATEDIFF(date1,date2)
SQL4 每个创作者每月的涨粉率及截止当前的总粉丝量
1、明确问题
有两张表:用户-视频互动表tb_user_video_log和短视频信息表tb_video_info
问题 :计算2021年里每个创作者每月的涨粉率及截止当月的总粉丝量
注 :
- 涨粉率=(加粉量 - 掉粉量) / 播放量。结果按创作者ID、总粉丝量升序排序。
- if_follow-是否关注 为1表示用户观看视频中关注了视频创作者,为0表示此次互动前后关注状态未发生变化,为2表示本次观看过程中取消了关注。
-- ①2021年
-- ②每个创作者(author)及每个月分组
-- ③计算涨粉率和当月的粉丝总量
2、解题思路
第一步 :表连接,筛选、提取关键字段和数据;
第二步 :计算
这里没有办法同时计算出涨粉率和累计的粉丝总量,所以可以采取子查询分步查询。
3、SQL语句
WITH
AS (SELECT author,DATE_FORMAT(end_time,'%Y-%m') MONTH,
ROUND(SUM(
CASE WHEN if_follow = 0 THEN 0
WHEN if_follow = 1 THEN 1
ELSE -1
END)/COUNT(uid),3) fans_growth_rate,
SUM(
CASE WHEN if_follow = 0 THEN 0
WHEN if_follow = 1 THEN 1
ELSE -1
END) add_fans
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE YEAR(end_time) = 2021#筛选在2021年的数据
AND TIMESTAMPDIFF(SECOND,start_time,end_time) > 0
GROUP BY author,MONTH)
SELECT author,MONTH,fans_growth_rate,SUM(add_fans) over(PARTITION BY author ORDER BY MONTH) total_fans
FROM T
ORDER BY author,total_fans DESC;
解法2:使用if
WITH T AS (
SELECT v2.author,DATE_FORMAT(v1.start_time,"%Y-%m") AS date_month,SUM(IF(if_follow=1,1,IF(if_follow=2,-1,0)))/COUNT(v1.uid) AS fans_growth_rate,SUM(IF(if_follow=1,1,IF(if_follow=2,-1,0))) AS add_fans
FROM tb_user_video_log AS v1
LEFT JOIN tb_video_info AS v2 ON v1.`video_id` = v2.`video_id`
WHERE TIMESTAMPDIFF(SECOND,start_time,end_time) > 0
GROUP BY author,date_month)
SELECT author,date_month,fans_growth_rate,SUM(add_fans) over(PARTITION BY author ORDER BY date_month) AS total_fans
FROM T
ORDER BY author,total_fans;
4、题目总结
1)case...when...then...end
用if也可以。
2)date_format() :时间格式函数,可以将时间转化成特定的格式;
4)with...as 结构的子查询
SQL5 国庆期间每类视频点赞量和转发量
1、明确问题
有两张表 :用户-视频互动表tb_user_video_log和短视频信息表tb_video_info
问题 :统计2021年国庆头3天每类视频每天的近一周总点赞量和一周内最大单天转发量,结果按视频类别降序、日期升序排序。假设数据库中数据足够多,至少每个类别下国庆头3天及之前一周的每天都有播放记录。
-- ①2021年
-- ②国庆头3天(2021-10-01,2021-10-02,2021-10-03)
-- ③每类视频每天(2021-10-01,2021-10-02,2021-10-03)的近一周点赞量和一周内单天最大转发量
-- ④按照视频类别降序、日期升序排序
2、解题思路
我直接是一天一天的算的,牛客网上有另一个思路,是从2021-10-01这个一天开始往前数六行。
用到的窗口函数的 窗口子句 。但我个人觉得这样不是很严谨,加入中间有某一天的数据是空的,那么会导致数据出现问题。(可能是我多虑哈~但最后还是采用最保守,也最严谨的方式。)
第一步 :表连接,提取所需信息
SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like),SUM(if_retweet)
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
GROUP BY v1.video_id,tag,DATE(end_time);
第二步 :计算2021-10-01近一周的点赞量和一周内单天最大转发量
用子查询
SELECT tag,'2021-10-01' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-01',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag;
第三步 :替换时间分别计算2021-10-02和2021-10-03的近一周点赞量和一周内单天最大转发量
在第二步的基础上改一下时间即可;
#2021-10-02
SELECT tag,'2021-10-02' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-02',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag;
#2021-10-03
SELECT tag,'2021-10-03' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-03',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag;
第四步 :union all连接结果。
SELECT tag,'2021-10-01' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-01',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag
UNION ALL
SELECT tag,'2021-10-02' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-02',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag
UNION ALL
SELECT tag,'2021-10-03' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-03',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag;
3、SQL语句
SELECT tag,'2021-10-01' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-01',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag
UNION ALL
SELECT tag,'2021-10-02' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-02',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag
UNION ALL
SELECT tag,'2021-10-03' AS "date",SUM(if_like) total_if_like,MAX(if_retweet) max_if_retweet
(SELECT v1.video_id,tag,DATE(end_time) AS "date",SUM(if_like) AS "if_like",SUM(if_retweet) AS "if_retweet"
FROM tb_user_video_log v1
LEFT JOIN tb_video_info v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF('2021-10-03',DATE(end_time)) BETWEEN 0 AND 6
GROUP BY v1.video_id,tag,DATE(end_time)) T
GROUP BY tag;
4、题目总结
其他的地方和以前没有太大差别
1)UNION ALL :查询结果不排重合并
2)查询中插入新字段并赋值
select 'value' as column_name from table_name;
SQL6 近一个月发布的视频中热度最高的top3视频
1、明确问题
有两张表:用户-视频互动表tb_user_video_log和短视频信息表tb_video_info
问题 :找出近一个月发布的视频中热度最高的top3视频。按热度降序排列;
①近一个月发布的视频,即max(end_time)-release_time<30;
②热度计算公式
热度=(a*视频完播率+b*点赞数+c*评论数+d*转发数)*新鲜度
新鲜度=1/(最近无播放天数+1)
③最近播放日期以end_time-结束观看时间为准,假设为T,则最近一个月按照[T-29,T]闭区间统计。
④结果热度保留为整数,并按热度降序排序。
2、解题思路
第一步 :查找tb_user_video_log最大时间
SELECT MAX(DATE_FORMAT(end_time,'%Y-%m-%d')) T FROM tb_user_video_log;
第二步:查找符合时间范围的视频video_id
SELECT video_id FROM tb_video_info
WHERE DATEDIFF((SELECT T FROM (SELECT MAX(DATE_FORMAT(end_time,'%Y-%m-%d')) T FROM tb_user_video_log) AS t1),DATE_FORMAT(release_time,'%Y-%m-%d')) BETWEEN 0 AND 29;
第三步:在第二步的基础上,计算完播率、转发量、点赞数、评论数、未播放天数
SELECT v1.video_id,SUM(IF(TIMESTAMPDIFF(SECOND,start_time,end_time) >= duration,1,0))/COUNT(uid) play_rate,SUM(if_like) like_cnt,SUM(if_retweet) retweet_cnt,SUM(IF(comment_id IS NULL,0,1))ccomment_cnt,
DATEDIFF((SELECT MAX(DATE_FORMAT(end_time,'%Y-%m-%d')) T FROM tb_user_video_log),MAX(end_time)) no_play_day
FROM tb_user_video_log AS v1
LEFT JOIN tb_video_info AS v2 ON v1.`video_id` = v2.`video_id`
GROUP BY video_id;
第四步:在第三步的基础上计算热度
SELECT v1.video_id,ROUND((100*(SUM(IF(TIMESTAMPDIFF(SECOND,start_time,end_time) >= duration,1,0))/COUNT(uid))+5*(SUM(if_like))+2*(SUM(if_retweet))+3*(SUM(IF(comment_id IS NULL,0,1))))/(1+DATEDIFF((SELECT MAX(DATE_FORMAT(end_time,'%Y-%m-%d')) T FROM tb_user_video_log),MAX(end_time)))) hot_index
FROM tb_user_video_log AS v1
LEFT JOIN tb_video_info AS v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF((SELECT T FROM (SELECT MAX(DATE_FORMAT(end_time,'%Y-%m-%d')) T FROM tb_user_video_log) AS t1),DATE_FORMAT(v2.release_time,'%Y-%m-%d')) BETWEEN 0 AND 29
GROUP BY video_id
ORDER BY hot_index DESC;
3、SQL语句
SELECT v1.video_id,ROUND((100*(SUM(IF(TIMESTAMPDIFF(SECOND,start_time,end_time) >= duration,1,0))/COUNT(uid))+5*(SUM(if_like))+2*(SUM(if_retweet))+3*(SUM(IF(comment_id IS NULL,0,1))))/(1+DATEDIFF((SELECT MAX(DATE_FORMAT(end_time,'%Y-%m-%d')) T FROM tb_user_video_log),MAX(end_time)))) hot_index
FROM tb_user_video_log AS v1
LEFT JOIN tb_video_info AS v2 ON v1.`video_id` = v2.`video_id`
WHERE DATEDIFF((SELECT T FROM (SELECT MAX(DATE_FORMAT(end_time,'%Y-%m-%d')) T FROM tb_user_video_log) AS t1),DATE_FORMAT(v2.release_time,'%Y-%m-%d')) BETWEEN 0 AND 29
GROUP BY video_id
ORDER BY hot_index DESC;
4、题目总结
这个题目难点在于公式太长,要计算的指标太多,理出来就清晰了。
没有涉及到新的知识点。
02 用户增长场景(某度信息流)
SQL1 2021年11月每天的人均浏览文章时长
1、明确问题
问题:统计2021年11月每天的人均浏览文章时长(秒数),结果保留1位小数,并按时长由短到常排序。
①2021年11月每天的人均浏览时长;每天的人均浏览时长=浏览文章时长之和/浏览用户id数
②结果保留1位小数;
③时长由短到长排序;
2、解题思路
先计算每天的文章浏览文章时长之和,计算每日的浏览用户数。
3、SQL语句
SELECT DATE_FORMAT(out_time,'%Y-%m-%d') date_time,ROUND(SUM(TIMESTAMPDIFF(SECOND,in_time,out_time))/COUNT(DISTINCT uid),1) avg_view_len_sec
FROM tb_user_log
GROUP BY date_time
ORDER BY avg_view_len_sec;
4、题目总结
round():小数点取一位
group by:聚合分组
order by :排序
SQL2 每篇文章同一时刻最大在看人数
1、明确问题
①统计每篇文章同一时刻最大在看人数,如果同一时刻有进入也有离开时,
②先记录用户数增加再记录减少,
③结果按最大人数降序。
2、解题思路
这道题的解题思路就是,先对in_time的记录标记为1,out_time的记录标记为-1,然后将两张表结合,按照artical_id和时间进行排序。这样就很明显可以看到同一篇文章,同一时刻再看人数了。
SQL刷题笔记5——【时间区间内的用户存在量】 - 知乎 (zhihu.com)
3、SQL语句
#连接表
SELECT artical_id,MAX(max_uv) max_uv
FROM(
WITH T AS (
SELECT artical_id,in_time dt,1 `status`
FROM tb_user_log
UNION ALL
SELECT artical_id,out_time dt,-1 `status`
FROM tb_user_log)
SELECT artical_id,dt,SUM(`status`) over(PARTITION BY artical_id ORDER BY dt) max_uv FROM T) AS t2
GROUP BY artical_id;
4、题目总结
union all:将个查询结果按行拼接。
group by:聚合分组。
窗口函数。可参考以下文章 SQL学习笔记7——【聚合函数】 - 知乎 (zhihu.com)
注:重点是想清楚,为什么标记了状态才能求出每一时刻同时在看人数呢?
SQL3 2021年11月每天新用户的次日留存率
1、明确问题
问题:统计2021年11月每天新用户的次日留存率(保留2位小数)
注:次日留存率为当天新增的用户数中第二天又活跃的用户数占比。
注:如果in_time-进入时间和out_time-离开时间跨天了,在两天里都记为该用户活跃过,结果按日期升序。
PS:总觉得牛客网的题目,有时候会模糊,这里只讲大概的留存率解题思路。
对留存率问题比较费解的,可以参考B站上的教程
2、解题思路
①先计算每日新用户数:计算每个用户的最小in_time。
SELECT uid,MIN(DATE_FORMAT(in_time,'%Y-%m-%d')) AS new_user_time
FROM tb_user_log
WHERE DATE_FORMAT(in_time,'%Y-%m-%d') BETWEEN '2021-11-01' AND '2021-11-30'
GROUP BY uid;
②计算新用户的第二日留存情况
从结果中可以看出,新用户第二日的留存情况。
SELECT new_user_time,t1.uid,t2.uid next_day_uid
FROM (SELECT uid,MIN(DATE_FORMAT(in_time,'%Y-%m-%d')) AS new_user_time
FROM tb_user_log
WHERE DATE_FORMAT(in_time,'%Y-%m-%d') BETWEEN '2021-11-01' AND '2021-11-30'
GROUP BY uid) AS t1
LEFT JOIN tb_user_log t2
ON (t1.uid=t2.uid) AND (DATE_FORMAT(t2.in_time,'%Y-%m-%d') = new_user_time + INTERVAL '1' DAY);
③计算新用户留存率
WITH T AS (
SELECT new_user_time,t1.uid,t2.uid next_day_uid
FROM (SELECT uid,MIN(DATE_FORMAT(in_time,'%Y-%m-%d')) AS new_user_time
FROM tb_user_log
WHERE DATE_FORMAT(in_time,'%Y-%m-%d') BETWEEN '2021-11-01' AND '2021-11-30'
GROUP BY uid) AS t1
LEFT JOIN tb_user_log t2
ON (t1.uid=t2.uid) AND (DATE_FORMAT(t2.in_time,'%Y-%m-%d') = new_user_time + INTERVAL '1' DAY))
SELECT new_user_time dt,ROUND(COUNT(next_day_uid)/COUNT(uid),2) uv_left_rate
FROM T
GROUP BY new_user_time
ORDER BY dt;
3、SQL语句
WITH T AS (
SELECT new_user_time,t1.uid,t2.uid next_day_uid
FROM (SELECT uid,MIN(DATE_FORMAT(in_time,'%Y-%m-%d')) AS new_user_time
FROM tb_user_log
WHERE DATE_FORMAT(in_time,'%Y-%m-%d') BETWEEN '2021-11-01' AND '2021-11-30'
GROUP BY uid) AS t1
LEFT JOIN tb_user_log t2
ON (t1.uid=t2.uid) AND (DATE_FORMAT(t2.in_time,'%Y-%m-%d') = new_user_time + INTERVAL '1' DAY))
SELECT new_user_time dt,ROUND(COUNT(next_day_uid)/COUNT(uid),2) uv_left_rate
FROM T
GROUP BY new_user_time
ORDER BY dt;
4、题目总结
用户留存率是许多面试题中的经典,建议多思考,多练习。掌握题目逻辑是关键。
df + INTERVAL '1' DAY。为日期加上一天。
SQL4 统计活跃间隔对用户分级结果
1、明确问题
①问题:统计活跃间隔对用户分级后,各活跃等级用户占比,结果保留两位小数,且按占比降序排序。
②用户等级标准简化为:
#忠实用户(近七天活跃过且非新晋用户)
#新晋用户(近7天新增)
#沉睡用户(近7天未活跃但更早前活跃过)
#流失用户(近30天未活跃但更早前活跃过)
#近七天即[T-6,T]
#今天默认为'2021-11-04'
2、解题思路
第一步:计算总的用户数
SELECT COUNT(DISTINCT uid) total_user_cnt FROM tb_user_log;
第二步:计算每一个用户的活跃日期间隔与活跃次数
SELECT uid,TIMESTAMPDIFF(DAY,DATE(out_time),'2021-11-04') interval_time,COUNT(uid) over(PARTITION BY uid) user_cnt
FROM tb_user_log;
第三步:给用户打标签
SELECT uid,
CASE WHEN user_cnt >1 AND MIN(interval_time) over (PARTITION BY uid) <= 7 THEN '忠实用户'
WHEN user_cnt = 1 AND interval_time <= 7 THEN '新晋用户'
WHEN MIN(interval_time) over (PARTITION BY uid) > 30 THEN '流失用户'
WHEN MIN(interval_time) over (PARTITION BY uid)BETWEEN 7 AND 30 THEN '沉睡用户'
END AS tag
FROM (SELECT uid,TIMESTAMPDIFF(DAY,DATE(out_time),'2021-11-04') interval_time,COUNT(uid) over(PARTITION BY uid) user_cnt
FROM tb_user_log) AS t1;
第四步:计算用户等级占比
WITH T AS (
SELECT uid,
CASE WHEN user_cnt >1 AND MIN(interval_time) over (PARTITION BY uid) <= 7 THEN '忠实用户'
WHEN user_cnt = 1 AND interval_time <= 7 THEN '新晋用户'
WHEN MIN(interval_time) over (PARTITION BY uid) > 30 THEN '流失用户'
WHEN MIN(interval_time) over (PARTITION BY uid)BETWEEN 7 AND 30 THEN '沉睡用户'
END AS tag
FROM (SELECT uid,TIMESTAMPDIFF(DAY,DATE(out_time),'2021-11-04') interval_time,COUNT(uid) over(PARTITION BY uid) user_cnt
FROM tb_user_log) AS t1)
SELECT tag,ROUND(COUNT(DISTINCT uid)/(SELECT COUNT(DISTINCT uid) total_user_cnt FROM tb_user_log),2) ratio
FROM T
GROUP BY tag;
3、SQL语句
WITH T AS (
SELECT uid,
CASE WHEN user_cnt >1 AND MIN(interval_time) over (PARTITION BY uid) <= 7 THEN '忠实用户'
WHEN user_cnt = 1 AND interval_time <= 7 THEN '新晋用户'
WHEN MIN(interval_time) over (PARTITION BY uid) > 30 THEN '流失用户'
WHEN MIN(interval_time) over (PARTITION BY uid)BETWEEN 7 AND 30 THEN '沉睡用户'
END AS tag
FROM (SELECT uid,TIMESTAMPDIFF(DAY,DATE(out_time),'2021-11-04') interval_time,COUNT(uid) over(PARTITION BY uid) user_cnt
FROM tb_user_log) AS t1)
SELECT tag,ROUND(COUNT(DISTINCT uid)/(SELECT COUNT(DISTINCT uid) total_user_cnt FROM tb_user_log),2) ratio
FROM T