MYSQL的销售数据分析案例

数据来源于某网站的销售信息统计,主要由订单信息及用户信息两个部分组成,分析过程中可通过对两部分的单独分析或联结分析来进行相关指标的综合分析:
1、订单数据:链接: https://pan.baidu.com/s/1HLbzfXWTUgREFLyu3Mwe0Q 提取码:529n
2、用户数据:链接: https://pan.baidu.com/s/1lgncrR-DviysaD3WhHDoIg 提取码:6vmq

主要内容:

导入数据时,出现报错的各点总结:
a、语句出错:把文件路径导入到 cmd 黑窗口得到的路径斜杆是‘\’ ,要修改为‘/’,路径不能带有中文。
b、要有fields terminated by ',' 因为csv 文件是以逗号为分割符的。
4)数据导入完成

1)统计不同月份的下单人数
思路:先筛选出“已支付”,再对月份进行分组,最后对不同月份的不同下单用户的数量进行统计,注意对用户的去重。

SELECT month(paidtime) as 月份, count(distinct userId) as 用户数 FROM czy.order_info
where isPaid = '已支付'
group by month(paidtime) ;
2.1)复购率:自然月内,购买多次的用户占该月总用户数的比例

思路:先筛选出‘三月份’及‘已支付’的数据,再对用户Id进行分组,求得用户id及其对应的购买次数,最后通过子查询的方式,嵌套一个select语句,令购买次数 >1的用户的购买次数变成1并相加求和以便求出购买多次的用户数,而购买次数不大于1的购买次数变成null,不参与计数,最后求得购买多次的用户数及当月有消费的用户总数。

●在三月份用户总数已提前计算得知时:

select concat((count(userId)/54799)*100,'%') as 三月份复购率 from 
(select month(paidtime), userId ,count(userId) as uc from czy.order_info
where month(paidtime) = 3
and isPaid = '已支付'
group by userId having count(userId) > 1) as fu;

●在三月份用户总数未提前计算得知时(可通过求得该月用户总数及复购的用户数来求得):

select count(userId) as 三月份用户总数,count(if(uc>1,1,null)) as 三月份复购用户数 from 
    (select userId ,count(userId) as uc from czy.order_info
    where month(paidtime) = 3 and isPaid = '已支付'
    group by userId) as fg;

●拓展思维:当需要考虑更多月份的更实际、更复杂繁多的计算时(此处以三个月来举例):

select md as 月份, count(userId) as 各月份用户总数,count(if(uc>1,1,null)) as 各月份复购用户数 from 
    (select userId, month(paidtime) as md, count(userId) as uc from czy.order_info
    where isPaid = '已支付'
    group by month(paidtime),userId) as fg
group by md ;
●只考虑三月份的回购率时:

思路:先筛选出‘三月份’‘已支付’的‘用户id’,再筛选出‘四月份’‘已支付’的用户中与三月份已支付的用户相同的‘用户id',对这些相同的’用户id‘进行去重、计数,并除以三月份的用户总数即可得出。

select concat((count(distinct userId)/54799)*100,'%') as 三月份回购率 from czy.order_info
where month(paidtime) = 4
and userId in (select distinct userId from czy.order_info
               where isPaid = '已支付' and month(paidtime) = 3
               group by userId);

●拓展思维:和复购率同理,当考虑更多月份时:

思路:先筛选出各个月份’已支付‘的’用户id‘,此处需将各个月份中用户购买的时间点统一成同一个月份的第一天,对各个月份和‘用户id‘进行分组,以便操作,再通过left join的方式,让其’用户id'一一对应,但前后的月份相差一个月,最后计算出当月的用户数 及回购的用户数。

select t1.m as 月份,count(t1.m) as 用户总数,count(t2.m) as 回购用户数, 
 concat((count(t2.m)/count(t1.m))*100,'%') as 回购率 from
    (select userId , date_format(paidtime,'%Y-%m-01') as m from czy.order_info
    where isPaid = '已支付'
    group by userId , date_format(paidtime,'%Y-%m-01')) as t1
left join
    (select userId , date_format(paidtime,'%Y-%m-01') as m from czy.order_info
    where isPaid = '已支付'
    group by userId , date_format(paidtime,'%Y-%m-01')) as t2
on t1.userId = t2.userId and t1.m = date_sub(t2.m,interval 1 month)
group by t1.m;

3)统计男女用户的消费频次是否有差异
思路:将order_info表与user_info表进行内部联结inner join,对’用户id‘和对应的’性别sex‘进行分组,并统计每个用户的消费次数,最后再对性别进行分组,求出’男‘&’女‘对应的平均消费次数

select sex as 性别,avg(ct) as 平均消费次数 from 
    (select userId,sex,count(userId) as ct from czy.order_info as o
    inner join
        (SELECT * FROM czy.userinfo
        where sex <> '') as s
    on o.userId = s.useId 
    where isPaid = '已支付'
    group by userId,sex) as b
group by sex;

4)统计多次消费,第一次消费和最后一次消费的间隔是多少;

思路:先筛选出‘已支付’的‘用户id’,并对用户进行分组group by,再挑选出消费次数>1的用户id,最后得出第一次消费和最后一次消费的时间点及时间差。

select userId as 用户,max(paidtime) as 最后一次消费,min(paidtime) as 第一次消费,datediff(max(paidtime),min(paidtime)) as 两次消费的时间差 from czy.order_info
where isPaid = '已支付'
group by userId having count(userId) > 1;

●平均间隔

select avg(jg) as 平均间隔 from
    (select userId,max(paidtime),min(paidtime),datediff(max(paidtime),min(paidtime)) as jg from czy.order_info
    where isPaid = '已支付'
    group by userId having count(userId) > 1) j

5)统计不同年龄段,用户的消费金额是否有差异;

思路:先对数据进行筛选,再以10年为时间间隔进行划分各个年龄段,最后再挑出各用户、年龄段及各年龄段的人数,最后再进行统计分析。

select age, avg(sp) from
    (select u.userId, age,sum(price) as sp from czy.order_info as u
    inner join
        (select useId, ceil((year(now())-year(birth))/10) as age from czy.userinfo
        where birth > '1901-00-00') as ad
    on u.userId = ad.useId
    where isPaid = '已支付'
    group by u.userId, age) as n
group by age;

6)统计消费的二八法则:消费top20%的用户贡献了多少额度;

思路:先统计全部的消费用户数,计算出其20%的用户数大致是多少

select (count(distinct userId) * 0.2)as 百分之20的用户数 from czy.order_info
where isPaid = '已支付'
select userId, sum(price) as total from czy.order_info
where isPaid = '已支付'
group by userId
order by total desc
limit 17129;
select sum(total) as top20的消费总额 from
    (select userId, sum(price) as total from czy.order_info
    where isPaid = '已支付'
    group by userId
    order by total desc
    limit 17129) u ;