[Database] Hive SQL: How to Use the Quantile Functions (percentile) | J小白Y's blog

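While doing some hotel production analysis, I needed the quantile functions that are common in statistics, so I looked into how they are used in Hive.

Hive has two quantile functions: percentile and percentile_approx.

Usage: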

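percentile: percentile(col, p). col is the column to compute over (its values must be of an integer type), and p ranges from 0 to 1. For example, p = 0.2 gives the 20th percentile (the second decile), and so on.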
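As a minimal sketch of the exact form, the query below computes the per-day median. It assumes the table aa(d, uv) from the example later in this post and that uv holds whole numbers; the CAST and the alias uv_median are illustrative additions, not part of the original post.

```sql
-- Assumption: aa has columns d (day) and uv (an integer count).
-- percentile needs an integer-typed input, so the cast makes that explicit.
SELECT d,
       percentile(CAST(uv AS BIGINT), 0.5) AS uv_median  -- p = 0.5 is the median
  FROM aa
 GROUP BY d;
```

If uv were a floating-point column, percentile_approx would be the function to use instead.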

percentile_approx: percentile_approx(col, p). The column can be of any numeric type, including floating point.

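percentile_approx also has a three-argument form, percentile_approx(col, p, B). The parameter B controls the accuracy of the approximation at the cost of memory: the larger B is, the more accurate the result. The default is 10000. When the number of distinct values in col is smaller than B, the result is the exact percentile.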
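To see the effect of B, one option is to compute the same quantile with the default and with a deliberately small B and compare the two. This is only a sketch: the table aa(d, uv) comes from the example in this post, while the value 100 and the aliases are arbitrary choices for illustration.

```sql
-- Assumption: aa has columns d (day) and uv (numeric).
SELECT d,
       percentile_approx(CAST(uv AS DOUBLE), 0.5)      AS uv_median_default, -- B defaults to 10000
       percentile_approx(CAST(uv AS DOUBLE), 0.5, 100) AS uv_median_coarse   -- smaller B: less memory, coarser estimate
  FROM aa
 GROUP BY d;
```

For days with fewer than 100 distinct uv values, both columns should be the exact percentile and therefore agree.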

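If several quantiles are needed, they can be retrieved in a single call by passing an array, as in the following example:

Get the second, fourth, sixth and eighth deciles of each day's UV: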

select d,
       percentile_approx(uv, array(0.2, 0.4, 0.6, 0.8), 9999) as uv -- 20th, 40th, 60th and 80th percentiles, returned as one array
  from aa
 group by d;

The query returns one row per day, with the four quantile values in a single array column.
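When separate columns are easier to work with than an array, the array can be indexed in an outer query. The sketch below assumes the same table aa(d, uv); the subquery alias t, the array alias q, and the output column names are hypothetical.

```sql
-- Hive arrays are 0-indexed, so q[0] is the first requested quantile (0.2).
SELECT d,
       q[0] AS uv_p20,
       q[1] AS uv_p40,
       q[2] AS uv_p60,
       q[3] AS uv_p80
  FROM (
        SELECT d,
               percentile_approx(CAST(uv AS DOUBLE), array(0.2, 0.4, 0.6, 0.8), 9999) AS q
          FROM aa
         GROUP BY d
       ) t;
```

Computing the array once and indexing it avoids calling percentile_approx four times over the same data.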