运用SQL对数据进行聚合与分组排序
一、汇总分析
聚合函数是用于汇总的函数。所谓聚合,就是将多行汇总为一行。实际上,所有的聚合函数都是输入多行输出一行。
1.1 常用汇总函数总结
(1) COUNT:求某列行数
COUNT()函数有两种使用方式:
- 使用 COUNT(*)对表中行的数目进行计数,不管表列中包含的是空值 (NULL)还是非空值。
- 使用 COUNT(column name)对特定列中具有值的行进行计数,忽略 NULL 值。
对于 COUNT 函数来说,参数列不同计算的结果也会发生变化。
想要计算值的种类时,可以在 COUNT 函数的参数中使用 DISTINCT。 但是,DISTINCT不能用于 COUNT(*)。
(2) SUM(column name):对某列数据加总求和,只能对数值类型的列计算 (将NULL值事先剔除)
(3) AVG(column name):求某列数据的平均值,只能对数值类型的列计算 (将NULL值事先剔除)
AVG()也可以用来确定特定列或行的平均值。下面的例子返回特定供应商所提供产品的平均价格:
SELECT AVG(prod_price) AS avg_price FROM Products
WHERE vend_id = 'DLL01';
注意:AVG()只能用来确定特定数值列的平均值,而且列名必须作为函数参数给出。为了获得多个列的平均值,必须使用多个 AVG()函数。
(4) MAX(column name):求某列数据的最大值 (将NULL值事先剔除)
提示:虽然 MAX()一般用来找出最大的数值或日期值,但MySQL允许将它用来返回任意列中的最大值,包括返回文本列中的最大值。在用于文本数据时,MAX()返回按该列排序后的最后一行。
(5) MIN(column name):求某列数据的最小值 (将NULL值事先剔除)
类似于MAX函数,同样可以应用于非数值数据。
1.2 练习
练习一:查询课程编号为"0002"的总成绩
SELECT SUM(score) AS total_score FROM scores
WHERE c_id = 0002;
练习二:查询选了课程的学生人数
SELECT COUNT(DISTINCT id) AS num FROM scores;
二、分组(group by)
2.1 语法
使用GROUP BY子句时,会将表中的数据分为多个组进行处理。在GROUP BY 子句中指定的列称为聚合键或者分组列。
SELECT [column name1], ... [column name N]
FROM [table name]
WHERE [condition]
GROUP BY [groups]
分析方法:数据分组、应用函数、组合结果
SQL运行顺序:(1) from (2) where (3) group by (4) select
注意:
- GROUP BY子句中列出的每一列都必须是检索列或有效的表达式(但不能是聚集函数) 。如果在 SELECT中使用表达式,则必须在 GROUP BY子句中指定相同的表达式。不能使用别名。
- 如果分组列中包含具有 NULL 值的行,则 NULL 将作为一个分组返回。 如果列中有多行 NULL值,它们将分为一组。
- GROUP BY子句必须出现在WHERE子句之后,ORDER BY子句之前。
- GROUP BY子句的聚合结果是无序的。
2.2 练习
练习一:查询各科成绩最高和最低的分
SELECT c_id,
MAX(score) AS highest_score, MIN(score) AS 'lowest_score'
FROM scores
GROUP BY c_id;
练习二:查询每门课程被选修的学生数
SELECT c_id, COUNT(id) AS 'num_of_students'
FROM scores
GROUP BY c_id;
练习三:查询男生、女生人数
SELECT gender, COUNT(id) AS 'num_of_students'
FROM students
GROUP BY gender;
三、对分组结果指定条件(HAVING)
3.1 语法
HAVING 子句可以通过指定条件来选取特定的组。在 HAVING 子句中可以可以使用的要素有:常数、聚合函数以及 GROUP BY 子句指定的列名(即聚合键)。
SELECT [column name1], ... [column name N]
FROM [table name]
WHERE [condition]
GROUP BY [groups]
HAVING [condition]
HAVING 非常类似于WHERE,但是WHERE过滤行,而 HAVING过滤分组。
另一种理解方法是:WHERE在数据分组前进行过滤,HAVING在数据分组后进行过滤。
因此,WHERE排除的行不包括在分组中,因为这可能会改变计算值,从而影响 HAVING 子句中基于这些值过滤掉的分组。
3.2 练习
练习一:查询平均成绩大于60分学生的学号和平均成绩
SELECT id, AVG(score) AS 'average_score' FROM scores
GROUP BY id
HAVING average_score > 60;
练习二:查询至少选修两门课程的学生学号
SELECT id, COUNT(id) AS 'num_of_classes' FROM scores
GROUP BY id
HAVING COUNT(id) >= 2;
练习三:查询同名同姓学生名单并统计同名人数
SELECT name, COUNT(name) AS 'num_of_people' FROM students
GROUP BY name
HAVING COUNT(name) >= 2;
四、对查询结果排序(ORDER BY)
4.1 语法
4.1.1 按列名称排序
SELECT [column name1], ... [column name N]
FROM [table name]
WHERE [condition]
ORDER BY [column name1] ASC/DESC, [column name2] ASC/DESC
LIMIT N;
SQL运行顺序:
from -> where -> group by -> having -> select -> order by -> limit
注意: 在指定一条 ORDER BY 子句时,应该保证它是 SELECT 语句中最后一条子句。 如果它不是最后的子句,将会出现错误消息。 此外,通常ORDER BY子句中使用的列将是为显示而选择的列。但是,实际上并不一定要这样,用非检索的列排序数据是完全合法的。
4.1.2 按列位置排序
除了能用列名指出排序顺序外,ORDER BY 还支持按相对列位置进行排序。例如:
SELECT prod_id, prod_price, prod_name
FROM Products
ORDER BY 2, 3;
ORDER BY 2 表示按 SELECT 清单中的第二个列 prod_name 进行排序。ORDER BY 2, 3 表示先按 prod_price,再按 prod_name 进行排序。
按列位置排序的主要好处在于不用重新输入列名。但它也有缺点。首先,不明确给出列名有可能造成错用列名排序。其次,在对SELECT 清单进行更改时容易错误地对数据进行排序(忘记对 ORDER BY子句做相应的改动)。 最后,如果进行排序的列不在 SELECT清单中,则不能按列位置排序。
4.1.3 对空值排序
SELECT ... FROM ... WHERE ...
ORDER BY IS NULL DESC, [column name1], [column name2], ... [column name N]
类似的排序问题可参见SQLZOO【Select from Nobel 】部分第14题。
4.1.4 按照某些条件逻辑排序
例如:如果job是"SALESMAN",则根据"comm"排序,否则根据"sal"排序
方法1:
SELECT ename, sal, job, comm FROM emp
ORDER BY
CASE WHEN job = 'SALESMAN' THEN comm
ELSE sal
END;
方法2:使用CASE表达式动态改变排序结果
SELECT ename, sal, job, comm,
CASE WHEN job = 'SALESMAN' THEN comm
ELSE sal
END AS ordered
FROM emp
ORDER BY 5;
更多 CASE WHEN 表达式用法请参考如下文章:
4.2 练习
练习一:查询不及格的课程并按课程号从大到小排列
SELECT c_id, score FROM scores
WHERE score < 60
ORDER BY c_id DESC;
练习二:查询每门课程的平均成绩,结果按平均成绩升序排序,平均成绩相同时,按课程号降序排列
SELECT c_id, AVG(score) AS 'average_score' FROM scores
GROUP BY c_id
ORDER BY average_score, c_id DESC;
练习三:查询课程编号为“04”且分数小于60的学生学号,结果按分数降序排列
SELECT id, score FROM scores
WHERE c_id = '0004' AND score < 60
ORDER BY score DESC;
练习四:统计每门课程的学生选修人数(超过2人的课程才统计),输出课程号和选修人数,查询结果按人数降序排序,若人数相同,按课程号升序排序
SELECT c_id, COUNT(id) AS 'number_of_students' FROM scores
GROUP BY c_id
HAVING COUNT(id) > 2
ORDER BY 'number_of_students' DESC, c_id ;
练习五:查询两门以上不及格课程的同学的学号及不及格课程的平均成绩
SELECT id, AVG(score) AS 'average_score' FROM scores
WHERE score < 60
GROUP BY id
HAVING COUNT(id) > 2;
五、练习
来源: SQLZOO
5.1【Select from Nobel 练习题】
地址: https:// sqlzoo.net/wiki/SELECT_ from_Nobel_Tutorial
数据表:诺贝尔奖信息表
列名含义:
yr:获奖年份
subject:所属学科
winner:获奖人
练习一:显示1950年获奖情况
Show Nobel prizes for 1950.
SELECT * FROM nobel
WHERE yr = 1950;
练习二:显示1962年诺贝尔文学奖获奖者姓名
Show who won the 1962 prize for Literature.
SELECT winner FROM nobel
WHERE yr = 1962
AND subject = 'Literature';
练习三:显示'Albert Einstein'获奖的年份及所属学科
Show the year and subject that won 'Albert Einstein' his prize.
SELECT yr, subject FROM nobel
WHERE winner = 'Albert Einstein';
练习四:显示2000年以来诺贝尔和平奖的得主
Give the name of the 'Peace' winners since the year 2000, including 2000.
SELECT winner FROM nobel
WHERE subject = 'Peace' AND yr >= 2000;
练习五:显示1980至1989年以来诺贝尔文学奖的获奖情况
Show all details ( yr , subject , winner ) of the Literature prize winners for 1980 to 1989 inclusive.
SELECT * FROM nobel
WHERE yr BETWEEN 1980 AND 1989
AND subject = 'Literature';
练习六:显示Theodore Roosevelt,Woodrow Wilson,Jimmy Carter以及Barack Obama的获奖情况
Show all details of the presidential winners:
- Theodore Roosevelt
- Woodrow Wilson
- Jimmy Carter
- Barack Obama
SELECT * FROM nobel
WHERE winner IN ('Theodore Roosevelt', 'Woodrow Wilson',
'Jimmy Carter', 'Barack Obama');
练习七:显示(first name)叫John的获奖者名单
Show the winners with first name John.
SELECT winner FROM nobel
WHERE winner LIKE 'John%';
练习八:显示1980年诺贝尔物理学奖和1984年诺贝尔化学奖的获奖情况
Show the year, subject, and name of Physics winners for 1980 together with the Chemistry winners for 1984.
SELECT * FROM nobel
WHERE (yr = 1980 AND subject = 'Physics')
OR (yr = 1984 AND subject = 'Chemistry');
练习九:显示1980年除诺贝尔化学奖和诺贝尔医学奖以外的获奖情况
Show the year, subject, and name of winners for 1980 excluding Chemistry and Medicine.
SELECT * FROM nobel
WHERE yr = 1980
AND subject NOT IN ('Chemistry', 'Medicine');
练习十:显示1910年之前诺贝尔医学奖和2004年及其以后诺贝尔文学奖的获奖情况
Show year, subject, and name of people who won a 'Medicine' prize in an early year (before 1910, not including 1910) together with winners of a 'Literature' prize in a later year (after 2004, including 2004).
SELECT * FROM nobel
WHERE (yr < 1910 AND subject = 'Medicine')
OR (yr >= 2004 AND subject = 'Literature');
练习十一:显示PETER GRÜNBERG的获奖情况
Find all details of the prize won by PETER GRÜNBERG.
SELECT * FROM nobel
WHERE winner = 'PETER GRÜNBERG';
练习十二:显示EUGENE O'NEILL的获奖情况
Find all details of the prize won by EUGENE O'NEILL.
SELECT * FROM nobel
WHERE winner = 'EUGENE O\'NEILL';
SQL转义字符:
练习十三:显示名字以Sir开头的获奖者名称、获奖年份、以及所属科目,按照时间倒序、名字顺序排序
List the winners, year and subject where the winner starts with Sir . Show the the most recent first, then by name order.
SELECT winner, yr, subject FROM nobel
WHERE winner LIKE 'Sir%'
ORDER BY yr DESC, winner;
练习十四:显示1984年获奖者姓名及奖项所属学科,按照所属学科,姓名的顺序排序,并将化学奖和物理学奖放在最后
Show the 1984 winners and subject ordered by subject and winner name; but list Chemistry and Physics last.
SELECT winner, subject FROM nobel
WHERE yr = 1984
ORDER BY subject IN ('Chemistry','Physics'), subject, winner;
SQL解释:
subject IN ('Chemistry','Physics') 返回值为0或1,如果subject属于Chemistry或者Physics,则返回1,否则返回0,再用ORDER BY对其排序,即0在前面,1在最后面。
5.2【SUM and COUNT 练习题】
地址: https:// sqlzoo.net/wiki/SUM_and _COUNT
数据表:世界国家信息表
列名含义:
name:国家名称
continent:该国家属于哪个州
area:国土面积
population:人口数
gdp:国内生产总值
练习一:显示世界总人口数
Show the total population of the world.
SELECT SUM(population) AS 'total_population' FROM world;
练习二:显示所有的洲(不可以有重复值)
List all the continents - just once each.
SELECT DISTINCT continent FROM world;
练习三:显示非洲GDP总和
Give the total GDP of Africa.
SELECT SUM(gdp) AS 'total_gdp' FROM world
WHERE continent = 'Africa';
练习四:显示国土面积至少为1000000平方公里的国家数
How many countries have an area of at least 1000000?
SELECT COUNT(name) AS 'number_of_big_country' FROM world
WHERE area >= 1000000;
练习五:显示爱沙尼亚,拉脱维亚,以及立陶宛三个国家的人口总数
What is the total population of ('Estonia', 'Latvia', 'Lithuania')?
SELECT SUM(population) AS 'total_population' FROM world