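Enter the following statements to create a table named log4jLogs from the sample data provided with the HDInsight cluster. (Modify the location as needed to match your URI scheme.)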
DROP TABLE log4jLogs;
CREATE EXTERNAL TABLE log4jLogs (
t1 string,
t2 string,
t3 string,
t4 string,
t5 string,
t6 string,
t7 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE LOCATION 'wasbs:///example/data/';
SELECT t4 AS sev, COUNT(*) AS count FROM log4jLogs
WHERE t4 = '[ERROR]' AND INPUT__FILE__NAME LIKE '%.log'
GROUP BY t4;
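These statements drop any existing log4jLogs table, define an external table over the space-delimited text files under /example/data/, and count the rows whose fourth column is [ERROR] in files whose names end in .log. The output is similar to the following: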
INFO :
INFO : Status: Running (Executing on YARN cluster with App id application_1443698635933_0001)
INFO : Map 1: -/- Reducer 2: 0/1
INFO : Map 1: 0/1 Reducer 2: 0/1
INFO : Map 1: 0/1 Reducer 2: 0/1
INFO : Map 1: 0/1 Reducer 2: 0/1
INFO : Map 1: 0/1 Reducer 2: 0/1
INFO : Map 1: 0(+1)/1 Reducer 2: 0/1
INFO : Map 1: 0(+1)/1 Reducer 2: 0/1
INFO : Map 1: 1/1 Reducer 2: 0/1
INFO : Map 1: 1/1 Reducer 2: 0(+1)/1
INFO : Map 1: 1/1 Reducer 2: 1/1
+----------+--------+--+
| sev | count |
+----------+--------+--+
| [ERROR] | 3 |
+----------+--------+--+
1 row selected (47.351 seconds)
Exit Beeline:
!exit
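To get a feel for what the log4jLogs query computes, the same count can be approximated locally with awk. This is only a sketch: the file name sample.log, its contents, and the count_errors helper are made up for illustration and are not part of the cluster's sample data. Note also that awk collapses runs of whitespace by default, whereas the Hive table treats every single space as a field separator.

```shell
# Hypothetical sample lines in a log4j-style layout (invented for this sketch).
cat > sample.log <<'EOF'
2012-02-03 18:35:34 SampleClass6 [INFO] everything normal for id 577725851
2012-02-03 18:35:34 SampleClass4 [FATAL] system problem at id 1991281254
2012-02-03 18:35:34 SampleClass3 [ERROR] verbose detail for id 1304807656
2012-02-03 18:35:34 SampleClass3 [DEBUG] detail for id 1406131998
2012-02-03 18:35:34 SampleClass1 [ERROR] incorrect id 1886438513
EOF

# Count lines whose fourth whitespace-separated field is exactly [ERROR],
# roughly mirroring: WHERE t4 = '[ERROR]' ... GROUP BY t4
count_errors() {
  awk '$4 == "[ERROR]" { n++ } END { print "[ERROR]", n + 0 }' "$1"
}

count_errors sample.log
```

With the five invented lines above, the helper prints `[ERROR] 2`, which corresponds to the sev and count columns in the Hive result.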
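Run a HiveQL file
This example continues from the previous one. Use the following steps to create a file, then run it with Beeline.
Use the following command to create a file named query.hql: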
nano query.hql
Use the following text as the contents of the file. This query creates a new "internal" table named errorLogs:
CREATE TABLE IF NOT EXISTS errorLogs (t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string) STORED AS ORC;
INSERT OVERWRITE TABLE errorLogs SELECT t1, t2, t3, t4, t5, t6, t7 FROM log4jLogs WHERE t4 = '[ERROR]' AND INPUT__FILE__NAME LIKE '%.log';
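If an interactive editor is inconvenient (for example, in a script), the same file could be written with a here-document instead. This is a sketch, not the documented procedure; the quoted 'EOF' delimiter keeps the shell from expanding anything inside the HiveQL text. The commented Beeline call is hypothetical: `<connection-url>` is a placeholder for whatever JDBC URL your session uses, and `-f` is Beeline's option for running a script file.

```shell
# Write the two HiveQL statements to query.hql without opening an editor.
cat > query.hql <<'EOF'
CREATE TABLE IF NOT EXISTS errorLogs (t1 string, t2 string, t3 string, t4 string, t5 string, t6 string, t7 string) STORED AS ORC;
INSERT OVERWRITE TABLE errorLogs SELECT t1, t2, t3, t4, t5, t6, t7 FROM log4jLogs WHERE t4 = '[ERROR]' AND INPUT__FILE__NAME LIKE '%.log';
EOF

# Hypothetical invocation (placeholder URL, adjust for your cluster):
# beeline -u '<connection-url>' -f query.hql
```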