MongoDB Aggregation: The $bucket Stage

念雪

The previous article, "MongoDB Aggregation: The $facet Stage", covered the $facet stage and its parameters in detail. This article introduces the $bucket stage of the aggregation pipeline.

Description:

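$bucket categorizes incoming documents into groups, called buckets, based on a specified expression and a list of bucket boundaries, and outputs one document per bucket. Each output document contains an _id field whose value is the inclusive lower bound of the bucket. The output option specifies the fields included in each output document.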

$bucket only produces output documents for buckets that contain at least one input document.

Syntax:

{
  $bucket: {
    groupBy: <expression>,
    boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
    default: <literal>,
    output: {
      <output1>: { <$accumulator expression> },
      ...
      <outputN>: { <$accumulator expression> }
    }
  }
}

Parameters:

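groupBy: The expression used to group the documents. To specify a field path, prefix the field name with a dollar sign $ and enclose it in quotes. Unless $bucket includes a default specification, every input document must resolve the groupBy field path or expression to a value that falls within one of the ranges specified by boundaries.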

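boundaries: An array of values, based on the groupBy expression, that specify the boundaries of each bucket. Each adjacent pair of values acts as the inclusive lower bound and the exclusive upper bound of one bucket. You must specify at least two boundaries, and the values must be in ascending order and all of the same type. For example, [ 0, 5, 10 ] creates two buckets: [0, 5) and [5, 10).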

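default: Optional. A literal that specifies the _id of an additional bucket, which holds every document whose groupBy expression result does not fall into a bucket specified by boundaries. If default is not specified, every input document must resolve the groupBy expression to a value within one of the bucket ranges specified by boundaries; otherwise the operation throws an error. The default value must be less than the lowest boundaries value, or greater than or equal to the highest boundaries value.

The default value can be of a different type than the entries in boundaries.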

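output: Optional. A document that specifies the fields to include in the output documents in addition to the _id field. Each included field must be defined with an accumulator expression. If output is omitted, each output document contains only _id and a count field with the number of documents in the bucket.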
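The boundary and default rules above can be sketched in plain JavaScript. This is only a simplified model, not MongoDB's implementation: the function name bucketId is hypothetical, and the sketch assumes numeric values, whereas the server compares BSON values of any type.

```javascript
// Simplified model of how $bucket picks the bucket _id for one groupBy value.
// Assumption: values and boundaries are plain numbers; BSON cross-type
// comparison, which the real server performs, is not reproduced here.
function bucketId(value, boundaries, defaultId) {
  if (typeof value === "number") {
    for (let i = 0; i < boundaries.length - 1; i++) {
      // Lower bound is inclusive, upper bound is exclusive.
      if (value >= boundaries[i] && value < boundaries[i + 1]) {
        return boundaries[i]; // the bucket's _id is its lower bound
      }
    }
  }
  // No range matched (or the field is missing): fall back to the default bucket.
  if (defaultId === undefined) {
    throw new Error("value matches no bucket and no default was specified");
  }
  return defaultId;
}

const sampleBoundaries = [0, 10, 20]; // two buckets: [0, 10) and [10, 20)
console.log(bucketId(0, sampleBoundaries, "Other"));         // 0
console.log(bucketId(10, sampleBoundaries, "Other"));        // 10
console.log(bucketId(20, sampleBoundaries, "Other"));        // Other
console.log(bucketId(undefined, sampleBoundaries, "Other")); // Other
```

Note that 20, the highest boundary, is not part of any bucket because the upper bound is exclusive, so it lands in the default bucket.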

1. Examples

1.1. Single $bucket example

Sample data:

db.artists.insertMany([

{ "_id" : 1, "last_name" : "Bernard", "first_name" : "Emil", "year_born" : 1868, "year_died" : 1941, "nationality" : "France" },

{ "_id" : 2, "last_name" : "Rippl-Ronai", "first_name" : "Joszef", "year_born" : 1861, "year_died" : 1927, "nationality" : "Hungary" },

{ "_id" : 3, "last_name" : "Ostroumova", "first_name" : "Anna", "year_born" : 1871, "year_died" : 1955, "nationality" : "Russia" },

{ "_id" : 4, "last_name" : "Van Gogh", "first_name" : "Vincent", "year_born" : 1853, "year_died" : 1890, "nationality" : "Holland" },

{ "_id" : 5, "last_name" : "Maurer", "first_name" : "Alfred", "year_born" : 1868, "year_died" : 1932, "nationality" : "USA" },

{ "_id" : 6, "last_name" : "Munch", "first_name" : "Edvard", "year_born" : 1863, "year_died" : 1944, "nationality" : "Norway" },

{ "_id" : 7, "last_name" : "Redon", "first_name" : "Odilon", "year_born" : 1840, "year_died" : 1916, "nationality" : "France" },

{ "_id" : 8, "last_name" : "Diriks", "first_name" : "Edvard", "year_born" : 1855, "year_died" : 1930, "nationality" : "Norway" }

])

Example:

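db.artists.aggregate( [
  // First stage: group the artists into buckets by year of birth
  {
    $bucket: {
      groupBy: "$year_born",                        // group by the year_born field
      boundaries: [ 1840, 1850, 1860, 1870, 1880 ], // bucket boundaries
      // _id of the bucket for documents that fit no range: a document without a
      // year_born field, or with a year_born outside the ranges above, is placed
      // in the default bucket whose _id is "Other"
      default: "Other",
      output: {                                     // fields of each output document
        "count": { $sum: 1 },
        "artists": {
          $push: {
            "name": { $concat: [ "$first_name", " ", "$last_name" ] },
            "year_born": "$year_born"
          }
        }
      }
    }
  },
  // Second stage: keep only the buckets that contain more than 3 documents
  {
    $match: { count: { $gt: 3 } }
  }
] )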

Result:

{
  "_id" : 1860.0,  // inclusive lower bound of the bucket
  "count" : 4.0,   // number of documents in the bucket
  // One document per artist in the bucket. Each contains the artist's name,
  // which is first_name and last_name joined with $concat, and year_born.
  "artists" : [
    {
      "name" : "Emil Bernard",
      "year_born" : 1868.0
    },
    {
      "name" : "Joszef Rippl-Ronai",
      "year_born" : 1861.0
    },
    {
      "name" : "Alfred Maurer",
      "year_born" : 1868.0
    },
    {
      "name" : "Edvard Munch",
      "year_born" : 1863.0
    }
  ]
}
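Only the 1860 bucket survives the $match stage. To see why, the per-bucket counts can be recomputed by hand. The following plain JavaScript sketch mirrors the two stages on the year_born values of the sample data; the variable names are illustrative, and it is not how the server evaluates the pipeline.

```javascript
// year_born values of the eight sample artists, in _id order.
const yearsBorn = [1868, 1861, 1871, 1853, 1868, 1863, 1840, 1855];
const bounds = [1840, 1850, 1860, 1870, 1880];

// Stage 1: count the documents per bucket, keyed by the inclusive lower bound.
const counts = new Map();
for (const year of yearsBorn) {
  // The last boundary <= year is the bucket's lower bound.
  // All sample years lie in [1840, 1880), so no default bucket is needed here.
  const lower = bounds.filter((b) => b <= year).pop();
  counts.set(lower, (counts.get(lower) || 0) + 1);
}

// Stage 2: keep only the buckets with more than 3 documents.
const kept = [...counts].filter(([, count]) => count > 3);

// Counts per bucket: 1840 -> 1, 1850 -> 2, 1860 -> 4, 1870 -> 1
console.log([...counts].sort((a, b) => a[0] - b[0]));
// Buckets left after the filter: only 1860 with 4 documents
console.log(kept);
```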

1.2. Using $bucket with $facet to bucket by multiple fields

You can use the $facet stage to run multiple $bucket aggregations in a single stage.

Sample data:

db.artwork.insertMany([

{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,

"price" : NumberDecimal("199.99") },

{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,

"price" : NumberDecimal("280.00") },

{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,

"price" : NumberDecimal("76.04") },

{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",

"price" : NumberDecimal("167.30") },

{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,

"price" : NumberDecimal("483.00") },

{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,

"price" : NumberDecimal("385.00") },

{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893

/* No price*/ },

{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,

"price" : NumberDecimal("118.42") }

])

Example: the following operation uses two $bucket stages inside a $facet stage to create two groupings, one by price and the other by year:

db.artwork.aggregate( [
  {
    $facet: {                               // top-level $facet stage
      "price": [                            // output field 1
        {
          $bucket: {
            groupBy: "$price",              // field to group by
            boundaries: [ 0, 200, 400 ],    // boundaries for the buckets
            default: "Other",               // bucket _id for documents that do not fall into a bucket
            output: {                       // output for each bucket
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "price": "$price" } },
              "averagePrice": { $avg: "$price" }
            }
          }
        }
      ],
      "year": [                             // output field 2
        {
          $bucket: {
            groupBy: "$year",               // field to group by
            boundaries: [ 1890, 1910, 1920, 1940 ], // boundaries for the buckets
            default: "Unknown",             // bucket _id for documents that do not fall into a bucket
            output: {                       // output for each bucket
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "year": "$year" } }
            }
          }
        }
      ]
    }
  }
] )

Result:

[ { price:

[ { _id: 0,

count: 4,

artwork:

[ { title: 'The Pillars of Society',

price:

{ _bsontype: 'Decimal128',

bytes: <Buffer 1f 4e 00 00 00 00 00 00 00 00 00 00 00 00 3c 30> } },

{ title: 'Dancer',

price:

{ _bsontype: 'Decimal128',

bytes: <Buffer b4 1d 00 00 00 00 00 00 00 00 00 00 00 00 3c 30> } },

{ title: 'The Great Wave off Kanagawa',

price:

{ _bsontype: 'Decimal128',

bytes: <Buffer 5a 41 00 00 00 00 00 00 00 00 00 00 00 00 3c 30> } },

{ title: 'Blue Flower',

price:

{ _bsontype: 'Decimal128',

bytes: <Buffer 42 2e 00 00 00 00 00 00 00 00 00 00 00 00 3c 30> } } ],

averagePrice:

{ _bsontype: 'Decimal128',

bytes: <Buffer d7 6d 15 00 00 00 00 00 00 00 00 00 00 00 38 30> } },

{ _id: 200,

count: 2,

artwork:

[ { title: 'Melancholy III',

price:

{ _bsontype: 'Decimal128',

bytes: <Buffer 60 6d 00 00 00 00 00 00 00 00 00 00 00 00 3c 30> } },

{ title: 'Composition VII',

price:

{ _bsontype: 'Decimal128',

bytes: <Buffer 64 96 00 00 00 00 00 00 00 00 00 00 00 00 3c 30> } } ],

averagePrice:

{ _bsontype: 'Decimal128',

bytes: <Buffer e2 81 00 00 00 00 00 00 00 00 00 00 00 00 3c 30> } },

{ _id: 'Other',

count: 2,

artwork:

[ { title: 'The Persistence of Memory',

price:

{ _bsontype: 'Decimal128',

bytes: <Buffer ac bc 00 00 00 00 00 00 00 00 00 00 00 00 3c 30> } },

{ title: 'The Scream' } ],

averagePrice:

{ _bsontype: 'Decimal128',

bytes: <Buffer ac bc 00 00 00 00 00 00 00 00 00 00 00 00 3c 30> } } ],

year:

[ { _id: 1890,

count: 2,

artwork:

[ { title: 'Melancholy III', year: 1902 },

{ title: 'The Scream', year: 1893 } ] },

{ _id: 1910,

count: 2,

artwork:

[ { title: 'Composition VII', year: 1913 },

{ title: 'Blue Flower', year: 1918 } ] },

{ _id: 1920,

count: 3,

artwork:

[ { title: 'The Pillars of Society', year: 1926 },

{ title: 'Dancer', year: 1925 },

{ title: 'The Persistence of Memory', year: 1931 } ] },

{ _id: 'Unknown',

count: 1,

artwork: [ { title: 'The Great Wave off Kanagawa' } ] } ] } ]
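The Decimal128 values in the result above are printed as raw buffers, which makes the averagePrice fields hard to read. As a rough cross-check, the price facet can be recomputed in plain JavaScript. This sketch is an illustration only: it stores prices as integer cents, an assumption made here to avoid binary floating-point error, and relies on the fact that $avg ignores missing values.

```javascript
// Prices of the eight sample artworks in integer cents, in _id order.
// null marks the document without a price field ("The Scream").
const priceCents = [19999, 28000, 7604, 16730, 48300, 38500, null, 11842];
const priceBounds = [0, 20000, 40000]; // the boundaries 0, 200, 400 in cents

const groups = new Map(); // bucket _id -> { count, sum, priced }
for (const cents of priceCents) {
  // Missing prices and prices outside [0, 400) go to the default bucket.
  let id = "Other";
  if (cents !== null && cents >= priceBounds[0] && cents < priceBounds[2]) {
    id = cents < priceBounds[1] ? 0 : 200;
  }
  const g = groups.get(id) || { count: 0, sum: 0, priced: 0 };
  g.count += 1;            // like { $sum: 1 }: counts every document
  if (cents !== null) {    // like $avg: documents without a price are skipped
    g.sum += cents;
    g.priced += 1;
  }
  groups.set(id, g);
}

// Average price per bucket, converted back from cents.
const averages = {};
for (const [id, g] of groups) {
  averages[id] = g.sum / g.priced / 100;
}

// bucket 0 -> 140.4375, bucket 200 -> 332.5, bucket "Other" -> 483
console.log(averages);
```

The "Other" bucket counts two documents, but its average is taken over the single priced one, which is why it equals the price of "The Persistence of Memory".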

Published 2021-03-18 19:13
