聚合
A key feature of analytics is reducing many values down to a summary. This act is called “aggregation” and always includes a function — for example, average or sum — that reduces values in the table to a single row.
aggregate 變換
The aggregate transform takes a tuple to create one or more new columns that “distill down” data from all the rows.
from invoices
aggregate { sum_of_orders = sum total }
上面的查詢計算 invoices 表所有行的 total 列的總和,以產生單一值。
當元組中包含一個或多個聚合表達式時,aggregate 可以一次產生多個摘要。aggregate 會捨棄元組中不存在的所有列。
from invoices
aggregate {
num_orders = count this,
sum_of_orders = sum total,
}
在上面的示例中,結果是有兩列的單一行。count 函式顯示傳入的表中的行數;sum 函式將所有行的 total 列的值相加。
分組
假設我們想要產生表中 每個城市 發票的摘要。我們可以為每個城市建立一個查詢,並聚合其行:
from invoices
filter billing_city == "Oslo"
aggregate { sum_of_orders = sum total }
但我們需要為每個城市做這種操作:London、Frankfurt 等。當然這是重複的(且令人厭倦的)且容易出錯(因為我們需要手動輸入每個 billing_city)。此外,我們需要在開始前建立每個 billing_city 的列表。
group 變換
The group transform separates the table into groups (say, those having the same city) using information that’s already in the table. It then applies a transform to each group, and combines the results back together:
from invoices
group billing_city (
aggregate {
num_orders = count this,
sum_of_orders = sum total,
}
)
熟悉 SQL 的人可能已經注意到,我們剛才將聚合與分組分離了。
雖然這些操作在 SQL 中是相關的,但 PRQL 使得可以直接將 group 和 aggregate分開使用,同時與其他變換函式結合,例如:
from invoices
group billing_city (
take 2
)
This code collects the first two rows for each city’s group.