Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

聚合

A key feature of analytics is reducing many values down to a summary. This act is called “aggregation” and always includes a function — for example, average or sum — that reduces values in the table to a single row.

aggregate 變換

The aggregate transform takes a tuple to create one or more new columns that “distill down” data from all the rows.

from invoices
aggregate { sum_of_orders = sum total }

上面的查詢計算 invoices 表所有行的 total 列的總和,以產生單一值。

當元組中包含一個或多個聚合表達式時,aggregate 可以一次產生多個摘要。aggregate 會捨棄元組中不存在的所有列。

from invoices
aggregate {
    num_orders = count this,
    sum_of_orders = sum total,
}

在上面的示例中,結果是有兩列的單一行。count 函式顯示傳入的表中的行數;sum 函式將所有行的 total 列的值相加。

分組

假設我們想要產生表中 每個城市 發票的摘要。我們可以為每個城市建立一個查詢,並聚合其行:

from invoices
filter billing_city == "Oslo"
aggregate { sum_of_orders = sum total }

但我們需要為每個城市做這種操作:LondonFrankfurt 等。當然這是重複的(且令人厭倦的)且容易出錯(因為我們需要手動輸入每個 billing_city)。此外,我們需要在開始前建立每個 billing_city 的列表。

group 變換

The group transform separates the table into groups (say, those having the same city) using information that’s already in the table. It then applies a transform to each group, and combines the results back together:

from invoices
group billing_city (
    aggregate {
        num_orders = count this,
        sum_of_orders = sum total,
    }
)

熟悉 SQL 的人可能已經注意到,我們剛才將聚合與分組分離了。

雖然這些操作在 SQL 中是相關的,但 PRQL 使得可以直接將 groupaggregate分開使用,同時與其他變換函式結合,例如:

from invoices
group billing_city (
    take 2
)

This code collects the first two rows for each city’s group.