關係
PRQL is designed on top of relational algebra, which is the established data model used by modern SQL databases. A relation has a rigid mathematical definition, which can be simplified to “a table of data”. For example, the invoices table from the Chinook database (https://github.com/lerocha/chinook-database) looks like this:
| invoice_id | customer_id | billing_city | other columns | total |
|---|---|---|---|---|
| 1 | 2 | Stuttgart | … | 1.98 |
| 2 | 4 | Oslo | … | 3.96 |
| 3 | 8 | Brussels | … | 5.94 |
| 4 | 14 | Edmonton | … | 8.91 |
| 5 | 23 | Boston | … | 13.86 |
| 6 | 37 | Frankfurt | … | 0.99 |
A relation is composed of rows. Each row in a relation contains a value for each of the relation’s columns. Each column in a relation has a unique name and a designated data type. The table above is a relation, and has columns named invoice_idand customer_id each with a data type of “integer number”, a billing_city column with a data type of “text”, several other columns, and a total column that contains floating-point numbers.
查詢
PRQL 的主要目的是建立查詢,以結合和轉換來自關係(如上面的 invoices 表)的資料。以下是最基本的查詢:
from invoices
[!NOTE] 嘗試在 Playground 中的每一個例子。在左側輸入查詢,並點擊右側的 Query Results 選項卡來查看結果。
The result of the query above is not terribly interesting, it’s just the same relation as before.
select 變換
select 函式基於列表選擇要通過的列,並捨棄所有其他列。形式上,該列表是以 { ... } 包裹的逗號分隔表達式的 元組。
假設我們只需要 order_id 和 total 列。使用 select 來選擇要通過的列。(在 Playground 中試試看。)
from invoices
select { order_id, total }
我們可以在一行或多行上寫入元組中的項目:尾部逗號被忽略。此外,我們可以將任何表達式指派給 變數,該變數成為 SQL 輸出中結果列的名稱。
from invoices
select {
OrderID = invoice_id,
Total = total,
}
這是與上面相同的查詢,改寫在多行上,並將 OrderID 和 Total 名稱指派給列。
一旦我們 select 某些列,後續變換將只能訪問在元組中命名的那些列。
derive 變換
To add columns to a relation, we can use the derive function. Let’s define a new column for Value Added Tax, set at 19% of the invoice total.
from invoices
derive { VAT = total * 0.19 }
新列的值可以是常數(例如數字或字串),也可以從現有列的值計算得出。注意新列被指派名稱 VAT。
join 變換
The join transform also adds columns to the relation by combining the rows from two relations “side by side”. To determine which rows from each relation should be joined, join has match criteria, written in ( ... ).
from invoices
join customers ( ==customer_id )
This example “connects” the customer information from the customers relation with the information from the invoices relation, using identical values of the customer_id column from each relation to match the rows.
It is frequently useful to assign an alias to both relations being joined together so that each relation’s columns can be referred to uniquely.
from inv=invoices
join cust=customers ( ==customer_id )
在上面的示例中,別名 inv 代表 invoices 關係,cust 代表 customers 關係。然後可以明確地參考 inv.billing_city 和 cust.last_name。
摘要
PRQL 操作資料的關係(表)。derive、select 和 join 變換改變表中列的數量。前兩者從不影響表中行的數量。join 可能會改變行的數量,這取決於選擇的聯接類型。
這個最後的示例將上述內容組合成一個單一查詢。它說明了 管道 - PRQL 的基礎。我們只是在查詢末尾新增新的行(變換)。每個變換修改由上述語句產生的關係,以產生所需的結果。
from inv=invoices
join cust=customers (==customer_id)
derive { VAT = inv.total * 0.19 }
select {
OrderID = inv.invoice_id,
CustomerName = cust.last_name,
Total = inv.total,
VAT,
}