
PostgreSQL Summary stats and window functions
Introduction to windows functions



Window functions
-
Perform calculations across a set of table rows that are somehow related to the current row
-
Similar to
GROUP BYaggregation, but all rows remain in the output (no grouping)
Uses
-
Fetching values from preceding or following rows (e.g. fetching the previous row’s value)
-
Determining reigning champion status
-
Calculating growth over time
-
-
Assigning ordinal ranks (1st, 2nd, etc.) to rows based on their values’ positions in a sorted list
-
Running totals, moving averages



FUNCTION_NAME()
-
The specific operation to perform
-
Examples:
ROW_NUMBER(),RANK(),SUM(),AVG(),LAG(),LEAD()
OVER()
-
Defines the “window” - the set of rows to perform calculations on
-
OVER()with empty parentheses means “all rows in the result set”
Key Clauses inside OVER():
PARTITION BY
-
Divides the result set into groups/partitions (similar to GROUP BY but without collapsing rows)
-
Example:
OVER (PARTITION BY Country)- calculations are done separately for each country
ORDER BY
-
Sorts the rows within each window/partition
-
Example:
OVER (ORDER BY Year)- orders rows by year
ROWS/RANGE PRECEDING/FOLLOWING/UNBOUNDED
-
Defines the frame - which specific rows to include in the window:
-
PRECEDING - rows before the current row
-
FOLLOWING - rows after the current row
-
UNBOUNDED - all rows from the start/end
-
CURRENT ROW - the current row
-
-- Number all rows sequentially
ROW_NUMBER() OVER ()
-- Number rows within each country group
ROW_NUMBER() OVER (PARTITION BY Country)
-- Rank by year within each country
RANK() OVER (PARTITION BY Country ORDER BY Year)
-- Running total by year
SUM(Medals) OVER (ORDER BY Year ROWS UNBOUNDED PRECEDING)
Assign a number to each year in which Summer Olympic games were held.
SELECT
Year,
-- Assign numbers to each year
ROW_NUMBER() OVER() AS Row_N
FROM (
SELECT DISTINCT year
FROM Summer_Medals
ORDER BY Year ASC
) AS Years
ORDER BY Year ASC;
ORDER BY

Enter ORDER BY
ORDER BY in OVER orders the rows related to the current row
Example: Ordering by year in descending order in ROW_NUMBER’s OVER clause will assign 1 to the most recent year’s rows




Reigning champion
-
A reigning champion is a champion who’s won both the previous and current years’ competitions
-
The previous and current year’s champions need to be in the same row (in two different columns)
Enter LAG
LAG(column, n) OVER (...)returnscolumn’s value at the rownrows before the current rowLAG(column, 1) OVER (...)returns the previous row’s value


Assign a number to each year in which Summer Olympic games were held so that rows with the most recent years have lower row numbers.
SELECT
Year,
-- Assign the lowest numbers to the most recent years
ROW_NUMBER() OVER (ORDER BY Year DESC) AS Row_N
FROM (
SELECT DISTINCT Year
FROM Summer_Medals
) AS Years
ORDER BY Year;
For each athlete, count the number of medals he or she has earned.
SELECT
-- Count the number of medals each athlete has earned
athlete,
COUNT(medal) AS Medals
FROM Summer_Medals
GROUP BY Athlete
ORDER BY Medals DESC;
Having wrapped the previous query in the Athlete_Medals CTE, rank each athlete by the number of medals they’ve earned.
WITH Athlete_Medals AS (
SELECT
-- Count the number of medals each athlete has earned
Athlete,
COUNT(*) AS Medals
FROM Summer_Medals
GROUP BY Athlete)
SELECT
-- Number each athlete by how many medals they've earned
athlete,
ROW_NUMBER() OVER (ORDER BY medals DESC) AS Row_N
FROM Athlete_Medals
ORDER BY Medals DESC;
Return each year’s gold medalists in the Men’s 69KG weightlifting competition.
SELECT
-- Return each year's champions' countries
year,
country AS champion
FROM Summer_Medals
WHERE
Discipline = 'Weightlifting' AND
Event = '69KG' AND
Gender = 'Men' AND
Medal = 'Gold';
Having wrapped the previous query in the Weightlifting_Gold CTE, get the previous year’s champion for each year.
WITH Weightlifting_Gold AS (
SELECT
-- Return each year's champions' countries
Year,
Country AS champion
FROM Summer_Medals
WHERE
Discipline = 'Weightlifting' AND
Event = '69KG' AND
Gender = 'Men' AND
Medal = 'Gold')
SELECT
Year, Champion,
-- Fetch the previous year's champion
LAG(Champion) OVER
(ORDER BY YEAR ASC) AS Last_Champion
FROM Weightlifting_Gold
ORDER BY Year ASC;
PARTITION BY

Enter PARTITION BY
-
PARTITION BYsplits the table into partitions based on a column’s unique values- The results aren’t rolled into one column
Operated on separately by the window function
-
ROW_NUMBERwill reset for each partition -
LAGwill only fetch a row’s previous value if its previous row is in the same partition



Return the previous champions of each year’s event by gender.
WITH Tennis_Gold AS (
SELECT DISTINCT
Gender, Year, Country
FROM Summer_Medals
WHERE
Year >= 2000 AND
Event = 'Javelin Throw' AND
Medal = 'Gold')
SELECT
Gender, Year,
Country AS Champion,
-- Fetch the previous year's champion by gender
LAG(Country) OVER (PARTITION BY Gender
ORDER BY Year ASC) AS Last_Champion
FROM Tennis_Gold
ORDER BY Gender ASC, Year ASC;
Return the previous champions of each year’s events by gender and event.
WITH Athletics_Gold AS (
SELECT DISTINCT
Gender, Year, Event, Country
FROM Summer_Medals
WHERE
Year >= 2000 AND
Discipline = 'Athletics' AND
Event IN ('100M', '10000M') AND
Medal = 'Gold')
SELECT
Gender, Year, Event,
Country AS Champion,
-- Fetch the previous year's champion by gender and event
LAG(Country) OVER (PARTITION BY Gender, Event
ORDER BY Year ASC) AS Last_Champion
FROM Athletics_Gold
ORDER BY Event ASC, Gender ASC, Year ASC;
Fetching
The four functions
Relative
-
LAG(column, n)returnscolumn’s value at the row n rows before the current row -
LEAD(column, n)returnscolumn’s value at the row n rows after the current row
Absolute
-
FIRST_VALUE(column)returns the first value in the table or partition -
LAST_VALUE(column)returns the last value in the table or partition




For each year, fetch the current gold medalist and the gold medalist 3 competitions ahead of the current row.
WITH Discus_Medalists AS (
SELECT DISTINCT
Year,
Athlete
FROM Summer_Medals
WHERE Medal = 'Gold'
AND Event = 'Discus Throw'
AND Gender = 'Women'
AND Year >= 2000)
SELECT
-- For each year, fetch the current and future medalists
year,
Athlete,
LEAD(Athlete, 3) OVER (ORDER BY Year ASC) AS Future_Champion
FROM Discus_Medalists
ORDER BY Year ASC;
Return all athletes and the first athlete ordered by alphabetical order.
WITH All_Male_Medalists AS (
SELECT DISTINCT
Athlete
FROM Summer_Medals
WHERE Medal = 'Gold'
AND Gender = 'Men')
SELECT
-- Fetch all athletes and the first athlete alphabetically
athlete,
FIRST_VALUE(athlete) OVER(ORDER BY athlete ASC) AS First_Athlete
FROM All_Male_Medalists;
-
Return the year and the city in which each Olympic games were held.
-
Fetch the last city in which the Olympic games were held.
WITH Hosts AS (
SELECT DISTINCT Year, City
FROM Summer_Medals)
SELECT
Year,
City,
-- Get the last city in which the Olympic games were held
LAST_VALUE(city) OVER (
ORDER BY year ASC
RANGE BETWEEN
UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING
) AS Last_City
FROM Hosts
ORDER BY Year ASC;
Ranking
The ranking functions
-
ROW_NUMBER()always assigns unique numbers, even if two rows’ values are the same -
RANK()assigns the same number to rows with identical values, skipping over the next numbers in such cases -
DENSE_RANK()also assigns the same number to rows with identical values, but doesn’t skip over the next numbers







Rank each athlete by the number of medals they’ve earned – the higher the count, the higher the rank – with identical numbers in case of identical values.
WITH Athlete_Medals AS (
SELECT
Athlete,
COUNT(*) AS Medals
FROM Summer_Medals
GROUP BY Athlete)
SELECT
Athlete,
Medals,
-- Rank athletes by the medals they've won
RANK() OVER (ORDER BY medals DESC) AS Rank_N
FROM Athlete_Medals
ORDER BY Medals DESC;
Rank each country’s athletes by the count of medals they’ve earned – the higher the count, the higher the rank – without skipping numbers in case of identical values.
WITH Athlete_Medals AS (
SELECT
Country, Athlete, COUNT(*) AS Medals
FROM Summer_Medals
WHERE
Country IN ('JPN', 'KOR')
AND Year >= 2000
GROUP BY Country, Athlete
HAVING COUNT(*) > 1)
SELECT
Country,
-- Rank athletes in each country by the medals they've won
athlete,
DENSE_RANK() OVER (PARTITION BY Country
ORDER BY Medals DESC) AS Rank_N
FROM Athlete_Medals
ORDER BY Country ASC, RANK_N ASC;
Paging
What is paging?
-
Paging: Splitting data into (approximately) equal chunks
-
Uses
-
Many APIs return data in “pages” to reduce data being sent
-
Separating data into quartiles or thirds (top middle 33%, and bottom thirds) to judge performance
-
Enter NTILE
NTILE(n)splits the data intonapproximately equal pages




Split the distinct events into exactly 111 groups, ordered by event in alphabetical order.
WITH Events AS (
SELECT DISTINCT Event
FROM Summer_Medals)
SELECT
--- Split up the distinct events into 111 unique groups
Event,
NTILE(111) OVER (ORDER BY event ASC) AS Page
FROM Events
ORDER BY Event ASC;
Split the athletes into top, middle, and bottom thirds based on their count of medals.
WITH Athlete_Medals AS (
SELECT Athlete, COUNT(*) AS Medals
FROM Summer_Medals
GROUP BY Athlete
HAVING COUNT(*) > 1)
SELECT
Athlete,
Medals,
-- Split athletes into thirds by their earned medals
NTILE(3) OVER(ORDER BY Medals DESC) AS Third
FROM Athlete_Medals
ORDER BY Medals DESC, Athlete ASC;
Return the average of each third.
WITH Athlete_Medals AS (
SELECT Athlete, COUNT(*) AS Medals
FROM Summer_Medals
GROUP BY Athlete
HAVING COUNT(*) > 1),
Thirds AS (
SELECT
Athlete,
Medals,
NTILE(3) OVER (ORDER BY Medals DESC) AS Third
FROM Athlete_Medals)
SELECT
-- Get the average medals earned in each third
Third,
AVG(Medals) AS Avg_Medals
FROM Thirds
GROUP BY Third
ORDER BY Third ASC;
Aggregate window functions





Return the athletes, the number of medals they earned, and the medals running total, ordered by the athletes’ names in alphabetical order.
WITH Athlete_Medals AS (
SELECT
Athlete, COUNT(*) AS Medals
FROM Summer_Medals
WHERE
Country = 'USA' AND Medal = 'Gold'
AND Year >= 2000
GROUP BY Athlete)
SELECT
-- Calculate the running total of athlete medals
athlete,
Medals,
SUM(Medals) OVER (ORDER BY Athlete ASC) AS Max_Medals
FROM Athlete_Medals
ORDER BY Athlete ASC;
Return the year, country, medals, and the maximum medals earned so far for each country, ordered by year in ascending order.
WITH Athlete_Medals AS (
SELECT
Athlete, COUNT(*) AS Medals
FROM Summer_Medals
WHERE
Country = 'USA' AND Medal = 'Gold'
AND Year >= 2000
GROUP BY Athlete)
SELECT
-- Calculate the running total of athlete medals
athlete,
Medals,
SUM(Medals) OVER (ORDER BY Athlete ASC) AS Max_Medals
FROM Athlete_Medals
ORDER BY Athlete ASC;
WITH France_Medals AS (
SELECT
Year, COUNT(*) AS Medals
FROM Summer_Medals
WHERE
Country = 'FRA'
AND Medal = 'Gold' AND Year >= 2000
GROUP BY Year)
SELECT
Year,
Medals,
MIN(Medals) OVER (ORDER BY year ASC) AS Min_Medals
FROM France_Medals
ORDER BY Year ASC;
Frames
ROWS BETWEEN

-
ROWS BETWEEN [START] AND [FINISH]-
n PRECEDING:nrows before the current row -
CURRENT ROW: the current row -
n FOLLOWING:nrows after the current row
-
Examples
-
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW -
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING -
ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING




Return the year, medals earned, and the maximum medals earned, comparing only the current year and the next year.
WITH Scandinavian_Medals AS (
SELECT
Year, COUNT(*) AS Medals
FROM Summer_Medals
WHERE
Country IN ('DEN', 'NOR', 'FIN', 'SWE', 'ISL')
AND Medal = 'Gold'
GROUP BY Year)
SELECT
-- Select each year's medals
year,
Medals,
-- Get the max of the current and next years' medals
MAX(Medals) OVER (ORDER BY year ASC
ROWS BETWEEN CURRENT ROW
AND 1 FOLLOWING) AS Max_Medals
FROM Scandinavian_Medals
ORDER BY Year ASC;
Return the athletes, medals earned, and the maximum medals earned, comparing only the last two and current athletes, ordering by athletes’ names in alphabetical order.
WITH Chinese_Medals AS (
SELECT
Athlete, COUNT(*) AS Medals
FROM Summer_Medals
WHERE
Country = 'CHN' AND Medal = 'Gold'
AND Year >= 2000
GROUP BY Athlete)
SELECT
-- Select the athletes and the medals they've earned
Athlete,
Medals,
-- Get the max of the last two and current rows' medals
MAX(Medals) OVER (ORDER BY Athlete ASC
ROWS BETWEEN 2 PRECEDING
AND CURRENT ROW) AS Max_Medals
FROM Chinese_Medals
ORDER BY Athlete ASC;
Moving averages and totals
Overview
-
Moving average (MA): Average of last n periods
-
Example: 10-day MA of units sold in sales is the average of the last 10 days’ sold units
-
Used to indicate momentum/trends
-
Also useful in eliminating seasonality
-
-
Moving total: Sum of last n periods
-
Example: Sum of the last 3 Olympic games’ medals
-
Used to indicate performance; if the sum is going down, overall performance is going down
-




Calculate the 3-year moving average of medals earned.
WITH Russian_Medals AS (
SELECT
Year, COUNT(*) AS Medals
FROM Summer_Medals
WHERE
Country = 'RUS'
AND Medal = 'Gold'
AND Year >= 1980
GROUP BY Year)
SELECT
Year, Medals,
--- Calculate the 3-year moving average of medals earned
AVG(Medals) OVER
(ORDER BY Year ASC
ROWS BETWEEN
2 PRECEDING AND CURRENT ROW) AS Medals_MA
FROM Russian_Medals
ORDER BY Year ASC;
Calculate the 3-year moving sum of medals earned per country.
WITH Country_Medals AS (
SELECT
Year, Country, COUNT(*) AS Medals
FROM Summer_Medals
GROUP BY Year, Country)
SELECT
Year, Country, Medals,
-- Calculate each country's 3-game moving total
SUM(Medals) OVER
(PARTITION BY Country
ORDER BY Year ASC
ROWS BETWEEN
2 PRECEDING AND CURRENT ROW) AS Medals_MA
FROM Country_Medals
ORDER BY Country ASC, Year ASC;
Pivoting







Create the correct extension. Fill in the column names of the pivoted table.
-- Create the correct extension to enable CROSSTAB
CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT * FROM CROSSTAB($$
SELECT
Gender, Year, Country
FROM Summer_Medals
WHERE
Year IN (2008, 2012)
AND Medal = 'Gold'
AND Event = 'Pole Vault'
ORDER By Gender ASC, Year ASC;
-- Fill in the correct column names for the pivoted table
$$) AS ct (Gender VARCHAR,
"2008" VARCHAR,
"2012" VARCHAR)
ORDER BY Gender ASC;
Count the gold medals that France (FRA), the UK (GBR), and Germany (GER) have earned per country and year.
-- Count the gold medals per country and year
SELECT
Country,
year,
Count(*) AS Awards
FROM Summer_Medals
WHERE
Country IN ('FRA', 'GBR', 'GER')
AND Year IN (2004, 2008, 2012)
AND Medal = 'Gold'
GROUP BY Country, year
ORDER BY Country ASC, Year ASC
Select the country and year columns, then rank the three countries by how many gold medals they earned per year.
WITH Country_Awards AS (
SELECT
Country,
Year,
COUNT(*) AS Awards
FROM Summer_Medals
WHERE
Country IN ('FRA', 'GBR', 'GER')
AND Year IN (2004, 2008, 2012)
AND Medal = 'Gold'
GROUP BY Country, Year)
SELECT
-- Select Country and Year
Country,
Year,
-- Rank by gold medals earned per year
RANK() OVER(PARTITION BY Awards) :: INTEGER AS rank
FROM Country_Awards
ORDER BY Country ASC, Year ASC;
Pivot the query’s results by Year by filling in the new table’s correct column names.
CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT * FROM CROSSTAB($$
WITH Country_Awards AS (
SELECT
Country,
Year,
COUNT(*) AS Awards
FROM Summer_Medals
WHERE
Country IN ('FRA', 'GBR', 'GER')
AND Year IN (2004, 2008, 2012)
AND Medal = 'Gold'
GROUP BY Country, Year)
SELECT
Country,
Year,
RANK() OVER
(PARTITION BY Year
ORDER BY Awards DESC) :: INTEGER AS rank
FROM Country_Awards
ORDER BY Country ASC, Year ASC;
-- Fill in the correct column names for the pivoted table
$$) AS ct (Country VARCHAR,
"2004" INTEGER,
"2008" INTEGER,
"2012" INTEGER)
Order by Country ASC;
ROLLUP and CUBE









Count the gold medals awarded per country and gender. Generate Country-level gold award counts.
-- Count the gold medals per country and gender
SELECT
Country,
Gender,
COUNT(*) AS Gold_Awards
FROM Summer_Medals
WHERE
Year = 2004
AND Medal = 'Gold'
AND Country IN ('DEN', 'NOR', 'SWE')
-- Generate Country-level subtotals
GROUP BY Country, ROLLUP(Gender)
ORDER BY Country ASC, Gender ASC;
Count the medals awarded per gender and medal type. Generate all possible group-level counts (per gender and medal type subtotals and the grand total).
-- Count the medals per gender and medal type
SELECT
Gender,
Medal,
COUNT(*) AS Awards
FROM Summer_Medals
WHERE
Year = 2012
AND Country = 'RUS'
-- Get all possible group-level subtotals
GROUP BY CUBE(Gender, Medal)
ORDER BY Gender ASC, Medal ASC;
A survey of useful functions

Enter COALESCE
-
COALESCE()takes a list of values and returns the first non-null value, going from left to right -
COALESCE(null, null, 1, null, 2) ? 1 -
Useful when using SQL operations that return
nulls-
ROLLUPandCUBE -
Pivoting
-
LAGandLEAD
-


Enter STRING_AGG


Turn the nulls in the Country column to All countries, and the nulls in the Gender column to All genders
SELECT
-- Replace the nulls in the columns with meaningful text
COALESCE(Country, 'All countries') AS Country,
COALESCE(Gender, 'All genders') AS Gender,
COUNT(*) AS Awards
FROM Summer_Medals
WHERE
Year = 2004
AND Medal = 'Gold'
AND Country IN ('DEN', 'NOR', 'SWE')
GROUP BY ROLLUP(Country, Gender)
ORDER BY Country ASC, Gender ASC;
Rank countries by the medals they’ve been awarded.
SELECT
-- Replace the nulls in the columns with meaningful text
COALESCE(Country, 'All countries') AS Country,
COALESCE(Gender, 'All genders') AS Gender,
COUNT(*) AS Awards
FROM Summer_Medals
WHERE
Year = 2004
AND Medal = 'Gold'
AND Country IN ('DEN', 'NOR', 'SWE')
GROUP BY ROLLUP(Country, Gender)
ORDER BY Country ASC, Gender ASC;
Return the top 3 countries by medals awarded as one comma-separated string.
WITH Country_Medals AS (
SELECT
Country,
COUNT(*) AS Medals
FROM Summer_Medals
WHERE Year = 2000
AND Medal = 'Gold'
GROUP BY Country),
Country_Ranks AS (
SELECT
Country,
RANK() OVER (ORDER BY Medals DESC) AS Rank
FROM Country_Medals
ORDER BY Rank ASC)
-- Compress the countries column
SELECT STRING_AGG(Country, ', ')
FROM Country_Ranks
-- Select only the top three ranks
WHERE Rank <= 3;
Functions for Manipulating Data in Postgresql

Topics
-
Common data types in PostgreSQL
-
Date and time functions and operators
-
Parsing and manipulating text
-
Full-text search and PostgreSQL Extensions
Common data types in PostgreSQL
Common data types
-
Text data types
CHAR,VARCHARandTEXT
-
Numeric data types
INTandDECIMAL
-
Date / time data types
DATE,TIME,TIMESTAMP,INTERVAL
-
Arrays




Select all columns from the INFORMATION_SCHEMA.TABLES system database. Limit results that have a public table_schema
-- Select all columns from the TABLES system database
SELECT *
FROM INFORMATION_SCHEMA.TABLES
-- Filter by schema
WHERE table_schema = 'public';
Select all columns from the INFORMATION_SCHEMA.COLUMNS system database. Limit by table_name to actor
-- Select all columns from the COLUMNS system database
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'actor';
Select the column name and data type from the INFORMATION_SCHEMA.COLUMNS system database. Limit results to only include the customer table.
-- Get the column name and data type
SELECT
column_name,
data_type
-- From the system database information schema
FROM INFORMATION_SCHEMA.COLUMNS
-- For the customer table
WHERE table_name = 'customer';
Date and time data types





Select the rental date and return date from the rental table. Add an INTERVAL of 3 days to the rental_date to calculate the expected return date`
SELECT
-- Select the rental and return dates
rental_date,
return_date,
-- Calculate the expected_return_date
rental_date + INTERVAL '3 days' AS expected_return_date
FROM rental;
Working with ARRAYs





Select all films that have a special feature Trailers by filtering on the first index of the special_features ARRAY.
-- Select the title and special features column
SELECT
title,
special_features
FROM film
-- Use the array index of the special_features column
WHERE special_features[1] ='Trailers';
Now let’s select all films that have Deleted Scenes in the second index of the special_features ARRAY.
-- Select the title and special features column
SELECT
title,
special_features
FROM film
-- Use the array index of the special_features column
WHERE special_features[2] = 'Deleted Scenes';
Match ‘Trailers’ in any index of the special_features ARRAY regardless of position.
SELECT
title,
special_features
FROM film
-- Modify the query to use the ANY function
WHERE 'Trailers' = ANY (special_features);
The contains operator @> operator is alternative syntax to the ANY function and matches data in an ARRAY using the following syntax.
WHERE array_name @> ARRAY['search text'] :: type[]
Use the contains operator to match the text Deleted Scenes in the special_features column.
SELECT
title,
special_features
FROM film
-- Filter where special_features contains 'Deleted Scenes'
WHERE special_features @> ARRAY['Deleted Scenes'];
Overview of basic arithmetic operators
Topics
-
Overview of basic arithmetic operators
-
The
CURRENT_DATE, CURRENT_TIMESTAMP,NOW()functions -
The
AGE()function -
The
EXTRACT(),DATE_PART(), andDATE_TRUNC()functions







Subtract the rental_date from the return_date to calculate the number of days_rented.
SELECT f.title, f.rental_duration,
-- Calculate the number of days rented
r.return_date - r.rental_date AS days_rented
FROM film AS f
INNER JOIN inventory AS i ON f.film_id = i.film_id
INNER JOIN rental AS r ON i.inventory_id = r.inventory_id
ORDER BY f.title;
Now use the AGE() function to calculate the days_rented.
SELECT f.title, f.rental_duration,
-- Calculate the number of days rented
AGE(r.return_date, r.rental_date) AS days_rented
FROM film AS f
INNER JOIN inventory AS i ON f.film_id = i.film_id
INNER JOIN rental AS r ON i.inventory_id = r.inventory_id
ORDER BY f.title;
Convert rental_duration by multiplying it with a 1 day INTERVAL Subtract the rental_date from the return_date to calculate the number of days_rented. Exclude rentals with a NULL value for return_date.
SELECT
f.title,
-- Convert the rental_duration to an interval
INTERVAL '1' day * f.rental_duration,
-- Calculate the days rented as we did previously
r.return_date - r.rental_date AS days_rented
FROM film AS f
INNER JOIN inventory AS i ON f.film_id = i.film_id
INNER JOIN rental AS r ON i.inventory_id = r.inventory_id
-- Filter the query to exclude outstanding rentals
WHERE r.return_date IS NOT NULL
ORDER BY f.title;
SELECT
f.title,
r.rental_date,
f.rental_duration,
-- Add the rental duration to the rental date
INTERVAL '1' day * f.rental_duration + r.rental_date AS expected_return_date,
r.return_date
FROM film AS f
INNER JOIN inventory AS i ON f.film_id = i.film_id
INNER JOIN rental AS r ON i.inventory_id = r.inventory_id
ORDER BY f.title;
Functions for retrieving current date/time
Retrieving the current timestamp
SELECT NOW();
+-------------------------------+
| now() |
+-------------------------------+
| 2019-04-19 02:51:18.448641+00 |
+-------------------------------+
Retrieving the current timestamp (without timezone)
SELECT NOW()::timestamp;
+----------------------------+
| now() |
+----------------------------+
| 2019-04-19 02:51:18.448641 |
+----------------------------+
Retrieving the current timestamp
PostgreSQL specific casting
SELECT NOW()::timestamp;
CAST() function (alternative)
SELECT CAST(NOW() as timestamp);
Retrieving the current timestamp (alternative)
SELECT CURRENT_TIMESTAMP;
+-------------------------------+
| current_timestamp |
+-------------------------------+
| 2019-04-19 02:51:18.448641+00 |
+-------------------------------+
Retrieving the current timestamp
SELECT CURRENT_TIMESTAMP(2);
+----------------------------+
| current_timestamp |
+----------------------------+
| 2019-04-19 02:51:18.44+00 |
+----------------------------+
One difference between CURRENT_TIMESTAMP and NOW() is that with CURRENT_TIMESTAMP you can specify a precision parameter as you see in the example which will cause the result to have the seconds rounded to the number of fractional digits specified.
Current date and time (without time)
SELECT CURRENT_DATE;
+--------------+
| current_date |
+--------------+
| 2019-04-19 |
+--------------+
Current date and time
SELECT CURRENT_TIME;
+-------------------------+
| current_time |
+-------------------------+
| 04:06:30.929845+00:00 |
+-------------------------+
Use NOW() to select the current timestamp with timezone.
-- Select the current timestamp
SELECT NOW();
Select the current date without any time value.
-- Select the current date
SELECT CURRENT_DATE;
Now, let’s use the CAST() function to eliminate the timezone from the current timestamp.
--Select the current timestamp without a timezone
SELECT CAST( NOW() AS timestamp );
Use CAST() to retrieve the same result from the NOW() function.
SELECT
-- Select the current date
CURRENT_DATE,
-- CAST the result of the NOW() function to a date
CAST( NOW() AS date );
Select the current timestamp without timezone and alias it as right_now.
--Select the current timestamp without timezone
SELECT CURRENT_TIMESTAMP::timestamp AS right_now;
Now select a timestamp five days from now and alias it as five_days_from_now.
SELECT
CURRENT_TIMESTAMP::timestamp AS right_now,
INTERVAL '5 days' + CURRENT_TIMESTAMP AS five_days_from_now;
Finally, let’s use a second-level precision with no fractional digits for both the right_now and five_days_from_now fields.
SELECT
CURRENT_TIMESTAMP(2)::timestamp AS right_now,
interval '5 days' + CURRENT_TIMESTAMP(2) AS five_days_from_now;
Extracting and transforming date and time data
Exploring the EXTRACT(), DATE_PART() and DATE_TRUNC() functions
- Transactional timestamp precision not useful for analysis
2005-05-13 08:53:53
- Often need to extract parts of timestamps
2005 or 5 or 2 or Friday
- Or convert / truncate timestamp precision to standardize
2005-05-13 00:00:00
Extracting and transforming date / time data
- EXTRACT( field FROM source )
SELECT EXTRACT(quarter FROM timestamp '2005-01-24 05:12:00') AS quarter;
- DATE_PART(‘field’, source)
SELECT DATE_PART('quarter', timestamp '2005-01-24 05:12:00') AS quarter;
+----------+
| quarter |
+----------+
| 1 |
+----------+
Extracting sub-fields from timestamp data
Transactional data from DVD Rentals payment table
SELECT * FROM payment;
| payment_id | customer_id | staff_id | rental_id | amount | payment_date |
|------------|-------------|----------|-----------|--------|-----------------------|
| 1 | 1 | 1 | 76 | 2.99 | 2005-05-25 11:30:37 |
| 2 | 1 | 1 | 573 | 0.99 | 2005-05-28 10:35:23 |
| 3 | 1 | 1 | 1185 | 5.99 | 2005-06-15 00:54:12 |
Extracting sub-fields from timestamp data

Truncating timestamps using DATE_TRUNC()
The DATE_TRUNC() function will truncate timestamp or interval data types.
- Truncate timestamp ‘2005-05-21 15:30:30’ by year
SELECT DATE_TRUNC('year', TIMESTAMP '2005-05-21 15:30:30');
Result: 2005-01-01 00:00:00
- Truncate timestamp ‘2005-05-21 15:30:30’ by month
SELECT DATE_TRUNC('month', TIMESTAMP '2005-05-21 15:30:30');
Result: 2005-05-01 00:00:00
Get the day of the week from the rental_date column.
SELECT
-- Extract day of week from rental_date
EXTRACT(dow FROM rental_date) AS dayofweek
FROM rental
LIMIT 100;
dow= day of week return (0-6 Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday)
Count the total number of rentals by day of the week.
-- Extract day of week from rental_date
SELECT
EXTRACT(dow FROM rental_date) AS dayofweek,
-- Count the number of rentals
Count(*) as rentals
FROM rental
GROUP BY 1;
Truncate the rental_date field by year.
-- Truncate rental_date by year
SELECT DATE_TRUNC('year', rental_date) AS rental_year
FROM rental;
Now modify the previous query to truncate the rental_date by month.
-- Truncate rental_date by month
SELECT DATE_TRUNC('month', rental_date) AS rental_month
FROM rental;
Let’s see what happens when we truncate by day of the month.
-- Truncate rental_date by day of the month
SELECT DATE_TRUNC('day', rental_date) AS rental_day
FROM rental;
Finally, count the total number of rentals by rental_day and alias it as rentals.
SELECT
DATE_TRUNC('day', rental_date) AS rental_day,
-- Count total number of rentals
Count(*) AS rentals
FROM rental
GROUP BY 1;
Extract the day of the week from the rental_date column using the alias dayofweek. Use an INTERVAL in the WHERE clause to select records for the 90 day period starting on 5/1/2005.
SELECT
-- Extract the day of week date part from the rental_date
EXTRACT(dow FROM rental_date) AS dayofweek,
AGE(return_date, rental_date) AS rental_days
FROM rental AS r
WHERE
-- Use an INTERVAL for the upper bound of the rental_date
rental_date BETWEEN CAST('2005-05-01' AS DATE)
AND CAST('2005-05-01' AS DATE) + INTERVAL '90 day';
Finally, use a CASE statement and DATE_TRUNC() to create a new column called past_due which will be TRUE if the rental_days is greater than the rental_duration otherwise, it will be FALSE.
SELECT
c.first_name || ' ' || c.last_name AS customer_name,
f.title,
r.rental_date,
-- Extract the day of week date part from the rental_date
EXTRACT(dow FROM r.rental_date) AS dayofweek,
AGE(r.return_date, r.rental_date) AS rental_days,
-- Use DATE_TRUNC to get days from the AGE function
CASE WHEN DATE_TRUNC('day', AGE(r.return_date, r.rental_date)) >
-- Calculate number of d
f.rental_duration * INTERVAL '1' day
THEN TRUE
ELSE FALSE END AS past_due
FROM
film AS f
INNER JOIN inventory AS i
ON f.film_id = i.film_id
INNER JOIN rental AS r
ON i.inventory_id = r.inventory_id
INNER JOIN customer AS c
ON c.customer_id = r.customer_id
WHERE
-- Use an INTERVAL for the upper bound of the rental_date
r.rental_date BETWEEN CAST('2005-05-01' AS DATE)
AND CAST('2005-05-01' AS DATE) + INTERVAL '90 day';
Reformatting string and character data
Topics
-
Reformatting string and character data
-
Parsing string and character data
-
Determining string length and character position
-
Truncating and padding string data
The string concatenation operator
SELECT
first_name,
last_name,
first_name || ' ' || last_name AS full_name
FROM customer;
+------------+-----------+----------------+
| first_name | last_name | full_name |
+------------+-----------+----------------+
| MARY | SMITH | MARY SMITH |
| LINDA | WILLIAMS | LINDA WILLIAMS |
+------------+-----------+----------------+
|| ' ' ||is the concatenation operator in SQL that joins strings with a space between them.
String CONCATenation with functions
SELECT
CONCAT(first_name, ' ', last_name) AS full_name
FROM customer;
+----------------+
| full_name |
+----------------+
| MARY SMITH |
| LINDA WILLIAMS |
+----------------+
CONCAT()automatically handles NULL values (treats them as empty strings)
String concatenation with a non-string input
SELECT
customer_id || ': '
|| first_name || ' '
|| last_name AS full_name
FROM customer;
+-------------------+
| full_name |
+-------------------+
| 1: MARY SMITH |
| 2: LINDA WILLIAMS |
+-------------------+
Changing the case of string
SELECT
UPPER(email)
FROM customer;
+-------------------------------------+
| UPPER(email) |
+-------------------------------------+
| [email protected] |
| [email protected] |
| [email protected] |
+-------------------------------------+
UPPERis a function in SQL that converts all characters in a string to UPPERCASE.
Changing the case of string
SELECT
LOWER(title)
FROM film;
+-------------------+
| LOWER(title) |
+-------------------+
| academy dinosaur |
| ace goldfinger |
| adaptation holes |
+-------------------+
LOWERis a function in SQL that converts all characters in a string to lowercase.
Changing the case of string
SELECT
INITCAP(title)
FROM film;
+-------------------+
| INITCAP(title) |
+-------------------+
| Academy Dinosaur |
| Ace Goldfinger |
| Adaptation Holes |
+-------------------+
INITCAPis a function in SQL that converts the first character of each word to uppercase and all other characters to lowercase.
Replacing characters in a string
SELECT description FROM film;
+---------------------------------------------------------+
| description |
+---------------------------------------------------------+
| A Epic Drama of a Feminist And a Mad Scientist... |
| A Astounding Epistle of a Database Administrator... |
| A Astounding Reflection of a Lumberjack And a Car... |
| A Fanciful Documentary of a Frisbee And a Lumberjack... |
| A Fast-Paced Documentary of a Pastry Chef And a... |
+---------------------------------------------------------+
SELECT
REPLACE(description, 'A Astounding',
'An Astounding') as description
FROM film;
+-------------------------------------------------------+
| description |
+-------------------------------------------------------+
| A Epic Drama of a Feminist And a Mad Scientist... |
| An Astounding Epistle of a Database Administrator... |
| An Astounding Reflection of a Lumberjack And a Car... |
+-------------------------------------------------------+
REPLACE()is a function in SQL that replaces all occurrences of a specified substring with another substring in a string.
Manipulating string data with REVERSE
SELECT
title,
REVERSE(title)
FROM
film AS f;
+------------------+------------------+
| title | reverse(title) |
+------------------+------------------+
| ACADEMY DINOSAUR | RUASONID YMEDACA |
| ACE GOLDFINGER | REGNIFDLOG ECA |
+------------------+------------------+
Concatenate the first_name and last_name columns separated by a single space followed by email surrounded by < and >.
-- Concatenate the first_name and last_name and email
SELECT first_name || ' ' || last_name || ' <' || email || '>' AS full_email
FROM customer
Now use the CONCAT() function to do the same operation as the previous step.
-- Concatenate the first_name and last_name and email
SELECT CONCAT(first_name, ' ', last_name, ' <', email, '>') AS full_email
FROM customer
Convert the film category name to uppercase. Convert the first letter of each word in the film’s title to upper case. Concatenate the converted category name and film title separated by a colon. Convert the description column to lowercase.
SELECT
-- Concatenate the category name to coverted to uppercase
-- to the film title converted to title case
UPPER(c.name) || ': ' || INITCAP(f.title) AS film_category,
-- Convert the description column to lowercase
LOWER(f.description) AS description
FROM
film AS f
INNER JOIN film_category AS fc
ON f.film_id = fc.film_id
INNER JOIN category AS c
ON fc.category_id = c.category_id;
Replace all whitespace with an underscore.
SELECT
-- Replace whitespace in the film title with an underscore
REPLACE(title, ' ', '_') AS title
FROM film;
Parsing string and character data
Determining the length of a string
SELECT
title,
CHAR_LENGTH(title)
FROM film;
+-------------------+--------------------+
| title | CHAR_LENGTH(title) |
+-------------------+--------------------+
| ACADEMY DINOSAUR | 16 |
| ACE GOLDFINGER | 14 |
| ADAPTATION HOLES | 16 |
+-------------------+--------------------+
CHAR_LENGTH()is a function in SQL that returns the number of characters in a string.
Determining the length of a string
SELECT
title,
LENGTH(title)
FROM film;
+-------------------+---------------+
| title | LENGTH(title) |
+-------------------+---------------+
| ACADEMY DINOSAUR | 16 |
| ACE GOLDFINGER | 14 |
| ADAPTATION HOLES | 16 |
+-------------------+---------------+
LENGTH() is a function in SQL that returns the length of a string.
Finding the position of a character in a string
SELECT
email,
POSITION('@' IN email)
FROM customer;
+-------------------------------------+------------------------+
| email | POSITION('@' IN email) |
+-------------------------------------+------------------------+
| [email protected] | 11 |
| [email protected] | 17 |
| [email protected] | 15 |
+-------------------------------------+------------------------+
POSITION()is a function in SQL that returns the position of the first occurrence of a substring within a string.
Finding the position of a character in a string
SELECT
email,
STRPOS(email, '@')
FROM customer;
+-------------------------------------+--------------------+
| email | STRPOS(email, '@') |
+-------------------------------------+--------------------+
| [email protected] | 11 |
| [email protected] | 17 |
| [email protected] | 15 |
+-------------------------------------+--------------------+
STRPOS()is a function in PostgreSQL that returns the position of the first occurrence of a substring within a string.
Parsing string data
SELECT
LEFT(description, 50)
FROM film;
+----------------------------------------------------+
| description |
+----------------------------------------------------+
| A Epic Drama of a Feminist And a Mad Scientist who |
| A Astounding Epistle of a Database Administrator A |
| A Astounding Reflection of a Lumberjack And a Car |
+----------------------------------------------------+
LEFT()is a function in SQL that returns a specified number of characters from the beginning of a string.
SELECT
RIGHT(description, 50)
FROM film;
+----------------------------------------------------+
| description |
+----------------------------------------------------+
| who must Battle a Teacher in The Canadian Rockies |
| nd a Explorer who must Find a Car in Ancient China |
| Car who must Sink a Lumberjack in A Baloon Factory |
+----------------------------------------------------+
RIGHT()is a function in SQL that returns a specified number of characters from the end of a string.
Extracting substrings of character data
SELECT
SUBSTRING(description, 10, 50)
FROM
film AS f;
+----------------------------------------------------+
| description |
+----------------------------------------------------+
| ama of a Feminist And a Mad Scientist who must Bat |
| ing Epistle of a Database Administrator And a Expl |
| ing Reflection of a Lumberjack And a Car who must |
+----------------------------------------------------+
Extracting substrings of character data
SELECT
SUBSTRING(email FROM 0 FOR POSITION('@' IN email))
FROM
customer;
+----------------------------------------------------+
| SUBSTRING(email FROM 0 FOR POSITION('@' IN email)) |
+----------------------------------------------------+
| MARY.SMITH |
| PATRICIA.JOHNSON |
| LINDA.WILLIAMS |
+----------------------------------------------------+
SUBSTRING()is a function in SQL that extracts a substring from a string (extracts part of a string).
SELECT
SUBSTRING(email FROM POSITION('@' IN email)+1 FOR CHAR_LENGTH(email))
FROM
customer;
+-----------------------------------------------------------------------+
| SUBSTRING(email FROM POSITION('@' IN email)+1 FOR CHAR_LENGTH(email)) |
+-----------------------------------------------------------------------+
| sakilacustomer.org |
| sakilacustomer.org |
| sakilacustomer.org |
+-----------------------------------------------------------------------+
SELECT
SUBSTR(description, 10, 50)
FROM
film AS f;
+----------------------------------------------------+
| description |
+----------------------------------------------------+
| ama of a Feminist And a Mad Scientist who must Bat |
| ing Epistle of a Database Administrator And a Expl |
| ing Reflection of a Lumberjack And a Car who must |
+----------------------------------------------------+
SUBSTR()is a function in SQL that extracts a substring from a string (similar to SUBSTRING).
Select the title and description columns from the film table. Find the number of characters in the description column with the alias desc_len.
SELECT
-- Select the title and description columns
title,
description,
-- Determine the length of the description column
LENGTH(description) AS desc_len
FROM film;
Select the first 50 characters of the description column with the alias short_desc
SELECT
-- Select the first 50 characters of description
LEFT(description, 50) AS short_desc
FROM
film AS f;
Extract only the street address without the street number from the address column. Use functions to determine the starting and ending position parameters.
SELECT
-- Select only the street name from the address table
SUBSTRING(address FROM POSITION(' ' IN address)+1 FOR CHAR_LENGTH(address))
FROM
address;
Extract the characters to the left of the @ of the email column in the customer table and alias it as username. Now use SUBSTRING to extract the characters after the @ of the email column and alias the new derived field as domain.
SELECT
-- Extract the characters to the left of the '@'
LEFT(email, POSITION('@' IN email)-1) AS username,
-- Extract the characters to the right of the '@'
SUBSTRING(email FROM POSITION('@' IN email)+1 FOR LENGTH(email)) AS domain
FROM customer;
+1because we don’t need to take@only domain name
Truncating and padding string data
Removing whitespace from strings
TRIM([leading | trailing | both] [characters] from string)
-
First parameter: [leading | trailing | both]
-
LEADING — removes characters only from the start of the string
-
TRAILING — removes characters only from the end of the string
-
BOTH — removes characters from both ends (default)
-
-
Second parameter: [characters]
- characters — which characters to remove (default is a space
' ')
- characters — which characters to remove (default is a space
-
Third parameter: from string
- string — the string to trim
Removing whitespace from strings
SELECT TRIM(' padded ');
+---------+
| TRIM |
+---------+
| padded |
+---------+
This default behavior will remove all whitespace from the beginning and end
SELECT LTRIM(' padded ');
+-----------------+
| LTRIM |
+-----------------+
| padded |
+-----------------+
The LTRIM and RTRIM functions are analogous to TRIM but only remove characters from either the beginning OR the end
SELECT RTRIM(' padded ');
+-----------------+
| LTRIM |
+-----------------+
| padded |
+-----------------+
The LTRIM and RTRIM functions are analogous to TRIM but only remove characters from either the beginning OR the end
Padding strings with character data
SELECT LPAD('padded', 10);
+------------+
| LPAD |
|------------|
| padded |
+------------+
LPAD() left pad - add characters (to the left 6 letter + 4 spaces) the string will be padded with a space character by default.
SELECT LPAD('padded', 5);
+------------+
| LPAD |
|------------|
| padde |
+------------+
Original string:
padded(length = 6 characters) trims it on the right to the specified length
SELECT RPAD('padded', 10, '#');
+-------------+
| RPAD |
+-------------+
| padded#### |
+-------------+
RPAD('padded', 10, '#')- padded` (length = 6 characters) and add (4 x #)
Add a single space to the end or right of the first_name column using a padding function. Use the || operator to concatenate the padded first_name to the last_name column.
-- Concatenate the padded first_name and last_name
SELECT
RPAD(first_name, LENGTH(first_name)+1) || last_name AS full_name
FROM customer;
Now add a single space to the left or beginning of the last_name column using a different padding function than the first step. Use the || operator to concatenate the first_name column to the padded last_name.
-- Concatenate the first_name and last_name
SELECT
first_name || LPAD(last_name, LENGTH(last_name)+1) AS full_name
FROM customer;
Add a single space to the right or end of the first_name column. Add the characters < to the right or end of last_name column. Finally, add the characters > to the right or end of the email column.
-- Concatenate the first_name and last_name
SELECT
RPAD(first_name, LENGTH(first_name)+1)
|| RPAD(last_name, LENGTH(last_name)+2, ' <')
|| RPAD(email, LENGTH(email)+1, '>') AS full_email
FROM customer;
Convert the film category name to uppercase and use the CONCAT() concatenate it with the title. Truncate the description to the first 50 characters and make sure there is no leading or trailing whitespace after truncating.
-- Concatenate the uppercase category name and film title
SELECT
CONCAT(UPPER(name), ': ', title) AS film_category,
-- Truncate the description remove trailing whitespace
TRIM(LEFT(description, 50)) AS film_desc
FROM
film AS f
INNER JOIN film_category AS fc
ON f.film_id = fc.film_id
INNER JOIN category AS c
ON fc.category_id = c.category_id;
Get the first 50 characters of the description column Determine the position of the last whitespace character of the truncated description column and subtract it from the number 50 as the second parameter in the first function above.
SELECT
UPPER(c.name) || ': ' || f.title AS film_category,
-- Truncate the description without cutting off a word
LEFT(description, 50 -
-- Subtract the position of the first whitespace character
POSITION(
' ' IN REVERSE(LEFT(description, 50))
)
)
FROM
film AS f
INNER JOIN film_category AS fc
ON f.film_id = fc.film_id
INNER JOIN category AS c
ON fc.category_id = c.category_id;
The text is truncated to the last complete word before the 50th character so that words are not cut off in the middle.
Introduction to full-text search
Topics
-
Full Text search
-
Extending PostgreSQL
-
Improving full text search with extensions
The LIKE operator
_ wildcard: Used to match exactly one character.
% wildcard: Used to match zero or more characters.
SELECT title
FROM film
WHERE title LIKE 'ELF%';
+-----------------------+
| title |
+-----------------------+
| ELF PARTY |
+-----------------------+
ELF- exact match of these three characters%- any number of any characters after (including 0 characters)
SELECT title
FROM film
WHERE title LIKE '%ELF';
+----------------------+
| title |
+----------------------+
| ENCINO ELF |
| GHOSTBUSTERS ELF |
+----------------------+
ELF- exact match of these three characters%- any number of any characters after (including 0 characters)
SELECT title
FROM film
WHERE title LIKE '%elf%';
+---------------------+
| title |
+---------------------+
using LIKE to search the title column for any string that contains the word elf in all lowercase will return zero results. This may be counterintuitive to what you would expect because the LIKE operator matches the exact characters in the query and is case sensitive.
LIKE versus full-text search
SELECT title, description
FROM film
WHERE to_tsvector(title) @@ to_tsquery('elf');
+---------------------+
| title |
+---------------------+
| ELF PARTY |
| ENCINO ELF |
| GHOSTBUSTERS ELF |
+---------------------+
the functions to_tsvector and to_tsquery and the match operator to search the title column. Because full text search accounts for variations of the search string and is case insensitive you will notice that you get the expected results.
What is full-text search?
Full-text search provides a means for performing natural language queries of text data in your database.
-
Stemming
-
Spelling mistakes
-
Ranking
SELECT title, description
FROM film
WHERE to_tsvector(title) @@ to_tsquery('elf');
These functions convert text and string data to a tsvector data type which is a sorted list of words that have been normalized into variants of the same word. These variants are called
lexemes.
Select all columns for all records that begin with the word GOLD.
-- Select all columns
SELECT *
FROM film
-- Select only records that begin with the word 'GOLD'
WHERE title LIKE 'GOLD%';
Now select all records that end with the word GOLD.
SELECT *
FROM film
-- Select only records that end with the word 'GOLD'
WHERE title LIKE '%GOLD';
Finally, select all records that contain the word ‘GOLD’.
SELECT *
FROM film
-- Select only records that contain the word 'GOLD'
WHERE title LIKE '%GOLD%';
Select the film description and convert it to a tsvector data type.
-- Select the film description as a tsvector
SELECT to_tsvector(description)
FROM film;
Select the title and description columns from the film table. Perform a full-text search on the title column for the word elf.
-- Select the title and description
SELECT title, description
FROM film
-- Convert the title to a tsvector and match it against the tsquery
WHERE to_tsvector(title) @@ to_tsquery('elf');
Extending PostgreSQL
User-defined data types
Enumerated data types
CREATE TYPE dayofweek AS ENUM (
'Monday',
'Tuesday',
'Wednesday',
'Thursday',
'Friday',
'Saturday',
'Sunday'
);
ENUM= “a list of predefined text values.” You can’t insert anything that isn’t in this list.
It’s useful when values repeat and must be strictly limited (for example, days of the week, order statuses, user roles).
It works faster and takes up less space than a regular TEXT field.
Getting information about user-defined data types
SELECT typname, typcategory
FROM pg_type
WHERE typname = 'dayofweek';
+-----------+-------------+
| typname | typcategory |
+-----------+-------------+
| dayofweek | E |
+-----------+-------------+
Getting information about user-defined data types
SELECT column_name, data_type, udt_name
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'film';
+-----------------+-------------------+-------------+
| column_name | data_type | udt_name |
+-----------------+-------------------+-------------+
| title | character varying | varchar |
| rating | USER-DEFINED | mpaa_rating |
+-----------------+-------------------+-------------+
User-defined functions
CREATE FUNCTION squared(i integer) RETURNS integer AS $$
BEGIN
RETURN i * i;
END;
$$ LANGUAGE plpgsql;
SELECT squared(10);
+---------+
| squared |
+---------+
| 100 |
+---------+
CREATE FUNCTIONsquared(i integer) — creates a function namedsquaredthat takes one parameteriof type integer.RETURNS integer— specifies that the function returns a number (typeinteger).AS $$ ... $$— the function’s body (code) is written inside the$$symbols.BEGIN ... END; — a block where the function’s actions are executed.RETURN i * i; — returns the result, the square of the numberi.LANGUAGE plpgsql— indicates that the function is written in the PostgreSQL procedural language PL/pgSQL (built-in SQL programming language).
User-defined functions in the Sakila database
-
get_customer_balance(customer_id, effective_data): calculates the current outstanding balance for a given customer.
-
inventory_held_by_customer(inventory_id): returns the customer_id that is currently renting an inventory item or null if it’s currently available.
-
inventory_in_stock(inventory_id): returns a boolean value of whether an inventory item is currently in stock.
Create a new enumerated data type called compass_position. Use the four positions of a compass as the values.
-- Create an enumerated data type, compass_position
CREATE TYPE compass_position AS ENUM (
-- Use the four cardinal directions
'North',
'South',
'East',
'West'
);
Verify that the new data type has been created by looking in the pg_type system table.
-- Create an enumerated data type, compass_position
CREATE TYPE compass_position AS ENUM (
-- Use the four cardinal directions
'North',
'South',
'East',
'West'
);
-- Confirm the new data type is in the pg_type system table
SELECT typname, typcategory
FROM pg_type
WHERE typname='compass_position';
Select the column_name, data_type, udt_name. Filter for the rating column in the film table.
-- Select the column name, data type and udt name columns
SELECT column_name, data_type, udt_name
FROM INFORMATION_SCHEMA.COLUMNS
-- Filter by the rating column in the film table
WHERE table_name ='film' AND column_name='rating';
Select all columns from the pg_type table where the type name is equal to mpaa_rating.
SELECT *
FROM pg_type
WHERE typname ='mpaa_rating';
Select the title and inventory_id columns from the film and inventory tables in the database.
-- Select the film title and inventory ids
SELECT
f.title,
i.inventory_id
FROM film AS f
-- Join the film table to the inventory table
INNER JOIN inventory AS i ON f.film_Id=i.film_Id;
inventory_id is currently held by a customer and alias the column as held_by_cust
-- Select the film title, rental and inventory ids
SELECT
f.title,
i.inventory_id,
-- Determine whether the inventory is held by a customer
inventory_held_by_customer(i.inventory_id) AS held_by_cust
FROM film as f
-- Join the film table to the inventory table
INNER JOIN inventory AS i ON f.film_id=i.film_id;
Now filter your query to only return records where the inventory_held_by_customer() function returns a non-null value.
-- Select the film title and inventory ids
SELECT
f.title,
i.inventory_id,
-- Determine whether the inventory is held by a customer
inventory_held_by_customer(i.inventory_id) as held_by_cust
FROM film as f
INNER JOIN inventory AS i ON f.film_id=i.film_id
WHERE
-- Only include results where the held_by_cust is not null
inventory_held_by_customer(i.inventory_id) IS NOT NULL
Intro to PostgreSQL extensions
Commonly Used Extensions
-
PostGIS- Adds support for geographic objects, enabling location queries and geospatial analysis. -
PostPic- Provides functions for image processing and manipulation directly within the database. -
fuzzystrmatch- Offers functions for determining similarities and distances between strings, useful for fuzzy string matching. -
pg_trgm- Supports efficient text searches using trigram matching, which is ideal for features like “did you mean?” suggestions and full-text search.
Querying PostgreSQL Extension Metadata
Available Extensions
To view all extensions available for installation in your PostgreSQL system, use this query:
SELECT name
FROM pg_available_extensions;
+-----------------------+
| name |
+-----------------------+
| dblink |
| pg_stat_statements |
+-----------------------+
Installed Extensions
To check which extensions are currently installed in your database, run:
SELECT extname
FROM pg_extension;
+-----------+
| extname |
+-----------+
| plpgsql |
+-----------+
The
plpgsqlextension is typically installed by default as it’s required for procedural functions.
– Enable the fuzzystrmatch extension
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;
– Confirm that fuzzystrmatch has been enabled
SELECT extname FROM pg_extension;
+----------------+
| extname |
+----------------+
| plpgsql |
| fuzzystrmatch |
+----------------+
This shows the successful installation of the fuzzystrmatch extension alongside the default plpgsql extension. The CREATE EXTENSION IF NOT EXISTS command safely installs the extension only if it doesn’t already exist in the database.
Using fuzzystrmatch for Fuzzy Searching
The levenshtein function calculates the edit distance between two strings - the number of single-character edits (insertions, deletions, or substitutions) required to change one string into the other.
SELECT levenshtein('GUMBO', 'GAMBOL');
+--------------+
| levenshtein |
+--------------+
| 2 |
+--------------+
In this example, the distance is 2 because:
-
Change ‘U’ to ‘A’ in “GUMBO” → “GAMBO”
-
Add ‘L’ at the end → “GAMBOL”
The smaller the Levenshtein distance, the more similar the two strings are.
Comparing Strings with pg_trgm
The similarity function from the pg_trgm extension calculates how similar two strings are based on trigram matching (sequences of three consecutive characters).
SELECT similarity('GUMBO', 'GAMBOL');
+-------------+
| similarity |
+-------------+
| 0.18181818 |
+-------------+
The similarity score ranges from 0 (completely different) to 1 (identical). In this case, the score of approximately 0.18 indicates a low level of similarity between “GUMBO” and “GAMBOL”.
Enable the pg_trgm extension
-- Enable the pg_trgm extension
CREATE EXTENSION IF NOT EXISTS pg_trgm;
Now confirm that both fuzzystrmatch and pg_trgm are enabled by selecting all rows from the appropriate system table.
-- Select all rows extensions
SELECT *
FROM pg_extension;
Select the film title and description. Calculate the similarity between the title and description.
-- Select the title and description columns
SELECT
title,
description,
-- Calculate the similarity
similarity(title, description)
FROM
film
Select the film title and film description. Calculate the levenshtein distance for the film title with the string JET NEIGHBOR.
-- Select the title and description columns
SELECT
title,
description,
-- Calculate the levenshtein distance
levenshtein(title, 'JET NEIGHBOR') AS distance
FROM
film
ORDER BY 3
Select the title and description for all DVDs from the film table. Perform a full-text search by converting the description to a tsvector and match it to the phrase ‘Astounding & Drama’ using a tsquery in the WHERE clause.
-- Select the title and description columns
SELECT
title,
description
FROM
film
WHERE
-- Match "Astounding Drama" in the description
to_tsvector(description) @@
to_tsquery('Astounding & Drama');
Add a new column that calculates the similarity of the description with the phrase ‘Astounding Drama’. Sort the results by the new similarity column in descending order.
SELECT
title,
description,
-- Calculate the similarity
similarity(description, 'Astounding & Drama')
FROM
film
WHERE
to_tsvector(description) @@
to_tsquery('Astounding & Drama')
ORDER BY
similarity(description, 'Astounding & Drama') DESC;