Dishing the dirt on VACUUM
VACUUM is one of the tasks your DBA need to perform to keep things running smoothly in a Postgres database
This is a high-level overview of why this is important.
Multiversion Concurrency Control
The first thing to get to grips with is the concept of Multiversion Concurrency Control (MVCC). The Postgres docs say that this is a way to protect “reading from blocking writing and writing from blocking reading“, by providing each transaction with a snapshot of the data.
While this is great when you’re querying your data, no one gets locked. The trade-off is that all these snapshots float around only marked for deletion.
They are not automatically deleted. These are called ‘dead tuples’, ‘dead rows’ or ‘bloat’ because they clog things up and slow queries down.
How does this work?
VACUUM
Using VACUUM is a way to deal with the garbage collection that needs to be done to keep things running smoothly.
This option only clears out the unused data but doesn’t rewrite to disk.
1 2 3 4 5 |
VACUUM [tablename] -- to specify a table VACUUM [tablename.columnname] -- to specify a table and column |
VACUUM FULL
This is the most no-nonsense way to get the job done. The tradeoff is that VACUUM FULL puts a full lock on the table. Not a great option if you have anyone trying to SELECT as you carry this task out.
1 2 3 4 5 |
VACUUM(FULL) [tablename] -- to specify a table VACUUM(FULL) [tablename.columnname] -- to specify a table and column |
VACUUM ANALYZE
This clears out unused data and updates query plan.
1 2 3 4 5 |
VACUUM(FULL, ANALYZE) [tablename] -- to specify a table VACUUM(FULL, ANALYZE) [tablename.columnname] -- to specify a table and column |
AUTOVACUUM
This option keeps things under control automatically by using a trigger to kick off VACUUM when it reaches a certain level. When it is exceeded the VACUUM begins.
vacuum base threshold +
vacuum scale factor *
number of tuples
1 2 3 4 5 6 |
autovacuum_vacuum_threshold = 50 -- the threshold of 50 rows is set to prevent small tables being overcleaned autovacuum_vacuum_scale_factor = 0.2 -- 20% of a table may have dead tuples |
So I should clean up all the time?
The aim here is to keep disk space usage at a steady state, not to knock it down to a minimum.
Not necessarily.
Performing these tasks uses resources, and depending on which strategy you have in mind, could lock users out while it happens.
The advantage of AUTOVACUUM is that is throttled, so it doesn’t use all your resources
How do I know when it last ran?
Using the query below you can check on when the process ran by using the pg_stat_user_tables table:
1 2 3 4 5 6 |
select relname, last_vacuum, last_autovacuum from pg_stat_user_tables; |
To tell if AUTOVACUUM is running you can use this query using the pg_stat_activity table:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
select datname, usename, pid, state, wait_event, current_timestamp - xact_start AS xact_runtime, query from pg_stat_activity where upper(query) LIKE '%VACUUM%' order by xact_start; |
Your friendly DBA will be across the details when it comes to this process and other maintenance tasks. Hopefully, this gives you a good introduction to the concepts and demystifies why you may need to log out so the VACCUUMing can be done.
Read more
- Postgres documentation – Vacuum
- Postgres documentation – Autovacuum
- Amazon RDS documentation
Photo by Miguel Á. Padriñán from Pexels
Comments are closed, but trackbacks and pingbacks are open.