VACUUM is one of the tasks your DBA need to perform to keep things running smoothly in a Postgres database<\/a><\/p>\n\n\n\n
This is a high-level overview of why this is important.<\/p>\n\n\n\n
The first thing to get to grips with is the concept of Multiversion Concurrency Control (MVCC). The Postgres docs say that this is a way to protect \u201creading from blocking writing and writing from blocking reading<\/em>\u201c, by providing each transaction with a snapshot of the data.<\/p>\n\n\n\n
While this is great when you\u2019re querying your data, no one gets locked. The trade-off is that all these snapshots float around only marked for deletion.<\/p>\n\n\n\n
They are not automatically deleted. These are called \u2018dead tuples\u2019, \u2018dead rows\u2019 or \u2018bloat\u2019 because they clog things up and slow queries down.<\/p>\n\n\n\n
Using VACUUM is a way to deal with the garbage collection that needs to be done to keep things running smoothly.<\/p>\n\n\n\n
This option only clears out the unused data but doesn\u2019t rewrite to disk.<\/p>\n\n\n\n
VACUUM [tablename] \n-- to specify a table\n\nVACUUM [tablename.columnname] \n-- to specify a table and column<\/pre>\n\n\n\n
\n\n\n\nVACUUM FULL<\/h4>\n\n\n\n
This is the most no-nonsense way to get the job done. The tradeoff is that VACUUM FULL puts a full lock on the table. Not a great option if you have anyone trying to SELECT as you carry this task out.<\/p>\n\n\n\n
VACUUM(FULL) [tablename]\n-- to specify a table\n\nVACUUM(FULL) [tablename.columnname] \n-- to specify a table and column<\/pre>\n\n\n\n
\n\n\n\nVACUUM ANALYZE<\/h4>\n\n\n\n
This clears out unused data and updates query plan.<\/p>\n\n\n\n
VACUUM(FULL, ANALYZE) [tablename] \n-- to specify a table\n \nVACUUM(FULL, ANALYZE) [tablename.columnname] \n-- to specify a table and column<\/pre>\n\n\n\n
\n\n\n\nAUTOVACUUM<\/h2>\n\n\n\n
This option keeps things under control automatically by using a trigger to kick off VACUUM when it reaches a certain level. When it is exceeded the VACUUM begins.<\/p>\n\n\n\n
vacuum base threshold + <\/p>
vacuum scale factor * <\/p>
number of<\/strong> tuples <\/p><\/blockquote>\n\n\n\n
\n\n\n\nautovacuum_vacuum_threshold = 50 \n-- the threshold of 50 rows is set to prevent \nsmall tables being overcleaned\n \nautovacuum_vacuum_scale_factor = 0.2 \n-- 20% of a table may have dead tuples<\/pre>\n\n\n\n
\n\n\n\nSo I should clean up all the time?<\/h2>\n\n\n\n
The aim here is to keep disk space usage at a steady state, not to knock it down to a minimum.<\/p>\n\n\n\n
Not necessarily.<\/p>\n\n\n\n
Performing these tasks uses resources, and depending on which strategy you have in mind, could lock users out while it happens.<\/p>\n\n\n\n
The advantage of AUTOVACUUM is that is throttled, so it doesn\u2019t use all your resources<\/p>\n\n\n\n
\n\n\n\nHow do I know when it last ran?<\/h2>\n\n\n\n
select \n relname, \n last_vacuum, \n last_autovacuum \nfrom\npg_stat_user_tables;<\/pre>\n\n\n\nTo tell if AUTOVACUUM is running you can use this query using the pg_stat_activity table:<\/p>\n\n\n\n
select\n datname, \n usename, \n pid, \n state, \n wait_event, \n current_timestamp - xact_start AS xact_runtime, \n query\nfrom \n pg_stat_activity \nwhere \n upper(query) LIKE '%VACUUM%' \norder by \n xact_start;<\/pre>\n\n\n\n
\n\n\n\nYour friendly DBA will be across the details when it comes to this process and other maintenance tasks. Hopefully, this gives you a good introduction to the concepts<\/a> and demystifies why you may need to log out so the VACCUUMing can be done.<\/p>\n\n\n\n
\n\n\n\nRead more<\/h2>\n\n\n\n