{"id":3115,"date":"2019-03-24T21:33:09","date_gmt":"2019-03-24T08:33:09","guid":{"rendered":"https:\/\/helenanderson.co.nz\/?p=3115"},"modified":"2020-05-14T20:48:04","modified_gmt":"2020-05-14T08:48:04","slug":"vacuum-database","status":"publish","type":"post","link":"https:\/\/www.helenanderson.co.nz\/vacuum-database\/","title":{"rendered":"Have you VACUUMed your tables lately?"},"content":{"rendered":"\n

Dishing the dirt on VACUUM<\/h2>\n\n\n\n

VACUUM is one of the tasks your DBA need to perform to keep things running smoothly in a Postgres database<\/a><\/p>\n\n\n\n

This is a high-level overview of why this is important.<\/p>\n\n\n\n


\n\n\n\n

Multiversion Concurrency Control<\/h2>\n\n\n\n

The first thing to get to grips with is the concept of Multiversion Concurrency Control (MVCC). The Postgres docs say that this is a way to protect \u201creading from blocking writing and writing from blocking reading<\/em>\u201c, by providing each transaction with a snapshot of the data.<\/p>\n\n\n\n

While this is great when you\u2019re querying your data, no one gets locked. The trade-off is that all these snapshots float around only marked for deletion.<\/p>\n\n\n\n

They are not automatically deleted. These are called \u2018dead tuples\u2019, \u2018dead rows\u2019 or \u2018bloat\u2019 because they clog things up and slow queries down.<\/p>\n\n\n\n


\n\n\n\n

How does this work?<\/h2>\n\n\n\n

VACUUM<\/h4>\n\n\n\n

Using VACUUM is a way to deal with the garbage collection that needs to be done to keep things running smoothly.<\/p>\n\n\n\n

This option only clears out the unused data but doesn\u2019t rewrite to disk.<\/p>\n\n\n\n

VACUUM [tablename]  \n-- to specify a table\n\nVACUUM [tablename.columnname]  \n-- to specify a table and column<\/pre>\n\n\n\n
\n\n\n\n

VACUUM FULL<\/h4>\n\n\n\n

This is the most no-nonsense way to get the job done. The tradeoff is that VACUUM FULL puts a full lock on the table. Not a great option if you have anyone trying to SELECT as you carry this task out.<\/p>\n\n\n\n

VACUUM(FULL) [tablename]\n-- to specify a table\n\nVACUUM(FULL) [tablename.columnname] \n-- to specify a table and column<\/pre>\n\n\n\n
\n\n\n\n

VACUUM ANALYZE<\/h4>\n\n\n\n

This clears out unused data and updates query plan.<\/p>\n\n\n\n

VACUUM(FULL, ANALYZE) [tablename] \n-- to specify a table\n \nVACUUM(FULL, ANALYZE) [tablename.columnname] \n-- to specify a table and column<\/pre>\n\n\n\n
\n\n\n\n

AUTOVACUUM<\/h2>\n\n\n\n

This option keeps things under control automatically by using a trigger to kick off VACUUM when it reaches a certain level. When it is exceeded the VACUUM begins.<\/p>\n\n\n\n

vacuum base threshold + <\/p>

vacuum scale factor * <\/p>

number of<\/strong> tuples <\/p><\/blockquote>\n\n\n\n


\n\n\n\n
autovacuum_vacuum_threshold = 50 \n-- the threshold of 50 rows is set to prevent \nsmall tables being overcleaned\n \nautovacuum_vacuum_scale_factor = 0.2 \n-- 20% of a table may have dead tuples<\/pre>\n\n\n\n
\n\n\n\n

So I should clean up all the time?<\/h2>\n\n\n\n

The aim here is to keep disk space usage at a steady state, not to knock it down to a minimum.<\/p>\n\n\n\n

Not necessarily.<\/p>\n\n\n\n

Performing these tasks uses resources, and depending on which strategy you have in mind, could lock users out while it happens.<\/p>\n\n\n\n

The advantage of AUTOVACUUM is that is throttled, so it doesn\u2019t use all your resources<\/p>\n\n\n\n


\n\n\n\n

How do I know when it last ran?<\/h2>\n\n\n\n

Using the query<\/a> below you can check on when the process ran by using the pg_stat_user_tables table:<\/p>\n\n\n\n

select \n  relname, \n  last_vacuum, \n  last_autovacuum \nfrom\npg_stat_user_tables;<\/pre>\n\n\n\n

To tell if AUTOVACUUM is running you can use this query using the pg_stat_activity table:<\/p>\n\n\n\n

select\n  datname, \n  usename, \n  pid, \n  state, \n  wait_event, \n  current_timestamp - xact_start AS xact_runtime, \n  query\nfrom \n  pg_stat_activity \nwhere \n  upper(query) LIKE '%VACUUM%' \norder by \n  xact_start;<\/pre>\n\n\n\n
\n\n\n\n

Your friendly DBA will be across the details when it comes to this process and other maintenance tasks. Hopefully, this gives you a good introduction to the concepts<\/a> and demystifies why you may need to log out so the VACCUUMing can be done.<\/p>\n\n\n\n


\n\n\n\n

Read more<\/h2>\n\n\n\n