Обсуждение: Execution time from >1s -> 80m+ when extra columns added in SELECTfor sub-query

Поиск
Список
Период
Сортировка

Execution time from >1s -> 80m+ when extra columns added in SELECTfor sub-query

От
A Guy Named Ryan
Дата:
First time posting here, so please let me know what additional information you'd like.  Thanks!

## A description of what you are trying to achieve and what results you expect:

- I have a program that dynamically generates SQL queries
- I made changes to how that program generates the SELECT part of the queries (and sub-queries)
- I have a query that went from taking less than a second with the old version of the SELECT to over 2 hours to complete with the new version of the SELECT
- I'd expect both versions of the query to take the same amount of time
- The changes to the SELECT appear in a sub-query and even though the changed columns in the sub-query are ultimately ignored by the query using the sub-query

## PostgreSQL version number you are running:

- Originally discovered on 9.6 running directly on the host
  - PostgreSQL 9.6.17 on x86_64-pc-linux-gnu (Ubuntu 9.6.17-2.pgdg18.04+1), compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit
- Recreated the issue under 9.6 and 12.2 using Docker
  - PostgreSQL 9.6.17 on x86_64-pc-linux-musl, compiled by gcc (Alpine 9.2.0) 9.2.0, 64-bit
  - PostgreSQL 12.2 on x86_64-pc-linux-musl, compiled by gcc (Alpine 9.2.0) 9.2.0, 64-bit

## How you installed PostgreSQL:

- Originally found in PostgreSQL installed on host
  - postgresql-9.6/bionic-pgdg,now 9.6.17-2.pgdg18.04+1 amd64 [installed]
- Recreated in Docker containers for 9.6 and 12.2

## Changes made to the settings in the postgresql.conf file:  see Server Configuration for a quick way to list them all:

- Ran tests using all combinations of the following values:
  - work_mem (4MB and 4GB)
  - random_page_cost: (1, 2, 4)
- I found no difference in performance with any combination of the above

## Operating system and version:

- Ubuntu 18.04
- Linux helios 4.15.0-76-generic #86-Ubuntu SMP Fri Jan 17 17:24:28 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

## What program you're using to connect to PostgreSQL:

- psql

## Is there anything relevant or unusual in the PostgreSQL server logs?:

- Not that I've seen

## For questions about any kind of error:

- No errors

## What you were doing when the error happened / how to cause the error:

- As mentioned above, my dynamically generated SQL now includes a few extra columns in a sub-query than it used to.  The query using the sub-query ignores those columns, so I'd imagine no change in performance, but instead I see a 6000x increase in execution time.
- I created a set of testing scripts that run a total of 48 variations on a few queries and configuration options and have generated EXPLAINS for all of them.
- I've included a copy of each test query and the resulting EXPLAIN
- The files that include "\_present" are those that have the new SQL and are running very slowly
- The files the include \_null, \_deleted, \_casted represent variations on the queries that all run very quickly
- Running VACUUM and/or ANALYZE does not seem to have an effect

## Tables involved

```
experiments=# set search_path to jigsaw_temp;
SET
experiments=# \dt
                              List of relations
   Schema    |                      Name                      | Type  | Owner
-------------+------------------------------------------------+-------+-------
 jigsaw_temp | jtemp1c37l3b_baseline_windows_after_inclusion  | table | ryan
 jigsaw_temp | jtemp1c37l3b_baseline_windows_with_collections | table | ryan
(2 rows)

experiments=# \d jtemp1c37l3b_baseline_windows_after_inclusion;
    Table "jigsaw_temp.jtemp1c37l3b_baseline_windows_after_inclusion"
        Column        |       Type       | Collation | Nullable | Default
----------------------+------------------+-----------+----------+---------
 uuid                 | text             |           |          |
 person_id            | bigint           |           |          |
 criterion_id         | bigint           |           |          |
 criterion_table      | text             |           |          |
 criterion_domain     | text             |           |          |
 start_date           | date             |           |          |
 end_date             | date             |           |          |
 source_value         | text             |           |          |
 source_vocabulary_id | text             |           |          |
 drug_amount          | double precision |           |          |
 drug_amount_units    | text             |           |          |
 drug_days_supply     | integer          |           |          |
 drug_name            | text             |           |          |
 drug_quantity        | bigint           |           |          |
 window_id            | bigint           |           |          |

experiments=# \d jtemp1c37l3b_baseline_windows_with_collections
Table "jigsaw_temp.jtemp1c37l3b_baseline_windows_with_collections"
  Column   |  Type  | Collation | Nullable | Default
-----------+--------+-----------+----------+---------
 person_id | bigint |           |          |
 uuid      | text   |           |          |

experiments=# SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname='jtemp1c37l3b_baseline_windows_with_collections';
                    relname                     | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
------------------------------------------------+----------+-----------+---------------+---------+----------+----------------+------------+---------------
 jtemp1c37l3b_baseline_windows_with_collections |     1433 |    138972 |             0 | r       |        2 | f              |            |      11771904
(1 row)

experiments=# SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname='jtemp1c37l3b_baseline_windows_after_inclusion';
                    relname                    | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size
-----------------------------------------------+----------+-----------+---------------+---------+----------+----------------+------------+---------------
 jtemp1c37l3b_baseline_windows_after_inclusion |     9187 |    505244 |             0 | r       |       15 | f              |            |      75309056
(1 row)
```

config

```
                  name                  |                 setting                  |                                                          description
----------------------------------------+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------
 allow_system_table_mods                | off                                      | Allows modifications of the structure of system tables.
 application_name                       | psql                                     | Sets the application name to be reported in statistics and logs.
 archive_cleanup_command                |                                          | Sets the shell command that will be executed at every restart point.
 archive_command                        | (disabled)                               | Sets the shell command that will be called to archive a WAL file.
 archive_mode                           | off                                      | Allows archiving of WAL files using archive_command.
 archive_timeout                        | 0                                        | Forces a switch to the next WAL file if a new file has not been started within N seconds.
 array_nulls                            | on                                       | Enable input of NULL elements in arrays.
 authentication_timeout                 | 1min                                     | Sets the maximum allowed time to complete client authentication.
 autovacuum                             | on                                       | Starts the autovacuum subprocess.
 autovacuum_analyze_scale_factor        | 0.1                                      | Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples.
 autovacuum_analyze_threshold           | 50                                       | Minimum number of tuple inserts, updates, or deletes prior to analyze.
 autovacuum_freeze_max_age              | 200000000                                | Age at which to autovacuum a table to prevent transaction ID wraparound.
 autovacuum_max_workers                 | 3                                        | Sets the maximum number of simultaneously running autovacuum worker processes.
 autovacuum_multixact_freeze_max_age    | 400000000                                | Multixact age at which to autovacuum a table to prevent multixact wraparound.
 autovacuum_naptime                     | 1min                                     | Time to sleep between autovacuum runs.
 autovacuum_vacuum_cost_delay           | 2ms                                      | Vacuum cost delay in milliseconds, for autovacuum.
 autovacuum_vacuum_cost_limit           | -1                                       | Vacuum cost amount available before napping, for autovacuum.
 autovacuum_vacuum_scale_factor         | 0.2                                      | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
 autovacuum_vacuum_threshold            | 50                                       | Minimum number of tuple updates or deletes prior to vacuum.
 autovacuum_work_mem                    | -1                                       | Sets the maximum memory to be used by each autovacuum worker process.
 backend_flush_after                    | 0                                        | Number of pages after which previously performed writes are flushed to disk.
 backslash_quote                        | safe_encoding                            | Sets whether "\'" is allowed in string literals.
 bgwriter_delay                         | 200ms                                    | Background writer sleep time between rounds.
 bgwriter_flush_after                   | 512kB                                    | Number of pages after which previously performed writes are flushed to disk.
 bgwriter_lru_maxpages                  | 100                                      | Background writer maximum number of LRU pages to flush per round.
 bgwriter_lru_multiplier                | 2                                        | Multiple of the average buffer usage to free per round.
 block_size                             | 8192                                     | Shows the size of a disk block.
 bonjour                                | off                                      | Enables advertising the server via Bonjour.
 bonjour_name                           |                                          | Sets the Bonjour service name.
 bytea_output                           | hex                                      | Sets the output format for bytea.
 check_function_bodies                  | on                                       | Check function bodies during CREATE FUNCTION.
 checkpoint_completion_target           | 0.5                                      | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval.
 checkpoint_flush_after                 | 256kB                                    | Number of pages after which previously performed writes are flushed to disk.
 checkpoint_timeout                     | 5min                                     | Sets the maximum time between automatic WAL checkpoints.
 checkpoint_warning                     | 30s                                      | Enables warnings if checkpoint segments are filled more frequently than this.
 client_encoding                        | UTF8                                     | Sets the client's character set encoding.
 client_min_messages                    | notice                                   | Sets the message levels that are sent to the client.
 cluster_name                           |                                          | Sets the name of the cluster, which is included in the process title.
 commit_delay                           | 0                                        | Sets the delay in microseconds between transaction commit and flushing WAL to disk.
 commit_siblings                        | 5                                        | Sets the minimum concurrent open transactions before performing commit_delay.
 config_file                            | /var/lib/postgresql/data/postgresql.conf | Sets the server's main configuration file.
 constraint_exclusion                   | partition                                | Enables the planner to use constraints to optimize queries.
 cpu_index_tuple_cost                   | 0.005                                    | Sets the planner's estimate of the cost of processing each index entry during an index scan.
 cpu_operator_cost                      | 0.0025                                   | Sets the planner's estimate of the cost of processing each operator or function call.
 cpu_tuple_cost                         | 0.01                                     | Sets the planner's estimate of the cost of processing each tuple (row).
 cursor_tuple_fraction                  | 0.1                                      | Sets the planner's estimate of the fraction of a cursor's rows that will be retrieved.
 data_checksums                         | off                                      | Shows whether data checksums are turned on for this cluster.
 data_directory                         | /var/lib/postgresql/data                 | Sets the server's data directory.
 data_directory_mode                    | 0700                                     | Mode of the data directory.
 data_sync_retry                        | off                                      | Whether to continue running after a failure to sync data files.
 DateStyle                              | ISO, MDY                                 | Sets the display format for date and time values.
 db_user_namespace                      | off                                      | Enables per-database user names.
 deadlock_timeout                       | 1s                                       | Sets the time to wait on a lock before checking for deadlock.
 debug_assertions                       | off                                      | Shows whether the running server has assertion checks enabled.
 debug_pretty_print                     | on                                       | Indents parse and plan tree displays.
 debug_print_parse                      | off                                      | Logs each query's parse tree.
 debug_print_plan                       | off                                      | Logs each query's execution plan.
 debug_print_rewritten                  | off                                      | Logs each query's rewritten parse tree.
 default_statistics_target              | 100                                      | Sets the default statistics target.
 default_table_access_method            | heap                                     | Sets the default table access method for new tables.
 default_tablespace                     |                                          | Sets the default tablespace to create tables and indexes in.
 default_text_search_config             | pg_catalog.english                       | Sets default text search configuration.
 default_transaction_deferrable         | off                                      | Sets the default deferrable status of new transactions.
 default_transaction_isolation          | read committed                           | Sets the transaction isolation level of each new transaction.
 default_transaction_read_only          | off                                      | Sets the default read-only status of new transactions.
 dynamic_library_path                   | $libdir                                  | Sets the path for dynamically loadable modules.
 dynamic_shared_memory_type             | posix                                    | Selects the dynamic shared memory implementation used.
 effective_cache_size                   | 4GB                                      | Sets the planner's assumption about the total size of the data caches.
 effective_io_concurrency               | 1                                        | Number of simultaneous requests that can be handled efficiently by the disk subsystem.
 enable_bitmapscan                      | on                                       | Enables the planner's use of bitmap-scan plans.
 enable_gathermerge                     | on                                       | Enables the planner's use of gather merge plans.
 enable_hashagg                         | on                                       | Enables the planner's use of hashed aggregation plans.
 enable_hashjoin                        | on                                       | Enables the planner's use of hash join plans.
 enable_indexonlyscan                   | on                                       | Enables the planner's use of index-only-scan plans.
 enable_indexscan                       | on                                       | Enables the planner's use of index-scan plans.
 enable_material                        | on                                       | Enables the planner's use of materialization.
 enable_mergejoin                       | on                                       | Enables the planner's use of merge join plans.
 enable_nestloop                        | on                                       | Enables the planner's use of nested-loop join plans.
 enable_parallel_append                 | on                                       | Enables the planner's use of parallel append plans.
 enable_parallel_hash                   | on                                       | Enables the planner's use of parallel hash plans.
 enable_partition_pruning               | on                                       | Enables plan-time and run-time partition pruning.
 enable_partitionwise_aggregate         | off                                      | Enables partitionwise aggregation and grouping.
 enable_partitionwise_join              | off                                      | Enables partitionwise join.
 enable_seqscan                         | on                                       | Enables the planner's use of sequential-scan plans.
 enable_sort                            | on                                       | Enables the planner's use of explicit sort steps.
 enable_tidscan                         | on                                       | Enables the planner's use of TID scan plans.
 escape_string_warning                  | on                                       | Warn about backslash escapes in ordinary string literals.
 event_source                           | PostgreSQL                               | Sets the application name used to identify PostgreSQL messages in the event log.
 exit_on_error                          | off                                      | Terminate session on any error.
 external_pid_file                      |                                          | Writes the postmaster PID to the specified file.
 extra_float_digits                     | 1                                        | Sets the number of digits displayed for floating-point values.
 force_parallel_mode                    | off                                      | Forces use of parallel query facilities.
 from_collapse_limit                    | 8                                        | Sets the FROM-list size beyond which subqueries are not collapsed.
 fsync                                  | on                                       | Forces synchronization of updates to disk.
 full_page_writes                       | on                                       | Writes full pages to WAL when first modified after a checkpoint.
 geqo                                   | on                                       | Enables genetic query optimization.
 geqo_effort                            | 5                                        | GEQO: effort is used to set the default for other GEQO parameters.
 geqo_generations                       | 0                                        | GEQO: number of iterations of the algorithm.
 geqo_pool_size                         | 0                                        | GEQO: number of individuals in the population.
 geqo_seed                              | 0                                        | GEQO: seed for random path selection.
 geqo_selection_bias                    | 2                                        | GEQO: selective pressure within the population.
 geqo_threshold                         | 12                                       | Sets the threshold of FROM items beyond which GEQO is used.
 gin_fuzzy_search_limit                 | 0                                        | Sets the maximum allowed result for exact search by GIN.
 gin_pending_list_limit                 | 4MB                                      | Sets the maximum size of the pending list for GIN index.
 hba_file                               | /var/lib/postgresql/data/pg_hba.conf     | Sets the server's "hba" configuration file.
 hot_standby                            | on                                       | Allows connections and queries during recovery.
 hot_standby_feedback                   | off                                      | Allows feedback from a hot standby to the primary that will avoid query conflicts.
 huge_pages                             | try                                      | Use of huge pages on Linux or Windows.
 ident_file                             | /var/lib/postgresql/data/pg_ident.conf   | Sets the server's "ident" configuration file.
 idle_in_transaction_session_timeout    | 0                                        | Sets the maximum allowed duration of any idling transaction.
 ignore_checksum_failure                | off                                      | Continues processing after a checksum failure.
 ignore_system_indexes                  | off                                      | Disables reading from system indexes.
 integer_datetimes                      | on                                       | Datetimes are integer based.
 IntervalStyle                          | postgres                                 | Sets the display format for interval values.
 jit                                    | on                                       | Allow JIT compilation.
 jit_above_cost                         | 100000                                   | Perform JIT compilation if query is more expensive.
 jit_debugging_support                  | off                                      | Register JIT compiled function with debugger.
 jit_dump_bitcode                       | off                                      | Write out LLVM bitcode to facilitate JIT debugging.
 jit_expressions                        | on                                       | Allow JIT compilation of expressions.
 jit_inline_above_cost                  | 500000                                   | Perform JIT inlining if query is more expensive.
 jit_optimize_above_cost                | 500000                                   | Optimize JITed functions if query is more expensive.
 jit_profiling_support                  | off                                      | Register JIT compiled function with perf profiler.
 jit_provider                           | llvmjit                                  | JIT provider to use.
 jit_tuple_deforming                    | on                                       | Allow JIT compilation of tuple deforming.
 join_collapse_limit                    | 8                                        | Sets the FROM-list size beyond which JOIN constructs are not flattened.
 krb_caseins_users                      | off                                      | Sets whether Kerberos and GSSAPI user names should be treated as case-insensitive.
 krb_server_keyfile                     |                                          | Sets the location of the Kerberos server key file.
 lc_collate                             | en_US.utf8                               | Shows the collation order locale.
 lc_ctype                               | en_US.utf8                               | Shows the character classification and case conversion locale.
 lc_messages                            | en_US.utf8                               | Sets the language in which messages are displayed.
 lc_monetary                            | en_US.utf8                               | Sets the locale for formatting monetary amounts.
 lc_numeric                             | en_US.utf8                               | Sets the locale for formatting numbers.
 lc_time                                | en_US.utf8                               | Sets the locale for formatting date and time values.
 listen_addresses                       | *                                        | Sets the host name or IP address(es) to listen to.
 lo_compat_privileges                   | off                                      | Enables backward compatibility mode for privilege checks on large objects.
 local_preload_libraries                |                                          | Lists unprivileged shared libraries to preload into each backend.
 lock_timeout                           | 0                                        | Sets the maximum allowed duration of any wait for a lock.
 log_autovacuum_min_duration            | -1                                       | Sets the minimum execution time above which autovacuum actions will be logged.
 log_checkpoints                        | off                                      | Logs each checkpoint.
 log_connections                        | off                                      | Logs each successful connection.
 log_destination                        | stderr                                   | Sets the destination for server log output.
 log_directory                          | log                                      | Sets the destination directory for log files.
 log_disconnections                     | off                                      | Logs end of a session, including duration.
 log_duration                           | off                                      | Logs the duration of each completed SQL statement.
 log_error_verbosity                    | default                                  | Sets the verbosity of logged messages.
 log_executor_stats                     | off                                      | Writes executor performance statistics to the server log.
 log_file_mode                          | 0600                                     | Sets the file permissions for log files.
 log_filename                           | postgresql-%Y-%m-%d_%H%M%S.log           | Sets the file name pattern for log files.
 log_hostname                           | off                                      | Logs the host name in the connection logs.
 log_line_prefix                        | %m [%p]                                  | Controls information prefixed to each log line.
 log_lock_waits                         | off                                      | Logs long lock waits.
 log_min_duration_statement             | -1                                       | Sets the minimum execution time above which statements will be logged.
 log_min_error_statement                | error                                    | Causes all statements generating error at or above this level to be logged.
 log_min_messages                       | warning                                  | Sets the message levels that are logged.
 log_parser_stats                       | off                                      | Writes parser performance statistics to the server log.
 log_planner_stats                      | off                                      | Writes planner performance statistics to the server log.
 log_replication_commands               | off                                      | Logs each replication command.
 log_rotation_age                       | 1d                                       | Automatic log file rotation will occur after N minutes.
 log_rotation_size                      | 10MB                                     | Automatic log file rotation will occur after N kilobytes.
 log_statement                          | none                                     | Sets the type of statements logged.
 log_statement_stats                    | off                                      | Writes cumulative performance statistics to the server log.
 log_temp_files                         | -1                                       | Log the use of temporary files larger than this number of kilobytes.
 log_timezone                           | UTC                                      | Sets the time zone to use in log messages.
 log_transaction_sample_rate            | 0                                        | Set the fraction of transactions to log for new transactions.
 log_truncate_on_rotation               | off                                      | Truncate existing log files of same name during log rotation.
 logging_collector                      | off                                      | Start a subprocess to capture stderr output and/or csvlogs into log files.
 maintenance_work_mem                   | 64MB                                     | Sets the maximum memory to be used for maintenance operations.
 max_connections                        | 100                                      | Sets the maximum number of concurrent connections.
 max_files_per_process                  | 1000                                     | Sets the maximum number of simultaneously open files for each server process.
 max_function_args                      | 100                                      | Shows the maximum number of function arguments.
 max_identifier_length                  | 63                                       | Shows the maximum identifier length.
 max_index_keys                         | 32                                       | Shows the maximum number of index keys.
 max_locks_per_transaction              | 64                                       | Sets the maximum number of locks per transaction.
 max_logical_replication_workers        | 4                                        | Maximum number of logical replication worker processes.
 max_parallel_maintenance_workers       | 2                                        | Sets the maximum number of parallel processes per maintenance operation.
 max_parallel_workers                   | 8                                        | Sets the maximum number of parallel workers that can be active at one time.
 max_parallel_workers_per_gather        | 2                                        | Sets the maximum number of parallel processes per executor node.
 max_pred_locks_per_page                | 2                                        | Sets the maximum number of predicate-locked tuples per page.
 max_pred_locks_per_relation            | -2                                       | Sets the maximum number of predicate-locked pages and tuples per relation.
 max_pred_locks_per_transaction         | 64                                       | Sets the maximum number of predicate locks per transaction.
 max_prepared_transactions              | 0                                        | Sets the maximum number of simultaneously prepared transactions.
 max_replication_slots                  | 10                                       | Sets the maximum number of simultaneously defined replication slots.
 max_stack_depth                        | 2MB                                      | Sets the maximum stack depth, in kilobytes.
 max_standby_archive_delay              | 30s                                      | Sets the maximum delay before canceling queries when a hot standby server is processing archived WAL data.
 max_standby_streaming_delay            | 30s                                      | Sets the maximum delay before canceling queries when a hot standby server is processing streamed WAL data.
 max_sync_workers_per_subscription      | 2                                        | Maximum number of table synchronization workers per subscription.
 max_wal_senders                        | 10                                       | Sets the maximum number of simultaneously running WAL sender processes.
 max_wal_size                           | 1GB                                      | Sets the WAL size that triggers a checkpoint.
 max_worker_processes                   | 8                                        | Maximum number of concurrent worker processes.
 min_parallel_index_scan_size           | 512kB                                    | Sets the minimum amount of index data for a parallel scan.
 min_parallel_table_scan_size           | 8MB                                      | Sets the minimum amount of table data for a parallel scan.
 min_wal_size                           | 80MB                                     | Sets the minimum size to shrink the WAL to.
 old_snapshot_threshold                 | -1                                       | Time before a snapshot is too old to read pages changed after the snapshot was taken.
 operator_precedence_warning            | off                                      | Emit a warning for constructs that changed meaning since PostgreSQL 9.4.
 parallel_leader_participation          | on                                       | Controls whether Gather and Gather Merge also run subplans.
 parallel_setup_cost                    | 1000                                     | Sets the planner's estimate of the cost of starting up worker processes for parallel query.
 parallel_tuple_cost                    | 0.1                                      | Sets the planner's estimate of the cost of passing each tuple (row) from worker to master backend.
 password_encryption                    | md5                                      | Encrypt passwords.
 plan_cache_mode                        | auto                                     | Controls the planner's selection of custom or generic plan.
 port                                   | 5432                                     | Sets the TCP port the server listens on.
 post_auth_delay                        | 0                                        | Waits N seconds on connection startup after authentication.
 pre_auth_delay                         | 0                                        | Waits N seconds on connection startup before authentication.
 primary_conninfo                       |                                          | Sets the connection string to be used to connect to the sending server.
 primary_slot_name                      |                                          | Sets the name of the replication slot to use on the sending server.
 promote_trigger_file                   |                                          | Specifies a file name whose presence ends recovery in the standby.
 quote_all_identifiers                  | off                                      | When generating SQL fragments, quote all identifiers.
 random_page_cost                       | 4                                        | Sets the planner's estimate of the cost of a nonsequentially fetched disk page.
 recovery_end_command                   |                                          | Sets the shell command that will be executed once at the end of recovery.
 recovery_min_apply_delay               | 0                                        | Sets the minimum delay for applying changes during recovery.
 recovery_target                        |                                          | Set to "immediate" to end recovery as soon as a consistent state is reached.
 recovery_target_action                 | pause                                    | Sets the action to perform upon reaching the recovery target.
 recovery_target_inclusive              | on                                       | Sets whether to include or exclude transaction with recovery target.
 recovery_target_lsn                    |                                          | Sets the LSN of the write-ahead log location up to which recovery will proceed.
 recovery_target_name                   |                                          | Sets the named restore point up to which recovery will proceed.
 recovery_target_time                   |                                          | Sets the time stamp up to which recovery will proceed.
 recovery_target_timeline               | latest                                   | Specifies the timeline to recover into.
 recovery_target_xid                    |                                          | Sets the transaction ID up to which recovery will proceed.
 restart_after_crash                    | on                                       | Reinitialize server after backend crash.
 restore_command                        |                                          | Sets the shell command that will retrieve an archived WAL file.
 row_security                           | on                                       | Enable row security.
 search_path                            | "$user", public                          | Sets the schema search order for names that are not schema-qualified.
 segment_size                           | 1GB                                      | Shows the number of pages per disk file.
 seq_page_cost                          | 1                                        | Sets the planner's estimate of the cost of a sequentially fetched disk page.
 server_encoding                        | UTF8                                     | Sets the server (database) character set encoding.
 server_version                         | 12.2                                     | Shows the server version.
 server_version_num                     | 120002                                   | Shows the server version as an integer.
 session_preload_libraries              |                                          | Lists shared libraries to preload into each backend.
 session_replication_role               | origin                                   | Sets the session's behavior for triggers and rewrite rules.
 shared_buffers                         | 12GB                                     | Sets the number of shared memory buffers used by the server.
 shared_memory_type                     | mmap                                     | Selects the shared memory implementation used for the main shared memory region.
 shared_preload_libraries               |                                          | Lists shared libraries to preload into server.
 ssl                                    | off                                      | Enables SSL connections.
 ssl_ca_file                            |                                          | Location of the SSL certificate authority file.
 ssl_cert_file                          | server.crt                               | Location of the SSL server certificate file.
 ssl_ciphers                            | HIGH:MEDIUM:+3DES:!aNULL                 | Sets the list of allowed SSL ciphers.
 ssl_crl_file                           |                                          | Location of the SSL certificate revocation list file.
 ssl_dh_params_file                     |                                          | Location of the SSL DH parameters file.
 ssl_ecdh_curve                         | prime256v1                               | Sets the curve to use for ECDH.
 ssl_key_file                           | server.key                               | Location of the SSL server private key file.
 ssl_library                            | OpenSSL                                  | Name of the SSL library.
 ssl_max_protocol_version               |                                          | Sets the maximum SSL/TLS protocol version to use.
 ssl_min_protocol_version               | TLSv1                                    | Sets the minimum SSL/TLS protocol version to use.
 ssl_passphrase_command                 |                                          | Command to obtain passphrases for SSL.
 ssl_passphrase_command_supports_reload | off                                      | Also use ssl_passphrase_command during server reload.
 ssl_prefer_server_ciphers              | on                                       | Give priority to server ciphersuite order.
 standard_conforming_strings            | on                                       | Causes '...' strings to treat backslashes literally.
 statement_timeout                      | 0                                        | Sets the maximum allowed duration of any statement.
 stats_temp_directory                   | pg_stat_tmp                              | Writes temporary statistics files to the specified directory.
 superuser_reserved_connections         | 3                                        | Sets the number of connection slots reserved for superusers.
 synchronize_seqscans                   | on                                       | Enable synchronized sequential scans.
 synchronous_commit                     | on                                       | Sets the current transaction's synchronization level.
 synchronous_standby_names              |                                          | Number of synchronous standbys and list of names of potential synchronous ones.
 syslog_facility                        | local0                                   | Sets the syslog "facility" to be used when syslog enabled.
 syslog_ident                           | postgres                                 | Sets the program name used to identify PostgreSQL messages in syslog.
 syslog_sequence_numbers                | on                                       | Add sequence number to syslog messages to avoid duplicate suppression.
 syslog_split_messages                  | on                                       | Split messages sent to syslog by lines and to fit into 1024 bytes.
 tcp_keepalives_count                   | 9                                        | Maximum number of TCP keepalive retransmits.
 tcp_keepalives_idle                    | 7200                                     | Time between issuing TCP keepalives.
 tcp_keepalives_interval                | 75                                       | Time between TCP keepalive retransmits.
 tcp_user_timeout                       | 0                                        | TCP user timeout.
 temp_buffers                           | 8MB                                      | Sets the maximum number of temporary buffers used by each session.
 temp_file_limit                        | -1                                       | Limits the total size of all temporary files used by each process.
 temp_tablespaces                       |                                          | Sets the tablespace(s) to use for temporary tables and sort files.
 TimeZone                               | UTC                                      | Sets the time zone for displaying and interpreting time stamps.
 timezone_abbreviations                 | Default                                  | Selects a file of time zone abbreviations.
 trace_notify                           | off                                      | Generates debugging output for LISTEN and NOTIFY.
 trace_recovery_messages                | log                                      | Enables logging of recovery-related debugging information.
 trace_sort                             | off                                      | Emit information about resource usage in sorting.
 track_activities                       | on                                       | Collects information about executing commands.
 track_activity_query_size              | 1kB                                      | Sets the size reserved for pg_stat_activity.query, in bytes.
 track_commit_timestamp                 | off                                      | Collects transaction commit time.
 track_counts                           | on                                       | Collects statistics on database activity.
 track_functions                        | none                                     | Collects function-level statistics on database activity.
 track_io_timing                        | off                                      | Collects timing statistics for database I/O activity.
 transaction_deferrable                 | off                                      | Whether to defer a read-only serializable transaction until it can be executed with no possible serialization failures.
 transaction_isolation                  | read committed                           | Sets the current transaction's isolation level.
 transaction_read_only                  | off                                      | Sets the current transaction's read-only status.
 transform_null_equals                  | off                                      | Treats "expr=NULL" as "expr IS NULL".
 unix_socket_directories                | /var/run/postgresql                      | Sets the directories where Unix-domain sockets will be created.
 unix_socket_group                      |                                          | Sets the owning group of the Unix-domain socket.
 unix_socket_permissions                | 0777                                     | Sets the access permissions of the Unix-domain socket.
 update_process_title                   | on                                       | Updates the process title to show the active SQL command.
 vacuum_cleanup_index_scale_factor      | 0.1                                      | Number of tuple inserts prior to index cleanup as a fraction of reltuples.
 vacuum_cost_delay                      | 0                                        | Vacuum cost delay in milliseconds.
 vacuum_cost_limit                      | 200                                      | Vacuum cost amount available before napping.
 vacuum_cost_page_dirty                 | 20                                       | Vacuum cost for a page dirtied by vacuum.
 vacuum_cost_page_hit                   | 1                                        | Vacuum cost for a page found in the buffer cache.
 vacuum_cost_page_miss                  | 10                                       | Vacuum cost for a page not found in the buffer cache.
 vacuum_defer_cleanup_age               | 0                                        | Number of transactions by which VACUUM and HOT cleanup should be deferred, if any.
 vacuum_freeze_min_age                  | 50000000                                 | Minimum age at which VACUUM should freeze a table row.
 vacuum_freeze_table_age                | 150000000                                | Age at which VACUUM should scan whole table to freeze tuples.
 vacuum_multixact_freeze_min_age        | 5000000                                  | Minimum age at which VACUUM should freeze a MultiXactId in a table row.
 vacuum_multixact_freeze_table_age      | 150000000                                | Multixact age at which VACUUM should scan whole table to freeze tuples.
 wal_block_size                         | 8192                                     | Shows the block size in the write ahead log.
 wal_buffers                            | 16MB                                     | Sets the number of disk-page buffers in shared memory for WAL.
 wal_compression                        | off                                      | Compresses full-page writes written in WAL file.
 wal_consistency_checking               |                                          | Sets the WAL resource managers for which WAL consistency checks are done.
 wal_init_zero                          | on                                       | Writes zeroes to new WAL files before first use.
 wal_keep_segments                      | 0                                        | Sets the number of WAL files held for standby servers.
 wal_level                              | replica                                  | Set the level of information written to the WAL.
 wal_log_hints                          | off                                      | Writes full pages to WAL when first modified after a checkpoint, even for a non-critical modifications.
 wal_receiver_status_interval           | 10s                                      | Sets the maximum interval between WAL receiver status reports to the sending server.
 wal_receiver_timeout                   | 1min                                     | Sets the maximum wait time to receive data from the sending server.
 wal_recycle                            | on                                       | Recycles WAL files by renaming them.
 wal_retrieve_retry_interval            | 5s                                       | Sets the time to wait before retrying to retrieve WAL after a failed attempt.
 wal_segment_size                       | 16MB                                     | Shows the size of write ahead log segments.
 wal_sender_timeout                     | 1min                                     | Sets the maximum time to wait for WAL replication.
 wal_sync_method                        | fdatasync                                | Selects the method used for forcing WAL updates to disk.
 wal_writer_delay                       | 200ms                                    | Time between WAL flushes performed in the WAL writer.
 wal_writer_flush_after                 | 1MB                                      | Amount of WAL written out by WAL writer that triggers a flush.
 work_mem                               | 4MB                                      | Sets the maximum memory to be used for query workspaces.
 xmlbinary                              | base64                                   | Sets how binary values are to be encoded in XML.
 xmloption                              | content                                  | Sets whether XML data in implicit parsing and serialization operations is to be considered as documents or content fragments.
 zero_damaged_pages                     | off                                      | Continues processing past damaged page headers.
(314 rows)
```

table statistics

```
experiments=# SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation FROM pg_stats WHERE tablename ilike 'jtemp1c37l3b_%' ORDER BY tablename, 1 DESC;
 frac_mcv  |                   tablename                    |       attname        | inherited | null_frac | n_distinct | n_mcv | n_hist | correlation
-----------+------------------------------------------------+----------------------+-----------+-----------+------------+-------+--------+-------------
           | jtemp1c37l3b_baseline_windows_after_inclusion  | drug_amount_units    | f         |         1 |          0 |       |        |
           | jtemp1c37l3b_baseline_windows_after_inclusion  | drug_days_supply     | f         |         1 |          0 |       |        |
           | jtemp1c37l3b_baseline_windows_after_inclusion  | drug_name            | f         |         1 |          0 |       |        |
           | jtemp1c37l3b_baseline_windows_after_inclusion  | window_id            | f         |         1 |          0 |       |        |
           | jtemp1c37l3b_baseline_windows_after_inclusion  | uuid                 | f         |         0 |         -1 |       |    101 |           1
           | jtemp1c37l3b_baseline_windows_after_inclusion  | person_id            | f         |         0 |         -1 |       |    101 |  0.38026863
           | jtemp1c37l3b_baseline_windows_after_inclusion  | criterion_id         | f         |         0 |         -1 |       |    101 |  0.38026673
           | jtemp1c37l3b_baseline_windows_after_inclusion  | drug_quantity        | f         |         1 |          0 |       |        |
           | jtemp1c37l3b_baseline_windows_after_inclusion  | source_value         | f         |         1 |          0 |       |        |
           | jtemp1c37l3b_baseline_windows_after_inclusion  | source_vocabulary_id | f         |         1 |          0 |       |        |
           | jtemp1c37l3b_baseline_windows_after_inclusion  | drug_amount          | f         |         1 |          0 |       |        |
         1 | jtemp1c37l3b_baseline_windows_after_inclusion  | criterion_table      | f         |         0 |          1 |     1 |        |           1
         1 | jtemp1c37l3b_baseline_windows_after_inclusion  | criterion_domain     | f         |         0 |          1 |     1 |        |           1
         1 | jtemp1c37l3b_baseline_windows_after_inclusion  | end_date             | f         |         0 |          1 |     1 |        |           1
 0.2788666 | jtemp1c37l3b_baseline_windows_after_inclusion  | start_date           | f         |         0 |       4729 |   100 |    101 |  0.16305907
           | jtemp1c37l3b_baseline_windows_with_collections | person_id            | f         |         0 |         -1 |       |    101 |           1
           | jtemp1c37l3b_baseline_windows_with_collections | uuid                 | f         |         0 |         -1 |       |    101 |  0.37703142
(17 rows)
```

table schemas

```
--
-- PostgreSQL database dump
--

-- Dumped from database version 12.2
-- Dumped by pg_dump version 12.2

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

--
-- Name: jigsaw_temp; Type: SCHEMA; Schema: -; Owner: -
--

CREATE SCHEMA jigsaw_temp;


SET default_tablespace = '';

SET default_table_access_method = heap;

--
-- Name: jtemp1c37l3b_baseline_windows_after_inclusion; Type: TABLE; Schema: jigsaw_temp; Owner: -
--

CREATE TABLE jigsaw_temp.jtemp1c37l3b_baseline_windows_after_inclusion (
    uuid text,
    person_id bigint,
    criterion_id bigint,
    criterion_table text,
    criterion_domain text,
    start_date date,
    end_date date,
    source_value text,
    source_vocabulary_id text,
    drug_amount double precision,
    drug_amount_units text,
    drug_days_supply integer,
    drug_name text,
    drug_quantity bigint,
    window_id bigint
);


--
-- Name: jtemp1c37l3b_baseline_windows_with_collections; Type: TABLE; Schema: jigsaw_temp; Owner: -
--

CREATE TABLE jigsaw_temp.jtemp1c37l3b_baseline_windows_with_collections (
    person_id bigint,
    uuid text
);


--
-- PostgreSQL database dump complete
--

```

Explains

<a href="https://explain.depesz.com/s/HGDc">HGDc : version_9_6_work_mem_4GB_random_page_cost_1_effective_cache_size_36GB_query_file_casted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/f31D">f31D : version_12_2_work_mem_4MB_random_page_cost_4_effective_cache_size_36GB_query_file_present_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/UUPA">UUPA : version_12_2_work_mem_4MB_random_page_cost_4_effective_cache_size_36GB_query_file_nulls_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/P9DF">P9DF : version_12_2_work_mem_4MB_random_page_cost_4_effective_cache_size_36GB_query_file_deleted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/YIU8">YIU8 : version_12_2_work_mem_4MB_random_page_cost_4_effective_cache_size_36GB_query_file_casted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/4woa">4woa : version_12_2_work_mem_4MB_random_page_cost_2_effective_cache_size_36GB_query_file_present_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/VC2b">VC2b : version_12_2_work_mem_4MB_random_page_cost_2_effective_cache_size_36GB_query_file_nulls_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/mQeT4">mQeT4 : version_12_2_work_mem_4MB_random_page_cost_2_effective_cache_size_36GB_query_file_deleted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/BPTA">BPTA : version_12_2_work_mem_4MB_random_page_cost_2_effective_cache_size_36GB_query_file_casted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/K5Af">K5Af : version_12_2_work_mem_4MB_random_page_cost_1_effective_cache_size_36GB_query_file_present_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/kFhX">kFhX : version_12_2_work_mem_4MB_random_page_cost_1_effective_cache_size_36GB_query_file_nulls_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/6tNy">6tNy : version_12_2_work_mem_4MB_random_page_cost_1_effective_cache_size_36GB_query_file_deleted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/OXDs">OXDs : version_12_2_work_mem_4MB_random_page_cost_1_effective_cache_size_36GB_query_file_casted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/jU6F">jU6F : version_12_2_work_mem_4GB_random_page_cost_4_effective_cache_size_36GB_query_file_present_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/layi">layi : version_12_2_work_mem_4GB_random_page_cost_4_effective_cache_size_36GB_query_file_nulls_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/m98L">m98L : version_12_2_work_mem_4GB_random_page_cost_4_effective_cache_size_36GB_query_file_deleted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/iLSa">iLSa : version_12_2_work_mem_4GB_random_page_cost_4_effective_cache_size_36GB_query_file_casted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/8IK7">8IK7 : version_12_2_work_mem_4GB_random_page_cost_2_effective_cache_size_36GB_query_file_present_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/cHr1V">cHr1V : version_12_2_work_mem_4GB_random_page_cost_2_effective_cache_size_36GB_query_file_nulls_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/W5fF">W5fF : version_12_2_work_mem_4GB_random_page_cost_2_effective_cache_size_36GB_query_file_deleted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/SQ9U">SQ9U : version_12_2_work_mem_4GB_random_page_cost_2_effective_cache_size_36GB_query_file_casted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/W2Qy">W2Qy : version_12_2_work_mem_4GB_random_page_cost_1_effective_cache_size_36GB_query_file_present_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/QNuX">QNuX : version_12_2_work_mem_4GB_random_page_cost_1_effective_cache_size_36GB_query_file_nulls_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/1yOO">1yOO : version_12_2_work_mem_4GB_random_page_cost_1_effective_cache_size_36GB_query_file_deleted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/B3TY">B3TY : version_12_2_work_mem_4GB_random_page_cost_1_effective_cache_size_36GB_query_file_casted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/kZ8d">kZ8d : version_9_6_work_mem_4GB_random_page_cost_1_effective_cache_size_36GB_query_file_deleted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/Axup">Axup : version_9_6_work_mem_4MB_random_page_cost_4_effective_cache_size_36GB_query_file_nulls_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/x2J7">x2J7 : version_9_6_work_mem_4MB_random_page_cost_4_effective_cache_size_36GB_query_file_deleted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/OX6f">OX6f : version_9_6_work_mem_4MB_random_page_cost_4_effective_cache_size_36GB_query_file_casted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/k3He">k3He : version_9_6_work_mem_4MB_random_page_cost_2_effective_cache_size_36GB_query_file_present_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/BBhy">BBhy : version_9_6_work_mem_4MB_random_page_cost_2_effective_cache_size_36GB_query_file_nulls_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/5yaz">5yaz : version_9_6_work_mem_4MB_random_page_cost_2_effective_cache_size_36GB_query_file_deleted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/7yiv">7yiv : version_9_6_work_mem_4MB_random_page_cost_2_effective_cache_size_36GB_query_file_casted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/E1cp">E1cp : version_9_6_work_mem_4MB_random_page_cost_1_effective_cache_size_36GB_query_file_present_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/nlQp">nlQp : version_9_6_work_mem_4MB_random_page_cost_1_effective_cache_size_36GB_query_file_nulls_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/b76Y">b76Y : version_9_6_work_mem_4MB_random_page_cost_1_effective_cache_size_36GB_query_file_deleted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/V07e">V07e : version_9_6_work_mem_4MB_random_page_cost_1_effective_cache_size_36GB_query_file_casted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/EHHN">EHHN : version_9_6_work_mem_4GB_random_page_cost_4_effective_cache_size_36GB_query_file_present_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/w4dV">w4dV : version_9_6_work_mem_4GB_random_page_cost_4_effective_cache_size_36GB_query_file_nulls_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/TnAd">TnAd : version_9_6_work_mem_4GB_random_page_cost_4_effective_cache_size_36GB_query_file_deleted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/Dh0pU">Dh0pU : version_9_6_work_mem_4GB_random_page_cost_4_effective_cache_size_36GB_query_file_casted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/jYZE">jYZE : version_9_6_work_mem_4GB_random_page_cost_2_effective_cache_size_36GB_query_file_present_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/9pex">9pex : version_9_6_work_mem_4GB_random_page_cost_2_effective_cache_size_36GB_query_file_nulls_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/feMn">feMn : version_9_6_work_mem_4GB_random_page_cost_2_effective_cache_size_36GB_query_file_deleted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/BtSV">BtSV : version_9_6_work_mem_4GB_random_page_cost_2_effective_cache_size_36GB_query_file_casted_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/rtxW">rtxW : version_9_6_work_mem_4GB_random_page_cost_1_effective_cache_size_36GB_query_file_present_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/ltmx">ltmx : version_9_6_work_mem_4GB_random_page_cost_1_effective_cache_size_36GB_query_file_nulls_sql | explain.depesz.com</a><br/>
<a href="https://explain.depesz.com/s/Xfca">Xfca : version_9_6_work_mem_4MB_random_page_cost_4_effective_cache_size_36GB_query_file_present_sql | explain.depesz.com</a><br/>

Example of quick query:

```
SET work_mem = '4GB';
SET random_page_cost = '1';
SET effective_cache_size = '36GB';
EXPLAIN (ANALYZE,BUFFERS,SETTINGS)  SELECT
                *
            FROM (
                SELECT
                    "l"."person_id" AS "person_id",
                    "l"."criterion_id" AS "criterion_id",
                    "l"."criterion_table" AS "criterion_table",
                    "l"."criterion_domain" AS "criterion_domain",
                    "l"."start_date" AS "start_date",
                    "l"."end_date" AS "end_date",
                    "l"."source_value" AS "source_value",
                    "l"."source_vocabulary_id" AS "source_vocabulary_id",
                    "l"."drug_amount" AS "drug_amount",
                    "l"."drug_amount_units" AS "drug_amount_units",
                    "l"."drug_days_supply" AS "drug_days_supply",
                    "l"."drug_name" AS "drug_name",
                    "l"."drug_quantity" AS "drug_quantity",
                    "l"."uuid" AS "uuid",
                    "l"."window_id" AS "window_id"
                FROM (
                    SELECT
                        "l"."person_id" AS "person_id",
                        "l"."criterion_id" AS "criterion_id",
                        "l"."criterion_table" AS "criterion_table",
                        "l"."criterion_domain" AS "criterion_domain",
                        "l"."start_date" AS "start_date",
                        "l"."end_date" AS "end_date",
                        "l"."source_value" AS "source_value",
                        "l"."source_vocabulary_id" AS "source_vocabulary_id",
                        "l"."drug_amount" AS "drug_amount",
                        "l"."drug_amount_units" AS "drug_amount_units",
                        "l"."drug_days_supply" AS "drug_days_supply",
                        "l"."drug_name" AS "drug_name",
                        "l"."drug_quantity" AS "drug_quantity",
                        "l"."uuid" AS "uuid",
                        "l"."window_id" AS "window_id"
                    FROM (
                        SELECT
                            "person_id" AS "person_id",
                            "criterion_id" AS "criterion_id",
                            "criterion_table" AS "criterion_table",
                            "criterion_domain" AS "criterion_domain",
                            "start_date" AS "start_date",
                            "end_date" AS "end_date",
                            "source_value" AS "source_value",
                            "source_vocabulary_id" AS "source_vocabulary_id",
                            "drug_amount" AS "drug_amount",
                            "drug_amount_units" AS "drug_amount_units",
                            "drug_days_supply" AS "drug_days_supply",
                            "drug_name" AS "drug_name",
                            "drug_quantity" AS "drug_quantity",
                            "uuid" AS "uuid",
                            "window_id" AS "window_id"
                        FROM
                            "jigsaw_temp"."jtemp1c37l3b_baseline_windows_after_inclusion") AS "l") AS "l"
                WHERE (EXISTS (
                        SELECT
                            1
                        FROM (
                            SELECT
                                "r"."uuid" AS "uuid"
                            FROM (
                                SELECT
                                    "uuid" AS "uuid",
                                    CAST(NULL AS float) AS "drug_amount",
                                    CAST(NULL AS text) AS "drug_amount_units",
                                    CAST(NULL AS bigint) AS "drug_days_supply",
                                    CAST(NULL AS text) AS "drug_name",
                                    CAST(NULL AS float) AS "drug_quantity",
                                    CAST(NULL AS integer) AS "window_id",
                                    "person_id" AS "person_id",
                                    CAST(NULL as bigint) AS "criterion_id",
                                    CAST(NULL as text) AS "criterion_table",
                                    CAST(NULL as date) AS "start_date",
                                    CAST(NULL as date) AS "end_date"
                                FROM
                                    "jigsaw_temp"."jtemp1c37l3b_baseline_windows_with_collections") AS "r"
                                GROUP BY "r"."uuid") AS "r"

                        WHERE ("l"."uuid" = "r"."uuid")))) AS "match_2"
```

Quick EXPLAIN:

```
                                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=7686.74..23252.46 rows=138972 width=265) (actual time=147.773..407.372 rows=138972 loops=1)
   Hash Cond: (jtemp1c37l3b_baseline_windows_after_inclusion.uuid = jtemp1c37l3b_baseline_windows_with_collections.uuid)
   Buffers: shared hit=10620
   ->  Seq Scan on jtemp1c37l3b_baseline_windows_after_inclusion  (cost=0.00..14239.44 rows=505244 width=265) (actual time=0.020..59.875 rows=505244 loops=1)
         Buffers: shared hit=9187
   ->  Hash  (cost=5949.59..5949.59 rows=138972 width=46) (actual time=146.796..146.797 rows=138972 loops=1)
         Buckets: 262144  Batches: 1  Memory Usage: 12711kB
         Buffers: shared hit=1433
         ->  HashAggregate  (cost=3170.15..4559.87 rows=138972 width=46) (actual time=74.334..103.543 rows=138972 loops=1)
               Group Key: jtemp1c37l3b_baseline_windows_with_collections.uuid
               Buffers: shared hit=1433
               ->  Seq Scan on jtemp1c37l3b_baseline_windows_with_collections  (cost=0.00..2822.72 rows=138972 width=46) (actual time=0.011..16.294 rows=138972 loops=1)
                     Buffers: shared hit=1433
 Settings: effective_cache_size = '36GB', random_page_cost = '1', work_mem = '4GB'
 Planning Time: 0.597 ms
 Execution Time: 418.440 ms
(16 rows)
```

Example of Slow Query:

```
SET work_mem = '4GB';
SET random_page_cost = '1';
SET effective_cache_size = '36GB';
EXPLAIN (ANALYZE,BUFFERS,SETTINGS)              SELECT
                *
            FROM (
                SELECT
                    "l"."person_id" AS "person_id",
                    "l"."criterion_id" AS "criterion_id",
                    "l"."criterion_table" AS "criterion_table",
                    "l"."criterion_domain" AS "criterion_domain",
                    "l"."start_date" AS "start_date",
                    "l"."end_date" AS "end_date",
                    "l"."source_value" AS "source_value",
                    "l"."source_vocabulary_id" AS "source_vocabulary_id",
                    "l"."drug_amount" AS "drug_amount",
                    "l"."drug_amount_units" AS "drug_amount_units",
                    "l"."drug_days_supply" AS "drug_days_supply",
                    "l"."drug_name" AS "drug_name",
                    "l"."drug_quantity" AS "drug_quantity",
                    "l"."uuid" AS "uuid",
                    "l"."window_id" AS "window_id"
                FROM (
                    SELECT
                        "l"."person_id" AS "person_id",
                        "l"."criterion_id" AS "criterion_id",
                        "l"."criterion_table" AS "criterion_table",
                        "l"."criterion_domain" AS "criterion_domain",
                        "l"."start_date" AS "start_date",
                        "l"."end_date" AS "end_date",
                        "l"."source_value" AS "source_value",
                        "l"."source_vocabulary_id" AS "source_vocabulary_id",
                        "l"."drug_amount" AS "drug_amount",
                        "l"."drug_amount_units" AS "drug_amount_units",
                        "l"."drug_days_supply" AS "drug_days_supply",
                        "l"."drug_name" AS "drug_name",
                        "l"."drug_quantity" AS "drug_quantity",
                        "l"."uuid" AS "uuid",
                        "l"."window_id" AS "window_id"
                    FROM (
                        SELECT
                            "person_id" AS "person_id",
                            "criterion_id" AS "criterion_id",
                            "criterion_table" AS "criterion_table",
                            "criterion_domain" AS "criterion_domain",
                            "start_date" AS "start_date",
                            "end_date" AS "end_date",
                            "source_value" AS "source_value",
                            "source_vocabulary_id" AS "source_vocabulary_id",
                            "drug_amount" AS "drug_amount",
                            "drug_amount_units" AS "drug_amount_units",
                            "drug_days_supply" AS "drug_days_supply",
                            "drug_name" AS "drug_name",
                            "drug_quantity" AS "drug_quantity",
                            "uuid" AS "uuid",
                            "window_id" AS "window_id"
                        FROM
                            "jigsaw_temp"."jtemp1c37l3b_baseline_windows_after_inclusion") AS "l") AS "l"
                WHERE (EXISTS (
                        SELECT
                            1
                        FROM (
                            SELECT
                                "r"."uuid" AS "uuid"
                            FROM (
                                SELECT
                                    "uuid" AS "uuid",
                                    CAST(NULL AS float) AS "drug_amount",
                                    CAST(NULL AS text) AS "drug_amount_units",
                                    CAST(NULL AS bigint) AS "drug_days_supply",
                                    CAST(NULL AS text) AS "drug_name",
                                    CAST(NULL AS float) AS "drug_quantity",
                                    CAST(NULL AS integer) AS "window_id",
                                    "person_id" AS "person_id",
                                    "criterion_id" AS "criterion_id",
                                    "criterion_table" AS "criterion_table",
                                    "criterion_domain" AS "criterion_domain",
                                    "start_date" AS "start_date",
                                    "end_date" AS "end_date"
                                FROM
                                    "jigsaw_temp"."jtemp1c37l3b_baseline_windows_with_collections") AS "r"
                                GROUP BY "r"."uuid") AS "r"

                        WHERE ("l"."uuid" = "r"."uuid")))) AS "match_2"

```

Slow Query EXPLAIN

```
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on jtemp1c37l3b_baseline_windows_after_inclusion  (cost=0.00..1601719821.59 rows=252622 width=265) (actual time=721.931..5424987.346 rows=138972 loops=1)
   Filter: (SubPlan 1)
   Rows Removed by Filter: 366272
   Buffers: shared hit=624493395
   SubPlan 1
     ->  Subquery Scan on r  (cost=0.00..3170.16 rows=1 width=0) (actual time=10.731..10.731 rows=0 loops=505244)
           Buffers: shared hit=624484208
           ->  Group  (cost=0.00..3170.15 rows=1 width=46) (actual time=10.730..10.730 rows=0 loops=505244)
                 Group Key: jtemp1c37l3b_baseline_windows_with_collections.uuid
                 Buffers: shared hit=624484208
                 ->  Seq Scan on jtemp1c37l3b_baseline_windows_with_collections  (cost=0.00..3170.15 rows=1 width=46) (actual time=10.724..10.724 rows=0 loops=505244)
                       Filter: (jtemp1c37l3b_baseline_windows_after_inclusion.uuid = uuid)
                       Rows Removed by Filter: 119859
                       Buffers: shared hit=624484208
 Settings: effective_cache_size = '36GB', random_page_cost = '1', work_mem = '4GB'
 Planning Time: 0.872 ms
 JIT:
   Functions: 12
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 3.531 ms, Inlining 113.077 ms, Optimization 408.591 ms, Emission 160.142 ms, Total 685.341 ms
 Execution Time: 5425095.601 ms
(21 rows)
```

Thanks,
Ryan
Вложения

Re: Execution time from >1s -> 80m+ when extra columns added inSELECT for sub-query

От
Pavel Stehule
Дата:
Hi

It looks so in slow plan is some strange relations between subselects - the slow plan looks like plan for correlated subquery, and it should be slow.

Minimally you miss a index on column
jtemp1c37l3b_baseline_windows_after_inclusion.uuid


Re: Execution time from >1s -> 80m+ when extra columns added inSELECT for sub-query

От
A Guy Named Ryan
Дата:
Thanks for responding!


On May 18, 2020 at 12:18:37 PM, Pavel Stehule
(pavel.stehule@gmail.com(mailto:pavel.stehule@gmail.com)) wrote:

> Hi
>
> It looks so in slow plan is some strange relations between subselects - the slow plan looks like plan for correlated
subquery,and it should be slow.
 

I'm not very saavy about query planning and have very little (if any)
idea of what I'm talking about.


Our dynamically generated SQL does generate a lot of "strange"
subselects. They're hard to avoid for what we're doing. I admit it
makes for some abnormal looking queries. That said, I think we've
revealed an interesting issue here.


Why would the planner switch plans so drastically given that all I'm
doing is including a few extra columns in the subselect, particularly
when those columns are discarded by the super? parent? subselect


I'd expect the query planner to know that only the uuid column of the
subselect is used and not bother to actually project any additional
columns. Also, my impression about the EXISTS operator is that it
would not really need to pull any column values from disk except where
those values would be used to determine the conditions for the EXISTS.


Maybe I'm misunderstanding how this query should be executed, but I
wanted to bring it to your attention because it seems like, though the
SQL is a bit crazy, it reveals some very inconsistent planning on
PostgreSQL's part and might be something to look into.

> Minimally you miss a index on column
> jtemp1c37l3b_baseline_windows_after_inclusion.uuid

Thanks for pointing this out!


This is an intermediate table we generate as part of a much larger
process and the table is only used once. I'm under the impression that
there's a trade-off between taking the time to first build an index
then run the query rather than just running the query that one time.

It's an interesting idea to build the indexes just to avoid poor query
plans and it's something I'll keep in mind if we run into other
queries that trigger poor performance and we're unable to work around
them some other way.


If you'd like, I can slap on index on those tables and re-run a few
queries, but again, even if performance improves in this one test
case, I'm not sure it'd convince us to start adding indexes to some or
all our tables as part of our process just to avoid this one bad plan
in this one query. We've run 15-20 of our larger processes and only
hit this situation in one query in one process. I imagine adding
indexes across the board might be a heavy-handed solution to work
around this issue.



Re: Execution time from >1s -> 80m+ when extra columns added in SELECT for sub-query

От
Tom Lane
Дата:
A Guy Named Ryan <aguynamedryan@gmail.com> writes:
> Why would the planner switch plans so drastically given that all I'm
> doing is including a few extra columns in the subselect, particularly
> when those columns are discarded by the super? parent? subselect

The problem is that the columns you're adding *don't belong to that
table*.  Per your schema dump,
jtemp1c37l3b_baseline_windows_with_collections only contains the columns
person_id and uuid.  So when you write

                                SELECT
                                    "uuid" AS "uuid",
                                    CAST(NULL AS float) AS "drug_amount",
                                    CAST(NULL AS text) AS
"drug_amount_units",
                                    CAST(NULL AS bigint) AS
"drug_days_supply",
                                    CAST(NULL AS text) AS "drug_name",
                                    CAST(NULL AS float) AS "drug_quantity",
                                    CAST(NULL AS integer) AS "window_id",
                                    "person_id" AS "person_id",
                                    "criterion_id" AS "criterion_id",
                                    "criterion_table" AS "criterion_table",
                                    "criterion_domain" AS
"criterion_domain",
                                    "start_date" AS "start_date",
                                    "end_date" AS "end_date"
                                FROM
"jigsaw_temp"."jtemp1c37l3b_baseline_windows_with_collections"

those are the only two columns that are "legitimately" part of that bottom
sub-select, and the others are outer references to
jtemp1c37l3b_baseline_windows_after_inclusion.  That's legal per SQL,
but it makes the EXISTS into a correlated sub-select, which is something
we can't turn into a semijoin.

Indeed, the unreferenced columns do get thrown away later, but that
doesn't happen until well past the point where the join restructuring
decisions are made (and there are good reasons for that ordering of
operations).

Basically I'd write this off as "broken SQL code generator".  If it
doesn't understand the difference between a local reference and an
outer reference, you shouldn't be letting it near your database.
That sort of fundamental misunderstanding often leads to incorrect
query results, never mind whether the query is fast or not.

            regards, tom lane