Re: Suboptimal query plan when using expensive BCRYPT functions

Поиск
Список
Период
Сортировка
От bricklen
Тема Re: Suboptimal query plan when using expensive BCRYPT functions
Дата
Msg-id CAGrpgQ_yi-kFq+02Nz2cORhK5o+X8phm3R_sPu7GwtUXUx02fQ@mail.gmail.com
обсуждение исходный текст
Ответ на Suboptimal query plan when using expensive BCRYPT functions  (Erik van Zijst <erik.van.zijst@gmail.com>)
Ответы Re: Suboptimal query plan when using expensive BCRYPT functions  (Erik van Zijst <erik.van.zijst@gmail.com>)
Список pgsql-performance

On Fri, Mar 21, 2014 at 5:59 PM, Erik van Zijst <erik.van.zijst@gmail.com> wrote:
Hi there,

I've got a relatively simple query that contains expensive BCRYPT
functions that gets optimized in a way that causes postgres to compute
more bcrypt hashes than necessary, thereby dramatically slowing things
down.

In a certain part of our application we need to lookup users by their
username, email address and password. Now we don't store plaintext
passwords and so the query needs to compute bcrypt hashes on the fly:

    SELECT DISTINCT u.*
    FROM auth_user u
    JOIN bb_userprofile p ON p.user_id = u.id
    JOIN bb_identity i ON i.profile_id = p.id
    WHERE
    (
      (
        u.username ILIKE 'detkin'
        OR
        i.email ILIKE 'foo@example.com'
      )
      AND
      (
        SUBSTRING(password FROM 8) = CRYPT(
          'detkin', SUBSTRING(password FROM 8))
      )
    )

These queries are generated by a parser that translates from an
external query language to SQL run on the database. This test db
contains 12 user records.

With a single bcrypt hash taking ~300ms to compute, this is a recipe
for disaster and so the app only allows queries that require only a
very small number of bcrypt computation.

E.g. the user must always "AND" the password lookup with a clause like
" username = 'foo' AND password = 'bar'" (username is unique).

However, while the query above technically only needs to compute 1
hash (there is a user 'detkin' and email 'foo@example.com' does not
exist), it instead creates a query plan that computes hashes *before*
filtering on username and email, leading to 12 hash computations and a
very slow query.

The EXPLAIN (ANALYZE, BUFFERS) is here: http://explain.depesz.com/s/yhE

The schemas for the 3 tables involved are here:
http://pgsql.privatepaste.com/f72020ad0a

As a quick experiment I tried moving the joins and email lookup into a
nested IN query, but that still generates a plan that computes hashes
for all 12 users, before picking out the 1 whose username matches.

Is there any way I can get postgres to perform the hash calculations
on the *result* of the other parts of the where clause, instead of the
other way around? Or else rewrite the query?

Cheers,
Erik

(untested), but how about something like the following:

WITH au AS (
    SELECT DISTINCT u.*
    FROM auth_user u
    JOIN bb_userprofile p ON p.user_id = u.id
    JOIN bb_identity i ON i.profile_id = p.id
    WHERE u.username ILIKE 'detkin'
    OR i.email ILIKE 'foo@example.com')

SELECT au.*
FROM au
WHERE SUBSTRING(au.password FROM 8) = CRYPT('detkin', SUBSTRING(au.password FROM 8));

В списке pgsql-performance по дате отправления:

Предыдущее
От: Erik van Zijst
Дата:
Сообщение: Suboptimal query plan when using expensive BCRYPT functions
Следующее
От: Erik van Zijst
Дата:
Сообщение: Re: Suboptimal query plan when using expensive BCRYPT functions