Обсуждение: Re: [SQL] Efficient DELETE Strategies

Поиск
Список
Период
Сортировка

Re: [SQL] Efficient DELETE Strategies

От
Tom Lane
Дата:
Christoph Haller <ch@rodos.fzk.de> writes:
> Based on an entry in the mailing list from 30 Oct 2001 
> about efficient deletes on subqueries, 
> I've found two ways to do so (PostgreSQL 7.2.1): 
> ...
> Is there a way to put the second form (more complicated, but faster) 
> in one statement? 
> Or is there even a third way to delete, which I cannot see? 

The clean way to do this would be to allow extra FROM-list relations
in DELETE.  We already have a similar facility for UPDATE, so it's not
clear to me why there's not one for DELETE.  Then you could do, say,

DELETE FROM onfvalue , onfvalue j WHERE
j.sid= 5 AND
onfvalue.lid = j.lid AND 
onfvalue.mid = j.mid AND
onfvalue.timepoint = j.timepoint AND 
onfvalue.entrancetime < j.entrancetime ;

If you were using two separate tables you could force this to happen
via an implicit FROM-clause entry, much as you've done in your second
alternative --- but there's no way to set up a self-join in a DELETE
because of the lack of any place to put an alias declaration.

AFAIK this extension would be utterly trivial to implement, since all
the machinery is there already --- for 99% of the backend, it doesn't
matter whether a FROM-item is implicit or explicit.  We'd only need to
argue out what the syntax should be.  I could imagine
DELETE FROM relation_expr [ , table_ref [ , ... ] ][ WHERE bool_expr ]

or
DELETE FROM relation_expr [ FROM table_ref [ , ... ] ][ WHERE bool_expr ]

The two FROMs in the second form look a little weird, but they help to
make a clear separation between the deletion target table and the
merely-referenced tables.  Also, the first one might look to people
like they'd be allowed to write
DELETE FROM foo FULL JOIN bar ...

which is not any part of my intention (it's very unclear what it'd
mean for the target table to be on the nullable side of an outer join).
OTOH there'd be no harm in outer joins in a separate from-clause, eg
DELETE FROM foo FROM (bar FULL JOIN baz ON ...) WHERE ...

Actually, either syntax above would support that; I guess what's really
bothering me about the first syntax is that a comma suggests a list of
things that will all be treated similarly, while in reality the first
item will be treated much differently from the rest.

Does anyone know whether other systems that support the UPDATE extension
for multiple tables also support a DELETE extension for multiple tables?
If so, what's their syntax?

A somewhat-related issue is that people keep expecting to be able to
attach an alias to the target table name in UPDATE and DELETE; seems
like we get that question every couple months.  While this is clearly
disallowed by the SQL spec, it's apparently supported by some other
implementations (else we'd not get the question so much).  Should we
add that extension to our syntax?  Or should we continue to resist it?
        regards, tom lane


Re: [SQL] Efficient DELETE Strategies

От
Hannu Krosing
Дата:
On Mon, 2002-06-10 at 15:56, Tom Lane wrote:
> Christoph Haller <ch@rodos.fzk.de> writes:
> > Based on an entry in the mailing list from 30 Oct 2001 
> > about efficient deletes on subqueries, 
> > I've found two ways to do so (PostgreSQL 7.2.1): 
> > ...
> > Is there a way to put the second form (more complicated, but faster) 
> > in one statement? 
> > Or is there even a third way to delete, which I cannot see? 

...
> AFAIK this extension would be utterly trivial to implement, since all
> the machinery is there already --- for 99% of the backend, it doesn't
> matter whether a FROM-item is implicit or explicit.  We'd only need to
> argue out what the syntax should be.  I could imagine
> 
>     DELETE FROM relation_expr [ , table_ref [ , ... ] ]
>     [ WHERE bool_expr ]
> 
> or
> 
>     DELETE FROM relation_expr [ FROM table_ref [ , ... ] ]
>     [ WHERE bool_expr ]

What about

DELETE relation_expr FROM relation_expr [ , table_ref [ , ... ] ]    [ WHERE bool_expr ]

or

DELETE relation_expr.* FROM relation_expr [ , table_ref [ , ... ] ]    [ WHERE bool_expr ]


--------------
Hannu



Re: [SQL] Efficient DELETE Strategies

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Christoph Haller <ch@rodos.fzk.de> writes:
> 
>     DELETE FROM relation_expr [ FROM table_ref [ , ... ] ]
>     [ WHERE bool_expr ]
> 
> The two FROMs in the second form look a little weird, but they help to
> make a clear separation between the deletion target table and the
> merely-referenced tables.  Also, the first one might look to people
> like they'd be allowed to write
> 
>     DELETE FROM foo FULL JOIN bar ...
> 
> which is not any part of my intention (it's very unclear what it'd
> mean for the target table to be on the nullable side of an outer join).
> OTOH there'd be no harm in outer joins in a separate from-clause, eg
> 
>     DELETE FROM foo FROM (bar FULL JOIN baz ON ...) WHERE ...
> 
> Actually, either syntax above would support that; I guess what's really
> bothering me about the first syntax is that a comma suggests a list of
> things that will all be treated similarly, while in reality the first
> item will be treated much differently from the rest.

Interesting.  We could allow an alias on the primary table:
DELETE FROM foo fWHERE

and allow the non-alias version of the table for the join.  Of course,
that doesn't allow "FULL JOIN" and stuff like that.  The FROM ... FROM
looks weird, and there is clearly confusion over the FROM t1, t2.  I
wish there was another option.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [SQL] Efficient DELETE Strategies

От
Bruce Momjian
Дата:
Hannu Krosing wrote:
> What about
> 
> DELETE relation_expr FROM relation_expr [ , table_ref [ , ... ] ]
>      [ WHERE bool_expr ]
> 
> or
> 
> DELETE relation_expr.* FROM relation_expr [ , table_ref [ , ... ] ]
>      [ WHERE bool_expr ]

So make the initial FROM optional and allow the later FROM to be a list
of relations?  Seems kind of strange.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [SQL] Efficient DELETE Strategies

От
Manfred Koizar
Дата:
On Mon, 10 Jun 2002 09:56:27 -0400, Tom Lane <tgl@sss.pgh.pa.us>
wrote:
>Does anyone know whether other systems that support the UPDATE extension
>for multiple tables also support a DELETE extension for multiple tables?
>If so, what's their syntax?

MSSQL seems to guess what the user wants.  All the following
statements do the same:

(0)  DELETE FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.i=t2.i)
(1)  DELETE t1 FROM t2 WHERE t1.i=t2.i
(2a) DELETE t1 FROM t2, t1 WHERE t1.i=t2.i
(2b) DELETE t1 FROM t2 INNER JOIN t1 ON t1.i=t2.i
(3a) DELETE t1 FROM t2, t1 a WHERE a.i=t2.i
(3b) DELETE t1 FROM t2 INNER JOIN t1 a ON a.i=t2.i
(4a) DELETE a FROM t2, t1 a WHERE a.i=t2.i
(4b) DELETE a FROM t2 INNER JOIN t1 a ON a.i=t2.i
(5)  DELETE t1 FROM t1 a    WHERE EXISTS (SELECT * FROM t2 WHERE a.i=t2.i)
(6)  DELETE a FROM t1 a WHERE EXISTS (SELECT * FROM t2 WHERE a.i=t2.i)

(0) is standard SQL and should always work.  As an extension I'd like
(1) or (2), but only one of them and forbid the other one.  I'd also
forbid (3), don't know what to think of (4), and don't see a reason
why we would want (5) or (6).  I'd rather have (7) or (8).

These don't work:
(7) DELETE t1 a FROM t2 WHERE a.i = t2.i
"Incorrect syntax near 'a'."

(8) DELETE FROM t1 a WHERE EXISTS (SELECT * FROM t2 WHERE a.i = t2.i)
"Incorrect syntax near 'a'."

Self joins:
(2as) DELETE t1 FROM t1, t1 b WHERE 2*b.i=t1.i
(4as) DELETE a FROM t1 a, t1 b WHERE 2*b.i=a.i
(4bs) DELETE a FROM t1 a INNER JOIN t1 b on 2*b.i=a.i

These don't work:
DELETE t1 FROM t1 b WHERE 2 * b.i = t1.i
"The column prefix 't1' does not match with a table name or alias name
used in the query."

DELETE t1 FROM t1 a, t1 b WHERE 2 * b.i = a.i
"The table 't1' is ambiguous."

And as if there aren't enough ways yet, I just discovered that (1) to
(6) just as much work with "DELETE FROM" where I wrote "DELETE" ...

ServusManfred


Re: [SQL] Efficient DELETE Strategies

От
Tom Lane
Дата:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Hannu Krosing wrote:
>> What about
>> 
>> DELETE relation_expr FROM relation_expr [ , table_ref [ , ... ] ]
>> [ WHERE bool_expr ]
>> 
>> or
>> 
>> DELETE relation_expr.* FROM relation_expr [ , table_ref [ , ... ] ]
>> [ WHERE bool_expr ]

> So make the initial FROM optional and allow the later FROM to be a list
> of relations?  Seems kind of strange.

No, I think he's suggesting that one be able to pick out any element of
the FROM-list and say that that is the deletion target.  I really don't
want to get into that (unless there is precedent in Oracle or
someplace); it seems way too confusing to me.  It would also force us to
do error checking to eliminate cases that ought to just be syntactically
impossible: target table not present, target is a join or subselect
instead of a table, target is on wrong side of an outer join, etc.

[ and in another message ]
> The FROM ... FROM looks weird, and there is clearly confusion over the
> FROM t1, t2.  I wish there was another option.

The only other thing that's come to mind is to use a different keyword
(ie, not FROM) for the list of auxiliary relations.  WITH might work
from a simple readability point of view:DELETE FROM target WITH other-tables WHERE ...
But we've already got FROM as the equivalent construct in UPDATE, so it
seems weird to use something else in DELETE.
        regards, tom lane


Re: [SQL] Efficient DELETE Strategies

От
Tom Lane
Дата:
Manfred Koizar <mkoi-pg@aon.at> writes:
>> If so, what's their syntax?

> MSSQL seems to guess what the user wants.

Gack.  Nothing like treating mindless syntax variations as a "feature"
list...

> All the following statements do the same:

> (1)  DELETE t1 FROM t2 WHERE t1.i=t2.i
> (2a) DELETE t1 FROM t2, t1 WHERE t1.i=t2.i
> (5)  DELETE t1 FROM t1 a
>      WHERE EXISTS (SELECT * FROM t2 WHERE a.i=t2.i)
> (6)  DELETE a FROM t1 a WHERE EXISTS (SELECT * FROM t2 WHERE a.i=t2.i)

So in other words, MSSQL has no idea whether the name following DELETE
is a real table name or an alias, and it's also unclear whether the name
appears in the separate FROM clause or generates a FROM-item all by
itself.  This is why they have to punt on these cases:

> These don't work:
> DELETE t1 FROM t1 b WHERE 2 * b.i = t1.i
> "The column prefix 't1' does not match with a table name or alias name
> used in the query."

> DELETE t1 FROM t1 a, t1 b WHERE 2 * b.i = a.i
> "The table 't1' is ambiguous."

The ambiguity is entirely self-inflicted...

> And as if there aren't enough ways yet, I just discovered that (1) to
> (6) just as much work with "DELETE FROM" where I wrote "DELETE" ...

Hm.  So (1) with the DELETE FROM corresponds exactly to what I was
suggesting:DELETE FROM t1 FROM t2 WHERE t1.i=t2.i
except that I'd also allow an alias in there:DELETE FROM t1 a FROM t2 b WHERE a.i=b.i

Given the plethora of mutually incompatible interpretations that MSSQL
evidently supports, though, I fear we can't use it as precedent for
making any choices :-(.

Can anyone check out other systems?
        regards, tom lane


Re: [SQL] Efficient DELETE Strategies

От
Josh Berkus
Дата:
Tom,

> >> If so, what's their syntax?
>
> > MSSQL seems to guess what the user wants.
>
> Gack.  Nothing like treating mindless syntax variations as a "feature"
> list...

I vote that we stick to a strick SQL92 interpretation, here.
1) It's standard
2) Strict syntax on DELETE statements is better.

Personally, I would *not* want the database to "guess what I want" in a delete
statement; it might guess wrong and there go my records ...

Heck, one of the things I need to research how to turn off in PostgreSQL is
the "Add missing FROM-clause" feature, which has tripped me up many times.

--
-Josh Berkus



Re: [SQL] Efficient DELETE Strategies

От
Barry Lind
Дата:
This

Hannu Krosing wrote:
> DELETE relation_expr FROM relation_expr [ , table_ref [ , ... ] ]
>      [ WHERE bool_expr ]


This in some ways is similar to Oracle where the FROM is optional in a 
DELETE (ie. DELETE foo WHERE ...).  By omitting the first FROM, the 
syntax ends up mirroring the UPDATE case:

DELETE foo FROM bar WHERE ...

UPDATE foo FROM bar WHERE ...

However I think the syntax should also support the first FROM as being 
optional (even though it looks confusing):

DELETE FROM foo FROM bar WHERE ...

thanks,
--Barry



Re: [SQL] Efficient DELETE Strategies

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Hannu Krosing wrote:
> >> What about
> >> 
> >> DELETE relation_expr FROM relation_expr [ , table_ref [ , ... ] ]
> >> [ WHERE bool_expr ]
> >> 
> >> or
> >> 
> >> DELETE relation_expr.* FROM relation_expr [ , table_ref [ , ... ] ]
> >> [ WHERE bool_expr ]
> 
> > So make the initial FROM optional and allow the later FROM to be a list
> > of relations?  Seems kind of strange.
> 
> No, I think he's suggesting that one be able to pick out any element of
> the FROM-list and say that that is the deletion target.  I really don't
> want to get into that (unless there is precedent in Oracle or
> someplace); it seems way too confusing to me.  It would also force us to
> do error checking to eliminate cases that ought to just be syntactically
> impossible: target table not present, target is a join or subselect
> instead of a table, target is on wrong side of an outer join, etc.

Yuck.

> [ and in another message ]
> > The FROM ... FROM looks weird, and there is clearly confusion over the
> > FROM t1, t2.  I wish there was another option.
> 
> The only other thing that's come to mind is to use a different keyword
> (ie, not FROM) for the list of auxiliary relations.  WITH might work
> from a simple readability point of view:
>     DELETE FROM target WITH other-tables WHERE ...
> But we've already got FROM as the equivalent construct in UPDATE, so it
> seems weird to use something else in DELETE.

Yes, another keyword is the only solution.  Having FROM after DELETE
mean something different from FROM after a tablename is just too weird. 
I know UPDATE uses FROM, and it is logical to use it here, but it is
just too wierd when DELETE already has a FROM.  Should we allow FROM and
add WITH to UPDATE as well, and document WITH but support FROM too?  No
idea.  What if we support ADD FROM as the keywords for the new clause?

Clearly this is a TODO item.  I will document it when we decide on a
direction.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [SQL] Efficient DELETE Strategies

От
"Christopher Kings-Lynne"
Дата:
> Given the plethora of mutually incompatible interpretations that MSSQL
> evidently supports, though, I fear we can't use it as precedent for
> making any choices :-(.
>
> Can anyone check out other systems?

MySQL:

6.4.6 DELETE Syntax

DELETE [LOW_PRIORITY | QUICK] FROM table_name      [WHERE where_definition]      [ORDER BY ...]      [LIMIT rows]

or

DELETE [LOW_PRIORITY | QUICK] table_name[.*] [,table_name[.*] ...]      FROM table-references      [WHERE
where_definition]

or

DELETE [LOW_PRIORITY | QUICK]      FROM table_name[.*], [table_name[.*] ...]      USING table-references      [WHERE
where_definition]

DELETE deletes rows from table_name that satisfy the condition given by
where_definition, and returns the number of records deleted.

If you issue a DELETE with no WHERE clause, all rows are deleted. If you do
this in AUTOCOMMIT mode, this works as TRUNCATE. See section 6.4.7 TRUNCATE
Syntax. In MySQL 3.23, DELETE without a WHERE clause will return zero as the
number of affected records.

If you really want to know how many records are deleted when you are
deleting all rows, and are willing to suffer a speed penalty, you can use a
DELETE statement of this form:

mysql> DELETE FROM table_name WHERE 1>0;

Note that this is much slower than DELETE FROM table_name with no WHERE
clause, because it deletes rows one at a time.

If you specify the keyword LOW_PRIORITY, execution of the DELETE is delayed
until no other clients are reading from the table.

If you specify the word QUICK then the table handler will not merge index
leaves during delete, which may speed up certain kind of deletes.

In MyISAM tables, deleted records are maintained in a linked list and
subsequent INSERT operations reuse old record positions. To reclaim unused
space and reduce file-sizes, use the OPTIMIZE TABLE statement or the
myisamchk utility to reorganise tables. OPTIMIZE TABLE is easier, but
myisamchk is faster. See section 4.5.1 OPTIMIZE TABLE Syntax and section
4.4.6.10 Table Optimisation.

The first multi-table delete format is supported starting from MySQL 4.0.0.
The second multi-table delete format is supported starting from MySQL 4.0.2.

The idea is that only matching rows from the tables listed before the FROM
or before the USING clause are deleted. The effect is that you can delete
rows from many tables at the same time and also have additional tables that
are used for searching.

The .* after the table names is there just to be compatible with Access:

DELETE t1,t2 FROM t1,t2,t3 WHERE t1.id=t2.id AND t2.id=t3.id

or

DELETE FROM t1,t2 USING t1,t2,t3 WHERE t1.id=t2.id AND t2.id=t3.id

In the above case we delete matching rows just from tables t1 and t2.

ORDER BY and using multiple tables in the DELETE statement is supported in
MySQL 4.0.

If an ORDER BY clause is used, the rows will be deleted in that order. This
is really only useful in conjunction with LIMIT. For example:

DELETE FROM somelog
WHERE user = 'jcole'
ORDER BY timestamp
LIMIT 1

This will delete the oldest entry (by timestamp) where the row matches the
WHERE clause.

The MySQL-specific LIMIT rows option to DELETE tells the server the maximum
number of rows to be deleted before control is returned to the client. This
can be used to ensure that a specific DELETE command doesn't take too much
time. You can simply repeat the DELETE command until the number of affected
rows is less than the LIMIT value.

Chris



Re: [SQL] Efficient DELETE Strategies

От
Hannu Krosing
Дата:
On Tue, 2002-06-11 at 04:53, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Hannu Krosing wrote:
> > >> What about
> > >> 
> > >> DELETE relation_expr FROM relation_expr [ , table_ref [ , ... ] ]
> > >> [ WHERE bool_expr ]
> > >> 
> > >> or
> > >> 
> > >> DELETE relation_expr.* FROM relation_expr [ , table_ref [ , ... ] ]
> > >> [ WHERE bool_expr ]
> > 
> > > So make the initial FROM optional and allow the later FROM to be a list
> > > of relations?  Seems kind of strange.

I was inspired by MS Access syntax that has optional relation_expr.* :
  DELETE [relation_expr.*] FROM relation_expr WHERE criteria

it does not allow any other tablerefs in from 

> Clearly this is a TODO item.  I will document it when we decide on a
> direction.

Or then we can just stick with standard syntax and teach people to do

DELETE FROM t1 where t1.id1 in (select id2 from t2 where t2.id2 = t1.id1)

and perhaps even teach our optimizer to add the t2.id2 = t1.id1 part
itself to make it fast

AFAIK this should be exactly the same as the proposed

DELETE FROM t1 FROM t2
WHERE t2.id2 = t1.id1

--------------
Hannu



Re: [SQL] Efficient DELETE Strategies

От
Bruce Momjian
Дата:
Added to TODO:
* Allow DELETE to handle table aliases for self-joins [delete]

---------------------------------------------------------------------------

Manfred Koizar wrote:
> On Mon, 10 Jun 2002 09:56:27 -0400, Tom Lane <tgl@sss.pgh.pa.us>
> wrote:
> >Does anyone know whether other systems that support the UPDATE extension
> >for multiple tables also support a DELETE extension for multiple tables?
> >If so, what's their syntax?
> 
> MSSQL seems to guess what the user wants.  All the following
> statements do the same:
> 
> (0)  DELETE FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t1.i=t2.i)
> (1)  DELETE t1 FROM t2 WHERE t1.i=t2.i
> (2a) DELETE t1 FROM t2, t1 WHERE t1.i=t2.i
> (2b) DELETE t1 FROM t2 INNER JOIN t1 ON t1.i=t2.i
> (3a) DELETE t1 FROM t2, t1 a WHERE a.i=t2.i
> (3b) DELETE t1 FROM t2 INNER JOIN t1 a ON a.i=t2.i
> (4a) DELETE a FROM t2, t1 a WHERE a.i=t2.i
> (4b) DELETE a FROM t2 INNER JOIN t1 a ON a.i=t2.i
> (5)  DELETE t1 FROM t1 a
>      WHERE EXISTS (SELECT * FROM t2 WHERE a.i=t2.i)
> (6)  DELETE a FROM t1 a WHERE EXISTS (SELECT * FROM t2 WHERE a.i=t2.i)
> 
> (0) is standard SQL and should always work.  As an extension I'd like
> (1) or (2), but only one of them and forbid the other one.  I'd also
> forbid (3), don't know what to think of (4), and don't see a reason
> why we would want (5) or (6).  I'd rather have (7) or (8).
> 
> These don't work:
> (7) DELETE t1 a FROM t2 WHERE a.i = t2.i
> "Incorrect syntax near 'a'."
> 
> (8) DELETE FROM t1 a WHERE EXISTS (SELECT * FROM t2 WHERE a.i = t2.i)
> "Incorrect syntax near 'a'."
> 
> Self joins:
> (2as) DELETE t1 FROM t1, t1 b WHERE 2*b.i=t1.i
> (4as) DELETE a FROM t1 a, t1 b WHERE 2*b.i=a.i
> (4bs) DELETE a FROM t1 a INNER JOIN t1 b on 2*b.i=a.i
> 
> These don't work:
> DELETE t1 FROM t1 b WHERE 2 * b.i = t1.i
> "The column prefix 't1' does not match with a table name or alias name
> used in the query."
> 
> DELETE t1 FROM t1 a, t1 b WHERE 2 * b.i = a.i
> "The table 't1' is ambiguous."
> 
> And as if there aren't enough ways yet, I just discovered that (1) to
> (6) just as much work with "DELETE FROM" where I wrote "DELETE" ...
> 
> Servus
>  Manfred
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073