Re: [HACKERS] Faster methods for getting SPI results (460% improvement)
From | Jim Nasby |
---|---|
Subject | Re: [HACKERS] Faster methods for getting SPI results (460% improvement) |
Date | |
Msg-id | 4f11b9c9-4b2a-0552-faa7-24d255173679@BlueTreble.com |
In reply to | Re: [HACKERS] Faster methods for getting SPI results (Jim Nasby <Jim.Nasby@BlueTreble.com>) |
Responses |
Re: [HACKERS] Faster methods for getting SPI results (460% improvement)
Re: [HACKERS] Faster methods for getting SPI results (460% improvement) |
List | pgsql-hackers |
On 1/5/17 9:50 PM, Jim Nasby wrote:
> The * on that is there's something odd going on where plpython starts
> out really fast at this, then gets 100% slower. I've reached out to some
> python folks about that. Even so, the overall results from a quick test
> on my laptop are (IMHO) impressive:
>
>                   Old Code       New Code    Improvement
> Pure SQL          2 sec          2 sec
> plpython          12.7-14 sec    4-10 sec    ~1.3-3x
> plpython - SQL    10.7-12 sec    2-8 sec     ~1.3-6x
>
> Pure SQL is how long an equivalent query takes to run with just SQL.
> plpython - SQL is simply the raw python times minus the pure SQL time.

I finally got all the kinks worked out and did some testing with python 3. Performance for my test [1] improved ~460% when returning a dict of lists (as opposed to the current list of dicts). Based on previous testing, I expect that using this method to return a list of dicts will be about 8% slower. The inconsistency in results on 2.7 has to do with how python 2 handles ints.

Someone who's familiar with pl/perl should take a look at this and see if it would apply there.

I've attached the SPI portion of this patch.

I think the last step here is to figure out how to support switching between the current behavior and the "columnar" behavior of a dict of lists. I believe the best way to do that is to add two optional arguments to the execution functions: container=[] and members={}, and then copy those to produce the output objects. That means you can get the new behavior by doing something like:

    plpy.execute('...', container={}, members=[])

Or, more interesting, you could do:

    plpy.execute('...', container=Pandas.DataFrame, members=Pandas.Series)

since that's what a lot of people are going to want anyway.

In the future we could also add a GUC to change the default behavior.

Any concerns with that approach?
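To make the container/members idea concrete, here is a rough pure-Python sketch of the copy-the-prototypes step. It is not the patch and does not touch SPI or plpy. The function name build_result, its colnames/rows inputs, and the rule of choosing the columnar layout whenever the container is a mapping are all assumptions made for illustration. It also does not cover the Pandas case, which would need class-based construction rather than copying an instance.

```python
import copy
from collections.abc import MutableMapping


def build_result(colnames, rows, container=[], members={}):
    """Hypothetical stand-in for the result-building step of plpy.execute().

    The default arguments are prototypes: they are only ever copied,
    never mutated, so sharing them across calls is safe here.
    """
    result = copy.copy(container)
    if isinstance(result, MutableMapping):
        # Columnar layout: one copy of `members` per column,
        # then append each row's values column by column.
        for name in colnames:
            result[name] = copy.copy(members)
        for row in rows:
            for name, value in zip(colnames, row):
                result[name].append(value)
    else:
        # Row layout (the current behavior): one copy of `members` per row.
        for row in rows:
            member = copy.copy(members)
            for name, value in zip(colnames, row):
                member[name] = value
            result.append(member)
    return result


colnames = ["some_table_id", "some_field_name"]
rows = [(1, 1), (2, 2), (3, 3)]

# Default prototypes: list of dicts.
row_result = build_result(colnames, rows)

# Swapped prototypes: dict of lists, as in container={}, members=[].
col_result = build_result(colnames, rows, container={}, members=[])
```

With the swapped prototypes, len(col_result['some_table_id']) gives the row count the same way the test in [1] does, while the default call still returns one dict per row.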
1:
> d = plpy.execute('SELECT s AS some_table_id, s AS some_field_name, s AS some_other_field_name FROM generate_series(1,{})s'.format(iter) )
> return len(d['some_table_id'])

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)