Обсуждение: Cleaning up threading code
Hi, This message is not about inventing backend threads, but I was reminded to post this by some thought-provoking hallway track discussion of that large topic at PGCon, mostly with Heikki. 1. We should completely remove --disable-thread-safety and the various !defined(ENABLE_THREAD_SAFETY) code paths. There are no computers that need it, it's not being tested, and support was already dropped/hardcoded in meson.build. Here is a mostly mechanical patch to finish that job, originally posted in one of the big 'historical baggage' threads -- I just didn't get around to committing in time for 16. Also a couple of related tiny cleanups for port/thread.c, which is by now completely bogus. 2. I don't like the way we have to deal with POSIX vs Windows at every site where we use threads, and each place has a different style of wrappers. I considered a few different approaches to cleaning this up: * provide centralised and thorough pthread emulation for Windows; I don't like this, I don't even like all of pthreads and there are many details to get lost in * adopt C11 <threads.h>; unfortunately it is too early, so you'd need to write/borrow replacements for at least 3 of our 11 target systems * invent our own mini-abstraction for a carefully controlled subset of stuff Here is an attempt at that last thing. Since I don't really like making up new names for things, I just used all the C11 names but with pg_ in front, and declared it all in "port/pg_threads.h" (actually I tried to write C11 replacements and then noped out and added the pg_ prefixes). I suppose the idea is that it and the prefixes might eventually go away. Note: here I am talking only about very basic operations like create, exit, join, explicit thread locals -- the stuff that we actually use today in frontend code. I'm not talking about other stuff like C11 atomics, memory models, and the thread_local storage class, which are all very good and interesting topics for another day. One mystery still eludes me on Windows: while trying to fix the ancient bug that ECPG leaks memory on Windows, because it doesn't call thread-local destructors, I discovered that if you use FlsAlloc instead of TlsAlloc you can pass in a destructor (that's "fibre-local", but we're not using fibres so it's works out the same as thread local AFAICT). It seems to crash in strange ways if you uncomment the line FlsAlloc(destructor). Maybe it calls the destructor even for NULL value? Or maybe it suffers from some kind of Windowsian multiple-heaps-active problem. I noticed that some other attempts at implementing tss on Windows use TlsAlloc but do their own bookkeeping of destructors and run them at thread exit, which I was hoping to avoid, but ... Before I spend time fixing that issue or seeking Windowsian help... do you think this direction is an improvement?
Вложения
Hi, On 2023-06-10 14:23:52 +1200, Thomas Munro wrote: > 1. We should completely remove --disable-thread-safety and the > various !defined(ENABLE_THREAD_SAFETY) code paths. Yes, please! > 2. I don't like the way we have to deal with POSIX vs Windows at > every site where we use threads, and each place has a different style > of wrappers. I considered a few different approaches to cleaning this > up: > > * provide centralised and thorough pthread emulation for Windows; I > don't like this, I don't even like all of pthreads and there are many > details to get lost in > * adopt C11 <threads.h>; unfortunately it is too early, so you'd need > to write/borrow replacements for at least 3 of our 11 target systems > * invent our own mini-abstraction for a carefully controlled subset of stuff > > Here is an attempt at that last thing. Since I don't really like > making up new names for things, I just used all the C11 names but with > pg_ in front, and declared it all in "port/pg_threads.h" (actually I > tried to write C11 replacements and then noped out and added the pg_ > prefixes). I suppose the idea is that it and the prefixes might > eventually go away. Note: here I am talking only about very basic > operations like create, exit, join, explicit thread locals -- the > stuff that we actually use today in frontend code. Unsurprisingly, I like this. > I'm not talking > about other stuff like C11 atomics, memory models, and the > thread_local storage class, which are all very good and interesting > topics for another day. Hm. I agree on C11 atomics and memory models, but I don't see a good reason to not add support for thread_local? In fact, I'd rather add support for thread_local and not add support for "thread keys" than the other way round. Afaict most, if not all, the places using keys would look simpler with thread_local than with keys. > One mystery still eludes me on Windows: while trying to fix the > ancient bug that ECPG leaks memory on Windows, because it doesn't call > thread-local destructors, I discovered that if you use FlsAlloc > instead of TlsAlloc you can pass in a destructor (that's > "fibre-local", but we're not using fibres so it's works out the same > as thread local AFAICT). It seems to crash in strange ways if you > uncomment the line FlsAlloc(destructor). Do you have a backtrace available? > Subject: [PATCH 1/5] Remove obsolete comments and code from fe-auth.c. > Subject: [PATCH 2/5] Rename port/thread.c to port/user.c. LGTM > -############################################################### > -# Threading > -############################################################### > - > -# XXX: About to rely on thread safety in the autoconf build, so not worth > -# implementing a fallback. > -cdata.set('ENABLE_THREAD_SAFETY', 1) I wonder if we should just unconditionally set that in c.h or such? It'd not be crazy for external projects to rely on that being set. > From ca74df4ff11ce0fd1e51786eccaeca810921fc6d Mon Sep 17 00:00:00 2001 > From: Thomas Munro <thomas.munro@gmail.com> > Date: Sat, 10 Jun 2023 09:14:07 +1200 > Subject: [PATCH 4/5] Add port/pg_threads.h for a common threading API. > > Loosely based on C11's <threads.h>, but with pg_ prefixes, this will > allow us to clean up many places that have to cope with POSIX and > Windows threads. > --- > src/include/port/pg_threads.h | 252 +++++++++++++++++++++++++++++++ > src/port/Makefile | 1 + > src/port/meson.build | 1 + > src/port/pg_threads.c | 117 ++++++++++++++ > src/tools/pgindent/typedefs.list | 7 + > 5 files changed, 378 insertions(+) > create mode 100644 src/include/port/pg_threads.h > create mode 100644 src/port/pg_threads.c > > diff --git a/src/include/port/pg_threads.h b/src/include/port/pg_threads.h > new file mode 100644 > index 0000000000..1706709994 > --- /dev/null > +++ b/src/include/port/pg_threads.h > @@ -0,0 +1,252 @@ > +/* > + * A multi-threading API abstraction loosely based on the C11 standard's > + * <threads.h> header. The identifiers have a pg_ prefix. Perhaps one day > + * we'll use standard C threads directly, and we'll drop the prefixes. > + * > + * Exceptions: > + * - pg_thrd_barrier_t is not based on C11 > + */ > + > +#ifndef PG_THREADS_H > +#define PG_THREADS_H > + > +#ifdef WIN32 I guess we could try to use C11 thread.h style threading on msvc 2022: https://developercommunity.visualstudio.com/t/finalize-c11-support-for-threading/1629487 > +#define WIN32_LEAN_AND_MEAN Why do we need this again here - shouldn't the define in src/include/port/win32_port.h already take care of this? > +#include <windows.h> > +#include <processthreadsapi.h> > +#include <fibersapi.h> > +#include <synchapi.h> > +#else > +#include <errno.h> > +#include "port/pg_pthread.h" > +#endif Seems somewhat odd to have port pg_threads.h and pg_pthread.h - why not merge these? > +#include <stdint.h> I think we widely rely stdint.h, errno.h to be included via c.h. > +#ifdef WIN32 > +typedef HANDLE pg_thrd_t; > +typedef CRITICAL_SECTION pg_mtx_t; > +typedef CONDITION_VARIABLE pg_cnd_t; > +typedef SYNCHRONIZATION_BARRIER pg_thrd_barrier_t; > +typedef DWORD pg_tss_t; > +typedef INIT_ONCE pg_once_flag; > +#define PG_ONCE_FLAG_INIT INIT_ONCE_STATIC_INIT > +#else > +typedef pthread_t pg_thrd_t; > +typedef pthread_mutex_t pg_mtx_t; > +typedef pthread_cond_t pg_cnd_t; > +typedef pthread_barrier_t pg_thrd_barrier_t; > +typedef pthread_key_t pg_tss_t; > +typedef pthread_once_t pg_once_flag; > +#define PG_ONCE_FLAG_INIT PTHREAD_ONCE_INIT > +#endif > + > +typedef int (*pg_thrd_start_t) (void *); > +typedef void (*pg_tss_dtor_t) (void *); > +typedef void (*pg_call_once_function_t) (void); > + > +enum > +{ > + pg_thrd_success = 0, > + pg_thrd_nomem = 1, > + pg_thrd_timedout = 2, > + pg_thrd_busy = 3, > + pg_thrd_error = 4 > +}; > + > +enum > +{ > + pg_mtx_plain = 0 > +}; I think we add typedefs for enums as well. > +#ifndef WIN32 > +static inline int > +pg_thrd_maperror(int error) Seems like that should have pthread in the name? > +{ > + if (error == 0) > + return pg_thrd_success; > + if (error == ENOMEM) > + return pg_thrd_nomem; > + return pg_thrd_error; > +} > +#endif > + > +#ifdef WIN32 > +BOOL pg_call_once_trampoline(pg_once_flag *flag, void *parameter, void **context); > +#endif > + > +static inline void > +pg_call_once(pg_once_flag *flag, pg_call_once_function_t function) > +{ > +#ifdef WIN32 > + InitOnceExecuteOnce(flag, pg_call_once_trampoline, (void *) function, NULL); > +#else > + pthread_once(flag, function); > +#endif > +} > + > +static inline int > +pg_thrd_equal(pg_thrd_t lhs, pg_thrd_t rhs) > +{ > +#ifdef WIN32 > + return lhs == rhs; > +#else > + return pthread_equal(lhs, rhs); > +#endif > +} > + > +static inline int > +pg_tss_create(pg_tss_t *key, pg_tss_dtor_t destructor) > +{ > +#ifdef WIN32 > + //*key = FlsAlloc(destructor); > + *key = FlsAlloc(NULL); > + return pg_thrd_success; Afaict FlsAlloc() can return errors: > If the function fails, the return value is FLS_OUT_OF_INDEXES. To get > extended error information, call GetLastError. > +#else > + return pg_thrd_maperror(pthread_key_create(key, destructor)); > +#endif > +} > +static inline int > +pg_tss_set(pg_tss_t key, void *value) > +{ > +#ifdef WIN32 > + return FlsSetValue(key, value) ? pg_thrd_success : pg_thrd_error; > +#else > + return pg_thrd_maperror(pthread_setspecific(key, value)); > +#endif > +} It's somewhat annoying that this can return errors. > +static inline int > +pg_mtx_init(pg_mtx_t *mutex, int type) I am somewhat confused by the need for these three letter abbreviations... Blaming C11, not you, to be clear. > > -# port needs to be in include path due to pthread-win32.h > +# XXX why do we need this? > libpq_inc = include_directories('.', '../../port') > libpq_c_args = ['-DSO_MAJOR_VERSION=5'] Historically we need it because random places outside of libpq include pthread-win32.h - but pthread-win32.h is in src/port, not in src/include/port. Why on earth it was done this way, I do not know. Greetings, Andres Freund
On 6/10/23 01:26, Andres Freund wrote: > On 2023-06-10 14:23:52 +1200, Thomas Munro wrote: >> I'm not talking >> about other stuff like C11 atomics, memory models, and the >> thread_local storage class, which are all very good and interesting >> topics for another day. > > Hm. I agree on C11 atomics and memory models, but I don't see a good reason to > not add support for thread_local? Perhaps obvious to most/all on this thread, but something that bit me recently to keep in mind as we contemplate threads. When a shared object library (e.g. extensions) is loaded, the init() function, and any functions marked with the constructor attribute, are executed. However they are not run again during thread_start(). So if there is thread local state that needs to be initialized, the initialization needs to be manually redone in each thread. Unless there is some mechanism similar to a constructor that I missed? -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On 10.06.23 07:26, Andres Freund wrote: >> -############################################################### >> -# Threading >> -############################################################### >> - >> -# XXX: About to rely on thread safety in the autoconf build, so not worth >> -# implementing a fallback. >> -cdata.set('ENABLE_THREAD_SAFETY', 1) > > I wonder if we should just unconditionally set that in c.h or such? It'd not > be crazy for external projects to rely on that being set. We definitely should keep the mention in ecpg_config.h.in, since that is explicitly put there for client code to use. We keep HAVE_LONG_LONG_INT etc. there for similar reasons. Another comment on patch 0003: I think this removal in configure.ac is too much: - else - LDAP_LIBS_FE="-lldap $EXTRA_LDAP_LIBS" This could would still be reachable for $enable_thread_safety=yes and $thread_safe_libldap=yes, which I suppose is the normal case?
On 03/07/2023 10:29, Peter Eisentraut wrote: > On 10.06.23 07:26, Andres Freund wrote: >>> -############################################################### >>> -# Threading >>> -############################################################### >>> - >>> -# XXX: About to rely on thread safety in the autoconf build, so not worth >>> -# implementing a fallback. >>> -cdata.set('ENABLE_THREAD_SAFETY', 1) >> >> I wonder if we should just unconditionally set that in c.h or such? It'd not >> be crazy for external projects to rely on that being set. > > We definitely should keep the mention in ecpg_config.h.in, since that is > explicitly put there for client code to use. We keep HAVE_LONG_LONG_INT > etc. there for similar reasons. +1. Patches 1-3 look good to me, with the things Andres & Peter already pointed out. The docs at https://www.postgresql.org/docs/current/libpq-threading.html needs updating. It's technically still accurate, but it ought to at least mention that libpq on v17 and above is always thread-safe. I propose the attached. It moves the note on "you can only use one PGConn from one thread at a time" to the top, before the description of the PQisthreadsafe() function. On 10/06/2023 05:23, Thomas Munro wrote: > > 2. I don't like the way we have to deal with POSIX vs Windows at > every site where we use threads, and each place has a different style > of wrappers. I considered a few different approaches to cleaning this > up: > > * provide centralised and thorough pthread emulation for Windows; I > don't like this, I don't even like all of pthreads and there are many > details to get lost in > * adopt C11 <threads.h>; unfortunately it is too early, so you'd need > to write/borrow replacements for at least 3 of our 11 target systems > * invent our own mini-abstraction for a carefully controlled subset of stuff Google search on "c11 threads on Windows" found some emulation wrappers: https://github.com/jtsiomb/c11threads and https://github.com/tinycthread/tinycthread, for example. Would either of those work for us? Even if we use an existing emulation wrapper, I wouldn't mind having our own pg_* abstration on top of it, to document which subset of the POSIX or C11 functions we actually use. -- Heikki Linnakangas Neon (https://neon.tech)
Вложения
Thanks all for the reviews. I pushed the first two small patches. Here is a new version of the main --disable-thread-safety removal part, with these changes: * fixed LDAP_LIBS_FE snafu reported by Peter E * defined ENABLE_THREAD_SAFETY 1 in c.h, for the benefit of extensions * defined ENABLE_THREAD_SAFETY 1 ecpg_config.h, for the benefit of ECPG clients * added Heikki's doc change as a separate patch (with minor xml snafu fixed) I'll follow up with a new version of the later pg_threads.h proposal and responses to feedback on that after getting this basic clean-up work in. Any other thoughts on this part?
Вложения
On Mon, Jul 10, 2023 at 10:45 AM Thomas Munro <thomas.munro@gmail.com> wrote: > * defined ENABLE_THREAD_SAFETY 1 in c.h, for the benefit of extensions I may lack imagination but I couldn't think of any use for that vestigial macro in backend/extension code, and client code doesn't see c.h and might not get the right answer anyway if it's dynamically linked which is the usual case. I took it out for now. Open to discussing further if someone can show what kinds of realistic external code would be affected. > * defined ENABLE_THREAD_SAFETY 1 ecpg_config.h, for the benefit of ECPG clients I left this one in. I'm not sure if it could really be needed. Perhaps at a stretch, perhaps ECPG code that is statically linked might test that instead of calling PQisthreadsafe(). Pushed.
On 2023-07-12 08:58:29 +1200, Thomas Munro wrote: > On Mon, Jul 10, 2023 at 10:45 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > * defined ENABLE_THREAD_SAFETY 1 in c.h, for the benefit of extensions > > I may lack imagination but I couldn't think of any use for that > vestigial macro in backend/extension code, and client code doesn't see > c.h and might not get the right answer anyway if it's dynamically > linked which is the usual case. I took it out for now. Open to > discussing further if someone can show what kinds of realistic > external code would be affected. WFM.
Apparently "drongo" didn't like something about this commit, and an ecpg test failed, but I can't immediately see why. Just "server closed the connection unexpectedly". drongo is running the new Meson buildfarm variant on Windows+MSVC, so, being still in development, I wonder if some diagnostic clue/log/backtrace etc is not being uploaded yet. FWIW Meson+Windows+MSVC passes on CI.
On Wed, Jul 12, 2023 at 3:21 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Apparently "drongo" didn't like something about this commit, and an > ecpg test failed, but I can't immediately see why. Just "server > closed the connection unexpectedly". drongo is running the new Meson > buildfarm variant on Windows+MSVC, so, being still in development, I > wonder if some diagnostic clue/log/backtrace etc is not being uploaded > yet. FWIW Meson+Windows+MSVC passes on CI. Oh, that's probably unrelated to this commit. It's failed 6 times like that in the past ~3 months.
On Wed, Jul 12, 2023 at 3:34 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Wed, Jul 12, 2023 at 3:21 PM Thomas Munro <thomas.munro@gmail.com> wrote: > > Apparently "drongo" didn't like something about this commit, and an > > ecpg test failed, but I can't immediately see why. Just "server > > closed the connection unexpectedly". drongo is running the new Meson > > buildfarm variant on Windows+MSVC, so, being still in development, I > > wonder if some diagnostic clue/log/backtrace etc is not being uploaded > > yet. FWIW Meson+Windows+MSVC passes on CI. > > Oh, that's probably unrelated to this commit. It's failed 6 times > like that in the past ~3 months. Ah, right, that is a symptom of the old Windows TCP linger vs process exit thing. Something on my list to try to investigate again, but not today. https://www.postgresql.org/message-id/flat/16678-253e48d34dc0c376%40postgresql.org
On Mon, Jul 3, 2023 at 8:43 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 10/06/2023 05:23, Thomas Munro wrote: > > 2. I don't like the way we have to deal with POSIX vs Windows at > > every site where we use threads, and each place has a different style > > of wrappers. I considered a few different approaches to cleaning this > > up: > > > > * provide centralised and thorough pthread emulation for Windows; I > > don't like this, I don't even like all of pthreads and there are many > > details to get lost in > > * adopt C11 <threads.h>; unfortunately it is too early, so you'd need > > to write/borrow replacements for at least 3 of our 11 target systems > > * invent our own mini-abstraction for a carefully controlled subset of stuff > > Google search on "c11 threads on Windows" found some emulation wrappers: > https://github.com/jtsiomb/c11threads and > https://github.com/tinycthread/tinycthread, for example. Would either of > those work for us? > > Even if we use an existing emulation wrapper, I wouldn't mind having our > own pg_* abstration on top of it, to document which subset of the POSIX > or C11 functions we actually use. Yeah. I am still interested in getting our thread API tidied up, and I intend to do that for PG 18. The patch I posted on that front, which can be summarised as a very lightweight subset of standard <threads.h> except with pg_ prefixes everywhere mapping to Windows or POSIX threads, still needs one tiny bit more work: figuring out how to get the TLS destructors to run on Windows FLS or similar, or implementing destructors myself (= little destructor callback list that a thread exit hook would run, work I was hoping to avoid by using something from the OS libs... I will try again soon). Andres also opined upthread that we should think about offering a thread_local storage class and getting away from TLS with keys. One thing to note is that the ECPG code is using TLS with destructors (in fact they are b0rked in master, git grep "FIXME: destructor" so ECPG leaks memory on Windows, so the thing that I'm trying to fix in pg_threads.h is actually fixing a long-standing bug), and although thread_local has destructors in C++ it doesn't in C, so if we decided to add the storage class but not bother with the tss_create feature, then ECPG would need to do cleanup another way. I will look into that option. One new development since last time I wrote the above stuff is that the Microsoft toolchain finally implemented the library components of C11 <threads.h>: https://devblogs.microsoft.com/cppblog/c11-threads-in-visual-studio-2022-version-17-8-preview-2/ It seems like it would be a long time before we could contemplate using that stuff though, especially given our duty of keeping libpq and ECPG requirements low and reasonable. However, it seems fairly straightforward to say that we require C99 + some way to access a C11-like thread local storage class. In practice that means a pg_thread_local macro that points to MSVC __declspec(thread) (which works since MSVC 2014?) or GCC/Clang __thread_local (which works since GCC4.8 in 2013?) or whatever. Of course every serious toolchain supports this stuff somehow or other, having been required to for C11. I can't immediately see any build farm animals that would be affected. Grison's compiler info is out of date, it's really running 8.something. The old OpenBSD GCC 4.2 animal was upgraded, and antique AIX got the boot: that's not even a coincidence, those retirements came about because those systems didn't support arbitrary alignment, another C11 feature that we now effectively require. (We could have worked around it it we had to but on but they just weren't reasonable targets.) So I'll go ahead and add the storage class to the next version, and contemplate a couple of different options for the tss stuff, including perhaps leaving it out if that seems doable. https://stackoverflow.com/questions/29399494/what-is-the-current-state-of-support-for-thread-local-across-platforms