Обсуждение: Problems with GSS encryption and SSL in libpq in 12~
Hi all, A quick make check with Postgres 11 and 12 for src/test/ssl/ shows a lot of difference in run time, using the same set of options with SSL and the same compilation flags (OpenSSL 1.1.1f, with debugging and assertions enabled among other things FWIW), with 12 taking up to five minutes to complete and 11 finishing as a matter of seconds for me. I have spent a couple of hours on that, to find out that libpq tries to initialize a GSS context where the client remains stuck: #9 0x00007fcd839bf72c in krb5_expand_hostname () from /usr/lib/x86_64-linux-gnu/libkrb5.so.3 #10 0x00007fcd839bf9e0 in krb5_sname_to_principal () from /usr/lib/x86_64-linux-gnu/libkrb5.so.3 #11 0x00007fcd83ad55b4 in ?? () from /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 #12 0x00007fcd83ac0a98 in ?? () from /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 #13 0x00007fcd83ac200f in gss_init_sec_context () from /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 #14 0x00007fcd8423b24d in pqsecure_open_gss (conn=0x5582fa8cad90) at fe-secure-gssapi.c:626 #15 0x00007fcd8421cd2b in PQconnectPoll (conn=0x5582fa8cad90) at fe-connect.c:3165 #16 0x00007fcd8421b311 in connectDBComplete (conn=0x5582fa8cad90) at fe-connect.c:2182 #17 0x00007fcd84218c1f in PQconnectdbParams (keywords=0x5582fa8cacf0, values=0x5582fa8cad40, expand_dbname=1) at fe-connect.c:647 #18 0x00005582f8a81c87 in main (argc=8, argv=0x7ffe5ddb9df8) at startup.c:266 However this makes little sense, why would libpq do that in the context of an OpenSSL connection? Well, makeEmptyPGconn() does that, which means that libpq would try by default to use GSS just if libpq is *built* with GSS: #ifdef ENABLE_GSS conn->try_gss = true; #endif It is possible to enforce this flag to false by using gssencmode=disable, but that's not really user-friendly in my opinion because nobody is going to remember that for connection strings with SSL settings so a lot of application are taking a performance hit at connection because of that in my opinion. I think that's also a bad idea from the start to assume that we have to try GSS by default, as any new code path opening a secured connection may fail into the trap of attempting to use GSS if this flag is not reset. Shouldn't we try to set this flag to false by default, and set it to true only if necessary depending on gssencmode? A quick hack switching this flag to false in makeEmptyPGconn() gives back the past performance to src/test/ssl/, FWIW. Looking around, it seems to me that there is a second issue as of PQconnectPoll(), where we don't reset the state machine correctly for try_gss within reset_connection_state_machine, and instead HEAD does it in connectDBStart(). Also, I have noted a hack as of pqsecure_open_gss() which does that: /* * We're done - hooray! Kind of gross, but we need to disable SSL * here so that we don't accidentally tunnel one over the other. */ #ifdef USE_SSL conn->allow_ssl_try = false; #endif And that looks like a rather bad idea to me.. Thanks, -- Michael
Вложения
On Mon, Apr 06, 2020 at 04:25:57PM +0900, Michael Paquier wrote: > It is possible to enforce this flag to false by using > gssencmode=disable, but that's not really user-friendly in my opinion > because nobody is going to remember that for connection strings with > SSL settings so a lot of application are taking a performance hit at > connection because of that in my opinion. I think that's also a bad > idea from the start to assume that we have to try GSS by default, as > any new code path opening a secured connection may fail into the trap > of attempting to use GSS if this flag is not reset. Shouldn't we try > to set this flag to false by default, and set it to true only if > necessary depending on gssencmode? A quick hack switching this flag > to false in makeEmptyPGconn() gives back the past performance to > src/test/ssl/, FWIW. > > Looking around, it seems to me that there is a second issue as of > PQconnectPoll(), where we don't reset the state machine correctly for > try_gss within reset_connection_state_machine, and instead HEAD does > it in connectDBStart(). So, a lot of things come down to PQconnectPoll() here. Once the connection state reached is CONNECTION_MADE, we first try a GSS connection if try_gss is true, and a SSL connection attempt follows just after. This makes me wonder about the following things: - gssencmode is prefer by default, the same as sslmode. Shouldn't we issue an error if any of them is not disabled to avoid any conflicts in the client, making the choice of gssencmode=prefer by default a bad choice? It seems to me that there could be an argument to make gssencmode disabled by default, and issue an error if somebody attempts a connection without at least gssencode or sslmode set as disabled. - The current code tries a GSS connection first, and then it follows with SSL, which is annoying because gssencmode=prefer by default means that any user would pay the cost of attempting a GSS connection for nothing (with or even without SSL). Shouldn't we do the opposite here, by trying SSL first, and then GSS? For now, I am attaching a WIP patch which seems like a good angle of attack for libpq, meaning that if sslmode and gssencmode are both set, then we would attempt a SSL connection before attempting GSS, so as any user of SSL does not pay a performance hit compared to past versions (I know that src/test/kerberos/ fails with that because sslmode=prefer is triggered first in PQconnectPoll(), but that's just to show the idea I had in mind). Any thoughts? -- Michael
Вложения
Greetings, * Michael Paquier (michael@paquier.xyz) wrote: > A quick make check with Postgres 11 and 12 for src/test/ssl/ shows a > lot of difference in run time, using the same set of options with SSL > and the same compilation flags (OpenSSL 1.1.1f, with debugging and > assertions enabled among other things FWIW), with 12 taking up to five > minutes to complete and 11 finishing as a matter of seconds for me. > > I have spent a couple of hours on that, to find out that libpq tries > to initialize a GSS context where the client remains stuck: > #9 0x00007fcd839bf72c in krb5_expand_hostname () from > /usr/lib/x86_64-linux-gnu/libkrb5.so.3 > #10 0x00007fcd839bf9e0 in krb5_sname_to_principal () from > /usr/lib/x86_64-linux-gnu/libkrb5.so.3 > #11 0x00007fcd83ad55b4 in ?? () from > /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 > #12 0x00007fcd83ac0a98 in ?? () from > /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 > #13 0x00007fcd83ac200f in gss_init_sec_context () from > /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 > #14 0x00007fcd8423b24d in pqsecure_open_gss (conn=0x5582fa8cad90) at > fe-secure-gssapi.c:626 > #15 0x00007fcd8421cd2b in PQconnectPoll (conn=0x5582fa8cad90) at > fe-connect.c:3165 > #16 0x00007fcd8421b311 in connectDBComplete (conn=0x5582fa8cad90) at > fe-connect.c:2182 > #17 0x00007fcd84218c1f in PQconnectdbParams (keywords=0x5582fa8cacf0, > values=0x5582fa8cad40, expand_dbname=1) at fe-connect.c:647 > #18 0x00005582f8a81c87 in main (argc=8, argv=0x7ffe5ddb9df8) at > startup.c:266 > > However this makes little sense, why would libpq do that in the > context of an OpenSSL connection? Well, makeEmptyPGconn() does that, > which means that libpq would try by default to use GSS just if libpq > is *built* with GSS: > #ifdef ENABLE_GSS > conn->try_gss = true; > #endif Sure, but if you look at what is done with it: /* * If GSSAPI encryption is enabled, then call * pg_GSS_have_cred_cache() which will return true if we can * acquire credentials (and give us a handle to use in * conn->gcred), and then send a packet to the server asking * for GSSAPI Encryption (and skip past SSL negotiation and * regular startup below). */ if (conn->try_gss && !conn->gctx) conn->try_gss = pg_GSS_have_cred_cache(&conn->gcred); In other words, it's trying because a call to gss_acquire_cred() (called from pg_GSS_have_cred_cache()) returned without error, indicating that GSS should be possible to attempt. If you have GSS compiled in, and you've got a credential cache such that gss_acquire_cred() returns true, it seems entirely reasonable that you'd like to connect using GSS encryption. > It is possible to enforce this flag to false by using > gssencmode=disable, but that's not really user-friendly in my opinion > because nobody is going to remember that for connection strings with > SSL settings so a lot of application are taking a performance hit at > connection because of that in my opinion. I think that's also a bad > idea from the start to assume that we have to try GSS by default, as > any new code path opening a secured connection may fail into the trap > of attempting to use GSS if this flag is not reset. Shouldn't we try > to set this flag to false by default, and set it to true only if > necessary depending on gssencmode? A quick hack switching this flag > to false in makeEmptyPGconn() gives back the past performance to > src/test/ssl/, FWIW. We don't just always try to do GSS, that certainly wouldn't make sense- we only try when gss_acquire_cred() comes back without an error. As this is part of the initial connection, it's also not possible to decide to do it by "depending on gssencmode", as we haven't talked to the server at all at this point and need to decide if we're going to do GSS encryption or not with the initial packet. Note that this is more-or-less identical to what we do with SSL, and, as you saw, we default to 'prefer' with GSSENCMODE, but you can set it to 'disable' on the client side if you don't want to try GSS, even when you have a client compiled with GSS and you have a credential cache. > Looking around, it seems to me that there is a second issue as of > PQconnectPoll(), where we don't reset the state machine correctly for > try_gss within reset_connection_state_machine, and instead HEAD does > it in connectDBStart(). Not following exactly what you're referring to here, but I see you've sent a follow-up email about this and will respond to that independently. > Also, I have noted a hack as of pqsecure_open_gss() which does that: > /* > * We're done - hooray! Kind of gross, but we need to disable SSL > * here so that we don't accidentally tunnel one over the other. > */ > #ifdef USE_SSL > conn->allow_ssl_try = false; > #endif > And that looks like a rather bad idea to me.. Tunneling SSL over GSS encryption is definitely a bad idea, which is why we prevent that from happening. I'm not sure what the issue here is- are you suggesting that we should support tunneling SSL over GSS encryption..? Thanks, Stephen
Вложения
Greetings, * Michael Paquier (michael@paquier.xyz) wrote: > On Mon, Apr 06, 2020 at 04:25:57PM +0900, Michael Paquier wrote: > > It is possible to enforce this flag to false by using > > gssencmode=disable, but that's not really user-friendly in my opinion > > because nobody is going to remember that for connection strings with > > SSL settings so a lot of application are taking a performance hit at > > connection because of that in my opinion. I think that's also a bad > > idea from the start to assume that we have to try GSS by default, as > > any new code path opening a secured connection may fail into the trap > > of attempting to use GSS if this flag is not reset. Shouldn't we try > > to set this flag to false by default, and set it to true only if > > necessary depending on gssencmode? A quick hack switching this flag > > to false in makeEmptyPGconn() gives back the past performance to > > src/test/ssl/, FWIW. > > > > Looking around, it seems to me that there is a second issue as of > > PQconnectPoll(), where we don't reset the state machine correctly for > > try_gss within reset_connection_state_machine, and instead HEAD does > > it in connectDBStart(). > > So, a lot of things come down to PQconnectPoll() here. Once the > connection state reached is CONNECTION_MADE, we first try a GSS > connection if try_gss is true, and a SSL connection attempt follows > just after. This makes me wonder about the following things: > - gssencmode is prefer by default, the same as sslmode. Shouldn't we > issue an error if any of them is not disabled to avoid any conflicts > in the client, making the choice of gssencmode=prefer by default a bad > choice? It seems to me that there could be an argument to make > gssencmode disabled by default, and issue an error if somebody > attempts a connection without at least gssencode or sslmode set as > disabled. I don't see why it would make sense to throw an error and require that one of them be disabled. I certainly don't agree that we should disable GSS encryption by default, or that there's any reason to throw an error if both GSS and SSL are set to 'prefer' (as our current default is). > - The current code tries a GSS connection first, and then it follows > with SSL, which is annoying because gssencmode=prefer by default means > that any user would pay the cost of attempting a GSS connection for > nothing (with or even without SSL). Shouldn't we do the opposite > here, by trying SSL first, and then GSS? A GSS connection is only attempted, as mentioned, if your GSS library claims that there's a possibility that credentials could be acquired for the connection. > For now, I am attaching a WIP patch which seems like a good angle of > attack for libpq, meaning that if sslmode and gssencmode are both set, > then we would attempt a SSL connection before attempting GSS, so as > any user of SSL does not pay a performance hit compared to past > versions (I know that src/test/kerberos/ fails with that because > sslmode=prefer is triggered first in PQconnectPoll(), but that's just > to show the idea I had in mind). > > Any thoughts? I don't agree with the assumption that, in the face of having GSS up and running, which actually validates the client and the server, that we should prefer SSL, where our default configuration for SSL does *not* validate properly *either* the client or the server. What it sounds like to me is that it'd be helpful for you to review why your environment has a GSS credential cache (or, at least, gss_acquire_cred() returns without any error) but GSS isn't properly working. If you're using OSX, it's possible the issue here is actually the broken and ridiculously ancient kerberos code that OSX ships with (and which, in another thread, there's a discussion about detecting and failing to build if it's detected as it's just outright broken). If it's not that then it'd be great to understand more about the specific environment and what things like 'klist' return. Thanks, Stephen