Re: [INTERFACES] timeouts in libpq- can libpq requests block forever/a very long time?
From:         Dirk Niggemann
Subject:      Re: [INTERFACES] timeouts in libpq- can libpq requests block forever/a very long time?
Date:
Msg-id:       37F6B68B.5ABC566F@dircon.co.uk
In reply to:  Re: [INTERFACES] timeouts in libpq- can libpq requests block forever/a very long time? (Tom Lane <tgl@sss.pgh.pa.us>)
List:         pgsql-interfaces
Tom Lane wrote:
>
> Dirk Niggemann <dirk-n@dircon.co.uk> writes:
> > Would there be any problems with changing pqWait so it uses a timeout?

Actually, I've just gone away and implemented it - silly, silly me. The
timeout can be completely disabled and is settable from PQconnect and from
some new environment variables (PGTIMEOUT and PGCONNTIMEOUT). It remains
off by default.

> I think this would be rather risky: if I issue a query that the backend
> takes a long time to compute, I don't want libpq timing out on me,
> *especially* not if its response is to close the connection rather than
> some fail-soft recovery technique.

What kind of fail-soft could be implemented here? I suppose I could be nice
and stick a PQrequestCancel in there, but I'll have to make sure it doesn't
frazzle any other bit of the system. Hold that thought: PQrequestCancel is
_designed_ to be called asynchronously to all the other requests, so it's
unlikely to cause any major problems (have I missed something?).
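To make the fail-soft idea concrete, here is roughly what I mean on the
client side. This is only a sketch built from the existing asynchronous
libpq calls - run_with_timeout() and wait_readable() are invented names for
illustration, not part of libpq and not what the patch does:

    /*
     * Sketch: wait for a query result with select() on the connection's
     * socket, and fall back to PQrequestCancel on timeout instead of
     * closing the connection outright.
     */
    #include <sys/types.h>
    #include <sys/time.h>
    #include <unistd.h>
    #include <libpq-fe.h>

    static int
    wait_readable(int sock, int seconds)
    {
        fd_set          rfds;
        struct timeval  tv;

        FD_ZERO(&rfds);
        FD_SET(sock, &rfds);
        tv.tv_sec = seconds;
        tv.tv_usec = 0;
        /* >0: readable, 0: timed out, <0: error (EINTR treated as timeout) */
        return select(sock + 1, &rfds, NULL, NULL, &tv);
    }

    PGresult *
    run_with_timeout(PGconn *conn, const char *query, int seconds)
    {
        int cancelled = 0;

        if (!PQsendQuery(conn, query))
            return NULL;                /* could not even submit the query */

        while (PQisBusy(conn))
        {
            if (wait_readable(PQsocket(conn), seconds) > 0)
            {
                if (!PQconsumeInput(conn))
                    return NULL;        /* read error: connection is bad */
            }
            else if (!cancelled)
            {
                PQrequestCancel(conn);  /* ask the backend to stop, don't hang up */
                cancelled = 1;
            }
            else
            {
                /* Even the cancel got no answer: the network or backend is
                 * probably gone, and the caller should treat the connection
                 * as suspect (PQreset/PQfinish). */
                return NULL;
            }
        }
        /* May be an error result if the cancel took effect; the caller
         * should PQclear() it and call PQgetResult() again until NULL. */
        return PQgetResult(conn);
    }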
That doesn't, however, deal with the main concern about the timeout: there
is no 100% reliable way to tell the difference between "this query is
running a long time without returning results" and "the network is down" or
"the backend is hung".

The elegant solution would be to implement some sort of backend ping
protocol. The problem with this is that the backend needs to be prepared to
service it even while it's doing other things - not the sort of thing the
backend was designed to do.

BTW, as libpq doesn't set SO_KEEPALIVE, queries that run for more than 15
minutes without any data returned will disconnect anyway (provided things
aren't too confused).

From my POV, the purpose behind this is not to deal with long-computation
queries. It's to deal with unattended automatic applications that run on
remote sites over flaky networks and splat relatively small sets of inserts
(<150 rows, consisting of 30-40 integers max) into a table in a central DB.
Now, if I really wanted to do this right, I would build some sort of
reliable queuing middleware... except all that costs, well, real money, and
this is only an experimental project. The applications do these uploads at
regular intervals, and have to recover and perform the inserts at a later
stage if the network is not working properly. I expect the network not to
be working properly. The applications run 'unattended' and require no user
interaction.

> If the timeout is
> application-settable and defaults to infinity, it might be OK.

Yes, after having a closer look at PQrequestCancel I sort of see where the
problem lies: libpq makes some effort at not closing the backend socket.
That's why I left the original behaviour in place.

Can you describe to me what the backend does on socket close? I should
really grovel the code myself, but my main concerns here are:

a) I guess the backend simply aborts the current query/transaction. Nasty
   if you were running an enormous query that will take a long time to run;
   otherwise not too catastrophic.

b) I'm concerned the backend might do some blocking somewhere without a
   timeout - that is, can closing a socket at a bad time cause the backend
   to get stuck?

c) Can a failed network connection cause the backend to get stuck in a
   wait/blocking I/O? Having hung backends lying around could be a serious
   problem, as that would reduce the total number of backends available
   (max backends does not change dynamically).

d) Am I misunderstanding here, or is PostgreSQL 'single threaded' with
   respect to requests over a single client connection? libpq certainly
   gives that impression. Shoving an empty query at the backend while it is
   returning rows to you would confuse it very badly?

e) Just like PQrequestCancel, could you (me?) build a sort of PQrequestPing
   to confirm that nothing is reachable or functioning at the remote end
   before libpq closes the socket?

> In my own applications using the asynchronous query features, I have not
> seen any actual hangups that could be improved by a timeout ... but I've
> mostly been working across LAN rather than WAN connections...

What hangs did you see? The kind of problem I'm talking about is the sort
where routers go and disappear on you... after you've opened the
connection... or someone chops a cable on a remote site... all fun stuff...

Try this sometime:
a) open a connection to a remote LAN over a single router (with no
   backup/standby)...
b) start shoving data over it.
c) shoot the router by your preferred method (axe, fire, power plug, admin
   typo, dripping aircon (my favourite)).
d) see how long your socket connection takes to work out the network's gone
   away...

> > I would also like to add a timeout for the connect operation on the
> > socket-
> > on a sufficiently frazzled WAN this could block for a long time-
>
> This might be useful, but again I wonder whether libpq ought to be
> second-guessing the transport system about how long is reasonable to
> wait for a response.

Done as well - second-guessing the transport system is a good idea
sometimes: the default TCP/IP timeouts are all rather long, and adjusting
them generally means fiddling with kernel parameters (might break the
kernel, might break some other apps).

I certainly have seen problems on Solaris connecting to DEC (Oracle) RDB
over a flaky WAN. The application would occasionally 'stick' for at least
8-10 minutes when we simulated the problem by pulling some cables; in
reality it would regularly stick for half an hour. (Routers went up and
down and let packets drip through on occasion, making both ends think the
network was just extremely slow rather than broken.)

Oracle's connection timeout solution is particularly insidious (I have this
by word of mouth only) - it seems to fork children to asynchronously handle
the requests that time out. It crashed a machine when 30 concurrent client
apps all suddenly lost connection to the same database.

Apologies for seeming a bit argumentative on this point...

BTW, I just realized another problem: we can't know for certain whether a
commit happened or not if we get a timeout... eek...

Example normal flow:

    client                      backend
    COMMIT->                    ->COMMIT
    <-RESPONSE                  <-RESPONSE

Error flow:

    client                      backend
    COMMIT->                    ->COMMIT
    <-ERROR                     <-ERROR

Timeout flow (ok):

    client                      backend
    COMMIT->
    timeout, close              ->close
                                ABORT (transaction assumed aborted).

Timeout flow (bad):

    client                      backend
    COMMIT->                    ->COMMIT
                                <-RESPONSE
    timeout, close
    (transaction assumed
     aborted but has
     actually committed)
                                ->close

I can't seem to come up with a way around the worst-case scenario (the
false positive on commit)... maybe it's not possible... or do we have a
three-way handshake for commits?

>                       regards, tom lane

Thanks,
        Dirk
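P.S. Thinking about the commit ambiguity a bit more: an unattended uploader
can at least detect it after the fact by writing a batch id inside the same
transaction as the inserts, and checking for that id once it can reconnect.
A rough sketch only - the table names (upload_log, readings) and the helper
are made up for illustration, none of this is anything libpq or the backend
provides:

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Returns 1 if batch_id is visible on the server, 0 if not, -1 on error. */
    static int
    batch_was_committed(PGconn *conn, const char *batch_id)
    {
        char      query[256];
        PGresult *res;
        int       found;

        snprintf(query, sizeof(query),
                 "SELECT 1 FROM upload_log WHERE batch_id = '%s'", batch_id);
        res = PQexec(conn, query);
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            PQclear(res);
            return -1;
        }
        found = (PQntuples(res) > 0);
        PQclear(res);
        return found;
    }

    /*
     * The upload itself puts the batch rows and the log row in one
     * transaction, so after a timeout either all of it is visible
     * afterwards or none of it is:
     *
     *     BEGIN;
     *     INSERT INTO readings ...;                  -- the batch rows
     *     INSERT INTO upload_log (batch_id) VALUES ('...');
     *     COMMIT;
     *
     * If the COMMIT times out, reconnect later, call batch_was_committed(),
     * and re-send the batch only when it returns 0.
     */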