Thread: Improve LWLock tranche name visibility across backends


Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
Hi,

This is a follow-up to a discussion started in [0].

LWLocks in PostgreSQL are categorized into tranches, and the tranche name
appears as the wait_event in pg_stat_activity. There are both built-in
tranche names and tranche names that can be registered by extensions using
RequestNamedLWLockTranche() or LWLockRegisterTranche().

Tranche names are stored in process-local memory when registered. If a
tranche is registered during postmaster startup, such as with built-in
tranches or those registered via RequestNamedLWLockTranche(), its name is
inherited by backend processes via fork(). However, if a tranche is
registered dynamically by a backend using LWLockRegisterTranche(), other
backends will not be aware of it unless they explicitly register it as well.

Consider a case in which an extension allows a backend to attach a new
dshash via the GetNamedDSHash API and supplies a tranche name like
"MyUsefulExtension". The first backend to call GetNamedDSHash will
initialize an LWLock using the extension-defined tranche name and associate
it with a tranche ID in local memory. Other backends that later attach to
the same dshash will also learn about the tranche name and ID. Backends
that do not attach the dshash will not know this tranche name. This
results in differences in how wait events are reported in pg_stat_activity.
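For illustration, a minimal sketch of such an extension (the entry type and
the dshash parameter handling here are assumptions, not code from this
thread):

```
#include "lib/dshash.h"
#include "storage/dsm_registry.h"

/* Hypothetical entry type, for illustration only. */
typedef struct MyEntry
{
    int32       key;
    int64       value;
} MyEntry;

static const dshash_parameters my_params = {
    sizeof(int32),              /* key size */
    sizeof(MyEntry),            /* entry size */
    dshash_memcmp,
    dshash_memhash,
    dshash_memcpy,
    0                           /* tranche id; assumed to be set up by the registry */
};

static void
my_extension_attach(void)
{
    bool        found;
    dshash_table *ht;

    /*
     * The first caller creates the dshash and registers the tranche name
     * "MyUsefulExtension" in its local memory; later attachers learn the
     * name too, but backends that never call this remain unaware of it.
     */
    ht = GetNamedDSHash("MyUsefulExtension", &my_params, &found);
    (void) ht;
}
```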

When querying pg_stat_activity, the function pgstat_get_wait_event is
called, which internally uses GetLWLockIdentifier and GetLWTrancheName
to map the LWLock to its tranche name. If the backend does not recognize
the tranche ID, a fallback name "extension" is used. Therefore, backends
that have registered the tranche will report the correct extension-defined
tranche name, while others will report the generic fallback of "extension".

i.e.
```
postgres=# select wait_event, wait_event_type from pg_stat_activity;
-[ RECORD 1 ]---+--------------------
wait_event      | extension
wait_event_type | LWLock
```
instead of
```
postgres=# select wait_event, wait_event_type from pg_stat_activity;
-[ RECORD 1 ]---+--------------------
wait_event      | MyUsefulExtension
wait_event_type | LWLock
```

This is the current design, but I think we can do better to avoid the
inconsistencies this may cause for monitoring tools and diagnostics.

To improve this, we could store tranche names registered by a normal backend
in shared memory, for example in a dshash, allowing tranche names to be
resolved even by backends that have not explicitly registered them. This
would lead to more consistent behavior, particularly as more extensions
adopt APIs like GetNamedDSHash, where tranche names are registered by the
backend rather than the postmaster.

Attached is a proof of concept that does not alter the
LWLockRegisterTranche API. Instead, it detects when a registration is
performed by a normal backend and stores the tranche name in shared memory,
using a dshash keyed by tranche ID. Tranche name lookup now proceeds in
the order of built-in names, the local list, and finally the shared memory.
The fallback name "extension" can still be returned if an extension does
not register a tranche.

An exclusive lock is taken when adding a new tranche, which should be a rare
occurrence. A shared lock is taken when looking up a tranche name via
GetLWTrancheName.
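Schematically, the lookup order is (a sketch; the helper names are
hypothetical, not the PoC's actual code):

```
/* Sketch of the PoC lookup order described above. */
static const char *
GetLWTrancheName(uint16 trancheId)
{
    const char *name;

    /* 1. Built-in tranches (and individual LWLocks) */
    if (trancheId < LWTRANCHE_FIRST_USER_DEFINED)
        return BuiltinTrancheNames[trancheId];

    /* 2. Tranches registered in this process (or inherited via fork()) */
    if ((name = LookupLocalTrancheName(trancheId)) != NULL)
        return name;

    /* 3. Shared dshash keyed by tranche ID, under a shared lock */
    if ((name = LookupSharedTrancheName(trancheId)) != NULL)
        return name;

    /* 4. Never registered anywhere: generic fallback */
    return "extension";
}
```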

There are still some open questions I have:

1/ There is currently no mechanism for deleting entries. I am not sure whether
this is a concern, since the size of the table would grow only with the
number of extensions and the number of LWLocks they initialize, which is
typically small. That said, others may have different thoughts on this.

2/ What is the appropriate size limit for a tranche name? The work done
in [0] caps the tranche name to 128 bytes for the dshash tranche, and
128 bytes + length of " DSA" suffix for the dsa tranche. Also, the
existing RequestNamedLWLockTranche caps the name to NAMEDATALEN. Currently,
LWLockRegisterTranche does not have a limit on the tranche name. I wonder
if we also need to take care of this and implement some common limit that
applies to tranche names regardless of how they're created?

[0] https://www.postgresql.org/message-id/aEiTzmndOVPmA6Mm%40nathan

--

Sami Imseih
Amazon Web Services (AWS)

Attachments

Re: Improve LWLock tranche name visibility across backends

From:
Bertrand Drouvot
Date:
Hi,

On Wed, Jul 09, 2025 at 04:39:48PM -0500, Sami Imseih wrote:
> Hi,
> 
> When querying pg_stat_activity, the function pgstat_get_wait_event is
> called, which internally uses GetLWLockIdentifier and GetLWTrancheName
> to map the LWLock to its tranche name. If the backend does not recognize
> the tranche ID, a fallback name "extension" is used. Therefore, backends
> that have registered the tranche will report the correct extension-defined
> tranche name, while others will report the generic fallback of "extension".
> 
> i.e.
> ```
> postgres=# select wait_event, wait_event_type from pg_stat_activity;
> -[ RECORD 1 ]---+--------------------
> wait_event      | extension
> wait_event_type | LWLock
> ```
> instead of
> ```
> postgres=# select wait_event, wait_event_type from pg_stat_activity;
> -[ RECORD 1 ]---+--------------------
> wait_event      | MyUsefulExtension
> wait_event_type | LWLock
> ```
> 
> This is the current design, but I think we can do better to avoid the
> inconsistencies this may cause for monitoring tools and diagnostics.

+1 on finding a way to improve this, thanks for looking at it.

> Attached is a proof of concept that does not alter the
> LWLockRegisterTranche API. Instead, it detects when a registration is
> performed by a normal backend and stores the tranche name in shared memory,
> using a dshash keyed by tranche ID. Tranche name lookup now proceeds in
> the order of built-in names, the local list, and finally the shared memory.
> The fallback name "extension" can still be returned if an extension does
> not register a tranche.

I did not look in detail, but do you think we could make use of
WaitEventCustomNew()?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
Thanks for the feedback!

> > Attached is a proof of concept that does not alter the
> > LWLockRegisterTranche API. Instead, it detects when a registration is
> > performed by a normal backend and stores the tranche name in shared memory,
> > using a dshash keyed by tranche ID. Tranche name lookup now proceeds in
> > the order of built-in names, the local list, and finally the shared memory.
> > The fallback name "extension" can still be returned if an extension does
> > not register a tranche.
>
> I did not look in detail, but do you think we could make use of
> WaitEventCustomNew()?

It looks like I overlooked the custom wait event, so I didn’t take it into
account initially. That said, I do think it’s reasonable to consider
piggybacking on this infrastructure.

After all, LWLockRegisterTranche is already creating a custom wait event
defined by the extension. The advantage here is that we can avoid creating
new shared memory and instead reuse the existing static hash table, which is
capped at 128 custom wait events:

```
#define WAIT_EVENT_CUSTOM_HASH_MAX_SIZE 128
```

However, WaitEventCustomNew as it currently stands won’t work for our use
case, since it assigns an eventId automatically. The API currently takes a
classId and wait_event_name, but in our case, we’d actually want to pass in a
trancheId.

So, we might need a new API, something like:
```
WaitEventCustomNewWithEventId(uint32 classId, uint16 eventId,
                              const char *wait_event_name);
```
eventId in the LWLock case will be a trancheId that was generated by the
user in an earlier step (via LWLockNewTrancheId) and then passed to
LWLockInitialize.

This would behave the same as the existing WaitEventCustomNew API,
except that it uses the provided eventId.

Or maybe we could just allow WaitEventCustomNew to take in the eventId and,
if it's > 0, use the passed-in value; otherwise generate the next eventId.
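A sketch of that variant (a hypothetical change to the existing signature,
not committed code):

```
/* Hypothetical: eventId == 0 keeps today's auto-assignment behavior,
 * a nonzero value is used as-is. */
uint32 WaitEventCustomNew(uint32 classId, uint16 eventId,
                          const char *wait_event_name);

/* LWLock case: reuse the tranche ID as the eventId. */
wait_event_info = WaitEventCustomNew(PG_WAIT_LWLOCK, tranche_id,
                                     "MyUsefulExtension");
```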

I do like the latter approach more, what do you think?

With this API, we can then teach LWLockRegisterTranche to register the
custom wait event.

--
Sami



Re: Improve LWLock tranche name visibility across backends

From:
Bertrand Drouvot
Date:
Hi,

On Thu, Jul 10, 2025 at 04:34:34PM -0500, Sami Imseih wrote:
> Thanks for the feedback!
> 
> > > Attached is a proof of concept that does not alter the
> > > LWLockRegisterTranche API. Instead, it detects when a registration is
> > > performed by a normal backend and stores the tranche name in shared memory,
> > > using a dshash keyed by tranche ID. Tranche name lookup now proceeds in
> > > the order of built-in names, the local list, and finally the shared memory.
> > > The fallback name "extension" can still be returned if an extension does
> > > not register a tranche.
> >
> > I did not look in detail, but do you think we could make use of
> > WaitEventCustomNew()?
> 
> It looks like I overlooked the custom wait event, so I didn’t take it into
> account initially. That said, I do think it’s reasonable to consider
> piggybacking on this infrastructure.
> 
> After all, LWLockRegisterTranche is already creating a custom wait event
> defined by the extension.

Right, the tranche is nothing but an eventID (from a wait_event_info point of
view).

> The advantage here is that we can avoid creating
> new shared memory

Right, I think it's good to rely on this existing machinery.

> and instead reuse the existing static hash table, which is
> capped at 128 custom wait events:
> 
> ```
> #define WAIT_EVENT_CUSTOM_HASH_MAX_SIZE 128
> ```

That's probably still high enough, thoughts?

> However, WaitEventCustomNew as it currently stands won’t work for our use
> case, since it assigns an eventId automatically. The API currently takes a
> classId and wait_event_name, but in our case, we’d actually want to pass in a
> trancheId.
> 
> So, we might need a new API, something like:
> ```
> WaitEventCustomNewWithEventId(uint32 classId, uint16 eventId,
>                               const char *wait_event_name);
> ```
> eventId in the LWLock case will be a trancheId that was generated by the
> user in an earlier step (via LWLockNewTrancheId) and then passed to
> LWLockInitialize.
> 
> This would behave the same as the existing WaitEventCustomNew API,
> except that it uses the provided eventId.
> 
> or maybe we can just allow WaitEventCustomNew to take in the eventId, and
> if it's > 0, then use the passed in value, otherwise generate the next eventId.
> 
> I do like the latter approach more, what do you think?

I think I do prefer it too, but in both cases we'll have to make sure there
is no collision on the eventID (LWTRANCHE_FIRST_USER_DEFINED is currently
95).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From:
Nathan Bossart
Date:
On Wed, Jul 09, 2025 at 04:39:48PM -0500, Sami Imseih wrote:
> Attached is a proof of concept that does not alter the
> LWLockRegisterTranche API.

IMHO we should consider modifying the API, because right now you have to
call LWLockRegisterTranche() in each backend.  Why not accept the name as
an argument for LWLockNewTrancheId() and only require it to be called once
during your shared memory initialization?  In any case, a lot of existing
code will continue to call it in each backend unless the API changes.

> Instead, it detects when a registration is
> performed by a normal backend and stores the tranche name in shared memory,
> using a dshash keyed by tranche ID. Tranche name lookup now proceeds in
> the order of built-in names, the local list, and finally the shared memory.
> The fallback name "extension" can still be returned if an extension does
> not register a tranche.

Why do we need three different places for the lock names?  Is there a
reason we can't put it all in shared memory?

> 1/ There is currently no mechanism for deleting entries. I am not sure whether
> this is a concern, since the size of the table would grow only with the
> number of extensions and the number of LWLocks they initialize, which is
> typically small. That said, others may have different thoughts on this.

I don't see any strong reason to allow deletion unless we started to
reclaim tranche IDs, and I don't see any strong reason for that, either.

> 2/ What is the appropriate size limit for a tranche name? The work done
> in [0] caps the tranche name to 128 bytes for the dshash tranche, and
> 128 bytes + length of " DSA" suffix for the dsa tranche. Also, the
> existing RequestNamedLWLockTranche caps the name to NAMEDATALEN. Currently,
> LWLockRegisterTranche does not have a limit on the tranche name. I wonder
> if we also need to take care of this and implement some common limit that
> applies to tranche names regardless of how they're created?

Do we need to set a limit?  If we're using a DSA and dshash, we could let
folks use arbitrary long tranche names, right?  The reason for the limit in
the DSM registry is because the name is used as the key for the dshash
table.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
> > and instead reuse the existing static hash table, which is
> > capped at 128 custom wait events:
> >
> > ```
> > #define WAIT_EVENT_CUSTOM_HASH_MAX_SIZE 128
> > ```
>
> That's probably still high enough, thoughts?

I have no reason to believe that this number could be too low.
I am not aware of an extension that will initialize more than a
couple of LWLocks.

> > or maybe we can just allow WaitEventCustomNew to take in the eventId, and
> > if it's > 0, then use the passed in value, otherwise generate the next eventId.
> >
> > I do like the latter approach more, what do you think?
>
> I think I do prefer it too, but in both cases we'll have to make sure there
> is no collision on the eventID (LWTRANCHE_FIRST_USER_DEFINED is currently
> 95).

As far as collisions are concerned, the key of the hash is the wait_event_info,
which is a bitwise OR of classId and eventId
```
wait_event_info = classId | eventId;
```
Do you think collision can still be possible?

Now, what I think would be a good API is to provide an alternative to
LWLockRegisterTranche, which currently takes both a tranche ID and a
tranche_name. The new API I propose is LWLockRegisterTrancheWaitEventCustom,
which takes only a tranche_name and internally calls WaitEventCustomNew to
add a new wait_event_info to the hash table. The wait_event_info is made up
of classId = PG_WAIT_LWLOCK and LWLockNewTrancheId().

I prefer to implement a new API for this to make it explicit that it will
both register a tranche and create a custom wait event. I do not think we
should get rid of LWLockRegisterTranche, because it is used by CreateLWLocks
during startup and I don't see a reason to change that. See the attached v1.
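For clarity, a rough sketch of the proposed helper (a sketch only; the
attached v1 may differ, and it assumes the WaitEventCustomNew variant
discussed above that accepts a caller-provided eventId):

```
/* Sketch, not the attached patch: register a tranche and expose its name
 * as a custom wait event in one step. */
int
LWLockRegisterTrancheWaitEventCustom(const char *tranche_name)
{
    int         tranche_id = LWLockNewTrancheId();

    /* classId = PG_WAIT_LWLOCK, eventId = the newly assigned tranche ID */
    (void) WaitEventCustomNew(PG_WAIT_LWLOCK, (uint16) tranche_id,
                              tranche_name);

    return tranche_id;
}
```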

What is missing from the attached v1 is:

1/ the documentation changes required in [0].

2/ in dsm_registry.c, we are going to have to modify GetNamedDSHash and
GetNamedDSA to use the new API. Here is an example of what that looks like:

```
@@ -389,14 +385,12 @@ GetNamedDSHash(const char *name, const dshash_parameters *params, bool *found)
                entry->type = DSMR_ENTRY_TYPE_DSH;

                /* Initialize the LWLock tranche for the DSA. */
-               dsa_state->tranche = LWLockNewTrancheId();
                sprintf(dsa_state->tranche_name, "%s%s", name, DSMR_DSA_TRANCHE_SUFFIX);
-               LWLockRegisterTranche(dsa_state->tranche, dsa_state->tranche_name);
+               dsa_state->tranche = LWLockRegisterTrancheWaitEventCustom(dsa_state->tranche_name);
```

Is there a concern with a custom wait event to be created implicitly
via the GetNamed* APIs?

3/ I still have not worked out the maximum length requirement for the
tranche name, but I think it would have to be at least as large as what
the GetNamed* APIs from the DSM registry support.

I hope with v1, we can agree to the general direction of this change.

[0] https://www.postgresql.org/docs/current/xfunc-c.html

--
Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From:
Nathan Bossart
Date:
On Fri, Jul 11, 2025 at 04:32:13PM -0500, Sami Imseih wrote:
> Now, what I think would be a good API is to provide an alternative to
> LWLockRegisterTranche, which currently takes both a tranche ID and a
> tranche_name. The new API I propose is LWLockRegisterTrancheWaitEventCustom,
> which takes only a tranche_name and internally calls WaitEventCustomNew to
> add a new wait_event_info to the hash table. The wait_event_info is made up
> of classId = PG_WAIT_LWLOCK and LWLockNewTrancheId().
> 
> I prefer to implement a new API for this to make it explicit that it will
> both register a tranche and create a custom wait event. I do not think we
> should get rid of LWLockRegisterTranche, because it is used by CreateLWLocks
> during startup and I don't see a reason to change that. See the attached v1.

Hm.  I was thinking we could have LWLockNewTrancheId() take care of
registering the name.  The CreateLWLocks() case strikes me as a special
path.  IMHO LWLockRegisterTranche() should go away.

> Is there a concern with a custom wait event to be created implicitly
> via the GetNamed* APIs?

I'm not sure I see any particular advantage to using custom wait events
versus a dedicated LWLock tranche name table.  If anything, the limits on
the number of tranches and the lengths of the names gives me pause.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
>> Attached is a proof of concept that does not alter the
>> LWLockRegisterTranche API.

> IMHO we should consider modifying the API, because right now you have to
> call LWLockRegisterTranche() in each backend. Why not accept the name as
> an argument for LWLockNewTrancheId() and only require it to be called once
> during your shared memory initialization?

Yes, we could do that, and this will simplify tranche registration from the
current two-step process of LWLockNewTrancheId() followed by
LWLockRegisterTranche() to simply LWLockNewTrancheId("my tranche").
I agree.
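For illustration, the registration flow would go from:

```
/* today: two steps, and each backend must repeat the registration */
tranche_id = LWLockNewTrancheId();
LWLockRegisterTranche(tranche_id, "my tranche");
```

to a single call whose name would be resolvable by every backend:

```
/* proposed: one step during shared memory initialization */
tranche_id = LWLockNewTrancheId("my tranche");
```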

>> Instead, it detects when a registration is
>> performed by a normal backend and stores the tranche name in shared memory,
>> using a dshash keyed by tranche ID. Tranche name lookup now proceeds in
>> the order of built-in names, the local list, and finally the shared memory.
>> The fallback name "extension" can still be returned if an extension does
>> not register a tranche.

> Why do we need three different places for the lock names? Is there a
> reason we can't put it all in shared memory?

The real reason I felt it was better to keep three separate locations is that
it allows for a clear separation between user-defined tranches registered
during postmaster startup and those registered during a normal backend. The
tranches registered during postmaster are inherited by the backend via
fork() (or EXEC_BACKEND), and therefore, the dshash table will only be used
by a normal backend.

Since DSM is not available during postmaster, if we were to create a DSA
segment in place, similar to what's done in StatsShmemInit(), we would also
need to ensure that the initial shared memory is sized appropriately. This is
because it would need to be large enough to accommodate all user-defined
tranches registered during postmaster, without having to rely on new
dsm segments.
From my experimentation, this sizing is not as straightforward as simply
calculating # of tranches * size of a tranche entry.

I still think we should create the dsa during postmaster, as we do with
StatsShmemInit, but it would be better if postmaster keeps its hands off this
dshash and only normal backends use it.

Thoughts?

>> 2/ What is the appropriate size limit for a tranche name? The work done
>> in [0] caps the tranche name to 128 bytes for the dshash tranche, and
>> 128 bytes + length of " DSA" suffix for the dsa tranche. Also, the
>> existing RequestNamedLWLockTranche caps the name to NAMEDATALEN. Currently,
>> LWLockRegisterTranche does not have a limit on the tranche name. I wonder
>> if we also need to take care of this and implement some common limit that
>> applies to tranch names regardless of how they're created?

> Do we need to set a limit? If we're using a DSA and dshash, we could let
> folks use arbitrary long tranche names, right? The reason for the limit in
> the DSM registry is because the name is used as the key for the dshash
> table.

Sure, that is a good point. The dshash entry could look like the below, without
a limit on the tranche_name.

```
typedef struct LWLockTrancheNamesEntry
{
    int         trancheId;
    const char *tranche_name;
} LWLockTrancheNamesEntry;
```

>> Is there a concern with a custom wait event to be created implicitly
>> via the GetNamed* APIs?

> I'm not sure I see any particular advantage to using custom wait events
> versus a dedicated LWLock tranche name table. If anything, the limits on
> the number of tranches and the lengths of the names gives me pause.

Sure, after contemplating this a bit, I prefer separate shared memory as
well. Custom wait events, while they could work, would also be a bit of a
confusing user experience.

--
Sami



Re: Improve LWLock tranche name visibility across backends

From:
Nathan Bossart
Date:
On Mon, Jul 14, 2025 at 02:34:00PM -0500, Sami Imseih wrote:
>> Why do we need three different places for the lock names? Is there a
>> reason we can't put it all in shared memory?
> 
> The real reason I felt it was better to keep three separate locations is that
> it allows for a clear separation between user-defined tranches registered
> during postmaster startup and those registered during a normal backend. The
> tranches registered during postmaster are inherited by the backend via
> fork() (or EXEC_BACKEND), and therefore, the dshash table will only be used
> by a normal backend.
> 
> Since DSM is not available during postmaster, if we were to create a DSA
> segment in place, similar to what's done in StatsShmemInit(), we would also
> need to ensure that the initial shared memory is sized appropriately. This is
> because it would need to be large enough to accommodate all user-defined
> tranches registered during postmaster, without having to rely on new
> dsm segments.
> From my experimentation, this sizing is not as straightforward as simply
> calculating # of tranches * size of a tranche entry.
> 
> I still think we should create the dsa during postmaster, as we do with
> StatsShmemInit, but it would be better if postmaster keeps its hands off this
> dshash and only normal backends can use them.

Ah, I missed the problem with postmaster.  Could we have the first backend
that needs to access the table be responsible for creating it and
populating it with the built-in/requested-at-startup entries?  Also, is
there any chance that postmaster might need to access the tranche names?

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From:
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> Ah, I missed the problem with postmaster.  Could we have the first backend
> that needs to access the table be responsible for creating it and
> populating it with the built-in/requested-at-startup entries?  Also, is
> there any chance that postmaster might need to access the tranche names?

Seems quite hazardous to let the postmaster get involved with such
a data structure.  If it seems to need to, we'd better rethink
where to put the functionality that needs the access.

            regards, tom lane



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
> Ah, I missed the problem with postmaster.  Could we have the first backend
> that needs to access the table be responsible for creating it and
> populating it with the built-in/requested-at-startup entries?

We can certainly maintain a flag in the shared state that is set once
the first backend loads all the tranches in shared memory. That did not
cross my mind, but it feels wrong to offload such responsibility to a
normal backend.

> Also, is there any chance that postmaster might need to access the
> tranche names?

The postmaster does not currently have a reason to look up a tranche name,
afaict. This only occurs when looking up wait events or if lwlock tracing
is enabled.

--
Sami



Re: Improve LWLock tranche name visibility across backends

From:
Nathan Bossart
Date:
On Mon, Jul 14, 2025 at 03:45:02PM -0500, Sami Imseih wrote:
>> Ah, I missed the problem with postmaster.  Could we have the first backend
>> that needs to access the table be responsible for creating it and
>> populating it with the built-in/requested-at-startup entries?
> 
> We can certainly maintain a flag in the shared state that is set once
> the first backend loads all the tranches in shared memory. That did not
> cross my mind, but it feels wrong to offload such responsibility to a
> normal backend.

Well, we already need each backend to either initialize or attach to the
dshash table, and the initialization would only ever happen once on a
running server.  Adding a new initialization step to bootstrap the built-in
and registered-at-startup tranche names doesn't seem like that much of a
leap to me.

Another random thought: I worry that the dshash approach might be quite a
bit slower, and IIUC we just need to map an integer to a string.  Maybe we
should just use a DSA for LWLockTrancheNames.  IOW we'd leave it as a
char ** but put it in shared memory.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From:
Rahila Syed
Date:
Hi,
 

> Since DSM is not available during postmaster, if we were to create a DSA
> segment in place, similar to what's done in StatsShmemInit(), we would also
> need to ensure that the initial shared memory is sized appropriately. This is
> because it would need to be large enough to accommodate all user-defined
> tranches registered during postmaster, without having to rely on new
> dsm segments.
> From my experimentation, this sizing is not as straightforward as simply
> calculating # of tranches * size of a tranche entry.
>
> I still think we should create the dsa during postmaster, as we do with
> StatsShmemInit, but it would be better if postmaster keeps its hands off this
> dshash and only normal backends use it.
>
> Thoughts?


Since creating a DSA segment in place during postmaster startup still
requires calculating the size of the tranche names table, including the
user-defined tranches, why not use static shared memory to pre-allocate a
fixed-size table and arrays of size NAMEDATALEN to store the names?

If a dshash table is used to store tranche names and IDs, where would the
tranche name for this table be registered?

Thank you,
Rahila Syed

Re: Improve LWLock tranche name visibility across backends

From:
Bertrand Drouvot
Date:
Hi,

On Fri, Jul 11, 2025 at 04:32:13PM -0500, Sami Imseih wrote:
> > > and instead reuse the existing static hash table, which is
> > > capped at 128 custom wait events:
> > >
> > > ```
> > > #define WAIT_EVENT_CUSTOM_HASH_MAX_SIZE 128
> > > ```
> >
> > That's probably still high enough, thoughts?
> 
> I have no reason to believe that this number could be too low.
> I am not aware of an extension that will initialize more than a
> couple of LWLocks.
> 
> > > or maybe we can just allow WaitEventCustomNew to take in the eventId, and
> > > if it's > 0, then use the passed in value, otherwise generate the next eventId.
> > >
> > > I do like the latter approach more, what do you think?
> >
> > I think I do prefer it too, but in both cases we'll have to make sure there
> > is no collision on the eventID (LWTRANCHE_FIRST_USER_DEFINED is currently
> > 95).
> 
> As far as collisions are concerned, the key of the hash is the wait_event_info,
> which is a bitwise OR of classId and eventId
> ```
> wait_event_info = classId | eventId;
> ```
> Do you think collision can still be possible?

I meant to say collision between the trancheID and WaitEventCustomCounter->nextId

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
> Another random thought: I worry that the dshash approach might be quite a
> bit slower, and IIUC we just need to map an integer to a string.  Maybe we
> should just use a DSA for LWLockTrancheNames. IOW we'd leave it as a char**
> but put it in shared memory.

To use DSA just for this purpose, we would need to maintain an array of
dsa_pointers that reference the string(s), right? I am not clear what you
mean by using dsa to put the char**

--
Sami



Re: Improve LWLock tranche name visibility across backends

From:
Nathan Bossart
Date:
On Tue, Jul 15, 2025 at 11:52:19AM -0500, Sami Imseih wrote:
>> Another random thought: I worry that the dshash approach might be quite a
>> bit slower, and IIUC we just need to map an integer to a string.  Maybe we
>> should just use a DSA for LWLockTrancheNames. IOW we'd leave it as a char**
>> but put it in shared memory.
> 
> To use DSA just for this purpose, we would need to maintain an array of
> dsa_pointers that reference the string(s), right? I am not clear what you
> mean by using dsa to put the char**

I was imagining putting the array in one big DSA allocation instead of
carting around a pointer for each tranche name.  (Sorry, I realize I am
hand-waving over some of the details.)

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
On Tue, Jul 15, 2025 at 11:57 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
>
> On Tue, Jul 15, 2025 at 11:52:19AM -0500, Sami Imseih wrote:
> >> Another random thought: I worry that the dshash approach might be quite a
> >> bit slower, and IIUC we just need to map an integer to a string.  Maybe we
> >> should just use a DSA for LWLockTrancheNames. IOW we'd leave it as a char**
> >> but put it in shared memory.
> >
> > To use DSA just for this purpose, we would need to maintain an array of
> > dsa_pointers that reference the string(s), right? I am not clear what you
> > mean by using dsa to put the char**
>
> I was imagining putting the array in one big DSA allocation instead of
> carting around a pointer for each tranche name.  (Sorry, I realize I am
> hand-waving over some of the details.)

I understood it like this. Here is a sketch:

```
dsa_pointer p;

dsa = dsa_create(....)

p = dsa_allocate(dsa, LWLockTranchesInitialSize());
tranche_names = (char **) dsa_get_address(dsa, p);
tranche_names[0] = "my tranche";
tranche_names[1] = "my tranche";
```

We will need to track the size and resize if needed.

Is this what you mean, from a high level?

--
Sami



Re: Improve LWLock tranche name visibility across backends

From:
Nathan Bossart
Date:
On Tue, Jul 15, 2025 at 12:06:00PM -0500, Sami Imseih wrote:
> On Tue, Jul 15, 2025 at 11:57 AM Nathan Bossart
> <nathandbossart@gmail.com> wrote:
>> I was imagining putting the array in one big DSA allocation instead of
>> carting around a pointer for each tranche name.  (Sorry, I realize I am
>> hand-waving over some of the details.)
> 
> I understood it like this. Here is a sketch:
> 
> ```
> dsa_pointer p;
> 
> dsa = dsa_create(....)
> 
> p = dsa_allocate(dsa, LWLockTranchesInitialSize());
> tranche_names = (char **) dsa_get_address(dsa, p);
> tranche_names[0] = "my tranche";
> tranche_names[1] = "my tranche";
> ```
> 
> We will need to track the size and resize if needed.
> 
> Is this what you mean, from a high level?

Yes, that's roughly what I had in mind.  We might need to employ some
tricks to avoid a limit on tranche name length, but maybe that's not worth
the energy.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From:
Bertrand Drouvot
Date:
Hi,

On Tue, Jul 15, 2025 at 12:59:04PM +0530, Rahila Syed wrote:
> Hi,
> 
> If a dshash table is used to store tranche names and IDs, where would the
> tranche name for this table
> be registered?

I guess it could be a new BuiltinTrancheId for this dsa but not sure what Nathan
and Sami have in mind.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:

> > Hi,
> >
> > If a dshash table is used to store tranche names and IDs, where would the
> > tranche name for this table be registered?
>
> I guess it could be a new BuiltinTrancheId for this dsa but not sure what Nathan
> and Sami have in mind.

Yes, it will be a BuiltinTrancheId for a shared memory that is allocated 
during postmaster for tracking tranches. The shared memory will then 
only be used by normal backends to register tranches. Any tranche 
registered during postmaster is inherited by the backends. 

Regards,

Sami 

Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
> >> I was imagining putting the array in one big DSA allocation instead of
> >> carting around a pointer for each tranche name.  (Sorry, I realize I am
> >> hand-waving over some of the details.)
> >
> > I understood it like this. Here is a sketch:
> >
> > ```
> > dsa_pointer p;
> >
> > dsa = dsa_create(....)
> >
> > p = dsa_allocate(dsa, LWLockTranchesInitialSize());
> > tranche_names = (char **) dsa_get_address(dsa, p);
> > tranche_names[0] = "my tranche";
> > tranche_names[1] = "my tranche";
> > ```
> >
> > We will need to track the size and resize if needed.
> >
> > Is this what you mean, from a high level?
>
> Yes, that's roughly what I had in mind.  We might need to employ some
> tricks to avoid a limit on tranche name length, but maybe that's not worth
> the energy.

After spending some time looking at this, I don't think it's a good idea to
do as mentioned above, since it's unclear when to resize the dsa pointer.
It's better to have a single dsa pointer that holds a list of dsa
pointers, each tracking a tranche name. We can also track the size of this
list and grow it when needed. Essentially we should only be manipulating
a DSA area via dsa_pointers.

Attached is what I have in mind, if we want to avoid using dshash.
This patch initializes a DSA with create_in_place at startup (similar to what
we do for pgstat, see pgstat_shmem.c), and provides a routine to append a
tranche name to the list, as well as a routine to look up a tranche name by
index. The index is the tranche ID offset by LWTRANCHE_FIRST_USER_DEFINED.

Additionally, this patch removes the step in [0] to call LWLockRegisterTranche.
Instead, LWLockNewTrancheId now takes a tranche_name and returns the
tranche ID. This new API is what we discussed earlier, and I agree with it,
as it simplifies the workflow. LWLockRegisterTranche is still kept, but it's
a private routine.

To support the above, LWLockTrancheNames is now a struct that tracks both a
local list of tranche names (populated during postmaster startup) and
pointers to shared memory, for tranche names registered after postmaster.
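Roughly, such a struct could look like this (field and type names are
assumptions, not necessarily what the attached patch uses):

```
/* Sketch of the structure described above. */
typedef struct LWLockTrancheNamesShmem LWLockTrancheNamesShmem;  /* hypothetical */

typedef struct LWLockTrancheNamesData
{
    /* names registered during postmaster startup, inherited via fork() */
    const char **local;
    int          nlocal;

    /*
     * DSA created in place at startup; shmem holds the dsa_pointer list,
     * its count/allocated sizes, and the LWLock protecting them.
     */
    dsa_area    *dsa;
    LWLockTrancheNamesShmem *shmem;
} LWLockTrancheNamesData;
```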

GetLWTrancheName now first checks BuiltinTrancheNames, then the local
list in LWLockTrancheNames, and finally the shared memory. This allows
tranche names registered after postmaster startup to be visible to all
backends.

Unlike the local list of tranche names, appending to and searching the
shared memory list requires an LWLock; in exclusive mode to append, and
shared mode to search.

--

Sami

[0] https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-ADDIN-LWLOCKS-AFTER-STARTUP

Attachments

Re: Improve LWLock tranche name visibility across backends

From:
Nathan Bossart
Date:
I've attached a rebased version of the patch.

On Mon, Jul 21, 2025 at 11:26:44PM -0500, Sami Imseih wrote:
> Unlike the local list of tranche names, appending to and searching the
> shared memory list requires an LWLock; in exclusive mode to append, and
> shared mode to search.

The thing that stood out the most to me is how much more expensive looking
up the lock name is.  At the risk of suggesting even more complexity, I'm
wondering if we should add some sort of per-backend cache to avoid the need
for locking and dsa_get_address() calls every time you need to look up a
lock name.

Also, I suspect that there might be some concerns about the API changes.
While it should be a very easy fix, that seems likely to break a lot of
extensions.  I don't know if it's possible to make this stuff backward
compatible, and I also don't know if we really want to, as that'll both
introduce more complexity and keep folks using the old API.  (That being
said, I just looked on GitHub and Debian Code Search and was surprised at
how few uses of LWLockNewTrancheId() and LWLockRegisterTranche() there
are...)

-- 
nathan

Attachments

Re: Improve LWLock tranche name visibility across backends

From:
Bertrand Drouvot
Date:
Hi,

On Thu, Jul 31, 2025 at 04:24:38PM -0500, Nathan Bossart wrote:
> I've attached a rebased version of the patch.
> 
> On Mon, Jul 21, 2025 at 11:26:44PM -0500, Sami Imseih wrote:
> > Unlike the local list of tranche names, appending to and searching the
> > shared memory list requires an LWLock; in exclusive mode to append, and
> > shared mode to search.
> 
> The thing that stood out the most to me is how much more expensive looking
> up the lock name is.  At the risk of suggesting even more complexity, I'm
> wondering if we should add some sort of per-backend cache to avoid the need
> for locking and dsa_get_address() calls every time you need to look up a
> lock name.

Yeah, I've the same concern that GetLWTrancheNameByID() might be too costly.
OTOH it's not really used in a hot path, "only" when displaying wait events.

But still, I agree that the extra overhead (and possible contention) is
concerning and that, for example, a per-backend cache (with fallback to the
dsa if needed) could help.

> Also, I suspect that there might be some concerns about the API changes.
> While it should be a very easy fix, that seems likely to break a lot of
> extensions.
> I don't know if it's possible to make this stuff backward
> compatible, and I also don't know if we really want to, as that'll both
> introduce more complexity and keep folks using the old API.

Yeah, I'm not sure that would be worth the extra complexity, and using the old
API would "keep" the issue we're trying to solve here.

I don't think we should be worried so much about the number of extensions
impacted but more about the complexity of the change, and that looks pretty
simple.

So, I don't think we should worry that much in that regard.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
> > > Unlike the local list of tranche names, appending to and searching the
> > > shared memory list requires an LWLock; in exclusive mode to append, and
> > > shared mode to search.
> >
> > The thing that stood out the most to me is how much more expensive looking
> > up the lock name is.  At the risk of suggesting even more complexity, I'm
> > wondering if we should add some sort of per-backend cache to avoid the need
> > for locking and dsa_get_address() calls every time you need to look up a
> > lock name.
>
> Yeah, I've the same concern that GetLWTrancheNameByID() might be too costly.
> OTOH it's not really used in a hot path, "only" when displaying wait events.

GetLWTrancheName also gets called, via T_NAME, when trace_lwlock is
enabled. This is a developer option that requires LOCK_DEBUG to be set at
compile time. So, that path may become even slower than it currently is
under high concurrency.

Also, if dtrace is enabled and the specific LWLock probe is used.
```
if (TRACE_POSTGRESQL_LWLOCK_WAIT_START_ENABLED())
    TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
```
But both of the above cases are not used in normal production activity.

For the case of dtrace, b94409a02f612 added the _ENABLED macros to ensure
that GetLWTrancheName (via T_NAME) is only called when necessary:

"
    If dtrace is compiled in but disabled, the lwlock dtrace probes still
    evaluate their arguments.  Since PostgreSQL 13, T_NAME(lock) does
    nontrivial work, so it should be avoided if not needed.  To fix, make
    these calls conditional on the *_ENABLED() macro corresponding to each
    probe.
"

The normal case is when pg_stat_get_activity calls
pgstat_get_wait_event to retrieve the wait event via
GetLWLockIdentifier. However, as mentioned, this is not on the hot path.

I think we could add a local backend copy that stays up to date with the
DSA. One idea would be to use an atomic counter to track the number of
entries in the DSA and compare it with a local backend counter whenever a
tranche name lookup occurs. If the atomic counter is higher (since we don't
have deletions), we can update the local copy. Updating the local table
should be a rare occurrence, but this would require an additional atomic
fetch every time a name lookup occurs, in all the above code paths.
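For illustration, something along these lines (a sketch; the counter and
helper names are assumptions, not actual patch code):

```
/* Sketch: variable names are assumptions. */
uint64      shared_count = pg_atomic_read_u64(&shmem->ntranches);

if (shared_count > local_ntranches)
{
    /*
     * New names were appended since we last looked.  Entries are never
     * deleted, so existing local slots stay valid; only the tail needs
     * to be copied from the DSA.
     */
    RefreshLocalTrancheNamesFromDSA();
    local_ntranches = shared_count;
}
```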

Perhaps there's a better approach?

--
Sami



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:

> > Also, I suspect that there might be some concerns about the API changes.
> > While it should be a very easy fix, that seems likely to break a lot of
> > extensions.
> > I don't know if it's possible to make this stuff backward
> > compatible, and I also don't know if we really want to, as that'll both
> > introduce more complexity and keep folks using the old API.
>
> Yeah, I'm not sure that would be worth the extra complexity, and using the old
> API would "keep" the issue we're trying to solve here.
>
> I don't think we should be worried so much about the number of extensions
> impacted but more about the complexity of the change, and that looks pretty
> simple.
>
> So, I don't think we should worry that much in that regard.

I agree. I don't think the API changes are a big deal, especially when they
reduce the number of steps.

-- 
Sami 

Re: Improve LWLock tranche name visibility across backends

From:
Bertrand Drouvot
Date:
Hi,

On Fri, Aug 01, 2025 at 01:15:24PM -0500, Sami Imseih wrote:
> I think we could add a local backend copy that stays up to date with the
> DSA. One idea would be to use an atomic counter to track the number of
> entries in the DSA and compare it with a local backend counter whenever the
> tranche name lookup occurs. If the atomic counter is higher (since we
> don't have deletions),
> we can update the local copy. Updating the local table should be a
> rare occurrence, but it would
> require an additional atomic fetch every time the name lookup occurs, in all the
> above code paths.
> 
> Perhaps there's a better approach?

I was thinking to switch to the DSA (and update local copy) when a name is
not found in the local copy. That way there is no need to maintain a counter and
the DSA overhead should be rare enough.
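For example, something like this in the lookup path (a minimal sketch; the
helper names are hypothetical):

```
/* Sketch: refresh the local copy only on a miss. */
static const char *
GetLWTrancheName(uint16 trancheId)
{
    const char *name = LookupLocalTrancheName(trancheId);

    if (name == NULL)
    {
        /* Local miss: copy the DSA contents into the local cache, retry. */
        RefreshLocalTrancheNamesFromDSA();
        name = LookupLocalTrancheName(trancheId);
    }

    return name != NULL ? name : "extension";
}
```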

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
> > I think we could add a local backend copy that stays up to date with the
> > DSA. One idea would be to use an atomic counter to track the number of
> > entries in the DSA and compare it with a local backend counter whenever the
> > tranche name lookup occurs. If the atomic counter is higher (since we
> > don't have deletions),
> > we can update the local copy. Updating the local table should be a
> > rare occurrence, but it would
> > require an additional atomic fetch every time the name lookup occurs, in all the
> > above code paths.
> >
> > Perhaps there's a better approach?
>
> I was thinking to switch to the DSA (and update local copy) when a name is
> not found in the local copy. That way there is no need to maintain a counter and
> the DSA overhead should be rare enough.
>
> Regards,

That should work as well. good idea.

--
Sami



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
> > > I think we could add a local backend copy that stays up to date with the
> > > DSA. One idea would be to use an atomic counter to track the number of
> > > entries in the DSA and compare it with a local backend counter whenever the
> > > tranche name lookup occurs. If the atomic counter is higher (since we
> > > don't have deletions),
> > > we can update the local copy. Updating the local table should be a
> > > rare occurrence, but it would
> > > require an additional atomic fetch every time the name lookup occurs, in all the
> > > above code paths.
> > >
> > > Perhaps there's a better approach?
> >
> > I was thinking to switch to the DSA (and update local copy) when a name is
> > not found in the local copy. That way there is no need to maintain a counter and
> > the DSA overhead should be rare enough.
> >
> > Regards,
>
> That should work as well. good idea.

With a local hash table, I don't think it's necessary to introduce new code
for managing a DSA-based list of tranche names as is done in v3. We can go
back to storing the shared tranche names in dshash.

What do you think?

--
Sami



Re: Improve LWLock tranche name visibility across backends

From:
Nathan Bossart
Date:
On Mon, Aug 04, 2025 at 11:32:19AM -0500, Sami Imseih wrote:
> With a local hash table, I don't think it's necessary to introduce new code
> for managing a DSA-based list of tranche names as is done in v3. We can go
> back to storing the shared tranche names in dshash.
> 
> What do you think?

My first thought is that a per-backend hash table seems too
expensive/complicated for this.  Couldn't it just be an array like we have
now?

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
> On Mon, Aug 04, 2025 at 11:32:19AM -0500, Sami Imseih wrote:
> > With a local hash table, I don't think it's necessary to introduce new code
> > for managing a DSA-based list of tranche names as is done in v3. We can go
> > back to storing the shared tranche names in dshash.
> >
> > What do you think?
>
> My first thought is that a per-backend hash table seems too
> expensive/complicated for this.  Couldn't it just be an array like we have
> now?

We can, but I was considering simplicity of implementation, and using a
local hash table is slightly simpler.

That said, since we're dealing with an append-only data structure, a hash
table is probably more than we need. All we need is index-based lookup,
so I'll go with a local array to mirror the shared (DSA) array.

--
Sami



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
> > > With a local hash table, I don't think it's necessary to introduce new code
> > > for managing a DSA-based list of tranche names as is done in v3. We can go
> > > back to storing the shared tranche names in dshash.
> > >
> > > What do you think?
> >
> > My first thought is that a per-backend hash table seems too
> > expensive/complicated for this.  Couldn't it just be an array like we have
> > now?
>
> We can, but I was considering simplicity of implementation, and using a
> local hash table is slightly simpler.
>
> That said, since we're dealing with an append-only data structure, a hash
> table is probably more than we need. All we need is index-based lookup,
> so I'll go with a local array to mirror the shared (DSA) array.

Here is v4. We keep using the local array as the backend-local cache. It
will already include the tranches registered during postmaster startup, and
it is updated during tranche name lookups from the DSA-based array. I added
a new routine, UpdateLocalTrancheName, which is used in both
LWLockRegisterTranche and GetLWTrancheName to append to the local tranche
name list, both during postmaster startup and afterwards.

Also, initially I thought we could get rid of returning the "extension" wait
event name, but it's still possible for an "extension" tranche name to be
returned if LWLockInitialize is called with a tranche_id that was never
registered with LWLockNewTrancheId. We don't prevent that, and I can't see
how we can without changing the LWLockInitialize API, but it is something
that should probably be improved. For now, I added a mention of this
behavior in the documentation. This is existing behavior, so I could also
separate this doc change into a separate patch if it makes more sense to do
so.

--
Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From:
Bertrand Drouvot
Date:
Hi,

On Mon, Aug 04, 2025 at 10:47:45PM -0500, Sami Imseih wrote:
> > > > With a local hash table, I don't think it's necessary to introduce new code
> > > > for managing a DSA-based list of tranche names as is done in v3. We can go
> > > > back to storing the shared tranche names in dshash.
> > > >
> > > > What do you think?
> > >
> > > My first thought is that a per-backend hash table seems too
> > > expensive/complicated for this.  Couldn't it just be an array like we have
> > > now?
> >
> > We can, but I was considering simplicity of implementation, and using a
> > local hash table is slightly simpler.
> >
> > That said, since we're dealing with an append-only data structure, a hash
> > table is probably more than we need. All we need is index-based lookup,
> > so I'll go with a local array to mirror the shared (DSA) array.

+1 for the local array.

> Here is v4.

Thanks for the patch update!

> We keep using the local array as the backend-local cache. It will already
> include the tranches registered during postmaster startup, and it is
> updated during tranche name lookups from the DSA-based array.

> I added a new routine, UpdateLocalTrancheName, which is used in both
> LWLockRegisterTranche and GetLWTrancheName to append to the local tranche
> name list, both during postmaster startup and afterwards.

I did not look at the code in detail, just played a bit with it and found
some issues.

Issue 1 --

If I register enough tranches to go to:

+               /* Resize if needed */
+               if (LWLockTrancheNames.shmem->count >= LWLockTrancheNames.shmem->allocated)
+               {
+                       newalloc = pg_nextpower2_32(Max(LWLOCK_TRANCHE_NAMES_INIT_SIZE,
+                                                       LWLockTrancheNames.shmem->allocated + 1));
+
+                       new_list_ptr = dsa_allocate(LWLockTrancheNames.dsa, newalloc * sizeof(dsa_pointer));
+                       new_name_ptrs = dsa_get_address(LWLockTrancheNames.dsa, new_list_ptr);
+                       memcpy(new_name_ptrs, current_ptrs, LWLockTrancheNames.shmem->allocated * sizeof(dsa_pointer));
+                       dsa_free(LWLockTrancheNames.dsa, *current_ptrs);

then I get:

ERROR:  dsa_area could not attach to a segment that has been freed

I think we should

"
dsa_free(LWLockTrancheNames.dsa, LWLockTrancheNames.shmem->list_ptr)
"

instead.

Issue 2 --

If an extension calls RequestNamedLWLockTranche() it will register the same
tranche twice:

(gdb) p LWLockTrancheNames.local[0]
$1 = 0x7acf5a40c420 "pg_playmem"
(gdb) p LWLockTrancheNames.local[97]
$2 = 0x7acf5a40c420 "pg_playmem"

First (local[0]) during LWLockNewTrancheId() during InitializeLWLocks()
Second (local[97]) during LWLockRegisterTranche() during CreateLWLocks()

Maybe we should put this back?

-       /* This should only be called for user-defined tranches. */
-       if (tranche_id < LWTRANCHE_FIRST_USER_DEFINED)
-               return;

-       /* Convert to array index. */
-       tranche_id -= LWTRANCHE_FIRST_USER_DEFINED;

and remove:

+       tranche_index = tranche_id - LWTRANCHE_FIRST_USER_DEFINED;

from LWLockNewTrancheId()?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
Thanks for reviewing!

> Issue 1 --
>
> If I register enough tranches to go to:
>
> +               /* Resize if needed */
> +               if (LWLockTrancheNames.shmem->count >= LWLockTrancheNames.shmem->allocated)
> +               {
> +                       newalloc = pg_nextpower2_32(Max(LWLOCK_TRANCHE_NAMES_INIT_SIZE,
> +                                                       LWLockTrancheNames.shmem->allocated + 1));
> +
> +                       new_list_ptr = dsa_allocate(LWLockTrancheNames.dsa, newalloc * sizeof(dsa_pointer));
> +                       new_name_ptrs = dsa_get_address(LWLockTrancheNames.dsa, new_list_ptr);
> +                       memcpy(new_name_ptrs, current_ptrs, LWLockTrancheNames.shmem->allocated * sizeof(dsa_pointer));
> +                       dsa_free(LWLockTrancheNames.dsa, *current_ptrs);
>
> then I get:
>
> ERROR:  dsa_area could not attach to a segment that has been freed

Will investigate this one and correct in the next rev.

> Issue 2 --
>
> If an extension calls RequestNamedLWLockTranche() it will register the same
> tranche twice:
>
> (gdb) p LWLockTrancheNames.local[0]
> $1 = 0x7acf5a40c420 "pg_playmem"
> (gdb) p LWLockTrancheNames.local[97]
> $2 = 0x7acf5a40c420 "pg_playmem"

Thanks for catching. This one is clear: there is an extra call to register
inside RequestNamedLWLockTranche. I'll fix this in the next rev.


--
Sami



Re: Improve LWLock tranche name visibility across backends

From:
Sami Imseih
Date:
> I'll fix this in the next rev.

v5 fixes the above 2 issues found above.

For the issue that was throwing ".... segment that has been freed",
indeed we should be freeing LWLockTrancheNames.shmem->list_ptr. list_ptr is
a dsa_pointer that stores an array of dsa_pointers, which made me realize
that I was not freeing the actual dsa_pointers holding the tranche names. I
fixed that as well.

I also made some improvements to the code that copies the old list to the
new list, and fixed the lookup in GetLWTrancheName.

--
Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From:
Bertrand Drouvot
Date:
Hi,

On Tue, Aug 05, 2025 at 11:24:04PM -0500, Sami Imseih wrote:
> > I'll fix this in the next rev.
> 
> v5 fixes the above 2 issues found above.

Thanks!

> For the issue that was throwing ".... segment that has been freed",
> indeed we should be freeing LWLockTrancheNames.shmem->list_ptr,

Yeah.

> list_ptr is a dsa_pointer that stores an array of dsa_pointers, which
> then made me realize that I was not freeing the actual dsa_pointers
> holding the tranche names. I fixed that as well.
> 
> I also made some improvements to the code that copies the old
> list to the new list and fixed the lookup in
> GetLWTrancheName.
> 

Still doing a bit of testing without looking closely at the code.

Issue 1 --

If I:

LWLockInitialize(&lck, LWLockNewTrancheId("BDT_Lck"));
LWLockAcquire(&lck, LW_EXCLUSIVE);
LWLockAcquire(&lck, LW_EXCLUSIVE);

So that this process is locked on the LWLock BDT_Lck and if I query pg_stat_activity
in another backend, then I get:

postgres=# select wait_event from pg_stat_activity where wait_event_type = 'LWLock';
 wait_event
------------
 BDT_Lck
(1 row)

which is what we expect.

But if I look at LWLockTrancheNames.local (for the process that queried pg_stat_activity):

(gdb) p LWLockTrancheNames.local[0]
$1 = 0x0
(gdb) p LWLockTrancheNames.local[1]
$2 = 0x7b759a81b038 "BDT_Lck"

It looks like there is an indexing issue, as we should start at index 0.

Issue 2 / question --

If I call LWLockNewTrancheId("BDT_play2") multiple times to ensure that 

trancheId >= LWLockTrancheNames.allocated and then ensure that a backend is
locked doing:

"
LWLockInitialize(&lck, LWLockNewTrancheId("BDT_Lck"));
LWLockAcquire(&lck, LW_EXCLUSIVE);
LWLockAcquire(&lck, LW_EXCLUSIVE);
"

Then if, in another backend, I call GetLWTrancheName by querying pg_stat_activity
then I see "BDT_Lck" being reported as wait event name (which is good).

The thing that worries me a bit is that the local cache is populated only for
"BDT_Lck", but not for all the other "BDT_play2".

(gdb) p LWLockTrancheNames.local[2]
$9 = 0x0
(gdb) p LWLockTrancheNames.local[3]
$10 = 0x0
(gdb) p LWLockTrancheNames.local[12]
$11 = 0x0
(gdb) p LWLockTrancheNames.local[13]
$12 = 0x780593a1b138 "BDT_Lck"

When we need to populate the local cache, would it be better to populate
it with the whole dsa content instead of just the current missing ID as done
in GetLWTrancheName():

+               if (tranche_name != NULL)
+                       SetLocalTrancheName(trancheId, tranche_name);?

I think that would make more sense, as that would avoid taking the
LWLockTrancheNames.shmem->lock in GetLWTrancheNameByID() multiple times
should the wait event change, thoughts?

Issue 3 --

If I call LWLockNewTrancheId("BDT_play2") only one time to ensure that
trancheId < LWLockTrancheNames.allocated and then ensure that a backend is
locked doing:

"
LWLockInitialize(&lck, LWLockNewTrancheId("BDT_Lck"));
LWLockAcquire(&lck, LW_EXCLUSIVE);
LWLockAcquire(&lck, LW_EXCLUSIVE);
"

Then if, in another backend, I call GetLWTrancheName by querying pg_stat_activity
then I see "extension" being reported as wait event name (which is not good).

The issue is that in GetLWTrancheName(), the patch does:

+       if (trancheId >= LWLockTrancheNames.allocated)
+       {
+               tranche_name = GetLWTrancheNameByID(trancheId);
+               if (tranche_name != NULL)
+                       SetLocalTrancheName(trancheId, tranche_name);
+       }
+       else
+               tranche_name = LWLockTrancheNames.local[trancheId];

It then fails to find the tranche in the local cache and returns "extension".

I think that in that case we should fall back to querying the DSA (and
populate the local cache), thoughts?
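
Something like this lookup order is what I have in mind (sketch only, reusing
the patch's names):

```
/* 1) Try the local cache first, regardless of where the ID falls. */
if (trancheId < LWLockTrancheNames.allocated)
    tranche_name = LWLockTrancheNames.local[trancheId];

/* 2) On a miss, fall back to the DSA and cache the result locally. */
if (tranche_name == NULL)
{
    tranche_name = GetLWTrancheNameByID(trancheId);
    if (tranche_name != NULL)
        SetLocalTrancheName(trancheId, tranche_name);
}

/* 3) Only then fall back to the generic name. */
if (tranche_name == NULL)
    tranche_name = "extension";
```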

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From
Rahila Syed
Date:
Hi,

I've begun reviewing this patch and have a few questions listed below:

1.   + if (i < LWLockTrancheNames.shmem->allocated && DsaPointerIsValid(old_ptrs[i]))
           
Should an assert be used for the second condition instead? For
i < LWLockTrancheNames.shmem->allocated, the dsa pointer is expected to be valid.

2.
+                               copied_ptr = dsa_allocate(LWLockTrancheNames.dsa, len);
+
+                               copied_addr = dsa_get_address(LWLockTrancheNames.dsa, copied_ptr);
+                               memcpy(copied_addr, old_name, len);
+
+                               new_ptrs[i] = copied_ptr;
+
+                               /* free old tranche names */
+                               dsa_free(LWLockTrancheNames.dsa, old_ptrs[i]);

Why is it necessary to allocate a new dsa_pointer for tranche names that are the same size and then
free the old one?
Is there a reason we can't just assign new_ptrs[i] = old_ptrs[i]?
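
That is, roughly (a sketch against the patch's variable names):

```
/* Reuse the existing name pointers when enlarging the list... */
memcpy(new_ptrs, old_ptrs, sizeof(dsa_pointer) * current_allocated);

/* ...so only the old pointer array itself needs to be freed. */
dsa_free(LWLockTrancheNames.dsa, LWLockTrancheNames.shmem->list_ptr);
```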

3. 
> Additionally, while users should not pass arbitrary tranche IDs (that is,
> IDs not created via LWLockNewTrancheId) to LWLockInitialize, nothing
> technically prevents them from doing so. Therefore, we must continue to
> handle such cases gracefully by returning a default "extension" tranche name.

Would it be possible to update LWLockInitialize so that it checks if tranche_id is
already registered in the dsa, and if not, registers it during the LWLockInitialize() process?  

Thank you,
Rahila Syed                                 

Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
Thanks for testing!

> which is what we expect.
>
> But if I look at LWLockTrancheNames.local (for the process that queried pg_stat_activity):
>
> (gdb) p LWLockTrancheNames.local[0]
> $1 = 0x0
> (gdb) p LWLockTrancheNames.local[1]
> $2 = 0x7b759a81b038 "BDT_Lck"
>
> It looks like there is an indexing issue, as we should start at index 0.

>
> Issue 2 / question --
>
> If I call LWLockNewTrancheId("BDT_play2") multiple times to ensure that
>
>
> Then if, in another backend, I call GetLWTrancheName by querying pg_stat_activity
> then I see "BDT_Lck" being reported as wait event name (which is good).
>
> The thing that worries me a bit is that the local cache is populated only for
> "BDT_Lck", but not for all the other "BDT_play2".

Issue #1 and issue #2 are related. In both cases you have a registered wait
event that only lives in shared memory, but was never required to be
accessed by pg_stat_activity, so it was never cached locally.


> When we need to populate the local cache, would it be better to populate
> it with the whole dsa content instead of just the current missing ID as done
> in GetLWTrancheName():

I do agree that we should just fully sync the local cache whenever we
can't find the tranche name in it. Will fix in the next rev.

--
Sami



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
Thanks for testing!

> Why is it necessary to allocate a new dsa_pointer for tranche names that are the same size and then
> free the old one?
> Is there a reason we can't just assign new_ptrs[i] = old_ptrs[i]?

Fair point. I will update this in the next rev. We don't need to free the
existing tranche name pointers, only the list.

> Would it be possible to update LWLockInitialize so that it checks if tranche_id is
> already registered in the dsa, and if not, registers it during the LWLockInitialize() process?

We could. I do think this will need a separate discussion as a follow-up to this
thread.

--
Sami



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
From this discussion, and the fact that the tranche name could come from
either local or shared memory, I think we should have tests.
So, I am working on adding tests using INJECTION_POINTS.

Some of the tests I have in mind now are:

1/ Does the shared memory grow correctly?
2/ Is the tranche name being returned the correct one?
3/ Is the local cache being updated correctly?

I will have an updated patch with the new test
module early next week.

--
Sami



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
Hi,

Attached v6 which addresses the feedback from the last review.

1/ Rahila raised a point about the necessity of allocating new dsa pointers
for tranche names when copying (during resize). That is correct: we can simply
memcpy those dsa pointers at copy time. All we need to
free is the dsa pointer that tracks the list.

2/ I also implemented a full sync of the local cache when necessary,
that is, when the tranche name is not found locally and the tranche index is
not higher than the max used index. To do that we have to track the
highest index used, as the allocated counter is not sufficient.
Allocation grows geometrically to limit resize operations.

3/ I also added tests using INJECTION_POINTS. Although the same tests
could be done with DEBUG, I felt INJECTION_POINTS
are better for this purpose, to avoid unnecessary logging.
I created a test module called test_lwlock_tranches, which has
a Perl test because we do need to set shared_preload_libraries.

I did have to define the following in wait_classes.h:

+#define WAIT_EVENT_CLASS_MASK  0xFF000000
+#define WAIT_EVENT_ID_MASK             0x0000FFFF

This is because the extension calls
GetLWLockIdentifier (to mimic pg_stat_activity) and
needs to calculate a classId.
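
For reference, the decoding the test module does mirrors
pgstat_get_wait_event(); roughly (the helper itself is illustrative, the
masks are the ones defined above):

```
static const char *
tranche_name_from_wait_event(uint32 wait_event_info)
{
    uint32      classId = wait_event_info & WAIT_EVENT_CLASS_MASK;
    uint16      eventId = wait_event_info & WAIT_EVENT_ID_MASK;

    Assert(classId == PG_WAIT_LWLOCK);
    return GetLWLockIdentifier(classId, eventId);
}
```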

--
Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
I haven't followed the latest discussion, but I took a look at the patch.

+      It is possible to use a <literal>tranche_id</literal> that was not retrieved
+      using <function>LWLockNewTrancheId</function>, but this is not recommended.
+      The ID may clash with an already registered tranche name, or the specified
+      name may not be found. In such cases, looking up the name will return a generic
+      "extension" tranche name.

Is there any reason to continue allowing this?  For example, maybe we could
ERROR if LWLockInitialize()/GetLWTrancheName() are given a tranche_id
greater than the number allocated.  I guess I'm not following why we should
gracefully handle these kinds of coding errors, especially when they result
in unhelpful behavior like an "extension" tranche.

+#else
+elog(ERROR, "injection points not supported");
+#endif

This causes compilation to fail when injection points are not enabled.

I haven't combed through the patch character-by-character, but upon a
read-through, the general shape looks reasonable to me.  As a general note,
I'd suggest adding more commentary throughout and finding opportunities to
simplify and/or clean up the code as much as possible.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> I haven't followed the latest discussion, but I took a look at the patch.
>
> +      It is possible to use a <literal>tranche_id</literal> that was not retrieved
> +      using <function>LWLockNewTrancheId</function>, but this is not recommended.
> +      The ID may clash with an already registered tranche name, or the specified
> +      name may not be found. In such cases, looking up the name will return a generic
> +      "extension" tranche name.
>
> Is there any reason to continue allowing this?  For example, maybe we could
> ERROR if LWLockInitialize()/GetLWTrancheName() are given a tranche_id
> greater than the number allocated.  I guess I'm not following why we should
> gracefully handle these kinds of coding errors, especially when they result
> in unhelpful behavior like an "extension" tranche.

We could definitely make LWLockInitialize error out in this case. It may
break things out there, which is why I have been hesitant.

I don’t think we can use allocated in this case, since that value refers to
pre-allocated slots. Instead, we will need to do a lookup, and if no
tranche_name is found, then we should error out.

Essentially, we can call GetLWTrancheName within LWLockInitialize, but
GetLWTrancheName can no longer have a fallback value such as "extension".
It should error out instead. I believe that is what you mean, correct?

One thought I had was to change the signature of LWLockInitialize to take a
tranche name, so that LWLockNewTrancheId would no longer need to be global.
That will complicate things with built-in tranches a bit (but
is doable).

On the other hand, an error seems like a reasonable approach.
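
To be concrete, the shape I have in mind is something like this (a sketch;
the tail mirrors what LWLockInitialize already does today):

```
void
LWLockInitialize(LWLock *lock, int tranche_id)
{
    /*
     * Validate up front; GetLWTrancheName() would be changed to
     * elog(ERROR) for unregistered IDs instead of returning "extension".
     */
    (void) GetLWTrancheName(tranche_id);

    pg_atomic_init_u32(&lock->state, LW_FLAG_RELEASE_OK);
    lock->tranche = tranche_id;
    proclist_init(&lock->waiters);
}
```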

> Attached v6 which addresses the feedback from the last review.

I spoke offline with Bertrand and discussed the synchronization of
the local memory from shared memory. After that discussion it is clear
we don't need to track the highest used index in shared memory. Because
the tranche_id comes from a shared counter, it is not possible that
the same tranche_id can be used twice, so we will not have overlap.
That means, when we sync, we just need to know the highest used
index in the local memory ( since that could be populated during
postmaster startup and inherited via fork ) and start syncing local
memory from that point. We can also cut off the sync once we encounter
an invalid dsa pointer, since there can't be any tranche names in shared
memory after that point. I did change SyncLWLockTrancheNames to
accept the tranche_id that triggered the sync, so we can make sure it
gets added to local memory, either as part of the sync or added with
the default "extension" tranche name.

I also corrected the INJECTION_POINT tests. I was missing a skip
if INJECTION_POINT is not enabled and removed

+elog(ERROR, "injection points not supported");

mentioned by Nathan earlier. To simplify the review, I also split out the
test patch. It is not yet clear if there is consensus to include tests
(I think we should), but having them will help the review.

More commentary was added to the code for the point Nathan
also raised.

See v7, which could be simplified further depending on the handling
of LWLockInitialize mentioned at the top of this message, since we
would no longer need to account for out-of-range tranche IDs.


--
Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Bertrand Drouvot
Date:
Hi,

On Tue, Aug 12, 2025 at 04:16:48PM -0500, Sami Imseih wrote:

> I spoke offline with Bertrand and discussed the synchronization of
> the local memory from shared memory. After that discussion it is clear
> we don't need to track the highest used index in shared memory. Because
> the tranche_id comes from a shared counter, it is not possible that
> the same tranche_id can be used twice, so we will not have overlap.
> That means, when we sync, we just need to know the highest used
> index in the local memory ( since that could be populated during
> postmaster startup and inherited via fork ) and start syncing local
> memory from that point.

Thanks for the updated version!

I think that we can also get rid of max_used_index.

Indeed, I can see that it's updated when SetLocalTrancheName() is called.

But I don't think that's needed, as it would be perfectly fine to always
restart from the "initial max_used_index": that should be rare enough and the
number of entries that we scan is not large. That would mean: entry not found?
Resync the entire local cache (starting again from the non-updated
max_used_index).

So, if we agree that we don't need to update max_used_index in
SetLocalTrancheName(), then during SyncLWLockTrancheNames() we could rely only on
DsaPointerIsValid(pointers[i]).

Indeed, those DSA pointers are only valid for dynamically registered tranches.

For example, with max_used_index = 5, what you would find is that
DsaPointerIsValid(pointers[i]) is false from 0 to 4 and starts to be true at
[5] (if a tranche has been dynamically added). 5 is exactly:

"
int                     next_index = LWLockTrancheNames.max_used_index + 1;
"

So we could do something like:

int i = 0;
while (i < LWLockTrancheNames.shmem->allocated && 
           !DsaPointerIsValid(shared_ptrs[i]))
{
        i++; 
}

Now this "i" acts as the "next_index" in v7 and now we can start iterating
from it and sync until it's invalid again.    

That way, we get rid of max_used_index and I think that this part is simpler and
easier to understand.

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> So we could do something like:
>
> int i = 0;
> while (i < LWLockTrancheNames.shmem->allocated &&
>            !DsaPointerIsValid(shared_ptrs[i]))
> {
>         i++;
> }

I found a different approach that avoids tracking an additional counter.
The sync stops when an invalid pointer is found after at least
one valid entry in shared memory, so we don't stop early if an lwlock
was registered in local memory during postmaster startup; such a
tranche will never actually be added to shared memory at
the same position.
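
In rough pseudocode (variable names are approximations of the patch's):

```
bool        seen_valid = false;

for (int i = 0; i < LWLockTrancheNames.shmem->allocated; i++)
{
    if (!DsaPointerIsValid(shared_ptrs[i]))
    {
        /*
         * Gaps before the first valid entry belong to tranches that were
         * registered locally during postmaster startup; a gap after it
         * means we are past the populated region and can stop.
         */
        if (seen_valid)
            break;
        continue;
    }

    seen_valid = true;
    SetLocalTrancheName(i, dsa_get_address(LWLockTrancheNames.dsa,
                                           shared_ptrs[i]));
}
```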

>> Is there any reason to continue allowing this?  For example, maybe we could
>> ERROR if LWLockInitialize()/GetLWTrancheName() are given a tranche_id
>> greater than the number allocated.  I guess I'm not following why we should
>> gracefully handle these kinds of coding errors, especially when they result
>> in unhelpful behavior like an "extension" tranche.

> Essentially, we can call GetLWTrancheName within LWLockInitialize, but
> GetLWTrancheName can no longer have a fallback value such as "extension".
> It should error out instead. I believe that is what you mean, correct?

I also added this, so now only LWLocks correctly registered can be looked
up.

I also reconsidered the use of INJECTION_POINT for this purpose.
I felt it added unnecessary code where we only need a DEBUG (I used DEBUG3)
statement at a few key places.

See v7.

--
Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
>> See v7.

I fixed an earlier issue with Windows, which was due to not initializing the
shared memory inside

```
#ifdef EXEC_BACKEND
extern void AttachSharedMemoryStructs(void);
#endif
```

But then I found another one after that. LWLockTrancheNames is inherited via
fork() on Linux, but of course that will not happen on Windows without extra
work. So the tests failed because the requested tranches (via
RequestNamedLWLockTranche) were not being found when looked up.

But of course, we already have provisions to copy these tranches for
Windows (see launch_backend.c).

```
int NamedLWLockTrancheRequests;
NamedLWLockTranche *NamedLWLockTrancheArray;
LWLockPadded *MainLWLockArray;
```

So, this means that for the local memory sync, we can actually just copy the
requested tranches (via RequestNamedLWLockTranche) and then the shared memory
tranches. This is much better, as it syncs using both possible sources
for tranche names.

```
int i = 0;

while (i < NamedLWLockTrancheRequests)
{
    NamedLWLockTranche *tranche;

    tranche = &NamedLWLockTrancheArray[i];

    SetLocalTrancheName(i, tranche->trancheName);

    i++;
}

/* Acquire shared lock on tranche names shared memory */
LWLockAcquire(&LWLockTrancheNames.shmem->lock, LW_SHARED);

while (i < LWLockTrancheNames.shmem->allocated)
{
```

So, now these tests pass locally on Windows.

Attached is v8.

--

Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Bertrand Drouvot
Date:
Hi,

On Sat, Aug 16, 2025 at 10:18:05PM -0500, Sami Imseih wrote:
> Attached is v8.

Thanks for the new version!

A few random comments about 0001:

1 ===

+}                      LWLockTrancheNamesShmem;

+}                      LWLockTrancheNamesStruct;

Add LWLockTrancheNamesShmem and LWLockTrancheNamesStruct to
src/tools/pgindent/typedefs.list?

2 ===

Maybe a comment before the above structs definitions to explain what they are
used for?

3 ===

+static void
+SetSharedTrancheName(int tranche_index, const char *tranche_name)
 {
-       /* This should only be called for user-defined tranches. */
-       if (tranche_id < LWTRANCHE_FIRST_USER_DEFINED)
-               return;
+       dsa_pointer *name_ptrs;
+       dsa_pointer str_ptr;
+       char       *str_addr;
+       int                     len;
+       int                     current_allocated;
+       int                     new_alloc = 0;

-       /* Convert to array index. */
-       tranche_id -= LWTRANCHE_FIRST_USER_DEFINED;
+       LWLockAcquire(&LWLockTrancheNames.shmem->lock, LW_EXCLUSIVE);

Add Assert(IsUnderPostmaster || !IsPostmasterEnvironment);?

4 ===

+static void
+SetLocalTrancheName(int tranche_index, const char *tranche_name)
+{
+       int                     newalloc;
+
+       Assert(tranche_name);

should we add some assert on IsUnderPostmaster and/or IsPostmasterEnvironment too?

5 ===

+       while (i < NamedLWLockTrancheRequests)
+       {
+               NamedLWLockTranche *tranche;
+
+               tranche = &NamedLWLockTrancheArray[i];
+
+               SetLocalTrancheName(i, tranche->trancheName);
+
+               i++;
+       }

Maybe add a comment that those are the ones allocated by the postmaster during
startup?

Also, as this will be done during each sync and those tranches don't change,
so one could think there is room for improvement. Maybe add a comment that's
probably not worth optimizing (due to the fact that NamedLWLockTrancheRequests
should be small enough and the sync rare)?

6 ===

There is this existing comment:

/*
 * NamedLWLockTrancheRequests is both the valid length of the request array,
 * and the length of the shared-memory NamedLWLockTrancheArray later on.
 * This variable and NamedLWLockTrancheArray are non-static so that
 * postmaster.c can copy them to child processes in EXEC_BACKEND builds.
 */
int         NamedLWLockTrancheRequests = 0;

Maybe add something like "Additional dynamic tranche names beyond this count
are stored in a DSA"?

7 ===

+               old_ptrs = dsa_get_address(LWLockTrancheNames.dsa,
+                                                                  LWLockTrancheNames.shmem->list_ptr);
+
+               name_ptrs = dsa_get_address(LWLockTrancheNames.dsa, new_list);
+
+               memset(name_ptrs, InvalidDsaPointer, new_alloc);
+               memcpy(name_ptrs, old_ptrs, sizeof(dsa_pointer) * current_allocated);
+
+               dsa_free(LWLockTrancheNames.dsa, LWLockTrancheNames.shmem->list_ptr);

maybe use a local variable for LWLockTrancheNames.shmem->list_ptr (and
LWLockTrancheNames.dsa)?

8 ===

+       needs_sync = (trancheId >= LWLockTrancheNames.allocated ||
+                                 LWLockTrancheNames.local[trancheId] == NULL)
+               && (IsUnderPostmaster || !IsPostmasterEnvironment);

formatting does not look good.

9 ===

-       if (trancheId >= LWLockTrancheNamesAllocated ||
-               LWLockTrancheNames[trancheId] == NULL)
-               return "extension";
+       if (trancheId < LWLockTrancheNames.allocated)
+               tranche_name = LWLockTrancheNames.local[trancheId];

-       return LWLockTrancheNames[trancheId];
+       if (!tranche_name)
+               elog(ERROR, "tranche ID %d is not registered", tranche_id_saved);

We now error out instead of returning "extension". That looks ok given the 
up-thread discussion but then the commit message needs some updates as it
states:

"
Additionally, while users should not pass arbitrary tranche IDs (that is,
IDs not created via LWLockNewTrancheId) to LWLockInitialize, nothing
technically prevents them from doing so. Therefore, we must continue to
handle such cases gracefully by returning a default "extension" tranche name.
"

10 ===

+LWLockTrancheNamesInitSize()
+{
+       Size            sz;
+
+       /*
+        * This value is used by other facilities, see pgstat_shmem.c, so it's
+        * good enough.
+        */
+       sz = 256 * 1024;

use DSA_MIN_SEGMENT_SIZE?

11 ===

+       if (!IsUnderPostmaster)
+       {
+               dsa_area   *dsa;
+               LWLockTrancheNamesShmem *ctl = LWLockTrancheNames.shmem;
+               char       *p = (char *) ctl;
+
+               /* Calculate layout within the shared memory region */
+               p += MAXALIGN(sizeof(LWLockTrancheNamesShmem));
+               ctl->raw_dsa_area = p;
+               p += MAXALIGN(LWLockTrancheNamesInitSize());

LWLockTrancheNamesInitSize() already does MAXALIGN(), so the last one
above is not needed. Also, the last p advance seems unnecessary, as it is not
used afterwards. I think the same is true in StatsShmemInit() (see [1]).

12 ===

+LWLockTrancheNamesBEInit(void)
+{
+       /* already attached, nothing to do */
+       if (LWLockTrancheNames.dsa)
+               return;
+
+       LWLockTrancheNamesAttach();
+
+       /* Set up a process-exit hook to clean up */

s/already/Already/?

For 0002, a quick review:

13 ===

+       if (log)
+               elog(DEBUG3, "current_allocated: %d in tranche names shared memory", new_alloc);

Instead of this, could we elog distinct messages where the patch currently sets
log = true?


14 ===

-                 xid_wraparound
+                 xid_wraparound \
+                 test_lwlock_tranches

breaks the ordering.

15 ===

 subdir('worker_spi')
 subdir('xid_wraparound')
+subdir('test_lwlock_tranches')

Same, breaks the ordering.

16 ===

+my $ENABLE_LOG_WAIT = 1;
+
+my $isession;
+my $log_location;
+
+# Helper function: wait for one or more logs if $ENABLE_LOG_WAIT is true
+sub maybe_wait_for_log {
+    my ($node, $logs, $log_loc) = @_;
+    return $log_loc unless $ENABLE_LOG_WAIT;

is ENABLE_LOG_WAIT needed as it does not change?


[1]: https://www.postgresql.org/message-id/aKLsu2sdpnyeuSSc%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> 1 ===
>
> +}                      LWLockTrancheNamesShmem;
>
> +}                      LWLockTrancheNamesStruct;
>
> Add LWLockTrancheNamesShmem and LWLockTrancheNamesStruct to
> src/tools/pgindent/typedefs.list?

done.

> 2 ===
>
> Maybe a comment before the above structs definitions to explain what they are
> used for?

done.

> 3 ===
>
> +static void
> +SetSharedTrancheName(int tranche_index, const char *tranche_name)
>  {
> -       /* This should only be called for user-defined tranches. */
> -       if (tranche_id < LWTRANCHE_FIRST_USER_DEFINED)
> -               return;
> +       dsa_pointer *name_ptrs;
> +       dsa_pointer str_ptr;
> +       char       *str_addr;
> +       int                     len;
> +       int                     current_allocated;
> +       int                     new_alloc = 0;
>
> -       /* Convert to array index. */
> -       tranche_id -= LWTRANCHE_FIRST_USER_DEFINED;
> +       LWLockAcquire(&LWLockTrancheNames.shmem->lock, LW_EXCLUSIVE);
>
> Add Assert(IsUnderPostmaster || !IsPostmasterEnvironment);?

done. This is required here since we should not be dealing with DSA
during postmaster startup or single-user mode.

> 4 ===
>
> +static void
> +SetLocalTrancheName(int tranche_index, const char *tranche_name)
> +{
> +       int                     newalloc;
> +
> +       Assert(tranche_name);
>
> should we add some assert on IsUnderPostmaster and/or IsPostmasterEnvironment too?

No, it's not needed here. SetLocalTrancheName is called during startup
and by normal backends.

> 5 ===
>
> +       while (i < NamedLWLockTrancheRequests)
> +       {
> +               NamedLWLockTranche *tranche;
> +
> +               tranche = &NamedLWLockTrancheArray[i];
> +
> +               SetLocalTrancheName(i, tranche->trancheName);
> +
> +               i++;
> +       }
>
> Maybe add a comment that those are the ones allocated by the postmaster during
> startup?
>
> Also, as this will be done during each sync and those tranches don't change,
> one could think there is room for improvement. Maybe add a comment that it's
> probably not worth optimizing (given that NamedLWLockTrancheRequests
> should be small enough and the sync rare)?

done.

> 6 ===
>
> There is this existing comment:
>
> /*
>  * NamedLWLockTrancheRequests is both the valid length of the request array,
>  * and the length of the shared-memory NamedLWLockTrancheArray later on.
>  * This variable and NamedLWLockTrancheArray are non-static so that
>  * postmaster.c can copy them to child processes in EXEC_BACKEND builds.
>  */
> int         NamedLWLockTrancheRequests = 0;
>
> Maybe add something like "Additional dynamic tranche names beyond this count
> are stored in a DSA"?

No. I don't like talking about that here. This is already described in:
```
* 3. Extensions can create new tranches using either RequestNamedLWLockTranche
* or LWLockNewTrancheId. Tranche names are reg
```

> 7 ===
>
> +               old_ptrs = dsa_get_address(LWLockTrancheNames.dsa,
> +                                                                  LWLockTrancheNames.shmem->list_ptr);
> +
> +               name_ptrs = dsa_get_address(LWLockTrancheNames.dsa, new_list);
> +
> +               memset(name_ptrs, InvalidDsaPointer, new_alloc);
> +               memcpy(name_ptrs, old_ptrs, sizeof(dsa_pointer) * current_allocated);
> +
> +               dsa_free(LWLockTrancheNames.dsa, LWLockTrancheNames.shmem->list_ptr);
>
> maybe use a local variable for LWLockTrancheNames.shmem->list_ptr (and
> LWLockTrancheNames.dsa)?

hmm, I don't see the point of this.

>
> 8 ===
>
> +       needs_sync = (trancheId >= LWLockTrancheNames.allocated ||
> +                                 LWLockTrancheNames.local[trancheId] == NULL)
> +               && (IsUnderPostmaster || !IsPostmasterEnvironment);
>
> formatting does not look good.

pgindent told me it's fine. But I did add parentheses around the expression,
so pgindent now aligns the conditions in a better way.

> 9 ===
>
> -       if (trancheId >= LWLockTrancheNamesAllocated ||
> -               LWLockTrancheNames[trancheId] == NULL)
> -               return "extension";
> +       if (trancheId < LWLockTrancheNames.allocated)
> +               tranche_name = LWLockTrancheNames.local[trancheId];
>
> -       return LWLockTrancheNames[trancheId];
> +       if (!tranche_name)
> +               elog(ERROR, "tranche ID %d is not registered", tranche_id_saved);
>
> We now error out instead of returning "extension". That looks ok given the
> up-thread discussion but then the commit message needs some updates as it
> states:
> "
> Additionally, while users should not pass arbitrary tranche IDs (that is,
> IDs not created via LWLockNewTrancheId) to LWLockInitialize, nothing
> technically prevents them from doing so. Therefore, we must continue to
> handle such cases gracefully by returning a default "extension" tranche name.
> "

done.

> 10 ===
>
> +LWLockTrancheNamesInitSize()
> +{
> +       Size            sz;
> +
> +       /*
> +        * This value is used by other facilities, see pgstat_shmem.c, so it's
> +        * good enough.
> +        */
> +       sz = 256 * 1024;
>
> use DSA_MIN_SEGMENT_SIZE?

done.

> 11 ===
>
> +       if (!IsUnderPostmaster)
> +       {
> +               dsa_area   *dsa;
> +               LWLockTrancheNamesShmem *ctl = LWLockTrancheNames.shmem;
> +               char       *p = (char *) ctl;
> +
> +               /* Calculate layout within the shared memory region */
> +               p += MAXALIGN(sizeof(LWLockTrancheNamesShmem));
> +               ctl->raw_dsa_area = p;
> +               p += MAXALIGN(LWLockTrancheNamesInitSize());
>
> LWLockTrancheNamesInitSize() already does MAXALIGN(), so the last one
> above is not needed. Also, the last p advance seems unnecessary, as it is not
> used afterwards. I think the same is true in StatsShmemInit() (see [1]).

yeah, good point. done.

> 12 ===
>
> +LWLockTrancheNamesBEInit(void)
> +{
> +       /* already attached, nothing to do */
> +       if (LWLockTrancheNames.dsa)
> +               return;
> +
> +       LWLockTrancheNamesAttach();
> +
> +       /* Set up a process-exit hook to clean up */
>
> s/already/Already/?

done.

> For 0002, a quick review:
>
> 13 ===
>
> +       if (log)
> +               elog(DEBUG3, "current_allocated: %d in tranche names shared memory", new_alloc);
>
> Instead of this, could we elog distinct messages where the patch currently sets
> log = true?

ok, that was my initial thought, but I did not like repeating the
messages. I went
back to distinct messages. I have no strong feelings either way.

>
> 14 ===
>
> -                 xid_wraparound
> +                 xid_wraparound \
> +                 test_lwlock_tranches
>
> breaks the ordering.
>
> 15 ===
>
>  subdir('worker_spi')
>  subdir('xid_wraparound')
> +subdir('test_lwlock_tranches')
>
> Same, breaks the ordering.
>
> 16 ===

done and done.

> +my $ENABLE_LOG_WAIT = 1;
> +
> +my $isession;
> +my $log_location;
> +
> +# Helper function: wait for one or more logs if $ENABLE_LOG_WAIT is true
> +sub maybe_wait_for_log {
> +    my ($node, $logs, $log_loc) = @_;
> +    return $log_loc unless $ENABLE_LOG_WAIT;
>
> is ENABLE_LOG_WAIT needed as it does not change?

That was a remnant of my testing. Removed.

see v9

--

Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> see v9

Sorry about the quick follow-up; I was not happy with one of the
comment updates in SyncLWLockTrancheNames, so I updated it.

Attached is v10.

--

Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Mon, Aug 18, 2025 at 01:06:42PM -0500, Sami Imseih wrote:
> Attached is v10,

I've been staring at the latest patch for a bit, and I'm a bit concerned
at how much complexity it adds.  I think it's a good idea to keep a local
array of tranche names indexed by tranche ID, but the code for managing the
list of DSA pointers scares me.  I know we were trying to avoid using
dshash earlier, if for no other reason than it's perhaps not the best data
structure for the job, but ISTM we'd eliminate a lot of complexity if we
offloaded the shmem pieces to a dshash table (or some other shmem-based
data structure we have yet to introduce, like a dslist/dsarray).  WDYT?
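
For what it's worth, the dshash route might look roughly like this (a sketch;
the entry layout, the NAMEDATALEN cap, and the tranche used for the table's
own locks are all assumptions, not the patch):

```
/* Fixed-size entry keyed by tranche ID; the name is stored inline. */
typedef struct TrancheNameEntry
{
    int         tranche_id;         /* hash key */
    char        name[NAMEDATALEN];
} TrancheNameEntry;

static const dshash_parameters tranche_name_params = {
    sizeof(int),                    /* key size */
    sizeof(TrancheNameEntry),       /* entry size */
    dshash_memcmp,
    dshash_memhash,
    dshash_memcpy,
    LWTRANCHE_FIRST_USER_DEFINED    /* placeholder tranche for the table */
};
```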

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> I've been staring at the latest patch for a bit, and I'm a bit concerned
> at how much complexity it adds.

The complexity comes from the fact that we have to track a dsa_pointer that
points to an array of dsa_pointers, which in turn track the tranche names. We
also have to copy the array of dsa_pointers into a new allocation when
enlarging. That adds some complexity that dshash would hide away,
but I don't think it's overly complex.

> I know we were trying to avoid using
> dshash earlier, if for no other reason than it's perhaps not the best data
> structure for the job,

yeah, we really just need an append-only structure, so dshash seemed
unnecessary. But we also moved away from dshash before it was
decided to have a local cache. Perhaps it's ok now to go back to dshash,
considering that it will only be used in rare scenarios: creation of
new tranches and syncs to local memory. Most of the time lookups will be
satisfied directly from the local cache.

> (or some other shmem-based
> data structure we have yet to introduce, like a dslist/dsarray).

This will be an interesting API to invest time in, if there could be more
use-cases. I think it's a separate discussion at this point.

--
Sami



Re: Improve LWLock tranche name visibility across backends

From
Bertrand Drouvot
Date:
Hi,

On Mon, Aug 18, 2025 at 05:53:44PM -0500, Sami Imseih wrote:
> > I've been staring at the latest patch for a bit, and I'm a bit concerned
> > at how much complexity it adds.
> 
> The complexity comes from the fact that we have to track a dsa_pointer that
> points to an array of dsa_pointers, which in turn track the tranche names. We
> also have to copy the array of dsa_pointers into a new allocation when
> enlarging. That adds some complexity that dshash would hide away,
> but I don't think it's overly complex.

I also think that's not overly complex, but OTOH why add extra complexity if
we already have machinery at our disposal that takes care of what we need,
i.e., resizing and so on.

> > I know we were trying to avoid using
> > dshash earlier, if for no other reason than it's perhaps not the best data
> > structure for the job,

I can see that Nathan's concern about dshash was that it could be a bit slower
([1]). That's probably true, but that does not worry me that much given that
adding tranches should be rare enough and the lookups (to sync the
local cache) should be rare and not that performance-sensitive.

> yeah, we really just need an append-only structure, so dshash seemed
> unnecessary. But we also moved away from dshash before it was
> decided to have a local cache. Perhaps it's ok now to go back to dshash,
> considering that it will only be used in rare scenarios: creation of
> new tranches and syncs to local memory. Most of the time lookups will be
> satisfied directly from the local cache.

Yes.

> > (or some other shmem-based
> > data structure we have yet to introduce, like a dslist/dsarray).
> 
> This will be an interesting API to invest time in, if there could be more
> use-cases.

I did a quick check and I did not find current use cases: possible candidates
could be ExecParallelHashTableAlloc(), PTIterationArray, and PTEntryArray, for
example, but I think they all know the final size upfront, so there is no real
need for dsarray there.

> I think it's a separate discussion at this point.

OTOH, that would be a valid use case to introduce this new API, but I'm not
sure it's worth it given that the dshash looks good enough for our case (even
if not ideal).

[1]: https://www.postgresql.org/message-id/aHV5BsKoSeOkIsNL%40nathan

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Tue, Aug 19, 2025 at 08:09:53AM +0000, Bertrand Drouvot wrote:
> On Mon, Aug 18, 2025 at 05:53:44PM -0500, Sami Imseih wrote:
>> > (or some other shmem-based
>> > data structure we have yet to introduce, like a dslist/dsarray).
>> 
>> This will be an interesting API to invest time in, if there could be more
>> use-cases.
> 
> I did a quick check and I did not find current use cases: possible candidates
> could be ExecParallelHashTableAlloc(), PTIterationArray, and PTEntryArray, for
> example, but I think they all know the final size upfront, so there is no real
> need for dsarray there.
> 
>> I think it's a separate discussion at this point.
> 
> OTOH, that would be a valid use case to introduce this new API, but I'm not
> sure it's worth it given that the dshash looks good enough for our case (even
> if not ideal).

IMHO it'd be okay to proceed with dshash for now.  It would be pretty easy
to switch to something like a "dslist" in the future.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Andres Freund
Date:
Hi,

On 2025-08-19 12:29:14 -0500, Nathan Bossart wrote:
> On Tue, Aug 19, 2025 at 08:09:53AM +0000, Bertrand Drouvot wrote:
> > On Mon, Aug 18, 2025 at 05:53:44PM -0500, Sami Imseih wrote:
> >> > (or some other shmem-based
> >> > data structure we have yet to introduce, like a dslist/dsarray).
> >> 
> >> This will be an interesting API to invest time in, if there could be more
> >> use-cases.
> > 
> > I did a quick check and I did not find current use cases: possible candidates
> > could be ExecParallelHashTableAlloc(), PTIterationArray, and PTEntryArray, for
> > example, but I think they all know the final size upfront, so there is no real
> > need for dsarray there.
> > 
> >> I think it's a separate discussion at this point.
> > 
> > OTOH, that would be a valid use case to introduce this new API, but I'm not
> > sure it's worth it given that the dshash looks good enough for our case (even
> > if not ideal).
> 
> IMHO it'd be okay to proceed with dshash for now.  It would be pretty easy
> to switch to something like a "dslist" in the future.

Possibly stupid question - is it really worth having a dynamic structure here?
The number of tranches is strictly bounded; it seems like it'd be simpler to
have an array of tranche names in shared memory.

Greetings,

Andres Freund



Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Tue, Aug 19, 2025 at 02:06:50PM -0400, Andres Freund wrote:
> Possibly stupid question - is it really worth having a dynamic structure here?
> The number of tranches is strictly bounded; it seems like it'd be simpler to
> have an array of tranche names in shared memory.

Tranches can be allocated post-startup with LWLockNewTrancheId() (e.g.,
autoprewarm).

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Andres Freund
Date:
Hi,

On 2025-08-19 13:31:35 -0500, Nathan Bossart wrote:
> On Tue, Aug 19, 2025 at 02:06:50PM -0400, Andres Freund wrote:
> > Possibly stupid question - is it really worth having a dynamic structure here?
> > The number of tranches is strictly bounded; it seems like it'd be simpler to
> > have an array of tranche names in shared memory.
> 
> Tranches can be allocated post-startup with LWLockNewTrancheId() (e.g.,
> autoprewarm).

Sure, but we don't need to support a large number of tranches. Just make it,
idk, 128 entries long. Adding a dynamically allocated dsm to every server
seems like a waste - every shared mapping makes fork / exit slower...

Greetings,

Andres Freund



Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Tue, Aug 19, 2025 at 02:37:19PM -0400, Andres Freund wrote:
> On 2025-08-19 13:31:35 -0500, Nathan Bossart wrote:
>> On Tue, Aug 19, 2025 at 02:06:50PM -0400, Andres Freund wrote:
>> > Possibly stupid question - is it really worth having a dynamic structure here?
>> > The number of tranches is strictly bounded; it seems like it'd be simpler to
>> > have an array of tranche names in shared memory.
>> 
>> Tranches can be allocated post-startup with LWLockNewTrancheId() (e.g.,
>> autoprewarm).
> 
> Sure, but we don't need to support a large number of tranches. Just make it,
> idk, 128 entries long. Adding a dynamically allocated dsm to every server
> seems like a waste - every shared mapping makes fork / exit slower...

The other issue is that there's presently no limit on the length of a
tranche name registered via LWLockRegisterTranche().  Life would become
much simpler if we're willing to put a limit on both that and the number of
tranches, but thus far we've been trying to avoid it.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Tue, Aug 19, 2025 at 02:37:19PM -0400, Andres Freund wrote:
>> Sure, but we don't need to support a large number of tranches. Just make it,
>> idk, 128 entries long. Adding a dynamically allocated dsm to every server
>> seems like a waste - every shared mapping makes fork / exit slower...

> The other issue is that there's presently no limit on the length of a
> tranche name registered via LWLockRegisterTranche().  Life would become
> much simpler if we're willing to put a limit on both that and the number of
> tranches, but thus far we've been trying to avoid it.

I can hardly imagine a reason why it wouldn't be okay to limit the
lengths of tranche names.  But especially so if an unlimited length
causes practical problems.

            regards, tom lane



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> Nathan Bossart <nathandbossart@gmail.com> writes:
> > On Tue, Aug 19, 2025 at 02:37:19PM -0400, Andres Freund wrote:
> >> Sure, but we don't need to support a large number of tranches. Just make it,
> >> idk, 128 entries long. Adding a dynamically allocated dsm to every server
> >> seems like a waste - every shared mapping makes fork / exit slower...
>
> > The other issue is that there's presently no limit on the length of a
> > tranche name registered via LWLockRegisterTranche().  Life would become
> > much simpler if we're willing to put a limit on both that and the number of
> > tranches, but thus far we've been trying to avoid it.
>
> I can hardly imagine a reason why it wouldn't be okay to limit the
> lengths of tranche names.  But especially so if an unlimited length
> causes practical problems.

If we limit the tranche name to NAMEDATALEN and also limit the
number of tranches an extension can register, we can put this
all in static shared memory (We would still need to have a backend local
cache to allow lookups to avoid going to shared memory).

However, I am not sure what the limit would be for the number of tranches,
and if we do impose something, will it break anything that is out there?

--
Sami



Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Tue, Aug 19, 2025 at 03:52:33PM -0500, Sami Imseih wrote:
> If we limit the tranche name to NAMEDATALEN and also limit the
> number of tranches an extension can register, we can put this
> all in static shared memory (We would still need to have a backend local
> cache to allow lookups to avoid going to shared memory).

I bet we could avoid the local cache by keeping a backend-local copy of
LWLockCounter that gets updated as needed.

> However, I am not sure what the limit would be for the number of tranches,
> and if we do impose something, will it break anything that is out there?

I can think of contrived scenarios where these limits would present
problems, but I'm not aware of anything that folks are actually doing that
would be affected.  In general, the examples I've seen involve allocating a
small number of tranches for an extension, so my assumption is that the
most likely cause of breakage would be someone installing many, many
extensions.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> On Tue, Aug 19, 2025 at 03:52:33PM -0500, Sami Imseih wrote:
> > If we limit the tranche name to NAMEDATALEN and also limit the
> > number of tranches an extension can register, we can put this
> > all in static shared memory (We would still need to have a backend local
> > cache to allow lookups to avoid going to shared memory).
>
> I bet we could avoid the local cache by keeping a backend-local copy of
> LWLockCounter that gets updated as needed.

maybe. If we agree to impose limits ( both name length and # of tranches ),
that will allow us to do things a bit different.

If there is agreement on setting limits, may I propose
1024 tranches and NAMEDATALEN. Both seem reasonably sufficient.
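
With such limits, the shared-memory footprint becomes a simple fixed-size
computation, something like (the macro names are just a proposal):

```
#define MAX_NAMED_TRANCHES          1024
#define MAX_NAMED_TRANCHES_NAME_LEN NAMEDATALEN

static Size
LWLockTrancheNamesShmemSize(void)
{
    return mul_size(MAX_NAMED_TRANCHES, MAX_NAMED_TRANCHES_NAME_LEN);
}
```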

--
Sami



Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Tue, Aug 19, 2025 at 04:16:26PM -0500, Sami Imseih wrote:
> If there is agreement on setting limits, may I propose
> 1024 tranches and NAMEDATALEN. Both seem reasonably sufficient.

Let's proceed with that approach for now.  We can worry about the exact
limits once this is closer to commit.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
>> If there is agreement on setting limits, may I propose
>> 1024 tranches and NAMEDATALEN. Both seem reasonably sufficient.

> Let's proceed with that approach for now.  We can worry about the exact
> limits once this is closer to commit.

v11 implements the fixed-size shared memory as discussed. So, as in previous
versions, we totally got rid of LWLockRegisterTranche, and users can
register tranches with either RequestNamedLWLockTranche (during startup)
or LWLockNewTrancheId (after startup).

The NamedLWLockTrancheArray, allocated at the end of MainLWLockArray
with a fixed size limited to the number of tranches requested via
RequestNamedLWLockTranche, has been changed in v11.
It can now grow up to a maximum number of total tranches,
currently 1024, and is indexed by LWLockCounter. We don't need to keep track
of the number of tranches in a separate variable, and can rely on
LWLockCounter only.

I kept the local array to serve consecutive reads and to avoid having to
take a shared lock on shared memory every time
GetLWTrancheName is called. A new LWLock to protect this array is
required.

LWLockInitialize also errors if the tranche used has not been registered,
as was the case in the previous patch versions.

Several error messages to enforce the fixed-size limits have also been added.

The next patch set has 0001, which is the core change, and 0002, which
contains the tests (not sure if we need them, but they help in the review).

The tests in 0002 fail on EXEC_BACKEND due to a newly discovered bug [0]
that repros on HEAD and likely other branches.

--
Sami

[0] https://www.postgresql.org/message-id/CAA5RZ0v1_15QPg5Sqd2Qz5rh_qcsyCeHHmRDY89xVHcy2yt5BQ%40mail.gmail.com

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
On Fri, Aug 22, 2025 at 3:01 PM Sami Imseih <samimseih@gmail.com> wrote:
>
> >> If there is agreement on setting limits, may I propose
> >> 1024 tranches and NAMEDATALEN. Both seem reasonably sufficient.
>
> > Let's proceed with that approach for now.  We can worry about the exact
> > limits once this is closer to commit.
>
> v11 implements the fixed-size shared memory as discussed.

One point I did not make earlier is that the tranche name lengths will
need to be as long as we allow in dsm_registry.c.

```
#define DSMR_NAME_LEN 128

#define DSMR_DSA_TRANCHE_SUFFIX " DSA"
#define DSMR_DSA_TRANCHE_SUFFIX_LEN (sizeof(DSMR_DSA_TRANCHE_SUFFIX) - 1)
#define DSMR_DSA_TRANCHE_NAME_LEN (DSMR_NAME_LEN + DSMR_DSA_TRANCHE_SUFFIX_LEN)
```

We should make the limits external (in lwlock.h) so that dsm_registry.c
can enforce them based on those values.

--
Sami



Re: Improve LWLock tranche name visibility across backends

From
Tom Lane
Date:
Sami Imseih <samimseih@gmail.com> writes:
> One point I did not make earlier is that the tranche name lengths will
> need to be as long as we allow in dsm_registry.c.

> #define DSMR_NAME_LEN 128

Huh.  Why is that different from NAMEDATALEN in the first place?

            regards, tom lane



Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Fri, Aug 22, 2025 at 05:33:55PM -0400, Tom Lane wrote:
> Sami Imseih <samimseih@gmail.com> writes:
>> One point I did not make earlier is that the tranche name lengths will
>> need to be as long as we allow in dsm_registry.c.
> 
>> #define DSMR_NAME_LEN 128
> 
> Huh.  Why is that different from NAMEDATALEN in the first place?

One of the reviewers of commit fe07100 requested it [0].  At the time,
there was no reason to use NAMEDATALEN.  I'm fine with lowering it if
needed for the shared tranche name work.

Presently, we also append a " DSA" suffix to the DSA tranche name in
GetNamedDSHash().  I was originally trying to use separate tranches for the
DSA and the dshash table for observability purposes, but since it seems to
be more trouble than it's worth, perhaps we should only allocate one
tranche.

[0] https://postgr.es/m/58DB9EB7-4646-4508-84BA-F0F067A7E8BA%40gmail.com

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Fri, Aug 22, 2025 at 03:01:53PM -0500, Sami Imseih wrote:
> I kept the local array to serve consecutive reads and to avoid having to
> take a shared lock on shared memory every time GetLWTrancheName is
> called. A new LWLock to protect this array is required.

I'm not seeing why we need this cache anymore.  This is an append-only
list, so we could instead keep a backend-local copy of LWLockCounter that
gets updated as needed.  As long as the ID is less than our backend-local
counter, we can go straight to the shared array.  If it is greater, we'll
have to first update our counter, which should be rare and inexpensive.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> On Fri, Aug 22, 2025 at 03:01:53PM -0500, Sami Imseih wrote:
> > I kept the local array to serve consecutive reads and to avoid having to
> > take a shared lock on shared memory every time GetLWTrancheName is
> > called. A new LWLock to protect this array is required.
>
> I'm not seeing why we need this cache anymore.  This is an append-only
> list, so we could instead keep a backend-local copy of LWLockCounter that
> gets updated as needed.  As long as the ID is less than our backend-local
> counter, we can go straight to the shared array.  If it is greater, we'll
> have to first update our counter, which should be rare and inexpensive.

When we look up from the shared array only, we need to take a shared lock
on every lookup. Acquiring that lock is what I am trying to avoid. You
are saying it's not worth optimizing that part, correct?

--
Sami



Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Mon, Aug 25, 2025 at 04:37:21PM -0500, Sami Imseih wrote:
> When we look up from the shared array only, we need to take a shared lock
> on every lookup. Acquiring that lock is what I am trying to avoid. You
> are saying it's not worth optimizing that part, correct?

Why do we need a shared lock here?  IIUC there's no chance that existing
entries will change.  We'll only ever add new ones to the end.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> On Mon, Aug 25, 2025 at 04:37:21PM -0500, Sami Imseih wrote:
> > When we look up from the shared array only, we need to take a shared lock
> > on every lookup. Acquiring that lock is what I am trying to avoid. You
> > are saying it's not worth optimizing that part, correct?
>
> Why do we need a shared lock here?  IIUC there's no chance that existing
> entries will change.  We'll only ever add new ones to the end.

hmm, can we really avoid a shared lock when reading from shared memory,
considering access for both reads and writes to shared memory can be
concurrent? We are also taking an exclusive lock when writing a new tranche.

--
Sami



Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Mon, Aug 25, 2025 at 04:59:41PM -0500, Sami Imseih wrote:
> hmm, can we really avoid a shared lock when reading from shared memory,
> considering access for both reads and writes to shared memory can be
> concurrent? We are also taking an exclusive lock when writing a new tranche.

We probably want to hold a lock while we 1) increment LWLockCounter and
copy a new tranche name to memory and 2) while we copy the current value of
LWLockCounter to our backend-local variable.  Otherwise, AFAICT we don't
need one.  We could probably use ShmemLock for this.
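
Concretely, I'm imagining something like this (a sketch; SharedTrancheNames
and LocalLWLockCounter are illustrative names, and the limit/length checks are
elided):

```
static int  LocalLWLockCounter;     /* backend-local copy of the counter */

int
LWLockNewTrancheId(const char *tranche_name)
{
    int         tranche_id;

    SpinLockAcquire(ShmemLock);
    tranche_id = (*LWLockCounter)++;
    strlcpy(SharedTrancheNames[tranche_id - LWTRANCHE_FIRST_USER_DEFINED],
            tranche_name, NAMEDATALEN);
    SpinLockRelease(ShmemLock);

    return tranche_id;
}

static const char *
GetSharedTrancheName(int tranche_id)
{
    if (tranche_id >= LocalLWLockCounter)
    {
        /* Rare path: refresh the local copy of the shared counter. */
        SpinLockAcquire(ShmemLock);
        LocalLWLockCounter = *LWLockCounter;
        SpinLockRelease(ShmemLock);
    }

    if (tranche_id >= LocalLWLockCounter)
        return NULL;            /* not registered */

    /* Entries are append-only, so the read itself needs no lock. */
    return SharedTrancheNames[tranche_id - LWTRANCHE_FIRST_USER_DEFINED];
}
```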

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> On Mon, Aug 25, 2025 at 04:59:41PM -0500, Sami Imseih wrote:
> > hmm, can we really avoid a shared lock when reading from shared memory,
> > considering access for both reads and writes to shared memory can be
> > concurrent? We are also taking an exclusive lock when writing a new tranche.
>
> We probably want to hold a lock while we 1) increment LWLockCounter and
> copy a new tranche name to memory and

In the last rev, I removed the spinlock acquired on ShmemLock in lieu of
an LWLock. This is because I wanted a single LWLock acquisition while
both incrementing LWLockCounter and writing to shared memory; doing
that much work, particularly writing to shared memory,
under a spinlock seemed inappropriate. That said, this is not
high-concurrency or performance-sensitive activity at all, so perhaps I was
being overly paranoid.

--
Sami



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> > On Mon, Aug 25, 2025 at 04:59:41PM -0500, Sami Imseih wrote:
> > > hmm, can we really avoid a shared lock when reading from shared memory,
> > > considering access for both reads and writes to shared memory can be
> > > concurrent? We are also taking an exclusive lock when writing a new tranche.
> >
> > We probably want to hold a lock while we 1) increment LWLockCounter and
> > copy a new tranche name to memory and
>
> In the last rev, I removed the spinlock acquired on ShmemLock in lieu of
> an LWLock. This is because I wanted a single LWLock acquisition while
> both incrementing LWLockCounter and writing to shared memory; doing
> that much work, particularly writing to shared memory,
> under a spinlock seemed inappropriate. That said, this is not
> high-concurrency or performance-sensitive activity at all, so perhaps I was
> being overly paranoid.

Here is v12, which replaces the LWLock used to access the shared memory with
ShmemLock and implements a local counter.

--
Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Tue, Aug 26, 2025 at 02:56:22PM -0500, Sami Imseih wrote:
> Here is v12, which replaces the LWLock used to access the shared memory with
> ShmemLock and implements a local counter.

This looks much closer to what I was imagining.

         /* Initialize the LWLock tranche for the DSA. */
-        dsa_state->tranche = LWLockNewTrancheId();
+        dsa_state->tranche = LWLockNewTrancheId(dsa_state->tranche_name);
         sprintf(dsa_state->tranche_name, "%s%s", name, DSMR_DSA_TRANCHE_SUFFIX);
-        LWLockRegisterTranche(dsa_state->tranche, dsa_state->tranche_name);
 
         /* Initialize the LWLock tranche for the dshash table. */
-        dsh_state->tranche = LWLockNewTrancheId();
+        dsh_state->tranche = LWLockNewTrancheId(dsh_state->tranche_name);
         strcpy(dsh_state->tranche_name, name);
-        LWLockRegisterTranche(dsh_state->tranche, dsh_state->tranche_name);

We probably need to do the sprintf/strcpy before LWLockNewTrancheId().
Also, I'm thinking we should just use the same tranche for both the DSA and
the dshash table [0] to evade the DSMR_DSA_TRANCHE_SUFFIX problem, i.e.,
those tranche names potentially require more space.

+    /* Space for name of each tranche. */
+    size = add_size(size, mul_size(MAX_NAMED_TRANCHES, MAX_NAMED_TRANCHES));

Presumably one of the mul_size() arguments should be
MAX_NAMED_TRANCHES_NAME_LEN.

+NamedLWLockTrancheArray    "Waiting to access the named LWLock tranches array."

+PG_LWLOCK(54, NamedLWLockTrancheArray)

These can be removed now, right?

[0] https://postgr.es/m/aKzIg1JryN1qhNuy%40nathan

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
fixed the issues mentioned above in v13.

> We probably need to do the sprintf/strcpy before LWLockNewTrancheId().
> Also, I'm thinking we should just use the same tranche for both the DSA and
> the dshash table [0] to evade the DSMR_DSA_TRANCHE_SUFFIX problem, i.e.,
> those tranche names potentially require more space.

v13 also includes a separate patch for this. It removes the
DSMR_DSA_TRANCHE_SUFFIX, and dsh and dsa now use the same
user-defined tranche. The tranche name is also limited to
the max length of a named tranche, MAX_NAMED_TRANCHES_NAME_LEN,
coming from lwlock.h.

--
Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Bertrand Drouvot
Date:
Hi,

On Tue, Aug 26, 2025 at 05:50:34PM -0500, Sami Imseih wrote:
> fixed the issues mentioned above in v13.
> 
> > We probably need to do the sprintf/strcpy before LWLockNewTrancheId().
> > Also, I'm thinking we should just use the same tranche for both the DSA and
> > the dshash table [0] to avoid the DSMR_DSA_TRANCHE_SUFFIX problem, i.e., that
> > those tranche names potentially require more space.
> 
> v13 also includes a separate patch for this. It removes the
> DSMR_DSA_TRANCHE_SUFFIX, and dsh and dsa now use the same
> user-defined tranche. The tranche name is also limited to
> the max length of a named tranche, MAX_NAMED_TRANCHES_NAME_LEN,
> coming from lwlock.h.

Thanks for the new version.

=== 1

+LWLockNewTrancheId(const char *tranche_name)
 {
-       int                     result;
-       int                *LWLockCounter;
+       int                     tranche_id,
+                               index,
+                          *LWLockCounter;
+       Size            tranche_name_length = strlen(tranche_name) + 1;

We need to check if tranche_name is NULL and report an error if that's the case.
If not, strlen() would segfault.

=== 2

+       if (tranche_name_length > MAX_NAMED_TRANCHES_NAME_LEN)
+               elog(ERROR, "tranche name too long");

I think that we should mention in the doc that the tranche name is limited to
63 bytes.

=== 3


-       /* Convert to array index. */
-       tranche_id -= LWTRANCHE_FIRST_USER_DEFINED;
+       tranche_id = (*LWLockCounter)++;
+       index = tranche_id - LWTRANCHE_FIRST_USER_DEFINED;

-       /* If necessary, create or enlarge array. */
-       if (tranche_id >= LWLockTrancheNamesAllocated)
+       if (index >= MAX_NAMED_TRANCHES)
        {
-               int                     newalloc;
+               SpinLockRelease(ShmemLock);
+               elog(ERROR, "too many LWLock tranches registered");
+       }

-               newalloc = pg_nextpower2_32(Max(8, tranche_id + 1));
+       /* Directly copy into the NamedLWLockTrancheArray */
+       strcpy((char *) NamedLWLockTrancheArray + (index * MAX_NAMED_TRANCHES_NAME_LEN),
+                  tranche_name);

I was skeptical about using strcpy() while we hold a spinlock. I do see some
examples with strlcpy() though (walreceiver.c for example), so that looks OK-ish.

Using strcpy() might be OK too, as we already have validated the length, but maybe
it would be safer to switch to strlcpy(), instead?
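
i.e., a sketch of the strlcpy() flavor of the copy quoted above (the length
has already been validated at this point, so no truncation can occur):

+       strlcpy((char *) NamedLWLockTrancheArray + (index * MAX_NAMED_TRANCHES_NAME_LEN),
+                  tranche_name, MAX_NAMED_TRANCHES_NAME_LEN);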

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
Thanks for reviewing!

> === 1
>
> We need to check if tranche_name is NULL and report an error if that's the case.
> If not, strlen() would segfault.

Added an error. Good call. The error message follows the previously used
convention.

```
+       if (!tranche_name)
+               elog(ERROR, "tranche name cannot be null");
```

> === 2
>
> +       if (tranche_name_length > MAX_NAMED_TRANCHES_NAME_LEN)
> +               elog(ERROR, "tranche name too long");
>
> I think that we should mention in the doc that the tranche name is limited to
> 63 bytes.

Done. I just mentioned NAMEDATALEN - 1 in the docs.

> === 3
>
> I was skeptical about using strcpy() while we hold a spinlock. I do see some
> examples with strlcpy() though (walreceiver.c for example), so that looks OK-ish.
>
> Using strcpy() might be OK too, as we already have validated the length, but maybe
> it would be safer to switch to strlcpy(), instead?

OK, since that is the pattern used, I changed to strlcpy. But since we are doing
checks in advance, I think it will be safe either way.

--
Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Bertrand Drouvot
Date:
Hi,

On Wed, Aug 27, 2025 at 02:13:39PM -0500, Sami Imseih wrote:
> Thanks for reviewing!
> 
> > === 1
> >
> > We need to check if tranche_name is NULL and report an error if that's the case.
> > If not, strlen() would segfault.
> 
> Added an error. Good call. The error message follows previously used
> convention.
> 
> ```
> +       if (!tranche_name)
> +               elog(ERROR, "tranche name cannot be null");
> ```

+LWLockNewTrancheId(const char *tranche_name)
 {
-       int                     result;
-       int                *LWLockCounter;
+       int                     tranche_id,
+                               index,
+                          *LWLockCounter;
+       Size            tranche_name_length = strlen(tranche_name) + 1;

-       LWLockCounter = (int *) ((char *) MainLWLockArray - sizeof(int));
-       /* We use the ShmemLock spinlock to protect LWLockCounter */
-       SpinLockAcquire(ShmemLock);
-       result = (*LWLockCounter)++;
-       SpinLockRelease(ShmemLock);
+       if (!tranche_name)
+               elog(ERROR, "tranche name cannot be null");

The check has to be done before the strlen() call; otherwise it segfaults:

Breakpoint 1, LWLockNewTrancheId (tranche_name=0x0) at lwlock.c:569
569             Size            tranche_name_length = strlen(tranche_name) + 1;
(gdb) n

Program received signal SIGSEGV, Segmentation fault.
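
i.e., the strlen() has to move below the check, something like:

+       if (!tranche_name)
+               elog(ERROR, "tranche name cannot be null");
+
+       tranche_name_length = strlen(tranche_name) + 1;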

> > === 2
> >
> > +       if (tranche_name_length > MAX_NAMED_TRANCHES_NAME_LEN)
> > +               elog(ERROR, "tranche name too long");
> >
> > I think that we should mention in the doc that the tranche name is limited to
> > 63 bytes.
> 
> Done. I just mentioned NAMEDATALEN - 1 in the docs.

Most of the places where NAMEDATALEN is mentioned in sgml files also mention
it's 64 bytes. Should we do the same here?

> > === 3
> >
> > I was skeptical about using strcpy() while we hold a spinlock. I do see some
> > examples with strlcpy() though (walreceiver.c for example), so that looks OK-ish.
> >
> > Using strcpy() might be OK too, as we already have validated the length, but maybe
> > it would be safer to switch to strlcpy(), instead?
> 
> OK, since that is the pattern used, I changed to strlcpy. But since we are doing
> checks in advance, I think it will be safe either way.

Agree.

- * stores the names of all dynamically-created tranches known to the current
- * process.  Any unused entries in the array will contain NULL.
+ * stores the names of all dynamically created tranches in shared memory.
+ * Any unused entries in the array will contain NULL. This variable is
+ * non-static so that postmaster.c can copy it to child processes in
+ * EXEC_BACKEND builds.

"Any unused entries in the array will contain NULL": this is not true anymore.
It now contains empty strings:

(gdb) p *(char(*)[64])NamedLWLockTrancheArray@4
$5 = {"tutu", '\000' <repeats 59 times>, '\000' <repeats 63 times>, '\000' <repeats 63 times>, '\000' <repeats 63 times>}

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> The check has to be done before the strlen() call; otherwise it segfaults:

I don't know what I was thinking there :( silly mistake.
It was also missed in RequestNamedLWLockTranche.

> Most of the places where NAMEDATALEN is mentioned in sgml files also mention
> it's 64 bytes. Should we do the same here?

I don't have an issue with that, although I did not like how the various docs
are inconsistent. Some claim 64 characters, others claim 63 bytes. They are
not the same, since multibyte characters are allowed. For example, [0] is
definitely wrong and should say "bytes". I think it would be nice if all
references to NAMEDATALEN pointed to a single doc (a follow-up topic, perhaps).

> "Any unused entries in the array will contain NULL": this is not true anymore.
> It now contains empty strings:

Yes. Fixed the comment.

Also, it has not yet been discussed what the max number of named tranches
should be. Thus far I have been using a likely extremely high value of 1024.
In v15, I set it to 128, as that is a more realistic number and one suggested
earlier [1].

There may be some odd cases out there in which this value is not enough:
many extensions that initialize LWLocks, or some extension that uses
partitioned locks and assigns a unique tranche to every lock. I have not seen
anything like this, so maybe it's not something to worry about and we can say
128 is sufficiently large. I would hate to reserve too much shared memory for
this unnecessarily.

Thoughts?

[0] https://www.postgresql.org/docs/18/runtime-config-logging.html#GUC-APPLICATION-NAME
[1] https://www.postgresql.org/message-id/5to6tftuml6nkas4jaaljfzecasvslxq3mumeslh74wsol4mzw%40rgxpxxlqqwtf


--
Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
I've spent some time getting this prepared for commit, so apologies if I am
steamrolling over some of the latest discussion points.  The majority of
what I've changed amounts to what I'd characterize as relatively minor
editorialization, but I'd imagine reasonable people could disagree.  The
bigger things I've changed include:

* I've moved the DSM registry adjustments to be a prerequisite patch.  I
  noticed that we can also stop tracking the tranche names there since we
  always use the name of the DSM registry entry.

* I modified the array of tranche names to be a char ** to simplify
  lookups.  I also changed how it is first initialized in CreateLWLocks() a
  bit.

* I've left out the tests for now.  Those are great for the development
  phase, but I'm not completely sold on committing them.  In any case, I'd
  probably treat that as a follow-up effort once this stuff is committed.

I think this patch set will require reworking the "GetNamedLWLockTranche
crashes on Windows in normal backend" patch [0], but AFAICT we can easily
adjust it to scan through NamedLWLockTrancheNames instead.

Overall, I'm pretty happy about these patches.  They simplify the API and
the code while also fixing the problem with tranche name visibility.

[0] https://postgr.es/m/CAA5RZ0v1_15QPg5Sqd2Qz5rh_qcsyCeHHmRDY89xVHcy2yt5BQ%40mail.gmail.com

-- 
nathan

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> * I modified the array of tranche names to be a char ** to simplify
>   lookups.  I also changed how it is first initialized in CreateLWLocks() a
>   bit.

That works also.

> * I've left out the tests for now.  Those are great for the development
>   phase, but I'm not completely sold on committing them.  In any case, I'd
>   probably treat that as a follow-up effort once this stuff is committed.

I was mainly using them to verify the functionality since we lack tests in
this area. I think long-term, it will be good to have test coverage. We don't
need to decide now.

> I think this patch set will require reworking the "GetNamedLWLockTranche
> crashes on Windows in normal backend" patch [0], but AFAICT we can easily
> adjust it to scan through NamedLWLockTrancheNames instead.

Except we will need to turn the char **NamedLWLockTrancheNames into a struct
that will hold the num_lwlocks as well, something like the following. That is doable.

```
typedef struct NamedLWLockTranche
{
    char        trancheName[NAMEDATALEN];
    int            num_lwlocks;
} NamedLWLockTranche;
```
If there is no interest in backpatching [0], maybe we should just make this
change as part of this patch set. What do you think? I can make this change
in v18.

> Overall, I'm pretty happy about these patches.  They simplify the API and
> the code while also fixing the problem with tranche name visibility.

Just a few things that were discussed earlier, that I incorporated now.

1/ We should be checking that tranche_name is NOT NULL when
LWLockNewTrancheId or RequestNamedLWLockTranche is called.
Also check for the length of the tranche name inside
RequestNamedLWLockTranche, instead of relying on an Assert to check
the length. I think that is much safer.

2/ Bertrand suggested to use strlcpy instead of strcpy when copying while
holding a spinlock [1]

3/ There was a doc update to mention the 63-byte tranche length. We can
skip that for now, since the ERROR message in v16 is clear about the
limit.


[0] https://postgr.es/m/CAA5RZ0v1_15QPg5Sqd2Qz5rh_qcsyCeHHmRDY89xVHcy2yt5BQ%40mail.gmail.com
[1] https://www.postgresql.org/message-id/CAA5RZ0ve-fHuNYW-ruMwg1y1v7-aCqMm_MiNq1KOdg2Y2-pKDw%40mail.gmail.com

--
Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Bertrand Drouvot
Date:
Hi,

On Thu, Aug 28, 2025 at 05:53:23PM -0500, Sami Imseih wrote:
> Just a few things that were discussed earlier, that I incorporated now.
> 
> 1/ We should be checking that tranche_name is NOT NULL when
> LWLockNewTrancheId or RequestNamedLWLockTranche is called.

Right, if not strlen() does segfault.

In addition to checking for NULL, should we also check for empty string? Currently,
the patch does accept strlen(tranche_name) == 0.

> Also check for the length of the tranche name inside
> RequestNamedLWLockTranche, instead of relying on an Assert to check
> the length. I think that is much safer.

Same remark as above but in RequestNamedLWLockTranche().

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> On Thu, Aug 28, 2025 at 05:53:23PM -0500, Sami Imseih wrote:
> > Just a few things that were discussed earlier, that I incorporated now.
> >
> > 1/ We should be checking that tranche_name is NOT NULL when
> > LWLockNewTrancheId or RequestNamedLWLockTranche is called.
>
> Right, if not strlen() does segfault.
>
> In addition to checking for NULL, should we also check for empty string? Currently,
> the patch does accept strlen(tranche_name) == 0.

I am not inclined to prevent an empty string. It's currently allowed, and I'd
rather not change that.

> ```
> typedef struct NamedLWLockTranche
> {
>    char        trancheName[NAMEDATALEN];
>    int            num_lwlocks;
> } NamedLWLockTranche;
> ```
> If there is no interest in backpatching [0], maybe we should just make this
> change as part of this patch set. What do you think? I can make this change
> in v18.

Here is v18. It includes a third patch to fix the issue identified in [0],
which can be applied to HEAD as part of this thread. If we want to backpatch
the stable branches, the version in [0] is suitable.

Note that I created an LWLockNewTrancheIdInternal which takes a tranche
name and a number of lwlocks. The Internal version is used during startup,
when requested lwlocks are appended to shared memory, and the existing
LWLockNewTrancheId calls the internal version with 0 lwlocks.
This keeps all the logic for appending a new tranche (while holding
the spinlock) in the same routine.
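
Roughly this shape (a sketch only; the real bodies are in the attached patch):

```
static int
LWLockNewTrancheIdInternal(const char *tranche_name, int num_lwlocks)
{
    int         result;

    /* validate tranche_name (NULL check, length check) ... */

    /* append the tranche while holding the spinlock */
    SpinLockAcquire(ShmemLock);
    result = (*LWLockCounter)++;
    /* ... bounds check, copy tranche_name, record num_lwlocks ... */
    SpinLockRelease(ShmemLock);

    return result;
}

int
LWLockNewTrancheId(const char *tranche_name)
{
    /* the public entry point requests no named LWLocks */
    return LWLockNewTrancheIdInternal(tranche_name, 0);
}
```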


--
Sami

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Thu, Aug 28, 2025 at 05:53:23PM -0500, Sami Imseih wrote:
>> I think this patch set will require reworking the "GetNamedLWLockTranche
>> crashes on Windows in normal backend" patch [0], but AFAICT we can easily
>> adjust it to scan through NamedLWLockTrancheNames instead.
> 
> Except we will need to turn the char **NamedLWLockTrancheNames into a struct
> that will hold the num_lwlocks as well, something like the following. That is doable.

Ah, right.

> If there is no interest in backpatching [0], maybe we should just make this
> change as part of this patch set. What do you think? I can make this change
> in v18.

I think we should keep that separate.  I don't see any benefit to combining
it, and it'd be good to keep the discussion in a single thread in the
archives for future reference.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
I've committed the DSM registry changes.  I apologize for the lackluster
commit message.  My attempts at including all the details ended up being
super wordy and hard to read, so I decided to keep it terse and leave out
some of the context.

I've also attached a rebased patch that addresses all the latest feedback.
A reworked version of the test patch is also included, but that's mostly
intended for CI purposes and is still not intended for commit (yet).

-- 
nathan

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Fri, Aug 29, 2025 at 09:51:38PM -0500, Nathan Bossart wrote:
> I've also attached a rebased patch that addresses all the latest feedback.
> A reworked version of the test patch is also included, but that's mostly
> intended for CI purposes and is still not intended for commit (yet).

And here's an attempt at fixing the alignment problems revealed by cfbot.

-- 
nathan

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
>> I've also attached a rebased patch that addresses all the latest feedback.
>> A reworked version of the test patch is also included, but that's mostly
>> intended for CI purposes and is still not intended for commit (yet).

> And here's an attempt at fixing the alignment problems revealed by cfbot.

This LGTM. Once we get this committed, I will follow up on [0].

[0] https://www.postgresql.org/message-id/CAA5RZ0v1_15QPg5Sqd2Qz5rh_qcsyCeHHmRDY89xVHcy2yt5BQ%40mail.gmail.com

--
Sami



Re: Improve LWLock tranche name visibility across backends

From
Bertrand Drouvot
Date:
Hi,

On Sat, Aug 30, 2025 at 09:14:46AM -0500, Nathan Bossart wrote:
> On Fri, Aug 29, 2025 at 09:51:38PM -0500, Nathan Bossart wrote:
> > I've also attached a rebased patch that addresses all the latest feedback.
> > A reworked version of the test patch is also included, but that's mostly
> > intended for CI purposes and is still not intended for commit (yet).
> 
> And here's an attempt at fixing the alignment problems revealed by cfbot.

Indeed:

==24409==Using libbacktrace symbolizer.
../src/backend/storage/lmgr/lwlock.c:441:31: runtime error: store to misaligned address 0x7f7644188904 for type 'char *', which requires 8 byte alignment

Changes look good.

Not directly related, but I think that we can get rid of:

size = add_size(size, LWLOCK_PADDED_SIZE);

in LWLockShmemSize() and of:

ptr += LWLOCK_PADDED_SIZE - ((uintptr_t) ptr) % LWLOCK_PADDED_SIZE;

in CreateLWLocks(), and just make use of CACHELINEALIGN().
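
i.e., in CreateLWLocks(), roughly (a sketch on top of v20, relying on
LWLOCK_PADDED_SIZE being defined as PG_CACHE_LINE_SIZE):

-       ptr += LWLOCK_PADDED_SIZE - ((uintptr_t) ptr) % LWLOCK_PADDED_SIZE;
+       ptr = (char *) CACHELINEALIGN(ptr);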

Attached, a patch doing so. It applies on top of your v20.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Rahila Syed
Date:
Hi,


> I've also attached a rebased patch that addresses all the latest feedback.
> A reworked version of the test patch is also included, but that's mostly
> intended for CI purposes and is still not intended for commit (yet).


Please see below for some comments regarding the v20 patch.

1. The following unused declaration is present in the lwlock.h file.

 extern void LWLockRegisterTranche(int tranche_id, const char *tranche_name);  

2. I am struggling to understand the use of LocalLWLockCounter in the patch.

If the intention is to delay acquiring the spinlock when trancheId is less
than LocalLWLockCounter, it seems possible that we might read
NamedLWLockTrancheNames in the GetLWTrancheName function after
LocalLWLockCounter has been updated, but before NamedLWLockTrancheNames has
been updated in the LWLockNewTrancheId function.
To prevent reading an outdated tranche name, we should acquire the spinlock
unconditionally in the GetLWTrancheName function.

3. 
+               ereport(ERROR,
+                               (errmsg("maximum number of tranches already registered"),
+                                errdetail("No more than %d tranches may be registered.",
+                                                  MAX_NAMED_TRANCHES)));
+       }

Since this patch sets a maximum limit on the number of LWLock tranches
that can be registered, would it make sense to offer a configurable
option rather than using a hard-coded MAX_NAMED_TRANCHES? This would allow
users who have reached the maximum limit to register their LWLocks.

Thank you,
Rahila Syed
  

Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Mon, Sep 01, 2025 at 10:18:46AM +0000, Bertrand Drouvot wrote:
> Changes look good.

Thanks for looking.

> Not directly related, but I think that we can get rid of:
> 
> size = add_size(size, LWLOCK_PADDED_SIZE);
> 
> in LWLockShmemSize() and of:
> 
> ptr += LWLOCK_PADDED_SIZE - ((uintptr_t) ptr) % LWLOCK_PADDED_SIZE;
> 
> in CreateLWLocks(), and just make use of CACHELINEALIGN().

Let's take care of the tranche name stuff first.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Mon, Sep 01, 2025 at 06:35:55PM +0530, Rahila Syed wrote:
> Please see below for some comments regarding v20 patch.

Thanks for looking.

> 1. The following unused declaration is present in the lwlock.h file.
> 
>  extern void LWLockRegisterTranche(int tranche_id, const char *tranche_name);

Whoops, good catch.

> If the intention is to delay acquiring the spinlock when trancheId is
> less than LocalLWLockCounter, it seems possible that we might read
> NamedLWLockTrancheNames in the GetLWTrancheName function after
> LocalLWLockCounter has been updated, but before NamedLWLockTrancheNames
> has been updated in the LWLockNewTrancheId function. To prevent reading
> an outdated tranche name, we should acquire the spinlock unconditionally
> in the GetLWTrancheName function.

We only ever add new tranche names to the end of the list, so we should be
able to avoid the lock for the vast majority of tranche name lookups.  We
hold the lock while we increment LWLockCounter and add a name, and also
whenever we update LocalLWLockCounter.  This should prevent any unsafe
accesses to NamedLWLockTrancheNames.

I've added some additional commentary about this in v21.
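
To make the protocol concrete, the fast path ends up looking roughly like
this (a simplified sketch of the v21 logic for user-defined tranche IDs;
built-in tranches are resolved before this point):

```
    /*
     * Names are only ever appended, and the counter bump and the name
     * copy happen in one spinlock-protected section, so any id below our
     * local copy of the counter already has a fully written name and can
     * be read without taking the lock.
     */
    if (trancheId >= LocalLWLockCounter)
    {
        /* refresh the local copy of the shared counter */
        SpinLockAcquire(ShmemLock);
        LocalLWLockCounter = *LWLockCounter;
        SpinLockRelease(ShmemLock);

        /* still unknown, fall back to the generic name */
        if (trancheId >= LocalLWLockCounter)
            return "extension";
    }

    return NamedLWLockTrancheNames[trancheId - LWTRANCHE_FIRST_USER_DEFINED];
```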

> +               ereport(ERROR,
> +                               (errmsg("maximum number of tranches already registered"),
> +                                errdetail("No more than %d tranches may be registered.",
> +                                                  MAX_NAMED_TRANCHES)));
> +       }
> 
> Since this patch sets a maximum limit on the number of LWLock tranches
> that can be registered, would it make sense to offer a configurable
> option rather than using a hard-coded MAX_NAMED_TRANCHES? This would allow
> users who have reached the maximum limit to register their LWLocks.

Hm.  I'm not sure.  I think it would be good to offer a way to accommodate
more tranche names that didn't involve recompiling, but I don't know
whether that situation is likely enough in practice to add yet another GUC.
Thus far, we've been operating under the assumption that we'll be able to
choose a number far beyond any realistic usage.  If others have opinions
about this, please share...

-- 
nathan

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
Committed.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Rahila Syed
Date:
Hi Nathan,
 
> We only ever add new tranche names to the end of the list, so we should be
> able to avoid the lock for the vast majority of tranche name lookups.  We
> hold the lock while we increment LWLockCounter and add a name, and also
> whenever we update LocalLWLockCounter.  This should prevent any unsafe
> accesses to NamedLWLockTrancheNames.
>
> I've added some additional commentary about this in v21.

Thanks for adding the comments.
I think there may be a brief period in the current code where unsafe access to
LWLockTrancheNames is possible.

Updates to LocalLWLockCounter and LWLockTrancheNames are performed while
holding the lock, but when trancheID < LocalLWLockCounter, GetLWTrancheName
reads LocalLWLockCounter and then LWLockTrancheNames without acquiring that
lock. There is a small window between updating LocalLWLockCounter and adding
the name to LWLockTrancheNames; during this window, if GetLWTrancheName is
called, it might attempt to access a name in LWLockTrancheNames that hasn't
been added yet.

I'll see if I can demonstrate the situation with a test.

 

> > +               ereport(ERROR,
> > +                               (errmsg("maximum number of tranches already registered"),
> > +                                errdetail("No more than %d tranches may be registered.",
> > +                                                  MAX_NAMED_TRANCHES)));
> > +       }
> >
> > Since this patch sets a maximum limit on the number of LWLock tranches
> > that can be registered, would it make sense to offer a configurable
> > option rather than using a hard-coded MAX_NAMED_TRANCHES? This would allow
> > users who have reached the maximum limit to register their LWLocks.
>
> Hm.  I'm not sure.  I think it would be good to offer a way to accommodate
> more tranche names that didn't involve recompiling, but I don't know
> whether that situation is likely enough in practice to add yet another GUC.
> Thus far, we've been operating under the assumption that we'll be able to
> choose a number far beyond any realistic usage.  If others have opinions
> about this, please share...

 
Thank you for clarifying the rationale behind the decision.

Thank you,
Rahila Syed

Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Thu, Sep 04, 2025 at 03:29:52PM +0530, Rahila Syed wrote:
> Updates to LocalLWLockCounter and LWLockTrancheNames are performed while
> holding the lock, but when trancheID < LocalLWLockCounter, GetLWTrancheName
> reads LocalLWLockCounter and then LWLockTrancheNames without acquiring that
> lock. There is a small window between updating LocalLWLockCounter and adding
> the name to LWLockTrancheNames; during this window, if GetLWTrancheName is
> called, it might attempt to access a name in LWLockTrancheNames that hasn't
> been added yet.

We hold the lock when reading/writing the shared-memory LWLockCounter.
There's no need to take a lock when reading the backend-local
LocalLWLockCounter, as it can't be modified by any other process.
Additionally, when adding a new tranche name, we hold the lock the entire
time while we bump the shared-memory counter and copy the name, so there's
no chance another backend will see the counter update but not the name or
attempt to use the same LWLockTrancheNames slot.

-- 
nathan



Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Wed, Sep 03, 2025 at 02:01:14PM -0500, Nathan Bossart wrote:
> Committed.

I'm having some regrets about the changes to RequestNamedLWLockTranche().
Specifically, when it is first called, it immediately allocates an array
big enough to hold 256 requests (~17 KB), whereas it used to only allocate
space for 16 requests (~1 KB) and resize as needed.  Furthermore, the
MAX_NAMED_TRANCHES check isn't actually needed because InitializeLWLocks()
will do the same check via its calls to LWLockNewTrancheId() for all the
named tranche requests.

-- 
nathan

Attachments

Re: Improve LWLock tranche name visibility across backends

From
Sami Imseih
Date:
> I'm having some regrets about the changes to RequestNamedLWLockTranche().
> Specifically, when it is first called, it immediately allocates an array
> big enough to hold 256 requests (~17 KB), whereas it used to only allocate
> space for 16 requests (~1 KB) and resize as needed.

I liked removing the repalloc calls inside this routine and did not think
it was worth optimizing, so I am OK with reverting it back. However, v1
is incorrect, since it's still initializing
NamedLWLockTrancheRequestArray to MAX_NAMED_TRANCHES
```

     if (NamedLWLockTrancheRequestArray == NULL)
     {
+        NamedLWLockTrancheRequestsAllocated = 16;
         NamedLWLockTrancheRequestArray = (NamedLWLockTrancheRequest *)
             MemoryContextAlloc(TopMemoryContext,
                                MAX_NAMED_TRANCHES
                                * sizeof(NamedLWLockTrancheRequest));
     }

```

Instead of MAX_NAMED_TRANCHES, it should be
NamedLWLockTrancheRequestsAllocated.
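
i.e., the allocation should be:

```
     if (NamedLWLockTrancheRequestArray == NULL)
     {
+        NamedLWLockTrancheRequestsAllocated = 16;
         NamedLWLockTrancheRequestArray = (NamedLWLockTrancheRequest *)
             MemoryContextAlloc(TopMemoryContext,
-                               MAX_NAMED_TRANCHES
+                               NamedLWLockTrancheRequestsAllocated
                                * sizeof(NamedLWLockTrancheRequest));
     }
```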

Also, previously NamedLWLockTrancheRequestsAllocated was global, but I don't
think it should ever be used outside of this function, so it's OK to declare
it as you have.

> Furthermore, the
> MAX_NAMED_TRANCHES check isn't actually needed because InitializeLWLocks()
> will do the same check via its calls to LWLockNewTrancheId() for all the
> named tranche requests.

I thought about that one and decided to add the error message there, since
requesting a tranche happens way before LWLockNewTrancheId is called
during CreateLWLocks, so it was more about erroring out slightly earlier.
But it may be OK to also just remove it.

--
Sami Imseih
Amazon Web Services (AWS)



Re: Improve LWLock tranche name visibility across backends

From
Nathan Bossart
Date:
On Thu, Sep 04, 2025 at 12:30:27PM -0500, Sami Imseih wrote:
> I liked removing the repalloc calls inside this routine and did not think
> it was worth optimizing, so I am OK with reverting it back. However, v1
> is incorrect, since it's still initializing
> NamedLWLockTrancheRequestArray to MAX_NAMED_TRANCHES

Committed with that fix.

>> Furthermore, the
>> MAX_NAMED_TRANCHES check isn't actually needed because InitializeLWLocks()
>> will do the same check via its calls to LWLockNewTrancheId() for all the
>> named tranche requests.
> 
> I thought about that one and decided to add the error message there, since
> requesting a tranche happens way before LWLockNewTrancheId is called
> during CreateLWLocks, so it was more about erroring out slightly earlier.
> But it may be OK to also just remove it.

We needed it before because the array could only ever hold
MAX_NAMED_TRANCHES requests.

-- 
nathan