Re: [PATCH] Add Windows support for backtrace_functions (MSVC only)
| От | Bryan Green | 
|---|---|
| Тема | Re: [PATCH] Add Windows support for backtrace_functions (MSVC only) | 
| Дата | |
| Msg-id | 31f43d88-9949-48c5-9497-b9a9b7cdee2b@gmail.com обсуждение исходный текст  | 
		
| Ответ на | Re: [PATCH] Add Windows support for backtrace_functions (MSVC only) (Bryan Green <dbryan.green@gmail.com>) | 
| Ответы | 
                	
            		Re: [PATCH] Add Windows support for backtrace_functions (MSVC only)
            		
            		 | 
		
| Список | pgsql-hackers | 
On 10/31/2025 2:07 PM, Bryan Green wrote:
> On 10/31/2025 1:46 PM, Euler Taveira wrote:
>> On Thu, Oct 30, 2025, at 11:51 AM, Bryan Green wrote:
>>> I had reservations about the value the tests were adding, and
>>> considering I am getting more concern around having the tests than not
>>> having them for this initial release I have decided to remove them.  v4
>>> patch is attached.  It is the same as the initial 0001-* patch.
>>>
>>
>> I spent some time on this patch. Here are some comments and suggestions.
>>
> 
> Thanks for the review.
> 
>> +#ifdef _MSC_VER
>> +#include <windows.h>
>> +#include <dbghelp.h>
>> +static bool win32_backtrace_symbols_initialized = false;
>> +static HANDLE win32_backtrace_process = NULL;
>> +#endif
>>
>> We usually a different style. Headers go to the top on the same 
>> section as
>> system headers and below postgres.h. It is generally kept sorted. The 
>> same
> 
> Will fix. I will rework the style (error and otherwise) to follow the 
> project tradition.
> 
>> applies to variables. Add them near the other static variables. BTW 
>> does it need
>> the win32_ prefix for a Window-only variable?
>>
>> +        wchar_t        buffer[sizeof(SYMBOL_INFOW) + MAX_SYM_NAME * 
>> sizeof(wchar_t)];
>> +        PSYMBOL_INFOW symbol;
>>
>> According to [1], SYMBOL_INFO is an alias that automatically selects 
>> ANSI vs
>> UNICODE. Shouldn't we use it instead of SYMBOL_INFOW?
>>
> 
> Good point. I was being overly explicit about wanting wide chars, but 
> you're right that the generic versions are the way to go.
> 
>> +                elog(WARNING, "SymInitialize failed with error %lu", 
>> error);
>>
>> Is there a reason to continue if SymInitialize failed? It should 
>> return after
> 
> None at all.  Will return immediately.
> 
>> printing the message. Per the error message style guide [2], my 
>> suggestion is
>> "could not initialize the symbol handler: error code %lu". You can 
>> also use
>> GetLastError() directly in the elog call.
>>
>> +        symbol = (PSYMBOL_INFOW) buffer;
>> +        symbol      ->MaxNameLen = MAX_SYM_NAME;
>> +        symbol      ->SizeOfStruct = sizeof(SYMBOL_INFOW);
>>
>> We generally don't add spaces between variable and a member.
>>
>> +        DWORD        i;
>>
>> I'm curious why did you declare this variable as DWORD? Shouldn't int be
>> sufficient? The CaptureStackBackTrace function returns an unsigned short
>> (UShort). You can also declare it in the for loop.
>>
> 
> Out of habit.  I will change it to int.
> 
>> +            DWORD64        address = (DWORD64) (stack[i]);
>>
>> The parenthesis around stack is superfluous. The code usually doesn't 
>> contain
>> additional parenthesis (unless it improves readability).
>>
> 
> I will remove the parenthesis.
> 
>> +        if (frames == 0)
>> +        {
>> +            appendStringInfoString(&errtrace, "\nNo stack frames 
>> captured");
>> +            edata->backtrace = errtrace.data;
>> +            return;
>> +        }
>>
>> It seems CaptureStackBackTrace function may return zero frames under 
>> certain
>> conditions. It is a good point having this additional message. I 
>> noticed that
>> the current code path (HAVE_BACKTRACE_SYMBOLS) doesn't have this 
>> block. IIUC,
>> in certain circumstances (ARM vs unwind-tables flag), the backtrace() 
>> also
>> returns zero frames. Should we add this block for the backtrace() code 
>> path?
>>
> 
> Probably, though that seems like separate cleanup. Want me to include it
> here or handle separately (in another patch)?
> 
>> +            sym_result = SymFromAddrW(win32_backtrace_process,
>> +                                      address,
>> +                                      &displacement,
>> +                                      symbol);
>>
>> You should use SymFromAddr, no? [3] I saw that you used the Unicode 
>> functions
>> instead of the generic functions [4].
>>
>> +                    /* Convert symbol name to UTF-8 */
>> +                    utf8_len = WideCharToMultiByte(CP_UTF8, 0, 
>> symbol->Name, -1,
>> +                                                   NULL, 0, NULL, NULL);
>> +                    if (utf8_len > 0)
>> +                    {
>> +                        char       *filename_utf8;
>> +                        int            filename_len;
>> +
>> +                        utf8_buffer = palloc(utf8_len);
>> +                        WideCharToMultiByte(CP_UTF8, 0, symbol->Name, 
>> -1,
>> +                                            utf8_buffer, utf8_len, 
>> NULL, NULL);
>>
>> +                    /* Convert symbol name to UTF-8 */
>> +                    utf8_len = WideCharToMultiByte(CP_UTF8, 0, 
>> symbol->Name, -1,
>> +                                                   NULL, 0, NULL, NULL);
>>
>> You are calling WideCharToMultiByte twice. The reason is to allocate 
>> the exact
>> memory size. However, you can adopt another logic to avoid the first 
>> call.
>>
>> maxlen = symbol->NameLen * pg_database_encoding_max_length();
>> symbol_name = palloc(maxlen + 1);
>>
>> (I suggest symbol_name instead of ut8_buffer.)
>>
>> You are considering only the UTF-8 case. Shouldn't it use wcstombs or
>> wcstombs_l? Maybe (re)use wchar2char -- see pg_locale_libc.c.
>>
> 
> Hmm, you're probably right. I was thinking these Windows API strings 
> needed special handling, but wchar2char should handle the conversion to 
> database encoding correctly. Let me test that approach.
> 
>> +                        if (filename_len > 0)
>> +                        {
>> +                            filename_utf8 = palloc(filename_len);
>> +                            WideCharToMultiByte(CP_UTF8, 0, 
>> line.FileName, -1,
>> +                                                filename_utf8, 
>> filename_len,
>> +                                                NULL, NULL);
>> +
>> +                            appendStringInfo(&errtrace,
>> +                                             "\n%s+0x%llx [%s:%lu]",
>> +                                             utf8_buffer,
>> +                                             (unsigned long long) 
>> displacement,
>> +                                             filename_utf8,
>> +                                             (unsigned long) 
>> line.LineNumber);
>> +
>> +                            pfree(filename_utf8);
>> +                        }
>> +                        else
>> +                        {
>> +                            appendStringInfo(&errtrace,
>> +                                             "\n%s+0x%llx [0x%llx]",
>> +                                             utf8_buffer,
>> +                                             (unsigned long long)
>> displacement,
>> +                                             (unsigned long long) 
>> address);
>> +                        }
>>
>> Maybe I missed something but is there a reason for not adding the 
>> address in the
>> first condition?
>>
> 
> No particular reason. I was trying to keep it concise when we have file/ 
> line info, but for consistency it probably should be there.
> 
> Will send v5 with these fixes.
> 
>>
>> [1] https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/ns- 
>> dbghelp-symbol_infow
>> [2] https://www.postgresql.org/docs/current/error-style-guide.html
>> [3] https://learn.microsoft.com/en-us/windows/win32/api/dbghelp/nf- 
>> dbghelp-symfromaddrw
>> [4] https://learn.microsoft.com/en-us/windows/win32/intl/conventions- 
>> for-function-prototypes
>>
>>
> 
> Thanks again for the review,
> Bryan
Hi all,
v5 patch attached, incorporating all of Euler's feedback with a caveat 
around unicode.
The most interesting aspect turned out to be the encoding conversion for
symbol names and file paths. Initially I tried using the generic 
SYMBOL_INFO and SymFromAddr functions as Euler suggested, but ran into a 
subtle issue: on PostgreSQL's Windows builds, these become SYMBOL_INFOA 
and SymFromAddrA (the ANSI versions), which return strings in whatever 
Windows ANSI codepage happens to be active (CP1252, etc). This doesn't 
necessarily match the database encoding.
When I tried converting these with wchar2char(), it failed because the
input wasn't actually wide characters - leading to backtraces showing 
only raw addresses even though symbols were present.
The solution was to use the explicit Unicode versions (SYMBOL_INFOW and
SymFromAddrW), which reliably return UTF-16 strings that wchar2char() 
can properly convert to the database encoding. This handles both UTF-8 
and non-UTF-8 databases correctly, and wchar2char() gracefully returns 
-1 on conversion failure rather than throwing errors during error 
handling. Of course this also necessitated using IMAGEHLP_LINEW64 and 
SymGetLineFromAddrW64.
Tested with both UTF-8 and WIN1252 databases - backtraces now show 
proper symbol names in both cases.
This patch also adds a check for zero frames returned by backtrace() on
Unix/Linux platforms, which can occur in certain circumstances such as 
ARM builds without unwind tables.
Bryan
		
	Вложения
В списке pgsql-hackers по дате отправления: