Don't crash when walking callstack roots in the debugger#124402
Don't crash when walking callstack roots in the debugger#124402leculver wants to merge 1 commit intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @agocke |
There was a problem hiding this comment.
Pull request overview
This pull request fixes a critical crash that occurs when debuggers walk callstack roots on Linux amd64 coredumps. The root cause is that context pointers in DAC (Data Access Component) builds resolve through a cache that can evict entries before the pointers are consumed, leading to invalid memory access.
Changes:
- Fixed AMD64 FaultingExceptionFrame to use local context pointers in DAC builds instead of frame member references
- Extended SoftwareExceptionFrame context pointer fix from X86-only to all architectures
- Added defensive checks in GC info decoder to use captured register values in DAC scenarios across all architectures
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/coreclr/vm/amd64/cgenamd64.cpp | Added DACCESS_COMPILE guards in FaultingExceptionFrame::UpdateRegDisplay_Impl to point context pointers at local pCurrentContext copy instead of m_ctx members |
| src/coreclr/vm/excep.cpp | Expanded DACCESS_COMPILE context pointer fix from TARGET_X86 to all architectures in SoftwareExceptionFrame::UpdateRegDisplay_Impl |
| src/coreclr/vm/gcinfodecoder.cpp | Added DACCESS_COMPILE && TARGET_UNIX guards to use GetCapturedRegister instead of GetRegisterSlot in ReportRegisterToGC and GetStackSlot across all architectures |
src/coreclr/vm/gcinfodecoder.cpp
Outdated
|
|
||
| OBJECTREF* pObjRef = GetRegisterSlot( regNum, pRD ); | ||
| OBJECTREF* pObjRef; | ||
| #if defined(DACCESS_COMPILE) && defined(TARGET_UNIX) |
There was a problem hiding this comment.
GetCapturedRegister is only defined under TARGET_UNIX && !FEATURE_NATIVEAOT. I can probably change that if you want me to expand its availability. I'm not deeply familiar with this area of the codebase anymore, so I was just trying to follow conventions.
There was a problem hiding this comment.
@janvorli Something like this I think? I updated the PR to widen that function to all platforms.
There was a problem hiding this comment.
GetCapturedRegister was a workaround for HelperMethodFrames. We got rid of HelperMethodFrames. GetCapturedRegister is dead code at the moment that can be deleted (or kept under DECODE_OLD_FORMATS to make it clear that is not used anymore).
Does the debugger fix require GetCapturedRegister part of the change?
There was a problem hiding this comment.
Ahh that's great! No it doesn't. Let me strip that out of here then, no need to complicate dead code. I'll retest without this to make sure.
There was a problem hiding this comment.
Ok, I've removed that side of the code. It was only an extra check here, the current changeset fixes the issue.
|
Looking at the FaultingExceptionFrame::UpdateRegDisplay_Impl variants, it seems the other architectures would suffer from the same issue. |
|
I'll take a look and keep working on it when I'm back in office Tuesday morning. |
Changeset 1fa1745 introduced a stack walking regression where we will crash trying to unwind/report registers. The real fix here is in cgenamd64.cpp/excep.cpp, which will point the context to a local copy before we try to use it.
I've also added some changes in gcinfodecoder.cpp that aren't strictly necessary but would have avoided the problem to begin with.
Fixes #124401.