Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix foreign exception handling on Windows #113323

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

janvorli
Copy link
Member

@janvorli janvorli commented Mar 10, 2025

There is a problem in a special case with the new
exception handling:

  • A managed code calls a 3rd party native C++ code
  • then a C++ exception is thrown in that code
  • then it is propagated through managed code
  • then it is caught in a 3rd party native C++ code
  • then a managed callback is called from the catch handler
  • then the exception is rethrown using C++ "throw"
  • then it propagates back to the initial managed caller

The problem is as follows:

  • When a native exception passes through managed frames and it is rethrown once a native frame is encountered, the RaiseException doesn't use the original exception's EXCEPTION_RECORD, but creates a new CLR exception instead.
  • When a managed callback called from the native C++ catch handler throws and catches another exception, the LastThrownObject is cleared.
  • When the C++ code in the catch handler rethrows the exception and it is propagated to managed code, the ProcessCLRException personality routine is invoked for the first managed frame. It can see that the exception is a CLR exception, but the LastThrownObject is NULL. It asserts in debug builds and crashes a bit later in release builds attempting to dereference the NULL.

The fix is to pass the original EXCEPTION_RECORD to the RaiseException in case an external exception is propagated. And use just RaiseException instead of the machinery for regular managed exceptions.

This is yet another case of a fix for backward compatibility w.r.t. the legacy exception handling.

The issue was discovered in Autodesk Revit 2025 that does this kind of exception propagation.

Close #113158

There is a problem in a special case with the new
exception handling:

* A managed code calls a 3rd party native C++ code
* then a C++ exception is thrown in that code
* then it is propagated through managed code
* then it is caught in a 3rd party native C++ code
* then a managed callback is called from the catch handler
* then the exception is rethrown using C++ "throw"
* then it propagates back to the initial managed caller

The problem is as follows:
* When a native exception passes through managed frames and it is
  rethrown once a native frame is encountered, the RaiseException
  doesn't use the original exception's `EXCEPTION_RECORD`, but creates a
  new CLR exception instead.
* When a managed callback called from the native C++ catch handler throws
  and catches another exception, the LastThrownObject is cleared.
* When the C++ code in the catch handler rethrows the exception and it
  is propagated to managed code, the ProcessCLRException personality
  routine is invoked for the first managed frame. It can see that the
  exception is a CLR exception, but the LastThrownObject is NULL. It
  asserts in debug builds and crashes a bit later in release builds
  attempting to dereference the NULL.

The fix is to pass the original `EXCEPTION_RECORD` to the `RaiseException`
in case an external exception is propagated. And use just
`RaiseException` instead of the machinery for regular managed exceptions.

The issue was discovered in Autodesk Revit 2025 that does this kind of
exception propagation.

Close dotnet#113158
@janvorli janvorli requested a review from jkotas March 10, 2025 15:35
@janvorli janvorli self-assigned this Mar 10, 2025
@Copilot Copilot bot review requested due to automatic review settings March 10, 2025 15:35

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Overview

This PR addresses issues with foreign exception handling on Windows by preserving the original EXCEPTION_RECORD during propagation through managed and native frames.

  • Introduces a new test to simulate the exception propagation scenario between managed and native code.
  • Validates the rethrow behavior and exception logging for backward compatibility with legacy exception handling.

Reviewed Changes

File Description
src/tests/baseservices/exceptions/PropagateThroughNative/PropagateThroughNative.cs New test file to verify proper exception propagation across managed/native boundaries

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

src/tests/baseservices/exceptions/PropagateThroughNative/PropagateThroughNative.cs:38

  • Consider adding assertions to verify the expected exception propagation behavior instead of relying solely on console output. This can help ensure that the test fails when the behavior deviates from expectations.
[Fact]
@janvorli
Copy link
Member Author

cc: @mangod9, @jeffschwMSFT

@jkotas
Copy link
Member

jkotas commented Mar 10, 2025

This is yet another case of best effort for backward compatibility w.r.t. the legacy exception handling.

Does our documentation have a mention of whatever the problematic pattern is here?

Similar to what we say for setjmp/longjmp: https://learn.microsoft.com/en-us/dotnet/standard/native-interop/exceptions-interoperability#setjmplongjmp-behaviors

@janvorli
Copy link
Member Author

@jkotas looking that that doc, it seems we still consider native exception interop as something supported. I have thought we've discussed that in the past as something legacy, guiding people to not to propagate exceptions over the native / managed boundaries.
As for this change, I would be comfortable pronouncing it "supported". Maybe I should not have said "best effort" here. What I meant is that when handling external exceptions, we don't really know what they store in the ExceptionInformation field. So I can imagine there could be some non-C++ exceptions that e.g. contain pointers to the already unwound part of the stack or whatever.

@janvorli
Copy link
Member Author

@jkotas one more thing. While investigating the issue, I have found a similar case that crashes with both the legacy and new EH. The scenario is almost the same except that instead of the C++ exception there is a managed exception propagated through the same kind of frames. The crash is again due to the LastThrownObject being cleared by the secondary exception handling. I have created this PR to fix just the case with the native exception that the customer has hit so that we can backport it to .NET 9. But if we consider these scenarios as something we want to support, then it would make sense to fix and backport both.
The fix for the managed one would be just to modify the ExceptionTracker::CreateThrowable to fall through into what's now the else branch in case the LastThrownObject is NULL

@mangod9
Copy link
Member

mangod9 commented Mar 10, 2025

If it's a simple enough fix guess we should fix that case too. Agree to only backport the regression.

@jkotas
Copy link
Member

jkotas commented Mar 10, 2025

it seems we still consider native exception interop as something supported.

Right, we try to discourage it, but it is supported on Windows.

Maybe I should not have said "best effort" here.

Right.

we can backport it to .NET 9

This issue was reported by our compat lab on .NET 10. Backport would make sense only if it is customer impacting on .NET 9. Do we have any reports?

@janvorli
Copy link
Member Author

I have found a problem with the fix when I've migrated my test to the exception interop test suite as @jkoritzinsky has suggested. The problem is that when the throw throws an exception by value, the exception is allocated in the throwing frame. But when the propagate exception back to native frames, we physically unwind the stack and throw from just below the native frame. That means that the exception object location is now in an unwound part of the stack and it is not valid anymore. The original test I had was working ok, as I was throwing just an int and its slot was not overwritten. But after moving the the exception interop tests, I've changed my test to throw and catch the std::exception. And the caught exception was corrupted, leading to a crash when I've tried to log std::exception::what() output.
The only solution to this issue seems to be to not to unwind the stack before rethrowing the external extension and then preventing the exception handling from processing the personality routine calls for the managed code below the frame to which we currently unwind. It is kind of ugly, but I don't see other solution to that.

@janvorli
Copy link
Member Author

Fortunately, I have found another better way to fix the problem with the physical stack unwinding. C++ exception handling uses the same thing. The RtlRestoreContext has a special mode. When passed EXCEPTION_RECORD with ExceptionCode set to STATUS_UNWIND_CONSOLIDATE, it calls a personality routine specified in the ExceptionInformation[0] with a machine frame pushed for the unwinder. That way, the frames to be unwound are still there and the unwinder just doesn't see them as the machine frame allows the unwinder to skip to the target context. So there is no problem with the C++ exception object becoming overwritten.

@janvorli
Copy link
Member Author

I've added a commit with a fix for the problem with C++ exception object being in a reclaimed part of stack.

@janvorli
Copy link
Member Author

/azp run runtime-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@janvorli
Copy link
Member Author

/azp run runtime-coreclr outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@janvorli janvorli closed this Mar 20, 2025
@janvorli janvorli reopened this Mar 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[dotnet-sdk-10.0.100-preview.2.25126.21] Autodesk Revit 2025 pop up error report when opening a ".rvt" file.
4 participants