Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use bitwise operations to get the absolute values of integers #113679

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Corniel
Copy link

@Corniel Corniel commented Mar 19, 2025

Helps the JIT resolving it by applying CMOV.

Closes #113651

Helps the JIT resolving it as CMOV
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 19, 2025
@huoyaoyuan
Copy link
Member

I'd suggest to hold on before more conclusion is made by area owners of System.Numerics and JIT in the original issue.

For smaller types (short and sbyte), the performance characteristics can be drastically different because they must be widen to 32bit when doing arithmetic or bitwise operations.

@jakobbotsch
Copy link
Member

Your benchmarks in #113651 measure the cost of branch misprediction primarily since your picked input data makes the branch completely unpredictable. This is worst case scenario for the branch based version. In practice branches are typically easy to predict. What do your results look like if the branch is predictable? (e.g. make all array elements positive or negative, or make it a more predictable mix).

I think we would rather want to write this in a way that makes it amenable to if-conversion (such that cmov/csel get emitted on x64/arm64), and then work on #96380 to improve the JIT's predictions around if generating cmov/csel is beneficial or not. Changing control flow dependencies to data dependencies inside loops (which this PR or a transformation to cmov does) can result in large regressions in hot loops (e.g. #82106 (comment)).

@Corniel
Copy link
Author

Corniel commented Mar 19, 2025

Your benchmarks in #113651 measure the cost of branch misprediction primarily since your picked input data makes the branch completely unpredictable. This is worst case scenario for the branch based version. In practice branches are typically easy to predict. What do your results look like if the branch is predictable? (e.g. make all array elements positive or negative, or make it a more predictable mix).

I think we would rather want to write this in a way that makes it amenable to if-conversion (such that cmov/csel get emitted on x64/arm64), and then work on #96380 to improve the JIT's predictions around if generating cmov/csel is beneficial or not. Changing control flow dependencies to data dependencies inside loops (which this PR or a transformation to cmov does) can result in large regressions in hot loops (e.g. #82106 (comment)).

Although I'm not sure if the branch prediction is that good in regular cases (that might be), I fully agree with your (and @huoyaoyuan ) suggestion to address this the JIT should be able to use cmov/csel when available.

}
return value;

return unchecked((short)((value + (value >>= 15)) ^ value));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should really be broken apart, to make it clearer and potentially have an explanatory comment on how it works (sign is 0 for positive and 1 for negative, thus it computes the original value for positive and ~value + 1 for negative, which is to say -value):

int sign = value >> ((sizeof(T) * 8) - 1);
return unchecked((T)((value + sign) ^ sign));

However, since we need a branch regardless we're basically looking at 2 branches vs 3 instructions + 1 branch

I agree with @jakobbotsch that it would be better to improve the conditional select logic first and foremost. Reducing it down to one branch in typical cases would also be goodness, but ideally we could have that participate with PGO/DPGO so that we don't pessimize it unnecessarily.

Ideally we could generate something like:

mov    eax, ecx
neg    ecx
cmovns eax, ecx
js     THROW
ret

Which would require some minor optimization work and flag reuse on top of (avoiding the redundant test reg, reg its currently generating):

int result = x;
        
if (result < 0)
{
    result = -x;
}

if (result < 0)
{
    Throw();
}

return result;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding Should be the reasoning behind such changes be documented in the code? In this case all the logic written down is to help the JIT doing JIT things,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the code isn't "obvious" or the reasoning for a change isn't necessarily clear, yes. This helps future readers understand and can help avoid accidental regressions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Numerics community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Speed improvements for Maths.Abs() on integers
4 participants