Improve performance for 'bitmap' render method #680

Merged on Mar 12, 2025 (7 commits)

@almarklein almarklein commented Mar 11, 2025

I did some experiments, and this is a relatively easy way to improve performance by about 20%, going from ~24 to ~30 fps in a fullscreen window on a 4K monitor on my M1 MacBook.

Tricks applied:

  • Re-use the target texture: dropped because of memory concerns; it also does not seem to help much.
  • Re-use (and share) the buffer needed to copy the texture data to the CPU.
  • Avoid a data copy when bytes-per-row is not a multiple of 256.
  • Use numpy for copying: dropped, since it does not seem to help much and would introduce a new optional dependency.
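
As a rough illustration of the bytes-per-row point: WebGPU requires `bytes_per_row` to be a multiple of 256 for texture-to-buffer copies, so the downloaded data can carry padding at the end of each row. The sketch below is not the code from this PR; `padded_bytes_per_row` and `iter_rows` are hypothetical helpers, and a plain `bytearray` stands in for the mapped GPU buffer. It only shows how rows can be addressed through `memoryview` slices without first copying everything into a tightly packed array.

```python
# WebGPU requires bytes-per-row to be a multiple of 256 for texture-to-buffer copies.
COPY_BYTES_PER_ROW_ALIGNMENT = 256


def padded_bytes_per_row(width, bytes_per_pixel=4, alignment=COPY_BYTES_PER_ROW_ALIGNMENT):
    """Round the tight row size up to the next multiple of `alignment`."""
    tight = width * bytes_per_pixel
    return (tight + alignment - 1) // alignment * alignment


def iter_rows(data, width, height, bytes_per_pixel=4):
    """Yield one zero-copy memoryview per image row, skipping the row padding.

    Hypothetical helper: `data` is any bytes-like object holding `height`
    rows that are each `padded_bytes_per_row(width)` bytes long.
    """
    tight = width * bytes_per_pixel
    stride = padded_bytes_per_row(width, bytes_per_pixel)
    view = memoryview(data)
    for y in range(height):
        start = y * stride
        # Slicing a memoryview does not copy the underlying bytes.
        yield view[start : start + tight]


# A 100-pixel-wide RGBA row is 400 bytes tight, but 512 bytes once padded.
width, height = 100, 2
data = bytearray(padded_bytes_per_row(width) * height)  # stand-in for mapped buffer data
rows = list(iter_rows(data, width, height))
```

Whether the consumer can accept strided rows like this depends on the canvas backend, which is an assumption here.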

We can go much faster, but for that we need to have our async stuff sorted out better.
More details here: pygfx/rendercanvas#40 (comment)

@almarklein almarklein requested a review from Korijn as a code owner March 11, 2025 13:27
@hmaarrfk
Contributor

Is this "bitmap" method the current path for Wayland + Qt?

@almarklein
Member Author

> Is this "bitmap" method the current path for Wayland + Qt?

yes

@almarklein
Member Author

mmm ... I'm a bit worried about what this means for memory.

Previously, if you had say 10 canvases, they'd be rendered to one by one, and after each draw, the texture is released, allowing the GPU to re-use that memory for e.g. the texture target for the next canvas. But now all canvases hold onto their texture and a buffer of the same size. 🤔 This easily happens in e.g. a notebook.
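
To put a number on that concern, here is a back-of-the-envelope estimate. It assumes 4K-sized canvases (3840x2160, borrowed from the benchmark setup above) and 4 bytes per pixel; `bitmap_memory_bytes` is a hypothetical helper, not part of the library.

```python
def bitmap_memory_bytes(width, height, n_canvases, bytes_per_pixel=4):
    """Estimate memory held when every canvas keeps a texture plus an equally sized buffer."""
    per_texture = width * height * bytes_per_pixel
    per_canvas = 2 * per_texture  # target texture + copy buffer of the same size
    return n_canvases * per_canvas


# Assumed scenario: 10 canvases, each 3840x2160 with 4 bytes per pixel.
total = bitmap_memory_bytes(3840, 2160, 10)
print(f"{total / 2**20:.0f} MiB")  # prints "633 MiB"
```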

@almarklein
Member Author

Solved this by using a shared copy-buffer. It looks like re-using the texture does not do much; re-using the temporary buffer was what contributed by far the most to the speed-up.
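
The idea of a shared copy-buffer can be sketched roughly as follows. This is not the implementation from the diff: `SharedCopyBuffer` is a hypothetical class, and a `bytearray` stands in for the GPU buffer. It relies on the canvases being drawn one by one, so each consumer is done with the buffer before the next draw starts.

```python
class SharedCopyBuffer:
    """Hypothetical stand-in for one staging buffer shared by all canvases."""

    def __init__(self):
        self._buffer = bytearray()
        self.allocations = 0  # counts how often a new buffer had to be created

    def get(self, nbytes):
        """Return a writable view of `nbytes`, reallocating only when the buffer must grow."""
        if len(self._buffer) < nbytes:
            self._buffer = bytearray(nbytes)
            self.allocations += 1
        return memoryview(self._buffer)[:nbytes]


shared = SharedCopyBuffer()
for frame in range(3):
    for w, h in [(640, 480), (800, 600)]:
        view = shared.get(w * h * 4)
        # ... copy the rendered texture into `view`, then hand it to the canvas ...

# Six draws, but only two allocations: one initial, one to grow for the larger canvas.
print(shared.allocations)
```

With this scheme the extra memory is bounded by the largest canvas rather than by the number of canvases.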

@hmaarrfk
Contributor

> re-using the temporary buffer was what contributed by far the most to the speed-up.

Can you point to the one you are referring to in the diff (for my own learning)?

@almarklein
Member Author

This does not seem to clash with #673, except for a change in tests_mem/testutils.py which seems to be exactly the same.

@almarklein
Member Author

almarklein commented Mar 12, 2025

I tested performance on Windows and Linux, and did not observe a difference. That's on a machine with integrated graphics. Will test on a Windows machine with a GPU later today.

edit: tested again, now with the same 4k monitor. The integrated graphics have trouble rendering that many pixels already ... it looks like maybe there is a tiny improvement from 18-ish to 19-ish fps, compared to 35 fps when rendering to screen.

@almarklein
Member Author

Tested on a laptop with an NVIDIA GPU. I can observe a performance improvement similar to the one on macOS.

@almarklein almarklein enabled auto-merge (squash) March 12, 2025 15:32
@almarklein almarklein merged commit 1f86cd4 into main Mar 12, 2025
20 checks passed
@almarklein almarklein deleted the bitmap-permformance branch March 12, 2025 15:32