
[Impeller] cache descriptor set layouts. #164952

Merged: 8 commits, Mar 13, 2025

Conversation

jonahwilliams
Member

@jonahwilliams jonahwilliams commented Mar 11, 2025

Currently we reset the entire descriptor pool(s) each frame.

We can actually reuse the allocated descriptor sets: that is safe to do as soon as the command buffer that references them has been submitted (AFAIK). This provides a minor speedup for a large number of small draw ops, but not much beyond that.

@github-actions github-actions bot added engine flutter/engine repository. See also e: labels. e: impeller Impeller rendering backend issues and features requests labels Mar 11, 2025
@jonahwilliams jonahwilliams marked this pull request as ready for review March 11, 2025 15:46
@jonahwilliams jonahwilliams requested a review from gaaclarke March 11, 2025 20:49
Member

@gaaclarke gaaclarke left a comment


Mostly looks good, here's my high level feedback:

  1. This adds a fair bit of complexity around code that relies on thread safety. Have you measured the difference, so we can justify the risk and added complexity?
  2. There are a few places where I wonder whether they are thread-safe.
  3. There is new functionality being added that could have test coverage.

```cpp
std::shared_ptr<DescriptorPoolVK>
DescriptorPoolRecyclerVK::GetDescriptorPool() {
  {
    Lock recycled_lock(recycled_mutex_);
```
Member


Are we sure this is faster? We are introducing locks.

Member Author


There were already locks. Before:

  1. Raster: Finish work, push to resource manager thread.
  2. Resource Manager: call reset, push to context (acquire lock).
  3. Raster: Pick recycled pool.

Now we just skip the resource manager part, as we don't reset the pool unless it was completely unused.

```cpp
  std::vector<vk::DescriptorSet> used;
};

using DescriptorCacheMap = std::unordered_map<uint64_t, DescriptorCache>;
```
Member


use absl::flat_hash_map

Member Author


Done

Comment on lines 89 to 91
```cpp
if (existing == descriptor_sets_.end()) {
  descriptor_sets_[pipeline_key] = DescriptorCache{};
}
```
Member


use try_emplace to avoid double lookups

Member Author


I need some pointers here...

Member


```cpp
descriptor_sets_.try_emplace(pipeline_key, DescriptorCache{});
result.first->used.push_back(set);
```

Member Author


Actually `lookup_result.first->second.used.push_back(set);`

It's an iterator to a pair. What a language.

Member


There are no new tests being added for this functionality.

Member Author


I added a test that shows the descriptors being recycled. Otherwise, everything not exploding should count for something, I hope.

Comment on lines +549 to +550
```cpp
descriptor_pool = (cached_descriptor_pool_[std::this_thread::get_id()] =
                       descriptor_pool_recycler_->GetDescriptorPool());
```
Member


Here we are doing a lookup based on what thread we are on. Can we expand that to include the recycler too so we don't have to introduce mutexes?

Member Author


This change doesn't introduce any new mutexes

@jonahwilliams
Member Author

I haven't addressed all the feedback yet. Re: performance, yes, but let me pull some numbers.

Recycling descriptor sets is also part of the general Arm/Adreno performance guidance, if you want a reference for that too.

@jonahwilliams
Member Author

oops, push rejected ... sec...

@jonahwilliams
Member Author

It looks like using absl::flat_hash_map regresses performance; maybe it has a bad default hash for sequentially incremented integer keys? So I switched back to std::unordered_map.

Anyway. Running the qr.flutter example app:

Before:

[profiling screenshot]

After:

[profiling screenshot]

@jonahwilliams jonahwilliams requested a review from gaaclarke March 13, 2025 18:19
@gaaclarke
Member

> It looks like using absl::flat_hash_map regresses performance. maybe it has a bad default hash for a sequentially incremented int? So I switched back to unordered map.

That's surprising. Absl's recommendation is flat_hash_map by default, and its benchmarks show it being faster for every usage.

[benchmark screenshot, 2025-03-13]

@gaaclarke
Member

Your profile that has flat_hash_map may just be showing variance in your measurement. I'm pretty sure flat_hash_map is faster and more memory-efficient.

Member

@gaaclarke gaaclarke left a comment


LGTM modulo the try_emplace and the usage of auto. I also recommend flat_hash_map since it is faster, but the difference is small enough that it isn't showing up past the standard test variance, so it's not a huge thing; up to you.

@jonahwilliams
Member Author

The difference is pretty consistently faster for me; it might be usage-based.

@jonahwilliams jonahwilliams added the autosubmit Merge PR when tree becomes green via auto submit App label Mar 13, 2025
@auto-submit auto-submit bot added this pull request to the merge queue Mar 13, 2025
Merged via the queue into flutter:master with commit 7101095 Mar 13, 2025
171 checks passed
@flutter-dashboard flutter-dashboard bot removed the autosubmit Merge PR when tree becomes green via auto submit App label Mar 13, 2025
engine-flutter-autoroll added a commit to engine-flutter-autoroll/packages that referenced this pull request Mar 14, 2025