[Impeller] cache descriptor set layouts. #164952
Conversation
Mostly looks good. Here's my high-level feedback:
- This adds a fair bit of complexity around code that relies on thread safety. Have you measured the difference, so we can justify the risk and added complexity?
- There are a few places where I wonder whether they are thread-safe.
- There is new functionality being added that could have test coverage.
std::shared_ptr<DescriptorPoolVK>
DescriptorPoolRecyclerVK::GetDescriptorPool() {
  {
    Lock recycled_lock(recycled_mutex_);
Are we sure this is faster? We are introducing locks.
There were already locks. Before:
- Raster: Finish work, push to resource manager thread.
- Resource Manager: call reset, push to context (acquire lock).
- Raster: Pick recycled pool.
Now we just skip the resource manager part, as we don't reset the pool unless it was completely unused.
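The flow described above can be sketched in a few lines. This is a minimal, self-contained illustration, not the actual Impeller code: `DescriptorPool` and the recycler's method names here are stand-ins, and the real `DescriptorPoolVK` wraps a `vk::DescriptorPool`. The point is that the raster thread takes a recycled pool under a short-lived lock instead of round-tripping through the resource-manager thread:

```cpp
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for DescriptorPoolVK; names are illustrative only.
struct DescriptorPool {};

class DescriptorPoolRecycler {
 public:
  std::shared_ptr<DescriptorPool> GetDescriptorPool() {
    {
      std::lock_guard<std::mutex> lock(recycled_mutex_);
      if (!recycled_.empty()) {
        auto pool = recycled_.back();
        recycled_.pop_back();
        return pool;  // Reused without a reset: its sets stay allocated.
      }
    }
    // Slow path: no recycled pool available, allocate a fresh one.
    return std::make_shared<DescriptorPool>();
  }

  // Pools come back here at end of frame instead of being reset on the
  // resource-manager thread.
  void Reclaim(std::shared_ptr<DescriptorPool> pool) {
    std::lock_guard<std::mutex> lock(recycled_mutex_);
    recycled_.push_back(std::move(pool));
  }

 private:
  std::mutex recycled_mutex_;
  std::vector<std::shared_ptr<DescriptorPool>> recycled_;
};
```

The lock is only held for the push/pop on the recycled list, which is the "there were already locks" point: the contention window is the same or smaller than before.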
engine/src/flutter/impeller/renderer/backend/vulkan/compute_pipeline_vk.h (resolved)
  std::vector<vk::DescriptorSet> used;
};

using DescriptorCacheMap = std::unordered_map<uint64_t, DescriptorCache>;
use absl::flat_hash_map
Done
if (existing == descriptor_sets_.end()) {
  descriptor_sets_[pipeline_key] = DescriptorCache{};
}
Use try_emplace to avoid double lookups.
I need some pointers here...
descriptor_sets_.try_emplace(pipeline_key, DescriptorCache{});
result.first->used.push_back(set);
actually lookup_result.first->second.used.push_back(set);
It's an iterator to a pair. What a language.
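Putting the corrected suggestion together: `try_emplace` does the find-or-insert in one hash lookup and returns a `std::pair<iterator, bool>`, where the iterator points at the map's key/value pair, hence the `->second`. A minimal standalone sketch (the type names mirror the diff, but `DescriptorSet` is a stand-in for `vk::DescriptorSet`; `absl::flat_hash_map` has the same interface as the `std::unordered_map` used here):

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Illustrative stand-in for vk::DescriptorSet.
struct DescriptorSet {
  int id = 0;
};

struct DescriptorCache {
  std::vector<DescriptorSet> used;
};

using DescriptorCacheMap = std::unordered_map<uint64_t, DescriptorCache>;

void CacheSet(DescriptorCacheMap& descriptor_sets,
              uint64_t pipeline_key,
              DescriptorSet set) {
  // One lookup: inserts a default DescriptorCache if the key is absent,
  // otherwise leaves the existing entry alone.
  auto result = descriptor_sets.try_emplace(pipeline_key, DescriptorCache{});
  // result.first is an iterator to the {key, value} pair.
  result.first->second.used.push_back(set);
}
```

Compared with the `find` + `operator[]` pattern in the original diff, this hashes `pipeline_key` once instead of twice.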
engine/src/flutter/impeller/renderer/backend/vulkan/descriptor_pool_vk.cc (resolved)
engine/src/flutter/impeller/renderer/backend/vulkan/descriptor_pool_vk_unittests.cc (resolved)
engine/src/flutter/impeller/renderer/backend/vulkan/pipeline_library_vk.cc (resolved)
engine/src/flutter/impeller/renderer/backend/vulkan/pipeline_library_vk.h (resolved)
engine/src/flutter/impeller/renderer/backend/vulkan/pipeline_vk.h (resolved)
There are no new tests being added for this functionality.
I added a test that shows the descriptors being recycled. Otherwise, everything not exploding should count for something I hope.
descriptor_pool = (cached_descriptor_pool_[std::this_thread::get_id()] =
    descriptor_pool_recycler_->GetDescriptorPool());
Here we are doing a lookup based on what thread we are on. Can we expand that to include the recycler too so we don't have to introduce mutexes?
This change doesn't introduce any new mutexes
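The per-thread caching being discussed can be illustrated with a small sketch. This is an assumption-laden simplification, not the actual `ContextVK` code: `PoolCache` and its method names are invented here, and the map-guarding mutex stands in for whatever synchronization the real code already had. The idea is that the recycler is only consulted on a per-thread cache miss, so most pool lookups touch only the thread-id-keyed map:

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <thread>

struct DescriptorPool {};  // Illustrative stand-in for DescriptorPoolVK.

class PoolCache {
 public:
  std::shared_ptr<DescriptorPool> GetForCurrentThread() {
    std::lock_guard<std::mutex> lock(mutex_);
    // One slot per thread: the same thread always gets back the pool it
    // was handed, so descriptor allocations never contend on the recycler.
    auto& slot = cached_descriptor_pool_[std::this_thread::get_id()];
    if (!slot) {
      // Cache miss: in the real code this is where
      // descriptor_pool_recycler_->GetDescriptorPool() is called.
      slot = std::make_shared<DescriptorPool>();
    }
    return slot;
  }

 private:
  std::mutex mutex_;
  std::map<std::thread::id, std::shared_ptr<DescriptorPool>>
      cached_descriptor_pool_;
};
```

The reviewer's question amounts to: since the map is already keyed by thread id, could the recycler's state be folded into the same per-thread slot? The reply's point is that the locks shown here predate the change.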
Haven't addressed all feedback yet. RE: performance, yes, but let me pull some numbers. Recycling descriptor sets is in the general ARM/Adreno guidance, if you want that too.
oops, push rejected ... sec...
That's surprising. Absl's recommendation is flat_hash_map by default, and the benchmarks show it being faster for every usage.
Your profile result for flat_hash_map may just be variance in your measurement. I'm pretty sure flat_hash_map is faster and more memory-efficient.
LGTM modulo the try_emplace, the usage of auto, and my flat_hash_map recommendation. flat_hash_map is faster, but the difference is so small it isn't showing up past the standard test variance, so it's not a huge thing; up to you.
The difference is pretty consistently faster for me; might be usage-based.
Currently we reset the entire descriptor pool(s) each frame.
We can actually reuse the allocated descriptor sets, as that is safe to do as soon as the command buffer that references them has been submitted (AFAIK). This provides a minor speed-up for a large number of small drawing ops, but not much beyond that.
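The reuse scheme the PR description outlines can be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: `SetHandle` stands in for `vk::DescriptorSet`, `CachedPool` and its method names are invented, and the integer counter stands in for `vkAllocateDescriptorSets`. Instead of resetting the pool each frame, sets that a submitted command buffer referenced migrate from `used` back to `unused`, ready to be handed out again:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical handle standing in for vk::DescriptorSet.
using SetHandle = int;

struct DescriptorCache {
  std::vector<SetHandle> unused;  // Sets free to hand out again.
  std::vector<SetHandle> used;    // Sets referenced by the in-flight frame.
};

class CachedPool {
 public:
  // Sets are cached per pipeline key because a descriptor set's layout is
  // tied to the pipeline it was allocated for.
  SetHandle Obtain(uint64_t pipeline_key) {
    auto& cache = caches_.try_emplace(pipeline_key).first->second;
    SetHandle set;
    if (!cache.unused.empty()) {
      set = cache.unused.back();  // Reuse a previously allocated set.
      cache.unused.pop_back();
    } else {
      set = next_handle_++;  // Would be vkAllocateDescriptorSets.
    }
    cache.used.push_back(set);
    return set;
  }

  // Called once the command buffer referencing these sets has been
  // submitted; the sets become safe to rewrite, so no pool reset is needed.
  void OnFrameSubmitted() {
    for (auto& [key, cache] : caches_) {
      cache.unused.insert(cache.unused.end(), cache.used.begin(),
                          cache.used.end());
      cache.used.clear();
    }
  }

 private:
  std::unordered_map<uint64_t, DescriptorCache> caches_;
  SetHandle next_handle_ = 0;
};
```

This also shows why the win is largest for many small draws: each reused set skips an allocation, but a frame with few draws has little to skip.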