Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add trie level metadata to table blocks and trie catalog #4240

Merged
merged 1 commit into from
Mar 11, 2025

Conversation

FiV0
Copy link
Member

@FiV0 FiV0 commented Mar 7, 2025

This addresses #4174. The new metadata is not (yet) used anywhere for pruning.

I am also moving a bunch of bloom logic to Kotlin.

Todo:

  • figure out correct iid bloom size for larger files
  • recency? wasn't sure if recency still plays a role in the new plans.
    - Add recency metadata to table-block files #4257
  • trie metadata test for compactor?
  • move kotlin work directly to main

@FiV0 FiV0 linked an issue Mar 7, 2025 that may be closed by this pull request
@FiV0 FiV0 added this to the 2.0.0-beta7 milestone Mar 7, 2025
@FiV0 FiV0 self-assigned this Mar 7, 2025
@FiV0 FiV0 requested a review from jarohen March 10, 2025 07:34
Copy link
Member

@jarohen jarohen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good PR, @FiV0, thanks 🙏

  • next time, can you land the bloom change separately - if need be, stash/commit what you have
  • a few Kotlin idiom suggestions in here - happy to pair up and walk through if that'd help

@@ -26,6 +28,16 @@ class LiveTable
) : AutoCloseable {

private val trieWriter = TrieWriter(al, bp, false)
private val trieMetadataBuilder = TrieMetadata.newBuilder()
.setMinValidFrom(MAX_LONG)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be able to default these in the protobuf instead...? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it in the proto for now. The important bit to keep in mind is that if you build a TrieMetadata without calculating anything on it, the contained data doesn't make much sense.

table-cat/<-table-block
:current-tries
(mapv added-trie->edn ))))
(let [current-tries (->> (.getByteArray bp (util/->path "tables/public$foo/blocks/b01.binpb"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(out of scope of this PR) these are the tests that I'd like us to move to Kotlin too, given the code it's testing is. I understand we don't yet have the same level of fixtures there that we do in Clojure (which is why I have also done the same as you here), so when we next come up for air let's figure out what we need to make this easier

@FiV0 FiV0 force-pushed the block-trie-metadata branch 3 times, most recently from d981abb to dedb5d9 Compare March 11, 2025 17:49
@FiV0 FiV0 force-pushed the block-trie-metadata branch from dedb5d9 to b91c0dc Compare March 11, 2025 17:52
@FiV0 FiV0 merged commit e060e7e into xtdb:main Mar 11, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include trie-level metadata in the trie-list
2 participants