Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: vote state view #4834

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

jstarry
Copy link

@jstarry jstarry commented Feb 6, 2025

Problem

The stakes cache stores copies of vote account data so that things like consensus, inflation rewards, and vote timestamp calculation can access the vote account state for all staked validators. Every time when a vote is processed, the vote account entry in the stakes cache is updated and the account data is deserialized and ~1KB is allocated to store the new state. This is all unnecessary because the vote state data the protocol needs can be read on demand directly from the account data byte vector.

Summary of Changes

  • Introduce new VoteStateView type which stores a copy of the vote account's Arc<Vec<u8>> and field offset data to speed up vote state field lookups.

Fixes #

Copy link

mergify bot commented Feb 6, 2025

If this PR represents a change to the public RPC API:

  1. Make sure it includes a complementary update to rpc-client/ (example)
  2. Open a follow-up PR to update the JavaScript client @solana/web3.js (example)

Thank you for keeping the RPC clients in sync with the server API @jstarry.

Copy link

mergify bot commented Feb 6, 2025

The Firedancer team maintains a line-for-line reimplementation of the
native programs, and until native programs are moved to BPF, those
implementations must exactly match their Agave counterparts.
If this PR represents a change to a native program implementation (not
tests), please include a reviewer from the Firedancer team. And please
keep refactors to a minimum.

@jstarry jstarry force-pushed the refactor/vote-state-view branch 4 times, most recently from 387de2a to d276a8c Compare February 6, 2025 19:44
@jstarry
Copy link
Author

jstarry commented Feb 6, 2025

Before

test bench_vote_account_try_from ... bench:         321.63 ns/iter (+/- 5.15)

After

test bench_vote_account_try_from ... bench:          59.46 ns/iter (+/- 0.18)

@jstarry jstarry force-pushed the refactor/vote-state-view branch 7 times, most recently from f63e39b to 77728d5 Compare February 8, 2025 08:58
@jstarry jstarry force-pushed the refactor/vote-state-view branch from 77728d5 to 55942e3 Compare February 8, 2025 09:35
@jstarry jstarry force-pushed the refactor/vote-state-view branch 4 times, most recently from 864d673 to 2480abe Compare February 13, 2025 18:17
@jstarry jstarry force-pushed the refactor/vote-state-view branch from 2480abe to 87ec58d Compare February 13, 2025 18:30
@jstarry jstarry force-pushed the refactor/vote-state-view branch 3 times, most recently from e242f23 to 46c1c98 Compare February 27, 2025 02:37
@jstarry jstarry marked this pull request as ready for review February 27, 2025 02:38
@apfitzge
Copy link

Linking some related PRs so I do not need to find them again:

Copy link

@apfitzge apfitzge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submitting first round of review comments. Will still need to go over it again, but overall the structure looks to be good to me.

One missing thing is documentation on the unsafe blocks - we should always aim to document those clarifying why we meet the criteria for the opreation to be safe.

}
}

pub(super) struct VotesListFrame {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This structure seems a bit odd to me since it represents two lists of different data types, depending on the version.

V3 has LandedVote which internally just has a latency and Lockout, but afaict there's no way to iterate over the LandedVote with the latency in there, only the Lockouts.

Do we never need the latency?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't ever need to check latency in the runtime, it's only used by the vote program to accumulate credits. I just updated the PR to use an enum over the different types of votes lists. Lmk what you think!

@jstarry jstarry force-pushed the refactor/vote-state-view branch from c04b8b9 to 7d35c4b Compare March 1, 2025 05:37
};

pub(super) trait ListFrame<'frame> {
type Item: 'frame;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the constraint here do?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it helped the ListViewIter implicitly determine that ListFrame::Item lived as long as Iterator::Item. I think it's cleaner to just add the constraint directly to the iterator impl so I updated that here: b3be0f4

fn item_size(&self) -> usize {
core::mem::size_of::<Self::Item>()
}
unsafe fn read_item<'a>(&self, item_data: &'a [u8]) -> &'a Self::Item {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add some documentation to this function about what sorts of implementations are safe vs not, and do the same for any non-defaut implementation.

The safety of this ptr -> reference conversion is documented here.

Think we should have a comment along the lines of:

    /// This function is safe under the following conditions:
    /// SAFETY:
    /// - `Self::Item` is alignment 1
    /// - The passed slice is large enough for the type `Self::Item`
    /// - `Self::Item` is valid for any sequence of bytes
    unsafe fn read_item<'a>(&self, item_data: &'a [u8]) -> &'a Self::Item {
		// SAFETY:
        // - Function safety documentation conditions
		// - The cast pointer is not null because `item_data` is a valid slice
		// - The reference created is immutable, just like the slice, and thus does not violate aliasing rules
        &*(item_data.as_ptr() as *const Self::Item)
    }

^ following above comments, it seems reasonable to add a const assert on alignments.
We cannot add this into the fn unfortunately, but we could add a line for each of the types we're actually reading?

	// Verify at compile time there are no alignment constraints.
	const _: () = assert!(
	    core::mem::align_of::<Self::Item>() == 1,
	    "Item alignment"
	);

They don't technically need to be alignment 1, but afaict, there is no guarantee of alignment on our serialized slices so it would be the simplest assert.

Following these requirements, I'm actually pretty sure that the PriorVotersItem is not strictly safe to read, since it's alignment is 8, but we have no alignment guarantees, to my knowledge.
We should probably do the same [u8; 8] style of definition with accessors for that type as you have for the other type definitions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added const assertions and improved safety comments here: a3d48ac

Copy link

@alessandrod alessandrod Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: read makes me think that it makes a copy (like ptr::read), call it
item_ref or something?

we use read everywhere probably not worth the effort of renaming everything

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting rust behavior wrt to the constant evaluation and why the original assert! doesn't work but const_assert! does: rust-lang/reference#1120 (comment)

^ in summary, consts in traits (unlike free consts) are not evaluated unless used. And since we don't use them, we don't actually run the assert.

const_assert! gets around it in an interesting way:

    const ASSERT_ITEM_ALIGNMENT: () = {
        const _: [(); 0 - !{
            const ASSERT: bool = core::mem::align_of::<LockoutItem>() == 1;
            ASSERT
        } as usize] = [];
    };

the assert is used a length, and when true makes the assignment of the actual outer constant invalid if the assertion evaluates to false.

}

#[repr(C)]
pub(super) struct PriorVotersItem {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one. see above safety comment on read_item

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, just noticed this one as well, thanks for calling it out

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here: a3d48ac

alessandrod
alessandrod previously approved these changes Mar 6, 2025
Copy link

@alessandrod alessandrod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super solid PR 🙌

Don't really have much to add, just a couple of super minor nits.

We should make sure that we run at least a subset of the tests under miri (although I'm getting a dejavu - we might have already set that up last time with the read_into changes?)

self.root_slot_view().root_slot()
}

pub fn get_authorized_voter(&self, epoch: Epoch) -> Option<&Pubkey> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: drop get_ prefix?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sort of like a map lookup so I think the get prefix makes sense on this one

}
}

pub(super) fn get_field_offset(&self, field: Field) -> usize {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: drop get_ here too?

self.authorized_voters_view().get_authorized_voter(epoch)
}

pub fn num_epoch_credits(&self) -> usize {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this doesn't seem to have test coverage

fn item_size(&self) -> usize {
core::mem::size_of::<Self::Item>()
}
unsafe fn read_item<'a>(&self, item_data: &'a [u8]) -> &'a Self::Item {
Copy link

@alessandrod alessandrod Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: read makes me think that it makes a copy (like ptr::read), call it
item_ref or something?

we use read everywhere probably not worth the effort of renaming everything

}
}

#[cfg(test)]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have concern on the method of testing used here.
You test with random bytes; If CI fails, how can we recreate the case?

There doesn't appear to be any tests for invalid data afaict? Shouldn't we be testing cases where we do not have enough bytes for the structure?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More test coverage added in: a93c8ba

@jstarry
Copy link
Author

jstarry commented Mar 10, 2025

We should make sure that we run at least a subset of the tests under miri (although I'm getting a dejavu - we might have already set that up last time with the read_into changes?)

Yeah good call, added here: 742c757 (we previously chatted about this in this abandoned PR: #2751)

@jstarry jstarry force-pushed the refactor/vote-state-view branch from a93c8ba to b594d26 Compare March 10, 2025 17:22
@jstarry
Copy link
Author

jstarry commented Mar 10, 2025

rebased to pick up rust audit changes to pass CI

@jstarry jstarry force-pushed the refactor/vote-state-view branch from e143323 to 2476639 Compare March 12, 2025 14:32
@jstarry
Copy link
Author

jstarry commented Mar 12, 2025

rebased again to fix conflict

/// deserializing it. This is done by parsing and caching metadata
/// about the layout of the serialized VoteState.
#[derive(Debug, Clone)]
#[cfg_attr(feature = "frozen-abi", derive(AbiExample))]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need frozen-abi?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just that this struct is now part of VoteAccountInner and afaik there's no way to skip a field for the AbiExample derivation

@apfitzge apfitzge self-requested a review March 17, 2025 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants