Skip to content

feature(smp): improve hart booting, TLS setup and per-core stack initialization #1678

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

mrm-develop
Copy link
Contributor

This pull request improves the RISC-V boot process, multi-core startup, and stack/TLS setup for better stability, performance, and future extensibility.

Changes:

In mod.rs, the function boot_next_processor now includes detailed logging to show the current hart mask and which hart will be started next. It also checks the return value of sbi_rt::hart_start to log any startup errors. The memory ordering for atomic variables like CURRENT_STACK_ADDRESS and CPU_ONLINE was adjusted for safer multi-core synchronization.

In start.rs, a check was added to skip invalid secondary harts early. A new PerCpuData struct was introduced, aligned to 64 bytes to match cache lines. This structure allows each core to track its state individually. The boot hart now correctly sets up its TLS pointer (tp) using inline assembly if TLS info is available. The hart mask is calculated using FDT parsing, with each CPU's status checked to ensure it is enabled before adding it to the mask.

In scheduler.rs, the TLS setup now uses the alignment field provided in tls_info. Memory is allocated with proper alignment and zero-initialized. Stacks are allocated with extra pages to separate kernel, user, and interrupt stacks. The kernel stack is marked with 0xdeadbeef at the top to help debugging stack overflows. The TaskStacks type now properly handles cloning and deallocation. This also ensures safe memory reuse and cleanup when tasks end.

@mkroening mkroening self-assigned this Apr 9, 2025
@mkroening mkroening self-requested a review April 9, 2025 07:47
Copy link
Member

@mkroening mkroening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Sorry for the late review!

I have left a few comments.

Generally, I think using Orderin::Relaxed in the cases where you changed it, would be fine. Why do you think, we need to change it? We don't synchronize other memory accesses with it.

If you need assistance with git or GitHub regarding interactive rebasing, feel free to reach out! :)

sbi_rt::hart_start(next_hart_id as usize, start::_start as usize, 0).unwrap();
let result = sbi_rt::hart_start(next_hart_id as usize, start::_start as usize, 0);
if result.error == 0 {
info!("HART {next_hart_id} start requested");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we move this before the call to hart_start? ("Starting HART {next_hart_id}")?

let next_hart_index = lsb(new_hart_mask);

if let Some(next_hart_id) = next_hart_index {
info!("Preparing to start HART {next_hart_id}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug! would be sufficient here, right?

@@ -184,13 +184,16 @@ fn finish_processor_init() {
pub fn boot_next_processor() {
let new_hart_mask = HART_MASK.load(Ordering::Relaxed);

info!("Current HART_MASK: 0x{new_hart_mask:x}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug! would be sufficient here, right?

Comment on lines +212 to +215
error!(
"Failed to start HART {next_hart_id}: error code {}",
result.error
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test this execution path? I am not sure if Hermit would continue to boot properly if one of the expected harts would not start. If this is untested, an unwrap like before might be preferrable.

Comment on lines +12 to +37
const MAX_CORES: usize = 32;

// Cache-line aligned CPU-local data
#[repr(align(64))]
struct PerCpuData {
is_initialized: AtomicBool,
local_counter: AtomicU64,
#[allow(dead_code)]
padding: [u8; 48], // Fill to full cache line
}

impl PerCpuData {
const fn new() -> Self {
Self {
is_initialized: AtomicBool::new(false),
local_counter: AtomicU64::new(0),
padding: [0; 48],
}
}
}

#[allow(clippy::declare_interior_mutable_const)]
static CPU_DATA: [PerCpuData; MAX_CORES] = {
const CPU_LOCAL: PerCpuData = PerCpuData::new();
[CPU_LOCAL; MAX_CORES]
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we need this? Can we remove this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you split the TaskTLS changes into another PR that we can merge seperately?

Comment on lines +93 to +102
// Optimized HART_MASK calculation
let mut hart_mask = 0u64;
for cpu in fdt.cpus() {
if let Some(cpu_id) = cpu.property("reg").and_then(|p| p.as_usize()) {
if cpu
.property("status")
.and_then(|p| p.as_str())
.is_some_and(|s| s != "disabled\u{0}")
{
hart_mask |= 1 << cpu_id;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this optimized? Isn't this the same code as before but in a more functional style? I honestly found the code more readable before.

Comment on lines +75 to +82
// Optimized Hart-ID validation
if CPU_ONLINE.load(Ordering::Acquire) > 0 {
// Faster check for Secondary-HARTs
if (HART_MASK.load(Ordering::Relaxed) & (1 << hart_id)) == 0 {
error!("Invalid hart ID: {hart_id}");
processor::halt();
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does optimized mean here? And what do we check for? And why would we want to halt the processor in that case?


if CPU_ONLINE.load(Ordering::Acquire) == 0 {
unsafe {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's put the refactoring of the reduction of unsafe scope in a separate commit or PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants