I don’t know how I forgot to post this library and write-up on Hacker News. It’s an unusual thread-pool implementation that focuses entirely on simultaneous multithreading, skipping task queues, futures, and asynchrony. That makes workload distribution much cheaper than in Taskflow (C++) or Rayon (Rust), while getting closer to OpenMP in raw performance.
The design avoids memory allocations, focuses on lock-free and CAS-heavy atomics, and leans on modern hardware instructions for busy-waiting and NUMA-friendly execution. It still needs polishing, and I’m always open to feedback on how to push it further.