Osprey compiles through LLVM to a native binary, so the fair question is how it sits against other native-compiled languages. This page measures CPU time and peak memory against Rust, C, OCaml, and Haskell on classic compute benchmarks — the same naive algorithm, the same parameters, in every language.
The tables below are generated mechanically from the benchmark harness output by benchmarks/report.py and are never hand-edited. The Osprey column is highlighted; the fastest cell in each row is emphasised, and ★ marks a benchmark Osprey wins outright (strictly faster, or lighter, than every other language).
Osprey is the fastest of all five languages on digitsum, factorial, hanoi, josephus, primes, and tak. Lower is better; ★ marks an Osprey win.
CPU time
| Benchmark | Osprey | Rust | C | OCaml | Haskell |
|---|---|---|---|---|---|
| ackermann | 127.9 ms | 132.5 ms | 128.0 ms | 113.6 ms | 65.6 ms |
| binarytrees | 420.3 ms | 713.2 ms | 348.9 ms | 50.7 ms | 16.7 ms |
| coins | 71.7 ms | 76.2 ms | 71.1 ms | 93.6 ms | 52.3 ms |
| collatz | 12.3 ms | 11.4 ms | 9.4 ms | 54.2 ms | 39.3 ms |
| coprime | 62.4 ms | 60.9 ms | 58.5 ms | 88.1 ms | 100.0 ms |
| digitsum | 4.9 ms ★ | 5.2 ms | 5.3 ms | 19.0 ms | 29.1 ms |
| factorial | 33.5 ms ★ | 34.8 ms | 34.6 ms | 50.2 ms | 53.9 ms |
| fib | 21.1 ms | 17.8 ms | 18.6 ms | 24.3 ms | 49.8 ms |
| gcdsum | 81.1 ms | 79.8 ms | 79.3 ms | 101.6 ms | 103.3 ms |
| hanoi | 38.4 ms ★ | 38.8 ms | 39.3 ms | 61.6 ms | 55.8 ms |
| isqrt | 13.4 ms | 11.2 ms | 10.6 ms | 20.9 ms | 40.5 ms |
| josephus | 32.8 ms ★ | 33.5 ms | 33.4 ms | 41.2 ms | 44.5 ms |
| mutual | 13.4 ms | 13.0 ms | 12.8 ms | 28.9 ms | 40.5 ms |
| nestedloop | 44.8 ms | 46.4 ms | 44.5 ms | 57.1 ms | 63.9 ms |
| pascal | 27.7 ms | 27.7 ms | 27.6 ms | 44.6 ms | 62.3 ms |
| powmod | 23.4 ms | 22.7 ms | 22.4 ms | 59.6 ms | 57.4 ms |
| primes | 6.3 ms ★ | 6.7 ms | 6.6 ms | 8.8 ms | 15.8 ms |
| tak | 32.7 ms ★ | 32.8 ms | 32.9 ms | 45.1 ms | 64.4 ms |
Peak memory
| Benchmark | Osprey | Rust | C | OCaml | Haskell |
|---|---|---|---|---|---|
| ackermann | 1.6 MiB | 1.7 MiB | 1.6 MiB | 2.6 MiB | 15.1 MiB |
| binarytrees | 905.0 MiB | 2.2 MiB | 1.7 MiB | 5.1 MiB | 11.0 MiB |
| coins | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
| collatz | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
| coprime | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
| digitsum | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
| factorial | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
| fib | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
| gcdsum | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
| hanoi | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
| isqrt | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
| josephus | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
| mutual | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
| nestedloop | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.0 MiB |
| pascal | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
| powmod | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
| primes | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
| tak | 1.4 MiB | 1.5 MiB | 1.4 MiB | 2.2 MiB | 11.1 MiB |
Methodology
Every benchmark is implemented identically in all five languages under benchmarks/cases/<name>/, compiled to a native binary, checked for correct output, then timed.
- Build once, time the binary. `osprey … --compile` emits a persistent native executable; we time that, never `--run` (which would fold compile and link into the measurement). Every language uses its standard optimizing release flags.
- Correctness oracle. Each binary runs once and its output is compared to the case's expected.txt. A mismatch or build failure is excluded from timing: we never publish a number for a program that computed the wrong thing. Every case has a single deterministic integer result, so output is byte-comparable across languages.
- CPU. `hyperfine -N --warmup 3 --min-runs 10` per case → statistical mean ± standard deviation.
- Memory. `/usr/bin/time` peak resident set size (`-l` on macOS, `-v` on Linux), max over a few runs.
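The steps above could be driven from Python roughly as follows. This is a hedged sketch rather than the harness itself: the function names are hypothetical, and treating a non-zero exit status as an oracle failure is an assumption. The hyperfine and /usr/bin/time flags are the ones listed in the bullets.

```python
import subprocess
import sys
from pathlib import Path


def passes_oracle(argv, expected_path):
    """Run the program once and byte-compare its stdout with expected.txt.

    Assumption: a non-zero exit status also counts as a failure.
    """
    result = subprocess.run(argv, capture_output=True, check=False)
    expected = Path(expected_path).read_bytes()
    return result.returncode == 0 and result.stdout == expected


def hyperfine_command(binary):
    """Build the timing command: no shell, 3 warmup runs, at least 10 runs."""
    return ["hyperfine", "-N", "--warmup", "3", "--min-runs", "10", str(binary)]


def peak_rss_command(binary, platform=sys.platform):
    """Build the /usr/bin/time command: -l on macOS, -v on Linux."""
    flag = "-l" if platform == "darwin" else "-v"
    return ["/usr/bin/time", flag, str(binary)]
```

Only binaries for which `passes_oracle` returns `True` would then be passed to the two command builders, so a wrong answer never reaches the timing stage.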
Compile commands
| Language | Command |
|---|---|
| Osprey | osprey <f>.osp --compile (LLVM IR → clang -O2; override with OSPREY_OPT) |
| Rust | rustc -C opt-level=3 -C overflow-checks=off |
| C | cc -O2 |
| OCaml | ocamlopt -O3 -unsafe |
| Haskell | ghc -O2 |
Reading the numbers fairly
- Same algorithm everywhere. Identical naive algorithm and parameters in every language: no memoization, closed forms, SIMD, or parallelism. We measure the language/compiler/runtime, not who is cleverest. Ranges match Osprey's half-open `range(a, b)` = `[a, b)` exactly.
- Osprey does checked arithmetic on every `+ - * %` (each returns `Result<int, MathError>`, overflow-checked). The others do not by default; we even pass `-C overflow-checks=off` to Rust to match its release profile. Part of any Osprey gap is the cost of that safety, a real language semantic.
- Osprey loops via `range |> fold`, not deep linear recursion, because it has no tail-call optimization yet (a 1e6-deep recursion overflows the stack). The work is identical; only the iteration mechanism differs.
- OCaml is built without flambda (stock `ocamlopt`), so its numbers are conservative versus an flambda build.
- Single machine, wall clock. Treat ratios as indicative; re-run locally with `make bench`. The exact set of outright wins shifts run-to-run because Osprey, Rust, and C now sit within measurement noise of one another.
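As a loose analogy for the checked-arithmetic and fold points above, here is the same shape in Python, whose `range(a, b)` is also half-open. This is not Osprey code. The 64-bit integer width is an assumption, and because Python has no built-in `Result` type the sketch raises a hypothetical `MathError` where Osprey would return an error value.

```python
from functools import reduce

# Assumption: a 64-bit signed integer range, chosen for illustration.
INT_MIN, INT_MAX = -(2**63), 2**63 - 1


class MathError(Exception):
    """Hypothetical stand-in for the error case of a checked operation."""


def checked_add(a, b):
    """Add two integers and fail if the result leaves the 64-bit range."""
    result = a + b
    if not INT_MIN <= result <= INT_MAX:
        raise MathError("integer overflow")
    return result


def sum_range(a, b):
    """Fold checked_add over the half-open range [a, b), with no recursion."""
    return reduce(checked_add, range(a, b), 0)


print(sum_range(1, 5))  # 10, because [1, 5) is 1, 2, 3, 4
```

Each step of the fold pays for one range check, which is the kind of per-operation cost the second bullet describes.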
Where the gap remains: memory
On compute, Osprey is at parity with C and Rust and ahead of OCaml and Haskell on most cases; the exceptions in the CPU table are ackermann and binarytrees (where both OCaml and Haskell are faster) and coins (where Haskell is faster). Peak memory matches C on every case except binarytrees. That benchmark builds, holds, and checksums millions of small heap nodes. They genuinely escape, so the optimizer cannot statically free them, and Osprey's default allocator does not yet reclaim memory during a run.
This is the contract of the Memory Management spec: allocation funnels through one swappable backend boundary, so a reclaiming manager (reference counting, a tracing collector, or an arena) can be linked in to close this last gap without changing a line of Osprey source.
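A toy model shows why a reclaiming backend would change the peak without the workload changing. Everything here is hypothetical: the class names, the byte counts, and the simplification that each tree is released before the next is built are illustrative assumptions, not Osprey's runtime or the actual binarytrees case.

```python
class NeverReclaim:
    """Models a backend that ignores releases for the whole run."""

    def __init__(self):
        self.in_use = 0
        self.peak = 0

    def alloc(self, size):
        self.in_use += size
        self.peak = max(self.peak, self.in_use)

    def release(self, size):
        pass  # memory is never returned during the run


class Reclaiming(NeverReclaim):
    """Models a backend that returns released memory for reuse."""

    def release(self, size):
        self.in_use -= size


def run_workload(backend, trees=100, nodes_per_tree=1000, node_size=16):
    """Build and drop `trees` trees one after another; return peak bytes."""
    tree_bytes = nodes_per_tree * node_size
    for _ in range(trees):
        backend.alloc(tree_bytes)    # build one tree
        backend.release(tree_bytes)  # the tree is no longer needed
    return backend.peak


print(run_workload(NeverReclaim()))  # 1600000
print(run_workload(Reclaiming()))    # 16000
```

The workload function is identical in both calls; only the object behind the `alloc`/`release` boundary differs, which mirrors the swappable-backend idea described above.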
Reproduce it
make bench # build everything, run the whole suite
BENCH_FILTER=fib make bench # only cases whose name contains "fib"
Results land in benchmarks/results/: results.html (this report, standalone), results.json (structured), and the per-case hyperfine exports.
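The `BENCH_FILTER` behaviour described above (keep only cases whose name contains the filter string) amounts to a substring match. A minimal sketch, assuming an unset or empty filter selects every case; the function name `select_cases` is hypothetical, and the case names are the ones in the tables on this page.

```python
import os

CASES = [
    "ackermann", "binarytrees", "coins", "collatz", "coprime", "digitsum",
    "factorial", "fib", "gcdsum", "hanoi", "isqrt", "josephus", "mutual",
    "nestedloop", "pascal", "powmod", "primes", "tak",
]


def select_cases(cases, pattern):
    """Keep the cases whose name contains `pattern` (empty keeps all)."""
    return [name for name in cases if pattern in name]


# Read the filter from the environment, defaulting to "no filter".
selected = select_cases(CASES, os.environ.get("BENCH_FILTER", ""))

print(select_cases(CASES, "fib"))  # ['fib']
print(select_cases(CASES, "co"))   # ['coins', 'collatz', 'coprime']
```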