Osprey compiles through LLVM to a native binary, so the fair question is how it sits against other native-compiled languages. This page measures CPU time and peak memory against Rust, C, OCaml, and Haskell on classic compute benchmarks — the same naive algorithm, the same parameters, in every language.

The tables below are generated mechanically from the benchmark harness output by benchmarks/report.py and are never hand-edited. The Osprey column is highlighted, the fastest cell in each row is emphasised, and ★ marks a benchmark Osprey wins outright (strictly faster, or strictly lighter on memory, than every other language).

6 CPU wins (fastest of all)
0.98× CPU vs Rust
1.05× CPU vs C
0.71× CPU vs OCaml
0.67× CPU vs Haskell

Osprey is the fastest of all five languages on digitsum, factorial, hanoi, josephus, primes, and tak. Lower is better; ★ marks an Osprey win.

CPU time

Benchmark   Osprey      Rust      C         OCaml     Haskell
ackermann   127.9 ms    132.5 ms  128.0 ms  113.6 ms  65.6 ms
binarytrees 420.3 ms    713.2 ms  348.9 ms  50.7 ms   16.7 ms
coins       71.7 ms     76.2 ms   71.1 ms   93.6 ms   52.3 ms
collatz     12.3 ms     11.4 ms   9.4 ms    54.2 ms   39.3 ms
coprime     62.4 ms     60.9 ms   58.5 ms   88.1 ms   100.0 ms
digitsum    4.9 ms ★    5.2 ms    5.3 ms    19.0 ms   29.1 ms
factorial   33.5 ms ★   34.8 ms   34.6 ms   50.2 ms   53.9 ms
fib         21.1 ms     17.8 ms   18.6 ms   24.3 ms   49.8 ms
gcdsum      81.1 ms     79.8 ms   79.3 ms   101.6 ms  103.3 ms
hanoi       38.4 ms ★   38.8 ms   39.3 ms   61.6 ms   55.8 ms
isqrt       13.4 ms     11.2 ms   10.6 ms   20.9 ms   40.5 ms
josephus    32.8 ms ★   33.5 ms   33.4 ms   41.2 ms   44.5 ms
mutual      13.4 ms     13.0 ms   12.8 ms   28.9 ms   40.5 ms
nestedloop  44.8 ms     46.4 ms   44.5 ms   57.1 ms   63.9 ms
pascal      27.7 ms     27.7 ms   27.6 ms   44.6 ms   62.3 ms
powmod      23.4 ms     22.7 ms   22.4 ms   59.6 ms   57.4 ms
primes      6.3 ms ★    6.7 ms    6.6 ms    8.8 ms    15.8 ms
tak         32.7 ms ★   32.8 ms   32.9 ms   45.1 ms   64.4 ms
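
The headline figures can be re-derived from the CPU table. The sketch below is not benchmarks/report.py: every name in it is hypothetical, and it assumes each summary ratio is the geometric mean, across benchmarks, of Osprey's time divided by the other language's time. That assumption is consistent with the published 0.98× (Rust) and 1.05× (C), but the report does not state how the ratios are aggregated.

```python
from math import exp, log

LANGS = ["Osprey", "Rust", "C", "OCaml", "Haskell"]

# CPU time in milliseconds, copied from the table above, in LANGS order.
CPU_MS = {
    "ackermann":   [127.9, 132.5, 128.0, 113.6, 65.6],
    "binarytrees": [420.3, 713.2, 348.9, 50.7, 16.7],
    "coins":       [71.7, 76.2, 71.1, 93.6, 52.3],
    "collatz":     [12.3, 11.4, 9.4, 54.2, 39.3],
    "coprime":     [62.4, 60.9, 58.5, 88.1, 100.0],
    "digitsum":    [4.9, 5.2, 5.3, 19.0, 29.1],
    "factorial":   [33.5, 34.8, 34.6, 50.2, 53.9],
    "fib":         [21.1, 17.8, 18.6, 24.3, 49.8],
    "gcdsum":      [81.1, 79.8, 79.3, 101.6, 103.3],
    "hanoi":       [38.4, 38.8, 39.3, 61.6, 55.8],
    "isqrt":       [13.4, 11.2, 10.6, 20.9, 40.5],
    "josephus":    [32.8, 33.5, 33.4, 41.2, 44.5],
    "mutual":      [13.4, 13.0, 12.8, 28.9, 40.5],
    "nestedloop":  [44.8, 46.4, 44.5, 57.1, 63.9],
    "pascal":      [27.7, 27.7, 27.6, 44.6, 62.3],
    "powmod":      [23.4, 22.7, 22.4, 59.6, 57.4],
    "primes":      [6.3, 6.7, 6.6, 8.8, 15.8],
    "tak":         [32.7, 32.8, 32.9, 45.1, 64.4],
}


def outright_wins(table):
    """Benchmarks where Osprey (index 0) is strictly faster than every other language."""
    return [name for name, t in table.items() if all(t[0] < other for other in t[1:])]


def geomean_ratio(table, lang):
    """Geometric mean of (Osprey time / lang time); an assumed aggregation."""
    i = LANGS.index(lang)
    logs = [log(t[0] / t[i]) for t in table.values()]
    return exp(sum(logs) / len(logs))


if __name__ == "__main__":
    print("outright wins:", ", ".join(outright_wins(CPU_MS)))
    for lang in LANGS[1:]:
        print(f"{geomean_ratio(CPU_MS, lang):.2f}x CPU vs {lang}")
```

A ratio below 1 means Osprey is faster on aggregate. A tie such as pascal (27.7 ms for both Osprey and Rust) does not count as a win, because the comparison is strict.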

Peak memory

Benchmark   Osprey     Rust     C        OCaml    Haskell
ackermann   1.6 MiB    1.7 MiB  1.6 MiB  2.6 MiB  15.1 MiB
binarytrees 905.0 MiB  2.2 MiB  1.7 MiB  5.1 MiB  11.0 MiB
coins       1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB
collatz     1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB
coprime     1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB
digitsum    1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB
factorial   1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB
fib         1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB
gcdsum      1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB
hanoi       1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB
isqrt       1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB
josephus    1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB
mutual      1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB
nestedloop  1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.0 MiB
pascal      1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB
powmod      1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB
primes      1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB
tak         1.4 MiB    1.5 MiB  1.4 MiB  2.2 MiB  11.1 MiB

Methodology

Every benchmark is implemented identically in all five languages under benchmarks/cases/<name>/, compiled to a native binary, checked for correct output, then timed.

  1. Build once, time the binary. osprey … --compile emits a persistent native executable; we time that, never --run (which would fold compile and link into the measurement). Every language uses its standard optimizing release flags.
  2. Correctness oracle. Each binary runs once and its output is compared to the case's expected.txt. A case that fails to build or produces mismatched output is excluded from timing: we never publish a number for a program that computed the wrong thing. Every case has a single deterministic integer result, so output is byte-comparable across languages.
  3. CPU. hyperfine -N --warmup 3 --min-runs 10 per case → statistical mean ± standard deviation.
  4. Memory. /usr/bin/time peak resident set size (-l on macOS, -v on Linux), max over a few runs.
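
Steps 2 to 4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the harness itself: the function names are hypothetical, the --export-json flag is added here only as one way to capture hyperfine's results, and the parser relies on the usual output of GNU time -v (kilobytes) and BSD time -l (bytes).

```python
import re
import subprocess
from pathlib import Path


def passes_oracle(command, expected_path):
    """Step 2: run the program once and byte-compare stdout with expected.txt."""
    result = subprocess.run(command, capture_output=True, check=False)
    return result.stdout == Path(expected_path).read_bytes()


def hyperfine_command(binary, export_json):
    """Step 3: argument list for hyperfine with the flags named above.

    --export-json is an addition for this sketch, used to capture the results.
    """
    return ["hyperfine", "-N", "--warmup", "3", "--min-runs", "10",
            "--export-json", export_json, binary]


def peak_rss_mib(time_output, platform):
    """Step 4: extract peak resident set size in MiB from /usr/bin/time output.

    Assumes GNU time -v on Linux (value in kilobytes) and BSD time -l on
    macOS (value in bytes).
    """
    if platform == "linux":
        match = re.search(r"Maximum resident set size \(kbytes\):\s*(\d+)", time_output)
        scale = 1024
    else:
        match = re.search(r"(\d+)\s+maximum resident set size", time_output)
        scale = 1
    if match is None:
        raise ValueError("no peak RSS line found")
    return int(match.group(1)) * scale / (1024 * 1024)


def max_peak_rss_mib(outputs, platform):
    """'Max over a few runs': the largest peak seen across repeated runs."""
    return max(peak_rss_mib(text, platform) for text in outputs)
```

Comparing raw bytes rather than parsed integers keeps the oracle strict: a stray space or missing newline counts as a mismatch, which is what "byte-comparable" implies.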

Compile commands

Language  Command
Osprey    osprey <f>.osp --compile (LLVM IR → clang -O2; override with OSPREY_OPT)
Rust      rustc -C opt-level=3 -C overflow-checks=off
C         cc -O2
OCaml     ocamlopt -O3 -unsafe
Haskell   ghc -O2

Reading the numbers fairly

  • Same algorithm everywhere. Identical naive algorithm and parameters in every language — no memoization, closed forms, SIMD, or parallelism. We measure the language/compiler/runtime, not who is cleverest. Ranges match Osprey's half-open range(a, b) = [a, b) exactly.
  • Osprey does checked arithmetic on every + - * % (each returns Result<int, MathError>, overflow-checked). The others do not by default — we even pass -C overflow-checks=off to Rust to match its release profile. Part of any Osprey gap is the cost of that safety, a real language semantic.
  • Osprey loops via range |> fold, not deep linear recursion, because it has no tail-call optimization yet (a 1e6-deep recursion overflows the stack). The work is identical; only the iteration mechanism differs.
  • OCaml is built without flambda (stock ocamlopt), so its numbers are conservative versus an flambda build; the -O3 flag only takes effect under flambda.
  • Single machine, wall clock. Treat ratios as indicative; re-run locally with make bench. The exact set of outright wins shifts run-to-run because Osprey, Rust, and C now sit within measurement noise of one another.
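
To make the checked-arithmetic and fold points concrete, here is a small Python model of the pattern. It is not Osprey source: the 64-bit signed range, the tuple standing in for Result<int, MathError>, and the function names are all assumptions made for illustration.

```python
from functools import reduce

# Assumption: a 64-bit signed int; the page does not state Osprey's int width.
INT_MIN, INT_MAX = -(2**63), 2**63 - 1


def checked_add(a, b):
    """Return ("ok", a + b), or ("err", "overflow") if the sum leaves the int range."""
    total = a + b
    if total < INT_MIN or total > INT_MAX:
        return ("err", "overflow")
    return ("ok", total)


def fold_sum(a, b):
    """Sum the half-open range [a, b) with a fold, checking every addition."""
    def step(acc, i):
        if acc[0] == "err":
            return acc  # the first error short-circuits the remaining steps
        return checked_add(acc[1], i)
    return reduce(step, range(a, b), ("ok", 0))


if __name__ == "__main__":
    print(fold_sum(1, 11))
    print(checked_add(INT_MAX, 1))
```

The extra comparison on every step is the kind of per-operation cost the checked-arithmetic bullet refers to, and the fold visits exactly the elements of [a, b) without growing the call stack.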

Where the gap remains: memory

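On compute, Osprey is at parity with C and Rust (1.05× and 0.98× CPU) and ahead of OCaml and Haskell in aggregate (0.71× and 0.67×), although OCaml and Haskell are both faster on ackermann and binarytrees, and Haskell is also faster on coins. Peak memory matches C on every case except binarytrees, where Osprey peaks at 905.0 MiB against C's 1.7 MiB. That benchmark builds, holds, and checksums millions of small heap nodes. Those nodes genuinely escape, so the optimizer cannot statically free them, and Osprey's default allocator does not yet reclaim memory during a run.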

Closing that gap is the contract of the Memory Management spec: allocation funnels through one swappable backend boundary, so a reclaiming manager (reference counting, a tracing collector, or an arena) can be linked in without changing a line of Osprey source.
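
A toy model shows why a non-reclaiming backend inflates peak memory on an allocation-heavy workload. Everything here is hypothetical (class names, method names, the workload shape, and the 24-byte node size). It models only the accounting, not Osprey's runtime or the interface defined by the Memory Management spec.

```python
class NonReclaimingBackend:
    """Models an allocator that never reuses memory: free() is a no-op."""

    def __init__(self):
        self.live = 0
        self.peak = 0

    def alloc(self, nbytes):
        self.live += nbytes
        self.peak = max(self.peak, self.live)

    def free(self, nbytes):
        pass  # memory is never returned during the run


class ReclaimingBackend(NonReclaimingBackend):
    """Same interface, but freed bytes become available again."""

    def free(self, nbytes):
        self.live -= nbytes


def run_workload(backend, rounds, nodes_per_round, node_bytes):
    """Build a batch of nodes, drop it, and repeat; return the peak in bytes."""
    for _ in range(rounds):
        for _ in range(nodes_per_round):
            backend.alloc(node_bytes)
        for _ in range(nodes_per_round):
            backend.free(node_bytes)
    return backend.peak


if __name__ == "__main__":
    print(run_workload(NonReclaimingBackend(), 10, 1000, 24))
    print(run_workload(ReclaimingBackend(), 10, 1000, 24))
```

Because the workload code calls only alloc and free, swapping the backend changes the peak without touching the workload, which is the property the swappable boundary is meant to provide.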

Reproduce it

make bench                       # build everything, run the whole suite
BENCH_FILTER=fib make bench      # only cases whose name contains "fib"

Results land in benchmarks/results/results.html (this report, standalone), results.json (structured), and the per-case hyperfine exports.