Benchmarks

Osprey compiles through LLVM to a native binary, so the fair question is how it sits against other native-compiled languages. This page measures CPU time and peak memory against Rust, C, OCaml, and Haskell on classic compute benchmarks — the same naive algorithm, the same parameters, in every language.

The tables below are generated mechanically from the benchmark harness output by benchmarks/report.py — never hand-edited. The Osprey column is highlighted; the fastest cell in each row is emphasised, and ★ marks a benchmark Osprey wins outright (strictly faster, or lighter, than every other language).

6 CPU wins (fastest of all)

0.98× CPU vs Rust

1.05× CPU vs C

0.71× CPU vs OCaml

0.67× CPU vs Haskell

Osprey is the fastest of all five languages on digitsum, factorial, hanoi, josephus, primes, and tak. In both tables below, lower is better; ★ marks an Osprey win.
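The ★ rule can be checked by hand against any row. The sketch below is a minimal Python illustration of "strictly lower than every other language"; the function name `outright_win` is hypothetical and is not taken from benchmarks/report.py. The three rows are copied from the tables on this page.

```python
from typing import Mapping


def outright_win(row: Mapping[str, float], lang: str = "Osprey") -> bool:
    """True when `lang` is strictly lower than every other language in the row."""
    return all(row[lang] < value for name, value in row.items() if name != lang)


# Rows copied from the CPU time (ms) and peak memory (MiB) tables.
digitsum_ms = {"Osprey": 4.9, "Rust": 5.2, "C": 5.3, "OCaml": 19.0, "Haskell": 29.1}
pascal_ms = {"Osprey": 27.7, "Rust": 27.7, "C": 27.6, "OCaml": 44.6, "Haskell": 62.3}
fib_mib = {"Osprey": 1.4, "Rust": 1.5, "C": 1.4, "OCaml": 2.2, "Haskell": 11.1}

print(outright_win(digitsum_ms))  # strictly fastest, so the row earns a ★
print(outright_win(pascal_ms))    # C is faster, so no ★
print(outright_win(fib_mib))      # ties C on memory, so no ★
```

Because the comparison is strict, a tie never counts as a win, which is why no row of the peak memory table carries a ★ even though Osprey matches the lightest value in most of them.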

CPU time

Benchmark	Osprey	Rust	C	OCaml	Haskell
ackermann	127.9 ms	132.5 ms	128.0 ms	113.6 ms	65.6 ms
binarytrees	420.3 ms	713.2 ms	348.9 ms	50.7 ms	16.7 ms
coins	71.7 ms	76.2 ms	71.1 ms	93.6 ms	52.3 ms
collatz	12.3 ms	11.4 ms	9.4 ms	54.2 ms	39.3 ms
coprime	62.4 ms	60.9 ms	58.5 ms	88.1 ms	100.0 ms
digitsum	4.9 ms ★	5.2 ms	5.3 ms	19.0 ms	29.1 ms
factorial	33.5 ms ★	34.8 ms	34.6 ms	50.2 ms	53.9 ms
fib	21.1 ms	17.8 ms	18.6 ms	24.3 ms	49.8 ms
gcdsum	81.1 ms	79.8 ms	79.3 ms	101.6 ms	103.3 ms
hanoi	38.4 ms ★	38.8 ms	39.3 ms	61.6 ms	55.8 ms
isqrt	13.4 ms	11.2 ms	10.6 ms	20.9 ms	40.5 ms
josephus	32.8 ms ★	33.5 ms	33.4 ms	41.2 ms	44.5 ms
mutual	13.4 ms	13.0 ms	12.8 ms	28.9 ms	40.5 ms
nestedloop	44.8 ms	46.4 ms	44.5 ms	57.1 ms	63.9 ms
pascal	27.7 ms	27.7 ms	27.6 ms	44.6 ms	62.3 ms
powmod	23.4 ms	22.7 ms	22.4 ms	59.6 ms	57.4 ms
primes	6.3 ms ★	6.7 ms	6.6 ms	8.8 ms	15.8 ms
tak	32.7 ms ★	32.8 ms	32.9 ms	45.1 ms	64.4 ms
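
This page does not define how the headline ratios are aggregated. One reading that is consistent with the table is a geometric mean of the per-benchmark Osprey/other ratios. That is an inference from the published numbers, not a statement about what benchmarks/report.py does, and the sketch below works from the rounded table values rather than the raw harness output. Under those assumptions it gives about 0.98 against Rust and about 1.05 against C.

```python
import math

# CPU times in ms copied from the table above, as (Osprey, Rust, C).
CPU_MS = {
    "ackermann": (127.9, 132.5, 128.0),
    "binarytrees": (420.3, 713.2, 348.9),
    "coins": (71.7, 76.2, 71.1),
    "collatz": (12.3, 11.4, 9.4),
    "coprime": (62.4, 60.9, 58.5),
    "digitsum": (4.9, 5.2, 5.3),
    "factorial": (33.5, 34.8, 34.6),
    "fib": (21.1, 17.8, 18.6),
    "gcdsum": (81.1, 79.8, 79.3),
    "hanoi": (38.4, 38.8, 39.3),
    "isqrt": (13.4, 11.2, 10.6),
    "josephus": (32.8, 33.5, 33.4),
    "mutual": (13.4, 13.0, 12.8),
    "nestedloop": (44.8, 46.4, 44.5),
    "pascal": (27.7, 27.7, 27.6),
    "powmod": (23.4, 22.7, 22.4),
    "primes": (6.3, 6.7, 6.6),
    "tak": (32.7, 32.8, 32.9),
}


def geomean_ratio(baseline_index: int) -> float:
    """Geometric mean of (Osprey time / baseline time) over all benchmarks."""
    logs = [math.log(row[0] / row[baseline_index]) for row in CPU_MS.values()]
    return math.exp(sum(logs) / len(logs))


vs_rust = geomean_ratio(1)  # column 1 is Rust
vs_c = geomean_ratio(2)     # column 2 is C
print(f"{vs_rust:.2f}x vs Rust, {vs_c:.2f}x vs C")
```

A geometric mean keeps one outlier such as binarytrees from dominating the summary the way a ratio of summed times would.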

Peak memory

Benchmark	Osprey	Rust	C	OCaml	Haskell
ackermann	1.6 MiB	1.7 MiB	1.6 MiB	2.6 MiB	15.1 MiB
binarytrees	905.0 MiB	2.2 MiB	1.7 MiB	5.1 MiB	11.0 MiB
coins	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB
collatz	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB
coprime	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB
digitsum	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB
factorial	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB
fib	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB
gcdsum	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB
hanoi	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB
isqrt	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB
josephus	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB
mutual	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB
nestedloop	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.0 MiB
pascal	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB
powmod	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB
primes	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB
tak	1.4 MiB	1.5 MiB	1.4 MiB	2.2 MiB	11.1 MiB

Methodology

Every benchmark is implemented identically in all five languages under benchmarks/cases/<name>/, compiled to a native binary, checked for correct output, then timed.

Build once, time the binary. osprey … --compile emits a persistent native executable; we time that, never --run (which would fold compile and link into the measurement). Every language uses its standard optimizing release flags.
Correctness oracle. Each binary runs once and its output is compared to the case's expected.txt. A case that fails to build or produces mismatched output is excluded from timing: we never publish a number for a program that computed the wrong thing. Every case has a single deterministic integer result, so output is byte-comparable across languages.
CPU. hyperfine -N --warmup 3 --min-runs 10 per case → statistical mean ± standard deviation.
Memory. /usr/bin/time peak resident set size (-l on macOS, -v on Linux), max over a few runs.
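
The steps above can be sketched as a small driver. This is a minimal Python illustration, not the actual harness: the function names are hypothetical, the exact-byte comparison is an assumption about how expected.txt is checked, and `--export-json` is added here only as one way to keep hyperfine's output. The hyperfine and /usr/bin/time flags are the ones quoted above.

```python
import platform
import subprocess
from pathlib import Path


def output_matches(binary: str, expected_file: Path) -> bool:
    """Correctness oracle: run once and byte-compare stdout with expected.txt."""
    result = subprocess.run([binary], capture_output=True, check=False)
    return result.returncode == 0 and result.stdout == expected_file.read_bytes()


def hyperfine_command(binary: str, export_json: str) -> list:
    """CPU timing command: no shell, 3 warmup runs, at least 10 timed runs."""
    return ["hyperfine", "-N", "--warmup", "3", "--min-runs", "10",
            "--export-json", export_json, binary]


def time_command(binary: str) -> list:
    """Peak-RSS command: /usr/bin/time -l on macOS, -v on Linux."""
    flag = "-l" if platform.system() == "Darwin" else "-v"
    return ["/usr/bin/time", flag, binary]


def measure(binary: str, expected_file: Path, export_json: str) -> bool:
    """Time a binary only if it passes the oracle; return whether it was timed."""
    if not output_matches(binary, expected_file):
        return False  # excluded: no number is published for a wrong result
    subprocess.run(hyperfine_command(binary, export_json), check=True)
    subprocess.run(time_command(binary), check=True)
    return True
```

Running the oracle before the timers means a failing case costs one execution instead of a full hyperfine session.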

Compile commands

Language	Command
Osprey	`osprey <f>.osp --compile` (LLVM IR → clang `-O2`; override with `OSPREY_OPT`)
Rust	`rustc -C opt-level=3 -C overflow-checks=off`
C	`cc -O2`
OCaml	`ocamlopt -O3 -unsafe`
Haskell	`ghc -O2`

Reading the numbers fairly

Same algorithm everywhere. Identical naive algorithm and parameters in every language — no memoization, closed forms, SIMD, or parallelism. We measure the language/compiler/runtime, not who is cleverest. Ranges match Osprey's half-open range(a, b) = [a, b) exactly.
Osprey does checked arithmetic on every + - * % (each returns Result<int, MathError>, overflow-checked). The others do not by default — we even pass -C overflow-checks=off to Rust to match its release profile. Part of any Osprey gap is the cost of that safety, a real language semantic.
Osprey loops via range |> fold, not deep linear recursion, because it has no tail-call optimization yet (a 1e6-deep recursion overflows the stack). The work is identical; only the iteration mechanism differs.
OCaml is built without flambda (stock ocamlopt), so the -O3 flag is accepted but has no effect, and its numbers are conservative versus an flambda build.
Single machine, wall clock. Treat ratios as indicative; re-run locally with make bench. The exact set of outright wins shifts run-to-run because Osprey, Rust, and C now sit within measurement noise of one another.
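
To make the first three points concrete, here is a rough Python model of what each Osprey case does: a fold over a half-open range in which every addition is overflow-checked. It is an illustration only. The 64-bit signed width is an assumption (this page does not state the width of Osprey's int), the `("ok", value)` / `("err", reason)` tuples stand in for Result<int, MathError>, and `checked_add` and `sum_range` are hypothetical names.

```python
from functools import reduce

# Assumption: a 64-bit signed integer; the page does not state Osprey's int width.
INT_MIN, INT_MAX = -(2**63), 2**63 - 1


def checked_add(a: int, b: int):
    """Return ("ok", sum) when the sum fits, else ("err", "overflow")."""
    total = a + b
    if INT_MIN <= total <= INT_MAX:
        return ("ok", total)
    return ("err", "overflow")


def sum_range(a: int, b: int):
    """Fold checked_add over the half-open range [a, b); keep the first error."""
    def step(acc, i):
        return checked_add(acc[1], i) if acc[0] == "ok" else acc
    return reduce(step, range(a, b), ("ok", 0))


print(sum_range(1, 5))          # adds 1, 2, 3, 4; the upper bound 5 is excluded
print(checked_add(INT_MAX, 1))  # reports overflow instead of wrapping
```

The fold carries its accumulator through a loop rather than a call stack, which is the reason the text gives for preferring it to deep linear recursion; the per-step range check is the safety cost that the unchecked languages do not pay.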

Where the gap remains: memory

On compute, Osprey is at parity with C and Rust and, on aggregate, ahead of OCaml and Haskell; the exceptions are ackermann and binarytrees, where both are faster, and coins, where Haskell is faster. Peak memory matches C on every case except binarytrees. That benchmark builds, holds, and checksums millions of small heap nodes. They genuinely escape, so the optimizer cannot statically free them, and Osprey's default allocator does not reclaim memory during a run yet.

This is the contract of the Memory Management spec: allocation funnels through one swappable backend boundary, so a reclaiming manager (reference counting, a tracing collector, or an arena) can be linked in to close this last gap without changing a line of Osprey source.
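
The effect of swapping the backend can be modelled in a few lines. The sketch below is a toy Python model, not Osprey's allocator interface: both classes, the `churn` workload, and every parameter value are hypothetical, and it assumes the workload releases each tree once it has been checksummed. It only shows why peak memory tracks total allocation when `free` does nothing.

```python
class BumpAllocator:
    """Hypothetical non-reclaiming backend: free() is a no-op."""

    def __init__(self) -> None:
        self.live = 0  # bytes currently held
        self.peak = 0  # highest value `live` has reached

    def alloc(self, size: int) -> int:
        self.live += size
        self.peak = max(self.peak, self.live)
        return size  # the "handle" is just the size in this toy model

    def free(self, size: int) -> None:
        pass  # memory is never returned during the run


class ReclaimingAllocator(BumpAllocator):
    """Hypothetical reclaiming backend: free() returns bytes immediately."""

    def free(self, size: int) -> None:
        self.live -= size


def churn(allocator, trees: int, nodes_per_tree: int, node_size: int) -> int:
    """Build a tree, release it, repeat; return the peak bytes the backend saw."""
    for _ in range(trees):
        handles = [allocator.alloc(node_size) for _ in range(nodes_per_tree)]
        for handle in handles:
            allocator.free(handle)
    return allocator.peak


# Illustrative parameters only; they are not the binarytrees settings.
print(churn(BumpAllocator(), trees=100, nodes_per_tree=1000, node_size=24))
print(churn(ReclaimingAllocator(), trees=100, nodes_per_tree=1000, node_size=24))
```

The workload code is identical for both backends; only the object passed in changes, which mirrors the claim that a reclaiming manager can be linked in without touching Osprey source.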

Reproduce it

make bench                       # build everything, run the whole suite
BENCH_FILTER=fib make bench      # only cases whose name contains "fib"

Results land in benchmarks/results/ — results.html (this report, standalone), results.json (structured), and the per-case hyperfine exports.
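
If the per-case exports are hyperfine's JSON format (an assumption; the file names and layout inside benchmarks/results/ are not listed here), each one can be reduced to the mean ± standard deviation shown in the CPU table. hyperfine reports times in seconds, so the sketch converts to milliseconds. The sample document is made up for illustration and `summarize` is a hypothetical helper.

```python
import json


def summarize(export_text: str) -> list:
    """Turn a hyperfine JSON export into (command, mean_ms, stddev_ms) rows."""
    data = json.loads(export_text)
    return [
        (r["command"], r["mean"] * 1000.0, r["stddev"] * 1000.0)
        for r in data["results"]
    ]


# Made-up sample in hyperfine's export shape; times are in seconds.
sample = '{"results": [{"command": "./fib", "mean": 0.0211, "stddev": 0.0004}]}'

for command, mean_ms, stddev_ms in summarize(sample):
    print(f"{command}: {mean_ms:.1f} ms ± {stddev_ms:.1f} ms")
```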