ML Flavor Syntax
The ML flavor is a layout-based source surface for Osprey: indentation
delimits blocks, functions curry by default (whitespace application reads as
curried and lowers to the Default flavor's explicit-curry nested-lambda shape),
and effect handlers are first-class values. It is one of Osprey's language flavors — a
parsing-and-lowering profile, not a separate language. Every construct here
lowers to the same osprey_ast::Program the Default (brace) flavor produces,
and from there shares one type checker, effect checker, and backend.
This chapter is the surface reference. The boundary rules, the lowering contract, currying canonicalisation, and the shared-core handler-value feature are normative in Language Flavors; this chapter is subordinate to that contract. Implementation is tracked in plan 0013.
- Status
- Layout Model
- Bindings and Mutation
- Functions and Currying
- Function Calls
- Effects
- Handlers
- Match
- Records
- Blocks
- Canonical Lowering Table
- Worked Example
- Resolved Syntax Questions
- References
Status
Partially implemented; in active development. The Default flavor (specs
0001–0022) remains the primary frontend. Select the ML surface with
--flavor ml, the .ospml extension, or a // osprey: flavor=ml marker (see
Flavor Selection).
- Phase 1 (flavor frontend seam): implemented and green. The Flavor enum, Parsed.flavor, and parse_program_with_flavor are live, with parse_program kept as the Flavor::Default specialisation.
- Phase 4 (flavor selection): implemented and green. The CLI --flavor default|ml flag, the .ospml extension, and the // osprey: flavor=ml marker are resolved by the precedence flag > marker > extension > Default, with a hard error when extension and marker disagree. The differential harness (crates/diff_examples.sh) discovers .ospml fixtures additively, leaving every existing .osp example untouched.
- Phases 2–3 (ML lexer/parser/lowerer): in active development. The frontend is a hand-written Rust layout lexer + recursive-descent (Pratt / precedence-climbing) parser in crates/osprey-syntax/src/ml/ (see [FLAVOR-ML-LAYOUT]).
- Phase 0 (first-class handler values + effects): deferred. ML handler/effect syntax errors loudly until this shared-core feature lands.
The parsing techniques and the offside rule are cited in the References section.
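The selection precedence stated in the Phase 4 item (flag > marker > extension > Default) can be sketched as below. This is only an illustration under stated assumptions, not the shipped resolver: the function names, the rule that the marker sits on the first line, the rule that only .ospml asserts a flavor, and the choice to report an extension/marker conflict even when a flag is present are all hypothetical.

```rust
/// Hypothetical mirror of the `Flavor` enum named in the Status list.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flavor {
    Default,
    Ml,
}

/// Parses a flavor name as written after `--flavor` or `flavor=`.
fn parse_flavor(name: &str) -> Option<Flavor> {
    match name.trim() {
        "default" => Some(Flavor::Default),
        "ml" => Some(Flavor::Ml),
        _ => None,
    }
}

/// Looks for a `// osprey: flavor=<name>` marker.
/// ASSUMPTION: the marker must be the first line of the file.
fn marker_flavor(source: &str) -> Option<Flavor> {
    let first = source.lines().next()?.trim();
    let rest = first.strip_prefix("//")?.trim_start();
    let rest = rest.strip_prefix("osprey:")?.trim_start();
    parse_flavor(rest.strip_prefix("flavor=")?)
}

/// ASSUMPTION: only `.ospml` asserts a flavor; other extensions assert nothing.
fn extension_flavor(path: &str) -> Option<Flavor> {
    if path.ends_with(".ospml") {
        Some(Flavor::Ml)
    } else {
        None
    }
}

/// Resolves the flavor by precedence: flag > marker > extension > Default.
/// ASSUMPTION: an extension/marker disagreement is an error even with a flag.
pub fn resolve_flavor(flag: Option<Flavor>, path: &str, source: &str) -> Result<Flavor, String> {
    let marker = marker_flavor(source);
    let ext = extension_flavor(path);
    if let (Some(m), Some(e)) = (marker, ext) {
        if m != e {
            return Err(format!("{path}: extension implies {e:?} but marker says {m:?}"));
        }
    }
    Ok(flag.or(marker).or(ext).unwrap_or(Flavor::Default))
}
```

Chaining Option::or keeps the precedence order readable in a single expression; the conflict check runs first so a disagreement is never silently resolved.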
Layout Model
[FLAVOR-ML-LAYOUT] The ML flavor uses the offside rule. A block is
introduced by a header line and continued by the lines indented under it; a line
indented less than the block's column closes it. Blocks nest by indentation.
INDENT ::= (* start of a more-indented region *)
DEDENT ::= (* return to a less-indented region *)
NEWLINE ::= (* significant end-of-line within a layout region *)
Implementation decision — hand-written Rust layout lexer. These tokens are
produced by a hand-written Rust layout lexer in
crates/osprey-syntax/src/ml/lexer.rs
(with token.rs, parser.rs, mod.rs alongside). The lexer derives the layout
markers (Indent/Dedent/Newline) from the offside rule (Landin 1966)
via an explicit indentation stack, with bracket depth suppressing layout
inside parentheses; it ignores blank lines and comment-only lines, and
preserves source positions (row/column) on every token so diagnostics and the
LSP keep working
(FLAVOR-LOWER-CONTRACT). This is
now wired end-to-end in the editor: the language server selects the ML frontend
for a .ospml document through osprey_syntax::parse_program_for_path, so a
layout-flavor file is analysed by its own parser instead of being flagged as
broken Default syntax — see
FLAVOR-SELECT. The
parser above it is a recursive-descent (Pratt / precedence-climbing) parser
that produces an ML concrete syntax tree (CST); a separate lowerer
(lower.rs) then converts that CST to canonical osprey_ast::Program, keeping a
clean CST→AST separation.
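The layout rules described above (indentation stack, bracket depth suppressing layout, blank and comment-only lines ignored) can be sketched roughly as follows. This is a simplified illustration, not the code in crates/osprey-syntax/src/ml/lexer.rs. The Tok enum, the layout function, the choice to emit Newline at the end of each logical line, and the spaces-only indentation are assumptions; string literals, trailing comments, and source positions are deliberately left out.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Tok {
    Indent,
    Dedent,
    Newline,
    /// Any other lexeme; a real lexer would classify these further.
    Word(String),
}

/// Derives layout markers from indentation (hypothetical sketch).
/// Returns an error instead of panicking on an inconsistent dedent.
pub fn layout(source: &str) -> Result<Vec<Tok>, String> {
    let mut out = Vec::new();
    let mut stack: Vec<usize> = vec![0]; // columns of the open blocks
    let mut depth: usize = 0; // currently open parentheses

    for (row, line) in source.lines().enumerate() {
        let body = line.trim_start();
        if body.is_empty() || body.starts_with("//") {
            continue; // blank and comment-only lines carry no layout
        }
        if depth == 0 {
            let col = line.len() - body.len();
            if col > stack.last().copied().unwrap_or(0) {
                stack.push(col);
                out.push(Tok::Indent);
            }
            while col < stack.last().copied().unwrap_or(0) {
                stack.pop();
                out.push(Tok::Dedent);
            }
            if col != stack.last().copied().unwrap_or(0) {
                return Err(format!("row {}: column {} matches no open block", row + 1, col));
            }
        }
        // Split the line into words, treating each parenthesis as its own token.
        let mut word = String::new();
        for ch in body.chars() {
            if ch == '(' || ch == ')' || ch.is_whitespace() {
                if !word.is_empty() {
                    out.push(Tok::Word(std::mem::take(&mut word)));
                }
                match ch {
                    '(' => {
                        depth += 1;
                        out.push(Tok::Word("(".into()));
                    }
                    ')' => {
                        depth = depth.saturating_sub(1);
                        out.push(Tok::Word(")".into()));
                    }
                    _ => {}
                }
            } else {
                word.push(ch);
            }
        }
        if !word.is_empty() {
            out.push(Tok::Word(word));
        }
        if depth == 0 {
            out.push(Tok::Newline); // a line ending inside parentheses continues
        }
    }
    // Close every block still open at end of input.
    for _ in 1..stack.len() {
        out.push(Tok::Dedent);
    }
    Ok(out)
}
```

Because a line that ends inside parentheses emits no Newline and the following line skips the indentation check, a multi-line parenthesised argument never produces spurious Indent or Dedent tokens.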
This supersedes the earlier plan of a tree-sitter-osprey-ml grammar with
an external C scanner. Rationale: the offside rule is naturally expressed with
an explicit indent stack in safe Rust; the frontend stays panic-free /
Result-returning and unit-testable (project rules), with no unsafe C and no
codegen-tool build dependency. Per
[FLAVOR-BOUNDARY] the parser
mechanism is a below-the-AST, flavor-internal concern, so this swap does not
change the architecture (many CSTs, one AST). The parsing techniques are cited
in the References section.
Escape hatch (documented fallback, not the primary path). If the hand-written layout frontend becomes onerous or accrues parsing bugs we cannot tame, we fall back to a tree-sitter-osprey-ml grammar with an external INDENT/DEDENT/NEWLINE scanner.c. The boundary law ([FLAVOR-BOUNDARY]) makes the parser mechanism a flavor-internal swap that leaves the AST and everything above it untouched. (The tree-sitter brace grammar has none today: tree-sitter-osprey/ ships no scanner.c, so the fallback scanner would be new work.)
String interpolation keeps ${…}. Parentheses remain available for grouping and
precedence; they are not mandatory call punctuation.
Bindings and Mutation
[FLAVOR-ML-BIND] = binds, := mutates. There is no let: a bare
name = expr introduces an immutable binding in the current layout block. mut
marks a mutable binding, and every write to it uses :=, so mutation is visible
without scanning back to the declaration.
binding ::= "mut"? bindingHead "=" expr
bindingHead ::= ID paramPattern* (* zero patterns ⇒ value; one+ ⇒ function *)
mutation ::= ID ":=" expr
answer = 42
mut requests = 0
requests := requests + 1
Same-scope rebinding with = is rejected; the diagnostic suggests := if
mutation was meant. Shadowing in a nested block or pattern is allowed.
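The rebinding rule can be pictured as a stack of per-block name sets: a second = for the same name in the same block is an error, while a nested block gets a fresh set and may shadow. The sketch below is a hypothetical illustration; the Scopes type, its methods, and the diagnostic wording are assumptions, not the checker's actual API.

```rust
use std::collections::HashSet;

/// One name set per open layout block (hypothetical helper).
pub struct Scopes {
    frames: Vec<HashSet<String>>,
}

impl Scopes {
    pub fn new() -> Self {
        Scopes { frames: vec![HashSet::new()] }
    }

    /// Called when a nested layout block opens.
    pub fn enter_block(&mut self) {
        self.frames.push(HashSet::new());
    }

    /// Called when a nested layout block closes; the root frame is kept.
    pub fn exit_block(&mut self) {
        if self.frames.len() > 1 {
            self.frames.pop();
        }
    }

    /// Records `name = expr`; rejects a second binding in the same block.
    pub fn bind(&mut self, name: &str) -> Result<(), String> {
        let Some(frame) = self.frames.last_mut() else {
            return Err("no open scope".to_string());
        };
        if frame.insert(name.to_string()) {
            Ok(())
        } else {
            Err(format!(
                "`{name}` is already bound in this block; write `{name} := ...` if mutation was meant"
            ))
        }
    }
}
```

Only the innermost frame is consulted, which is what lets a nested block shadow an outer name without triggering the diagnostic.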
Lowering: name = e → Stmt::Let { mutable: false }; mut name = e →
Stmt::Let { mutable: true }; name := e → Stmt::Assignment. These are the
same canonical nodes the Default flavor emits for let, mut, and =
reassignment respectively — only the spelling differs.
Functions and Currying
[FLAVOR-ML-FN] A function definition is a binding whose head has one or more
parameter patterns. The optional signature line above it uses ML arrows.
signature ::= ID ":" type
funDef    ::= ID paramPattern+ "=" blockOrExpr               (* curried: one arg per pattern *)
            | ID "(" param ("," param)* ")" "=" blockOrExpr  (* uncurried: one flat arg list *)
type      ::= type "->" type                                 (* right-associative: a -> b -> c = a -> (b -> c) *)
            | "(" type ("," type)* ")" "->" type             (* uncurried multi-argument *)
            | typeAtom
inc : int -> int
inc x = x + 1
add : int -> int -> int
add x y = x + y
[FLAVOR-ML-CURRY] ML curries by default. A multi-parameter binding
add x y = body reads as curried: it lowers to the nested-lambda shape — a
one-parameter Stmt::Function whose body is a one-parameter Expr::Lambda —
byte-identical to the Default flavor's explicit-curry
fn add(x) = fn(y) => body, not to the Default multi-parameter
fn add(x, y) (a deliberately different value, normative in
FLAVOR-CURRY). An ML program
and its Default explicit-curry twin emit byte-identical
IR (FLAVOR-IR-EQUIV).
Application is curried and left-associative: add 1 2 parses as (add 1) 2, lowering
to nested single-argument calls Call(Call(add, [1]), [2]). The arrows of a
function-typed signature are right-associative (int -> int -> int is
int -> (int -> int)), mirroring the application. Partial application just
works: add 1 is the inner call, itself saturated, and it returns a function
value. This is the idiom ML reaches for (rose = c256 "213" makes a one-argument
colouriser from the two-argument c256).
ML also has an uncurried, multi-argument form — for a binding that should not curry — written with parenthesised, comma-separated parameters:
add : (int, int) -> int
add (x, y) = x + y
sum = add (10, 20)
add (x, y) = body lowers to a flat two-parameter Stmt::Function — the
same canonical node as the Default multi-parameter fn add(x, y) = body —
and the saturated call add (10, 20) lowers to a single multi-argument
Call(add, [10, 20]), matching Default's add(x: 10, y: 20). It does not
partially apply; the parenthesised comma-list is an argument grouping, not a
tuple value (Osprey has no tuple type). It is the deliberate not-equivalent of
the curried add x y.
ML therefore has two function forms, and they twin the two Default forms exactly:
| ML form | lowers to | Default twin |
|---|---|---|
| curried add x y = e | one-param Function → Lambda chain | explicit-curry fn add(x) = fn(y) => e |
| uncurried add (x, y) = e | flat two-param Function | multi-param fn add(x, y) = e |
This is what keeps cross-flavor IR byte-identical (FLAVOR-IR-EQUIV) with no backend currying magic: a twin's author picks the ML form matching its Default original's currying — curried Default ↔ ML whitespace, uncurried Default ↔ ML parens — so both sides lower to the same AST and emit the same IR.
Lowering (normative in
FLAVOR-CURRY): curried
add x y = body → a one-parameter Stmt::Function returning a one-parameter
Expr::Lambda; \x y => body → the same curried Expr::Lambda chain; add 1 2
→ nested one-argument Expr::Calls — each byte-identical to Default
explicit-curry fn add(x) = fn(y) => body and add(1)(2). Uncurried
add (x, y) = body → a flat multi-parameter Stmt::Function, and add (1, 2) →
a single Call(add, [1, 2]) — byte-identical to Default fn add(x, y) and
add(x: 1, y: 2). No flavor-only node shape survives lowering; ML reuses
Default's value vocabulary. (The backend may still fold a saturated curried
call into a direct multi-argument call as an independent optimisation, but the
lowered AST of add x y stays the curried nested form.)
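The two definition shapes can be illustrated with a small Rust sketch. The Expr and Function types here are simplified stand-ins for the osprey_ast nodes (their real fields are not shown in this chapter), and lower_curried / lower_uncurried are hypothetical names; the point is only the folding direction that turns add x y = body into a one-parameter function whose body is a lambda chain.

```rust
/// Simplified stand-in for the canonical expression node (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(String),
    Lambda { param: String, body: Box<Expr> },
}

/// Simplified stand-in for `Stmt::Function` (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct Function {
    pub name: String,
    pub params: Vec<String>,
    pub body: Expr,
}

/// `f x y = body`: keep the first parameter on the function and wrap the
/// rest as nested one-parameter lambdas, innermost parameter last.
/// Returns `None` for zero parameters, which is a value binding instead.
pub fn lower_curried(name: &str, params: &[&str], body: Expr) -> Option<Function> {
    let (first, rest) = params.split_first()?;
    let body = rest.iter().rev().fold(body, |inner, p| Expr::Lambda {
        param: (*p).to_string(),
        body: Box::new(inner),
    });
    Some(Function {
        name: name.to_string(),
        params: vec![(*first).to_string()],
        body,
    })
}

/// `f (x, y) = body`: one flat parameter list, no lambdas introduced.
pub fn lower_uncurried(name: &str, params: &[&str], body: Expr) -> Function {
    Function {
        name: name.to_string(),
        params: params.iter().map(|p| (*p).to_string()).collect(),
        body,
    }
}
```

Folding from the right makes the last parameter the innermost lambda, so add x y z = e would come out as a function of x returning a lambda of y returning a lambda of z.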
API guidance: put stable, configuration-like arguments first and the data
argument last, so partial application is useful (replace " " "" ⇒ a
space-remover).
Function Calls
[FLAVOR-ML-CALL] Calls use whitespace application; parentheses group.
application ::= application atom
              | atom
atom        ::= ID | literal | "(" expr ")"
length snap
textResp 201 "created\n"
c256 "213" (blocks 0 (mn n 28))
Lowering: whitespace application f a b → nested Expr::Call, one argument each
(Call(Call(f,[a]),[b])) — curried. A parenthesised comma-list f (a, b) is the
uncurried saturated call → a single Call(f, [a, b]) (matching Default's
f(x: a, y: b)); a single parenthesised expression f (a) is just grouping and
lowers to Call(f, [a]).
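The call lowering is a left fold over the argument sequence. The sketch below uses a simplified Expr and a hypothetical Arg type that distinguishes a single atom (including a parenthesised single expression) from a parenthesised comma-list; neither type is the real osprey_ast definition.

```rust
/// Simplified stand-in for the canonical expression node (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(String),
    Call { callee: Box<Expr>, args: Vec<Expr> },
}

/// One application argument as the parser might hand it over (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Arg {
    /// `a` or `(a)`: a single expression, parentheses only group.
    Atom(Expr),
    /// `(a, b)`: a comma-list, passed as one multi-argument call.
    Group(Vec<Expr>),
}

/// Lowers `head arg1 arg2 ...` left to right: every argument wraps the
/// callee built so far in one more `Call`.
pub fn lower_application(head: Expr, args: Vec<Arg>) -> Expr {
    args.into_iter().fold(head, |callee, arg| {
        let args = match arg {
            Arg::Atom(e) => vec![e],
            Arg::Group(es) => es,
        };
        Expr::Call { callee: Box::new(callee), args }
    })
}
```

With this shape, f a b yields Call(Call(f, [a]), [b]) and f (a, b) yields the single Call(f, [a, b]), matching the two rows of the lowering rule.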
Effects
[FLAVOR-ML-EFFECT] An effect declaration is a layout block of operation
signatures. Operations use => so that -> keeps its one meaning — function
and currying type. An operation is a request with a payload and a result,
not a curried function.
effectDecl ::= "effect" ID INDENT opSig+ DEDENT
opSig ::= ID ":" type "=>" type
effect Db
  add : string => int
  list : Unit => string
  count : Unit => int
effect Log
  info : string => Unit
Zero-payload operations take Unit. Multi-field requests use a record payload,
not a fake multi-argument operation:
type AddTask =
  body : string
  priority : int
effect Db
  add : AddTask => int
Lowering: effect E + operation signatures → Stmt::Effect { operations }, where each
op : P => R becomes EffectOperation { name, parameters: [P], return_type: R },
the same canonical node the Default op : fn(P) -> R produces.
perform E.op a → Expr::Perform.
-> belongs to functions and currying. => belongs to clauses and requests that yield a result: it appears in effect operations, handler arms, and match arms, always meaning "the left yields the right."
Handlers
[FLAVOR-ML-HANDLER] Handlers are first-class values. handler E followed
by indented arms evaluates to a value of type Handler E. handle installs one
or more such values around a computation, with do marking the handled body.
handlerValue ::= "handler" ID INDENT handlerArm+ DEDENT
handlerArm ::= ID param* "=>" blockOrExpr
install ::= "handle" expr+ "do" blockOrExpr
memoryDb : Unit -> Handler Db
memoryDb () =
  mut tasks = ""
  mut taskCount = 0
  handler Db
    add t =>
      taskCount := taskCount + 1
      tasks := "${tasks}#${toString taskCount} ${t}\n"
      taskCount
    list =>
      tasks
    count =>
      taskCount
Installing several at once replaces the Default flavor's repeated nesting:
db = memoryDb ()
log = silentLog ()
handle db log
do
  createTask "buy milk"
The mutable cells belong to the handler value: a fresh handler makes fresh
state; passing the same value around shares it. Parameterised handlers compose
with currying (filePersist path = … handler Persist …).
First-class handler values, the Handler E type, and multi-install are a
shared-core feature, not ML-only sugar — see
FLAVOR-HANDLER-VALUE. They
lower to Expr::HandlerValue and Expr::Install; handle a b c do body
desugars to nested installs. The Default flavor gains the same feature in brace
spelling.
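The multi-install desugaring can be sketched as a right fold over the handler list. The Expr type below is a simplified stand-in, desugar_handle is a hypothetical name, and the nesting order (first-listed handler outermost) is an assumption: this chapter says only that handle a b c do body desugars to nested installs.

```rust
/// Simplified stand-in for the canonical expression node (assumption).
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Var(String),
    Install { handler: Box<Expr>, body: Box<Expr> },
}

/// `handle h1 h2 ... do body` becomes one `Install` per handler.
/// ASSUMPTION: the first-listed handler ends up outermost.
pub fn desugar_handle(handlers: Vec<Expr>, body: Expr) -> Expr {
    handlers.into_iter().rev().fold(body, |inner, h| Expr::Install {
        handler: Box::new(h),
        body: Box::new(inner),
    })
}
```

Under that assumption, handle db log do body becomes Install(db, Install(log, body)); an empty handler list leaves the body unchanged.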
Match
[FLAVOR-ML-MATCH] match uses the same clause style as handlers: the
scrutinee follows match, and each indented arm is Pattern => body. A
one-payload constructor binds its payload directly — Success value, not
Success { value }.
matchExpr ::= "match" expr INDENT matchArm+ DEDENT
matchArm ::= pattern "=>" blockOrExpr
diskBytes =
  match saved
    Success value => length snap
    Error message => -1
Lowering: Expr::Match + MatchArm; Success value →
Pattern::Constructor { name: "Success", fields: ["value"] } — the same node the
Default Success { value } produces. Wildcard _ → Pattern::Wildcard.
Records
[FLAVOR-ML-RECORD] Record construction is a layout block headed by the
constructor name, with field = value lines. Inside a record literal the left
of = is a field name, not a new binding; the indentation under a constructor
makes that unambiguous.
recordExpr ::= ID INDENT fieldInit+ DEDENT
fieldInit ::= ID "=" expr
textResp status bodyText =
  HttpResponse
    status = status
    headers = "Content-Type: text/plain"
    contentType = "text/plain"
    streamFd = -1
    isComplete = true
    partialBody = bodyText
Lowering: Expr::TypeConstructor { name, fields }; record update lowers to
Expr::Update.
Blocks
[FLAVOR-ML-BLOCK] A function body, match arm, handler arm, or do body is an
ordinary layout region containing bindings, mutations, performs, and a final
expression. The final expression is the block's value. There is no separate
{ … } expression form in this flavor.
onPost body =
  id = perform Db.add body
  snap = perform Db.list
  written = perform Persist.flush snap
  perform Log.info "created"
  textResp 201 "created\n"
Lowering: Expr::Block { statements, value }, where value is the trailing
expression — the same node the Default { … } block produces.
Canonical Lowering Table
Every ML form on the left lowers to the canonical node on the right
(crates/osprey-ast/src/lib.rs). The Default-flavor spelling of the same node
is in FLAVOR-LAYER.
| ML surface | Canonical AST node |
|---|---|
| x = e | Stmt::Let { mutable: false } |
| mut x = e | Stmt::Let { mutable: true } |
| x := e | Stmt::Assignment |
| f x y = e (curried) | one-param Stmt::Function returning a Lambda chain |
| f (x, y) = e (uncurried) | flat multi-param Stmt::Function |
| \x y => e | curried Expr::Lambda chain |
| f a b | nested one-arg Expr::Call: Call(Call(f,[a]),[b]) |
| f (a, b) (saturated) | single multi-arg Expr::Call: Call(f, [a, b]) |
| type T = + variant/field layout | Stmt::Type + TypeVariant |
| [a, b, c] / xs[i] | Expr::List / Expr::Index |
| layout block | Expr::Block |
| match v + arms | Expr::Match + MatchArm |
| Success value | Pattern::Constructor { fields: ["value"] } |
| T + f = v lines | Expr::TypeConstructor |
| effect E + op : P => R | Stmt::Effect + EffectOperation |
| perform E.op a | Expr::Perform |
| handler E + arms | Expr::HandlerValue (shared-core addition) |
| handle a b do body | Expr::Install (shared-core addition) |
Worked Example
This is the same program a Default-flavor author would write with braces, fn,
named arguments, and nested handle … in. It exercises curried definitions, partial
application (textResp 201, c256 "213"), => effect operations, first-class
handler values with owned mut state, and one grouped handle … do.
effect Db
  add : string => int
  list : Unit => string
  count : Unit => int
effect Log
  info : string => Unit
c256 : string -> string -> string
c256 n s =
  "\e[38;5;${n}m${s}\e[0m"
rose : string -> string
rose = c256 "213"
textResp : int -> string -> HttpResponse
textResp status bodyText =
  HttpResponse
    status = status
    headers = "Content-Type: text/plain"
    contentType = "text/plain"
    streamFd = -1
    isComplete = true
    partialBody = bodyText
memoryDb : Unit -> Handler Db
memoryDb () =
  mut tasks = ""
  mut taskCount = 0
  handler Db
    add t =>
      taskCount := taskCount + 1
      tasks := "${tasks}#${toString taskCount} ${t}\n"
      taskCount
    list => tasks
    count => taskCount
silentLog : Unit -> Handler Log
silentLog () =
  handler Log
    info m => ()
createTask : string -> HttpResponse
createTask body =
  id = perform Db.add body
  snap = perform Db.list
  perform Log.info "created #${toString id} ${snap}"
  textResp 201 "created task #${toString id}\n"
db = memoryDb ()
log = silentLog ()
handle db log
do
  response = createTask "buy milk"
  print (httpResponseBody response)
The first-class handlers make test doubles trivial — a test installs spy or
stub handlers that close over the test's own mut cells around the call under
test, with no Db/Log parameters polluting the production signature:
test "createTask stores the task and logs" =
  mut stored = ""
  mut logLine = ""
  db =
    handler Db
      add task =>
        stored := task
        1
      list => "#1 ${stored}\n"
      count => 1
  log =
    handler Log
      info message => logLine := message
  response =
    handle db log
    do
      createTask "buy milk"
  expectEqual 201 (httpResponseStatus response)
  expectEqual "buy milk" stored
  expectEqual "created #1 #1 buy milk\n" logLine
Resolved Syntax Questions
- Zero-argument functions: a parameterless name = expr is a value binding; a name () = expr is a Unit -> T function. Pure constants are values (banner); () is used where recursion or effects make the call boundary meaningful (serveForever ()).
- Lambdas: anonymous functions are written \param* => body (lowering to Expr::Lambda), keeping => as the clause/yield arrow and -> as the type arrow.
- Effect annotations on signatures: the effect row follows the result type, as in the Default flavor (saveTask : string -> int ![Store, Log]).
References
These are the verified sources behind the hand-written ML frontend
([FLAVOR-ML-LAYOUT]): the recursive-descent / predictive
parser, its Pratt (precedence-climbing) expression layer, and the offside-rule
layout lexer.
1. Recursive-descent / predictive parsing foundations
- Compilers: Principles, Techniques, and Tools (the "Dragon Book"), 2nd ed. — Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman. 2006. Pearson. ISBN 9780321486813. https://www.pearson.com/en-us/subject-catalog/p/compilers-principles-techniques-and-tools/P200000003472/9780321486813 — Authorizes the canonical predictive recursive-descent / LL(1) construction (FIRST/FOLLOW-driven procedure-per-nonterminal parsing) the hand-written parser implements.
- Compiler Construction — Niklaus Wirth. 1996 (Addison-Wesley; author's free PDF). ETH Zürich. Landing page: https://people.inf.ethz.ch/wirth/CompilerConstruction/index.html · PDF: https://people.inf.ethz.ch/wirth/CompilerConstruction/CompilerConstruction1.pdf — Authorizes the single-symbol-lookahead, single-pass recursive-descent strategy of deriving one recursive procedure per grammar production directly from an EBNF grammar.
- Crafting Interpreters (ch. 6 "Parsing Expressions", ch. 17 "Compiling Expressions") — Robert Nystrom. 2021. Genever Benning (freely readable online). https://craftinginterpreters.com/compiling-expressions.html — Practitioner reference authorizing the by-hand, no-generator recursive-descent parser and its Pratt-based expression layer for a real language.
2. Operator-precedence / Pratt parsing
- Top Down Operator Precedence — Vaughan R. Pratt. 1973. Proc. 1st ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL '73), pp. 41–51. DOI: https://doi.org/10.1145/512927.512931 — The primary source authorizing the Pratt (top-down operator-precedence) expression parser: per-token prefix/infix handlers driven by binding powers.
- Parsing Expressions by Precedence Climbing — Eli Bendersky. 2 Aug 2012. https://eli.thegreenplace.net/2012/08/02/parsing-expressions-by-precedence-climbing — Authorizes the precedence-climbing formulation of operator-precedence parsing (the loop-based, min-precedence variant used in production front-ends such as Clang).
- Parsing Expressions by Recursive Descent: From Precedence Climbing to Pratt Parsing — Theodore S. Norvell, Memorial University of Newfoundland. https://www.engr.mun.ca/~theo/Misc/pratt_parsing.htm — Authorizes treating precedence climbing and Pratt parsing as the same algorithm, justifying a single binding-power table for prefix/infix/postfix/ternary operators.
3. The offside rule / layout-sensitive (indentation) syntax
- The Next 700 Programming Languages — Peter J. Landin. 1966. Communications of the ACM 9(3), pp. 157–166. DOI: https://doi.org/10.1145/365230.365257 — The origin of the "offside rule"; the primary source authorizing indentation-as-structure (a token left of the line's first significant token starts a new construct).
- Principled Parsing for Indentation-Sensitive Languages: Revisiting Landin's Offside Rule — Michael D. Adams. 2013. Proc. 40th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '13), pp. 511–522. DOI: https://doi.org/10.1145/2429069.2429129 · author PDF: https://michaeldadams.org/papers/layout_parsing/LayoutParsing.pdf — Authorizes a grammar-integrated, principled treatment of indentation sensitivity rather than an ad-hoc lexer hack.
- Haskell 2010 Language Report — §2.7 "Layout" (informal) and §10.3 (formal layout algorithm) — Simon Marlow (ed.). 2010. haskell.org. Lexical chapter: https://www.haskell.org/onlinereport/haskell2010/haskellch2.html · Syntax-reference chapter: https://www.haskell.org/onlinereport/haskell2010/haskellch10.html — Secondary/reference source authorizing a concrete, fully specified offside layout algorithm (brace/semicolon insertion from indentation) suitable for a hand-written lexer/parser.
4. Error recovery in recursive-descent (panic-mode / synchronization)
- Compilers: Principles, Techniques, and Tools (the "Dragon Book"), 2nd ed., §4.1.3–4.1.4 (Error-Recovery Strategies; panic-mode and phrase-level recovery) — Aho, Lam, Sethi, Ullman. 2006. Pearson. ISBN 9780321486813. https://www.pearson.com/en-us/subject-catalog/p/compilers-principles-techniques-and-tools/P200000003472/9780321486813 — The foundational reference authorizing panic-mode error recovery: on a syntax error, discard input tokens until a synchronizing token (e.g. statement terminators / FOLLOW sets) is reached, then resume.
Verification (research subagent, 2026): the three DOIs (Pratt 10.1145/512927.512931, Landin 10.1145/365230.365257, Adams 10.1145/2429069.2429129) resolve through doi.org to the correct ACM DL records. ACM landing pages return HTTP 403 to automated fetches, so the records were corroborated via the doi.org redirect and dblp. The Wirth ETH page and PDF, the Adams author PDF, Nystrom, Bendersky, Norvell, and both Haskell 2010 chapters were each fetched and matched. Dragon Book §4.1.3–4.1.4 are the standard 2nd-ed. TOC section numbers (book and publisher confirmed; exact section numbers not page-verified).
Cross-references
- Language Flavors — the normative boundary, contract, currying canonicalisation, and shared-core handler-value feature.
- Algebraic Effects — effect semantics shared by both flavors.
- Plan 0013 — ML Flavor Frontend.