ML Flavor Syntax

The ML flavor is a layout-based source surface for Osprey: indentation delimits blocks, functions curry by default (whitespace application reads as curried and lowers to the Default flavor's explicit-curry nested-lambda shape), and effect handlers are first-class values. It is one of Osprey's language flavors — a parsing-and-lowering profile, not a separate language. Every construct here lowers to the same osprey_ast::Program the Default (brace) flavor produces, and from there shares one type checker, effect checker, and backend.

This chapter is the surface reference. The boundary rules, the lowering contract, currying canonicalisation, and the shared-core handler-value feature are normative in Language Flavors; this chapter is subordinate to that contract. Implementation is tracked in plan 0013.

Status

Partially implemented; in active development. The Default flavor (specs 0001–0022) remains the primary frontend. Select the ML surface with --flavor ml, the .ospml extension, or a // osprey: flavor=ml marker (see Flavor Selection).

  • Phase 1 — flavor frontend seam: implemented and green. The Flavor enum, Parsed.flavor, and parse_program_with_flavor are live, with parse_program kept as the Flavor::Default specialisation.
  • Phase 4 — flavor selection: implemented and green. The CLI --flavor default|ml flag, the .ospml extension, and the // osprey: flavor=ml marker are resolved by the precedence flag > marker > extension > Default, with a hard error when extension and marker disagree. The differential harness (crates/diff_examples.sh) discovers .ospml fixtures additively, leaving every existing .osp example untouched.
  • Phases 2–3 — ML lexer/parser/lowerer: in active development. The frontend is a hand-written Rust layout lexer + recursive-descent (Pratt / precedence-climbing) parser in crates/osprey-syntax/src/ml/ (see [FLAVOR-ML-LAYOUT]).
  • Phase 0 — first-class handler values + effects: deferred. ML handler/effect syntax errors loudly until this shared-core feature lands.
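The selection precedence from Phase 4 can be sketched in a few lines of Rust. This is an illustration, not the shipped resolver: the `Flavor` enum here is a local stand-in for the real one, `resolve_flavor` is a hypothetical name, and each input is assumed to be `Some` only when that source actually names a flavor. The source does not say whether an extension/marker conflict is still reported when a flag is present; this sketch assumes it is.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Flavor {
    Default,
    Ml,
}

/// Hypothetical resolver for the rule "flag > marker > extension > Default".
/// Each argument is `Some` only when that source names a flavor.
fn resolve_flavor(
    flag: Option<Flavor>,
    marker: Option<Flavor>,
    extension: Option<Flavor>,
) -> Result<Flavor, String> {
    // Assumption: a marker/extension disagreement is a hard error
    // regardless of whether a CLI flag was also given.
    if let (Some(m), Some(e)) = (marker, extension) {
        if m != e {
            return Err(format!(
                "marker selects {:?} but extension selects {:?}",
                m, e
            ));
        }
    }
    // First source that names a flavor wins; otherwise fall back to Default.
    Ok(flag.or(marker).or(extension).unwrap_or(Flavor::Default))
}
```

With these assumptions, a `.ospml` file with no flag and no marker resolves to `Flavor::Ml`, and an explicit flag overrides an agreeing marker and extension.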

The parsing techniques and the offside rule are cited in the References section.

Layout Model

[FLAVOR-ML-LAYOUT] The ML flavor uses the offside rule. A block is introduced by a header line and continued by the lines indented under it; a line indented less than the block's column closes it. Blocks nest by indentation.

INDENT  ::= (* start of a more-indented region *)
DEDENT  ::= (* return to a less-indented region *)
NEWLINE ::= (* significant end-of-line within a layout region *)

Implementation decision — hand-written Rust layout lexer. These tokens are produced by a hand-written Rust layout lexer in crates/osprey-syntax/src/ml/lexer.rs (with token.rs, parser.rs, mod.rs alongside). The lexer derives the layout markers (Indent/Dedent/Newline) from the offside rule (Landin 1966) via an explicit indentation stack, with bracket depth suppressing layout inside parentheses; it ignores blank lines and comment-only lines, and preserves source positions (row/column) on every token so diagnostics and the LSP keep working (FLAVOR-LOWER-CONTRACT). This is now wired end-to-end in the editor: the language server selects the ML frontend for a .ospml document through osprey_syntax::parse_program_for_path, so a layout-flavor file is analysed by its own parser instead of being flagged as broken Default syntax — see FLAVOR-SELECT. The parser above it is a recursive-descent (Pratt / precedence-climbing) parser that produces an ML concrete syntax tree (CST); a separate lowerer (lower.rs) then converts that CST to canonical osprey_ast::Program, keeping a clean CST→AST separation.
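The indentation-stack mechanism described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in lexer.rs: `Tok` and `layout` are hypothetical names, each logical line is kept as one opaque `Line` token instead of being tokenised, indentation is measured in leading spaces, comments are assumed to start with `//`, and parentheses inside string literals are not special-cased. Source positions are reduced to a row number in the error message.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Indent,
    Dedent,
    Newline,
    Line(String), // stand-in for the real tokens on one source line
}

/// Derive layout markers from indentation with an explicit stack.
/// Returns an error instead of panicking on an inconsistent dedent.
fn layout(src: &str) -> Result<Vec<Tok>, String> {
    let mut out = Vec::new();
    let mut stack: Vec<usize> = vec![0]; // open block columns, innermost last
    let mut depth: usize = 0; // open-parenthesis depth

    for (row, raw) in src.lines().enumerate() {
        let text = raw.trim();
        // Blank and comment-only lines never affect layout.
        if text.is_empty() || text.starts_with("//") {
            continue;
        }
        // Layout is suppressed while a parenthesis is open.
        if depth == 0 {
            let col = raw.len() - raw.trim_start().len();
            if col > *stack.last().unwrap_or(&0) {
                stack.push(col);
                out.push(Tok::Indent);
            } else {
                while col < *stack.last().unwrap_or(&0) {
                    stack.pop();
                    out.push(Tok::Dedent);
                }
                if col != *stack.last().unwrap_or(&0) {
                    return Err(format!("line {}: inconsistent dedent", row + 1));
                }
            }
        }
        for c in text.chars() {
            match c {
                '(' => depth += 1,
                ')' => depth = depth.saturating_sub(1),
                _ => {}
            }
        }
        out.push(Tok::Line(text.to_string()));
        if depth == 0 {
            out.push(Tok::Newline);
        }
    }
    // Close every block still open at end of input.
    while stack.len() > 1 {
        stack.pop();
        out.push(Tok::Dedent);
    }
    Ok(out)
}
```

A line that dedents to a column that matches no open block is reported as an error, which keeps the sketch Result-returning in the spirit of the project rules mentioned in this chapter.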

This supersedes the earlier plan of a tree-sitter-osprey-ml grammar with an external C scanner. Rationale: the offside rule is naturally expressed with an explicit indent stack in safe Rust; the frontend stays panic-free / Result-returning and unit-testable (project rules), with no unsafe C and no codegen-tool build dependency. Per [FLAVOR-BOUNDARY] the parser mechanism is a below-the-AST, flavor-internal concern, so this swap does not change the architecture (many CSTs, one AST). The parsing techniques are cited in the References section.

Escape hatch (documented fallback, not the primary path). If the hand-written layout frontend becomes onerous or accrues parsing bugs we cannot tame, we fall back to a tree-sitter-osprey-ml grammar with an external INDENT/DEDENT/NEWLINE scanner.c. The boundary law ([FLAVOR-BOUNDARY]) makes the parser mechanism a flavor-internal swap that leaves the AST and everything above it untouched. The tree-sitter brace grammar has no external scanner today (tree-sitter-osprey/ ships no scanner.c), so the fallback scanner would be new work.

String interpolation keeps ${…}. Parentheses remain available for grouping and precedence; they are not mandatory call punctuation.

Bindings and Mutation

[FLAVOR-ML-BIND] = binds, := mutates. There is no let: a bare name = expr introduces an immutable binding in the current layout block. mut marks a mutable binding, and every write to it uses :=, so mutation is visible without scanning back to the declaration.

binding     ::= "mut"? bindingHead "=" expr
bindingHead ::= ID paramPattern*          (* zero patterns ⇒ value; one+ ⇒ function *)
mutation    ::= ID ":=" expr

answer = 42
mut requests = 0
requests := requests + 1

Same-scope rebinding with = is rejected; the diagnostic suggests := if mutation was meant. Shadowing in a nested block or pattern is allowed.
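The rebinding rule amounts to a per-block name set. The sketch below is one way to check it; `Scopes` and its methods are hypothetical names, not part of the Osprey codebase, and the diagnostic wording is illustrative only.

```rust
use std::collections::HashSet;

/// One set of bound names per open layout block, innermost last.
struct Scopes {
    blocks: Vec<HashSet<String>>,
}

impl Scopes {
    fn new() -> Self {
        Scopes { blocks: vec![HashSet::new()] }
    }

    /// Entering a nested layout block opens a fresh name set,
    /// so shadowing an outer name is allowed.
    fn enter_block(&mut self) {
        self.blocks.push(HashSet::new());
    }

    fn exit_block(&mut self) {
        if self.blocks.len() > 1 {
            self.blocks.pop();
        }
    }

    /// Record `name = expr`; reject a second `=` for the same name
    /// in the same block and suggest `:=`.
    fn bind(&mut self, name: &str) -> Result<(), String> {
        match self.blocks.last_mut() {
            Some(block) => {
                if block.insert(name.to_string()) {
                    Ok(())
                } else {
                    Err(format!(
                        "`{0}` is already bound in this block; use `{0} := ...` if mutation was meant",
                        name
                    ))
                }
            }
            None => Err("no open block".to_string()),
        }
    }
}
```

Binding `answer` twice in one block fails, while binding it again after `enter_block` succeeds because the nested block has its own set.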

Lowering: name = e → Stmt::Let { mutable: false }; mut name = e → Stmt::Let { mutable: true }; name := e → Stmt::Assignment. These are the same canonical nodes the Default flavor emits for let, mut, and = reassignment respectively; only the spelling differs.

Functions and Currying

[FLAVOR-ML-FN] A function definition is a binding whose head has one or more parameter patterns. The optional signature line above it uses ML arrows.

signature ::= ID ":" type
funDef    ::= ID paramPattern+ "=" blockOrExpr                 (* curried: one arg per pattern *)
            | ID "(" param ("," param)* ")" "=" blockOrExpr    (* uncurried: one flat arg list *)
type      ::= type "->" type            (* right-associative: a -> b -> c = a -> (b -> c) *)
            | "(" type ("," type)* ")" "->" type   (* uncurried multi-argument *)
            | typeAtom

inc : int -> int
inc x = x + 1

add : int -> int -> int
add x y = x + y

[FLAVOR-ML-CURRY] ML curries by default. A multi-parameter binding add x y = body reads as curried: it lowers to the nested-lambda shape — a one-parameter Stmt::Function whose body is a one-parameter Expr::Lambda — byte-identical to the Default flavor's explicit-curry fn add(x) = fn(y) => body, not to the Default multi-parameter fn add(x, y) (a deliberately different value, normative in FLAVOR-CURRY). An ML program and its Default explicit-curry twin emit byte-identical IR (FLAVOR-IR-EQUIV).

Application is curried and left-associative: add 1 2 is ((add) 1) 2, lowering to nested single-argument calls Call(Call(add, [1]), [2]); a function-typed signature's arrows are right-associative (int -> int -> int is int -> (int -> int)), mirroring the application. Partial application just works: add 1 is the inner saturated call returning a function value — the idiom ML reaches for (rose = c256 "213" makes a one-argument colouriser from the two-argument c256).

ML also has an uncurried, multi-argument form — for a binding that should not curry — written with parenthesised, comma-separated parameters:

add : (int, int) -> int
add (x, y) = x + y

sum = add (10, 20)

add (x, y) = body lowers to a flat two-parameter Stmt::Function — the same canonical node as the Default multi-parameter fn add(x, y) = body — and the saturated call add (10, 20) lowers to a single multi-argument Call(add, [10, 20]), matching Default's add(x: 10, y: 20). It does not partially apply; the parenthesised comma-list is an argument grouping, not a tuple value (Osprey has no tuple type). It is the deliberate not-equivalent of the curried add x y.

ML therefore has two function forms, and they twin the two Default forms exactly:

ML form                  | Lowers to                          | Default twin
curried add x y = e      | one-param Function → Lambda chain  | explicit-curry fn add(x) = fn(y) => e
uncurried add (x, y) = e | flat two-param Function            | multi-param fn add(x, y) = e

This is what keeps cross-flavor IR byte-identical (FLAVOR-IR-EQUIV) with no backend currying magic: a twin's author picks the ML form matching its Default original's currying — curried Default ↔ ML whitespace, uncurried Default ↔ ML parens — so both sides lower to the same AST and emit the same IR.

Lowering (normative in FLAVOR-CURRY): curried add x y = body → a one-parameter Stmt::Function returning a one-parameter Expr::Lambda; \x y => body → the same curried Expr::Lambda chain; add 1 2 → nested one-argument Expr::Calls — each byte-identical to Default explicit-curry fn add(x) = fn(y) => body and add(1)(2). Uncurried add (x, y) = body → a flat multi-parameter Stmt::Function, and add (1, 2) → a single Call(add, [1, 2]) — byte-identical to Default fn add(x, y) and add(x: 1, y: 2). No flavor-only node shape survives lowering; ML reuses Default's value vocabulary. (The backend may still fold a saturated curried call into a direct multi-argument call as an independent optimisation, but the lowered AST of add x y stays the curried nested form.)
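The two definition shapes can be illustrated with a toy AST. The `Expr` and `Function` types below are simplified stand-ins invented for this sketch, not the real osprey_ast nodes, and the function names are hypothetical. The point is only the shape: a curried definition keeps its first parameter on the function and wraps the body in one lambda per remaining parameter, while the uncurried form keeps a flat parameter list.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var(String),
    Lambda { param: String, body: Box<Expr> },
}

#[derive(Debug, Clone, PartialEq)]
struct Function {
    name: String,
    params: Vec<String>,
    body: Expr,
}

/// Curried `f p1 p2 ... = body`: a one-parameter function whose body is
/// a chain of one-parameter lambdas. Returns `None` for zero patterns,
/// which would be a value binding rather than a function.
fn lower_curried_def(name: &str, params: &[&str], body: Expr) -> Option<Function> {
    let (first, rest) = params.split_first()?;
    // Wrap from the innermost parameter outwards.
    let body = rest.iter().rev().fold(body, |acc, p| Expr::Lambda {
        param: p.to_string(),
        body: Box::new(acc),
    });
    Some(Function {
        name: name.to_string(),
        params: vec![first.to_string()],
        body,
    })
}

/// Uncurried `f (p1, p2, ...) = body`: one flat multi-parameter function.
fn lower_uncurried_def(name: &str, params: &[&str], body: Expr) -> Function {
    Function {
        name: name.to_string(),
        params: params.iter().map(|p| p.to_string()).collect(),
        body,
    }
}
```

In this model, `add x y = e` yields a function with parameter list `[x]` and body `Lambda { param: y, body: e }`, whereas `add (x, y) = e` yields parameter list `[x, y]` and body `e` unchanged.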

API guidance: put stable, configuration-like arguments first and the data argument last, so partial application is useful (replace " " "" ⇒ a space-remover).

Function Calls

[FLAVOR-ML-CALL] Calls use whitespace application; parentheses group.

application ::= application atom
              | atom
atom        ::= ID | literal | "(" expr ")"

length snap
textResp 201 "created\n"
c256 "213" (blocks 0 (mn n 28))

Lowering: whitespace application f a b → nested Expr::Call, one argument each (Call(Call(f,[a]),[b])) — curried. A parenthesised comma-list f (a, b) is the uncurried saturated call → a single Call(f, [a, b]) (matching Default's f(x: a, y: b)); a single parenthesised expression f (a) is just grouping and lowers to Call(f, [a]).
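A small recursive-descent sketch shows how left-associative whitespace application and grouping parentheses fall out of the grammar. It is an illustration only: the `Expr` type, the tokenizer, and the function names are hypothetical, every non-parenthesis token is treated as an atom, string literals containing spaces are not handled, and the uncurried comma-list form is left out.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Atom(String),
    Call(Box<Expr>, Vec<Expr>),
}

/// Split on whitespace, treating each parenthesis as its own token.
fn tokenize(src: &str) -> Vec<String> {
    src.replace('(', " ( ")
        .replace(')', " ) ")
        .split_whitespace()
        .map(str::to_string)
        .collect()
}

/// atom ::= ID | literal | "(" expr ")"
fn parse_atom(toks: &[String], pos: &mut usize) -> Result<Expr, String> {
    match toks.get(*pos).map(String::as_str) {
        Some("(") => {
            *pos += 1;
            let inner = parse_application(toks, pos)?;
            if toks.get(*pos).map(String::as_str) == Some(")") {
                *pos += 1;
                Ok(inner) // parentheses only group; they add no node
            } else {
                Err("expected `)`".to_string())
            }
        }
        Some(")") | None => Err("expected an atom".to_string()),
        Some(t) => {
            *pos += 1;
            Ok(Expr::Atom(t.to_string()))
        }
    }
}

/// Left-associative application: each further atom wraps the
/// expression so far in a one-argument call.
fn parse_application(toks: &[String], pos: &mut usize) -> Result<Expr, String> {
    let mut expr = parse_atom(toks, pos)?;
    while let Some(t) = toks.get(*pos) {
        if t == ")" {
            break;
        }
        let arg = parse_atom(toks, pos)?;
        expr = Expr::Call(Box::new(expr), vec![arg]);
    }
    Ok(expr)
}

fn parse(src: &str) -> Result<Expr, String> {
    let toks = tokenize(src);
    let mut pos = 0;
    let expr = parse_application(&toks, &mut pos)?;
    if pos == toks.len() {
        Ok(expr)
    } else {
        Err("unexpected `)`".to_string())
    }
}
```

Under these assumptions `parse("add 1 2")` produces the nested one-argument calls described above, and `parse("f (a)")` produces the same tree as `parse("f a")`.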

Effects

[FLAVOR-ML-EFFECT] An effect declaration is a layout block of operation signatures. Operations use => so that -> keeps its one meaning — function and currying type. An operation is a request with a payload and a result, not a curried function.

effectDecl ::= "effect" ID INDENT opSig+ DEDENT
opSig      ::= ID ":" type "=>" type

effect Db
    add : string => int
    list : Unit => string
    count : Unit => int

effect Log
    info : string => Unit

Zero-payload operations take Unit. Multi-field requests use a record payload, not a fake multi-argument operation:

type AddTask =
    body : string
    priority : int

effect Db
    add : AddTask => int

Lowering: effect E + arms → Stmt::Effect { operations }, where each op : P => R becomes EffectOperation { name, parameters: [P], return_type: R }, the same canonical node the Default op : fn(P) -> R produces. perform E.op a → Expr::Perform.
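As a rough illustration of the operation-signature lowering, the sketch below splits one `op : P => R` line into the three fields named above. The struct mirrors the field names from this chapter, but its field types (plain strings) and the function `lower_op_sig` are assumptions made for the example; the real frontend works on tokens and typed AST nodes, not on raw text.

```rust
#[derive(Debug, PartialEq)]
struct EffectOperation {
    name: String,
    parameters: Vec<String>, // exactly one payload type in this sketch
    return_type: String,
}

/// Lower one `op : P => R` signature line. A `->` in place of `=>`
/// is rejected, since `->` is reserved for function types.
fn lower_op_sig(line: &str) -> Result<EffectOperation, String> {
    let (name, rest) = line
        .split_once(':')
        .ok_or_else(|| format!("missing `:` in `{}`", line))?;
    let (payload, result) = rest
        .split_once("=>")
        .ok_or_else(|| format!("missing `=>` in `{}`", line))?;
    let (name, payload, result) = (name.trim(), payload.trim(), result.trim());
    if name.is_empty() || payload.is_empty() || result.is_empty() {
        return Err(format!("incomplete operation signature `{}`", line));
    }
    Ok(EffectOperation {
        name: name.to_string(),
        parameters: vec![payload.to_string()],
        return_type: result.to_string(),
    })
}
```

For `add : string => int` this yields the name `add`, the single parameter `string`, and the return type `int`; a zero-payload operation such as `list : Unit => string` carries `Unit` as its one parameter in this model.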

-> belongs to functions and currying. => belongs to clauses and requests that yield a result: it appears in effect operations, handler arms, and match arms, always meaning "the left yields the right."

Handlers

[FLAVOR-ML-HANDLER] Handlers are first-class values. handler E followed by indented arms evaluates to a value of type Handler E. handle installs one or more such values around a computation, with do marking the handled body.

handlerValue ::= "handler" ID INDENT handlerArm+ DEDENT
handlerArm   ::= ID param* "=>" blockOrExpr
install      ::= "handle" expr+ "do" blockOrExpr

memoryDb : Unit -> Handler Db
memoryDb () =
    mut tasks = ""
    mut taskCount = 0

    handler Db
        add t =>
            taskCount := taskCount + 1
            tasks := "${tasks}#${toString taskCount} ${t}\n"
            taskCount

        list =>
            tasks

        count =>
            taskCount

Installing several at once replaces the Default flavor's repeated nesting:

db = memoryDb ()
log = silentLog ()

handle db log
do
    createTask "buy milk"

The mutable cells belong to the handler value: a fresh handler makes fresh state; passing the same value around shares it. Parameterised handlers compose with currying (filePersist path = … handler Persist …).

First-class handler values, the Handler E type, and multi-install are a shared-core feature, not ML-only sugar — see FLAVOR-HANDLER-VALUE. They lower to Expr::HandlerValue and Expr::Install; handle a b c do body desugars to nested installs. The Default flavor gains the same feature in brace spelling.
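The multi-install desugaring is a fold over the handler list. The sketch below uses a toy `Expr` type invented for illustration, and `desugar_handle` is a hypothetical name. The source says only that `handle a b c do body` becomes nested installs; the nesting order used here, with the leftmost handler outermost, is an assumption of this sketch.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Var(String),
    Install { handler: Box<Expr>, body: Box<Expr> },
}

/// Desugar `handle h1 h2 ... do body` into nested single installs.
/// Assumption: the leftmost handler becomes the outermost install.
fn desugar_handle(handlers: Vec<Expr>, body: Expr) -> Expr {
    handlers
        .into_iter()
        .rev() // start wrapping from the rightmost handler
        .fold(body, |inner, h| Expr::Install {
            handler: Box::new(h),
            body: Box::new(inner),
        })
}
```

With two handlers `db` and `log`, the result is `Install(db, Install(log, body))`; with an empty handler list the body comes back unchanged.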

Match

[FLAVOR-ML-MATCH] match uses the same clause style as handlers: the scrutinee follows match, and each indented arm is Pattern => body. A one-payload constructor binds its payload directly — Success value, not Success { value }.

matchExpr ::= "match" expr INDENT matchArm+ DEDENT
matchArm  ::= pattern "=>" blockOrExpr

diskBytes =
    match saved
        Success value => length snap
        Error message => -1

Lowering: Expr::Match + MatchArm; Success value → Pattern::Constructor { name: "Success", fields: ["value"] }, the same node the Default Success { value } produces. Wildcard _ → Pattern::Wildcard.

Records

[FLAVOR-ML-RECORD] Record construction is a layout block headed by the constructor name, with field = value lines. Inside a record literal the left of = is a field name, not a new binding; the indentation under a constructor makes that unambiguous.

recordExpr ::= ID INDENT fieldInit+ DEDENT
fieldInit  ::= ID "=" expr

textResp status bodyText =
    HttpResponse
        status = status
        headers = "Content-Type: text/plain"
        contentType = "text/plain"
        streamFd = -1
        isComplete = true
        partialBody = bodyText

Lowering: Expr::TypeConstructor { name, fields }; record update lowers to Expr::Update.

Blocks

[FLAVOR-ML-BLOCK] A function body, match arm, handler arm, or do body is an ordinary layout region containing bindings, mutations, performs, and a final expression. The final expression is the block's value. There is no separate { … } expression form in this flavor.

onPost body =
    id = perform Db.add body
    snap = perform Db.list
    written = perform Persist.flush snap
    perform Log.info "created"
    textResp 201 "created\n"

Lowering: Expr::Block { statements, value }, where value is the trailing expression — the same node the Default { … } block produces.

Canonical Lowering Table

Every ML form on the left lowers to the canonical node on the right (crates/osprey-ast/src/lib.rs). The Default-flavor spelling of the same node is in FLAVOR-LAYER.

ML surface                      | Canonical AST node
x = e                           | Stmt::Let { mutable: false }
mut x = e                       | Stmt::Let { mutable: true }
x := e                          | Stmt::Assignment
f x y = e (curried)             | one-param Stmt::Function returning a Lambda chain
f (x, y) = e (uncurried)        | flat multi-param Stmt::Function
\x y => e                       | curried Expr::Lambda chain
f a b                           | nested one-arg Expr::Call: Call(Call(f,[a]),[b])
f (a, b) (saturated)            | single multi-arg Expr::Call: Call(f, [a, b])
type T = + variant/field layout | Stmt::Type + TypeVariant
[a, b, c] / xs[i]               | Expr::List / Expr::Index
layout block                    | Expr::Block
match v + arms                  | Expr::Match + MatchArm
Success value                   | Pattern::Constructor { fields: ["value"] }
T + f = v lines                 | Expr::TypeConstructor
effect E + op : P => R          | Stmt::Effect + EffectOperation
perform E.op a                  | Expr::Perform
handler E + arms                | Expr::HandlerValue (shared-core addition)
handle a b do body              | Expr::Install (shared-core addition)

Worked Example

This is the same program a Default-flavor author would write with braces, fn, named arguments, and nested handle … in. It exercises curried definitions, partial application (textResp 201, c256 "213"), => effect operations, first-class handler values with owned mut state, and one grouped handle … do.

effect Db
    add : string => int
    list : Unit => string
    count : Unit => int

effect Log
    info : string => Unit

c256 : string -> string -> string
c256 n s =
    "\e[38;5;${n}m${s}\e[0m"

rose : string -> string
rose = c256 "213"

textResp : int -> string -> HttpResponse
textResp status bodyText =
    HttpResponse
        status = status
        headers = "Content-Type: text/plain"
        contentType = "text/plain"
        streamFd = -1
        isComplete = true
        partialBody = bodyText

memoryDb : Unit -> Handler Db
memoryDb () =
    mut tasks = ""
    mut taskCount = 0

    handler Db
        add t =>
            taskCount := taskCount + 1
            tasks := "${tasks}#${toString taskCount} ${t}\n"
            taskCount

        list => tasks
        count => taskCount

silentLog : Unit -> Handler Log
silentLog () =
    handler Log
        info m => ()

createTask : string -> HttpResponse
createTask body =
    id = perform Db.add body
    snap = perform Db.list
    perform Log.info "created #${toString id} ${snap}"
    textResp 201 "created task #${toString id}\n"

db = memoryDb ()
log = silentLog ()

handle db log
do
    response = createTask "buy milk"
    print (httpResponseBody response)

The first-class handlers make test doubles trivial — a test installs spy or stub handlers that close over the test's own mut cells around the call under test, with no Db/Log parameters polluting the production signature:

test "createTask stores the task and logs" =
    mut stored = ""
    mut logLine = ""

    db =
        handler Db
            add task =>
                stored := task
                1
            list => "#1 ${stored}\n"
            count => 1

    log =
        handler Log
            info message => logLine := message

    response =
        handle db log
        do
            createTask "buy milk"

    expectEqual 201 (httpResponseStatus response)
    expectEqual "buy milk" stored
    expectEqual "created #1 #1 buy milk\n" logLine

Resolved Syntax Questions

  • Zero-argument functions: a parameterless name = expr is a value binding; a name () = expr is a Unit -> T function. Pure constants are values (banner); () is used where recursion or effects make the call boundary meaningful (serveForever ()).
  • Lambdas: anonymous functions are written \param* => body (lowering to Expr::Lambda), keeping => as the clause/yield arrow and -> as the type arrow.
  • Effect annotations on signatures: the effect row follows the result type, as in the Default flavor (saveTask : string -> int ![Store, Log]).

References

These are the verified sources behind the hand-written ML frontend ([FLAVOR-ML-LAYOUT]): the recursive-descent / predictive parser, its Pratt (precedence-climbing) expression layer, and the offside-rule layout lexer.

1. Recursive-descent / predictive parsing foundations

2. Operator-precedence / Pratt parsing

  • Top Down Operator Precedence — Vaughan R. Pratt. 1973. Proc. 1st ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL '73), pp. 41–51. DOI: https://doi.org/10.1145/512927.512931 — The primary source authorizing the Pratt (top-down operator-precedence) expression parser: per-token prefix/infix handlers driven by binding powers.
  • Parsing Expressions by Precedence Climbing — Eli Bendersky. 2 Aug 2012. https://eli.thegreenplace.net/2012/08/02/parsing-expressions-by-precedence-climbing — Authorizes the precedence-climbing formulation of operator-precedence parsing (the loop-based, min-precedence variant used in production front-ends such as Clang).
  • Parsing Expressions by Recursive Descent: From Precedence Climbing to Pratt Parsing — Theodore S. Norvell, Memorial University of Newfoundland. https://www.engr.mun.ca/~theo/Misc/pratt_parsing.htm — Authorizes treating precedence climbing and Pratt parsing as the same algorithm, justifying a single binding-power table for prefix/infix/postfix/ternary operators.

3. The offside rule / layout-sensitive (indentation) syntax

4. Error recovery in recursive-descent (panic-mode / synchronization)

Verification (research subagent, 2026): the three DOIs (Pratt 10.1145/512927.512931, Landin 10.1145/365230.365257, Adams 10.1145/2429069.2429129) resolve through doi.org to the correct ACM DL records (ACM landing pages return HTTP 403 to automated fetches, so the records were corroborated via the doi.org redirect and dblp). Wirth ETH page + PDF, Adams author PDF, Nystrom, Bendersky, Norvell, and both Haskell 2010 chapters were each fetched and matched. Dragon Book §4.1.3–4.1.4 are the standard 2nd-ed. TOC section numbers (book/publisher confirmed; exact section numbers not page-verified).

Cross-references