Quick intro to nom

A parser combinator library for Rust

What is a parser?

get these slides at

loige.link/nom-intro

What is a parser?

A program that can turn text (or bytes) into structured infromation

get these slides at

loige.link/nom-intro

What is a parser?

A program that can turn text (or bytes) into structured infromation

"Point: (22, 17, -11)"

struct Point3D {
  x: i32,
  y: i32,
  z: i32,
}

get these slides at

loige.link/nom-intro

Naive approach

String split like a mad-man 🤪

[label, remainder] = split(input, ": ", 2)
input = "Point: (22, 17, -11)"

label

remainder

get these slides at

loige.link/nom-intro

Naive approach

String split like a mad-man 🤪

[label, remainder] = split(input, ": ", 2)
input = "Point: (22, 17, -11)"
[, remainder] = split(remainder, "(", 2)

Naive approach

String split like a mad-man 🤪

[label, remainder] = split(input, ": ", 2)
input = "Point: (22, 17, -11)"
[, remainder] = split(remainder, "(", 2)
[numbers,] = split(remainder, ")", 2)

Naive approach

String split like a mad-man 🤪

[label, remainder] = split(input, ": ", 2)
input = "Point: (22, 17, -11)"
[, remainder] = split(remainder, "(", 2)
[numbers,] = split(remainder, ")", 2)
[x, y, z] = split(numbers, ", ", 3)

The problem with this approach

  1. Not always suitable (you need "delimiters")
  2. Simple for simple stuff, increasingly hard for "realistic" use cases
  3. Hard to handle errors consistently

Consuming parser

Tries to "eat and match" the source text (or bytes)

Consuming parser

Tries to "eat and match" the source text (or bytes)

input = "Point: (22, 17, -11)"

1. readLabel(input)

"Point", ": (22, 17, -11)"

Parsed data

Remainder string

Consuming parser

Tries to "eat and match" the source text (or bytes)

input = "Point: (22, 17, -11)"

1. readLabel(input)

"Point", ": (22, 17, -11)"

2. readSeparator(remainder)

":", " (22, 17, -11)"

Consuming parser

Tries to "eat and match" the source text (or bytes)

input = "Point: (22, 17, -11)"

1. readLabel(input)

"Point", ": (22, 17, -11)"

2. readSeparator(remainder)

":", " (22, 17, -11)"

3. readSpaces(remainder)

" ", "(22, 17, -11)"

Consuming parser

Tries to "eat and match" the source text (or bytes)

input = "Point: (22, 17, -11)"

1. readLabel(input)

"Point", ": (22, 17, -11)"

2. readSeparator(remainder)

":", " (22, 17, -11)"

3. readSpaces(remainder)

" ", "(22, 17, -11)"

4. readIntTuple(remainder)

[22,17,-11], ""

Consuming parser

Handling errors: if we fail to match we have to return an error

input = "Hello World"

readNumber(input)

"Cannot match number on 'Hello World'"
$ cargo add nom
fn main() {
    let s = "Point: (22, 17, -11)";
    
    let (_, point) = parse_point(s).unwrap();
    println!("{:?}", point);
}
fn parse_point(input: &str) -> IResult<&str, Point3D> {
    // ... parsing logic goes here

    Ok((input, Point3D { x, y, z }))
}

the value to parse from

A nom Result type

Input type

Output type

fn parse_point(input: &str) -> IResult<&str, Point3D> {
	let (input, _) = tag("Point: ")(input)?;
    
    // ...

    Ok((input, Point3D { x, y, z }))
}

nom combinator: exact match

what to match

the value to match against

If it fails to match

we propagate the error

the parsed value

(in this case "Point: ")

the remainder string

fn parse_point(input: &str) -> IResult<&str, Point3D> {
	let (input, _) = tag("Point: ")(input)?;
    let (input, _) = tag("(")(input)?;

    // ...

    Ok((input, Point3D { x, y, z }))
}
fn parse_point(input: &str) -> IResult<&str, Point3D> {
	let (input, _) = tag("Point: ")(input)?;
    let (input, _) = tag("(")(input)?;
    let (input, x) = i32(input)?;

    // ...

    Ok((input, Point3D { x, y, z }))
}

combinator that parses a sign (+ or -) and sequence of digits and converts them to a i32

first coordinate

fn parse_point(input: &str) -> IResult<&str, Point3D> {
	let (input, _) = tag("Point: ")(input)?;
    let (input, _) = tag("(")(input)?;
    let (input, x) = i32(input)?;
	let (input, _) = tag(", ")(input)?;
    let (input, y) = i32(input)?;
    let (input, _) = tag(", ")(input)?;
    let (input, z) = i32(input)?;
    let (input, _) = tag(")")(input)?;

    Ok((input, Point3D { x, y, z }))
}

With nom you generally break parsers into smaller and reusable parsers

/// parses "," followed by 0 or more spaces
fn separator(input: &str) -> IResult<&str, ()> {
    let (input, _) = pair(tag(","), space0)(input)?;
    Ok((input, ()))
}

nom combinator that applies 2 parsers in sequence and returns a tuple with the 2 parsed values

/// parses 3 numbers separated by a separator (e.g. "22, 17, -11")
fn parse_coordinates(input: &str) -> IResult<&str, (i32, i32, i32)> {
    let (input, (x, _, y, _, z)) = tuple((
        i32,
        separator,
        i32,
        separator,
        i32
    ))(input)?;

    Ok((input, (x, y, z)))
}

nom combinator that applies a sequence of parsers in sequence and returns a tuple with the parsed values

Note how we are reusing the parser we just created!

fn parse_point2(input: &str) -> IResult<&str, Point3D> {
    let (input, _) = tag("Point: ")(input)?;
    let (input, (x, y, z)) = delimited(
        tag("("),
        parse_coordinates,
        tag(")")
    )(input)?;
    
    Ok((input, Point3D { x, y, z }))
}

Links

get these slides at

loige.link/nom-intro