Basis

On Executing Types

A type in Neut is compiled into a pointer to a two-argument function like the following (pseudocode):

define discard-or-copy-value(action-selector, value) {
  if eq-int(action-selector, 0) {
    discard-value(value);
    Unit
  } else {
    let new-value = copy-value(value);
    new-value
  }
}

These functions are then used to discard/copy values when necessary.
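To make the idea concrete, here is a toy Python model, not Neut's actual runtime. A "type" is just a callable that takes a selector and a value, with 0 meaning discard and 1 meaning copy. The names make_list_type, alloc_list, and live_cells are hypothetical, and a Python list stands in for heap-allocated list cells so that allocations can be counted.

```python
# Toy model: a "type" is a function (selector, value) -> value.
# selector 0 means "discard", selector 1 means "copy".
# live_cells counts simulated heap cells so the effects are visible.

live_cells = 0


def alloc_list(items):
    """Simulate allocating one heap cell per element."""
    global live_cells
    live_cells += len(items)
    return list(items)


def make_list_type():
    """Return a hypothetical discard-or-copy function for list(int)."""
    def discard_or_copy(selector, value):
        global live_cells
        if selector == 0:
            # discard: release every simulated cell, then return "Unit" (None here)
            live_cells -= len(value)
            return None
        # copy: allocate a fresh list holding the same elements
        return alloc_list(value)
    return discard_or_copy


list_int = make_list_type()    # "executing" this value is executing the type
xs = alloc_list([1, 5, 9])     # 3 live cells
ys = list_int(1, xs)           # copy: 6 live cells
list_int(0, xs)                # discard the original: back to 3 live cells
print(ys, live_cells)          # [1, 5, 9] 3
```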

Discarding Values

Let's see how types are executed when discarding values. For example, consider the following code:

define foo(xs: list(int)) -> unit {
  Unit
}

Note that the variable xs isn't used. Because of that, the compiler translates the code above into the following (pseudocode; won't typecheck):

define foo(xs: list(int)) -> unit {
  let f = list(int);
  f(0, xs); // passing `0` to discard `xs`
  Unit
}

Note that the above example executes the type list(int) as a function.

Copying Values

Let's see how types are executed when copying values. For example, consider the following code:

define foo(!xs: list(int)) -> unit {
  some-func(xs, xs)
}

Note that the variable xs is used twice. Because of that, the compiler translates the above code into the following (pseudocode; won't typecheck):

define foo(!xs: list(int)) -> unit {
  let f = list(int);
  let xs-clone = f(1, xs); // passing `1` to copy `xs`
  some-func(xs-clone, xs)
}

Note that the above example executes the type list(int) as a function.
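Both examples follow one rule: a variable used zero times gets one discard, and a variable used n > 1 times gets n - 1 copies. The Python sketch below applies that rule and emits pseudocode lines as strings. The function insert_discard_copy and its output format are hypothetical; they only loosely mirror the translations shown above.

```python
def insert_discard_copy(var, type_name, uses):
    """Return the pseudocode lines inserted for `var`, given how often it is used.

    uses == 0     -> discard once
    uses == 1     -> nothing to insert (the value is simply consumed)
    uses == n > 1 -> n - 1 copies
    """
    lines = []
    if uses == 0:
        lines.append(f"let f = {type_name};")
        lines.append(f"f(0, {var}); // discard")
    elif uses > 1:
        lines.append(f"let f = {type_name};")
        for i in range(1, uses):
            lines.append(f"let {var}-clone-{i} = f(1, {var}); // copy")
    return lines


# the unused `xs` from the discarding example
for line in insert_discard_copy("xs", "list(int)", 0):
    print(line)

# the twice-used `xs` from the copying example
for line in insert_discard_copy("xs", "list(int)", 2):
    print(line)
```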

You must prefix a variable with ! at its definition site if the variable may need to be copied. Likewise, if a free variable captured by a term-level define cannot be copied for free, that free variable must have been defined with the ! prefix.

The prefix ! is unnecessary if the variable can be copied for free.
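The ! rule can be read as a per-variable check. The Python sketch below is a hypothetical illustration, not the compiler's checker: it assumes each variable's use count is already known, along with whether its type can be copied for free (as with immediates such as integers). The names CopyError and check_copyable are made up for this example.

```python
class CopyError(Exception):
    """Raised when a variable would need a copy but lacks the ! prefix."""


def check_copyable(name, uses, has_bang, copy_is_free):
    """Reject a variable used more than once without `!`,
    unless its type can be copied for free."""
    if uses > 1 and not has_bang and not copy_is_free:
        raise CopyError(f"`{name}` is used {uses} times; define it as `!{name}`")


check_copyable("xs", uses=2, has_bang=True, copy_is_free=False)  # ok: defined as !xs
check_copyable("n", uses=2, has_bang=False, copy_is_free=True)   # ok: free to copy

try:
    check_copyable("xs", uses=2, has_bang=False, copy_is_free=False)
except CopyError as err:
    print(err)
```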

On Immediate Values

We don't have to discard immediates like integers or floats because their internal representations don't depend on memory-related operations like malloc or free. Because of that, "discarding" immediate values does nothing. Also, "copying" immediate values means reusing the original values.

More specifically, the type of an immediate is compiled into a pointer to the following function (pseudocode):

inline discard-or-copy-immediate(selector, value) {
  if eq-int(selector, 0) {
    0     // discard: we have nothing to do with `value`
  } else {
    value // copy: we can simply reuse the immediate `value`
  }
}

These fake discard/copy operations are optimized away at compile time.

Also, this function is internally called "base.#.imm". Try compiling your project as follows:

neut build TARGET --emit llvm --skip-link

Then, take a peek at the generated LLVM IR in the build directory. You'll find the name in many places.

Since every type is translated into a pointer to a function, every type is itself an immediate value. Thus, type (the type of types) is also compiled into base.#.imm.
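The behavior of base.#.imm can be modeled in a few lines of Python. This is a toy sketch under the assumption that Python integers and function objects stand in for Neut immediates and function pointers; imm and some_type are hypothetical names. Copying returns the very same value, and because a type is a function pointer, copying a type hands back the same function object.

```python
def imm(selector, value):
    """Toy model of base.#.imm: discard is a no-op, copy returns the value itself."""
    if selector == 0:
        return 0      # discard: an immediate owns no memory, so nothing to release
    return value      # copy: reusing the immediate is a valid copy


def some_type(selector, value):
    """Placeholder for any type's discard-or-copy function; never called here."""
    raise NotImplementedError


n = 42
m = imm(1, n)           # copy an integer: the same value comes back
imm(0, n)               # discard an integer: does nothing
t = imm(1, some_type)   # a type is a function pointer, i.e. an immediate
print(m, t is some_type)  # 42 True
```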

Free-Malloc Canceling

Thanks to its static nature, memory allocation in Neut can sometimes be optimized away. Consider the following code:

data int-list {
| Nil
| Cons(int, int-list)
}

// [1, 5, 9] => [2, 6, 10]
define increment(xs: int-list) -> int-list {
  match xs {
  | Nil =>
    Nil
  // ↓ the `Cons` clause
  | Cons(x, rest) =>
    Cons(add-int(x, 1), increment(rest))
  }
}

The expected behavior of the Cons clause above would be something like the following:

  1. obtain x and rest from xs
  2. free the outer tuple of xs
  3. calculate add-int(x, 1) and increment(rest)
  4. allocate a memory region using malloc to hold the result
  5. store the calculated values to the pointer and return it

However, since the sizes of Cons(x, rest) and Cons(add-int(x, 1), increment(rest)) are known at compile time to be the same, the pair of free and malloc can be optimized away, as follows:

  1. obtain x and rest from xs
  2. calculate add-int(x, 1) and increment(rest)
  3. store the calculated values to xs (overwrite)

Neut performs this optimization. When a free is required, Neut looks for a malloc of the same size and optimizes away such a pair if one exists. The resulting assembly code thus performs in-place updates.
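The difference between the two step lists can be simulated in Python. In this hypothetical sketch, Cons objects stand in for heap cells, None stands in for Nil, and a stats dictionary counts simulated malloc and free calls. increment_naive follows the five-step version; increment_reuse follows the optimized version and overwrites each cell in place. This is an illustration of the effect, not Neut's code generation.

```python
class Cons:
    """A heap cell holding a head and a tail (None plays the role of Nil)."""
    def __init__(self, head, tail):
        self.head = head
        self.tail = tail


stats = {"malloc": 0, "free": 0}


def malloc_cons(head, tail):
    stats["malloc"] += 1
    return Cons(head, tail)


def free_cons(cell):
    stats["free"] += 1  # only counted; Python's GC reclaims the object


def increment_naive(xs):
    if xs is None:
        return None
    x, rest = xs.head, xs.tail                          # 1. obtain x and rest
    free_cons(xs)                                       # 2. free the outer cell
    return malloc_cons(x + 1, increment_naive(rest))    # 3-5. compute, malloc, store


def increment_reuse(xs):
    if xs is None:
        return None
    x, rest = xs.head, xs.tail                          # 1. obtain x and rest
    xs.head, xs.tail = x + 1, increment_reuse(rest)     # 2-3. compute, overwrite xs
    return xs


def from_list(items):
    xs = None
    for item in reversed(items):
        xs = malloc_cons(item, xs)
    return xs


def to_list(xs):
    out = []
    while xs is not None:
        out.append(xs.head)
        xs = xs.tail
    return out


a = from_list([1, 5, 9])   # 3 mallocs
b = increment_naive(a)     # 3 more mallocs and 3 frees
c = from_list([1, 5, 9])   # 3 mallocs
c2 = increment_reuse(c)    # no mallocs, no frees: in-place update
print(to_list(b), to_list(c2), c2 is c, stats)
```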

Free-Malloc Canceling and Branching

This optimization works across branches. For example, consider the following:

// (an `insert` function in insertion sort)
define insert(v: int, xs: int-list) -> int-list {
  match xs {
  | Nil =>
    // ...
  | Cons(y, ys) =>           // (X)
    if gt-int(v, y) {
      Cons(y, insert(v, ys)) // (Y)
    } else {
      Cons(v, Cons(y, ys))   // (Z)
    }
  }
}

At point (X), xs must be freed. However, this free can be canceled since a malloc of the same size can be found in every possible branch (here, (Y) and (Z)). Thus, in the code above, the deallocation of xs at (X) is removed, and the memory region of xs is reused at (Y) and (Z), resulting in an in-place update of xs.

On the other hand, consider rewriting the code above into something like the following:

define foo(v: int, xs: int-list) -> int-list {
  match xs {
  | Nil =>
    // ...
  | Cons(y, ys) =>         // (X')
    if gt-int(v, y) {
      Nil                  // (Y')
    } else {
      Cons(v, Cons(y, ys)) // (Z')
    }
  }
}

In this case, the free of xs at (X') can't be optimized away, since there is a branch (namely, (Y')) that doesn't perform a malloc of the same size as xs.
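The branching condition reduces to an "every branch" check. In the hypothetical Python sketch below, each branch is described only by the sizes it mallocs, and sizes are abstract labels rather than byte counts (the real cell layout is not modeled). Following the text, branch (Y') is modeled as performing no Cons-sized malloc.

```python
CONS = "cons-cell"  # abstract label for "the size of one Cons cell"


def can_cancel_free(free_size, branches):
    """A free before a branch can be canceled only if every branch
    performs at least one malloc of the same size."""
    return all(free_size in mallocs for mallocs in branches)


# insert: (Y) mallocs one Cons cell, (Z) mallocs two
insert_branches = [[CONS], [CONS, CONS]]

# foo: (Y') returns Nil without a Cons-sized malloc, (Z') mallocs two
foo_branches = [[], [CONS, CONS]]

print(can_cancel_free(CONS, insert_branches))  # True
print(can_cancel_free(CONS, foo_branches))     # False
```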

Malloc-Free Canceling

Neut also performs the opposite optimization. If a region allocated by malloc does not escape and is eventually deallocated by free, the compiler replaces that heap allocation with a stack allocation.

As a simple example, consider the following code:

define foo() -> int {
  let ptr = malloc(8);
  store-int(42, ptr);
  let value = load-int(ptr);
  free(ptr);
  value
}

After optimization, this behaves like the following pseudocode:

define foo() -> int {
  let ptr = alloca(8);
  store-int(42, ptr);
  let value = load-int(ptr);
  value
}

That is, the compiler removes the malloc/free pair and uses a stack slot instead.
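A simplified version of this rewrite can be sketched in Python over a toy straight-line instruction list. The tuple format and promote_to_stack are hypothetical, and the escape check is deliberately crude: a pointer escapes if it is returned, stored as a value, or passed to a call. Real escape analysis must also handle control flow and aliasing, which this sketch ignores.

```python
def promote_to_stack(instrs):
    """Replace malloc/free pairs whose pointer does not escape with alloca.

    Instruction shapes (toy format):
      ("malloc", ptr, size)  ("store", value, ptr)  ("load", dst, ptr)
      ("free", ptr)          ("call", func, *args)  ("return", var)
    """
    promoted = set()
    for op in instrs:
        if op[0] != "malloc":
            continue
        ptr = op[1]
        freed = ("free", ptr) in instrs
        escapes = any(
            (o[0] == "return" and o[1] == ptr)        # returned to the caller
            or (o[0] == "store" and o[1] == ptr)      # the pointer itself is stored
            or (o[0] == "call" and ptr in o[2:])      # passed to another function
            for o in instrs
        )
        if freed and not escapes:
            promoted.add(ptr)

    out = []
    for op in instrs:
        if op[0] == "malloc" and op[1] in promoted:
            out.append(("alloca",) + op[1:])   # heap allocation becomes a stack slot
        elif op[0] == "free" and op[1] in promoted:
            continue                           # the matching free disappears
        else:
            out.append(op)
    return out


foo = [
    ("malloc", "ptr", 8),
    ("store", 42, "ptr"),
    ("load", "value", "ptr"),
    ("free", "ptr"),
    ("return", "value"),
]
print(promote_to_stack(foo))
```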

Name Resolution

Resolving Module Aliases

Let's see how a module alias is resolved. Here, a module alias is a name like core in core.bool.and:

import {
  core.bool,
}

define use-external-module-function() -> bool {
  let value = core.bool.and(True, False);
  ...
}

When compiling a module, the compiler reads the dependency field in module.ens and adds correspondences like the following to its internal state:

// alias => (the digest of the library)
core => "jIx5FxfoymZ-X0jLXGcALSwK4J7NlR1yCdXqH2ij67o"
foo-module => "JEpjuzZ0rlqxiVuCnD000jEKIA_Y6ku1L3J139h3M6Q"
bar-module => "zptXghmyD5druBl8kx2Qrei6O6fDsKCA7z2KoHp1aqA"
...

The compiler then resolves aliases as follows:

core.bool.and

↓

jIx5FxfoymZ-X0jLXGcALSwK4J7NlR1yCdXqH2ij67o.bool.and

--------------

foo-module.path.to.some.file.my-function

↓

JEpjuzZ0rlqxiVuCnD000jEKIA_Y6ku1L3J139h3M6Q.path.to.some.file.my-function

--------------

...
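The substitution above amounts to replacing the leading component of a dotted name. The Python sketch below is a hypothetical model using the digests listed above; resolve_alias is a made-up name, and leaving non-alias names untouched is an assumption, since the text does not say how other leading components are handled.

```python
def resolve_alias(name, alias_map):
    """Replace the leading alias of a dotted name with the library's digest."""
    head, sep, rest = name.partition(".")
    if head in alias_map:
        return alias_map[head] + sep + rest
    return name  # assumption: names that don't start with an alias are left as-is


aliases = {
    "core": "jIx5FxfoymZ-X0jLXGcALSwK4J7NlR1yCdXqH2ij67o",
    "foo-module": "JEpjuzZ0rlqxiVuCnD000jEKIA_Y6ku1L3J139h3M6Q",
}

print(resolve_alias("core.bool.and", aliases))
print(resolve_alias("foo-module.path.to.some.file.my-function", aliases))
```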

Resolving this

Let's see how this is resolved. Here, this is the first component of a global variable's name, as in the following example:

import {
  this.path.to.file,
}

define use-my-function() -> unit {
  let value = this.path.to.file.my-function();
  ...
}

The first thing to note here is that every module is marked as "main" or "library" during compilation. The main module is the module in which neut build is executed. Library modules are all the other modules that are necessary for compilation.

All the occurrences of this in the main module are kept intact during compilation. Thus, the resulting assembly file contains symbols like this.foo.bar.

On the other hand, all occurrences of this in a library module are resolved into their corresponding digests. More specifically, when processing a library module, the compiler adds correspondences like the following:

// this => (the digest of the library)
this => "jIx5FxfoymZ-X0jLXGcALSwK4J7NlR1yCdXqH2ij67o"

The compiler then resolves this as follows:

this.string.io.get-line

↓

jIx5FxfoymZ-X0jLXGcALSwK4J7NlR1yCdXqH2ij67o.string.io.get-line

Thus, for a library module, the resulting assembly file contains digest-prefixed symbols like the one above.
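The difference between main and library modules fits in a small Python function. This is a hypothetical model of the rule described above (resolve_this is a made-up name), reusing the digest from the example: this is kept intact in the main module and replaced by the module's digest in a library module.

```python
def resolve_this(name, is_main_module, module_digest):
    """Keep `this` in the main module; replace it with the digest in a library."""
    head, sep, rest = name.partition(".")
    if head != "this" or is_main_module:
        return name
    return module_digest + sep + rest


DIGEST = "jIx5FxfoymZ-X0jLXGcALSwK4J7NlR1yCdXqH2ij67o"

print(resolve_this("this.foo.bar", True, DIGEST))               # main module
print(resolve_this("this.string.io.get-line", False, DIGEST))   # library module
```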

Leading Bars and Trailing Commas

Comma-Separated Sequences (And-Sequences)

Every comma-separated sequence like a, b, c can have a trailing comma like a, b, c,.

If a comma-separated sequence has a trailing comma, the sequence is formatted vertically by the built-in formatter.

Bar-Separated Sequences (Or-Sequences)

Every bar-separated sequence like a | b | c can have a leading bar like | a | b | c.

If a bar-separated sequence has a leading bar, the sequence is formatted vertically by the built-in formatter.
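Both rules share one idea: the optional leading or trailing separator decides between horizontal and vertical layout. The Python sketch below illustrates only that decision. The function names and the exact indentation of the vertical output are assumptions, not the built-in formatter's actual output.

```python
def format_and_seq(items, trailing_comma):
    """Comma-separated sequence: vertical if and only if it has a trailing comma."""
    if trailing_comma:
        return "\n".join(f"  {item}," for item in items)
    return ", ".join(items)


def format_or_seq(items, leading_bar):
    """Bar-separated sequence: vertical if and only if it has a leading bar."""
    if leading_bar:
        return "\n".join(f"| {item}" for item in items)
    return " | ".join(items)


print(format_and_seq(["a", "b", "c"], trailing_comma=False))
print(format_and_seq(["a", "b", "c"], trailing_comma=True))
print(format_or_seq(["a", "b", "c"], leading_bar=False))
print(format_or_seq(["a", "b", "c"], leading_bar=True))
```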

Compiler Configuration

The behavior of the compiler can be adjusted using the following environment variables:

Environment Variable       Meaning
NEUT_CLANG                 the command to call clang
NEUT_CORE_MODULE_DIGEST    the digest of the core module
NEUT_CORE_MODULE_URL       the URL of the core module

The default values are as follows:

Environment Variable       Default Value
NEUT_CLANG                 clang
NEUT_CORE_MODULE_DIGEST    (undefined; you must set one)
NEUT_CORE_MODULE_URL       (undefined; you must set one)
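The two tables can be read as a lookup with one default and two required settings. The Python sketch below models only those semantics; it is not how the compiler reads its environment. load_config is hypothetical, and raising an error as soon as a required variable is missing is an assumption.

```python
import os

DEFAULTS = {"NEUT_CLANG": "clang"}
REQUIRED = ("NEUT_CORE_MODULE_DIGEST", "NEUT_CORE_MODULE_URL")


def load_config(env=None):
    """Resolve the settings from the tables above; `env` defaults to os.environ."""
    env = os.environ if env is None else env
    config = {"NEUT_CLANG": env.get("NEUT_CLANG", DEFAULTS["NEUT_CLANG"])}
    for key in REQUIRED:
        if key not in env:
            raise KeyError(f"{key} is undefined; you must set one")
        config[key] = env[key]
    return config


# placeholder values, for illustration only
example_env = {"NEUT_CORE_MODULE_DIGEST": "DIGEST", "NEUT_CORE_MODULE_URL": "URL"}
print(load_config(example_env))
```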

Other Basic Facts

  • Neut is call-by-value
  • Neut is impure
  • The type of main must be () -> unit
  • The compiler has built-in references to names under core
  • Syntactic constructs like List[1, 2, 3] depend on functions in core