2024-09-12 22:23:13 +02:00

tbsp - tree-based source-processing language

tbsp is an awk-like language that operates on tree-sitter syntax trees. to motivate the need for such a program, we could begin by writing a markdown-to-html converter using tbsp and tree-sitter-md [0]. we need some markdown to begin with:

# 1 heading

content of first paragraph

## 1.1 heading

content of nested paragraph

for future reference, this markdown is parsed like so by tree-sitter-md (visualization generated by tree-viz [1]):

document
|  section
|  |  atx_heading
|  |  |  atx_h1_marker "#"
|  |  |  heading_content inline "1 heading"
|  |  paragraph
|  |  |  inline "content of first paragraph"
|  |  section
|  |  |  atx_heading
|  |  |  |  atx_h2_marker "##"
|  |  |  |  heading_content inline "1.1 heading"
|  |  |  paragraph
|  |  |  |  inline "content of nested paragraph"

onto the converter itself. every tbsp program is written as a collection of stanzas. typically, we start with a stanza like so:

BEGIN {
    int depth = 0;

    print("<html>\n");
    print("<body>\n");
}

the stanza begins with a "pattern", in this case "BEGIN", and is followed by a block of code. this block is executed once, right at the beginning, before the parse tree is traversed. in this stanza, we declare a "depth" variable to keep track of the nesting of markdown headers, and begin our html document by printing the "<html>" and "<body>" tags.

we can follow this stanza with an "END" stanza, which is executed after the traversal:

END {
    print("</body>\n");
    print("</html>\n");
}

in this stanza, we close the tags we opened at the start of the document. we can move on to the interesting bits of the conversion now:

enter section {
    depth += 1;
}
leave section {
    depth -= 1;
}

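the above stanzas begin with "enter" and "leave" clauses, followed by the name of a tree-sitter node kind: "section". the "section" identifier is visible in the tree visualization above; it encompasses a markdown section, and one is created for every markdown header. an "enter" stanza runs when the traversal first reaches a matching node, and the matching "leave" stanza runs once all of that node's children have been visited. the diagram below shows how tbsp executes the above stanzas: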

document                                 ...  depth = 0 
|  section <-------- enter section (1)   ...  depth = 1 
|  |  atx_heading
|  |  |  inline
|  |  paragraph
|  |  |  inline
|  |  section <----- enter section (2)   ...  depth = 2 
|  |  |  atx_heading
|  |  |  | inline
|  |  |  paragraph
|  |  |  | inline
|  |  | <----------- leave section (2)   ...  depth = 1 
|  | <-------------- leave section (1)   ...  depth = 0 
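
the same enter/leave ordering can be sketched outside of tbsp. the python below is only an illustration of the traversal model, not how the tbsp evaluator is implemented: the tuple-based "node" helper, the "walk" function and the "trace" list are all hypothetical names made up for this sketch, and the tree is a hand-written copy of the simplified diagram above.

```python
# hypothetical stand-in for a tree-sitter node: a (kind, children) tuple
def node(kind, *children):
    return (kind, list(children))

# hand-written copy of the simplified tree from the diagram above
TREE = node("document",
    node("section",
        node("atx_heading", node("inline")),
        node("paragraph", node("inline")),
        node("section",
            node("atx_heading", node("inline")),
            node("paragraph", node("inline")))))

def walk(tree, enter, leave):
    """depth-first walk: enter(kind) runs before the children, leave(kind) after."""
    kind, children = tree
    enter(kind)
    for child in children:
        walk(child, enter, leave)
    leave(kind)

state = {"depth": 0}   # plays the role of the "depth" variable set in BEGIN
trace = []             # records (event, depth after the event)

def enter(kind):
    # mirrors: enter section { depth += 1; }
    if kind == "section":
        state["depth"] += 1
        trace.append(("enter section", state["depth"]))

def leave(kind):
    # mirrors: leave section { depth -= 1; }
    if kind == "section":
        state["depth"] -= 1
        trace.append(("leave section", state["depth"]))

walk(TREE, enter, leave)
for event, depth in trace:
    print(f"{event:<14} depth = {depth}")
```

the recorded depths come out as 1, 2, 1, 0, matching the right-hand column of the diagram.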

the following stanzas should be self-explanatory now:

enter atx_heading {
    print("<h");
    print(depth);
    print(">");
}
leave atx_heading {
    print("</h");
    print(depth);
    print(">\n");
}

enter inline {
    print(text(node));
}

but an explanation is included nonetheless:

document                                 ...  depth = 0 
|  section <-------- enter section (1)   ...  depth = 1 
|  |  atx_heading <- enter atx_heading   ...  print "<h1>"
|  |  |  inline <--- enter inline        ...  print ..
|  |  | <----------- leave atx_heading   ...  print "</h1>"
|  |  paragraph
|  |  |  inline <--- enter inline        ...  print ..
|  |  section <----- enter section (2)   ...  depth = 2 
|  |  |  atx_heading enter atx_heading   ...  print "<h2>"
|  |  |  | inline <- enter inline        ...  print ..
|  |  |  | <-------- leave atx_heading   ...  print "</h2>"
|  |  |  paragraph
|  |  |  | inline <- enter inline        ...  print ..
|  |  | <----------- leave section (2)   ...  depth = 1 
|  | <-------------- leave section (1)   ...  depth = 0 
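
to check the walkthrough end to end, here is a small python simulation of the stanzas shown so far, run against a hand-written copy of the example tree. it is a sketch under assumptions: the dict-based "node" helper and the "convert" function are hypothetical, the "text" key stands in for tbsp's text(node), and only the BEGIN, END, section, atx_heading and inline stanzas from this document are modelled.

```python
# hypothetical stand-in for a tree-sitter node with its source text
def node(kind, *children, text=""):
    return {"kind": kind, "children": list(children), "text": text}

# hand-written copy of the parse tree shown near the top of this document
TREE = node("document",
    node("section",
        node("atx_heading",
            node("atx_h1_marker", text="#"),
            node("inline", text="1 heading")),
        node("paragraph",
            node("inline", text="content of first paragraph")),
        node("section",
            node("atx_heading",
                node("atx_h2_marker", text="##"),
                node("inline", text="1.1 heading")),
            node("paragraph",
                node("inline", text="content of nested paragraph")))))

def convert(tree):
    out = []
    depth = 0  # BEGIN { int depth = 0; ... }

    def visit(n):
        nonlocal depth
        kind = n["kind"]
        # "enter" stanzas run before the children are visited
        if kind == "section":
            depth += 1
        elif kind == "atx_heading":
            out.append(f"<h{depth}>")
        elif kind == "inline":
            out.append(n["text"])
        for child in n["children"]:
            visit(child)
        # "leave" stanzas run after the children are visited
        if kind == "section":
            depth -= 1
        elif kind == "atx_heading":
            out.append(f"</h{depth}>\n")

    out.append("<html>\n<body>\n")    # rest of the BEGIN stanza
    visit(tree)
    out.append("</body>\n</html>\n")  # END stanza
    return "".join(out)

html = convert(TREE)
print(html, end="")
```

note that the paragraph text runs straight into the following tag, because none of the stanzas shown so far match "paragraph" nodes; nodes without a matching stanza (such as the heading markers) are simply skipped over.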

the examples directory contains a complete markdown-to-html converter, along with a few other motivating examples.


usage:

the tbsp evaluator is written in rust; use cargo to build and run it:

cargo build --release
./target/release/tbsp --help

tbsp requires three inputs:

  • a tbsp program, referred to as "program file"
  • a language
  • an input file or some input text at stdin

you can run the interpreter like so (this program prints an overview of a rust file):

$ ./target/release/tbsp \
      -f./examples/code-overview/overview.tbsp \
      -l rust \
      src/main.rs
module
   └╴struct Cli
   └╴trait Cli
      └╴fn program
      └╴fn language
      └╴fn file
   └╴fn try_consume_stdin
   └╴fn main

roadmap:

  • interpreter performance
    • introduce a hir with arena allocated blocks, expr
    • bytecode VM?
    • look into embedding high perf VMs, lua etc.
  • pattern matching
    • allow matching on tree-sitter queries
    • support captures
  • language features
    • arrays and loops
    • access node children
    • access node fields
    • repr for ranges
    • comments
    • regexes