+documentation

2024-09-12 22:23:13 +02:00 · 2024-09-12 22:23:13 +02:00 · 0bde536f3e
commit 0bde536f3e
parent ed3920e18f
2 changed files with 281 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,78 @@
 # TBSP
 > Tree-Based Source-processing Language
 ## Notes
 I stole the idea from here:
 [https://github.com/oppiliappan/tbsp](https://github.com/oppiliappan/tbsp)
 Now, there are some obvious problems with this project:
 + it's written in Rust
 + it tries to be a general-purpose language for no reason
 + >"[ ] bytecode VM?"; seriously?
 I have tried contacting the owner, the response is pending.
 I have tried hacking Bison into this behaviour, it's too noisy.
 I firmly believe code generation is the way to go, not just here,
 but for DSLs in general.
 This project will depend heavily on tree-sitter,
 there is no sense in pretending otherwise by decoupling it.
 The current implementation (in Python) is obviously terrible.
 It does work however.
 ## Language semantics
 Modelled half after the original, half after Flex/Bison.
 ```
 <declaration-section>
 %%
 <rule-section>
 %%
 <code-section>
 ```
 ### Declaration section
 ```
 %top { <...> }    // code to be pasted at the top of the source file
 %language <lang>  // tree-sitter language name (for the right includes)
 ```
 ### Rule section
 ```
 enter <node-type> { <...> } // code to run when tree-sitter node-type <node-type> is encountered
 close <node-type> { <...> } // code to run when tree-sitter node-type <node-type> is popped
 ```
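A minimal sketch of what the generated traversal could look like for one `enter`/`close` pair. The `toy_node` struct, `log_event` and `toy_traverse` are hypothetical stand-ins so the example compiles without tree-sitter; the real backend would walk `TSNode`s instead. The `strcmp()` dispatch mirrors the current, unoptimized approach.

```C
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a tree-sitter node; the real backend would use TSNode. */
typedef struct toy_node {
    const char * type;                 /* node-type name, e.g. "section" */
    const struct toy_node ** children; /* NULL-terminated array, or NULL for a leaf */
} toy_node;

/* Appends one event to a bounded log, so the firing order can be inspected. */
static void log_event(char * events, size_t cap, const char * event, const char * type) {
    size_t used = strlen(events);
    if (used < cap) {
        snprintf(events + used, cap - used, "%s %s\n", event, type);
    }
}

/* Depth-first walk: enter rules run before the children, close rules after them. */
static void toy_traverse(const toy_node * node, char * events, size_t cap) {
    if (!strcmp(node->type, "section")) {   /* enter section { <...> } */
        log_event(events, cap, "enter", node->type);
    }
    for (const toy_node ** c = node->children; c && *c; c++) {
        toy_traverse(*c, events, cap);
    }
    if (!strcmp(node->type, "section")) {   /* close section { <...> } */
        log_event(events, cap, "close", node->type);
    }
}
```

Walking a `section` nested inside another `section` logs `enter section`, `enter section`, `close section`, `close section`, in that order.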
 ### Code
 The code section is pasted verbatim at the end of the output file.
 #### Globals
 ```C
 int tbtraverse(const char * const code);    // master function; rules are evaluated here
 ```
 #### In tbtraverse
 ```C
 char * tbtext;   // copy of the current node's text value (not ts_node_string); XXX: this could be optimized considerably
 int tblen;       // string length of tbtext
 // XXX: these should probably be renamed
 TSNode current_node;    // node corresponding to the rule in enter rules
 TSNode previous_node;   // node corresponding to the rule in close rules
 ```
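A rough sketch of how `tbtext` and `tblen` could be filled for each node, assuming `tbtext` is a heap copy of the node's byte range in the source. With tree-sitter the offsets would come from `ts_node_start_byte()` and `ts_node_end_byte()`; here they are plain parameters so the example stands alone. `copy_node_text` is a hypothetical helper, not part of the generated code.

```C
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies code[start, end) into a fresh NUL-terminated buffer.
 * Returns NULL on an invalid range or allocation failure; the caller frees the result. */
static char * copy_node_text(const char * code, size_t start, size_t end, int * len_out) {
    if (end < start) {
        return NULL;
    }
    size_t len = end - start;
    char * text = malloc(len + 1);
    if (!text) {
        return NULL;
    }
    memcpy(text, code + start, len);
    text[len] = '\0';
    *len_out = (int)len;   /* this is what tblen would hold */
    return text;
}
```

One allocation per visited node is the cost the XXX note refers to; reusing a single growable buffer across nodes would be one way to reduce it.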
 ### TODO
 + port "backend" to C (from C++)
 + port from python (can wait)
  - optimize the allocation of tbtext
  - optimize from strcmp()
 ### Thinking area
 ```C
 // This should be allowed to mean 'a' or 'b'
 enter a b { <...> }
 // This should be allowed to mean 'enter' or 'leave'
 enter leave a { <...> }
 // Globbing in node types should probably be allowed; regex, however, sounds like overkill
 ```
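If globbing is allowed, one possible approach (an assumption, not something the project does today) is to hand the pattern to POSIX `fnmatch()` instead of `strcmp()`. `node_type_matches` is a hypothetical name.

```C
#include <fnmatch.h>

/* Hypothetical matcher: returns 1 when node_type matches the glob pattern, 0 otherwise. */
static int node_type_matches(const char * pattern, const char * node_type) {
    /* fnmatch() returns 0 on a match; no special flags are needed for node names */
    return fnmatch(pattern, node_type, 0) == 0;
}
```

With this, a rule written for `atx_h*_marker` would fire for both `atx_h1_marker` and `atx_h2_marker`, but not for `atx_heading`. The trade-off is a dependency on POSIX and a slower comparison than a plain `strcmp()`.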
--- a/documentation/original_readme.md
+++ b/documentation/original_readme.md
@@ -0,0 +1,203 @@
 tbsp - tree-based source-processing language
 tbsp is an awk-like language that operates on tree-sitter
 syntax trees. to motivate the need for such a program, we
 could begin by writing a markdown-to-html converter using
 tbsp and tree-sitter-md [0]. we need some markdown to begin
 with:
    # 1 heading
    content of first paragraph
    ## 1.1 heading
    content of nested paragraph
 for future reference, this markdown is parsed like so by
 tree-sitter-md (visualization generated by tree-viz [1]):
    document
    |  section
    |  |  atx_heading
    |  |  |  atx_h1_marker "#"
    |  |  |  heading_content inline "1 heading"
    |  |  paragraph
    |  |  |  inline "content of first paragraph"
    |  |  section
    |  |  |  atx_heading
    |  |  |  |  atx_h2_marker "##"
    |  |  |  |  heading_content inline "1.1 heading"
    |  |  |  paragraph
    |  |  |  |  inline "content of nested paragraph"
 onto the converter itself. every tbsp program is written as
 a collection of stanzas. typically, we start with a stanza
 like so:
    BEGIN {
        int depth = 0;
        print("<html>\n");
        print("<body>\n");
    }
 the stanza begins with a "pattern", in this case, "BEGIN",
 and is followed by a block of code. this block specifically, is
 executed right at the beginning, before traversing the parse
 tree. in this stanza, we set a "depth" variable to keep
 track of nesting of markdown headers, and begin our html
 document by printing the "<html>" and "<body>" tags.
 we can follow this stanza with an "END" stanza, that is
 executed after the traversal:
    END {
        print("</body>\n");
        print("</html>\n");
    }
 in this stanza, we close off the tags we opened at the start
 of the document. we can move onto the interesting bits of
 the conversion now:
    enter section {
        depth += 1;
    }
    leave section {
        depth -= 1;
    }
 the above stanzas begin with "enter" and "leave" clauses,
 followed by the name of a tree-sitter node kind: "section".
 the "section" identifier is visible in the
 tree-visualization above, it encompasses a markdown-section,
 and is created for every markdown header. to understand how
 tbsp executes the above stanzas:
    document                                 ...  depth = 0 
    |  section <-------- enter section (1)   ...  depth = 1 
    |  |  atx_heading
    |  |  |  inline
    |  |  paragraph
    |  |  |  inline
    |  |  section <----- enter section (2)   ...  depth = 2 
    |  |  |  atx_heading
    |  |  |  | inline
    |  |  |  paragraph
    |  |  |  | inline
    |  |  | <----------- leave section (2)   ...  depth = 1 
    |  | <-------------- leave section (1)   ...  depth = 0 
 the following stanzas should be self-explanatory now:
    enter atx_heading {
        print("<h");
        print(depth);
        print(">");
    }
    leave atx_heading {
        print("</h");
        print(depth);
        print(">\n");
    }
    enter inline {
        print(text(node));
    }
 but an explanation is included nonetheless:
    document                                 ...  depth = 0 
    |  section <-------- enter section (1)   ...  depth = 1 
    |  |  atx_heading <- enter atx_heading   ...  print "<h1>"
    |  |  |  inline <--- enter inline        ...  print ..
    |  |  | <----------- leave atx_heading   ...  print "</h1>"
    |  |  paragraph
    |  |  |  inline <--- enter inline        ...  print ..
    |  |  section <----- enter section (2)   ...  depth = 2 
    |  |  |  atx_heading enter atx_heading   ...  print "<h2>"
    |  |  |  | inline <- enter inline        ...  print ..
    |  |  |  | <-------- leave atx_heading   ...  print "</h2>"
    |  |  |  paragraph
    |  |  |  | inline <- enter inline        ...  print ..
    |  |  | <----------- leave section (2)   ...  depth = 1 
    |  | <-------------- leave section (1)   ...  depth = 0 
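the same walk can be sketched in plain c. this is only an illustration of the semantics traced above, not how the rust evaluator works: `md_node`, `md_state`, `emit` and `md_walk` are hypothetical names, and the tree is built by hand instead of coming from tree-sitter-md.

```C
#include <stdio.h>
#include <string.h>

/* hypothetical stand-in for a tree-sitter node that carries its own text */
typedef struct md_node {
    const char * kind;
    const char * text;                /* only used by "inline" nodes */
    const struct md_node ** children; /* NULL-terminated array, or NULL for a leaf */
} md_node;

/* traversal state: the "depth" variable plus a bounded output buffer */
typedef struct md_state {
    int depth;
    char out[256];
} md_state;

/* appends s to the output buffer, truncating if the buffer is full */
static void emit(md_state * st, const char * s) {
    size_t used = strlen(st->out);
    snprintf(st->out + used, sizeof st->out - used, "%s", s);
}

static void md_walk(const md_node * n, md_state * st) {
    char tag[32];
    /* enter stanzas run before the children are visited */
    if (!strcmp(n->kind, "section")) { st->depth += 1; }
    if (!strcmp(n->kind, "atx_heading")) {
        snprintf(tag, sizeof tag, "<h%d>", st->depth);
        emit(st, tag);
    }
    if (!strcmp(n->kind, "inline")) { emit(st, n->text); }

    for (const md_node ** c = n->children; c && *c; c++) {
        md_walk(*c, st);
    }

    /* leave stanzas run after the last child */
    if (!strcmp(n->kind, "atx_heading")) {
        snprintf(tag, sizeof tag, "</h%d>\n", st->depth);
        emit(st, tag);
    }
    if (!strcmp(n->kind, "section")) { st->depth -= 1; }
}
```

the heading tags pick up whatever "depth" is at the moment the heading is entered, which is why the nested heading becomes an h2 without any rule mentioning nesting explicitly.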
 the examples directory contains a complete markdown-to-html
 converter, along with a few other motivating examples.
 ---
 usage:
 the tbsp evaluator is written in rust, use cargo to build
 and run:
    cargo build --release
    ./target/release/tbsp --help
 tbsp requires three inputs:
 - a tbsp program, referred to as "program file"
 - a language
 - an input file or some input text at stdin
 you can run the interpreter like so (this program prints an
 overview of a rust file):
    $ ./target/release/tbsp \
          -f./examples/code-overview/overview.tbsp \
          -l rust \
          src/main.rs
    module
       └╴struct Cli
       └╴trait Cli
          └╴fn program
          └╴fn language
          └╴fn file
       └╴fn try_consume_stdin
       └╴fn main
 ---
 roadmap:
 - interpreter performance
  - [ ] introduce a hir with arena allocated blocks, expr
  - [ ] bytecode VM?
  - [ ] look into embedding high perf VMs, lua etc.
 - pattern matching
  - [ ] allow matching on tree-sitter queries
  - [ ] support captures
 - language features
  - [ ] arrays and loops
  - [ ] access node children
  - [x] access node fields
  - [ ] repr for ranges
  - [ ] comments
  - [ ] regexes
 [0]: https://github.com/tree-sitter-grammars/tree-sitter-markdown
 [1]: https://git.peppe.rs/cli/tree-viz