+documentation

2024-09-12 22:23:13 +02:00 · 2024-09-12 22:23:13 +02:00 · 0bde536f3e
commit 0bde536f3e
parent ed3920e18f
2 changed files with 281 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,78 @@
+# TBSP
+> Tree-Based Source-processing Language
+
+## Notes
+I stole the idea from here:
+[https://github.com/oppiliappan/tbsp](https://github.com/oppiliappan/tbsp)
+
+Now, there are some obvious problems with that project:
+- it's written in Rust
+- it tries to be a general-purpose language for no reason
+- its roadmap lists "[ ] bytecode VM?"; seriously?
+
+I have tried contacting the owner; a response is pending.
+
+I have tried hacking Bison into this behaviour; it's too noisy.
+
+I firmly believe code generation is the way to go, not just here,
+but for DSLs in general.
+
+This project will depend heavily on tree-sitter;
+there is no sense in pretending otherwise by decoupling from it.
+
+The current implementation (in Python) is obviously terrible.
+It does work, however.
+
+## Language semantics
+Modelled half after the original, half after Flex/Bison.
+```
+<declaration-section>
+%%
+<rule-section>
+%%
+<code-section>
+```
+
+### Declaration section
+```
+%top { <...> }    // code to be pasted at the top of the output file
+%language <lang>  // tree-sitter language name (used to pick the right includes)
+```
+
+### Rule section
+```
+enter <node-type> { <...> } // code to run when a node of tree-sitter type <node-type> is encountered
+close <node-type> { <...> } // code to run when a node of tree-sitter type <node-type> is popped, i.e. after its children
+```
+
+### Code section
+The code section is pasted verbatim at the end of the output file.
+#### Globals
+```C
+int tbtraverse(const char * const code);    // master function; rules are evaluated here
+```
+#### In tbtraverse
+```C
+char * tbtext;   // copy of the current node's text (not ts_node_string()); XXX: this could be optimized a lot
+int tblen;       // string length of tbtext
+// XXX: these should probably be renamed
+TSNode current_node;    // the matched node, inside enter rules
+TSNode previous_node;   // the matched node, inside close rules
+```
+
+### TODO
+- port the "backend" from C++ to C
+- port from Python (can wait)
+  - optimize the allocation of tbtext
+  - optimize away the strcmp() calls
+
+### Thinking area
+```C
+// This should be allowed to mean 'a' or 'b'
+enter a b { <...> }
+
+// This should be allowed to mean 'enter' or 'close'
+enter close a { <...> }
+
+// Globbing in node types should probably be allowed; regex, however, sounds like overkill
+```
--- a/documentation/original_readme.md
+++ b/documentation/original_readme.md
@@ -0,0 +1,203 @@
+tbsp - tree-based source-processing language
+
+
+tbsp is an awk-like language that operates on tree-sitter
+syntax trees. to motivate the need for such a program, we
+could begin by writing a markdown-to-html converter using
+tbsp and tree-sitter-md [0]. we need some markdown to begin
+with:
+
+
+    # 1 heading
+
+    content of first paragraph
+
+    ## 1.1 heading
+
+    content of nested paragraph
+
+
+for future reference, this markdown is parsed like so by
+tree-sitter-md (visualization generated by tree-viz [1]):
+
+
+    document
+    |  section
+    |  |  atx_heading
+    |  |  |  atx_h1_marker "#"
+    |  |  |  heading_content inline "1 heading"
+    |  |  paragraph
+    |  |  |  inline "content of first paragraph"
+    |  |  section
+    |  |  |  atx_heading
+    |  |  |  |  atx_h2_marker "##"
+    |  |  |  |  heading_content inline "1.1 heading"
+    |  |  |  paragraph
+    |  |  |  |  inline "content of nested paragraph"
+
+
+onto the converter itself. every tbsp program is written as
+a collection of stanzas. typically, we start with a stanza
+like so:
+
+
+    BEGIN {
+        int depth = 0;
+
+        print("<html>\n");
+        print("<body>\n");
+    }
+
+
+the stanza begins with a "pattern", in this case, "BEGIN",
+and is followed by a block of code. this block, specifically, is
+executed right at the beginning, before traversing the parse
+tree. in this stanza, we set a "depth" variable to keep
+track of nesting of markdown headers, and begin our html
+document by printing the "<html>" and "<body>" tags.
+
+we can follow this stanza with an "END" stanza, that is
+executed after the traversal:
+
+
+    END {
+        print("</body>\n");
+        print("</html>\n");
+    }
+
+
+in this stanza, we close off the tags we opened at the start
+of the document. we can move onto the interesting bits of
+the conversion now:
+
+
+    enter section {
+        depth += 1;
+    }
+    leave section {
+        depth -= 1;
+    }
+
+
+the above stanzas begin with "enter" and "leave" clauses,
+followed by the name of a tree-sitter node kind: "section".
+the "section" identifier is visible in the
+tree-visualization above; it encompasses a markdown-section,
+and is created for every markdown header. to understand how
+tbsp executes the above stanzas:
+
+
+    document                                 ...  depth = 0 
+    |  section <-------- enter section (1)   ...  depth = 1 
+    |  |  atx_heading
+    |  |  |  inline
+    |  |  paragraph
+    |  |  |  inline
+    |  |  section <----- enter section (2)   ...  depth = 2 
+    |  |  |  atx_heading
+    |  |  |  | inline
+    |  |  |  paragraph
+    |  |  |  | inline
+    |  |  | <----------- leave section (2)   ...  depth = 1 
+    |  | <-------------- leave section (1)   ...  depth = 0 
+
+
+the following stanzas should be self-explanatory now:
+
+
+    enter atx_heading {
+        print("<h");
+        print(depth);
+        print(">");
+    }
+    leave atx_heading {
+        print("</h");
+        print(depth);
+        print(">\n");
+    }
+
+    enter inline {
+        print(text(node));
+    }
+
+
+but an explanation is included nonetheless:
+
+
+    document                                 ...  depth = 0 
+    |  section <-------- enter section (1)   ...  depth = 1 
+    |  |  atx_heading <- enter atx_heading   ...  print "<h1>"
+    |  |  |  inline <--- enter inline        ...  print ..
+    |  |  | <----------- leave atx_heading   ...  print "</h1>"
+    |  |  paragraph
+    |  |  |  inline <--- enter inline        ...  print ..
+    |  |  section <----- enter section (2)   ...  depth = 2 
+    |  |  |  atx_heading enter atx_heading   ...  print "<h2>"
+    |  |  |  | inline <- enter inline        ...  print ..
+    |  |  |  | <-------- leave atx_heading   ...  print "</h2>"
+    |  |  |  paragraph
+    |  |  |  | inline <- enter inline        ...  print ..
+    |  |  | <----------- leave section (2)   ...  depth = 1 
+    |  | <-------------- leave section (1)   ...  depth = 0 
+
+
+the examples directory contains a complete markdown-to-html
+converter, along with a few other motivating examples.
+
+---
+
+usage:
+
+the tbsp evaluator is written in rust; use cargo to build
+and run:
+
+    cargo build --release
+    ./target/release/tbsp --help
+
+
+tbsp requires three inputs:
+
+- a tbsp program, referred to as "program file"
+- a language
+- an input file or some input text at stdin
+
+
+you can run the interpreter like so (this program prints an
+overview of a rust file):
+
+    $ ./target/release/tbsp \
+          -f./examples/code-overview/overview.tbsp \
+          -l rust \
+          src/main.rs
+    module
+       └╴struct Cli
+       └╴trait Cli
+          └╴fn program
+          └╴fn language
+          └╴fn file
+       └╴fn try_consume_stdin
+       └╴fn main
+
+    
+---
+
+roadmap:
+
+- interpreter performance
+  - [ ] introduce a hir with arena allocated blocks, expr
+  - [ ] bytecode VM?
+  - [ ] look into embedding high perf VMs, lua etc.
+- pattern matching
+  - [ ] allow matching on tree-sitter queries
+  - [ ] support captures
+- language features
+  - [ ] arrays and loops
+  - [ ] access node children
+  - [x] access node fields
+  - [ ] repr for ranges
+  - [ ] comments
+  - [ ] regexes
+
+
+[0]: https://github.com/tree-sitter-grammars/tree-sitter-markdown
+[1]: https://git.peppe.rs/cli/tree-viz