diff --git a/README.md b/README.md
new file mode 100644
index 0000000..33a662a
--- /dev/null
+++ b/README.md
@@ -0,0 +1,78 @@
+# TBSP
+> Tree-Based Source-processing Language
+
+## Notes
+I stole the idea from here:
+[https://github.com/oppiliappan/tbsp](https://github.com/oppiliappan/tbsp)
+
+Now, there are some obvious problems with this project:
++ it's written in Rust
++ it tries to be a general-purpose language for no reason
++ >"[ ] bytecode VM?"; seriously?
+
+I have tried contacting the owner; the response is pending.
+
+I have tried hacking Bison into this behaviour; it's too noisy.
+
+I firmly believe code generation is the way to go, not just here,
+but for DSLs in general.
+
+This project will heavily depend on tree-sitter,
+there is no sense pretending otherwise with decoupling.
+
+The current implementation (in Python) is obviously terrible.
+It does work, however.
+
+## Language semantics
+Modelled half after the original, half after Flex/Bison.
+```
+<declaration-section>
+%%
+<rule-section>
+%%
+<code-section>
+```
+
+### Declaration section
+```
+%top { <...> }    // code to be pasted at the top of the source file
+%language <lang>  // tree-sitter language name (for the right includes)
+```
+
+### Rule section
+```
+enter <node-type> { <...> } // code to run when tree-sitter node-type <node-type> is encountered
+close <node-type> { <...> } // code to run when tree-sitter node-type <node-type> is popped
+```
+
+### Code
+The code section is pasted verbatim at the end of the output file.
+#### Globals
+```C
+int tbtraverse(const char * const code); // master function; rules are evaluated here
+```
+#### In tbtraverse
+```C
+char * tbtext; // copy of the current node's text value (not ts_node_string); XXX: this could be optimized a lot
+int tblen;     // string length of tbtext
+// XXX: these should probably be renamed
+TSNode current_node;  // node corresponding to the rule in enter rules
+TSNode previous_node; // node corresponding to the rule in close rules
+```
+
+### TODO
++ port "backend" to C (from C++)
++ port from python (can wait)
+    - optimize the allocation of tbtext
+    - optimize from strcmp()
+
+### Thinking area
+```C
+// This should be allowed to mean 'a' or 'b'
+enter a b { <...> }
+
+// This should be allowed to mean 'enter' or 'leave'
+enter leave a { <...> }
+
+// Globbing in node types should probably be allowed; regex, however, sounds like overkill
+```
diff --git a/documentation/original_readme.md b/documentation/original_readme.md
new file mode 100644
index 0000000..cb64b91
--- /dev/null
+++ b/documentation/original_readme.md
@@ -0,0 +1,203 @@
+tbsp - tree-based source-processing language
+
+
+tbsp is an awk-like language that operates on tree-sitter
+syntax trees. to motivate the need for such a program, we
+could begin by writing a markdown-to-html converter using
+tbsp and tree-sitter-md [0]. we need some markdown to begin
+with:
+
+
+    # 1 heading
+
+    content of first paragraph
+
+    ## 1.1 heading
+
+    content of nested paragraph
+
+
+for future reference, this markdown is parsed like so by
+tree-sitter-md (visualization generated by tree-viz [1]):
+
+
+    document
+    |  section
+    |  |  atx_heading
+    |  |  |  atx_h1_marker "#"
+    |  |  |  heading_content inline "1 heading"
+    |  |  paragraph
+    |  |  |  inline "content of first paragraph"
+    |  |  section
+    |  |  |  atx_heading
+    |  |  |  |  atx_h2_marker "##"
+    |  |  |  |  heading_content inline "1.1 heading"
+    |  |  |  paragraph
+    |  |  |  |  inline "content of nested paragraph"
+
+
+onto the converter itself. every tbsp program is written as
+a collection of stanzas. typically, we start with a stanza
+like so:
+
+
+    BEGIN {
+        int depth = 0;
+
+        print("<html>\n");
+        print("<body>\n");
+    }
+
+
+the stanza begins with a "pattern", in this case, "BEGIN",
+and is followed by a block of code. this block specifically, is
+executed right at the beginning, before traversing the parse
+tree. in this stanza, we set a "depth" variable to keep
+track of nesting of markdown headers, and begin our html
+document by printing the "<html>" and "<body>" tags.
+
+we can follow this stanza with an "END" stanza, that is
+executed after the traversal:
+
+
+    END {
+        print("</body>\n");
+        print("</html>\n");
+    }
+
+
+in this stanza, we close off the tags we opened at the start
+of the document. we can move onto the interesting bits of
+the conversion now:
+
+
+    enter section {
+        depth += 1;
+    }
+    leave section {
+        depth -= 1;
+    }
+
+
+the above stanzas begin with "enter" and "leave" clauses,
+followed by the name of a tree-sitter node kind: "section".
+the "section" identifier is visible in the
+tree-visualization above, it encompasses a markdown-section,
+and is created for every markdown header. to understand how
+tbsp executes above stanzas:
+
+
+    document                                 ... depth = 0
+    |  section <-------- enter section (1)   ... depth = 1
+    |  |  atx_heading
+    |  |  |  inline
+    |  |  paragraph
+    |  |  |  inline
+    |  |  section <----- enter section (2)   ... depth = 2
+    |  |  |  atx_heading
+    |  |  |  |  inline
+    |  |  |  paragraph
+    |  |  |  |  inline
+    |  |  | <----------- leave section (2)   ... depth = 1
+    |  | <-------------- leave section (1)   ... depth = 0
+
+
+the following stanzas should be self-explanatory now:
+
+
+    enter atx_heading {
+        print("<h");
+        print(depth);
+        print(">");
+    }
+    leave atx_heading {
+        print("</h");
+        print(depth);
+        print(">\n");
+    }
+
+    enter inline {
+        print(text(node));
+    }
+
+
+but an explanation is included nonetheless:
+
+
+    document                                 ... depth = 0
+    |  section <-------- enter section (1)   ... depth = 1
+    |  |  atx_heading <- enter atx_heading   ... print "<h1>"
+    |  |  |  inline <--- enter inline        ... print ..
+    |  |  | <----------- leave atx_heading   ... print "</h1>"
+    |  |  paragraph
+    |  |  |  inline <--- enter inline        ... print ..
+    |  |  section <----- enter section (2)   ... depth = 2
+    |  |  |  atx_heading enter atx_heading   ... print "<h2>"
+    |  |  |  |  inline <- enter inline       ... print ..
+    |  |  |  | <-------- leave atx_heading   ... print "</h2>"
+    |  |  |  paragraph
+    |  |  |  |  inline <- enter inline       ... print ..
+    |  |  | <----------- leave section (2)   ... depth = 1
+    |  | <-------------- leave section (1)   ... depth = 0
+
+
+the examples directory contains a complete markdown-to-html
+converter, along with a few other motivating examples.
+
+---
+
+usage:
+
+the tbsp evaluator is written in rust, use cargo to build
+and run:
+
+    cargo build --release
+    ./target/release/tbsp --help
+
+
+tbsp requires three inputs:
+
+- a tbsp program, referred to as "program file"
+- a language
+- an input file or some input text at stdin
+
+
+you can run the interpreter like so (this program prints an
+overview of a rust file):
+
+    $ ./target/release/tbsp \
+          -f./examples/code-overview/overview.tbsp \
+          -l rust \
+          src/main.rs
+    module
+    └╴struct Cli
+    └╴trait Cli
+    └╴fn program
+    └╴fn language
+    └╴fn file
+    └╴fn try_consume_stdin
+    └╴fn main
+
+
+---
+
+roadmap:
+
+- interpreter performance
+    - [ ] introduce a hir with arena allocated blocks, expr
+    - [ ] bytecode VM?
+    - [ ] look into embedding high perf VMs, lua etc.
+- pattern matching
+    - [ ] allow matching on tree-sitter queries
+    - [ ] support captures
+- language features
+    - [ ] arrays and loops
+    - [ ] access node children
+    - [x] access node fields
+    - [ ] repr for ranges
+    - [ ] comments
+    - [ ] regexes
+
+
+[0]: https://github.com/tree-sitter-grammars/tree-sitter-markdown
+[1]: https://git.peppe.rs/cli/tree-viz