+documentation

parent ed3920e18f
commit 0bde536f3e

README.md (normal file, 78 lines)
@@ -0,0 +1,78 @@
# TBSP
> Tree-Based Source-processing Language

## Notes
I stole the idea from here:
[https://github.com/oppiliappan/tbsp](https://github.com/oppiliappan/tbsp)

Now, there are some obvious problems with that project:
+ it's written in Rust
+ it tries to be a general-purpose language for no reason
+ >"[ ] bytecode VM?"; seriously?

I have tried contacting the owner; the response is pending.

I have tried hacking Bison into this behaviour; it's too noisy.

I firmly believe code generation is the way to go, not just here,
but for DSLs in general.

This project will heavily depend on tree-sitter;
there is no sense pretending otherwise with decoupling.

The current implementation (in Python) is obviously terrible.
It does work, however.

## Language semantics
Modelled half after the original, half after Flex/Bison.
```
<declaration-section>
%%
<rule-section>
%%
<code-section>
```

### Declaration section
```
%top { <...> }    // code to be pasted at the top of the source file
%language <lang>  // tree-sitter language name (for the right includes)
```

### Rule section
```
enter <node-type> { <...> } // code to run when a tree-sitter node of type <node-type> is encountered
close <node-type> { <...> } // code to run when a tree-sitter node of type <node-type> is popped
```

### Code
The code section is pasted verbatim at the end of the output file.
#### Globals
```C
int tbtraverse(const char * const code);    // master function; rules are evaluated here
```
#### In tbtraverse
```C
char * tbtext;    // copy of the current node's text value (not ts_node_string); XXX: this could be optimized considerably
int    tblen;     // string length of tbtext
// XXX: these should probably be renamed
TSNode current_node;     // node corresponding to the rule in enter rules
TSNode previous_node;    // node corresponding to the rule in close rules
```
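
A minimal sketch of the kind of copy that `tbtext` implies, assuming the text is taken from the node's byte range in the source (tree-sitter exposes the offsets as `ts_node_start_byte` and `ts_node_end_byte`). `copy_node_text` is a hypothetical helper for illustration, not the generated code; it allocates on every call, which is exactly the cost the XXX note complains about.

```C
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy bytes [start, end) of `code` into a fresh
 * NUL-terminated string. The caller frees the result.
 * Returns NULL if the allocation fails. */
char *copy_node_text(const char *code, size_t start, size_t end, int *len_out) {
    assert(code != NULL && start <= end);

    size_t len = end - start;
    char *text = malloc(len + 1);
    if (text == NULL)
        return NULL;

    memcpy(text, code + start, len);
    text[len] = '\0';

    if (len_out != NULL)
        *len_out = (int)len;   /* would play the role of tblen */
    return text;
}
```

Reusing one growable buffer across nodes, or handing out a pointer and length into `code` instead of a copy, would be two obvious ways to avoid the per-node allocation.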

### TODO
+ port "backend" to C (from C++)
+ port from Python (can wait)
- optimize the allocation of tbtext
- optimize away from strcmp()

### Thinking area
```C
// This should be allowed to mean 'a' or 'b'
enter a b { <...> }

// This should be allowed to mean 'enter' or 'leave'
enter leave a { <...> }

// Globbing in node types should probably be allowed; regex, however, sounds like overkill
```
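
If globbing and multiple node types per rule were allowed, the matching itself could be as small as the sketch below. It assumes a POSIX system and uses `fnmatch` for shell-style patterns; `rule_matches` is a hypothetical name, and nothing like it exists in the current implementation.

```C
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if node_type matches any of the n
 * shell-style patterns attached to a rule ("enter a b { ... }"). */
bool rule_matches(const char *node_type, const char *const *patterns, size_t n) {
    assert(node_type != NULL);

    for (size_t i = 0; i < n; i++) {
        /* fnmatch() returns 0 when the string matches the pattern */
        if (fnmatch(patterns[i], node_type, 0) == 0)
            return true;
    }
    return false;
}
```

With the patterns `atx_h?_marker` and `paragraph`, this accepts `atx_h1_marker` and `paragraph` but rejects `section`. Since the rules are known at generation time, the patterns could also be expanded against the grammar's node types once, so no matching would be needed during traversal.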

documentation/original_readme.md (normal file, 203 lines)
@@ -0,0 +1,203 @@
tbsp - tree-based source-processing language

tbsp is an awk-like language that operates on tree-sitter
syntax trees. to motivate the need for such a program, we
could begin by writing a markdown-to-html converter using
tbsp and tree-sitter-md [0]. we need some markdown to begin
with:

    # 1 heading

    content of first paragraph

    ## 1.1 heading

    content of nested paragraph

for future reference, this markdown is parsed like so by
tree-sitter-md (visualization generated by tree-viz [1]):

    document
    |  section
    |  |  atx_heading
    |  |  |  atx_h1_marker "#"
    |  |  |  heading_content inline "1 heading"
    |  |  paragraph
    |  |  |  inline "content of first paragraph"
    |  |  section
    |  |  |  atx_heading
    |  |  |  |  atx_h2_marker "##"
    |  |  |  |  heading_content inline "1.1 heading"
    |  |  |  paragraph
    |  |  |  |  inline "content of nested paragraph"

onto the converter itself. every tbsp program is written as
a collection of stanzas. typically, we start with a stanza
like so:

    BEGIN {
        int depth = 0;

        print("<html>\n");
        print("<body>\n");
    }

the stanza begins with a "pattern", in this case, "BEGIN",
and is followed by a block of code. this block, specifically,
is executed right at the beginning, before traversing the parse
tree. in this stanza, we set a "depth" variable to keep
track of nesting of markdown headers, and begin our html
document by printing the "<html>" and "<body>" tags.

we can follow this stanza with an "END" stanza, that is
executed after the traversal:

    END {
        print("</body>\n");
        print("</html>\n");
    }

in this stanza, we close off the tags we opened at the start
of the document. we can move onto the interesting bits of
the conversion now:

    enter section {
        depth += 1;
    }
    leave section {
        depth -= 1;
    }

the above stanzas begin with "enter" and "leave" clauses,
followed by the name of a tree-sitter node kind: "section".
the "section" identifier is visible in the
tree-visualization above; it encompasses a markdown-section,
and is created for every markdown header. to understand how
tbsp executes the above stanzas:

    document                                 ...  depth = 0
    |  section <-------- enter section (1)   ...  depth = 1
    |  |  atx_heading
    |  |  |  inline
    |  |  paragraph
    |  |  |  inline
    |  |  section <----- enter section (2)   ...  depth = 2
    |  |  |  atx_heading
    |  |  |  |  inline
    |  |  |  paragraph
    |  |  |  |  inline
    |  |  | <----------- leave section (2)   ...  depth = 1
    |  | <-------------- leave section (1)   ...  depth = 0
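
the traversal order in the diagram can be mimicked in a few
lines of c. this is only a sketch under stated assumptions,
not how tbsp is implemented: "toy_node", "walk_state", "walk"
and "walk_example" are made-up names, and the toy tree stands
in for a real tree-sitter tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical stand-in for a tree-sitter node: a kind plus children */
struct toy_node {
    const char *kind;
    const struct toy_node *const *children; /* NULL-terminated, or NULL */
};

/* state shared by the stanzas */
struct walk_state {
    int depth;     /* the "depth" variable from the BEGIN stanza */
    int max_depth; /* deepest nesting seen, kept only for checking */
};

/* depth-first walk: "enter" fires before the children, "leave" after */
static void walk(const struct toy_node *n, struct walk_state *s) {
    int is_section = strcmp(n->kind, "section") == 0;

    if (is_section) {             /* enter section { depth += 1; } */
        s->depth += 1;
        if (s->depth > s->max_depth)
            s->max_depth = s->depth;
    }

    if (n->children != NULL)
        for (size_t i = 0; n->children[i] != NULL; i++)
            walk(n->children[i], s);

    if (is_section) {             /* leave section { depth -= 1; } */
        s->depth -= 1;
        assert(s->depth >= 0);
    }
}

/* builds the example document from the diagram and walks it */
struct walk_state walk_example(void) {
    static const struct toy_node inl = { "inline", NULL };
    static const struct toy_node *const leaf[] = { &inl, NULL };
    static const struct toy_node heading = { "atx_heading", leaf };
    static const struct toy_node paragraph = { "paragraph", leaf };
    static const struct toy_node *const inner_kids[] = { &heading, &paragraph, NULL };
    static const struct toy_node inner = { "section", inner_kids };
    static const struct toy_node *const outer_kids[] = { &heading, &paragraph, &inner, NULL };
    static const struct toy_node outer = { "section", outer_kids };
    static const struct toy_node *const doc_kids[] = { &outer, NULL };
    static const struct toy_node doc = { "document", doc_kids };

    struct walk_state s = { 0, 0 };
    walk(&doc, &s);
    return s;
}
```

calling "walk_example" leaves "depth" back at 0 and records a
maximum of 2, matching the two nested sections in the diagram.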

the following stanzas should be self-explanatory now:

    enter atx_heading {
        print("<h");
        print(depth);
        print(">");
    }
    leave atx_heading {
        print("</h");
        print(depth);
        print(">\n");
    }

    enter inline {
        print(text(node));
    }

but an explanation is included nonetheless:

    document                                 ...  depth = 0
    |  section <-------- enter section (1)   ...  depth = 1
    |  |  atx_heading <- enter atx_heading   ...  print "<h1>"
    |  |  |  inline <--- enter inline        ...  print ..
    |  |  | <----------- leave atx_heading   ...  print "</h1>"
    |  |  paragraph
    |  |  |  inline <--- enter inline        ...  print ..
    |  |  section <----- enter section (2)   ...  depth = 2
    |  |  |  atx_heading enter atx_heading   ...  print "<h2>"
    |  |  |  |  inline <- enter inline       ...  print ..
    |  |  |  | <-------- leave atx_heading   ...  print "</h2>"
    |  |  |  paragraph
    |  |  |  |  inline <- enter inline       ...  print ..
    |  |  | <----------- leave section (2)   ...  depth = 1
    |  | <-------------- leave section (1)   ...  depth = 0
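
to see what the stanzas produce for the sample markdown, here
is a rough c sketch that applies the same enter/leave logic to
a hand-built tree and collects the output in a buffer. all
names ("md_node", "html_out", "emit", "convert",
"convert_example") are hypothetical, the marker nodes are left
out because no stanza matches them, and none of this reflects
the actual tbsp evaluator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* hypothetical stand-in for a tree-sitter node */
struct md_node {
    const char *kind;
    const char *text;                      /* set for "inline" only */
    const struct md_node *const *children; /* NULL-terminated, or NULL */
};

struct html_out {
    char buf[256];
    size_t used;
    int depth;
};

/* append s to the output; the sketch only guards against overflow */
static void emit(struct html_out *o, const char *s) {
    size_t n = strlen(s);
    assert(o->used + n < sizeof o->buf);
    memcpy(o->buf + o->used, s, n + 1);
    o->used += n;
}

static void convert(const struct md_node *n, struct html_out *o) {
    char tag[16];

    /* "enter" stanzas run before the children */
    if (strcmp(n->kind, "section") == 0)
        o->depth += 1;
    if (strcmp(n->kind, "atx_heading") == 0) {
        snprintf(tag, sizeof tag, "<h%d>", o->depth);
        emit(o, tag);
    }
    if (strcmp(n->kind, "inline") == 0)
        emit(o, n->text);

    if (n->children != NULL)
        for (size_t i = 0; n->children[i] != NULL; i++)
            convert(n->children[i], o);

    /* "leave" stanzas run after the children */
    if (strcmp(n->kind, "atx_heading") == 0) {
        snprintf(tag, sizeof tag, "</h%d>\n", o->depth);
        emit(o, tag);
    }
    if (strcmp(n->kind, "section") == 0)
        o->depth -= 1;
}

/* converts the sample document; the result lives in o->buf */
const char *convert_example(struct html_out *o) {
    static const struct md_node t1 = { "inline", "1 heading", NULL };
    static const struct md_node t2 = { "inline", "content of first paragraph", NULL };
    static const struct md_node t3 = { "inline", "1.1 heading", NULL };
    static const struct md_node t4 = { "inline", "content of nested paragraph", NULL };
    static const struct md_node *const k1[] = { &t1, NULL };
    static const struct md_node *const k2[] = { &t2, NULL };
    static const struct md_node *const k3[] = { &t3, NULL };
    static const struct md_node *const k4[] = { &t4, NULL };
    static const struct md_node h1 = { "atx_heading", NULL, k1 };
    static const struct md_node p1 = { "paragraph", NULL, k2 };
    static const struct md_node h2 = { "atx_heading", NULL, k3 };
    static const struct md_node p2 = { "paragraph", NULL, k4 };
    static const struct md_node *const inner_kids[] = { &h2, &p2, NULL };
    static const struct md_node inner = { "section", NULL, inner_kids };
    static const struct md_node *const outer_kids[] = { &h1, &p1, &inner, NULL };
    static const struct md_node outer = { "section", NULL, outer_kids };
    static const struct md_node *const doc_kids[] = { &outer, NULL };
    static const struct md_node doc = { "document", NULL, doc_kids };

    o->used = 0;
    o->depth = 0;
    o->buf[0] = '\0';
    emit(o, "<html>\n<body>\n");   /* BEGIN stanza */
    convert(&doc, o);
    emit(o, "</body>\n</html>\n"); /* END stanza */
    return o->buf;
}
```

note that paragraph text comes out bare, with no tags and no
trailing newline, because the stanzas shown so far only wrap
headings.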

the examples directory contains a complete markdown-to-html
converter, along with a few other motivating examples.

---

usage:

the tbsp evaluator is written in rust, use cargo to build
and run:

    cargo build --release
    ./target/release/tbsp --help

tbsp requires three inputs:

- a tbsp program, referred to as "program file"
- a language
- an input file or some input text at stdin

you can run the interpreter like so (this program prints an
overview of a rust file):

    $ ./target/release/tbsp \
          -f./examples/code-overview/overview.tbsp \
          -l rust \
          src/main.rs
    module
    └╴struct Cli
    └╴trait Cli
    └╴fn program
    └╴fn language
    └╴fn file
    └╴fn try_consume_stdin
    └╴fn main

---

roadmap:

- interpreter performance
    - [ ] introduce a hir with arena allocated blocks, expr
    - [ ] bytecode VM?
    - [ ] look into embedding high perf VMs, lua etc.
- pattern matching
    - [ ] allow matching on tree-sitter queries
    - [ ] support captures
- language features
    - [ ] arrays and loops
    - [ ] access node children
    - [x] access node fields
    - [ ] repr for ranges
    - [ ] comments
    - [ ] regexes

[0]: https://github.com/tree-sitter-grammars/tree-sitter-markdown
[1]: https://git.peppe.rs/cli/tree-viz