mirror of
https://github.com/rust-lang/rust-analyzer
synced 2025-01-12 05:08:52 +00:00
Some architecture notes
This commit is contained in:
parent
4c10c31be3
commit
5e21ae9418
1 changed files with 96 additions and 1 deletions
|
@ -1 +1,96 @@
|
|||
# Design and open questions about libsyntax.
|
||||
# Design and open questions about libsyntax
|
||||
|
||||
|
||||
The high-level description of the architecture is in RFC.md. You might
|
||||
also want to dig through https://github.com/matklad/fall/ which
|
||||
contains some pretty interesting stuff build using similar ideas
|
||||
(warning: it is completely undocumented, poorly written and in general
|
||||
not the thing which I recommend to study (yes, this is
|
||||
self-contradictory)).
|
||||
|
||||
## Tree
|
||||
|
||||
The centerpiece of this whole endeavor is the syntax tree, in the
|
||||
`tree` module. Open questions:
|
||||
|
||||
- how to best represent errors, to take advantage of the fact that
|
||||
they are rare, but to enable fully-persistent style structure
|
||||
sharing between tree nodes?
|
||||
|
||||
- should we make red/green split from Roslyn more pronounced?
|
||||
|
||||
- one can layout nodes in a single array in such a way that children
|
||||
of the node form a continuous slice. Seems nifty, but do we need it?
|
||||
|
||||
- should we use SoA or AoS for NodeData?
|
||||
|
||||
- should we split leaf nodes and internal nodes into separate arrays?
|
||||
Can we use it to save some bits here and there? (leaves don't need
|
||||
first_child field, for example).
|
||||
|
||||
|
||||
## Parser
|
||||
|
||||
The syntax tree is produced using a three-staged process.
|
||||
|
||||
First, a raw text is split into tokens with a lexer. Lexer has a
|
||||
peculiar signature: it is an `Fn(&str) -> Token`, where token is a
|
||||
pair of `SyntaxKind` (you should have read the `tree` module and RFC
|
||||
by this time! :)) and a len. That is, lexer chomps only the first
|
||||
token of the input. This forces the lexer to be stateless, and makes
|
||||
it possible to implement incremental relexing easily.
|
||||
|
||||
Then, the bulk of work, the parser turns a stream of tokens into
|
||||
stream of events. Not that parser **does not** construct a tree right
|
||||
away. This is done for several reasons:
|
||||
|
||||
* to decouple the actual tree data structure from the parser: you can
|
||||
build any datastructre you want from the stream of events
|
||||
|
||||
* to make parsing fast: you can produce a list of events without
|
||||
allocations
|
||||
|
||||
* to make it easy to tweak tree structure. Consider this code:
|
||||
|
||||
```
|
||||
#[cfg(test)]
|
||||
pub fn foo() {}
|
||||
```
|
||||
|
||||
Here, the attribute and the `pub` keyword must be the children of
|
||||
the `fn` node. However, when parsing them, we don't yet know if
|
||||
there would be a function ahead: it very well might be a `struct`
|
||||
there. If we use events, we generally don't care about this *in
|
||||
parser* and just spit them in order.
|
||||
|
||||
* (Is this true?) to make incremental reparsing easier: you can reuse
|
||||
the same rope data structure for all of the original string, the
|
||||
tokens and the events.
|
||||
|
||||
|
||||
The parser also does not know about whitespace tokens: it's the job of
|
||||
the next layer to assign whitespace and comments to nodes. However,
|
||||
parser can remap contextual tokens, like `>>` or `union`, so it has
|
||||
access to the text.
|
||||
|
||||
And at last, the TreeBuilder converts a flat stream of events into a
|
||||
tree structure. It also *should* be responsible for attaching comments
|
||||
and rebalancing the tree, but it does not do this yet :)
|
||||
|
||||
|
||||
## Error reporing
|
||||
|
||||
TODO: describe how stuff like `skip_to_first` works
|
||||
|
||||
|
||||
## Validator
|
||||
|
||||
Parser and lexer accept a lot of *invalid* code intentionally. The
|
||||
idea is to post-process the tree and to proper error reporting,
|
||||
literal conversion and quick-fix suggestions. There is no
|
||||
design/implementation for this yet.
|
||||
|
||||
|
||||
## AST
|
||||
|
||||
Nothing yet, see `AstNode` in `fall`.
|
||||
|
|
Loading…
Reference in a new issue