mirror of https://github.com/rust-lang/rust-analyzer
synced 2024-12-25 20:43:21 +00:00

README

This commit is contained in:
parent 6cade3f6d8
commit 2812015d40

5 changed files with 102 additions and 672 deletions

README.md (107 changed lines)

@@ -5,13 +5,110 @@

[![Build status](https://ci.appveyor.com/api/projects/status/j56x1hbje8rdg6xk/branch/master?svg=true)](https://ci.appveyor.com/project/matklad/libsyntax2/branch/master)
Removed:

> libsyntax2.0 is an **experimental** parser of the Rust language, intended for use in IDEs. [RFC](https://github.com/rust-lang/rfcs/pull/2256).
>
> See the [`docs`](./docs) folder to learn how libsyntax2 works, and check [`CONTRIBUTING.md`](./CONTRIBUTING.md) if you want to contribute!
>
> **WARNING**: everything is in a bit of a flux recently, the docs are obsolete; see the recent work on red/green trees.

Added:

libsyntax2.0 is an **experimental** implementation of the corresponding [RFC](https://github.com/rust-lang/rfcs/pull/2256).

## Quick Start

```
$ cargo test
$ cargo parse < crates/libsyntax2/src/lib.rs
```
## Trying It Out

This installs the experimental VS Code plugin:

```
$ cargo install-code
```

It's better to remove existing Rust plugins to avoid interference. Warning: the plugin is not intended for general use and has a lot of rough edges and missing features (notably, no code completion). That said, while libsyntax2 was originally developed in IntelliJ, @matklad now uses this plugin (and thus libsyntax2) to develop libsyntax2, and it doesn't hurt too much :-)

### Features:

* syntax highlighting (LSP does not have an API for it, so the impl is hacky and sometimes falls back to the horrible built-in highlighting)
* commands (`ctrl+shift+p` or keybindings)
  - **Show Rust Syntax Tree** (use it to verify that the plugin works)
  - **Rust Extend Selection** (works with multiple cursors)
  - **Rust Matching Brace** (knows the difference between `<` the comparison operator and `<` the angle bracket)
  - **Rust Parent Module**
  - **Rust Join Lines** (deals with trailing commas)
* **Go to symbol in file**
* **Go to symbol in workspace** (no support for Cargo deps yet)
* code actions:
  - Flip `,` in comma-separated lists
  - Add `#[derive]` to struct/enum
  - Add `impl` block to struct/enum
  - Run tests at caret
* **Go to definition** ("correct" for `mod foo;` decls, index-based for functions).
## Code Walk-Through

### `crates/libsyntax2`

- `yellow`, the red/green syntax tree, heavily inspired [by this](https://github.com/apple/swift/tree/ab68f0d4cbf99cdfa672f8ffe18e433fddc8b371/lib/Syntax)
- `grammar`, the actual parser
- `parser_api/parser_impl` bridges the tree-agnostic parser from `grammar` with `yellow` trees
- `grammar.ron`, the RON description of the grammar, which is used to generate the `syntax_kinds` and `ast` modules
- `algo`: generic tree algorithms, including `walk` for O(1) stack space tree traversal (this is cool) and `visit` for type-driven visiting of the nodes (this is double plus cool; if you understand how `Visitor` works, you understand libsyntax2). A sketch of the `walk` idea follows this list.
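The `walk` traversal relies on the parent/sibling links already stored in the nodes instead of on recursion. Below is a minimal sketch of that idea with made-up `Node`/`NodeId` types; it is not the actual `algo::walk` API:

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
struct NodeId(usize);

struct Node {
    first_child: Option<NodeId>,
    next_sibling: Option<NodeId>,
    parent: Option<NodeId>,
}

/// Preorder traversal in O(1) additional space: go down via `first_child`,
/// across via `next_sibling`, and climb back up via `parent` when a subtree
/// is exhausted. No auxiliary stack is needed, regardless of tree depth.
fn walk(nodes: &[Node], root: NodeId, mut visit: impl FnMut(NodeId)) {
    let mut current = Some(root);
    while let Some(id) = current {
        visit(id);
        current = if let Some(child) = nodes[id.0].first_child {
            // Go down first.
            Some(child)
        } else {
            // No children: move to the next sibling, climbing towards the
            // root as needed, but never past it.
            let mut id = id;
            loop {
                if id == root {
                    break None;
                }
                if let Some(sibling) = nodes[id.0].next_sibling {
                    break Some(sibling);
                }
                match nodes[id.0].parent {
                    Some(parent) => id = parent,
                    None => break None,
                }
            }
        };
    }
}
```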
### `crates/libeditor`

Most IDE features live here. Unlike `libanalysis`, `libeditor` is single-file and is basically a bunch of pure functions.
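As an illustration of the "pure functions" style, here is a deliberately simplified matching-brace helper that works on plain text; the real `libeditor` operates on syntax trees and its actual API may look quite different:

```rust
/// Given the offset of an opening `(`, `[` or `{`, return the offset of the
/// matching closing bracket, if any. The result depends only on the inputs.
pub fn matching_bracket(text: &str, offset: usize) -> Option<usize> {
    let bytes = text.as_bytes();
    let open = *bytes.get(offset)?;
    let close = match open {
        b'(' => b')',
        b'[' => b']',
        b'{' => b'}',
        _ => return None,
    };
    let mut depth = 0usize;
    for (i, &b) in bytes.iter().enumerate().skip(offset) {
        if b == open {
            depth += 1;
        } else if b == close {
            depth -= 1;
            if depth == 0 {
                return Some(i);
            }
        }
    }
    None
}
```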
### `crates/libanalysis`

A stateful library for analyzing many Rust files as they change. `WorldState` is a mutable entity (Clojure's atom) which holds the current state, incorporates changes, and hands out `World`s --- immutable, consistent snapshots of `WorldState`, which actually power analysis.
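The atom-like pattern can be sketched with `Arc`: `WorldState` builds a fresh immutable snapshot on every change and hands out clones of it, while readers keep using whatever snapshot they already hold. Types and fields below are illustrative only, not the actual `libanalysis` API:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// An immutable, consistent snapshot: cheap to clone and safe to read
/// while `WorldState` keeps changing.
#[derive(Clone, Default)]
pub struct World {
    files: Arc<HashMap<String, Arc<String>>>, // path -> file text
}

/// The mutable owner of the current state.
pub struct WorldState {
    current: World,
}

impl WorldState {
    pub fn new() -> WorldState {
        WorldState { current: World::default() }
    }

    /// Incorporate a change by building a new snapshot; snapshots handed
    /// out earlier remain valid and unchanged.
    pub fn change_file(&mut self, path: &str, text: String) {
        let mut files = (*self.current.files).clone();
        files.insert(path.to_string(), Arc::new(text));
        self.current = World { files: Arc::new(files) };
    }

    /// Hand out the current snapshot to power analysis.
    pub fn snapshot(&self) -> World {
        self.current.clone()
    }
}
```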
### `crates/server`

An LSP implementation which uses `libanalysis` for managing state and `libeditor` for actually doing useful stuff.

### `crates/cli`

A CLI interface to libsyntax.

### `crates/tools`

Code-gen tasks, used to develop libsyntax2:

- `cargo gen-kinds` -- generate `ast` and `syntax_kinds`
- `cargo gen-tests` -- collect inline tests from the grammar
- `cargo install-code` -- build and install the VS Code extension and server

### `code`

The VS Code plugin.

## License
@@ -1,93 +0,0 @@

# Design and open questions about libsyntax

The high-level description of the architecture is in RFC.md. You might also want to dig through https://github.com/matklad/fall/ which contains some pretty interesting stuff built using similar ideas (warning: it is completely undocumented, poorly written, and in general not the thing which I recommend to study (yes, this is self-contradictory)).

## Tree

The centerpiece of this whole endeavor is the syntax tree, in the `tree` module. Open questions:

- how to best represent errors, to take advantage of the fact that they are rare, but to enable fully-persistent style structure sharing between tree nodes?
- should we make the red/green split from Roslyn more pronounced?
- one can lay out nodes in a single array in such a way that the children of a node form a contiguous slice. Seems nifty, but do we need it?
- should we use SoA or AoS for `NodeData`? (see the sketch after this list)
- should we split leaf nodes and internal nodes into separate arrays? Can we use it to save some bits here and there? (leaves don't need a `first_child` field, for example)
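For the SoA-vs-AoS question, the two layouts can be sketched like this, borrowing the field set from the RFC's `NodeData`; this is purely illustrative, not a proposed design:

```rust
// Array-of-Structs: one Vec, each element carries all fields of a node.
// All data for one node sits together, which is convenient for random access.
struct NodeDataAos {
    kind: u16,
    range: (u32, u32),
    parent: Option<u32>,
    first_child: Option<u32>,
    next_sibling: Option<u32>,
}

struct TreeAos {
    nodes: Vec<NodeDataAos>,
}

// Struct-of-Arrays: one Vec per field, all indexed by the same node id.
// Passes over a single field (say, remapping all ranges after an edit)
// touch less memory, at the cost of less convenient whole-node access.
struct TreeSoa {
    kinds: Vec<u16>,
    ranges: Vec<(u32, u32)>,
    parents: Vec<Option<u32>>,
    first_children: Vec<Option<u32>>,
    next_siblings: Vec<Option<u32>>,
}
```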
## Parser

The syntax tree is produced using a three-staged process.

First, the raw text is split into tokens by the lexer (the `lexer` module). The lexer has a peculiar signature: it is an `Fn(&str) -> Token`, where a token is a pair of a `SyntaxKind` (you should have read the `tree` module and the RFC by this time! :)) and a len. That is, the lexer chomps only the first token of the input. This forces the lexer to be stateless, and makes it possible to implement incremental relexing easily.
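Given that signature, tokenizing a whole file is just a loop that lexes the first token of the remaining input and advances by its length. A sketch with a toy stand-in lexer (the real `SyntaxKind`s and lexing rules live in the `lexer` module):

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SyntaxKind(u16);

pub const ERROR: SyntaxKind = SyntaxKind(0);
pub const WHITESPACE: SyntaxKind = SyntaxKind(1);
pub const IDENT: SyntaxKind = SyntaxKind(3);

pub struct Token {
    pub kind: SyntaxKind,
    pub len: usize, // byte length of the *first* token of the input
}

// Toy stand-in for the real lexer: it looks only at the start of `text`.
pub fn next_token(text: &str) -> Token {
    let first = text.chars().next().expect("input must be non-empty");
    if first.is_whitespace() {
        let len = text
            .chars()
            .take_while(|c| c.is_whitespace())
            .map(char::len_utf8)
            .sum();
        Token { kind: WHITESPACE, len }
    } else if first.is_alphabetic() || first == '_' {
        let len = text
            .chars()
            .take_while(|&c| c.is_alphanumeric() || c == '_')
            .map(char::len_utf8)
            .sum();
        Token { kind: IDENT, len }
    } else {
        // A single character we do not recognize.
        Token { kind: ERROR, len: first.len_utf8() }
    }
}

// Because the lexer is stateless, the whole-file tokenizer is just this loop.
pub fn tokenize(mut text: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    while !text.is_empty() {
        let token = next_token(text);
        text = &text[token.len..];
        tokens.push(token);
    }
    tokens
}
```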
Then, the bulk of the work: the parser turns a stream of tokens into a stream of events (the `parser` module; of particular interest are the `parser/event` and `parser/parser` modules, which contain the parsing API, and the `parser/grammar` module, which contains the actual parsing code for various Rust syntactic constructs). Note that the parser **does not** construct a tree right away. This is done for several reasons:

* to decouple the actual tree data structure from the parser: you can build any data structure you want from the stream of events

* to make parsing fast: you can produce a list of events without allocations

* to make it easy to tweak the tree structure. Consider this code:

  ```
  #[cfg(test)]
  pub fn foo() {}
  ```

  Here, the attribute and the `pub` keyword must be children of the `fn` node. However, when parsing them, we don't yet know if there will be a function ahead: it very well might be a `struct` there. If we use events, we generally don't care about this *in the parser* and just spit them out in order.

* (Is this true?) to make incremental reparsing easier: you can reuse the same rope data structure for all of the original string, the tokens and the events.

The parser also does not know about whitespace tokens: it's the job of the next layer to assign whitespace and comments to nodes. However, the parser can remap contextual tokens, like `>>` or `union`, so it has access to the text.

And at last, the `TreeBuilder` converts a flat stream of events into a tree structure. It also *should* be responsible for attaching comments and rebalancing the tree, but it does not do this yet :)
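The event interface can be pictured with a small model: the parser emits start/token/finish events, and an independent builder folds them into whatever tree it likes. This is only a sketch of the idea; the real `parser/event` types differ:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SyntaxKind(u16);

/// What the parser emits instead of allocating tree nodes directly.
pub enum Event {
    Start { kind: SyntaxKind },           // open an internal node
    Token { kind: SyntaxKind, len: u32 }, // attach the next `len` bytes as a leaf
    Finish,                               // close the most recently opened node
}

/// One possible consumer: fold the flat event stream into an owned tree.
pub enum Node {
    Leaf { kind: SyntaxKind, len: u32 },
    Internal { kind: SyntaxKind, children: Vec<Node> },
}

pub fn build_tree(events: &[Event]) -> Option<Node> {
    // Stack of currently open internal nodes.
    let mut stack: Vec<(SyntaxKind, Vec<Node>)> = Vec::new();
    let mut root = None;
    for event in events {
        match event {
            Event::Start { kind } => stack.push((*kind, Vec::new())),
            Event::Token { kind, len } => {
                stack.last_mut()?.1.push(Node::Leaf { kind: *kind, len: *len });
            }
            Event::Finish => {
                let (kind, children) = stack.pop()?;
                let node = Node::Internal { kind, children };
                match stack.last_mut() {
                    Some(parent) => parent.1.push(node),
                    None => root = Some(node),
                }
            }
        }
    }
    root
}
```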
## Validator

The parser and lexer accept a lot of *invalid* code intentionally. The idea is to post-process the tree and do proper error reporting, literal conversion and quick-fix suggestions there. There is no design/implementation for this yet.

## AST

Nothing yet, see `AstNode` in `fall`.
docs/RFC.md (494 changed lines)

@@ -1,494 +0,0 @@

- Feature Name: libsyntax2.0
- Start Date: 2017-12-30
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

>I think the lack of reusability comes in object-oriented languages,
>not functional languages. Because the problem with object-oriented
>languages is they’ve got all this implicit environment that they
>carry around with them. You wanted a banana but what you got was a
>gorilla holding the banana and the entire jungle.
>
>If you have referentially transparent code, if you have pure
>functions — all the data comes in its input arguments and everything
>goes out and leave no state behind — it’s incredibly reusable.
>
> **Joe Armstrong**
# Summary
[summary]: #summary

The long-term plan is to rewrite the libsyntax parser and syntax tree data structure to create a software component independent of the rest of the rustc compiler and suitable for the needs of IDEs and code editors. This RFC is the first step of this plan, whose goal is to find out if this is possible at least in theory. If it is possible, the next steps would be a prototype implementation as a crates.io crate and a separate RFC for integrating the prototype with rustc, other tools, and eventual libsyntax removal.

Note that this RFC does not propose to stabilize any API for working with Rust syntax: the semver version of the hypothetical library would be `0.1.0`. It is intended to be used by tools which are currently closely related to the compiler: `rustc`, `rustfmt`, `clippy`, `rls` and a hypothetical `rustfix`. While it would be possible to create third-party tools on top of the new libsyntax, the burden of adapting to breaking changes would be on the authors of such tools.
# Motivation
[motivation]: #motivation

There are two main drawbacks with the current version of libsyntax:

* It is tightly integrated with the compiler and hard to use independently

* The AST representation is not well-suited for use inside IDEs

## IDE support

There are several differences in how IDEs and compilers typically treat source code.

In the compiler, it is convenient to transform the source code into Abstract Syntax Tree form, which is independent of the surface syntax. For example, it's convenient to discard comments and whitespace and to desugar some syntactic constructs in terms of simpler ones.

In contrast, IDEs work much closer to the source code, so it is crucial to preserve full information about the original text. For example, an IDE may adjust indentation after typing a `}` which closes a block, and to do this correctly, the IDE must be aware of syntax (that is, that `}` indeed closes some block and is not a syntax error) and of all whitespace and comments. So, an IDE-suitable AST should explicitly account for syntactic elements not considered important by the compiler.

Another difference is that IDEs typically work with incomplete and syntactically invalid code. This boils down to two parser properties. First, the parser must produce a syntax tree even if some required input is missing. For example, for the input `fn foo` the function node should be present in the parse, despite the fact that there are no parameters or body. Second, the parser must be able to skip over parts of the input it can't recognize and aggressively recover from errors. That is, the syntax tree data structure should be able to handle both missing and extra nodes.

IDEs also need the ability to incrementally reparse and relex source code after the user types. A smart IDE would use the syntax tree structure to handle editing commands (for example, to add/remove trailing commas after join/split lines actions), so parsing time can be very noticeable.

Currently rustc uses the classical AST approach, and preserves some of the source code information in the form of spans in the AST. It is not clear if this structure can fulfill all IDE requirements.
## Reusability

In theory, the parser can be a pure function, which takes a `&str` as input and produces a `ParseTree` as output.

This is great for reusability: for example, you can compile this function to WASM and use it for fast client-side validation of syntax on the Rust playground, or you can develop tools like `rustfmt` on stable Rust outside of the rustc repository, or you can embed the parser into your favorite IDE or code editor.

This is also great for correctness: with such a simple interface, it's possible to write property-based tests to thoroughly compare two different implementations of the parser. It's also straightforward to create a comprehensive test suite, because all the inputs and outputs are trivially serializable to human-readable text.

Another benefit is performance: with this signature, you can cache a parse tree for each file, with a trivial strategy for cache invalidation (invalidate an entry when the underlying file changes). On top of such a cache it is possible to build a smart code indexer which maintains the set of symbols in the project, watches files for changes and automatically reindexes only changed files.
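With a pure `parse(&str) -> ParseTree` function, such a cache is just a map from file path to a fingerprint of the text plus the tree. A sketch, with a placeholder `ParseTree` and a plain hash standing in for real change detection:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

struct ParseTree; // placeholder for the real tree

fn parse(_text: &str) -> ParseTree {
    ParseTree // placeholder for the real (pure) parser
}

fn fingerprint(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
struct ParseCache {
    entries: HashMap<String, (u64, ParseTree)>, // path -> (text fingerprint, tree)
}

impl ParseCache {
    /// Reparse only if the file's text has changed since the cached entry.
    fn get(&mut self, path: &str, text: &str) -> &ParseTree {
        let fp = fingerprint(text);
        let entry = self
            .entries
            .entry(path.to_string())
            .or_insert_with(|| (fp, parse(text)));
        if entry.0 != fp {
            *entry = (fp, parse(text));
        }
        &entry.1
    }
}
```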
Unfortunately, the current libsyntax is far from this ideal. For example, even the lexer makes use of the `FileMap`, which is essentially a global state of the compiler representing all known files. As a data point, it turned out to be easier to move `rustfmt` into the main `rustc` repository than to move libsyntax outside!

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Not applicable.
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

It is not clear if a single parser can accommodate the needs of the compiler and the IDE, but there is hope that it is possible. The RFC proposes to develop libsyntax2.0 as an experimental crates.io crate. If the experiment turns out to be a success, the second RFC will propose to integrate it with all existing tools and `rustc`.

Next, a syntax tree data structure is proposed for libsyntax2.0. It seems to have the following important properties:

* It is lossless and faithfully represents the original source code, including explicit nodes for comments and whitespace.

* It is flexible and allows encoding arbitrary node structure, even for invalid syntax.

* It is minimal: it stores a small amount of data and has no dependencies. For instance, it does not need the compiler's string interner or literal data representation.

* While the tree itself is minimal, it is extensible in the sense that it is possible to associate arbitrary data with certain nodes in a type-safe way.

It is not clear if this representation is the best one. It is heavily inspired by the [PSI] data structure which is used in [IntelliJ]-based IDEs and in the [Kotlin] compiler.

[PSI]: http://www.jetbrains.org/intellij/sdk/docs/reference_guide/custom_language_support/implementing_parser_and_psi.html
[IntelliJ]: https://github.com/JetBrains/intellij-community/
[Kotlin]: https://kotlinlang.org/

## Untyped Tree

The main idea is to store the minimal amount of information in the tree itself, and instead lean heavily on the source code for the actual data about identifier names, constant values, etc.

All nodes in the tree are of the same type and store a constant for the syntactic category of the element and a range in the source code.

Here is a minimal implementation of this data structure with some Rust syntactic categories:
```rust
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct NodeKind(u16);

pub struct File {
    text: String,
    nodes: Vec<NodeData>,
}

struct NodeData {
    kind: NodeKind,
    range: (u32, u32),
    parent: Option<u32>,
    first_child: Option<u32>,
    next_sibling: Option<u32>,
}

#[derive(Clone, Copy)]
pub struct Node<'f> {
    file: &'f File,
    idx: u32,
}

pub struct Children<'f> {
    next: Option<Node<'f>>,
}

impl File {
    pub fn root<'f>(&'f self) -> Node<'f> {
        assert!(!self.nodes.is_empty());
        Node { file: self, idx: 0 }
    }
}

impl<'f> Node<'f> {
    pub fn kind(&self) -> NodeKind {
        self.data().kind
    }

    pub fn text(&self) -> &'f str {
        let (start, end) = self.data().range;
        &self.file.text[start as usize..end as usize]
    }

    pub fn parent(&self) -> Option<Node<'f>> {
        self.as_node(self.data().parent)
    }

    pub fn children(&self) -> Children<'f> {
        Children { next: self.as_node(self.data().first_child) }
    }

    fn data(&self) -> &'f NodeData {
        &self.file.nodes[self.idx as usize]
    }

    fn as_node(&self, idx: Option<u32>) -> Option<Node<'f>> {
        idx.map(|idx| Node { file: self.file, idx })
    }
}

impl<'f> Iterator for Children<'f> {
    type Item = Node<'f>;

    fn next(&mut self) -> Option<Node<'f>> {
        let next = self.next;
        self.next = next.and_then(|node| node.as_node(node.data().next_sibling));
        next
    }
}

pub const ERROR: NodeKind = NodeKind(0);
pub const WHITESPACE: NodeKind = NodeKind(1);
pub const STRUCT_KW: NodeKind = NodeKind(2);
pub const IDENT: NodeKind = NodeKind(3);
pub const L_CURLY: NodeKind = NodeKind(4);
pub const R_CURLY: NodeKind = NodeKind(5);
pub const COLON: NodeKind = NodeKind(6);
pub const COMMA: NodeKind = NodeKind(7);
pub const AMP: NodeKind = NodeKind(8);
pub const LINE_COMMENT: NodeKind = NodeKind(9);
pub const FILE: NodeKind = NodeKind(10);
pub const STRUCT_DEF: NodeKind = NodeKind(11);
pub const FIELD_DEF: NodeKind = NodeKind(12);
pub const TYPE_REF: NodeKind = NodeKind(13);
```
Here is a Rust snippet and the corresponding parse tree:

```rust
struct Foo {
    field1: u32,
    &
    // non-doc comment
    field2:
}
```

```
FILE
  STRUCT_DEF
    STRUCT_KW
    WHITESPACE
    IDENT
    WHITESPACE
    L_CURLY
    WHITESPACE
    FIELD_DEF
      IDENT
      COLON
      WHITESPACE
      TYPE_REF
        IDENT
    COMMA
    WHITESPACE
    ERROR
      AMP
    WHITESPACE
    FIELD_DEF
      LINE_COMMENT
      WHITESPACE
      IDENT
      COLON
      ERROR
    WHITESPACE
    R_CURLY
```
Note several features of the tree:

* All whitespace and comments are explicitly accounted for.

* The node for `STRUCT_DEF` contains the error element for `&`, but still represents the following field correctly.

* The second field of the struct is incomplete: the `FIELD_DEF` node for it contains an `ERROR` element, but nevertheless has the correct `NodeKind`.

* The non-documenting comment is correctly attached to the following field.

## Typed Tree

It's hard to work with this raw parse tree, because it is untyped: a node containing a struct definition has the same API as the node for a struct field. But it's possible to add a strongly typed layer on top of this raw tree, and get a zero-cost AST. Here is an example which adds type-safe wrappers for structs and fields:
```rust
// generic infrastructure

pub trait AstNode<'f>: Copy + 'f {
    fn new(node: Node<'f>) -> Option<Self>;
    fn node(&self) -> Node<'f>;
}

pub fn child_of_kind<'f>(node: Node<'f>, kind: NodeKind) -> Option<Node<'f>> {
    node.children().find(|child| child.kind() == kind)
}

pub fn ast_children<'f, A: AstNode<'f>>(node: Node<'f>) -> Box<Iterator<Item=A> + 'f> {
    Box::new(node.children().filter_map(A::new))
}

// AST elements, specific to Rust

#[derive(Clone, Copy)]
pub struct StructDef<'f>(Node<'f>);

#[derive(Clone, Copy)]
pub struct FieldDef<'f>(Node<'f>);

#[derive(Clone, Copy)]
pub struct TypeRef<'f>(Node<'f>);

pub trait NameOwner<'f>: AstNode<'f> {
    fn name_ident(&self) -> Node<'f> {
        child_of_kind(self.node(), IDENT).unwrap()
    }

    fn name(&self) -> &'f str { self.name_ident().text() }
}

impl<'f> AstNode<'f> for StructDef<'f> {
    fn new(node: Node<'f>) -> Option<Self> {
        if node.kind() == STRUCT_DEF { Some(StructDef(node)) } else { None }
    }
    fn node(&self) -> Node<'f> { self.0 }
}

impl<'f> NameOwner<'f> for StructDef<'f> {}

impl<'f> StructDef<'f> {
    pub fn fields(&self) -> Box<Iterator<Item=FieldDef<'f>> + 'f> {
        ast_children(self.node())
    }
}

impl<'f> AstNode<'f> for FieldDef<'f> {
    fn new(node: Node<'f>) -> Option<Self> {
        if node.kind() == FIELD_DEF { Some(FieldDef(node)) } else { None }
    }
    fn node(&self) -> Node<'f> { self.0 }
}

impl<'f> FieldDef<'f> {
    pub fn type_ref(&self) -> Option<TypeRef<'f>> {
        ast_children(self.node()).next()
    }
}

impl<'f> NameOwner<'f> for FieldDef<'f> {}

impl<'f> AstNode<'f> for TypeRef<'f> {
    fn new(node: Node<'f>) -> Option<Self> {
        if node.kind() == TYPE_REF { Some(TypeRef(node)) } else { None }
    }
    fn node(&self) -> Node<'f> { self.0 }
}
```
Note that although the AST wrappers provide type-safe access to the tree, they are still represented as indexes, so clients of the syntax tree can easily associate additional data with AST nodes by storing it in a side table.
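For example, such a side table can be an ordinary map keyed by the node index. This is a sketch, not an actual libsyntax2 API, and it assumes some way of obtaining a node's index:

```rust
use std::collections::HashMap;

/// Extra data attached to syntax nodes without touching the tree itself.
/// Nodes are identified by their index into `File::nodes`, so the table is
/// just a map keyed by that index.
struct SideTable<T> {
    data: HashMap<u32, T>, // node index -> associated data
}

impl<T> SideTable<T> {
    fn new() -> SideTable<T> {
        SideTable { data: HashMap::new() }
    }

    // e.g. populated by a later analysis pass with `SideTable<TypeInfo>`
    fn insert(&mut self, node_idx: u32, value: T) {
        self.data.insert(node_idx, value);
    }

    fn get(&self, node_idx: u32) -> Option<&T> {
        self.data.get(&node_idx)
    }
}
```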
## Missing Source Code

The crucial feature of this syntax tree is that it is just a view into the original source code. And this poses a problem for the Rust language, because not all compiled Rust code is represented in the form of source code! Specifically, Rust has a powerful macro system, which effectively allows creating and parsing additional source code at compile time. It is not entirely clear that the proposed parsing framework is able to handle this use case, and it's the main purpose of this RFC to figure it out. The current idea for handling macros is to make each macro expansion produce a triple of (expansion text, syntax tree, hygiene information), where hygiene information is a side table which colors different ranges of the expansion text according to the original syntactic context.
## Implementation plan

This RFC proposes huge changes to the internals of the compiler, so it's important to proceed carefully and incrementally. The following plan is suggested:

* RFC discussion about the theoretical feasibility of the proposal, and the best representation for the syntax tree.

* Implementation of the proposal as a completely separate crates.io crate, by refactoring existing libsyntax source code to produce the new tree.

* A prototype implementation of macro expansion on top of the new syntax tree.

* An additional round of discussion/RFC about merging with the mainline compiler.
# Drawbacks
[drawbacks]: #drawbacks

- No harm will be done as long as the new libsyntax exists as an experiment on crates.io. However, actually using it in the compiler and other tools would require massive refactorings.

- It's difficult to know upfront if the proposed syntax tree would actually work well in both the compiler and the IDE. It is possible that some drawbacks will be discovered during implementation.

# Rationale and alternatives
[alternatives]: #alternatives

- Incrementally add more information about source code to the current AST.

- Move the current libsyntax to crates.io as is. In the past, there were several failed attempts to do that.

- Explore alternative representations for the parse tree.

- Use a parser generator instead of a hand-written parser. Using the parser from libsyntax directly would be easier, and hand-written LL-style parsers usually have much better error recovery than generated LR-style ones.

# Unresolved questions
[unresolved]: #unresolved-questions

- Is it at all possible to represent the Rust parser as a pure function of the source code? It seems like the answer is yes, because the language and especially the macros were cleverly designed with this use case in mind.

- Is it possible to implement macro expansion using the proposed framework? This is the main question of this RFC. The proposed solution of synthesizing source code on the fly seems workable: it's not that different from the current implementation, which synthesizes token trees.

- How to actually phase out the current libsyntax, if libsyntax2.0 turns out to be a success?
@@ -1,44 +0,0 @@

# libsyntax2.0 testing infrastructure

Libsyntax2.0 tests are in the `tests/data` directory. Each test is a pair of files: an `.rs` file with Rust code and a `.txt` file with a human-readable representation of the syntax tree.
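A parser that wants to consume this suite only needs to walk the directory, parse each `.rs` file, render the tree in the same format, and compare against the sibling `.txt` file. A sketch of such a harness, where `parse_to_text` stands in for whatever parser is under test:

```rust
use std::fs;
use std::path::Path;

// Stand-in for the parser under test: render the parse tree of `source`
// in the same human-readable format used by the `.txt` files.
fn parse_to_text(_source: &str) -> String {
    unimplemented!("call the real parser and render its tree here")
}

fn run_data_tests(dir: &Path) -> std::io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let rs_path = entry?.path();
        if rs_path.extension().map_or(false, |ext| ext == "rs") {
            // Each `<n>_name.rs` file has a sibling `<n>_name.txt` file.
            let txt_path = rs_path.with_extension("txt");
            let source = fs::read_to_string(&rs_path)?;
            let expected = fs::read_to_string(&txt_path)?;
            let actual = parse_to_text(&source);
            assert_eq!(actual, expected, "mismatch for {}", rs_path.display());
        }
    }
    Ok(())
}
```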
The test suite is intended to be independent of a particular parser: that's why it is just a list of files.

The test suite is intended to be progressive: that is, if you want to write a Rust parser, you can TDD it by working through the tests in order. That's why each test file begins with a number. Generally, tests should be added in order of the appearance of the corresponding functionality in libsyntax2.0. If a bug in the parser is uncovered, a **new** test should be created instead of modifying an existing one: it is preferable to have a gazillion small isolated test files, rather than a single file which covers all edge cases. It's okay for files to have the same name except for the leading number. In general, the test suite should be append-only: old tests should not be modified; new tests should be created instead.

Note that only `ok` tests are normative: `err` tests test error recovery, and it is totally ok for a parser to not implement any error recovery at all. However, for libsyntax2.0 we do care about error recovery, and we do care about precise and useful error messages.

There are also so-called "inline tests". They appear as comments with a `test` header in the source code, like this:

```rust
// test fn_basic
// fn foo() {}
fn function(p: &mut Parser) {
    // ...
}
```

You can run the `cargo collect-tests` command to collect all inline tests into the `tests/data/inline` directory. The main advantage of inline tests is that they help to illustrate what the relevant code is doing.

Contribution opportunity: design and implement testing infrastructure for validators.
@@ -1,36 +0,0 @@

# Tools used to implement libsyntax

libsyntax uses several tools to help with development.

Each tool is a binary in the [tools/](../tools) package. You can run them via the `cargo run` command:

```
cargo run --package tools --bin tool
```

There are also aliases in [.cargo/config](../.cargo/config), so the following also works:

```
cargo tool
```

## Tool: `gen`

This tool reads a "grammar" from [grammar.ron](../grammar.ron) and generates the `syntax_kinds.rs` file. You should run this tool if you add new keywords or syntax elements.

## Tool: `parse`

This tool reads Rust source code from the standard input, parses it, and prints the result to stdout.
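For example, using the `cargo parse` alias mentioned in the README's Quick Start:

```
$ cargo parse < crates/libsyntax2/src/lib.rs
```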
## Tool: `collect-tests`

This tool collects inline tests from comments in the libsyntax2 source code and places them into the `tests/data/inline` directory.