Freshen Architecture.md document

2024-12-25 20:43:21 +00:00 · 2020-01-29 15:08:31 +01:00 · 2020-01-29 15:08:31 +01:00 · 84dfbfbd1d
commit 84dfbfbd1d
parent 1065c2bf1d
2 changed files with 45 additions and 38 deletions
--- a/docs/dev/README.md
+++ b/docs/dev/README.md
@@ -106,6 +106,10 @@ communication, and `print!` would break it.
 If I need to fix something simultaneously in the server and in the client, I
 feel even more sad. I don't have a specific workflow for this case.

+Additionally, I use `cargo run --release -p ra_cli -- analysis-stats
+path/to/some/rust/crate` to run a batch analysis. This is primarily useful for
+performance optimizations, or for bug minimization.
+
 # Logging

 Logging is done by both rust-analyzer and VS Code, so it might be tricky to
--- a/docs/dev/architecture.md
+++ b/docs/dev/architecture.md
@@ -12,6 +12,9 @@ analyzer:

 https://www.youtube.com/playlist?list=PL85XCvVPmGQho7MZkdW-wtPtuJcFpzycE

+Note that the guide and videos are pretty dated; this document should generally
+be fresher.
+
 ## The Big Picture

 ![](https://user-images.githubusercontent.com/1711539/50114578-e8a34280-0255-11e9-902c-7cfc70747966.png)
@@ -20,13 +23,12 @@ On the highest level, rust-analyzer is a thing which accepts input source code
 from the client and produces a structured semantic model of the code.

 More specifically, input data consists of a set of text files (`(PathBuf,
-String)` pairs) and information about project structure, captured in the so called
-`CrateGraph`. The crate graph specifies which files are crate roots, which cfg
-flags are specified for each crate (TODO: actually implement this) and what
-dependencies exist between the crates. The analyzer keeps all this input data in
-memory and never does any IO. Because the input data is source code, which
-typically measures in tens of megabytes at most, keeping all input data in
-memory is OK.
+String)` pairs) and information about project structure, captured in the so
+called `CrateGraph`. The crate graph specifies which files are crate roots,
+which cfg flags are specified for each crate and what dependencies exist between
+the crates. The analyzer keeps all this input data in memory and never does any
+IO. Because the input data are source code, which typically measures in tens of
+megabytes at most, keeping everything in memory is OK.
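
The input model described in this paragraph can be sketched roughly as follows. This is only an illustration: every name here (`ToyInput`, `ToyCrateGraph`, `ToyCrateData`, `ToyCrateId` and their methods) is hypothetical and is not the real rust-analyzer API. The point is that all inputs are plain in-memory data, and nothing reads from disk.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical handle to a crate in the toy graph (not a rust-analyzer type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ToyCrateId(pub usize);

/// Per-crate facts: the root file, the cfg flags, and the dependencies.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyCrateData {
    pub root_file: PathBuf,
    pub cfg_flags: Vec<String>,
    pub dependencies: Vec<ToyCrateId>,
}

/// Toy stand-in for the project structure information.
#[derive(Debug, Default)]
pub struct ToyCrateGraph {
    crates: Vec<ToyCrateData>,
}

impl ToyCrateGraph {
    /// Registers a crate root together with its cfg flags.
    pub fn add_crate_root(&mut self, root_file: &str, cfg_flags: &[&str]) -> ToyCrateId {
        let id = ToyCrateId(self.crates.len());
        self.crates.push(ToyCrateData {
            root_file: PathBuf::from(root_file),
            cfg_flags: cfg_flags.iter().map(|flag| flag.to_string()).collect(),
            dependencies: Vec::new(),
        });
        id
    }

    /// Records that crate `from` depends on crate `to`.
    pub fn add_dependency(&mut self, from: ToyCrateId, to: ToyCrateId) {
        self.crates[from.0].dependencies.push(to);
    }

    pub fn crate_data(&self, id: ToyCrateId) -> &ToyCrateData {
        &self.crates[id.0]
    }
}

/// All analyzer input: file texts plus the crate graph, held in memory.
#[derive(Debug, Default)]
pub struct ToyInput {
    pub files: HashMap<PathBuf, String>,
    pub crate_graph: ToyCrateGraph,
}

impl ToyInput {
    /// The client pushes file contents in; this sketch never touches the disk.
    pub fn set_file_text(&mut self, path: &str, text: &str) {
        self.files.insert(PathBuf::from(path), text.to_string());
    }

    pub fn file_text(&self, path: &str) -> Option<&str> {
        self.files.get(&PathBuf::from(path)).map(String::as_str)
    }
}
```

A client of such a model would call `set_file_text` whenever a file changes and rebuild the graph when the project structure changes; everything else would be derived from these two inputs.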

 A "structured semantic model" is basically an object-oriented representation of
 modules, functions and types which appear in the source code. This representation
@@ -43,37 +45,39 @@ can be quickly updated for small modifications.
 ## Code generation

 Some of the components of this repository are generated through automatic
-processes. These are outlined below:
+processes. `cargo xtask codegen` runs all generation tasks. Generated code is
+committed to the git repository.

- `cargo xtask codegen`: The kinds of tokens that are reused in several places, so a generator
-  is used. We use `quote!` macro to generate the files listed below, based on
-  the grammar described in [grammar.ron]:
-  - [ast/generated.rs][ast generated]
-  - [syntax_kind/generated.rs][syntax_kind generated]
+In particular, `cargo xtask codegen` generates:

-[grammar.ron]: ../../crates/ra_syntax/src/grammar.ron
-[ast generated]: ../../crates/ra_syntax/src/ast/generated.rs
-[syntax_kind generated]: ../../crates/ra_parser/src/syntax_kind/generated.rs
+1. [`syntax_kind/generated`](https://github.com/rust-analyzer/rust-analyzer/blob/a0be39296d2925972cacd9fbf8b5fb258fad6947/crates/ra_parser/src/syntax_kind/generated.rs)
+  -- the set of terminals and non-terminals of the Rust grammar.
+
+2. [`ast/generated`](https://github.com/rust-analyzer/rust-analyzer/blob/a0be39296d2925972cacd9fbf8b5fb258fad6947/crates/ra_syntax/src/ast/generated.rs)
+  -- AST data structure.
+
+3. [`doc_tests/generated`](https://github.com/rust-analyzer/rust-analyzer/blob/a0be39296d2925972cacd9fbf8b5fb258fad6947/crates/ra_assists/src/doc_tests/generated.rs),
+  [`test_data/parser/inline`](https://github.com/rust-analyzer/rust-analyzer/tree/a0be39296d2925972cacd9fbf8b5fb258fad6947/crates/ra_syntax/test_data/parser/inline)
+  -- tests for assists and the parser.
+
+The source for 1 and 2 is in [`ast_src.rs`](https://github.com/rust-analyzer/rust-analyzer/blob/a0be39296d2925972cacd9fbf8b5fb258fad6947/xtask/src/ast_src.rs).

 ## Code Walk-Through

 ### `crates/ra_syntax`, `crates/ra_parser`

 Rust syntax tree structure and parser. See
-[RFC](https://github.com/rust-lang/rfcs/pull/2256) for some design notes.
+[RFC](https://github.com/rust-lang/rfcs/pull/2256) and [./syntax.md](./syntax.md) for some design notes.

 - [rowan](https://github.com/rust-analyzer/rowan) library is used for constructing syntax trees.
 - `grammar` module is the actual parser. It is a hand-written recursive descent parser, which
  produces a sequence of events like "start node X", "finish node Y". It works similarly to [kotlin's parser](https://github.com/JetBrains/kotlin/blob/4d951de616b20feca92f3e9cc9679b2de9e65195/compiler/frontend/src/org/jetbrains/kotlin/parsing/KotlinParsing.java),
  which is a good source of inspiration for dealing with syntax errors and incomplete input. Original [libsyntax parser](https://github.com/rust-lang/rust/blob/6b99adeb11313197f409b4f7c4083c2ceca8a4fe/src/libsyntax/parse/parser.rs)
  is what we use for the definition of the Rust language.
- `parser_api/parser_impl` bridges the tree-agnostic parser from `grammar` with `rowan` trees.
-  This is the thing that turns a flat list of events into a tree (see `EventProcessor`)
+- `TreeSink` and `TokenSource` traits bridge the tree-agnostic parser from `grammar` with `rowan` trees.
 - `ast` provides a type safe API on top of the raw `rowan` tree.
- `grammar.ron` RON description of the grammar, which is used to
-  generate `syntax_kinds` and `ast` modules, using `cargo xtask codegen` command.
- `algo`: generic tree algorithms, including `walk` for O(1) stack
-  space tree traversal (this is cool).
+- `ast_src` description of the grammar, which is used to generate `syntax_kinds`
+  and `ast` modules, using `cargo xtask codegen` command.
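
To make the idea of a tree-agnostic parser more concrete, here is a minimal sketch of how a flat sequence of "start node" / "finish node" events can be folded into a tree. The `Event` and `Tree` types and the `build_tree` function are hypothetical simplifications written for this note; they are not the actual `TreeSink` / `TokenSource` traits or the `rowan` API.

```rust
/// Simplified parser output: the parser only emits events and knows
/// nothing about the concrete tree representation.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    /// Start a node of the given kind.
    Start(&'static str),
    /// A token (leaf) with its text.
    Token(&'static str),
    /// Finish the most recently started node.
    Finish,
}

/// Toy tree type standing in for a real syntax tree.
#[derive(Debug, PartialEq)]
pub enum Tree {
    Node { kind: &'static str, children: Vec<Tree> },
    Token(&'static str),
}

/// Folds a flat list of events into a tree using an explicit stack.
/// Returns `None` for unbalanced events, tokens outside any node,
/// or more than one top-level node.
pub fn build_tree(events: &[Event]) -> Option<Tree> {
    // Each stack entry is a node that has been started but not finished yet.
    let mut stack: Vec<(&'static str, Vec<Tree>)> = Vec::new();
    let mut root = None;
    for event in events {
        match event {
            Event::Start(kind) => stack.push((*kind, Vec::new())),
            Event::Token(text) => stack.last_mut()?.1.push(Tree::Token(*text)),
            Event::Finish => {
                let (kind, children) = stack.pop()?;
                let node = Tree::Node { kind, children };
                match stack.last_mut() {
                    // Attach the finished node to its parent.
                    Some(parent) => parent.1.push(node),
                    // No parent left: this is the root.
                    None if root.is_none() => root = Some(node),
                    None => return None,
                }
            }
        }
    }
    if stack.is_empty() { root } else { None }
}
```

Because the parser in this sketch only talks in events, the same event stream could be fed to a different consumer (for example, one that builds another tree type) without changing the parser.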

 Tests for ra_syntax are mostly data-driven: `test_data/parser` contains subdirectories with a bunch of `.rs`
 (test vectors) and `.txt` files with corresponding syntax trees. During testing, we check
@@ -81,6 +85,10 @@ Tests for ra_syntax are mostly data-driven: `test_data/parser` contains subdirec
 tests). Additionally, running `cargo xtask codegen` will walk the grammar module and collect
 all `// test test_name` comments into files inside `test_data/parser/inline` directory.

+Note
+[`api_walkthrough`](https://github.com/rust-analyzer/rust-analyzer/blob/2fb6af89eb794f775de60b82afe56b6f986c2a40/crates/ra_syntax/src/lib.rs#L190-L348)
+in particular: it shows off various methods of working with the syntax tree.
+
 See [#93](https://github.com/rust-analyzer/rust-analyzer/pull/93) for an example PR which
 fixes a bug in the grammar.

@@ -94,18 +102,22 @@ defines most of the "input" queries: facts supplied by the client of the
 analyzer. Reading the docs of the `ra_db::input` module should be useful:
 everything else is strictly derived from those inputs.

-### `crates/ra_hir`
+### `crates/ra_hir*` crates

 HIR provides high-level "object oriented" access to Rust code.

 The principal difference between HIR and syntax trees is that HIR is bound to a
-particular crate instance. That is, it has cfg flags and features applied (in
-theory, in practice this is to be implemented). So, the relation between
-syntax and HIR is many-to-one. The `source_binder` module is responsible for
-guessing a HIR for a particular source position.
+particular crate instance. That is, it has cfg flags and features applied. So,
+the relation between syntax and HIR is many-to-one. The `source_binder` module
+is responsible for guessing a HIR for a particular source position.

 Underneath, HIR works on top of salsa, using a `HirDatabase` trait.

+`ra_hir_xxx` crates have a strong ECS flavor, in that they work with raw ids and
+directly query the database.
+
+The top-level `ra_hir` façade crate wraps ids into a more OO-flavored API.
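
The difference between the two styles can be illustrated with a small sketch. All names below (`ToyDatabase`, `FunctionId`, `Function` and their methods) are invented for this example and do not correspond to real `ra_hir` items; the real crates sit on top of salsa queries rather than hash maps.

```rust
use std::collections::HashMap;

/// ECS-style layer: a raw, copyable id with no behavior of its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FunctionId(pub u32);

/// Toy "database": facts are stored per id and looked up by id.
#[derive(Debug, Default)]
pub struct ToyDatabase {
    names: HashMap<FunctionId, String>,
    param_counts: HashMap<FunctionId, usize>,
}

impl ToyDatabase {
    pub fn add_function(&mut self, name: &str, param_count: usize) -> FunctionId {
        let id = FunctionId(self.names.len() as u32);
        self.names.insert(id, name.to_string());
        self.param_counts.insert(id, param_count);
        id
    }

    /// "Queries" take a raw id and return the associated data.
    pub fn function_name(&self, id: FunctionId) -> &str {
        &self.names[&id]
    }

    pub fn function_param_count(&self, id: FunctionId) -> usize {
        self.param_counts[&id]
    }
}

/// Facade layer: wraps the id and exposes methods, so callers write
/// `function.name(&db)` instead of `db.function_name(id)`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Function {
    id: FunctionId,
}

impl Function {
    pub fn from_id(id: FunctionId) -> Function {
        Function { id }
    }

    pub fn name<'db>(self, db: &'db ToyDatabase) -> &'db str {
        db.function_name(self.id)
    }

    pub fn param_count(self, db: &ToyDatabase) -> usize {
        db.function_param_count(self.id)
    }
}
```

In this sketch the wrapper holds no data besides the id, so it stays cheap to copy, and every method still goes through the database.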
+
 ### `crates/ra_ide`

 A stateful library for analyzing many Rust files as they change. `AnalysisHost`
@@ -135,18 +147,9 @@ different from data on disk. This is more or less the single really
 platform-dependent component, so it lives in a separate repository and has an
 extensive cross-platform CI testing.

-### `crates/gen_lsp_server`
-
-A language server scaffold, exposing a synchronous crossbeam-channel based API.
-This crate handles protocol handshaking and parsing messages, while you
-control the message dispatch loop yourself.
-
-Run with `RUST_LOG=sync_lsp_server=debug` to see all the messages.
-
 ### `crates/ra_cli`

-A CLI interface to rust-analyzer.
-
+A CLI interface to rust-analyzer, mainly for testing.

 ## Testing Infrastructure