Commit 608a925

docs: design notes includes rationale and philosophy
fix #154, #615
1 parent a3fd0ce commit 608a925

1 file changed: +88 -48 lines changed

@@ -1,50 +1,90 @@
11
= Design Notes
22

3-
== AST Traversal
4-
5-
During the AST traversal stage, the complete AST (generated by the clang frontend)
6-
is walked beginning with the root `TranslationUnitDecl` node. It is during this
7-
stage that USRs (universal symbol references) are generated and hashed with SHA1
8-
to form the 160 bit `SymbolID` for an entity. With the exception of built-in types,
9-
*all* entities referenced in the corpus will be traversed and be assigned a `SymbolID`;
10-
including those from the standard library. This is necessary to generate the
11-
full interface for user-defined types.
12-
13-
== Bitcode
14-
15-
AST traversal is performed in parallel on a per-translation-unit basis.
16-
To maximize the size of the code base MrDocs is capable of processing, `Info`
17-
types generated during traversal are serialized to a compressed bitcode representation.
18-
Once AST traversal is complete for all translation units, the bitcode is deserialized
19-
back into `Info` types, and then merged to form the corpus. The merging step is necessary
20-
as there may be multiple identical definitions of the same entity (e.g. for class types,
21-
templates, inline functions, etc), as well as functions declared in one translation
22-
unit & defined in another.
23-
24-
== The Corpus
25-
26-
After AST traversal and `Info` merging, the result is stored as a map of `Info`s
27-
indexed by their respective `SymbolID`s. Documentation generators may traverse
28-
this structure by calling `Corpus::traverse` with a `Corpus::Visitor` derived
29-
visitor and the `SymbolID` of the entity to visit (e.g. the global namespace).
30-
31-
== Namespaces
32-
33-
Namespaces do not have a source location.
34-
This is because there can be many namespaces.
35-
We probably don't want to store any javadocs for namespaces either.
36-
37-
== Paths
38-
39-
The AST visitor and metadata all use forward slashes to represent file
40-
pathnames, even on Windows. This is so the generated reference documentation
41-
does not vary based on the platform.
42-
43-
== Exceptions
44-
45-
Errors thrown by the program should always have type `Exception`. Objects
46-
of this type are capable of transporting an `Error` object. This is important
47-
for the scripting to work; exceptions are used to propagate errors from
48-
library code to scripts and back to the invoking code. For exceptional cases,
49-
these thrown exceptions should be uncaught. The tool installs an uncaught exception
50-
handler that prints a stack trace and exits the process immediately.
3+
== Why automate documentation?
4+
5+
{cpp} API design is challenging.
6+
The moment you write a function signature or declare a class is when you are likely to have as much information as you will ever have about what it is supposed to do.
7+
The more time passes before you write the documentation, the less likely you are to remember all the details of what motivated the API in the first place.
8+
9+
In other words, because the best and the worst engineers alike are naturally lazy, the *temporal adjacency* of the {cpp} declaration to the documentation improves outcomes.
10+
For this reason, having the documentation be as close as possible to the declaration is ideal.
11+
That is, the *spatial adjacency* of the {cpp} declaration to the documentation improves outcomes because it facilitates temporal adjacency.
12+
13+
In summary, the automated extraction of reference documentation from {cpp} declarations containing attached documentation comments is ideal because:
14+
15+
* Temporally adjacent documentation is more comprehensive
16+
* Spatially adjacent documentation requires less effort to maintain
17+
* And causally connected documentation is more accurate
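
As a minimal sketch of what "attached documentation comments" can look like, the declaration below carries a Javadoc-style comment directly above it. The function `count_char` and the helper `example_usage` are hypothetical and exist only for illustration; the tags shown are common Javadoc-style tags and are not a statement of what any particular tool supports.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

/** Count the occurrences of a character in a string.

    The comment sits directly above the declaration it describes,
    so a change to the signature and a change to the contract
    can be made in the same edit.

    @param text The string to search.
    @param ch The character to look for.
    @return The number of times `ch` appears in `text`.
*/
constexpr std::size_t
count_char(std::string_view text, char ch) noexcept
{
    std::size_t n = 0;
    for (char c : text)
        if (c == ch)
            ++n;
    return n;
}

// Hypothetical usage: the documented contract can be checked directly.
inline void example_usage()
{
    assert(count_char("banana", 'a') == 3);
    assert(count_char("", 'x') == 0);
}
```

Because the comment and the declaration live in the same place, a reviewer sees both in the same diff, which is the spatial adjacency described above.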
18+
19+
From the perspective of engineers, the biggest advantage of automated documentation is that it implies a single source of truth for the API at a low cost.
20+
However, {cpp} codebases are notoriously difficult to document automatically because of constructs where the code needs to diverge from the intended API that represents the contract with the user.
21+
22+
Tools like Doxygen typically require workarounds and manual intervention via preprocessor macros to get the documentation right.
23+
These workarounds are problematic because they effectively mean that there are two versions of the code: the well-formed and the ill-formed versions.
24+
This subverts the single source of truth for the code.
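
The kind of workaround described here can be sketched as follows. This is an illustrative example, not code from any real project: the macro `EXAMPLE_DOCS_BUILD` and the names `doubled` and `detail::doubled_result` are hypothetical, and the sketch assumes the macro would be defined only when the documentation tool processes the file.

```cpp
#include <cassert>
#include <type_traits>

namespace detail {
// Implementation detail that is not meant to appear in the reference.
template <class T>
struct doubled_result
{
    using type = std::common_type_t<T, int>;
};
} // namespace detail

#ifdef EXAMPLE_DOCS_BUILD
// Seen only by the documentation tool. This branch is never compiled,
// so nothing checks that it still matches the real declaration below.
template <class T>
unspecified doubled(T value);
#else
// Seen only by the compiler.
template <class T>
typename detail::doubled_result<T>::type
doubled(T value)
{
    return value + value;
}
#endif
```

The first branch is not valid {cpp} (`unspecified` is not a type), which is why it has to be hidden from the compiler. If the real signature changes, the documented one silently stays the same.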
25+
26+
|===
27+
| | MrDocs | Automatic | Manual | No Reference
28+
29+
| No workarounds | ✅ | ❌ | ✅ | ❌
30+
| Nice for users | ✅ | ✅ | ✅ | ❌
31+
| Single Source of Truth | ✅ | ❌ | ❌ | ❌
32+
| Less Work for Developers | ✅ | ❌ | ❌ | ✅
33+
| Always up-to-date | ✅ | ✅ | ❌ | ❌
34+
|===
35+
36+
* When no reference is provided, users are forced to read header files to understand the API.
37+
This is a problem because header files are not always the best source of truth for the API.
38+
Developers familiar with https://cppreference.com[cppreference.com,window=_blank] will know that the documentation there is often more informative than the header files.
39+
* A common alternative is to provide a manual reference to the API.
40+
Developers duplicate the signatures, which requires extra work.
41+
This strategy tends to work well for small libraries and allows the developer to directly express the contract with the user.
42+
However, because the single source of truth is lost, this approach becomes unmanageable as the codebase grows.
43+
When a declaration changes, developers may forget to edit the docs, and the reference becomes out of date.
44+
* Given these drawbacks, the best solution appears to be automating the documentation.
45+
The causal connection between the declaration and the generated documentation improves outcomes.
46+
That's when developers face difficulties with existing tools like Doxygen, which require workarounds and manual intervention to get the documentation right.
47+
The workarounds mean that there are two versions of the code: the well-formed and the ill-formed versions.
48+
* The philosophy behind MrDocs is to provide solutions to these workarounds so that the single source of truth can be maintained with minimal effort by developers.
49+
With MrDocs, the documented {cpp} code should be the same as the code that is compiled.
50+
51+
== Customization
52+
53+
Once the documentation is extracted from the code, it is necessary to format it in a way that is useful to the user.
54+
This usually involves generating output files that match the content of the documentation to the user's needs.
55+
56+
A limitation of existing tools like Doxygen is that their output formats are not very customizable.
57+
Doxygen supports minor customization of the output style, but custom content formats require much more complex workflows, such as generating XML files and writing secondary applications to process them.
58+
59+
MrDocs attempts to support multiple output formats and customization options.
60+
In practice, MrDocs attempts to provide three levels of customization:
61+
62+
* At the first level, the user can use an existing generator and customize its templates and helper functions.
63+
* At the second level, the user can write a MrDocs plugin that defines a new generator.
64+
* At the third level, the user can use a generator for a structured file format and consume the output in a simpler secondary application or script.
65+
66+
These multiple levels of complexity mean developers can worry only about the documentation content.
67+
In practice, developers rarely need new generators and are usually only interested in changing how an existing generator formats the output.
68+
Removing the necessity of writing and maintaining a secondary application only to customize the output via XML files can save these developers an immense amount of time.
69+
70+
== {cpp} Constructs
71+
72+
To deal with the complexity of {cpp} constructs, MrDocs uses Clang's LibTooling API.
73+
That means MrDocs understands all {cpp}: if Clang can compile it, MrDocs knows about it.
74+
75+
Here are a few constructs MrDocs attempts to understand and document automatically from the metadata:
76+
77+
* Overload sets
78+
* Private APIs
79+
* Unspecified Return Types
80+
* Concepts
81+
* Typedef / Aliases
82+
* Constants
83+
* Automatic Related Types
84+
* Macros
85+
* SFINAE
86+
* Hidden Base Classes
87+
* Hidden EBO
88+
* Niebloids
89+
* Coroutines
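
To illustrate why constructs like these are hard to document from the declaration alone, here is a minimal SFINAE sketch. The names `is_even` and `can_call_is_even` are hypothetical. The intended API is "`is_even` accepts any integral type", but the declaration expresses that as an extra defaulted template parameter that no caller ever writes. How a given tool renders such a declaration is not shown here.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// What the compiler sees: the constraint is encoded as an extra,
// defaulted template parameter that callers never spell out.
template <
    class T,
    class = std::enable_if_t<std::is_integral_v<T>>>
constexpr bool is_even(T value) noexcept
{
    return value % 2 == 0;
}

// Detection helper used only to show that the constraint is real:
// is_even(x) is well-formed for integral x and is removed from
// overload resolution for every other type.
template <class T, class = void>
struct can_call_is_even : std::false_type {};

template <class T>
struct can_call_is_even<
    T, std::void_t<decltype(is_even(std::declval<T>()))>>
    : std::true_type {};

static_assert(can_call_is_even<int>::value, "integral types are accepted");
static_assert(!can_call_is_even<double>::value, "other types are rejected");
```

A reader of the reference cares about the constraint ("`T` must be integral"), not about the unnamed second template parameter used to implement it.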
90+
