|
1 | 1 | = Design Notes
|
2 | 2 |
|
3 |
| -== AST Traversal |
4 |
| - |
5 |
| -During the AST traversal stage, the complete AST (generated by the clang frontend) |
6 |
| -is walked beginning with the root `TranslationUnitDecl` node. It is during this |
7 |
| -stage that USRs (universal symbol references) are generated and hashed with SHA1 |
8 |
| -to form the 160 bit `SymbolID` for an entity. With the exception of built-in types, |
9 |
| -*all* entities referenced in the corpus will be traversed and be assigned a `SymbolID`; |
10 |
| -including those from the standard library. This is necessary to generate the |
11 |
| -full interface for user-defined types. |
12 |
| - |
13 |
| -== Bitcode |
14 |
| - |
15 |
| -AST traversal is performed in parallel on a per-translation-unit basis. |
16 |
| -To maximize the size of the code base MrDocs is capable of processing, `Info` |
17 |
| -types generated during traversal are serialized to a compressed bitcode representation. |
18 |
| -Once AST traversal is complete for all translation units, the bitcode is deserialized |
19 |
| -back into `Info` types, and then merged to form the corpus. The merging step is necessary |
20 |
| - as there may be multiple identical definitions of the same entity (e.g. for class types, |
21 |
| - templates, inline functions, etc), as well as functions declared in one translation |
22 |
| - unit & defined in another. |
23 |
| - |
24 |
| -== The Corpus |
25 |
| - |
26 |
| -After AST traversal and `Info` merging, the result is stored as a map of `Info`s |
27 |
| -indexed by their respective `SymbolID`s. Documentation generators may traverse |
28 |
| -this structure by calling `Corpus::traverse` with a `Corpus::Visitor` derived |
29 |
| -visitor and the `SymbolID` of the entity to visit (e.g. the global namespace). |
30 |
| - |
31 |
| -== Namespaces |
32 |
| - |
33 |
| -Namespaces do not have a source location. |
34 |
| -This is because there can be many namespaces. |
35 |
| -We probably don't want to store any javadocs for namespaces either. |
36 |
| - |
37 |
| -== Paths |
38 |
| - |
39 |
| -The AST visitor and metadata all use forward slashes to represent file |
40 |
| -pathnames, even on Windows. This is so the generated reference documentation |
41 |
| -does not vary based on the platform. |
42 |
| - |
43 |
| -== Exceptions |
44 |
| - |
45 |
| -Errors thrown by the program should always have type `Exception`. Objects |
46 |
| -of this type are capable of transporting an `Error` object. This is important |
47 |
| -for the scripting to work; exceptions are used to propagate errors from |
48 |
| -library code to scripts and back to the invoking code. For exceptional cases, |
49 |
| -these thrown exceptions should be uncaught. The tool installs an uncaught exception |
50 |
| -handler that prints a stack trace and exits the process immediately. |
| 3 | +== Why automate documentation? |
| 4 | + |
| 5 | +{cpp} API design is challenging. |
| 6 | +When you write a function signature or declare a class, that is the moment when you are likely to have as much information as you will ever have about what it is supposed to do.
| 7 | +The more time passes before you write the documentation, the less likely you are to remember all the details of what motivated the API in the first place. |
| 8 | + |
| 9 | +In other words, because the best and the worst engineers alike are naturally lazy, the *temporal adjacency* of the {cpp} declaration to the documentation improves outcomes.
| 10 | +For this reason, having the documentation be as close as possible to the declaration is ideal. |
| 11 | +That is, the *spatial adjacency* of the {cpp} declaration to the documentation improves outcomes because it facilitates temporal adjacency.
| 12 | + |
| 13 | +In summary, the automated extraction of reference documentation from {cpp} declarations containing attached documentation comments is ideal because: |
| 14 | + |
| 15 | +* Temporally adjacent documentation is more comprehensive |
| 16 | +* Spatially adjacent documentation requires less effort to maintain |
| 17 | +* Causally connected documentation is more accurate
| 18 | + |
| 19 | +From the perspective of engineers, the biggest advantage of automated documentation is that it gives the API a single source of truth at a low cost.
| 20 | +However, {cpp} codebases are notoriously difficult to document automatically because of constructs where the code needs to diverge from the intended API that represents the contract with the user. |
| 21 | + |
| 22 | +Tools like Doxygen typically require workarounds and manual intervention via preprocessor macros to get the documentation right. |
| 23 | +These workarounds are problematic because they effectively mean that there are two versions of the code: the well-formed and the ill-formed versions. |
| 24 | +This subverts the single source of truth for the code.
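As a rough illustration of the kind of workaround described above, the sketch below hides an implementation detail from a documentation tool behind a preprocessor macro. The names `MYLIB_DOCS`, `mylib::detail::range_impl`, and `mylib::make_range` are hypothetical and are not taken from any real library. The assumption is that the macro is defined only when the documentation tool runs, for example through Doxygen's `PREDEFINED` setting.

```cpp
#include <cstddef>

namespace mylib {
namespace detail {

// Hypothetical implementation type the author does not want in the reference.
struct range_impl
{
    std::size_t first;
    std::size_t last;

    constexpr std::size_t size() const { return last - first; }
};

} // namespace detail

/// Return a half-open range [first, last).
#ifdef MYLIB_DOCS
// Seen only by the documentation tool: this declaration is not valid C++.
__implementation_defined__ make_range(std::size_t first, std::size_t last);
#else
// Seen by the compiler: the real declaration and definition.
constexpr detail::range_impl make_range(std::size_t first, std::size_t last)
{
    return detail::range_impl{first, last};
}
#endif

} // namespace mylib
```

The two branches have to be kept in sync by hand, which is exactly the duplication that undermines a single source of truth.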
| 25 | + |
| 26 | +|=== |
| 27 | +| | MrDocs | Automatic | Manual | No Reference
| 28 | + |
| 29 | +| No workarounds | ✅ | ❌ | ✅ | ❌ |
| 30 | +| Nice for users | ✅ | ✅ | ✅ | ❌ |
| 31 | +| Single Source of Truth | ✅ | ❌ | ❌ | ❌ |
| 32 | +| Less Work for Developers | ✅ | ❌ | ❌ | ✅ |
| 33 | +| Always up-to-date | ✅ | ✅ | ❌ | ❌ |
| 34 | +|=== |
| 35 | + |
| 36 | +* When no reference is provided, users are forced to read header files to understand the API.
| 37 | +This is a problem because header files are not always the best source of truth for the API. |
| 38 | +Developers familiar with https://cppreference.com[cppreference.com,window=_blank] will know that the documentation there is often more informative than the header files. |
| 39 | +* A common alternative is to provide a manual reference to the API. |
| 40 | +Developers duplicate the signatures, which requires extra work. |
| 41 | +This strategy tends to work well for small libraries and allows the developer to directly express the contract with the user. |
| 42 | +However, as the single source of truth is lost, it becomes unmanageable as the codebase grows. |
| 43 | +When a declaration changes, developers may forget to update the docs, and the reference becomes out of date.
| 44 | +* Given these problems, the best solution appears to be automating the documentation.
| 45 | +The causal connection between the declaration and the generated documentation improves outcomes. |
| 46 | +However, developers then face difficulties with existing tools like Doxygen, which require workarounds and manual intervention to get the documentation right.
| 47 | +The workarounds mean that there are two versions of the code: the well-formed and the ill-formed versions. |
| 48 | +* The philosophy behind MrDocs is to provide solutions to these workarounds so that the single source of truth can be maintained with minimal effort by developers. |
| 49 | +With MrDocs, the documented {cpp} code should be the same as the code that is compiled.
| 50 | + |
| 51 | +== Customization |
| 52 | + |
| 53 | +Once the documentation is extracted from the code, it is necessary to format it in a way that is useful to the user. |
| 54 | +This usually involves generating output files that match the content of the documentation to the user's needs. |
| 55 | + |
| 56 | +A limitation of existing tools like Doxygen is that their output formats are not very customizable. |
| 57 | +Doxygen supports minor customization of the output style, but custom content formats require much more complex workflows, such as generating XML files and writing secondary applications to process them.
| 58 | + |
| 59 | +MrDocs aims to support multiple output formats and customization options.
| 60 | +In practice, it attempts to provide three levels of customization:
| 61 | + |
| 62 | +* At the first level, the user can use an existing generator and customize its templates and helper functions. |
| 63 | +* At the second level, the user can write a MrDocs plugin that defines a new generator.
| 64 | +* At the third level, the user can use a generator for a structured file format and consume the output in a simpler secondary application or script.
| 65 | + |
| 66 | +These multiple levels of complexity mean developers can worry only about the documentation content. |
| 67 | +In practice, developers rarely need new generators and are usually only interested in changing how an existing generator formats the output. |
| 68 | +Removing the necessity of writing and maintaining a secondary application only to customize the output via XML files can save these developers an immense amount of time. |
| 69 | + |
| 70 | +== {cpp} Constructs |
| 71 | + |
| 72 | +To deal with the complexity of {cpp} constructs, MrDocs uses clang's libtooling API. |
| 73 | +That means MrDocs understands all {cpp}: if clang can compile it, MrDocs knows about it. |
| 74 | + |
| 75 | +Here are a few constructs MrDocs attempts to understand and document automatically from the metadata: |
| 76 | + |
| 77 | +* Overload sets |
| 78 | +* Private APIs |
| 79 | +* Unspecified Return Types |
| 80 | +* Concepts |
| 81 | +* Typedef / Aliases |
| 82 | +* Constants |
| 83 | +* Automatic Related Types |
| 84 | +* Macros |
| 85 | +* SFINAE |
| 86 | +* Hidden Base Classes |
| 87 | +* Hidden EBO |
| 88 | +* Niebloids |
| 89 | +* Coroutines |
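To make one of these constructs concrete, the sketch below shows a SFINAE-constrained function whose compiled signature differs from the interface a user actually cares about. The name `mylib::twice` and its constraint are hypothetical, and the sketch makes no claim about how MrDocs renders such a declaration.

```cpp
#include <type_traits>

namespace mylib {

/// Return twice the value of `x`.
///
/// Participates in overload resolution only if `T` is an integral type.
template <class T,
          typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
constexpr T twice(T x)
{
    // The enable_if default template argument above is only the mechanism
    // that enforces the constraint; it is not part of the intended API.
    return x + x;
}

} // namespace mylib
```

A reader of the reference is interested in `T twice(T x)` and the requirement that `T` be integral, not in the extra template parameter used to express it.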
| 90 | + |