docs: design notes includes rationale and philosophy

alandefreitas · alandefreitas · commit 608a9259761c · 2024-06-11T20:41:36.000-03:00
fix #154, #615
diff --git a/docs/modules/ROOT/pages/design-notes.adoc b/docs/modules/ROOT/pages/design-notes.adoc
@@ -1,50 +1,90 @@
 = Design Notes
 
-== AST Traversal
-
-During the AST traversal stage, the complete AST (generated by the clang frontend) 
-is walked beginning with the root `TranslationUnitDecl` node. It is during this
-stage that USRs (universal symbol references) are generated and hashed with SHA1
-to form the 160 bit `SymbolID` for an entity. With the exception of built-in types,
-*all* entities referenced in the corpus will be traversed and be assigned a `SymbolID`;
-including those from the standard library. This is necessary to generate the
-full interface for user-defined types.
-
-== Bitcode
-
-AST traversal is performed in parallel on a per-translation-unit basis.
-To maximize the size of the code base MrDocs is capable of processing, `Info`
-types generated during traversal are serialized to a compressed bitcode representation.
-Once AST traversal is complete for all translation units, the bitcode is deserialized
-back into `Info` types, and then merged to form the corpus. The merging step is necessar
- as there may be multiple identical definitions of the same entity (e.g. for class types,
- templates, inline functions, etc), as well as functions declared in one translation
- unit & defined in another.
-
-== The Corpus
-
-After AST traversal and `Info` merging, the result is stored as a map of `Info`s
-indexed by their respective `SymbolID`s. Documentation generators may traverse
-this structure by calling `Corpus::traverse` with a `Corpus::Visitor` derived
-visitor and the `SymbolID` of the entity to visit (e.g. the global namespace).
-
-== Namespaces
-
-Namespaces do not have a source location.
-This is because there can be many namespaces.
-We probably don't want to store any javadocs for namespaces either.
-
-== Paths
-
-The AST visitor and metadata all use forward slashes to represent file
-pathnames, even on Windows. This is so the generated reference documentation
-does not vary based on the platform.
-
-== Exceptions
-
-Errors thrown by the program should always have type `Exception`. Objects
-of this type are capable of transporting an `Error` object. This is important
-for the scripting to work; exceptions are used to propagate errors from
-library code to scripts and back to the invoking code. For exceptional cases,
-these thrown exceptions should be uncaught. The tool installs an uncaught exception
-handler that prints a stack trace and exits the process immediately.
+== Why automate documentation?
+
+{cpp} API design is challenging.
+When you write a function signature, or declare a class, it is at that moment when you are likely to have as much information as you will ever have about what it is supposed to do.
+The more time passes before you write the documentation, the less likely you are to remember all the details of what motivated the API in the first place.
+
+In other words, because best and worst engineers are naturally lazy, the *temporal adjacency* of the {cpp} declaration to the documentation improves outcomes.
+For this reason, having the documentation be as close as possible to the declaration is ideal.
+That is, the *spatial adjacency* of the C++ declaration to the documentation improves outcomes because it facilitates temporal adjacency.
+
+In summary, the automated extraction of reference documentation from {cpp} declarations containing attached documentation comments is ideal because:
+
+* Temporally adjacent documentation is more comprehensive
+* Spatially adjacent documentation requires less effort to maintain
+* And causally connected documentation is more accurate
+
+From the perspective of engineers, however, the biggest advantage of automated documentation is that it implies a single source of truth for the API at a low cost.
+However, {cpp} codebases are notoriously difficult to document automatically because of constructs where the code needs to diverge from the intended API that represents the contract with the user.
+
+Tools like Doxygen typically require workarounds and manual intervention via preprocessor macros to get the documentation right.
+These workarounds are problematic because they effectively mean that there are two versions of the code: the well-formed and the ill-formed versions.
+This subverts the single sources of truth for the code.
+
+|===
+|  | Mr. Docs | Automatic | Manual | No Reference
+
+| No workarounds | ✅ | ❌ | ✅ | ❌
+| Nice for users | ✅ | ✅ | ✅ | ❌
+| Single Source of Truth | ✅ | ❌ | ❌ | ❌
+| Less Work for Developers | ✅ | ❌ | ❌ | ✅
+| Always up-to-date | ✅ | ✅ | ❌ | ❌
+|===
+
+* By providing no reference to users, they are forced to read header files to understand the API.
+This is a problem because header files are not always the best source of truth for the API.
+Developers familiar with https://cppreference.com[cppreference.com,window=_blank] will know that the documentation there is often more informative than the header files.
+* A common alternative is to provide a manual reference to the API.
+Developers duplicate the signatures, which requires extra work.
+This strategy tends to work well for small libraries and allows the developer to directly express the contract with the user.
+However, as the single source of truth is lost, it becomes unmanageable as the codebase grows.
+When the declaration changes, they forget to edit the docs, and the reference becomes out of date.
+* In this case, it looks like the best solution is to automate the documentation.
+The causal connection between the declaration and the generated documentation improves outcomes.
+That's when developers face difficulties with existing tools like Doxygen, which require workarounds and manual intervention to get the documentation right.
+The workarounds mean that there are two versions of the code: the well-formed and the ill-formed versions.
+* The philosophy behind MrDocs is to provide solutions to these workarounds so that the single source of truth can be maintained with minimal effort by developers.
+With Mr.Docs, the documented {cpp} code should be the same as the code that is compiled.
+
+== Customization
+
+Once the documentation is extracted from the code, it is necessary to format it in a way that is useful to the user.
+This usually involves generating output files that match the content of the documentation to the user's needs.
+
+A limitation of existing tools like Doxygen is that their output formats are not very customizable.
+It supports minor customization in the output style and, for custom content formats, it requires much more complex workflows, like generating XML files and writing secondary applications to process them.
+
+MrDocs attempts to support multiple output formats and customization options.
+In practice, MrDocs attempts to provide three levels of customization:
+
+* At the first level, the user can use an existing generator and customize its templates and helper functions.
+* The user can write a MrDocs plugin at a second level that defines a new generator.
+* At a third level, it can use a generator for a structured file format and consume the output in a more uncomplicated secondary application or script.
+
+These multiple levels of complexity mean developers can worry only about the documentation content.
+In practice, developers rarely need new generators and are usually only interested in changing how an existing generator formats the output.
+Removing the necessity of writing and maintaining a secondary application only to customize the output via XML files can save these developers an immense amount of time.
+
+== {cpp} Constructs
+
+To deal with the complexity of {cpp} constructs, MrDocs uses clang's libtooling API.
+That means MrDocs understands all {cpp}: if clang can compile it, MrDocs knows about it.
+
+Here are a few constructs MrDocs attempts to understand and document automatically from the metadata:
+
+* Overload sets
+* Private APIs
+* Unspecified Return Types
+* Concepts
+* Typedef / Aliases
+* Constants
+* Automatic Related Types
+* Macros
+* SFINAE
+* Hidden Base Classes
+* Hidden EBO
+* Niebloids
+* Coroutines
+