-
-
Notifications
You must be signed in to change notification settings - Fork 644
Enable automatic compiling of imports #7099
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for your pull request, @marler8997! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Some tips to help speed things up:
Bear in mind that large or tricky changes may require multiple rounds of review and revision. Please see CONTRIBUTING.md for more information. Bugzilla referencesYour PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog. |
cc4c374
to
572167b
Compare
src/ddmd/globals.d
Outdated
@@ -74,6 +74,7 @@ struct Param | |||
{ | |||
bool obj; // write object file | |||
bool link; // perform link | |||
bool compileimports; // compile imported modules that have not been passed in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The D modules and the C++ headers need to be kept in sync. You need to update src/ddmd/globals.h
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, thanks for catching that.
Looks like you need to rebase your branch on top of the latest master. Most probably, this refactoring is the cause for the git conflict. I think you need to move the parsing of the cmd-line to the new |
dd91cc1
to
ce11470
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of minor nits. Do you need help adding test cases?
src/ddmd/dmodule.d
Outdated
@@ -381,6 +381,7 @@ extern (C++) final class Module : Package | |||
// i.e. a module that will be taken all the | |||
// way to an object file | |||
Module importedFrom; | |||
bool compiledImport; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A documentation comment wouldn't hurt
src/ddmd/mars.d
Outdated
{ | ||
auto importedModulesToCompile = Modules(); | ||
fprintf(global.stdmsg, "compileimports: start\n"); | ||
for (size_t i = 0; i < Module.amodules.dim; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't be afraid of using D features - the current codebase just doesn't in many places because it was automatically transpiled from C++
foreach(i; 0 .. Module.amodules.dim)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a search in all of ddmd and found like 1 or 2 instances of foreach
which really surprised me. Good to know it's ok to use the D feature in this case.
src/ddmd/mars.d
Outdated
continue; | ||
} | ||
fprintf(global.stdmsg, "compileimports: compiling \"%s\" (file \"%s\" importedFrom \"%s\")\n", | ||
m.toChars(), m.srcfile.toChars, m.importedFrom.toChars()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Phobos the coding style is to avoid unnecessary parentheses, for DMD there isn't a style guide yet, but it's probably a good idea to be consistent within the same line
So please name it |
@wilzbach I was planning on adding test cases when I got some response from a maintainer that this was a desirable feature. If you'd like to help guide/contribute to any cases then advice and PR's are very welcome (Note I'm not that familiar with the test infrastructure for dmd). |
55dec74
to
5730200
Compare
Isn't this what rdmd is for? |
@ibuclaw It's part of what rdmd does (another big thing rdmd does is automatically cache output files, which the compiler does not do). This PR moves that part of rdmd into the compler itself. One big advantage of this is it solves the performance problems we have with rdmd since it currently has to invoke the compiler twice, once to get the imports and once to compile with them. This feature allows rdmd to tell the compiler to just include the missing imports, no need to invoke the compiler again. It also makes alot of other use cases much simpler when building programs without rdmd. |
@andralex and @WalterBright @marler8997 |
I think that the implementation should be part of the driver, not the frontend. All changes should be moved to mars.d |
There must be a clear case for impact, i.e. to what extent this feature is superior and desirable to rdmd. Alternatively, it should be clarified how this tool could simplify rdmd to eliminate redundant logic. As things are, this feature competes with a subset of rdmd while still making it necessary. |
The impact to rdmd is that it doesn't have to invoke the compiler twice. However this feature isn't just about rdmd, it's also about making compilation simpler by invoking the DRY principle. Instead of maintaining a list of modules to compile as well as a list of imports, you use the imports themselves to determine what needs to be compiled. A side effect of this is that rdmd can use this feature instead of invoking the compiler twice but I wouldn't say that is the primary benefit. |
How long does the first invocation take?
That's also the charter of rdmd. |
For a single file, it's about 40% of the runtime.
So while this PR isn't ready for prime time yet, I do think that the idea itself is a worthwhile goal and could become quite powerful due to the performance savings. CCing the people who have been involved in the last discussion: @CyberShadow @AndrejMitrovic @aG0aep6G @mihails-strasuns |
AFAIR main purpose of rdmd was to support efficient shebang script usage (with caching resulting binary instead of recompilation each time it used). Compilation of dependencies came as a necessary requirement to make it work, not as primary functionality. I agree that it makes more sense to put it in compiler while focusing rdmd on its original purpose. |
5730200
to
c04e1c5
Compare
Thanks for working on this, @marler8997! I agree that this is a logical change that should have been made long ago, and it will improve the quality of life for all D developers not using Dub to build their code. One point in previous discussions is that there should be some way to specify exclusions, for packages that are going to be linked in as libraries and don't need to be complied again. This applies to Phobos/Druntime ( |
Disagree. The addition of this option has the potential to obsolete rdmd, and it is usable directly as well as by other build tools, so a short switch would be a boon for this functionality. |
Thanks for chiming in and showing support. I had the same thoughts about exclusions. Another thought I had was when you enable "compile imports", the compiler could read the object/library files and figure out which modules they include based on the existence of a particular symbol. I believe there is a special symbol that each module generates, something like "ModuleInfo". This would add overhead to the compiler to read those files, not sure how much of an impact it would have on performance, but this is the most robus solution I've come up with. |
Wouldn't this mean that the compiler would need to read and parse the object/library files, rather than just passing them to the linker? Sounds like a lot of work. |
On the high level aspect, I'm not particularly a big fan. I really don't like the idea of conflating compiler and build system. On technical, fine. Its isolated enough, though I have alluded to |
c630799
to
4eec8ca
Compare
(closed & re-opened, s.t. the newly added SemaphoreCI can find this PR) |
The approach to push more modules to semantic3 might easily introduce subtle bugs, e.g. by breaking assumptions about a fixed set of root modules or by missing semantic steps. |
Most of the work done for this feature was in the dmd test suite. The test suite was modified to allow a test to be run twice, with and without Furthermore, the list of root modules is a local variable in Also, performing extra semantic analysis on more modules than just the initial ones passed on the command line is already being done when the |
// array of module patterns used to include/exclude imported modules | ||
private __gshared Array!(const(char)*) includeModulePatterns; | ||
private __gshared Modules compiledImports; | ||
private extern(C++) bool marsOnImport(Module m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we approving things with no Ddoc function headers?
@@ -2125,3 +2168,278 @@ private bool parseCommandLine(const ref Strings arguments, const size_t argc, re | |||
return errors; | |||
} | |||
|
|||
|
|||
private __gshared bool includeImports = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This random assortment of undocumented globals is more technical debt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This boolean
and array represent all the data we need to know "compile imports" is enabled and all the patterns that were given. Are you wanting them to be put in a struct as static variables? What's the "debt" you're seeing that should be alleviated?
@@ -2125,3 +2168,278 @@ private bool parseCommandLine(const ref Strings arguments, const size_t argc, re | |||
return errors; | |||
} | |||
|
|||
|
|||
private __gshared bool includeImports = false; | |||
// array of module patterns used to include/exclude imported modules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what a "module pattern" is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A module pattern is a sequence of identifiers that are matched against the first part of a any module name. I tried to clearly define it in the associated documentation https://dlang.org/dmd-windows.html#switch-i[
if (index < packages.dim) | ||
return (*packages)[index]; | ||
if (index == packages.dim) | ||
return name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's going on here? There's no mention of this in the comments.
|
||
/* | ||
* Determines if the given module should be included in the compilation. | ||
* Returns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can a ModuleComponentRange
be considered a "given module" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No Params: section.
} | ||
|
||
// Matching module names is done with an array of matcher nodes. | ||
// The nodes are sorted by "component depth" from largest to smallest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a "component depth"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "depth" is the number of components in a module name and/or pattern (i.e. foo.bar.baz
has 3 components, so a depth of 3). It's been so long I'm not sure why the term "depth" was chosen.
struct | ||
{ | ||
ushort depth; | ||
bool isExclude; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does isExclude
for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isExclude
indicates whether the pattern is an "exclusive" (i.e. -i=-foo.bar
) pattern, vs an "inclusive" one (i.e. -i=foo.bar
). The documentation will explain better, but in essence, patterns are prioritized by their component length and are either "inclusive" or "exclusive". Say you want to include all modules from package foo except for all modules in the foo.bar
sub-path.
dmd -i=foo -i=-foo.bar
Then foo
would be an inclusive pattern and -i=-foo.bar
would be an exclusive pattern. Since the algorithm works by sorting these "module patterns" by their component length, the "foo.bar" pattern will be checked first, giving the desired effect.
If you want the opposite, which is to include all modules in "foo.bar", you can simply give
dmd -i=foo.bar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code, I guess you were probably more wondering what the MatcherNode
is used for. The full pattern-matching algorithm uses an array of matcher nodes. This array of matcher nodes is a representation of all the patterns given to the compiler. Each "pattern" has a MatcherNode
that indicates how many components it has and whether it is an exclude pattern (via isExclude
) followed by a number of "Identifier" MatcherNode
's which provide the identifiers for the pattern.
For example, the following "-i=-foo.bar -i=foo" would become:
Index | Node |
---|---|
0 | MatcherNode(depth=2, isExclude=true) |
1 | MatcherNode(Id.foo) |
2 | MatcherNode(Id.bar) |
3 | MatcherNode(depth=1, isExclude=false) |
1 | MatcherNode(Id.foo) |
However I think you're main point is that this needs to be documented?
|
||
/* | ||
* $(D includeByDefault) determines whether to include/exclude modules when they don't | ||
* match any pattern. This setting changes depending on if the user provided any "inclusive" module |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is meant by "pattern"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://dlang.org/dmd-windows.html#switch-i[
This behavior can be overriden by providing patterns via -i=. A pattern of the form -i= is an "inclusive pattern", whereas a pattern of the form -i=- is an "exclusive pattern". Inclusive patterns will include all module's whose names match the pattern, whereas exclusive patterns will exclude them. For example. all modules in the package foo.bar can be included using -i=foo.bar or excluded using -i=-foo.bar. Note that each component of the fully qualified name must match the pattern completely, so the pattern foo.bar would not match a module named foo.barx.
@WalterBright |
if (index == packages.dim) | ||
return name; | ||
else | ||
return Identifier.idPool("package"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably just return Id.package
, I'd be surprised if package
wasn't already a pre-allocated identifier.
Does this feature support caching of binaries indexed through hashing of inputs and all its (transitive) dependencies like |
No it doesn't do caching and there are no plans to add a build system to the compiler. |
An implementation of an idea I had (forum discussion here: http://forum.dlang.org/thread/tcrdpvqvwxffnewzohuj@forum.dlang.org)
I added a compiler option
-ci
which stands for "compile imports".When this option is enabled, the compiler will automatically treat imported modules as if they were specified on the command line so long as it doesn't think they exist in any library. For now it uses the same logic as rdmd to determine whether a module is in a library (see the
inALibrary
function), which is not full proof, but works well enough for a prototype. With this change you should now be able to omit all the imported module files from the command line, i.e.Instead of:
Use:
Should this feature be desirable, I recommend coming up with a more deterministic way of determining what modules exist in object/library files.
Also note that this feature would simplify rdmd and remove the need for it to invoke the compiler twice. This would fix the current performance problem with rdmd and would have even better performance than the previous solution (dlang/tools#194).