Skip to content

Enable automatic compiling of imports #7099

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 15, 2018

Conversation

marler8997
Copy link
Contributor

@marler8997 marler8997 commented Aug 25, 2017

An implementation of an idea I had (forum discussion here: http://forum.dlang.org/thread/tcrdpvqvwxffnewzohuj@forum.dlang.org)

I added a compiler option -ci which stands for "compile imports".

When this option is enabled, the compiler will automatically treat imported modules as if they were specified on the command line so long as it doesn't think they exist in any library. For now it uses the same logic as rdmd to determine whether a module is in a library (see the inALibrary function), which is not full proof, but works well enough for a prototype. With this change you should now be able to omit all the imported module files from the command line, i.e.

Instead of:

dmd prog.d -Isomelib somelib\foo\module1.d somelib\foo\module2.d somelib\foo\module3.d somelib\foo\module4.d somelib\foo\module5.d somelib\foo\module6.d -Ianotherlib anotherlib\bar\module1.d anotherlib\bar\module2.d anotherlib\bar\module3.d anotherlib\bar\module4.d anotherlib\bar\module5.d

Use:

dmd -ci prog.d -Isomelib -Ianotherlib

Should this feature be desirable, I recommend coming up with a more deterministic way of determining what modules exist in object/library files.

Also note that this feature would simplify rdmd and remove the need for it to invoke the compiler twice. This would fix the current performance problem with rdmd and would have even better performance than the previous solution (dlang/tools#194).

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @marler8997! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.

Some tips to help speed things up:

  • smaller, focused PRs are easier to review than big ones

  • try not to mix up refactoring or style changes with bug fixes or feature enhancements

  • provide helpful commit messages explaining the rationale behind each change

Bear in mind that large or tricky changes may require multiple rounds of review and revision.

Please see CONTRIBUTING.md for more information.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

@@ -74,6 +74,7 @@ struct Param
{
bool obj; // write object file
bool link; // perform link
bool compileimports; // compile imported modules that have not been passed in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The D modules and the C++ headers need to be kept in sync. You need to update src/ddmd/globals.h too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks for catching that.

@PetarKirov
Copy link
Member

PetarKirov commented Aug 25, 2017

This branch has conflicts that must be resolved

Looks like you need to rebase your branch on top of the latest master. Most probably, this refactoring is the cause for the git conflict. I think you need to move the parsing of the cmd-line to the new parseCommandLine function.

@marler8997 marler8997 force-pushed the compileimports branch 2 times, most recently from dd91cc1 to ce11470 Compare August 25, 2017 13:36
Copy link
Member

@wilzbach wilzbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of minor nits. Do you need help adding test cases?

@@ -381,6 +381,7 @@ extern (C++) final class Module : Package
// i.e. a module that will be taken all the
// way to an object file
Module importedFrom;
bool compiledImport;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A documentation comment wouldn't hurt

src/ddmd/mars.d Outdated
{
auto importedModulesToCompile = Modules();
fprintf(global.stdmsg, "compileimports: start\n");
for (size_t i = 0; i < Module.amodules.dim; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't be afraid of using D features - the current codebase just doesn't in many places because it was automatically transpiled from C++

foreach(i; 0 .. Module.amodules.dim)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a search in all of ddmd and found like 1 or 2 instances of foreach which really surprised me. Good to know it's ok to use the D feature in this case.

src/ddmd/mars.d Outdated
continue;
}
fprintf(global.stdmsg, "compileimports: compiling \"%s\" (file \"%s\" importedFrom \"%s\")\n",
m.toChars(), m.srcfile.toChars, m.importedFrom.toChars());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Phobos the coding style is to avoid unnecessary parentheses, for DMD there isn't a style guide yet, but it's probably a good idea to be consistent within the same line

@JohanEngelen
Copy link
Contributor

I added a compiler option -ci which stands for "compile imports".

So please name it -compile-imports.

@marler8997
Copy link
Contributor Author

@wilzbach I was planning on adding test cases when I got some response from a maintainer that this was a desirable feature. If you'd like to help guide/contribute to any cases then advice and PR's are very welcome (Note I'm not that familiar with the test infrastructure for dmd).

@marler8997 marler8997 force-pushed the compileimports branch 5 times, most recently from 55dec74 to 5730200 Compare August 26, 2017 13:50
@ibuclaw
Copy link
Member

ibuclaw commented Sep 28, 2017

Isn't this what rdmd is for?

@marler8997
Copy link
Contributor Author

@ibuclaw It's part of what rdmd does (another big thing rdmd does is automatically cache output files, which the compiler does not do). This PR moves that part of rdmd into the compler itself. One big advantage of this is it solves the performance problems we have with rdmd since it currently has to invoke the compiler twice, once to get the imports and once to compile with them. This feature allows rdmd to tell the compiler to just include the missing imports, no need to invoke the compiler again. It also makes alot of other use cases much simpler when building programs without rdmd.

@JinShil
Copy link
Contributor

JinShil commented Dec 2, 2017

@andralex and @WalterBright
I think we can carry something like this across the finish line, but we need to hear from you if it is something you want to have and support. Are DIPs required for simple things like this (additional compiler switches)? This isn't really a language change.

@marler8997
A thorough rationale statement would help @andralex and @WalterBright make a decision.

@ibuclaw
Copy link
Member

ibuclaw commented Dec 2, 2017

I think that the implementation should be part of the driver, not the frontend. All changes should be moved to mars.d

@andralex
Copy link
Member

andralex commented Dec 2, 2017

There must be a clear case for impact, i.e. to what extent this feature is superior and desirable to rdmd. Alternatively, it should be clarified how this tool could simplify rdmd to eliminate redundant logic. As things are, this feature competes with a subset of rdmd while still making it necessary.

@marler8997
Copy link
Contributor Author

@andralex

There must be a clear case for impact, i.e. to what extent this feature is superior and desirable to rdmd.

The impact to rdmd is that it doesn't have to invoke the compiler twice. However this feature isn't just about rdmd, it's also about making compilation simpler by invoking the DRY principle. Instead of maintaining a list of modules to compile as well as a list of imports, you use the imports themselves to determine what needs to be compiled. A side effect of this is that rdmd can use this feature instead of invoking the compiler twice but I wouldn't say that is the primary benefit.

@andralex
Copy link
Member

andralex commented Dec 2, 2017

The impact to rdmd is that it doesn't have to invoke the compiler twice.

How long does the first invocation take?

However this feature isn't just about rdmd, it's also about making compilation simpler by invoking the DRY principle.

That's also the charter of rdmd.

@wilzbach
Copy link
Member

wilzbach commented Dec 2, 2017

How long does the first invocation take?

For a single file, it's about 40% of the runtime.
We already had this discussion last summer (see e.g. dlang/tools#191 or dlang/tools#194). In fact, at that point we merged a PR that avoids the double invocation for the trivial case of having no dependencies (this PR was reverted later due to a regression and no one having time to look into it). Anyhow to quote from the last discussion:

The correct solution would be to move rdmd into dmd. IIRC Andrej attempted this a while ago, but hit a snag.

So while this PR isn't ready for prime time yet, I do think that the idea itself is a worthwhile goal and could become quite powerful due to the performance savings.

CCing the people who have been involved in the last discussion: @CyberShadow @AndrejMitrovic @aG0aep6G @mihails-strasuns

@mihails-strasuns
Copy link

AFAIR main purpose of rdmd was to support efficient shebang script usage (with caching resulting binary instead of recompilation each time it used). Compilation of dependencies came as a necessary requirement to make it work, not as primary functionality. I agree that it makes more sense to put it in compiler while focusing rdmd on its original purpose.

@CyberShadow
Copy link
Member

Thanks for working on this, @marler8997! I agree that this is a logical change that should have been made long ago, and it will improve the quality of life for all D developers not using Dub to build their code.

One point in previous discussions is that there should be some way to specify exclusions, for packages that are going to be linked in as libraries and don't need to be complied again. This applies to Phobos/Druntime (std.*, core.* and etc.*), but ought to be customizable.

@CyberShadow
Copy link
Member

So please name it -compile-imports.

Disagree. The addition of this option has the potential to obsolete rdmd, and it is usable directly as well as by other build tools, so a short switch would be a boon for this functionality.

@marler8997
Copy link
Contributor Author

@CyberShadow

Thanks for chiming in and showing support. I had the same thoughts about exclusions. Another thought I had was when you enable "compile imports", the compiler could read the object/library files and figure out which modules they include based on the existence of a particular symbol. I believe there is a special symbol that each module generates, something like "ModuleInfo". This would add overhead to the compiler to read those files, not sure how much of an impact it would have on performance, but this is the most robus solution I've come up with.

@CyberShadow
Copy link
Member

This would add overhead to the compiler to read those files, not sure how much of an impact it would have on performance, but this is the most robus solution I've come up with.

Wouldn't this mean that the compiler would need to read and parse the object/library files, rather than just passing them to the linker? Sounds like a lot of work.

@ibuclaw
Copy link
Member

ibuclaw commented Jan 14, 2018

On the high level aspect, I'm not particularly a big fan. I really don't like the idea of conflating compiler and build system.

On technical, fine. Its isolated enough, though I have alluded to Compiler.loadModule a couple of times. However there are things that need to be sorted out before that callback is usable, so there's no reason to block on that (just an FYI as a potential future refactor).

@wilzbach
Copy link
Member

(closed & re-opened, s.t. the newly added SemaphoreCI can find this PR)

@MartinNowak
Copy link
Member

The approach to push more modules to semantic3 might easily introduce subtle bugs, e.g. by breaking assumptions about a fixed set of root modules or by missing semantic steps.
Let's try to test this feature enough so that we can find those before releasing it.

@marler8997
Copy link
Contributor Author

Most of the work done for this feature was in the dmd test suite. The test suite was modified to allow a test to be run twice, with and without -i. That being said, I have set the "isRoot" property before any other code checks it. It's done when constructing the module, just after it is parsed. It was also important to do this before any semantic analysis was done as I discovered that some of the semantic analysis runs differently depending on if the module is a root module.

Furthermore, the list of root modules is a local variable in mars.d. Every usage of it is in that file so checking whether or not someone could be making assumptions about it is limited to verifying mars.d which I carefully attempted to do.

Also, performing extra semantic analysis on more modules than just the initial ones passed on the command line is already being done when the -deps flag is enabled. It doesn't add new root modules but it does cover some of that test surface.

@marler8997 marler8997 deleted the compileimports branch January 27, 2018 09:54
// array of module patterns used to include/exclude imported modules
private __gshared Array!(const(char)*) includeModulePatterns;
private __gshared Modules compiledImports;
private extern(C++) bool marsOnImport(Module m)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we approving things with no Ddoc function headers?

@@ -2125,3 +2168,278 @@ private bool parseCommandLine(const ref Strings arguments, const size_t argc, re
return errors;
}


private __gshared bool includeImports = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This random assortment of undocumented globals is more technical debt.

Copy link
Contributor Author

@marler8997 marler8997 Sep 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This boolean and array represent all the data we need to know "compile imports" is enabled and all the patterns that were given. Are you wanting them to be put in a struct as static variables? What's the "debt" you're seeing that should be alleviated?

@@ -2125,3 +2168,278 @@ private bool parseCommandLine(const ref Strings arguments, const size_t argc, re
return errors;
}


private __gshared bool includeImports = false;
// array of module patterns used to include/exclude imported modules
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what a "module pattern" is.

Copy link
Contributor Author

@marler8997 marler8997 Sep 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A module pattern is a sequence of identifiers that are matched against the first part of a any module name. I tried to clearly define it in the associated documentation https://dlang.org/dmd-windows.html#switch-i[

if (index < packages.dim)
return (*packages)[index];
if (index == packages.dim)
return name;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's going on here? There's no mention of this in the comments.


/*
* Determines if the given module should be included in the compilation.
* Returns:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can a ModuleComponentRange be considered a "given module" ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No Params: section.

}

// Matching module names is done with an array of matcher nodes.
// The nodes are sorted by "component depth" from largest to smallest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a "component depth"?

Copy link
Contributor Author

@marler8997 marler8997 Sep 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "depth" is the number of components in a module name and/or pattern (i.e. foo.bar.baz has 3 components, so a depth of 3). It's been so long I'm not sure why the term "depth" was chosen.

struct
{
ushort depth;
bool isExclude;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does isExclude for?

Copy link
Contributor Author

@marler8997 marler8997 Sep 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isExclude indicates whether the pattern is an "exclusive" (i.e. -i=-foo.bar) pattern, vs an "inclusive" one (i.e. -i=foo.bar). The documentation will explain better, but in essence, patterns are prioritized by their component length and are either "inclusive" or "exclusive". Say you want to include all modules from package foo except for all modules in the foo.bar sub-path.

dmd -i=foo -i=-foo.bar

Then foo would be an inclusive pattern and -i=-foo.bar would be an exclusive pattern. Since the algorithm works by sorting these "module patterns" by their component length, the "foo.bar" pattern will be checked first, giving the desired effect.

If you want the opposite, which is to include all modules in "foo.bar", you can simply give

dmd -i=foo.bar

Copy link
Contributor Author

@marler8997 marler8997 Sep 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code, I guess you were probably more wondering what the MatcherNode is used for. The full pattern-matching algorithm uses an array of matcher nodes. This array of matcher nodes is a representation of all the patterns given to the compiler. Each "pattern" has a MatcherNode that indicates how many components it has and whether it is an exclude pattern (via isExclude) followed by a number of "Identifier" MatcherNode's which provide the identifiers for the pattern.

For example, the following "-i=-foo.bar -i=foo" would become:

Index Node
0 MatcherNode(depth=2, isExclude=true)
1 MatcherNode(Id.foo)
2 MatcherNode(Id.bar)
3 MatcherNode(depth=1, isExclude=false)
1 MatcherNode(Id.foo)

However I think you're main point is that this needs to be documented?


/*
* $(D includeByDefault) determines whether to include/exclude modules when they don't
* match any pattern. This setting changes depending on if the user provided any "inclusive" module
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is meant by "pattern"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://dlang.org/dmd-windows.html#switch-i[

This behavior can be overriden by providing patterns via -i=. A pattern of the form -i= is an "inclusive pattern", whereas a pattern of the form -i=- is an "exclusive pattern". Inclusive patterns will include all module's whose names match the pattern, whereas exclusive patterns will exclude them. For example. all modules in the package foo.bar can be included using -i=foo.bar or excluded using -i=-foo.bar. Note that each component of the fully qualified name must match the pattern completely, so the pattern foo.bar would not match a module named foo.barx.

@marler8997
Copy link
Contributor Author

@WalterBright
It sounds like most of your comments are about lack of comments. If you like I can go through this code and create a PR to fill out any missing comments?

if (index == packages.dim)
return name;
else
return Identifier.idPool("package");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably just return Id.package, I'd be surprised if package wasn't already a pre-allocated identifier.

@nordlow
Copy link
Contributor

nordlow commented Jun 16, 2020

Does this feature support caching of binaries indexed through hashing of inputs and all its (transitive) dependencies like rdmd does? If not are there any plans for doing this?

@wilzbach
Copy link
Member

No it doesn't do caching and there are no plans to add a build system to the compiler.
If you want a better rdmd, you should have a look at https://github.com/dragon-lang/rund

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.