Skip to content
This repository was archived by the owner on Feb 20, 2020. It is now read-only.

Implement 1459203 (remove NSSM), 1436274 (use eventlog) #214

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

milescrabill
Copy link
Contributor

@milescrabill milescrabill commented Jul 8, 2019

There's a ton here. Too much!

Continued from #210

  • adds dependency on gopkg.in/natefinch/lumberjack.v2 for logfile management

  • removes NSSM and modifies docs, userdata in worker_types, etc. as such

  • changes the way the main RunWorker function is called for flexibility, it now accepts a signal chan

  • split out config handling and exit code handling into functions

  • adds debug logic for panic scenario exits including writing out a crash log in scenarios where other logging might be failing

  • support for installing/removing the generic worker service using native go

  • support for specifying configuration location at install time

  • support for windows event log when run as a service

  • logging to logfile by default in windows

  • remove need for a bat script wrapper to change the service working directory, cwd is set to parent dir of worker exe

Usage testing:

  • verify: starting the service in interactive mode
  • verify: starting the service as a service works properly
  • verify: under expected use, can ctrl-c to properly exit from interactive mode
    • caveat: not running as LocalUser sticks us into a loop that cannot be broken by interrupts!

TODO:

  • fix CI
  • actually write tests
  • niceify commits
  • test: install the service, verify the service exists, verify the event log source exists, remove the service
  • test: starting the service without the event log source in non-interactive mode should fail with exit code 78
  • test: adding a broken io.Writer to the logWriter MultiWriter will be caught (worth testing?)
  • should logfile location be configurable? (currently: generic-worker.log, rotated by lumberjack)
  • should logfile management be in scope or is that an infra concern?

Weird things:

  • running in interactive mode prints an extra blank newline in between log entries

@coveralls
Copy link

coveralls commented Jul 8, 2019

Coverage Status

Coverage increased (+0.03%) to 77.126% when pulling 5dff72c on no-more-nssm-merge-20190619 into 6737dbf on master.

Copy link
Member

@petemoore petemoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks awesome, thanks so much for all your hard work, especially your care and attention to detail, ensuring we have some solid tests, and configurable options where they make sense, over hardcoded values.

@@ -225,7 +225,8 @@ func execute(t *testing.T, expectedExitCode ExitCode) {
if err != nil {
t.Fatalf("Test setup failure - could not write to tasks-resolved-count.txt file: %v", err)
}
exitCode := RunWorker()
interruptChan := make(chan os.Signal, 1)
exitCode := RunWorker(interruptChan)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this allow us to Ctrl-C out of tests now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it does, yes. Were we not able to before?

I changed this to match the signature of the function, which I had changed before running the tests (yikes!) :P

simple_docker.go Outdated
func remove(arguments map[string]interface{}) (err error) {
return nil
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the remove target only needed on Windows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I added this to match install() but I don't think either are needed now that we're doing things in platformTargets()

usage_windows.go Outdated
installed under. [default: Generic Worker]`
installed under. [default: Generic Worker]
--working-directory DIRECTORY The working directory the Generic Worker
service will use. [default: C:\Windows\system32]`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the worker writes files to the working directory, can we change the default to e.g. C:\generic-worker? Perhaps the correct drive letter can be determined from the system, in case the operating system isn't installed on C:.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would the behavior be if that directory doesn't exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm going to change the behavior here.

If no --working-directory is specified, we default to the parent directory of the generic-worker executable. If one is specified, we use it.

}
log.Printf("Successfully configured eventlog source %q.", name)

// start service manually in order to fail fast
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we create a machine image with generic-worker on it, we typically want to install it without starting it up (since it is just the machine image that real workers will use, and is not a real worker itself), so I think we don't want to start it up here, otherwise it would run on the machine which is not a worker, but it used to create the machine image for the actual workers.

var logWriter = io.MultiWriter(ioutil.Discard)

func init() {
// TODO make this work with --working-directory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is done now, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This always changes the working directiry to the parent directory of the generic worker binary.

@@ -282,6 +284,8 @@ func removeService(name string) error {
return fmt.Errorf("service %s is not installed", name)
}
defer s.Close()
// throw away this error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note to say why we ignore the error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fails it's essentially because something went wrong interacting with the service, i.e. it's not running, it didn't respond, etc. We don't care about an error in this scenario, we're just trying to delete it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect. Could you include that as a code comment?

// but not actually removed until reboot

// TODO
cleanupService(t, name)
Copy link
Member

@petemoore petemoore Jul 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What made you decide to move cleanupService out of the defer block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commented out blocks below, if uncommented, would happen before a deferred call to cleanupService()

os.Chdir(cwd)
// default to generic-worker executable parent dir
if cwd == "" {
dir := path.Dir(os.Args[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dir := path.Dir(os.Args[0])
dir := filepath.Dir(os.Args[0])

@@ -62,7 +62,7 @@ func platformCommandLineParameters() string {
--service-name SERVICE-NAME The name that the Windows service should be
installed under. [default: Generic Worker]
--working-directory DIRECTORY The working directory the Generic Worker
service will use. [default: C:\Windows\system32]`
service will use.`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we determine the system default at runtime and include it in the usage text?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think defaulting to the parent directory of the generic worker binary is a reasonable default. It's also trivial to implement in a cross platform manner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine, but we should describe what the default is in the docs.

@petemoore petemoore force-pushed the no-more-nssm-merge-20190619 branch from 012d1ce to ef04a9b Compare July 12, 2019 08:09
Miles Crabill added 12 commits July 12, 2019 11:52
build.cmd: use multiuser build tag
keep existing gitattributes directive
add TestRunServiceWithoutEventlogSource, TestRunServiceWithBrokenWriter, use SKIP_ADMINISTRATOR_TESTS env var to skip service install/remove tests
…s locally

rewrite runservice to return ExitCodes properly
use io.MultiWriter() instead of ioutil.Discard
@milescrabill milescrabill force-pushed the no-more-nssm-merge-20190619 branch from 20b84d8 to 518722b Compare July 12, 2019 18:54
@milescrabill milescrabill marked this pull request as ready for review July 16, 2019 14:54
@milescrabill milescrabill changed the title WIP: Implement 1459203 (remove NSSM), 1436274 (use eventlog) Implement 1459203 (remove NSSM), 1436274 (use eventlog) Jul 16, 2019
@milescrabill milescrabill requested a review from a team July 16, 2019 14:54
Copy link
Member

@petemoore petemoore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Miles,

I had a play with the Windows binaries today, and found some issues:

  1. The generic-worker creates a log file even when you run generic-worker --help. The Windows service should direct stderr/stdout to a log file, but when generic-worker is invoked any other way, it should write to stdout/stderr without redirecting them somewhere else.

  2. The remove service subcommand is not documented in usage docs. Please include some help text.

  3. Same for run-service.

  4. Same for the crash file.

  5. When I had a crash, the crash file content said Error loading configuration file %v. I suspect this might just be a log.Print somewhere that should be a log.Printf.

  6. I'm not sure we have a use case for calling run-service from the command line with arguments (such as --config). My understanding was that running generic-worker install service ... would install the worker as a Windows Service, and from that point on, the service is configured, so doesn't need options to be passed to it. The user would just start it or stop it. I thought running generic-worker run-service would be equivalent to running net start "Generic Worker".

    I think we can get away entirely without having a run-service subcommand, because net start "Generic Worker" is already available, and then the Windows service would invoke the run subcommand, running as LocalSystem, and redirect stdout/stderr to the log file. The run subcommand can test to see whether it is being run interactively or not, if it is necessary for switching the logging mode, etc. Alternatively, you could add a --log LOG-FILE option to run that gets passed when invoked as a Windows service.

  7. generic-worker run isn't logging to stdout/stderr when run interactively.

  8. run-service is named inconsistently with remove service and install service which use separate words (rather than a hyphen), but this issue should disappear when we remove run-service and adapt run instead (see point 6).

  9. generic-worker install service is also directed to a log rather than stderr/stdout.

  10. I get a panic running the install step as Administrator. If the panic is due to missing files or anything else, we should try to report that and avoid crashing the worker:

C:\Users\Peter James Moore\Desktop>.\generic-worker-windows-386.exe install service
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x8 pc=0x7a023c]

goroutine 1 [running]:
github.com/taskcluster/generic-worker/vendor/golang.org/x/sys/windows/svc/mgr.(*Service).Start(0x0, 0x1265ef60, 0x3, 0x4, 0x1, 0x0)
        Z:/task_1563291642/gopath1.10.8/src/github.com/taskcluster/generic-worker/vendor/golang.org/x/sys/windows/svc/mgr/service.go:45 +0xcc
main.deployService(0x126378a9, 0x15, 0x124cd32b, 0xe, 0x126ac190, 0x41, 0xaf000000, 0x0, 0x0)
        Z:/task_1563291642/gopath1.10.8/src/github.com/taskcluster/generic-worker/multiuser_service_windows.go:206 +0x334
main.install(0x1265ef40, 0x1265ef40, 0x8f7dda)
        Z:/task_1563291642/gopath1.10.8/src/github.com/taskcluster/generic-worker/multiuser_windows.go:242 +0x25a
main.platformTargets(0x1265ef40, 0x1265ef40)
        Z:/task_1563291642/gopath1.10.8/src/github.com/taskcluster/generic-worker/multiuser_windows.go:428 +0x6db
main.main()
        Z:/task_1563291642/gopath1.10.8/src/github.com/taskcluster/generic-worker/main.go:123 +0x1b5
  1. If the generic-worker binary is being installed to a different location than it currently resides (e.g. if it is downloaded to Downloads or Desktop etc) then the binary should probably be copied to the target destination folder as part of the installation process.

    Admittedly we didn't do this before, so this is minor scope creep. I think it would make sense to tackle this at the same time though. If we don't do this, we should at least highlight in the usage docs, that the Windows service will be configured to use the generic-worker binary from its current location, so the generic-worker binary should be moved to its desired installation location before running the install subcommand.

Thanks for all your hard work so far on this so far. It is looking really good!

@petemoore
Copy link
Member

@milescrabill This has kind of bitrotted. Should we close it, or do you want to return to it?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants