
Enable training with Tensorboard tracking #209

Merged
merged 13 commits into from
Jan 24, 2023

Conversation

marcobellagente93
Contributor

@marcobellagente93 marcobellagente93 commented Jan 22, 2023

Currently only wandb logging is supported, and using tensorboard results in a number of small blockers.

This small PR removes the following blockers:

  • a logging_dir is added to the training config, to be used by trackers that need a local folder (e.g. tensorboard)
  • an assertion is placed before calling init_trackers to check the specified tracker
  • a flattened config is used for Tensorboard logging
  • optional slight renaming (trackers -> tracker) and an interface change assuming a single tracker is used
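The changes listed above could be sketched roughly as follows. This is a hypothetical outline based on the PR description, not the exact trlx code: the config field names and the set of supported trackers are assumptions.

```python
# Hypothetical sketch of the PR's changes (field names are assumptions).
SUPPORTED_TRACKERS = ("wandb", "tensorboard", None)

def init_tracking(config):
    """Validate the single configured tracker and build its init kwargs."""
    tracker = config["train"]["tracker"]  # renamed from `trackers` (plural)

    # Assertion placed before calling init_trackers, as the PR describes
    assert tracker in SUPPORTED_TRACKERS, f"Unsupported tracker: {tracker}"

    init_kwargs = {}
    if tracker == "tensorboard":
        # Trackers that write locally need a folder (the new logging_dir)
        init_kwargs["logging_dir"] = config["train"]["logging_dir"]
    return tracker, init_kwargs
```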

Disclaimers

CC @daia99

@daia99
Contributor

daia99 commented Jan 22, 2023

We may want to remove the tensorboard files for the PR!

@marcobellagente93
Contributor Author

marcobellagente93 commented Jan 22, 2023

> We may want to remove the tensorboard files for the PR!

Whoops, got it.

@@ -78,19 +78,32 @@ def __init__(self, config, **kwargs):
dist_config = get_distributed_config(self.accelerator)
config_dict["distributed"] = dist_config
init_trackers_kwargs = {}
if "wandb" in config.train.trackers:
# HACK: Tensorboard doesn't like nested dict as hyperparams
config_dict_flat = {a:b for (k,v) in config_dict.items() for (a,b) in v.items() if not isinstance(b, dict)}
Collaborator


You can use `trlx.utils.modeling.flatten_dict` here

Contributor Author


Thanks for the suggestion! I replaced the dict comprehension with a call to flatten_dict(); since tensorboard also doesn't like lists, I added a couple of lines to split the optimizer betas.
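A minimal sketch of what that flattening does: the `flatten_dict` below mirrors the idea of `trlx.utils.modeling.flatten_dict` but is written here from scratch, and the `optimizer/betas` key name in `split_betas` is an assumption based on the comment above.

```python
def flatten_dict(d, parent_key="", sep="/"):
    """Recursively flatten a nested dict into a single level,
    joining keys with `sep` (Tensorboard rejects nested hparams)."""
    items = {}
    for k, v in d.items():
        key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.update(flatten_dict(v, key, sep))
        else:
            items[key] = v
    return items

def split_betas(flat):
    """Tensorboard also rejects lists, so split the optimizer betas
    into scalar entries (the key name here is hypothetical)."""
    out = dict(flat)
    betas = out.pop("optimizer/betas", None)
    if betas is not None:
        for i, b in enumerate(betas):
            out[f"optimizer/beta{i}"] = b
    return out
```

For example, `flatten_dict({"train": {"lr": 0.1}})` yields `{"train/lr": 0.1}`, which Tensorboard's hparams logging can accept.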

config=config_dict,
init_kwargs=init_trackers_kwargs,
)
else:
Collaborator

@cat-state cat-state Jan 23, 2023


Thanks!
Could you add back a comment explaining what this branch is for and the flattening? Aside from that it LGTM

Contributor Author


Sure!

  • I had run into problems running trlx without wandb (I don't have an account as of now), and found an already open issue on precisely this. The branch has minor modifications (which don't change the previous interface for wandb users) to allow tensorboard tracking.
  • The only tricky part is that wandb is pretty fancy and takes nested dicts as logging params; this is not the case for tensorboard, so the experiment config is fully flattened, and the only list is simply split apart (for the same reason).
    Do let me know if anything is unclear or if I should add comments in the tensorboard-specific logging.

Collaborator


Oh sorry, by branch I meant the else branch — i.e. a short comment like
`else: # tracker == 'tensorboard'` and `# flatten config for tensorboard, flatten lists in hparams into flattened config`

Contributor Author


Got it! I was reading the comment in between my regular work and got fully confused. Thanks for the feedback!

@cat-state
Collaborator

Thanks for contributing! This LGTM now.

@cat-state cat-state merged commit 82435d8 into CarperAI:main Jan 24, 2023
alan-cooney added a commit to skyhookadventure/trlx that referenced this pull request Jan 25, 2023
This bug was introduced by CarperAI#209, which changed the `trackers` config property to `tracker` but didn't update this use case.
jon-tow pushed a commit that referenced this pull request Jan 25, 2023
This bug was introduced by #209, which changed the `trackers` config property to `tracker` but didn't update this use case.
@marcobellagente93 marcobellagente93 deleted the enable-tensorboard branch January 29, 2023 20:46