[MLA-1540] Training Analytics #4780

Merged
merged 29 commits into from
Jan 28, 2021

Conversation

Contributor

@chriselion chriselion commented Dec 19, 2020

Proposed change(s)

Adds analytics events for training. We will send three events:

  • One with info collected on the Unity side for each Behavior being trained, such as its ActionSpec and ObservationSpecs.
  • One with info collected on the Python side for each Behavior being trained, such as the trainer type (PPO/SAC) and whether a curriculum is used.
  • One with "global" info collected on the Python side, such as the Python version and torch version.

The reason for splitting the two per-behavior events is so that we can backport the first to the verified branch and get some data from there too (without needing to rely on trainer upgrades).

Data is sent from Python to the Unity executable; from there, we'll send it through EditorAnalytics so that the Unity Privacy Policy applies.

Example events

These are from one of the behaviors from the WallJump scene with curriculum.

ml_agents_training_environment_initialized

{
    "TrainingSessionGuid": "6a709347-d170-4b6d-90b9-f74f76c9df27",
    "TrainerPythonVersion": "3.8.7",
    "MLAgentsVersion": "0.24.0.dev0",
    "MLAgentsEnvsVersion": "0.24.0.dev0",
    "TorchVersion": "1.7.1",
    "TorchDeviceType": "cpu",
    "NumEnvironments": 1,
    "NumEnvironmentParameters": 2
}
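
As a rough illustration of how the "global" event above could be assembled on the Python side, here is a minimal sketch. The field names are taken from the example event; the helper name `build_environment_initialized_event` and its parameters are hypothetical and are not the code in this PR. Version strings are passed in rather than imported so the sketch runs without torch or mlagents installed.

```python
import platform
import uuid
from typing import Any, Dict


def build_environment_initialized_event(
    session_guid: str,
    mlagents_version: str,
    mlagents_envs_version: str,
    torch_version: str,
    torch_device_type: str,
    num_environments: int,
    num_environment_parameters: int,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the 'global' training info into one dict."""
    return {
        "TrainingSessionGuid": session_guid,
        # Version of the interpreter running the trainer, e.g. "3.8.7".
        "TrainerPythonVersion": platform.python_version(),
        "MLAgentsVersion": mlagents_version,
        "MLAgentsEnvsVersion": mlagents_envs_version,
        "TorchVersion": torch_version,
        "TorchDeviceType": torch_device_type,
        "NumEnvironments": num_environments,
        "NumEnvironmentParameters": num_environment_parameters,
    }


# Values below mirror the example event; the GUID is freshly generated.
event = build_environment_initialized_event(
    session_guid=str(uuid.uuid4()),
    mlagents_version="0.24.0.dev0",
    mlagents_envs_version="0.24.0.dev0",
    torch_version="1.7.1",
    torch_device_type="cpu",
    num_environments=1,
    num_environment_parameters=2,
)
```

Reusing the same session GUID in the per-behavior events is what lets the three events be joined later, as the shared `TrainingSessionGuid` in the examples suggests.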

ml_agents_remote_policy_initialized

{
    "TrainingSessionGuid": "6a709347-d170-4b6d-90b9-f74f76c9df27",
    "BehaviorName": "df93fe80efe2beadbbb54f3a4d7eee25",
    "ObservationSpecs": [
        {
            "SensorName": "StackingSensor_size6_OffsetRayPerceptionSensor",
            "CompressionType": "None",
            "DimensionInfos": [
                {
                    "Size": 210,
                    "Flags": 0
                }
            ]
        },
        {
            "SensorName": "StackingSensor_size6_RayPerceptionSensor",
            "CompressionType": "None",
            "DimensionInfos": [
                {
                    "Size": 210,
                    "Flags": 0
                }
            ]
        },
        {
            "SensorName": "StackingSensor_size6_VectorSensor_size4",
            "CompressionType": "None",
            "DimensionInfos": [
                {
                    "Size": 24,
                    "Flags": 0
                }
            ]
        }
    ],
    "ActionSpec": {
        "NumContinuousActions": 0,
        "NumDiscreteActions": 4,
        "BranchSizes": [
            3,
            3,
            3,
            2
        ]
    },
    "MLAgentsEnvsVersion": "0.24.0.dev0",
    "TrainerCommunicationVersion": "1.3.0"
}

ml_agents_training_behavior_initialized

{
    "TrainingSessionGuid": "6a709347-d170-4b6d-90b9-f74f76c9df27",
    "BehaviorName": "df93fe80efe2beadbbb54f3a4d7eee25",
    "TrainerType": "ppo",
    "RewardSignalFlags": 1,
    "TrainingFeatureFlags": 20,
    "VisualEncoder": "simple",
    "NumNetworkLayers": 2,
    "NumNetworkHiddenUnits": 256
}
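
`RewardSignalFlags` and `TrainingFeatureFlags` appear to be bit fields, with each bit recording whether one reward signal or feature is enabled. The sketch below shows that packing and unpacking under this assumption. The flag names in `ExampleFeatureFlags` are placeholders, since the PR description does not list the real names or bit positions.

```python
from enum import IntFlag
from typing import List


def set_bit_positions(flags: int) -> List[int]:
    """Return the zero-based positions of the bits set in `flags`."""
    return [bit for bit in range(flags.bit_length()) if flags & (1 << bit)]


class ExampleFeatureFlags(IntFlag):
    # Placeholder names (assumption): the real flags are not shown in this PR text.
    FEATURE_A = 1 << 0
    FEATURE_B = 1 << 1
    FEATURE_C = 1 << 2
    FEATURE_D = 1 << 3
    FEATURE_E = 1 << 4


# Two enabled features are packed into a single integer for the event.
packed = ExampleFeatureFlags.FEATURE_C | ExampleFeatureFlags.FEATURE_E

# Decoding the values from the example event: 20 has bits 2 and 4 set, 1 has bit 0 set.
training_feature_bits = set_bit_positions(20)
reward_signal_bits = set_bit_positions(1)
```

Sending one integer keeps the event schema stable when new flags are added, at the cost of needing the flag definitions to interpret the value.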

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

  • New feature

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)

Other comments

var eventName = msg.ReadString();
if (!string.IsNullOrEmpty(eventName))
{
    Debug.Log(eventName);
Contributor Author

Confirmed that this gets

environment_initialized!

and

training_started for 3DBall with config TrainerSettings(trainer_type=<TrainerType.PPO: 'ppo'>, ...)!

when using 3DBall. This is obviously a placeholder; the plan is to add a protobuf message and send that over the SideChannel instead.

@chriselion chriselion marked this pull request as ready for review January 12, 2021 22:33
var unityCommunicationVersion = initParameters.unityCommunicationVersion;

TrainingAnalytics.SetTrainerInformation(pythonPackageVersion, pythonCommunicationVersion);
Contributor Author

I don't love this, but it seems like the best way to push this info to the analytics code for later use. We could potentially "pull" it from here in analytics, but that seems more brittle.

/// Hash a string to remove PII or secret info before sending to analytics
/// </summary>
/// <param name="s"></param>
/// <returns></returns>
Contributor

will this cause a warning during validation?

Contributor Author

Will add some actual content there. It's internal, though, so probably not.
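
For readers unfamiliar with the idea behind the "hash a string to remove PII" helper discussed in this thread, here is a minimal Python sketch. It is an illustration only: the function name `hash_for_analytics`, the choice of SHA-256, and the 32-character truncation are all assumptions, and the C# helper in this PR may use a different algorithm.

```python
import hashlib


def hash_for_analytics(value: str, length: int = 32) -> str:
    """Hypothetical helper: replace a string with a fixed-length hex digest.

    The same input always maps to the same output, so events for one
    behavior can still be grouped without sending the original name.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return digest[:length]


hashed_name = hash_for_analytics("SmallWallJump")
same_again = hash_for_analytics("SmallWallJump")
other_name = hash_for_analytics("BigWallJump")
```

An unkeyed hash of a short, guessable name can be matched by hashing candidate names, so a production implementation might prefer a keyed hash.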

Contributor

@surfnerd surfnerd left a comment

LGTM, but I think someone more familiar with the Python end should also approve.
For the C# training analytics side-channel tests: are they there just to show that they don't throw an exception?

worker_id, [env_parameters, engine_configuration_channel, stats_channel]
)
side_channels = [env_parameters, engine_configuration_channel, stats_channel]
if training_analytics_channel is not None:
Contributor

I would put this line after line 161 (after we check whether the environment can support training_analytics).

Contributor Author

Agreed it's a bit convoluted. But we don't currently have a way to add a SideChannel after the BaseEnv has been created, and we can't get the capabilities until after we've created the environment. So this always adds the sidechannel (for worker==0) but then doesn't use it if the capabilities don't support it.

Do you think it's worth adding an interface to BaseEnv to add SideChannels after creation? Note that modifying the side_channels list after creation won't have any effect, because the list is turned into an ID->SideChannel mapping in
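
The constraint described in this reply can be shown with a toy model. The classes below (`ToyChannel`, `ToyEnv`) are hypothetical stand-ins, not the real `SideChannel` or `BaseEnv` classes; they only assume what the comment states, namely that the list of channels is converted into an ID-to-channel mapping when the environment is created.

```python
from typing import Dict, List


class ToyChannel:
    """Hypothetical stand-in for a side channel, identified by an ID."""

    def __init__(self, channel_id: str) -> None:
        self.channel_id = channel_id


class ToyEnv:
    """Hypothetical stand-in for an environment that snapshots its channels."""

    def __init__(self, side_channels: List[ToyChannel]) -> None:
        # The list is copied into a mapping once, at construction time.
        self._channels: Dict[str, ToyChannel] = {
            channel.channel_id: channel for channel in side_channels
        }

    def has_channel(self, channel_id: str) -> bool:
        return channel_id in self._channels


# Appending to the list after creation does not reach the environment.
channels = [ToyChannel("stats")]
env = ToyEnv(channels)
channels.append(ToyChannel("training_analytics"))
late_added = env.has_channel("training_analytics")

# Approach described above: always register the channel up front,
# then only use it if the environment reports the capability.
env_with_channel = ToyEnv([ToyChannel("stats"), ToyChannel("training_analytics")])
supports_training_analytics = False  # assumption: read from capabilities after creation
use_analytics = (
    env_with_channel.has_channel("training_analytics") and supports_training_analytics
)
```

In this model the channel is registered but stays idle when the capability is missing, which matches the "always add, conditionally use" behavior described in the reply.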

Contributor

@dongruoping dongruoping left a comment

Is the OS information already somewhere in the analytics?

@chriselion
Contributor Author

@dongruoping good call, we can add sys.platform if the equivalent isn't already there. Let me check what gets automatically added.

@chriselion
Contributor Author

Actually, looks like we already have that in the "base" event.

@chriselion chriselion changed the title from "[WIP] [MLA-1540] Training Analytics" to "[MLA-1540] Training Analytics" Jan 28, 2021
@chriselion chriselion merged commit 82b7602 into master Jan 28, 2021
@delete-merged-branch delete-merged-branch bot deleted the MLA-1540-training-analytics branch January 28, 2021 23:24
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 29, 2022

4 participants