[MLA-1540] Training Analytics #4780

chriselion · 2020-12-19T02:11:39Z

Proposed change(s)

Adds analytics events for training. We will send 3 events:

1 for info collected on the Unity side for each Behavior being trained, with information like ActionSpec and ObservationSpecs.
1 for info collected on the python side for each Behavior being trained, with information like PPO/SAC, curriculum, etc.
1 for info collected on the python side for "global" information like python version, torch version, etc.

The reason for splitting the two per-behavior events is so that we can backport the first to the verified branch and get some data from there too (without needing to rely on trainer upgrades).

Data is sent from Python to the Unity executable; from there we'll send it through EditorAnalytics so the Unity Privacy Policy will apply.

Example events

These are from one of the behaviors from the WallJump scene with curriculum.

ml_agents_training_environment_initialized

{
    "TrainingSessionGuid": "6a709347-d170-4b6d-90b9-f74f76c9df27",
    "TrainerPythonVersion": "3.8.7",
    "MLAgentsVersion": "0.24.0.dev0",
    "MLAgentsEnvsVersion": "0.24.0.dev0",
    "TorchVersion": "1.7.1",
    "TorchDeviceType": "cpu",
    "NumEnvironments": 1,
    "NumEnvironmentParameters": 2
}

ml_agents_remote_policy_initialized

{
    "TrainingSessionGuid": "6a709347-d170-4b6d-90b9-f74f76c9df27",
    "BehaviorName": "df93fe80efe2beadbbb54f3a4d7eee25",
    "ObservationSpecs": [
        {
            "SensorName": "StackingSensor_size6_OffsetRayPerceptionSensor",
            "CompressionType": "None",
            "DimensionInfos": [
                {
                    "Size": 210,
                    "Flags": 0
                }
            ]
        },
        {
            "SensorName": "StackingSensor_size6_RayPerceptionSensor",
            "CompressionType": "None",
            "DimensionInfos": [
                {
                    "Size": 210,
                    "Flags": 0
                }
            ]
        },
        {
            "SensorName": "StackingSensor_size6_VectorSensor_size4",
            "CompressionType": "None",
            "DimensionInfos": [
                {
                    "Size": 24,
                    "Flags": 0
                }
            ]
        }
    ],
    "ActionSpec": {
        "NumContinuousActions": 0,
        "NumDiscreteActions": 4,
        "BranchSizes": [
            3,
            3,
            3,
            2
        ]
    },
    "MLAgentsEnvsVersion": "0.24.0.dev0",
    "TrainerCommunicationVersion": "1.3.0"
}

ml_agents_training_behavior_initialized

{
    "TrainingSessionGuid": "6a709347-d170-4b6d-90b9-f74f76c9df27",
    "BehaviorName": "df93fe80efe2beadbbb54f3a4d7eee25",
    "TrainerType": "ppo",
    "RewardSignalFlags": 1,
    "TrainingFeatureFlags": 20,
    "VisualEncoder": "simple",
    "NumNetworkLayers": 2,
    "NumNetworkHiddenUnits": 256
}
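All three example events carry the same TrainingSessionGuid, and the two per-behavior events carry the same hashed BehaviorName. A minimal sketch of how the events could be stitched back together on those keys follows; `join_events` is a hypothetical helper written for illustration, not part of this PR, and the event dicts are trimmed copies of the examples above.

```python
from collections import defaultdict

# Trimmed copies of the example events; only the join keys and a few fields are kept.
environment_event = {
    "TrainingSessionGuid": "6a709347-d170-4b6d-90b9-f74f76c9df27",
    "TorchVersion": "1.7.1",
    "NumEnvironments": 1,
}
remote_policy_event = {
    "TrainingSessionGuid": "6a709347-d170-4b6d-90b9-f74f76c9df27",
    "BehaviorName": "df93fe80efe2beadbbb54f3a4d7eee25",
    "ActionSpec": {"NumContinuousActions": 0, "NumDiscreteActions": 4},
}
training_behavior_event = {
    "TrainingSessionGuid": "6a709347-d170-4b6d-90b9-f74f76c9df27",
    "BehaviorName": "df93fe80efe2beadbbb54f3a4d7eee25",
    "TrainerType": "ppo",
}


def join_events(environment_events, remote_policy_events, training_behavior_events):
    """Hypothetical helper: build one merged record per (session, behavior) pair."""
    # Index the "global" events by session so they can be attached to each behavior.
    sessions = {e["TrainingSessionGuid"]: e for e in environment_events}
    merged = defaultdict(dict)
    for event in list(remote_policy_events) + list(training_behavior_events):
        key = (event["TrainingSessionGuid"], event["BehaviorName"])
        merged[key].update(sessions.get(key[0], {}))
        merged[key].update(event)
    return dict(merged)


records = join_events(
    [environment_event], [remote_policy_event], [training_behavior_event]
)
record = records[
    ("6a709347-d170-4b6d-90b9-f74f76c9df27", "df93fe80efe2beadbbb54f3a4d7eee25")
]
```

Because the Unity-side event can arrive without its Python-side counterpart (for example from the verified branch), the sketch tolerates a missing session or a missing per-behavior event rather than requiring all three.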

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

New feature

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)

Other comments

chriselion · 2020-12-19T02:14:18Z

com.unity.ml-agents/Runtime/SideChannels/TrainingAnalyticsSideChannel.cs

+            var eventName = msg.ReadString();
+            if (!string.IsNullOrEmpty(eventName))
+            {
+                Debug.Log(eventName);


Confirmed that this gets

environment_initialized!

and

training_started for 3DBall with config TrainerSettings(trainer_type=<TrainerType.PPO: 'ppo'>, ...)!

when using 3DBall. This is obviously placeholder; the plan is to add a protobuf message and send that over the SideChannel instead.

ml-agents/mlagents/trainers/env_manager.py

ml-agents/mlagents/trainers/subprocess_env_manager.py

ml-agents/mlagents/training_analytics_side_channel.py

com.unity.ml-agents/Runtime/Analytics/TrainingAnalytics.cs

com.unity.ml-agents/Runtime/Policies/RemotePolicy.cs

ml-agents/mlagents/training_analytics_side_channel.py

Project/ProjectSettings/ProjectVersion.txt

com.unity.ml-agents/Runtime/Analytics/TrainingAnalytics.cs

chriselion · 2021-01-14T00:34:44Z

com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs

                var unityCommunicationVersion = initParameters.unityCommunicationVersion;

+                TrainingAnalytics.SetTrainerInformation(pythonPackageVersion, pythonCommunicationVersion);


I don't love this, but it seemed like the best way to push this info to the analytics code for later use. We could potentially "pull" it from here in analytics, but that seems more brittle.

ml-agents/mlagents/trainers/learn.py

ml-agents/mlagents/trainers/subprocess_env_manager.py

surfnerd · 2021-01-14T18:09:29Z

com.unity.ml-agents/Runtime/Analytics/AnalyticsUtils.cs

+        /// Hash a string to remove PII or secret info before sending to analytics
+        /// </summary>
+        /// <param name="s"></param>
+        /// <returns></returns>


will this cause a warning during validation?

Will add some actual content there. It's internal, though, so probably not.
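For readers on the Python side, the idea behind that doc comment (hash a string so the raw value never reaches analytics) can be sketched as below. The thread does not say which hash AnalyticsUtils uses, so SHA-256 and the name `hash_for_analytics` are assumptions made for illustration only.

```python
import hashlib


def hash_for_analytics(value: str) -> str:
    """Hypothetical helper: return a hex digest that stands in for `value`.

    SHA-256 is an assumption; the algorithm used by the C# code is not
    stated in this thread.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# The same input always maps to the same digest, so events for one behavior
# can still be grouped without the behavior name itself being sent.
hashed = hash_for_analytics("3DBall")
```

Note that an unkeyed hash of a guessable name can be matched against a dictionary of common names, so a plain digest hides the value from casual inspection but is not strong anonymization on its own.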

com.unity.ml-agents/Runtime/SideChannels/TrainingAnalyticsSideChannel.cs

com.unity.ml-agents/Tests/Editor/TrainingAnalyticsSideChannelTests.cs

surfnerd

LGTM, I think someone more familiar with the python end should also approve.
For the C# TrainingAnalyticsSideChannel tests: are they there just to show that they don't throw an exception?

vincentpierre · 2021-01-15T18:44:37Z

ml-agents/mlagents/trainers/subprocess_env_manager.py

-            worker_id, [env_parameters, engine_configuration_channel, stats_channel]
-        )
+        side_channels = [env_parameters, engine_configuration_channel, stats_channel]
+        if training_analytics_channel is not None:


I would put this line after line 161 (after we check if the environment can support training_analytics).

Agreed it's a bit convoluted. But we don't currently have a way to add a SideChannel after the BaseEnv has been created, and we can't get the capabilities until after we've created the environment. So this always adds the sidechannel (for worker==0) but then doesn't use it if the capabilities don't support it.

Do you think it's worth adding an interface to BaseEnv to add SideChannels after creation? Note that modifying the side_channels list after creation won't have any effect, because the list is turned into an ID->SideChannel mapping in

ml-agents/ml-agents-envs/mlagents_envs/side_channel/side_channel_manager.py

Line 65 in 728d492

def _get_side_channels_dict(
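To illustrate why appending to the list later has no effect, here is a minimal sketch of the pattern described above. `FakeSideChannel` and `FakeSideChannelManager` are hypothetical stand-ins, not the real mlagents_envs classes; the only behavior they copy is that the list is converted to an ID->SideChannel dict once, at construction time.

```python
import uuid
from typing import Dict, List


class FakeSideChannel:
    """Hypothetical stand-in for a side channel identified by a UUID."""

    def __init__(self, channel_id: uuid.UUID) -> None:
        self.channel_id = channel_id


class FakeSideChannelManager:
    """Hypothetical stand-in that snapshots the list into a dict on creation."""

    def __init__(self, side_channels: List[FakeSideChannel]) -> None:
        # The dict is built once here; the original list is not kept.
        self._side_channels_dict: Dict[uuid.UUID, FakeSideChannel] = {
            channel.channel_id: channel for channel in side_channels
        }

    def has_channel(self, channel_id: uuid.UUID) -> bool:
        return channel_id in self._side_channels_dict


side_channels = [FakeSideChannel(uuid.uuid4()), FakeSideChannel(uuid.uuid4())]
manager = FakeSideChannelManager(side_channels)

# Appending after construction changes the list but not the manager's mapping.
late_channel = FakeSideChannel(uuid.uuid4())
side_channels.append(late_channel)
```

Under this pattern, registering a channel after creation would need an explicit method on the manager (or on BaseEnv), which is the interface question raised above.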

dongruoping

Is the OS information already somewhere in the analytics?

chriselion · 2021-01-15T23:10:35Z

@dongruoping good call, we can add sys.platform if the equivalent isn't already there. Let me check what gets automatically added.

chriselion · 2021-01-16T00:17:17Z

Actually, looks like we already have that in the "base" event.

Chris Elion added 5 commits December 17, 2020 11:38

WIP

288e111

WIP

89da5e3

side channel works

ed114bb

send training start info

ceccf7d

move sidechannel to mlagents so it can import TrainerConfig, torch ve…

54f383b

…rsion, etc