Wadaboa
diff --git a/.gitignore b/.gitignore (+1 -4)
diff --git a/README.md b/README.md (+37 -3)
diff --git a/results/ls-baseline-ce-svd.png b/results/ls-baseline-ce-svd.png (23.9 KB)
diff --git a/results/ls-baseline-ce-umap.png b/results/ls-baseline-ce-umap.png (21.3 KB)
diff --git a/results/ls-titanet-arc-svd.png b/results/ls-titanet-arc-svd.png (20.6 KB)
diff --git a/results/ls-titanet-arc-umap.png b/results/ls-titanet-arc-umap.png (11.9 KB)
diff --git a/results/ls-titanet-ce-svd.png b/results/ls-titanet-ce-svd.png (20.3 KB)
diff --git a/results/ls-titanet-ce-umap.png b/results/ls-titanet-ce-umap.png (15.1 KB)
diff --git a/results/vctk-baseline-ce-svd.png b/results/vctk-baseline-ce-svd.png (25.4 KB)
diff --git a/results/vctk-baseline-ce-umap.png b/results/vctk-baseline-ce-umap.png (22.5 KB)
diff --git a/results/vctk-titanet-arc-svd.png b/results/vctk-titanet-arc-svd.png (20.4 KB)
diff --git a/results/vctk-titanet-arc-umap.png b/results/vctk-titanet-arc-umap.png (15.6 KB)
diff --git a/results/vctk-titanet-ce-svd.png b/results/vctk-titanet-ce-svd.png (19.4 KB)
diff --git a/results/vctk-titanet-ce-umap.png b/results/vctk-titanet-ce-umap.png (16 KB)
diff --git a/src/utils.py b/src/utils.py (+3 -7)
--- a/.gitignore
+++ b/.gitignore
@@ -137,11 +137,8 @@ data/
 # Training
 checkpoints/
 figures/
+pretrained/
 
 # Wandb
 init/wandb_api_key_file
 wandb/
-
-# Results
-pretrained/
-results/
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ Nithin Rao Koluguri, Taejin Park, Boris Ginsburg,
 https://arxiv.org/abs/2110.04410.
 ```
 
-It is "small scale" because we only rely on the LibriSpeech dataset, instead of using VoxCeleb1, VoxCeleb2, SRE, Fisher, Switchboard and LibriSpeech, as done in the original work. The main reason for this choice is related to resources, as the combined dataset has 3373 hours of speech, with 16681 speakers and 4890K utterances, which is quite big to be trained on Google Colab. Instead, LibriSpeech has 336 hours of speech, with 2338 speakers and 634K utterances, which is sufficient to test the capabilities of the model. Moreover, we only test TitaNet on the speaker identification task, instead of testing it on speaker verification and diarization.
+It is "small scale" because we only rely on the LibriSpeech dataset, instead of using VoxCeleb1, VoxCeleb2, SRE, Fisher, Switchboard and LibriSpeech, as done in the original work. The main reason for this choice is related to resources, as the combined dataset has 3373 hours of speech, with 16681 speakers and 4890K utterances, which is quite big to be trained on Google Colab. Instead, the LibriSpeech subset that we consider has about 100 hours of speech, with 251 speakers and 28.5K utterances, which is sufficient to test the capabilities of the model. Moreover, we only test TitaNet on the speaker identification and verification tasks, instead of also testing it on speaker diarization.
 
 ## Installation
 
@@ -26,7 +26,7 @@ pip install -r init/requirements.txt
 
 ## Execution
 
-Both training and testing parts of the project are managed through a Jupyter notebook ([titanet.ipynb](titanet.ipynb)). The notebook contains a broad analysis of the dataset in use, an explanation of all the data augmentation techniques reported in the paper, a description of the TitaNet model and a way to train and test it. Hyper-parameters are handled via the `parameters.yml` file. To run the Jupyter notebook, execute the following command:
+Both training and testing parts of the project are managed through a Jupyter notebook ([titanet.ipynb](titanet.ipynb)). The notebook contains a broad analysis of the dataset in use, an explanation of all the data augmentation techniques reported in the paper, a description of the baseline and TitaNet models and a way to train and test them. Hyper-parameters are handled via the `parameters.yml` file. To run the Jupyter notebook, execute the following command:
 
 ```bash
 jupyter notebook titanet.ipynb
@@ -40,4 +40,38 @@ python3 src/train.py -p "./parameters.yml"
 
 Training and evaluation metrics, along with model checkpoints and results, are directly logged into a W&B project, which is openly accessible [here](https://wandb.ai/wadaboa/titanet). In case you want to perform a custom training run, you have to either disable W&B (see `parameters.yml`) or provide your own entity (your username), project and API key file location in the `parameters.yml` file. The W&B API key file is a plain text file that contains a single line with your W&B API key, that you can get from [here](https://wandb.ai/authorize).
 
-Currently, training and testing on Google Colab is only allowed to the repository owner, as it relies on a private SSH key to clone this Github repo. Please open an issue if your use case requires you to work on Google Colab.
+## Results
+
+This section shows some visual results obtained after training each embedding model for around 75 epochs. Please note that all figures represent the same set of utterances, even though different figures use different colours for the same speaker.
+
+### Baseline vs TitaNet on LibriSpeech
+
+This test compares the baseline and TitaNet models on the LibriSpeech dataset used for training. Both models were trained with cross-entropy loss and 2D projections were performed with UMAP. As we can see, the good training and validation metrics of the baseline model are not mirrored in this empirical test. Instead, TitaNet is able to form compact clusters of utterances, thus reflecting the high performance metrics obtained during training.
+
+Baseline             |  TitaNet
+:-------------------------:|:-------------------------:
+![](results/ls-baseline-ce-umap.png)  |  ![](results/ls-titanet-ce-umap.png)
+
+### Baseline vs TitaNet on VCTK
+
+This test compares the baseline and TitaNet models on the VCTK dataset, unseen during training. Both models were trained with cross-entropy loss and 2D projections were performed with UMAP. As above, TitaNet beats the baseline model by a large margin.
+
+Baseline             |  TitaNet
+:-------------------------:|:-------------------------:
+![](results/vctk-baseline-ce-umap.png)  |  ![](results/vctk-titanet-ce-umap.png)
+
+### SVD vs UMAP reduction
+
+This test compares two 2D reduction methods, namely SVD and UMAP. Both figures rely on the TitaNet model trained with cross-entropy loss. As we can see, the choice of the reduction method highly influences our subjective evaluation, with UMAP giving much better separation in the latent space.
+
+TitaNet LS SVD             |  TitaNet LS UMAP
+:-------------------------:|:-------------------------:
+![](results/ls-titanet-ce-svd.png)  |  ![](results/ls-titanet-ce-umap.png)
+
+### Cross-entropy vs ArcFace loss
+
+This test compares two TitaNet models, one trained with cross-entropy loss and the other one trained with ArcFace loss. Both figures rely on UMAP as their 2D reduction method. As we can see, there doesn't seem to be a winner in this example, as both models are able to obtain good clustering properties.
+
+Cross-entropy           |  ArcFace
+:-------------------------:|:-------------------------:
+![](results/ls-titanet-ce-umap.png)  |  ![](results/ls-titanet-arc-umap.png)
--- a/src/utils.py
+++ b/src/utils.py
@@ -97,13 +97,8 @@ def visualize_embeddings(
         )
 
     # Store embeddings in a dataframe and compute cluster colors
-    embeddings_df = pd.DataFrame(
-        np.concatenate([embeddings, np.expand_dims(labels, axis=-1)], axis=-1),
-        columns=["x", "y", "l"],
-    )
-    embeddings_df.x = embeddings_df.x.astype(float)
-    embeddings_df.y = embeddings_df.y.astype(float)
-    embeddings_df.l = embeddings_df.l.astype(str)
+    embeddings_df = pd.DataFrame(embeddings, columns=["x", "y"], dtype=np.float32)
+    embeddings_df["l"] = np.expand_dims(labels, axis=-1)
     cluster_colors = {l: np.random.random(3) for l in np.unique(labels)}
     embeddings_df["c"] = embeddings_df.l.map(
         {l: tuple(c) for l, c in cluster_colors.items()}
@@ -120,6 +115,7 @@ def visualize_embeddings(
             color=c,
             label=f"{label} (C)",
             marker="^",
+            s=250,
         )
         if not only_centroids:
             ax.scatter(to_plot.x, to_plot.y, color=c, label=f"{label}")
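
Note on the `visualize_embeddings` hunk: the removed lines concatenated the labels onto the coordinate array and then cast each column back to its intended type. The added lines build the coordinate frame directly and attach the labels as a separate column. Below is a minimal standalone sketch of that second approach. The helper name `build_embeddings_df` and its signature are hypothetical and do not appear in the repository, and it flattens `labels` to 1D instead of calling `np.expand_dims`, which is an assumption made here for clarity.

```python
import numpy as np
import pandas as pd


def build_embeddings_df(embeddings, labels):
    """Build a plotting frame with float coordinates and one label per row.

    Hypothetical helper: the name and signature are illustrative only.
    """
    embeddings = np.asarray(embeddings, dtype=np.float32)
    # Flatten labels so they fit a single column (assumption of this sketch)
    labels = np.asarray(labels).reshape(-1)
    if embeddings.ndim != 2 or embeddings.shape[1] != 2:
        raise ValueError("expected 2D-reduced embeddings of shape (n, 2)")
    if len(labels) != len(embeddings):
        raise ValueError("expected one label per embedding")

    # Coordinates stay numeric because labels are never concatenated with them
    df = pd.DataFrame(embeddings, columns=["x", "y"])
    df["l"] = labels

    # One random RGB colour per distinct label, mirroring the hunk above
    cluster_colors = {l: tuple(np.random.random(3)) for l in np.unique(labels)}
    df["c"] = df["l"].map(cluster_colors)
    return df
```

For example, `build_embeddings_df([[0.0, 1.0], [2.0, 3.0]], ["spk1", "spk2"])` returns a frame with columns `x`, `y`, `l` and `c`, where `x` and `y` keep their `float32` dtype without any per-column casts.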