Skip to content

Update viz.clustered_scenarios to allow confidence interval plot #482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Sep 22, 2023

Conversation

Zapiano
Copy link
Contributor

@Zapiano Zapiano commented Sep 14, 2023

  • Updated functions _clusters_colors and _cluster_color to _get_colors and _get_alphas
  • Added return type to all functions in this file
  • Adapted the whole file to the style guide

There is an option to fill or not the area inside the confidence interval - the best visualization option may depend on the situation. The default value of :fill_confint is currently set to true:

clustered_scenarios_true

This is how it looks with :fill-confint set to false:

clustered_scenarios_false

Closes #464

@Zapiano Zapiano added enhancement New feature or request maintenance labels Sep 14, 2023
@Zapiano Zapiano self-assigned this Sep 14, 2023
@ConnectedSystems
Copy link
Collaborator

This is how it looks with :fill-confint set to false:

I'm struggling to see where this is useful - I suggest :aggregate or :summarize (defaulting to true) which plots either the individual series or the aggregated version with filled confidence intervals.

Incidentally, I'm not against confint, but also putting ci (or CI) as an option as it's an established acronym.

Copy link
Collaborator

@ConnectedSystems ConnectedSystems left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Pedro, I think some minor tweaks (mostly to get rid of push!()) are needed.

- Updated functions _clusters_colors and _cluster_color to _get_colors and _get_alphas
- Added return type to all functions in this file
- Adapted the whole file to the style guide
@Zapiano
Copy link
Contributor Author

Zapiano commented Sep 17, 2023

This is how it looks with :fill-confint set to false:

I'm struggling to see where this is useful - I suggest :aggregate or :summarize (defaulting to true) which plots either the individual series or the aggregated version with filled confidence intervals.

100% agree.

Incidentally, I'm not against confint, but also putting ci (or CI) as an option as it's an established acronym.

I know about ci, but I really think it's better to avoid acronyms when possible.

@Zapiano
Copy link
Contributor Author

Zapiano commented Sep 18, 2023

@ConnectedSystems ready for another review.

About the timestep() function:
This function is already being used (line 39) when the Axis is created, BUT I have to pass an array to band! with x-axis values. You can see that the previous plots doesn't have x-axis labels. I had to use this range (line 84) to fix that. This is the final result:

:summarize = true :summarize = false
clustered_scenarios_summarize-true clustered_scenarios_summarize-false

Copy link
Collaborator

@ConnectedSystems ConnectedSystems left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some things to think about, but overall looks good!

Comment on lines +208 to +212
for (i, cluster) in enumerate(unique(clusters))
n_scens = count(clusters .== cluster)
base_alpha = 1.0 / (n_scens * 0.05)
alphas[i] = max(min(base_alpha, 0.6), 0.1)
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this now, shouldn't the weighting be based on total number of scenarios?
What we have right now could lead to a smaller cluster hiding a larger cluster (unless I'm misinterpreting the code here?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a small cluster would hide a large one because alpha is, at the most, 0.6 (for small clusters) but in this case the cluster has only a few lines.

Considering the result (figure above) it seems to me that this works fine.

That said, if I would do this now, I would probably try something like:

base_alpha = 1 - count(clusters .== cluster)/length(clusters)
alphas[i] = max(min(base_alpha, 0.6), 0.1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think this change is out of the scope of this PR so if you think we should change this I suggest we open an issue, what do you think?

Copy link
Collaborator

@ConnectedSystems ConnectedSystems Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for an issue if you're going to submit a PR directly after (but please do if your not planning to make changes immediately).

By hiding, take a look at the example figures provided here. See how the orange trajectories are blocked from view.

#492 (comment)

Maybe we plot smaller clusters last?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wrote "smaller cluster hiding a larger cluster" but in this figure what I see is a larger cluster hiding a smaller cluster, right? But yes, I think a good improvement would be to plot clusters in order from larger to smaller.

Let me open a different PR to do that and also try to change the way this alpha is computed and test some different cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wrote "smaller cluster hiding a larger cluster" but in this figure what I see is a larger cluster hiding a smaller cluster, right?

Yes, correct, accidentally swapped those, sorry for the miscommunication

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the problem with the figures you sent from PR 492 wasn't the alpha weight.

If you see my last commit - and the commit message - you will see what happened:

Currently we are iterating over the cluster with a for cluster in unique(clusters) but if clusters vector is something like [2, 1, 1, 3, 2], then unique(clusters) will return [2, 1, 3], and then cluster number 2, for instance, will get the weight that should be assigned to cluster 1, because we are using the index of the cluster inside the vector as it was the number of the cluster itself.

Copy link
Contributor Author

@Zapiano Zapiano Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I just tested here and that doesn't solve the visualization problem. I think plotting larger clusters before smaller ones will solve that though.

Previously we were using the index of unique(clusters) array to find clusters colors, plot clusters and render the legend. But depending on the order they appear in clusters vector this could led to mistakes, like cluster number 2 being the first in this vector (meaning it has index 1). That was fixed in a previous commit but the legend wasn't.
@Zapiano
Copy link
Contributor Author

Zapiano commented Sep 20, 2023

@ConnectedSystems

I just made a small change, so this summary visualization works with other types of timeseries, not only scenarios. This is how it looks for coral cover and different sites (target cluster == 1 and non-target == 2):

Coral Cover (target high values) Shelter Volume (target low values)
tac_outcome_series_agg_false asv_outcome_series_agg_false
tac_outcome_series_agg asv_outcome_series_agg

Rosejoycrocker and others added 5 commits September 21, 2023 14:10
Update version number to 0.9.0
Add .JuliaFormatter.toml to .gitignore
- Updated functions _clusters_colors and _cluster_color to _get_colors and _get_alphas
- Added return type to all functions in this file
- Adapted the whole file to the style guide
Also remove option to not fill confint when it is plotted
Previously we were using the index of unique(clusters) array to find clusters colors, plot clusters and render the legend. But depending on the order they appear in clusters vector this could led to mistakes, like cluster number 2 being the first in this vector (meaning it has index 1). That was fixed in a previous commit but the legend wasn't.
@ConnectedSystems
Copy link
Collaborator

Hi @Zapiano , you don't need to merge in unrelated changes if they aren't relevant to the PR.

@ConnectedSystems ConnectedSystems merged commit 8b322bd into main Sep 22, 2023
@Zapiano
Copy link
Contributor Author

Zapiano commented Sep 22, 2023

Hi @Zapiano , you don't need to merge in unrelated changes if they aren't relevant to the PR.

I know... it was by accident

@Zapiano Zapiano deleted the agg-cluster-viz branch September 25, 2023 00:38
@Zapiano Zapiano mentioned this pull request Dec 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request maintenance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Summarized vizualisation for time series clustering
3 participants