problem in reproducing results on Figure 1 #5

Open
JunHanStudy opened this issue Jan 26, 2024 · 2 comments

Comments

@JunHanStudy

Thanks for releasing the first open-source diffusion model for EHR data. When we ran the GAN baselines and EHRDiff on MIMIC and other datasets, the correlation between feature prevalence in the synthetic data and feature prevalence in the real data was about 0.8 for both, much lower than 0.99. Are there any tricks for running the GAN baselines and EHRDiff?

@JunHanStudy
Author

One reason could be that your paper reports the Pearson correlation, which is high for all methods, whereas we evaluate the Spearman correlation, which is fairly low for some methods.

@sczzz3
Owner

sczzz3 commented Mar 22, 2024

I believe the issue lies with the metric. Because the MIMIC data consists predominantly of rare ICD codes, the differences in prevalence between these codes are very small, and the Spearman correlation, which compares ranks rather than values, amplifies them. This is probably not critical, though: the Pearson correlation tends to be more relevant in this context, and many other metrics are available for evaluating the results.
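For anyone who wants to check this effect on their own data, here is a minimal sketch of the comparison. It is not the evaluation code from this repository. The prevalence distribution, the sample sizes, and all function names are assumptions made up for illustration, and the "synthetic" matrix is simply a second draw from the same prevalences, standing in for a generator that matches the real data well.

```python
import numpy as np

def average_ranks(x):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace the ranks inside each group of tied values by the group mean.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=ranks)
    return (sums / counts)[inverse]

def pearson(a, b):
    """Pearson correlation of two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def prevalence(records):
    """Per-code prevalence of a binary (patients x codes) matrix."""
    return np.asarray(records).mean(axis=0)

rng = np.random.default_rng(0)
n_patients, n_codes = 2000, 1000  # hypothetical sizes, not MIMIC

# Hypothetical long-tailed prevalences: a few common codes, many rare ones.
true_p = np.clip(rng.pareto(1.5, n_codes) * 0.002, 0.0002, 0.6)

# Two independent samples from the same prevalences.
real = rng.random((n_patients, n_codes)) < true_p
synthetic = rng.random((n_patients, n_codes)) < true_p

p_real, p_syn = prevalence(real), prevalence(synthetic)
r_pearson = pearson(p_real, p_syn)
r_spearman = spearman(p_real, p_syn)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

In this setup the Pearson value is driven by the few common codes, while the Spearman value weights every code equally, so sampling noise among the many rare codes pulls it down even though both matrices come from the same distribution. Reporting both, or restricting Spearman to codes above a minimum prevalence, may make results easier to compare across papers.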
