problem in reproducing results on Figure 1 #5

JunHanStudy · 2024-01-26T01:38:45Z

Thanks for releasing the first open-source diffusion model for EHR data. When we ran the GAN baselines and EHRDiff on MIMIC and other datasets, we found that the correlation between the feature prevalence of the synthetic data and that of the real data is about 0.8 for both, much lower than 0.99. Are there any tricks to running the GAN baselines and EHRDiff?
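For context, the feature prevalence of a code is the fraction of patient records in which it appears, and the metric discussed here correlates these per-code prevalences between real and synthetic data. The sketch below computes it under the assumption that both datasets are binary patient-by-code matrices with columns aligned to the same code vocabulary; `feature_prevalence` is a hypothetical helper, not a function from the EHRDiff repository, and the matrices are toy values.

```python
import numpy as np

def feature_prevalence(records: np.ndarray) -> np.ndarray:
    """Per-code prevalence: fraction of rows (patients) where each column (code) is 1.

    Hypothetical helper; assumes a binary patient-by-code matrix.
    """
    return records.astype(float).mean(axis=0)

# Toy data: 4 patients x 3 codes (illustrative only, not MIMIC data).
real = np.array([[1, 0, 1],
                 [1, 1, 0],
                 [0, 0, 1],
                 [1, 0, 0]])
synthetic = np.array([[1, 0, 0],
                      [1, 0, 0],
                      [1, 1, 1],
                      [0, 0, 0]])

p_real = feature_prevalence(real)        # [0.75, 0.25, 0.5]
p_synth = feature_prevalence(synthetic)  # [0.75, 0.25, 0.25]
# Pearson correlation between the two prevalence vectors (sqrt(3)/2, about 0.866, here).
r = float(np.corrcoef(p_real, p_synth)[0, 1])
print(p_real, p_synth, round(r, 3))
```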

JunHanStudy · 2024-01-26T16:27:15Z

One reason could be that your paper reports Pearson correlation, which is high for all methods, whereas we evaluate Spearman correlation, on which some methods score quite low.
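The gap between the two statistics can be reproduced on a toy prevalence vector. Spearman correlation is the Pearson correlation of the ranks, so every code counts equally regardless of how common it is, while Pearson is dominated by the few high-prevalence codes. The sketch below is NumPy-only; the helper names and prevalence values are illustrative assumptions, not results from the paper. Tied values receive their average rank, as in the standard definition.

```python
import numpy as np

def average_ranks(x: np.ndarray) -> np.ndarray:
    """1-based ranks; tied values share the average of the positions they occupy."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)    # last sorted position of each distinct value
    starts = ends - counts + 1  # first sorted position of each distinct value
    return ((starts + ends) / 2.0)[inverse]

def pearson(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a: np.ndarray, b: np.ndarray) -> float:
    # Spearman correlation is the Pearson correlation of the (tie-averaged) ranks.
    return pearson(average_ranks(a), average_ranks(b))

# Toy prevalences (not real results): two common codes that the synthetic data
# reproduces well, and eight rare codes whose order it gets exactly backwards
# while staying within 0.007 of the true prevalence.
p_real = np.array([0.50, 0.30, 0.010, 0.009, 0.008, 0.007, 0.006, 0.005, 0.004, 0.003])
p_synth = np.array([0.49, 0.31, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009, 0.010])

print(f"Pearson:  {pearson(p_real, p_synth):.3f}")   # about 0.999
print(f"Spearman: {spearman(p_real, p_synth):.3f}")  # about -0.018
```

For real data with many tied prevalences, `scipy.stats.spearmanr` computes the same tie-averaged statistic, if SciPy is available.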

sczzz3 · 2024-03-22T10:40:34Z

I believe the issue lies with the metrics. Since the MIMIC data consists predominantly of rare ICD codes, the prevalence differences among these codes are very small, and Spearman correlation, which depends only on rank order, amplifies these small differences. However, this may not matter much: Pearson correlation tends to be the more relevant measure in this context, and many other metrics are available for evaluating the results.
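One way to see why rare codes make Spearman fragile, assuming the problem is that their prevalences are too close together to be ordered reliably: draw two independent samples from the *same* code distribution, with no generative model involved, and correlate their prevalences. The code distribution below (10 common codes, 990 rare ones, 2000 patients, codes drawn independently) is a made-up assumption for illustration, not MIMIC statistics or the paper's setup.

```python
import numpy as np

def average_ranks(x: np.ndarray) -> np.ndarray:
    """1-based ranks; tied values share the average of the positions they occupy."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    starts = ends - counts + 1
    return ((starts + ends) / 2.0)[inverse]

def sample_prevalence(rng: np.random.Generator, prev: np.ndarray, n: int) -> np.ndarray:
    """Empirical prevalence of each code among n simulated patients (codes drawn independently)."""
    return rng.binomial(n, prev) / n

rng = np.random.default_rng(0)
n_patients = 2000
# Hypothetical prevalences, NOT MIMIC statistics: 10 common codes and 990 rare ones
# (each rare code is expected to appear only 1 to 10 times among 2000 patients).
true_prev = np.concatenate([
    rng.uniform(0.2, 0.5, size=10),
    rng.uniform(0.0005, 0.005, size=990),
])

# Two independent samples from the SAME distribution: no generative model involved.
a = sample_prevalence(rng, true_prev, n_patients)
b = sample_prevalence(rng, true_prev, n_patients)

pearson = float(np.corrcoef(a, b)[0, 1])
spearman = float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])
print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

In this toy setup Pearson stays close to 1 while Spearman lands far below it, even though both samples share identical underlying prevalences, because sampling noise alone reshuffles the ranks of the rare codes.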
