
About the WCEP dataset #16

Open
zhangzx-uiuc opened this issue Jul 26, 2022 · 1 comment

@zhangzx-uiuc

zhangzx-uiuc commented Jul 26, 2022

Hi there, thanks for releasing the code for your work!

Could you also release the WCEP-10 dataset used in your paper? The original WCEP (https://drive.google.com/drive/folders/1T5wDxu4ajFwEq77dG88oE95e8ppREamg?usp=sharing) has more than 10 docs in each cluster, and I would like to confirm exactly how you obtained the WCEP-10 version. Did you just select the first 10 docs using [0:10], or did you do something like random sampling?

Thanks and looking forward to your reply.

@Wendy-Xiao
Contributor

Hi there,

Thanks for your interest in the paper!

For details on the WCEP dataset, please refer to Section B of the appendix. We simply remove duplicate input documents within each cluster and then select the top 10 documents by the relevance score provided in the original dataset, which is built in the same way as described in the original paper.
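
For anyone trying to reproduce this, here is a rough sketch of the procedure described above: drop duplicate documents within a cluster, then keep the 10 with the highest relevance score. This is not the authors' actual preprocessing script. The function name `select_top_k` and the field names `text` and `score` are assumptions made for illustration, and duplicates are assumed to mean identical text after whitespace normalization. Check the real field names in the dataset files and Section B of the appendix before relying on it.

```python
from typing import Any


def select_top_k(
    articles: list[dict[str, Any]],
    k: int = 10,
    text_key: str = "text",    # assumed field name for the article body
    score_key: str = "score",  # assumed field name for the relevance score
) -> list[dict[str, Any]]:
    """Deduplicate one cluster's articles, then keep the k highest-scoring ones."""
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for article in articles:
        # Assumption: two articles are duplicates if their text is identical
        # once runs of whitespace are collapsed.
        key = " ".join(article[text_key].split())
        if key in seen:
            continue  # keep only the first occurrence
        seen.add(key)
        unique.append(article)

    # sorted() is stable, so articles with equal scores keep their original order.
    ranked = sorted(unique, key=lambda a: a[score_key], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    cluster = [
        {"text": "Storm hits coast.", "score": 0.4},
        {"text": "Storm  hits coast.", "score": 0.9},  # duplicate after normalization
        {"text": "Rescue teams deployed.", "score": 0.7},
    ]
    for article in select_top_k(cluster, k=2):
        print(article["score"], article["text"])
```

Note that this differs from taking `[0:10]` unless the articles in each cluster are already sorted by relevance score and contain no duplicates.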
