About the WCEP dataset #16

zhangzx-uiuc · 2022-07-26T17:38:03Z

Hi there, thanks for releasing the code for your work!

Could you also release the WCEP-10 dataset used in your paper? The original WCEP (https://drive.google.com/drive/folders/1T5wDxu4ajFwEq77dG88oE95e8ppREamg?usp=sharing) has more than 10 docs in each cluster, and I would like to confirm exactly how you obtained the WCEP-10 version. Did you simply select the first 10 docs using [0:10], or did you do something like random sampling?

Thanks and looking forward to your reply.

Wendy-Xiao · 2022-08-06T22:34:38Z

Hi there,

Thanks for your interest in the paper!

For details on the WCEP dataset, please refer to Section B of the appendix. We simply remove duplicate input documents in each cluster and select the top 10 documents by the relevance score provided in the original dataset, which is built in the same way as described in the original paper.
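
For readers who want to reproduce this, the two steps described above (deduplicate, then keep the 10 most relevant documents) could be sketched roughly as follows. This is only an illustration, not the preprocessing code used for the paper. It assumes each cluster is a list of dicts with hypothetical `text` and `score` keys, and that duplicates are exact text matches after trimming whitespace. The actual field names and duplicate criterion in the dataset may differ.

```python
from typing import Dict, List


def select_top_k(docs: List[Dict], k: int = 10) -> List[Dict]:
    """Drop duplicate documents, then keep the k with the highest score.

    Assumptions (hypothetical, not taken from the released code):
    each doc has a "text" string and a numeric "score", and two docs
    are duplicates when their stripped text is identical.
    """
    seen = set()
    unique = []
    for doc in docs:
        key = doc["text"].strip()
        if key not in seen:  # keep only the first occurrence
            seen.add(key)
            unique.append(doc)
    # sorted() is stable, so documents with equal scores keep input order.
    return sorted(unique, key=lambda d: d["score"], reverse=True)[:k]


if __name__ == "__main__":
    cluster = [
        {"text": "a", "score": 0.1},
        {"text": "b", "score": 0.9},
        {"text": "a", "score": 0.5},  # duplicate of the first doc
        {"text": "c", "score": 0.4},
    ]
    print([d["text"] for d in select_top_k(cluster, k=2)])
```

Note that this differs from slicing with `[0:10]` unless the cluster happens to be sorted by score already.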

JohnGiorgi mentioned this issue Nov 23, 2022

Is there any notion of relevance or rank? complementizer/wcep-mds-dataset#9

Open
