Where can I obtain a generated dataset that includes an options column #76

Open
nanyyyyyy opened this issue May 28, 2023 · 7 comments

nanyyyyyy commented May 28, 2023

Where can I obtain a generated dataset that includes an options column, which can be used for rank evaluation purposes? Thank you.

@nanyyyyyy changed the title from "any idea how to resolve this?" to "Where can I obtain a generated dataset that includes an options column" on May 29, 2023
@shayne-longpre
Collaborator

@nanyyyyyy You would need to regenerate the data and pass an options column through for the relevant datasets, though this could cost a bit of compute. Alternatively, you could isolate the datasets that have options and use a regex to extract the options from the formatted text.
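
For the regex route, a minimal sketch might look like the following. It assumes the options were rendered as an "OPTIONS:" header followed by one "- choice" per line; that layout is an assumption, not something confirmed in this thread, and other templates may render options differently (inline, lettered, and so on), so the patterns would need adjusting per template. The function name `extract_options` is hypothetical.

```python
import re
from typing import List

# Assumed layout (not confirmed by the thread): an "OPTIONS:" header
# followed by one "- choice" per line. Adjust for other templates.
_OPTIONS_BLOCK = re.compile(r"OPTIONS:\s*\n((?:[ \t]*-[ \t]+.+(?:\n|$))+)")
_OPTION_LINE = re.compile(r"^[ \t]*-[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def extract_options(inputs: str) -> List[str]:
    """Return the answer options found in a formatted input, or [] if none."""
    block = _OPTIONS_BLOCK.search(inputs)
    if block is None:
        return []
    # Pull the text after each leading "- " inside the matched block.
    return _OPTION_LINE.findall(block.group(1))


if __name__ == "__main__":
    sample = "Is the sky blue?\nOPTIONS:\n- yes\n- no\nAnswer:"
    print(extract_options(sample))
```

Examples whose inputs contain no such block come back with an empty list, which makes it easy to filter down to the datasets that actually carry options.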

Sorry, this data was intended primarily for training so we didn't pass that information along. Hope this helps though!

@gao-xiao-bai

Can you explain a bit about this? I want to include options and the exact template for generating each instance in the dataset. What are the detailed steps to achieve this?

@nanyyyyyy
Author

I haven't figured it out, sorry.

@shayne-longpre
Collaborator

@nanyyyyyy @gao-xiao-bai So to generate all the templates and options alongside each example you would need to edit the preprocessors used for every task.

One in particular is the formatter (here), which is what applies the pattern (or "template") to each example. You could create a function like this one to store the pattern as a field, and make sure it is passed all the way through to the final generated examples by adding it to the list of passthrough fields here.

To get the answer options you would do the same thing, passing through the "options" key in each example, for the datasets that have the format_options preprocessor (see here).
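
As a rough illustration of the idea, here is a framework-free sketch using plain dicts. It is not the repository's code: every name in it (`PASSTHROUGH_FIELDS`, `render_options`, `apply_pattern`, `finalize`, and the `options_` key) is a hypothetical stand-in, and the real preprocessors operate inside the data pipeline rather than on Python dicts. The point is only to show the pattern and the raw options surviving to the final example.

```python
from typing import Any, Dict, Sequence

# Hypothetical stand-ins; none of these names are taken from the FLAN code.
PASSTHROUGH_FIELDS = ("pattern", "options")


def render_options(example: Dict[str, Any]) -> Dict[str, Any]:
    """Add a text rendering of the options while keeping the raw list."""
    out = dict(example)
    out["options_"] = "OPTIONS:\n" + "\n".join(f"- {o}" for o in example["options"])
    return out


def apply_pattern(example: Dict[str, Any], input_pattern: str,
                  target_pattern: str) -> Dict[str, Any]:
    """Fill the templates and also store the input template as a field."""
    out = dict(example)
    out["inputs"] = input_pattern.format(**example)
    out["targets"] = target_pattern.format(**example)
    out["pattern"] = input_pattern  # the extra field to pass through
    return out


def finalize(example: Dict[str, Any],
             passthrough: Sequence[str] = PASSTHROUGH_FIELDS) -> Dict[str, Any]:
    """Keep only the model fields plus the explicitly passed-through ones."""
    keep = ("inputs", "targets") + tuple(passthrough)
    return {k: example[k] for k in keep if k in example}


if __name__ == "__main__":
    ex = {"question": "Is water wet?", "options": ["yes", "no"], "answer": "yes"}
    ex = apply_pattern(render_options(ex), "{question}\n{options_}", "{answer}")
    print(finalize(ex))
```

Without the two entries in the passthrough list, `finalize` would drop both the pattern and the options, which mirrors why they are missing from the released data.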

@nanyyyyyy
Author

This is super helpful, thanks a lot.

@gao-xiao-bai

Thank you for your response.

@a-antoniades

@nanyyyyyy @gao-xiao-bai were you guys able to figure this out?

@shayne-longpre I must say it's a little odd not to include the options, since the FLAN paper's evaluations are based on rank classification over the options, so they seem like a key thing to include. The data is appreciated nonetheless.
