
How to cache my mixture #69

Open
renmengjie7 opened this issue May 2, 2023 · 2 comments

Comments


renmengjie7 commented May 2, 2023

I noticed this comment in the code:

# If you're using Seqio, we suggest caching your mixture as they take a while to generate.

But I don't know how to do this caching.

@shayne-longpre (Collaborator)

Here are some resources, and there is more detail in the seqio documentation: https://github.com/google/seqio#optional-offline-caching.

This caching applies if you are using the same vocabulary as T5. If you want to train a different model, stream the output and save it in pretokenized text format, as we do in run_example.py.
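A minimal sketch of the streaming approach described above, assuming seqio is installed and a mixture has already been registered under the illustrative name "my_flan_mixture" (this is not the exact code in run_example.py; the file name and sequence lengths are assumptions):

```python
# Sketch only: assumes a mixture named "my_flan_mixture" is registered.
import seqio

ds = seqio.get_mixture_or_task("my_flan_mixture").get_dataset(
    sequence_length={"inputs": 2048, "targets": 512},
    split="train",
    shuffle=False,
)

# seqio's tokenize preprocessor keeps *_pretokenized copies of the text
# by default, so the plain text can be streamed out and saved for use
# with a different model's tokenizer.
with open("mixture.tsv", "w") as f:
    for ex in ds.as_numpy_iterator():
        inputs = ex["inputs_pretokenized"].decode("utf-8")
        targets = ex["targets_pretokenized"].decode("utf-8")
        f.write(f"{inputs}\t{targets}\n")
```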


sunyi06200 commented Jul 13, 2023

So where can I add a seqio.CacheDatasetPlaceholder(required=False)? Or are there more detailed steps for how to use the five original submixes you provided, once I've downloaded them?
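For reference, in seqio the placeholder goes in a Task's preprocessors list: everything before it is what gets cached, and tokenization typically comes after. A hedged sketch of a registration, where the task name, data source, and vocabulary path are illustrative rather than FLAN's actual configuration:

```python
# Sketch only: task name, source, and vocabulary are illustrative.
import seqio

vocab = seqio.SentencePieceVocabulary(
    "gs://t5-data/vocabs/cc_all.32000/sentencepiece.model")
output_features = {
    "inputs": seqio.Feature(vocabulary=vocab),
    "targets": seqio.Feature(vocabulary=vocab),
}

seqio.TaskRegistry.add(
    "my_cached_task",  # hypothetical task name
    source=seqio.TfdsDataSource(tfds_name="wmt_t2t_translate/de-en:1.0.0"),
    preprocessors=[
        # Deterministic text preprocessors go before the placeholder, so
        # their output is what gets written to the cache.
        seqio.CacheDatasetPlaceholder(required=False),
        seqio.preprocessors.tokenize,
        seqio.preprocessors.append_eos_after_trim,
    ],
    output_features=output_features,
)
```

The cache itself is then generated offline with seqio's cache_tasks Apache Beam script, after which `get_dataset(..., use_cached=True)` reads from it; see the seqio README link in the comment above for details.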
