AssertionError in position embedding (potentially due to missing clear_cache
between batches of data)
#1340
Labels: bug
Describe the bug
I have been trying to use lm-evaluation-harness with gpt-neox/eval.py. AFAIK the query types other than generate_until work fine. With generate_until, I run into this assertion check (in the position embedding module) after a couple of examples have been processed:

gpt-neox/megatron/model/positional_embeddings.py, line 88 (commit 9107b25)
In my testing, the model is about to generate (say) token 48. I have verified that token_index_to_generate in gpt-neox/megatron/text_generation_utils.py is in fact 48. But somehow RotaryEmbedding is trying to create an embedding for position 1025 (beyond the model_max_length).

To Reproduce
Will fill in reproducible configs. Currently, I'm using a model with a custom config (but trained in NeoX) and evaluating on a QA dataset (where eval-harness uses generate_until).

Proposed solution
I suspect the issue is caused by a missing clear_cache() between batches of data. Adding model.module.clear_cache() at the start of gpt-neox/megatron/text_generation_utils.py:stream_tokens seems to fix it on my side. I am unsure whether this is correct and whether it is a complete fix. The same clear_cache operation seems to be invoked in generate_samples_interactive but not in generate_samples_from_prompt.
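To illustrate the suspected mechanism, here is a hypothetical, heavily simplified stand-in for a position cache that persists across batches. This is NOT the actual NeoX RotaryEmbedding code; the class and method names are made up for illustration, but it shows how stale per-batch state can push a position index past the model's maximum and trip an assertion unless the cache is reset between batches:

```python
class RotaryCacheSketch:
    """Hypothetical sketch (not the NeoX implementation): tracks how many
    positions have already been emitted and asserts, like the check in
    positional_embeddings.py, when a position exceeds the model maximum."""

    def __init__(self, max_seq_len=1024):
        self.max_seq_len = max_seq_len
        self.offset = 0  # grows across calls until explicitly cleared

    def clear_cache(self):
        # Analogue of the proposed fix: reset per-batch state.
        self.offset = 0

    def positions(self, num_new_tokens):
        start = self.offset
        end = start + num_new_tokens
        # Analogue of the assertion that fires in RotaryEmbedding.
        assert end <= self.max_seq_len, f"position {end} > {self.max_seq_len}"
        self.offset = end
        return list(range(start, end))

cache = RotaryCacheSketch(max_seq_len=1024)
cache.positions(600)    # batch 1: positions 0..599
# cache.positions(600)  # batch 2 would assert: position 1200 > 1024
cache.clear_cache()     # resetting between batches avoids the overflow
cache.positions(600)    # batch 2 now starts at position 0 again
```

Under this (assumed) model of the bug, a clear_cache() at the start of stream_tokens would reset the carried-over offset before each new batch, which matches the observed fix.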
Environment (please complete the following information):