Parameter name inconsistency between 1st and 2nd stage #120

Closed
halsay opened this issue Nov 30, 2023 · 2 comments

halsay commented Nov 30, 2023

return getattr(self.module, name)

In 2nd stage training, every parameter name gets 'module.' added as a prefix, but in the 1st stage it does not.
As a result, the parameters trained in the 1st stage are not actually loaded in the 2nd stage, and you won't get an error because the load is not strict (it even prints that the model is loaded).
https://github.com/yl4579/StyleTTS2/blob/main/models.py#L696
I got NaN at the beginning of 2nd stage training, which is how I tracked down the problem described above. 2nd stage training became normal once the parameters were loaded successfully.
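
For anyone who wants to check whether they are hitting the same mismatch, here is a minimal sketch that compares key names only. It uses plain lists in place of real state dicts, and `load_report` and the parameter names are hypothetical, not part of StyleTTS2. With real PyTorch modules, the value returned by `load_state_dict(..., strict=False)` carries the same information in its `missing_keys` and `unexpected_keys` fields.

```python
def load_report(checkpoint_keys, model_keys):
    """Show, by name only, what a non-strict load would match."""
    checkpoint_keys, model_keys = set(checkpoint_keys), set(model_keys)
    return {
        # Names present on both sides: these are the only ones restored.
        "loaded": sorted(checkpoint_keys & model_keys),
        # Model parameters with no checkpoint entry: left at initial values.
        "missing": sorted(model_keys - checkpoint_keys),
        # Checkpoint entries the model does not know: ignored when not strict.
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


# Hypothetical parameter names, for illustration only.
stage1_keys = ["encoder.weight", "encoder.bias"]
stage2_keys = ["module.encoder.weight", "module.encoder.bias"]

report = load_report(stage1_keys, stage2_keys)
print("loaded:", report["loaded"])
print("missing:", report["missing"])
print("unexpected:", report["unexpected"])
```

An empty "loaded" list together with non-empty "missing" and "unexpected" lists that differ only by the 'module.' prefix would point to the situation described above.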

@RillmentGames

I ran into this as well. Issue #21 has the solution for going from stage 1 to stage 2, but you need to remove the extra code again when reloading a stage 2 checkpoint.
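
One way to avoid adding and removing the workaround by hand might be to rename keys in whichever direction the model expects. This is only a sketch under assumptions: `align_prefix` is a hypothetical helper, not the code from Issue #21, and plain dicts with float values stand in for real state dicts.

```python
def align_prefix(state_dict, model_keys, prefix="module."):
    """Return a copy of state_dict whose key naming matches model_keys."""
    model_keys = list(model_keys)
    model_has = bool(model_keys) and all(k.startswith(prefix) for k in model_keys)
    ckpt_has = bool(state_dict) and all(k.startswith(prefix) for k in state_dict)
    if model_has and not ckpt_has:
        # Unwrapped checkpoint going into a wrapped model: add the prefix.
        return {prefix + k: v for k, v in state_dict.items()}
    if ckpt_has and not model_has:
        # Wrapped checkpoint going into an unwrapped model: strip the prefix.
        return {k[len(prefix):]: v for k, v in state_dict.items()}
    # Naming already agrees (or is mixed): leave the keys untouched.
    return dict(state_dict)


# Hypothetical checkpoints; the floats stand in for tensors.
stage1_ckpt = {"encoder.weight": 1.0}
stage2_ckpt = {"module.encoder.weight": 2.0}
wrapped_model_keys = ["module.encoder.weight"]

print(align_prefix(stage1_ckpt, wrapped_model_keys))
print(align_prefix(stage2_ckpt, wrapped_model_keys))
```

Because the direction is inferred from the keys, the same call would cover both loading a stage 1 checkpoint and reloading a stage 2 checkpoint.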

yl4579 (Owner) commented Dec 2, 2023

You will need to run the first stage with multiple GPUs. Single-GPU training for the first stage is currently not supported. If you only have one GPU, you can also fix this problem by removing all MyDataParallel in train_second.py.
