
Add Cosine LR Scheduler for Fine-Tuning #273


Merged: 11 commits merged into main from arsh/ft-cosine-lr on Mar 25, 2025

Conversation

azahed98
Contributor

Have you read the Contributing Guidelines?
Yes

Issue # ENG-23270

Adds support for the cosine LR scheduler in the fine-tuning API, and simplifies future support of additional schedulers (https://github.com/togethercomputer/together-finetune/pull/1349).
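
For orientation, here is a rough sketch of how the new option is meant to surface through the Python client. The scheduler-related parameter names are taken from the diff fragments quoted later in this thread and are not a definitive signature; the file ID and base model are placeholders.

# Sketch only: the scheduler kwargs mirror the names discussed in this PR;
# the released client may expose them differently.
from together import Together

client = Together()

job = client.fine_tuning.create(
    training_file="file-abc123",           # placeholder uploaded-file ID
    model="meta-llama/Meta-Llama-3.1-8B",  # placeholder base model
    lr_scheduler_type="cosine",            # new: select the cosine schedule instead of linear
    scheduler_num_cycles=0.5,              # new: number (or fraction) of cosine cycles
    min_lr_ratio=0.1,                      # floor LR as a fraction of the peak LR
)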

@mryab requested review from VProv and removed the request for mryab on March 18, 2025 at 21:16
"--num-cycles",
type=float,
default=0.5,
help="Number of cycles for cosine learning rate scheduler.",
Member


Suggested change
help="Number of cycles for cosine learning rate scheduler.",
help="Number of cycles for the cosine learning rate scheduler.",

Contributor


nit: Maybe also add what fractional values mean.

Contributor Author


Updated to "Number or fraction of cycles".

lr_scheduler_args=FinetuneLinearLRSchedulerArgs(min_lr_ratio=min_lr_ratio),
)
if lr_scheduler_type == "cosine":
if num_cycles <= 0.0:
Contributor


nit: maybe also add some meaningful upper bound?

Contributor Author


Hard to know what a reasonable upper bound would be without knowing the number of steps, AFAIK. I think it makes sense to follow the HF implementation and let the cosine alias if the user inputs something unreasonably large for their job.
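
For readers following along, the Hugging Face-style decay term being referenced looks roughly like this (warmup and min-LR handling omitted). It also shows what fractional values of num_cycles mean, and why unreasonably large values merely alias against the step grid instead of failing.

import math

def cosine_lr_multiplier(progress: float, num_cycles: float = 0.5) -> float:
    # progress is the fraction of (post-warmup) training completed, in [0, 1].
    # num_cycles=0.5 is half a cosine period: the LR decays monotonically from
    # peak to floor. num_cycles=1.0 decays and then rises back once. Very large
    # values oscillate faster than the step grid can resolve, so the sampled
    # schedule aliases rather than raising an error.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))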

min_lr_ratio: float = 0.0,
num_cycles: float = 0.5,
Contributor


Let's call it scheduler_num_cycles for clarity?

Contributor Author


The backend expects the field in the request to be named num_cycles, nested inside the scheduler args. Would just changing the CLI arg be OK?

Contributor Author


Changed the CLI arg to scheduler_num_cycles. Worth noting that FinetuneCosineLRSchedulerArgs still has the field as num_cycles.
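
To make the split concrete, here is a self-contained toy sketch of that mapping. The dataclass below is a stand-in that only mirrors the field names discussed in this thread, not the client's real definition.

from dataclasses import dataclass

@dataclass
class FinetuneCosineLRSchedulerArgs:  # stand-in for the class named above
    num_cycles: float = 0.5
    min_lr_ratio: float = 0.0

def build_cosine_scheduler_args(
    scheduler_num_cycles: float, min_lr_ratio: float = 0.0
) -> FinetuneCosineLRSchedulerArgs:
    # The user-facing CLI flag is --scheduler-num-cycles, but the request
    # payload keeps the backend field name num_cycles inside the scheduler args.
    return FinetuneCosineLRSchedulerArgs(
        num_cycles=scheduler_num_cycles,
        min_lr_ratio=min_lr_ratio,
    )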

@azahed98 merged commit 10ad24d into main on Mar 25, 2025
9 checks passed
@azahed98 deleted the arsh/ft-cosine-lr branch on March 25, 2025 at 07:40