-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-51867][ML] Make scala model supporting save / load methods against local filesystem path #50665
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have scala tests to make sure the new save/load works?
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
* Saves the ML instances to the local file system path. | ||
*/ | ||
@throws[IOException]("If the input path already exists but overwrite is not enabled.") | ||
private[spark] def saveToLocal(path: String): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of introduce a new saveToLocal
method, what about adding new class MLLocalWriter extends MLWriter
?
Then based on the writer class, we can know whether an algorithm support local read/write
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is changing the base class might influence compatibility,
currently saveToLocal
and loadFromLocal
are only developing APIs and are only used in Spark connect server for ML cache management.
So that I tend to keep current code.
once we decide to make them as public APIs, we can consider to make a better API / Base class design.
thoughts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG
merged to master |
It appears that after merging this pr, it caused test failures for Here's how I conducted the local inspection:
The reason why this issue wasn't detected during the GitHub Actions of this pr is that the changes in the |
I will revert this one first to restore the GitHub Actions workflow for the master. @WeichenXu123 @zhengruifeng |
Reopen this pr and we can re-merge it after fixing the above issue |
@LuciferYang I updated my PR. it should fix the issue . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, pending tests
Merged into master. Thanks @WeichenXu123 and @zhengruifeng |
@Since("1.6.0") | ||
object LogisticRegressionModel extends MLReadable[LogisticRegressionModel] { | ||
case class Data( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why it fails MLSuite
, but we should not make it public.
It seems private[spark]
can work. #50760
…inst local filesystem path ### What changes were proposed in this pull request? Make scala model supporting save / load methods (deverloper api) against local filesystem path. ### Why are the changes needed? This is required by Spark Connect server model cache management. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#50665 from WeichenXu123/ml-save-to-local. Authored-by: Weichen Xu <weichen.xu@databricks.com> Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
…inst local filesystem path ### What changes were proposed in this pull request? Make scala model supporting save / load methods (deverloper api) against local filesystem path. ### Why are the changes needed? This is required by Spark Connect server model cache management. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#50665 from WeichenXu123/ml-save-to-local. Authored-by: Weichen Xu <weichen.xu@databricks.com> Signed-off-by: yangjie01 <yangjie01@baidu.com>
…inst local filesystem path ### What changes were proposed in this pull request? Make scala model supporting save / load methods (deverloper api) against local filesystem path. ### Why are the changes needed? This is required by Spark Connect server model cache management. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#50665 from WeichenXu123/ml-save-to-local. Authored-by: Weichen Xu <weichen.xu@databricks.com> Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
…inst local filesystem path ### What changes were proposed in this pull request? Make scala model supporting save / load methods (deverloper api) against local filesystem path. ### Why are the changes needed? This is required by Spark Connect server model cache management. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit tests. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#50665 from WeichenXu123/ml-save-to-local. Authored-by: Weichen Xu <weichen.xu@databricks.com> Signed-off-by: yangjie01 <yangjie01@baidu.com>
What changes were proposed in this pull request?
Make scala model supporting save / load methods (deverloper api) against local filesystem path.
Why are the changes needed?
This is required by Spark Connect server model cache management.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit tests.
Was this patch authored or co-authored using generative AI tooling?
No.