[SPARK-51867][ML] Make scala model supporting save / load methods against local filesystem path #50665

Closed
WeichenXu123 wants to merge 17 commits into apache:master from WeichenXu123:ml-save-to-local

Conversation

WeichenXu123 (Contributor)

What changes were proposed in this pull request?

Make Scala models support save / load methods (developer API) against a local filesystem path.

Why are the changes needed?

This is required by Spark Connect server model cache management.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.
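
For context, here is a minimal sketch of how the new developer API could be exercised. It is a sketch under assumptions: the method names saveToLocal / loadFromLocal come from the review discussion below, their exact signatures are not verified against the final diff, and because they are private[spark] the snippet must live under an org.apache.spark package.

```scala
// Assumption: placed under org.apache.spark so the private[spark] methods are visible.
package org.apache.spark.sketch

import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object LocalSaveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("local-save-sketch").getOrCreate()
    import spark.implicits._

    // A tiny training set, just enough to produce a model worth persisting.
    val training = Seq(
      (1.0, Vectors.dense(0.0, 1.1)),
      (0.0, Vectors.dense(2.0, 1.0)),
      (1.0, Vectors.dense(0.1, 1.2))
    ).toDF("label", "features")

    val model = new LogisticRegression().setMaxIter(10).fit(training)

    // Persist to a plain local filesystem path, bypassing the Hadoop
    // FileSystem layer that model.write.save(...) goes through.
    model.saveToLocal("/tmp/ml-cache/lr-model")

    // Later (e.g. in the Spark Connect server's ML cache), restore it.
    val restored = LogisticRegressionModel.loadFromLocal("/tmp/ml-cache/lr-model")
    println(restored.uid)
    spark.stop()
  }
}
```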

zhengruifeng (Contributor) left a comment:

Do we have Scala tests to make sure the new save/load works?

@WeichenXu123 WeichenXu123 changed the title [WIP] [SPARK-51867][ML] Make scala model supporting save / load methods against local filesystem path [SPARK-51867][ML] Make scala model supporting save / load methods against local filesystem path Apr 28, 2025
* Saves the ML instances to the local file system path.
*/
@throws[IOException]("If the input path already exists but overwrite is not enabled.")
private[spark] def saveToLocal(path: String): Unit = {
Contributor:

Instead of introducing a new saveToLocal method, what about adding a new class MLLocalWriter extends MLWriter?

Then, based on the writer class, we can tell whether an algorithm supports local read/write.
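
A rough sketch of the alternative design being floated here; the MLLocalWriter name comes from this comment, and everything else (the class shape, the helper) is an assumption for illustration:

```scala
import org.apache.spark.ml.util.MLWriter

// Hypothetical capability-marking subtype: algorithms supporting plain
// local-filesystem persistence would return this from .write, so the
// capability is visible in the writer's type.
abstract class MLLocalWriter extends MLWriter {
  def saveToLocal(path: String): Unit
}

object LocalWriteSupport {
  // Callers feature-test with a type match instead of calling a separate
  // saveToLocal method on every model class.
  def trySaveLocal(writer: MLWriter, path: String): Boolean = writer match {
    case local: MLLocalWriter =>
      local.saveToLocal(path)
      true
    case _ =>
      false
  }
}
```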

Contributor (Author):

My concern is that changing the base class might affect compatibility: currently saveToLocal and loadFromLocal are developer-only APIs, used only in the Spark Connect server for ML cache management.

So I tend to keep the current code. Once we decide to make them public APIs, we can consider a better API / base-class design. Thoughts?

Contributor:

SG

WeichenXu123 (Contributor, Author):

Merged to master.

LuciferYang (Contributor):

It appears that merging this PR caused test failures for org.apache.spark.sql.connect.ml.MLSuite in the connect module.

Here's how I conducted the local inspection:

git reset --hard 6f9bf73c345d70c3d27ea2e1ebadaa03a275fb3c // this one 
build/sbt clean "connect/testOnly org.apache.spark.sql.connect.ml.MLSuite"
[info] - LogisticRegression works *** FAILED *** (8 seconds, 2 milliseconds)
[info]   org.apache.spark.SparkRuntimeException: [EXPRESSION_DECODING_FAILED] Failed to decode a row to a value of the expressions: newInstance(class org.apache.spark.ml.classification.LogisticRegressionModel$Data). SQLSTATE: 42846
[info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1364)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:95)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:80)
[info]   at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
[info]   at org.apache.spark.sql.classic.Dataset.collectFromPlan(Dataset.scala:2244)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$head$1(Dataset.scala:1381)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$2(Dataset.scala:2234)
[info]   at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:642)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$1(Dataset.scala:2232)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$8(SQLExecution.scala:162)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withSessionTagsApplied(SQLExecution.scala:268)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$7(SQLExecution.scala:124)
[info]   at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:106)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:124)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:291)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:123)
[info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:77)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:233)
[info]   at org.apache.spark.sql.classic.Dataset.withAction(Dataset.scala:2232)
[info]   at org.apache.spark.sql.classic.Dataset.head(Dataset.scala:1381)
[info]   at org.apache.spark.sql.Dataset.head(Dataset.scala:2683)
[info]   at org.apache.spark.ml.util.ReadWriteUtils$.loadObject(ReadWrite.scala:881)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1375)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1350)
[info]   at org.apache.spark.ml.util.MLReadable.load(ReadWrite.scala:385)
[info]   at org.apache.spark.ml.util.MLReadable.load$(ReadWrite.scala:385)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$.load(LogisticRegression.scala:1332)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel.load(LogisticRegression.scala)
...
[info]   Cause: java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 8: Failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 8: Private member cannot be accessed from type "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection".
[info]   at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:604)
[info]   at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:559)
[info]   at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:114)
[info]   at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:247)
[info]   at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2349)
[info]   at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2317)
[info]   at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2190)
[info]   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2080)
[info]   at com.google.common.cache.LocalCache.get(LocalCache.java:4017)
[info]   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4040)
[info]   at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4989)
[info]   at org.apache.spark.util.NonFateSharingLoadingCache.$anonfun$get$2(NonFateSharingCache.scala:108)
[info]   at org.apache.spark.util.KeyLock.withLock(KeyLock.scala:64)
[info]   at org.apache.spark.util.NonFateSharingLoadingCache.get(NonFateSharingCache.scala:108)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1490)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:205)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1415)
[info]   at org.apache.spark.sql.catalyst.expressions.SafeProjection$.createCodeGeneratedObject(Projection.scala:172)
[info]   at org.apache.spark.sql.catalyst.expressions.SafeProjection$.createCodeGeneratedObject(Projection.scala:169)
[info]   at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:45)
[info]   at org.apache.spark.sql.catalyst.expressions.SafeProjection$.create(Projection.scala:195)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:87)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:80)
[info]   at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
[info]   at org.apache.spark.sql.classic.Dataset.collectFromPlan(Dataset.scala:2244)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$head$1(Dataset.scala:1381)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$2(Dataset.scala:2234)
[info]   at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:642)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$1(Dataset.scala:2232)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$8(SQLExecution.scala:162)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withSessionTagsApplied(SQLExecution.scala:268)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$7(SQLExecution.scala:124)
[info]   at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:106)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:124)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:291)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:123)
[info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:77)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:233)
[info]   at org.apache.spark.sql.classic.Dataset.withAction(Dataset.scala:2232)
[info]   at org.apache.spark.sql.classic.Dataset.head(Dataset.scala:1381)
[info]   at org.apache.spark.sql.Dataset.head(Dataset.scala:2683)
[info]   at org.apache.spark.ml.util.ReadWriteUtils$.loadObject(ReadWrite.scala:881)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1375)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1350)
[info]   at org.apache.spark.ml.util.MLReadable.load(ReadWrite.scala:385)
[info]   at org.apache.spark.ml.util.MLReadable.load$(ReadWrite.scala:385)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$.load(LogisticRegression.scala:1332)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel.load(LogisticRegression.scala)
[info]   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info]   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
[info]   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.base/java.lang.reflect.Method.invoke(Method.java:569)
[info]   at org.apache.spark.sql.connect.ml.MLUtils$.loadOperator(MLUtils.scala:422)
[info]   at org.apache.spark.sql.connect.ml.MLUtils$.loadTransformer(MLUtils.scala:447)
[info]   at org.apache.spark.sql.connect.ml.MLHandler$.handleMlCommand(MLHandler.scala:262)
[info]   at org.apache.spark.sql.connect.ml.MLHelper.readWrite(MLHelper.scala:227)
[info]   at org.apache.spark.sql.connect.ml.MLHelper.readWrite$(MLHelper.scala:196)
[info]   at org.apache.spark.sql.connect.ml.MLSuite.readWrite(MLSuite.scala:69)
[info]   at org.apache.spark.sql.connect.ml.MLSuite.$anonfun$new$2(MLSuite.scala:236)
...
[info]   Cause: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 8: Failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 8: Private member cannot be accessed from type "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection".
[info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.compilerError(QueryExecutionErrors.scala:688)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.doCompile(CodeGenerator.scala:1557)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.$anonfun$cache$1(CodeGenerator.scala:1636)
[info]   at org.apache.spark.util.NonFateSharingCache$$anon$1.load(NonFateSharingCache.scala:68)
[info]   at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3574)
[info]   at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2316)
[info]   at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2190)
[info]   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2080)
[info]   at com.google.common.cache.LocalCache.get(LocalCache.java:4017)
[info]   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4040)
[info]   at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4989)
[info]   at org.apache.spark.util.NonFateSharingLoadingCache.$anonfun$get$2(NonFateSharingCache.scala:108)
[info]   at org.apache.spark.util.KeyLock.withLock(KeyLock.scala:64)
[info]   at org.apache.spark.util.NonFateSharingLoadingCache.get(NonFateSharingCache.scala:108)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1490)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:205)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1415)
[info]   at org.apache.spark.sql.catalyst.expressions.SafeProjection$.createCodeGeneratedObject(Projection.scala:172)
[info]   at org.apache.spark.sql.catalyst.expressions.SafeProjection$.createCodeGeneratedObject(Projection.scala:169)
[info]   at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:45)
[info]   at org.apache.spark.sql.catalyst.expressions.SafeProjection$.create(Projection.scala:195)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:87)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:80)
[info]   at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
[info]   at org.apache.spark.sql.classic.Dataset.collectFromPlan(Dataset.scala:2244)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$head$1(Dataset.scala:1381)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$2(Dataset.scala:2234)
[info]   at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:642)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$1(Dataset.scala:2232)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$8(SQLExecution.scala:162)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withSessionTagsApplied(SQLExecution.scala:268)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$7(SQLExecution.scala:124)
[info]   at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:106)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:124)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:291)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:123)
[info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:77)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:233)
[info]   at org.apache.spark.sql.classic.Dataset.withAction(Dataset.scala:2232)
[info]   at org.apache.spark.sql.classic.Dataset.head(Dataset.scala:1381)
[info]   at org.apache.spark.sql.Dataset.head(Dataset.scala:2683)
[info]   at org.apache.spark.ml.util.ReadWriteUtils$.loadObject(ReadWrite.scala:881)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1375)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1350)
[info]   at org.apache.spark.ml.util.MLReadable.load(ReadWrite.scala:385)
[info]   at org.apache.spark.ml.util.MLReadable.load$(ReadWrite.scala:385)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$.load(LogisticRegression.scala:1332)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel.load(LogisticRegression.scala)
[info]   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info]   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
[info]   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.base/java.lang.reflect.Method.invoke(Method.java:569)
[info]   at org.apache.spark.sql.connect.ml.MLUtils$.loadOperator(MLUtils.scala:422)
[info]   at org.apache.spark.sql.connect.ml.MLUtils$.loadTransformer(MLUtils.scala:447)
[info]   at org.apache.spark.sql.connect.ml.MLHandler$.handleMlCommand(MLHandler.scala:262)
[info]   at org.apache.spark.sql.connect.ml.MLHelper.readWrite(MLHelper.scala:227)
[info]   at org.apache.spark.sql.connect.ml.MLHelper.readWrite$(MLHelper.scala:196)
[info]   at org.apache.spark.sql.connect.ml.MLSuite.readWrite(MLSuite.scala:69)
[info]   at org.apache.spark.sql.connect.ml.MLSuite.$anonfun$new$2(MLSuite.scala:236)
...
git reset --hard 86bf4c84805e89354d139ab72b298d3d4155fd0d // before this one 
build/sbt clean "connect/testOnly org.apache.spark.sql.connect.ml.MLSuite"
[info] MLSuite:
[info] - reconcileParam (141 milliseconds)
[info] - LogisticRegression works (5 seconds, 808 milliseconds)
[info] - Exception: Unsupported ML operator (15 milliseconds)
[info] - Exception: cannot retrieve object (246 milliseconds)
[info] - access the attribute which is not in allowed list (205 milliseconds)
[info] - Model must be registered into ServiceLoader when loading (1 millisecond)
[info] - RegressionEvaluator works (164 milliseconds)
[info] - VectorAssembler works (178 milliseconds)
[info] - Memory limitation of MLCache works (951 milliseconds)
[info] Run completed in 10 seconds, 668 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

The reason this issue wasn't detected by this PR's GitHub Actions run is that changes in the mllib module currently do not trigger the tests for the connect module.

LuciferYang (Contributor):

I will revert this one first to restore the GitHub Actions workflow for master. @WeichenXu123 @zhengruifeng

LuciferYang (Contributor):

Reopening this PR; we can re-merge it after fixing the above issue.

@LuciferYang LuciferYang reopened this Apr 29, 2025
WeichenXu123 (Contributor, Author):

@LuciferYang I updated my PR; it should fix the issue.

LuciferYang (Contributor) left a comment:

LGTM, pending tests

LuciferYang (Contributor):

Merged into master. Thanks @WeichenXu123 and @zhengruifeng

@Since("1.6.0")
object LogisticRegressionModel extends MLReadable[LogisticRegressionModel] {
case class Data(
Contributor:

I don't know why it fails MLSuite, but we should not make Data public; it seems private[spark] works. #50760
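
For reference, a sketch of the visibility change being suggested, abridged: the field list is an assumption based on the model's persisted schema, the reader/writer bodies are elided, and #50760 has the actual change:

```scala
import org.apache.spark.annotation.Since
import org.apache.spark.ml.linalg.{Matrix, Vector}
import org.apache.spark.ml.util.MLReadable

@Since("1.6.0")
object LogisticRegressionModel extends MLReadable[LogisticRegressionModel] {

  // private[spark] keeps the serialization schema class out of the public
  // API while still letting the shared ML read/write helpers (which live
  // under org.apache.spark) build an encoder for it.
  private[spark] case class Data(
      numClasses: Int,
      numFeatures: Int,
      interceptVector: Vector,
      coefficientMatrix: Matrix,
      isMultinomial: Boolean)

  // ... read / load and writer implementations elided ...
}
```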

ericm-db pushed a commit to ericm-db/spark that referenced this pull request May 5, 2025

Kimahriman pushed a commit to Kimahriman/spark that referenced this pull request May 13, 2025