[SPARK-51867][ML] Make scala model supporting save / load methods against local filesystem path #50665

Closed
WeichenXu123 wants to merge 17 commits into apache:master from WeichenXu123:ml-save-to-local

Conversation

WeichenXu123 (Contributor)

What changes were proposed in this pull request?

Make Scala models support save / load methods (developer API) against a local filesystem path.

Why are the changes needed?

This is required by Spark Connect server model cache management.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.
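
For context, here is a minimal sketch of how the new developer API could be exercised. It is a sketch under assumptions: the method names saveToLocal / loadFromLocal come from the review discussion below, their exact signatures are not verified against the final diff, and because they are private[spark] the snippet must live under an org.apache.spark package.

```scala
// Assumption: placed under org.apache.spark so the private[spark] methods are visible.
package org.apache.spark.sketch

import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object LocalSaveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("local-save-sketch").getOrCreate()
    import spark.implicits._

    // A tiny training set, just enough to produce a model worth persisting.
    val training = Seq(
      (1.0, Vectors.dense(0.0, 1.1)),
      (0.0, Vectors.dense(2.0, 1.0)),
      (1.0, Vectors.dense(0.1, 1.2))
    ).toDF("label", "features")

    val model = new LogisticRegression().setMaxIter(10).fit(training)

    // Persist to a plain local filesystem path, bypassing the Hadoop
    // FileSystem layer that model.write.save(...) goes through.
    model.saveToLocal("/tmp/ml-cache/lr-model")

    // Later (e.g. in the Spark Connect server's ML cache), restore it.
    val restored = LogisticRegressionModel.loadFromLocal("/tmp/ml-cache/lr-model")
    println(restored.uid)
    spark.stop()
  }
}
```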

zhengruifeng (Contributor) left a comment:

Do we have Scala tests to make sure the new save/load works?

@WeichenXu123 WeichenXu123 changed the title [WIP] [SPARK-51867][ML] Make scala model supporting save / load methods against local filesystem path [SPARK-51867][ML] Make scala model supporting save / load methods against local filesystem path Apr 28, 2025
* Saves the ML instances to the local file system path.
*/
@throws[IOException]("If the input path already exists but overwrite is not enabled.")
private[spark] def saveToLocal(path: String): Unit = {
Contributor:

Instead of introducing a new saveToLocal method, what about adding a new class MLLocalWriter extends MLWriter?

Then, based on the writer class, we can tell whether an algorithm supports local read/write.
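
A rough sketch of the alternative design being floated here; the MLLocalWriter name comes from this comment, and everything else (the class shape, the helper) is an assumption for illustration:

```scala
import org.apache.spark.ml.util.MLWriter

// Hypothetical capability-marking subtype: algorithms supporting plain
// local-filesystem persistence would return this from .write, so the
// capability is visible in the writer's type.
abstract class MLLocalWriter extends MLWriter {
  def saveToLocal(path: String): Unit
}

object LocalWriteSupport {
  // Callers feature-test with a type match instead of calling a separate
  // saveToLocal method on every model class.
  def trySaveLocal(writer: MLWriter, path: String): Boolean = writer match {
    case local: MLLocalWriter =>
      local.saveToLocal(path)
      true
    case _ =>
      false
  }
}
```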

Contributor (Author):

My concern is that changing the base class might affect compatibility: currently saveToLocal and loadFromLocal are developer-only APIs, used only in the Spark Connect server for ML cache management.

So I tend to keep the current code. Once we decide to make them public APIs, we can consider a better API / base-class design. Thoughts?

Contributor:

SG

WeichenXu123 (Contributor, Author):

Merged to master.

LuciferYang (Contributor):

It appears that merging this PR caused test failures for org.apache.spark.sql.connect.ml.MLSuite in the connect module.

Here's how I conducted the local inspection:

git reset --hard 6f9bf73c345d70c3d27ea2e1ebadaa03a275fb3c // this one 
build/sbt clean "connect/testOnly org.apache.spark.sql.connect.ml.MLSuite"
[info] - LogisticRegression works *** FAILED *** (8 seconds, 2 milliseconds)
[info]   org.apache.spark.SparkRuntimeException: [EXPRESSION_DECODING_FAILED] Failed to decode a row to a value of the expressions: newInstance(class org.apache.spark.ml.classification.LogisticRegressionModel$Data). SQLSTATE: 42846
[info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.expressionDecodingError(QueryExecutionErrors.scala:1364)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:95)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:80)
[info]   at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
[info]   at org.apache.spark.sql.classic.Dataset.collectFromPlan(Dataset.scala:2244)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$head$1(Dataset.scala:1381)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$2(Dataset.scala:2234)
[info]   at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:642)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$1(Dataset.scala:2232)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$8(SQLExecution.scala:162)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withSessionTagsApplied(SQLExecution.scala:268)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$7(SQLExecution.scala:124)
[info]   at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:106)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:124)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:291)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:123)
[info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:77)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:233)
[info]   at org.apache.spark.sql.classic.Dataset.withAction(Dataset.scala:2232)
[info]   at org.apache.spark.sql.classic.Dataset.head(Dataset.scala:1381)
[info]   at org.apache.spark.sql.Dataset.head(Dataset.scala:2683)
[info]   at org.apache.spark.ml.util.ReadWriteUtils$.loadObject(ReadWrite.scala:881)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1375)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1350)
[info]   at org.apache.spark.ml.util.MLReadable.load(ReadWrite.scala:385)
[info]   at org.apache.spark.ml.util.MLReadable.load$(ReadWrite.scala:385)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$.load(LogisticRegression.scala:1332)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel.load(LogisticRegression.scala)
...
[info]   Cause: java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 8: Failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 8: Private member cannot be accessed from type "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection".
[info]   at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:604)
[info]   at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:559)
[info]   at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:114)
[info]   at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:247)
[info]   at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2349)
[info]   at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2317)
[info]   at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2190)
[info]   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2080)
[info]   at com.google.common.cache.LocalCache.get(LocalCache.java:4017)
[info]   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4040)
[info]   at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4989)
[info]   at org.apache.spark.util.NonFateSharingLoadingCache.$anonfun$get$2(NonFateSharingCache.scala:108)
[info]   at org.apache.spark.util.KeyLock.withLock(KeyLock.scala:64)
[info]   at org.apache.spark.util.NonFateSharingLoadingCache.get(NonFateSharingCache.scala:108)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1490)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:205)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1415)
[info]   at org.apache.spark.sql.catalyst.expressions.SafeProjection$.createCodeGeneratedObject(Projection.scala:172)
[info]   at org.apache.spark.sql.catalyst.expressions.SafeProjection$.createCodeGeneratedObject(Projection.scala:169)
[info]   at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:45)
[info]   at org.apache.spark.sql.catalyst.expressions.SafeProjection$.create(Projection.scala:195)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:87)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:80)
[info]   at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
[info]   at org.apache.spark.sql.classic.Dataset.collectFromPlan(Dataset.scala:2244)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$head$1(Dataset.scala:1381)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$2(Dataset.scala:2234)
[info]   at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:642)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$1(Dataset.scala:2232)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$8(SQLExecution.scala:162)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withSessionTagsApplied(SQLExecution.scala:268)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$7(SQLExecution.scala:124)
[info]   at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:106)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:124)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:291)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:123)
[info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:77)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:233)
[info]   at org.apache.spark.sql.classic.Dataset.withAction(Dataset.scala:2232)
[info]   at org.apache.spark.sql.classic.Dataset.head(Dataset.scala:1381)
[info]   at org.apache.spark.sql.Dataset.head(Dataset.scala:2683)
[info]   at org.apache.spark.ml.util.ReadWriteUtils$.loadObject(ReadWrite.scala:881)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1375)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1350)
[info]   at org.apache.spark.ml.util.MLReadable.load(ReadWrite.scala:385)
[info]   at org.apache.spark.ml.util.MLReadable.load$(ReadWrite.scala:385)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$.load(LogisticRegression.scala:1332)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel.load(LogisticRegression.scala)
[info]   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info]   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
[info]   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.base/java.lang.reflect.Method.invoke(Method.java:569)
[info]   at org.apache.spark.sql.connect.ml.MLUtils$.loadOperator(MLUtils.scala:422)
[info]   at org.apache.spark.sql.connect.ml.MLUtils$.loadTransformer(MLUtils.scala:447)
[info]   at org.apache.spark.sql.connect.ml.MLHandler$.handleMlCommand(MLHandler.scala:262)
[info]   at org.apache.spark.sql.connect.ml.MLHelper.readWrite(MLHelper.scala:227)
[info]   at org.apache.spark.sql.connect.ml.MLHelper.readWrite$(MLHelper.scala:196)
[info]   at org.apache.spark.sql.connect.ml.MLSuite.readWrite(MLSuite.scala:69)
[info]   at org.apache.spark.sql.connect.ml.MLSuite.$anonfun$new$2(MLSuite.scala:236)
...
[info]   Cause: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 8: Failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 35, Column 8: Private member cannot be accessed from type "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection".
[info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.compilerError(QueryExecutionErrors.scala:688)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.doCompile(CodeGenerator.scala:1557)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.$anonfun$cache$1(CodeGenerator.scala:1636)
[info]   at org.apache.spark.util.NonFateSharingCache$$anon$1.load(NonFateSharingCache.scala:68)
[info]   at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3574)
[info]   at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2316)
[info]   at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2190)
[info]   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2080)
[info]   at com.google.common.cache.LocalCache.get(LocalCache.java:4017)
[info]   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4040)
[info]   at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4989)
[info]   at org.apache.spark.util.NonFateSharingLoadingCache.$anonfun$get$2(NonFateSharingCache.scala:108)
[info]   at org.apache.spark.util.KeyLock.withLock(KeyLock.scala:64)
[info]   at org.apache.spark.util.NonFateSharingLoadingCache.get(NonFateSharingCache.scala:108)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1490)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:205)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection$.create(GenerateSafeProjection.scala:39)
[info]   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1415)
[info]   at org.apache.spark.sql.catalyst.expressions.SafeProjection$.createCodeGeneratedObject(Projection.scala:172)
[info]   at org.apache.spark.sql.catalyst.expressions.SafeProjection$.createCodeGeneratedObject(Projection.scala:169)
[info]   at org.apache.spark.sql.catalyst.expressions.CodeGeneratorWithInterpretedFallback.createObject(CodeGeneratorWithInterpretedFallback.scala:45)
[info]   at org.apache.spark.sql.catalyst.expressions.SafeProjection$.create(Projection.scala:195)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:87)
[info]   at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:80)
[info]   at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:936)
[info]   at org.apache.spark.sql.classic.Dataset.collectFromPlan(Dataset.scala:2244)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$head$1(Dataset.scala:1381)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$2(Dataset.scala:2234)
[info]   at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:642)
[info]   at org.apache.spark.sql.classic.Dataset.$anonfun$withAction$1(Dataset.scala:2232)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$8(SQLExecution.scala:162)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withSessionTagsApplied(SQLExecution.scala:268)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$7(SQLExecution.scala:124)
[info]   at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:94)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:112)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:106)
[info]   at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:111)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$6(SQLExecution.scala:124)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:291)
[info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:123)
[info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:804)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:77)
[info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:233)
[info]   at org.apache.spark.sql.classic.Dataset.withAction(Dataset.scala:2232)
[info]   at org.apache.spark.sql.classic.Dataset.head(Dataset.scala:1381)
[info]   at org.apache.spark.sql.Dataset.head(Dataset.scala:2683)
[info]   at org.apache.spark.ml.util.ReadWriteUtils$.loadObject(ReadWrite.scala:881)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1375)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1350)
[info]   at org.apache.spark.ml.util.MLReadable.load(ReadWrite.scala:385)
[info]   at org.apache.spark.ml.util.MLReadable.load$(ReadWrite.scala:385)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel$.load(LogisticRegression.scala:1332)
[info]   at org.apache.spark.ml.classification.LogisticRegressionModel.load(LogisticRegression.scala)
[info]   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info]   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
[info]   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.base/java.lang.reflect.Method.invoke(Method.java:569)
[info]   at org.apache.spark.sql.connect.ml.MLUtils$.loadOperator(MLUtils.scala:422)
[info]   at org.apache.spark.sql.connect.ml.MLUtils$.loadTransformer(MLUtils.scala:447)
[info]   at org.apache.spark.sql.connect.ml.MLHandler$.handleMlCommand(MLHandler.scala:262)
[info]   at org.apache.spark.sql.connect.ml.MLHelper.readWrite(MLHelper.scala:227)
[info]   at org.apache.spark.sql.connect.ml.MLHelper.readWrite$(MLHelper.scala:196)
[info]   at org.apache.spark.sql.connect.ml.MLSuite.readWrite(MLSuite.scala:69)
[info]   at org.apache.spark.sql.connect.ml.MLSuite.$anonfun$new$2(MLSuite.scala:236)
...
git reset --hard 86bf4c84805e89354d139ab72b298d3d4155fd0d // before this one 
build/sbt clean "connect/testOnly org.apache.spark.sql.connect.ml.MLSuite"
[info] MLSuite:
[info] - reconcileParam (141 milliseconds)
[info] - LogisticRegression works (5 seconds, 808 milliseconds)
[info] - Exception: Unsupported ML operator (15 milliseconds)
[info] - Exception: cannot retrieve object (246 milliseconds)
[info] - access the attribute which is not in allowed list (205 milliseconds)
[info] - Model must be registered into ServiceLoader when loading (1 millisecond)
[info] - RegressionEvaluator works (164 milliseconds)
[info] - VectorAssembler works (178 milliseconds)
[info] - Memory limitation of MLCache works (951 milliseconds)
[info] Run completed in 10 seconds, 668 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

The reason this issue wasn't detected by this PR's GitHub Actions run is that changes in the mllib module currently do not trigger the tests for the connect module.

LuciferYang (Contributor):

I will revert this one first to restore the GitHub Actions workflow for master. @WeichenXu123 @zhengruifeng

LuciferYang (Contributor):

Reopening this PR; we can re-merge it after fixing the above issue.

@LuciferYang LuciferYang reopened this Apr 29, 2025
WeichenXu123 (Contributor, Author):

@LuciferYang I updated my PR; it should fix the issue.

LuciferYang (Contributor) left a comment:

LGTM, pending tests

LuciferYang (Contributor):

Merged into master. Thanks @WeichenXu123 and @zhengruifeng

@Since("1.6.0")
object LogisticRegressionModel extends MLReadable[LogisticRegressionModel] {
case class Data(
Contributor:

I don't know why it fails MLSuite, but we should not make Data public; it seems private[spark] works. #50760
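
For reference, a sketch of the visibility change being suggested, abridged: the field list is an assumption based on the model's persisted schema, the reader/writer bodies are elided, and #50760 has the actual change:

```scala
import org.apache.spark.annotation.Since
import org.apache.spark.ml.linalg.{Matrix, Vector}
import org.apache.spark.ml.util.MLReadable

@Since("1.6.0")
object LogisticRegressionModel extends MLReadable[LogisticRegressionModel] {

  // private[spark] keeps the serialization schema class out of the public
  // API while still letting the shared ML read/write helpers (which live
  // under org.apache.spark) build an encoder for it.
  private[spark] case class Data(
      numClasses: Int,
      numFeatures: Int,
      interceptVector: Vector,
      coefficientMatrix: Matrix,
      isMultinomial: Boolean)

  // ... read / load and writer implementations elided ...
}
```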

ericm-db pushed a commit to ericm-db/spark that referenced this pull request May 5, 2025

Kimahriman pushed a commit to Kimahriman/spark that referenced this pull request May 13, 2025