Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Invalid column names in get_historical_features when there are field mappings on join keys #4886

Conversation

aloysius-lim
Copy link
Contributor

@aloysius-lim aloysius-lim commented Jan 3, 2025

What this PR does / why we need it:

When there are field mappings on an entity's join keys, get_historical_features() fails because it does not use the original column name in the data source.

This is needed when the source data cannot be modified, and a field mapping is used to map the source data's column to the join key.

Example:

from feast import Entity, Field, FeatureStore, FeatureView
from feast.types import Float32, String
from feast.infra.offline_stores.contrib.spark_offline_store.spark_source import SparkSource

# Initialize Feature Store.
store = FeatureStore(...)

# "driver_id" is used in the Feature Store.
driver = Entity(name="driver", join_keys=["driver_id"])

# Using SparkSource as an example, but this applies to other sources.
# Source data contains a primary key called "id". This is mapped to the join key "driver_id".
driver_stats_src = SparkSource(
    name="driver_stats",
    field_mapping={"id": "driver_id"},
    path=...,
    file_format=...,
)
driver_stats_fv = FeatureView(
    name="driver_stats",
    source= driver_stats_src,
    entities=[driver],
    schema=[
        # join key must be specified in the schema, else it is not included in driver_stats_fv.entity_columns
        Field(name="driver_id", dtype=String),
        Field(name="stat1", dtype=Float32),
        Field(name="stat2", dtype=Float32),
    ]
)

# Get historical features
store.get_historical_features(
    entity_df=...,
    features=[
        "driver_stats:stat1",
        "driver_stats:stat2",
    ]

Without this fix (Spark example):

pyspark.errors.exceptions.captured.AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `driver_id` cannot be resolved. Did you mean one of the following? [`id`, `stat1`, `stat2`]

Underlying Spark query:

driver_stats__subquery AS (
    SELECT
        event_timestamp as event_timestamp,
        created as created_timestamp,

        -- Here is the problem.
        driver_id AS driver_id,

        stat1 as stat1, stat2 as stat2
    FROM `feast_entity_df_677a1a6fd13443c6b0e8ccc059b25f01`
    WHERE event_timestamp <= '2025-01-05T14:00:00'
)

With this fix, the field mapping is respected and the query becomes:

driver_stats__subquery AS (
    SELECT
        event_timestamp as event_timestamp,
        created as created_timestamp,

        id AS driver_id,
        
        stat1 as stat1, stat2 as stat2
    FROM `feast_entity_df_677a1a6fd13443c6b0e8ccc059b25f01`
    WHERE event_timestamp <= '2025-01-05T14:00:00'
)

Which issue(s) this PR fixes:

#4889

Misc

This only works if the entity join key is specified in the schema of the FeatureView (see above).

@aloysius-lim aloysius-lim requested review from a team as code owners January 3, 2025 07:24
@aloysius-lim aloysius-lim force-pushed the fix-get-historical-features-join-key-field-mapping branch from 9c6ca6e to b0a70ac Compare January 3, 2025 07:26
@dmartinol
Copy link
Contributor

Thanks for your contribution!
I’m wondering why we don’t have an issue to track this error: if you can’t find an existing issue, could you please open one before?

@aloysius-lim
Copy link
Contributor Author

Thanks for your contribution! I’m wondering why we don’t have an issue to track this error: if you can’t find an existing issue, could you please open one before?

I added issue #4889 for this.

@franciscojavierarceo
Copy link
Member

Do you mind rebasing? I fixed an issue so after you rebase it should go away.

@aloysius-lim aloysius-lim force-pushed the fix-get-historical-features-join-key-field-mapping branch from b0a70ac to 84c355f Compare January 3, 2025 23:16
@aloysius-lim
Copy link
Contributor Author

I've rebased, but the tests are still failing due to other errors with Keycloak.

@franciscojavierarceo
Copy link
Member

Can you rebase again? Sorry I reverted my most recent change. 😅

Signed-off-by: Aloysius Lim <aloysius.lim@cuezen.com>
Signed-off-by: Aloysius Lim <aloysius.lim@cuezen.com>
Signed-off-by: Aloysius Lim <aloysius.lim@cuezen.com>
@aloysius-lim aloysius-lim force-pushed the fix-get-historical-features-join-key-field-mapping branch from 84c355f to 9b60d43 Compare January 10, 2025 14:56
Copy link
Member

@franciscojavierarceo franciscojavierarceo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@franciscojavierarceo franciscojavierarceo merged commit c9aca2d into feast-dev:master Feb 3, 2025
22 checks passed
franciscojavierarceo pushed a commit that referenced this pull request Feb 4, 2025
# [0.44.0](v0.43.0...v0.44.0) (2025-02-04)

### Bug Fixes

* Adding periodic check to fix the sporadic failures of the operator e2e tests.  ([#4952](#4952)) ([1d086be](1d086be))
* Adding the feast-operator/bin to the .gitignore directory. Somehow it… ([#5005](#5005)) ([1a027ee](1a027ee))
* Changed Env Vars for e2e tests ([#4975](#4975)) ([fa0084f](fa0084f))
* Fix GitHub Actions to pass authentication ([#4963](#4963)) ([22b9138](22b9138)), closes [#4937](#4937) [#4939](#4939) [#4941](#4941) [#4940](#4940) [#4943](#4943) [#4944](#4944) [#4945](#4945) [#4946](#4946) [#4947](#4947) [#4948](#4948) [#4951](#4951) [#4954](#4954) [#4957](#4957) [#4958](#4958) [#4959](#4959) [#4960](#4960) [#4962](#4962)
* Fix showing selected navigation item in UI sidebar ([#4969](#4969)) ([8ac6a85](8ac6a85))
* Invalid column names in get_historical_features when there are field mappings on join keys ([#4886](#4886)) ([c9aca2d](c9aca2d))
* Read project data from the 'projects' key while loading the registry state in the Feast UI ([#4772](#4772)) ([cb81939](cb81939))
* Remove grpcurl dependency from Operator ([#4972](#4972)) ([439e0b9](439e0b9))
* Removed the dry-run flag to test and we will add it back later. ([#5007](#5007)) ([d112b52](d112b52))
* Render UI navigation items as links instead of buttons ([#4970](#4970)) ([1267703](1267703))
* Resolve Operator CRD bloat due to long field descriptions ([#4985](#4985)) ([7593bb3](7593bb3))
* Update manifest to add feature server image for odh ([#4973](#4973)) ([6a1c102](6a1c102))
* Updating release workflows to refer to yml instead of yaml ([#4935](#4935)) ([02b0a68](02b0a68))
* Use locally built feast-ui package in dev feature-server image ([#4998](#4998)) ([0145e55](0145e55))

### Features

* Added OWNERS file for OpenshiftCI ([#4991](#4991)) ([86a2ee8](86a2ee8))
* Adding Milvus demo to examples ([#4910](#4910)) ([2daf852](2daf852))
* Adding retrieve_online_documents endpoint ([#5002](#5002)) ([6607d3d](6607d3d))
* Adding support to return additional features from vector retrieval for Milvus db ([#4971](#4971)) ([6ce08d3](6ce08d3))
* Creating/updating the stable branch after the release. ([#5003](#5003)) ([e9b53cc](e9b53cc))
* Implementing online_read for MilvusOnlineStore ([#4996](#4996)) ([92dde13](92dde13))
* Improve exception message for unsupported Snowflake data types ([#4779](#4779)) ([5992364](5992364))
* Operator add feast ui deployment ([#4930](#4930)) ([b026d0c](b026d0c))
* Updating documents to highlight v2 api for Vector Similarity Se… ([#5000](#5000)) ([32b82a4](32b82a4))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants