feat: Updating retrieve online documents v2 to work for other fields for sq… #5082

Merged: franciscojavierarceo merged 4 commits into master on Feb 26, 2025.


@franciscojavierarceo (Member) commented Feb 22, 2025

What this PR does / why we need it:

This PR enables full-text search through the retrieve_online_documents endpoint for the SQLite (sqlite-vec) online store. It also adds a new query_string parameter to the SDK method that can be passed to run keyword search. The approach has limitations; in particular, the top_k parameter can be misleading (as is evident from the example). Still, it is a good start for keyword search that reuses the existing vector retrieval endpoint. Enabling hybrid search would be a useful next step.

It makes keyword search as simple as:

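```python
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # path to your feature repo

# query_embedding: the embedding (a list of floats) of the query text.
# It must still be passed, even for keyword-only search.
results = store.retrieve_online_documents_v2(
    features=[
        "document_embeddings:Embeddings",
        "document_embeddings:content",
        "document_embeddings:title",
    ],
    query=query_embedding,
    query_string="(content: 5) OR (title: 1) OR (title: 3)",
    top_k=3,
).to_dict()
print(results)
```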
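The PR describes the sqlite.py change as text search with BM25, and SQLite provides BM25 ranking through its FTS5 extension; the query_string above also follows FTS5's column-filter syntax (`column: term`). The snippet below is a minimal, standalone sketch of that matching and ranking using Python's built-in `sqlite3`, assuming the bundled SQLite has FTS5 enabled (true for most builds). It is not Feast's code: `keyword_search`, the `docs` table, and the sample rows are hypothetical.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[int, str, str]  # (rowid, title, content)


def keyword_search(rows: List[Row], query_string: str, top_k: int) -> List[tuple]:
    """Run an FTS5 MATCH query and return up to top_k rows ranked by BM25."""
    conn = sqlite3.connect(":memory:")
    try:
        # FTS5 virtual table with two searchable text columns.
        conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, content)")
        conn.executemany(
            "INSERT INTO docs(rowid, title, content) VALUES (?, ?, ?)", rows
        )
        # bm25() returns lower (more negative) scores for better matches,
        # so ascending order puts the best matches first.
        cur = conn.execute(
            "SELECT rowid, title, content, bm25(docs) AS score "
            "FROM docs WHERE docs MATCH ? ORDER BY score LIMIT ?",
            (query_string, top_k),
        )
        return cur.fetchall()
    finally:
        conn.close()


rows = [
    (1, "title 1", "alpha"),
    (2, "title 2", "beta 5"),
    (3, "title 3", "gamma"),
    (4, "title 4", "delta"),
]
for hit in keyword_search(rows, "(content: 5) OR (title: 1) OR (title: 3)", top_k=3):
    print(hit)
```

In this standalone sketch, top_k simply caps the number of keyword hits after MATCH has filtered the rows.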
Summary of changes:

  • feature_server.py:
    • Added optional query_string parameter to the GetOnlineFeaturesRequest class.
    • Updated retrieve_online_documents to support the query_string parameter.
  • feature_store.py:
    • Added optional query_string parameter to retrieve_online_documents_v2.
    • Updated related methods to handle query_string.
  • feature_view.py:
    • Added an assertion to ensure only one vector feature per feature view.
  • milvus.py:
    • Added optional query_string parameter to retrieve_online_documents_v2.
  • online_store.py:
    • Added optional query_string parameter to retrieve_online_documents_v2.
  • sqlite.py:
    • Extensive changes to support text search with BM25, including adding text_search_enabled configuration and handling query_string.
    • Updated SQL operations to support the new functionalities.
  • passthrough_provider.py and provider.py:
    • Updated retrieve_online_documents_v2 to support the query_string.
  • types.py:
    • Added FEAST_VECTOR_TYPES list for handling vector types.
  • example_feature_repo_1.py:
    • Added content and title fields to an example feature view.
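The feature_view.py change restricts each feature view to a single vector feature. A minimal sketch of that kind of check follows; `FieldSketch` and `assert_single_vector_field` are hypothetical stand-ins rather than Feast's classes, and marking the vector field with a boolean `vector_index` flag is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldSketch:
    """Hypothetical, simplified stand-in for a feature view field."""

    name: str
    dtype: str
    vector_index: bool = False  # assumption: True marks the vector feature


def assert_single_vector_field(fields: List[FieldSketch]) -> None:
    """Fail if more than one field in the view is marked as a vector feature."""
    vector_fields = [f.name for f in fields if f.vector_index]
    assert len(vector_fields) <= 1, (
        f"Only one vector feature is allowed per feature view; found {vector_fields}"
    )


fields = [
    FieldSketch("Embeddings", "Array(Float32)", vector_index=True),
    FieldSketch("content", "String"),
    FieldSketch("title", "String"),
]
assert_single_vector_field(fields)  # passes: exactly one vector field
```

Python strips `assert` statements when run with `-O`, so a production check might raise an explicit exception instead.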

Which issue(s) this PR fixes:

#5081
#5073


@franciscojavierarceo franciscojavierarceo changed the title Updating retrieve online documents v2 to work for other fields for sq… feat: Updating retrieve online documents v2 to work for other fields for sq… Feb 22, 2025
@franciscojavierarceo (Member Author) commented:

@HaoXuAI

…lite

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Review thread on types.py (hunk `@@ -196,6 +196,17 @@ def __str__(self):`):

```python
    UnixTimestamp: pyarrow.timestamp("us", tz=_utc_now().tzname()),
}

FEAST_VECTOR_TYPES: List[Union[ValueType, PrimitiveFeastType, ComplexFeastType]] = [
```

Collaborator:

wonder if this is used somewhere? :)

Member Author, commenting on the code that uses FEAST_VECTOR_TYPES:

```python
vector_bin = serialize_f32(
    val.float_list_val.val, config.online_store.vector_len
)  # type: ignore
if feature_type_dict[feature_name] in FEAST_VECTOR_TYPES:
```

@HaoXuAI see here!
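The code in that thread routes a value by type: if the feature's type is in FEAST_VECTOR_TYPES, the value is serialized with serialize_f32 into a binary vector. A minimal sketch of that routing follows, assuming string type names in place of Feast's type objects; `VECTOR_TYPES`, `serialize_f32_sketch`, and `encode_value` are hypothetical. The packing is similar to the `struct`-based helper in sqlite-vec's Python documentation (native-order float32), and keeping non-vector values as text is an assumption of this sketch.

```python
import struct
from typing import Any, Sequence, Union

# Hypothetical stand-in for FEAST_VECTOR_TYPES (string names instead of Feast types).
VECTOR_TYPES = {"Array(Float32)", "Array(Float64)"}


def serialize_f32_sketch(vector: Sequence[float], vector_len: int) -> bytes:
    """Pack floats into a compact float32 blob (native byte order)."""
    if len(vector) != vector_len:
        raise ValueError(f"expected {vector_len} values, got {len(vector)}")
    return struct.pack(f"{vector_len}f", *vector)


def encode_value(dtype: str, value: Any, vector_len: int) -> Union[bytes, str]:
    """Serialize vector-typed values to bytes; keep everything else as text."""
    if dtype in VECTOR_TYPES:
        return serialize_f32_sketch(value, vector_len)
    return str(value)


blob = encode_value("Array(Float32)", [1.0, 2.5, -3.0], vector_len=3)
text = encode_value("String", "Feast online store", vector_len=3)
```

Each float32 takes 4 bytes, so a 3-element vector becomes a 12-byte blob, while the string feature passes through unchanged.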

@franciscojavierarceo franciscojavierarceo marked this pull request as ready for review February 26, 2025 19:05
@franciscojavierarceo franciscojavierarceo requested review from a team as code owners February 26, 2025 19:05
@franciscojavierarceo franciscojavierarceo merged commit fc121c3 into master Feb 26, 2025
22 of 23 checks passed
@franciscojavierarceo (Member Author) commented:

Technically, there's a flaw here that I need to resolve: you have to pass in a query embedding even for pure text search, which is silly.

franciscojavierarceo pushed a commit that referenced this pull request Mar 10, 2025
# [0.47.0](v0.46.0...v0.47.0) (2025-03-10)

* feat!: Include PUBLIC_URL in defaultProjectListPromise URL in /ui ([2f0f7b3](2f0f7b3))

### Bug Fixes

* Add transformation_service_endpoint to support Go feature server. ([#5071](#5071)) ([5627d7c](5627d7c))
* Adding extra space on the VM to kind cluster to see if this solves the issue with memory not available with operator e2e tests. ([#5102](#5102)) ([e6e928c](e6e928c))
* Allow unencrypted Snowflake key ([#5097](#5097)) ([87a7c23](87a7c23))
* Can't add different type of list types ([#5118](#5118)) ([bebd7be](bebd7be))
* Fixing transformations on writes ([#5127](#5127)) ([95ac34a](95ac34a))
* Identify s3/remote uri path correctly ([#5076](#5076)) ([93becff](93becff))
* Increase available action VM storage and reduce dev feature-server image size ([#5112](#5112)) ([75f5a90](75f5a90))
* Move Feast to pyproject.toml instead of setup.py ([#5067](#5067)) ([4231274](4231274))
* Skip refresh if already in progress or if lock is already held ([#5068](#5068)) ([f3a24de](f3a24de))

### Features

* Add an OOTB Chat UI to the Feature Server to support RAG demo ([#5106](#5106)) ([40ea7a9](40ea7a9))
* Add Couchbase Columnar as an Offline Store ([#5025](#5025)) ([4373cbf](4373cbf))
* Add Feast Operator RBAC example with Kubernetes Authentication … ([#5077](#5077)) ([2179fbe](2179fbe))
* Added docling and pytorch as add on ([#5089](#5089)) ([135342b](135342b))
* Feast Operator example with Postgres in TLS mode. ([#5028](#5028)) ([2c46f6a](2c46f6a))
* Operator - Add feastProjectDir section to CR with git & init options ([#5079](#5079)) ([d64f01e](d64f01e))
* Override the udf name when provided as input to an on demand transformation ([#5094](#5094)) ([8a714bb](8a714bb))
* Set value_type of entity directly in from_proto ([#5092](#5092)) ([90e7498](90e7498))
* Updating retrieve online documents v2 to work for other fields for sq… ([#5082](#5082)) ([fc121c3](fc121c3))

### BREAKING CHANGES

* The PUBLIC_URL environment variable is now taken into account by default
when fetching the projects list. This is a breaking change only if all
these points apply:

1. You're using Feast UI as a module

2. You're serving the UI files from a non-root path via the PUBLIC_URL
   environment variable

3. You're serving the project list from the root path

4. You're not passing the `feastUIConfigs.projectListPromise` prop to
   the FeastUI component

In this case, you need to explicitly fetch the project list from the
root path via the `feastUIConfigs.projectListPromise` prop:

```diff
 const root = createRoot(document.getElementById("root")!);
 root.render(
   <React.StrictMode>
-    <FeastUI />
+    <FeastUI
+      feastUIConfigs={{
+        projectListPromise: fetch("/projects-list.json", {
+            headers: {
+              "Content-Type": "application/json",
+            },
+          }).then((res) => res.json())
+      }}
+    />
   </React.StrictMode>
 );
```

Signed-off-by: Harri Lehtola <peruukki@hotmail.com>
4 participants