Commit a571e08

feat: Instrument Feast using Prometheus and OpenTelemetry (#4366)
feat: instrument feature store

This commit adds opentelemetry to monitor Feast

Signed-off-by: Twinkll Sisodia <tsisodia@redhat.com>
1 parent 8eceff2 commit a571e08

26 files changed, +928 -75 lines changed

README.md

+1-2
@@ -17,7 +17,6 @@
 [![GitHub Release](https://img.shields.io/github/v/release/feast-dev/feast.svg?style=flat&sort=semver&color=blue)](https://github.com/feast-dev/feast/releases)
 
 ## Join us on Slack!
-
 👋👋👋 [Come say hi on Slack!](https://join.slack.com/t/feastopensource/signup)
 
 ## Overview
@@ -231,4 +230,4 @@ Thanks goes to these incredible people:
 
 <a href="https://github.com/feast-dev/feast/graphs/contributors">
 <img src="https://contrib.rocks/image?repo=feast-dev/feast" />
-</a>
+</a>

infra/charts/feast-feature-server/README.md

+4
@@ -44,8 +44,12 @@ See [here](https://github.com/feast-dev/feast/tree/master/examples/python-helm-d
 | imagePullSecrets | list | `[]` | |
 | livenessProbe.initialDelaySeconds | int | `30` | |
 | livenessProbe.periodSeconds | int | `30` | |
+| metrics.enabled | bool | `false` | |
+| metrics.otelCollector.endpoint | string | `""` | |
+| metrics.otelCollector.port | int | `4317` | |
 | nameOverride | string | `""` | |
 | nodeSelector | object | `{}` | |
+| otel_service.name | string | `"otelcol"` | |
 | podAnnotations | object | `{}` | |
 | podSecurityContext | object | `{}` | |
 | readinessProbe.initialDelaySeconds | int | `20` | |
@@ -0,0 +1,108 @@
+## Adding Monitoring
+To add monitoring to the Feast Feature Server, follow these steps:
+
+### Workflow
+
+Feast instrumentation using OpenTelemetry and Prometheus:
+![Workflow](samples/workflow.png)
+
+### Deploy Prometheus Operator
+Follow the Prometheus Operator documentation to install the operator:
+https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md
+
+### Deploy OpenTelemetry Operator
+Before installing the OpenTelemetry Operator, install `cert-manager` and verify that its pods are running. Then apply the operator manifest:
+```
+kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
+```
+
+Follow the documentation for further installation steps:
+https://github.com/open-telemetry/opentelemetry-operator
+
+### Configure OpenTelemetry Collector
+Add the OpenTelemetry Collector configuration under the `metrics` section in your values.yaml file.
+
+Example values.yaml:
+
+```
+metrics:
+  enabled: true
+  otelCollector:
+    endpoint: "otel-collector.default.svc.cluster.local:4317" # sample
+    headers:
+      api-key: "your-api-key"
+```
+
+### Add instrumentation annotation and environment variables in the deployment.yaml
+
+```
+template:
+  metadata:
+    {{- with .Values.podAnnotations }}
+    annotations:
+      {{- toYaml . | nindent 8 }}
+      instrumentation.opentelemetry.io/inject-python: "true"
+```
+
+```
+- name: OTEL_EXPORTER_OTLP_ENDPOINT
+  value: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:{{ .Values.metrics.otelCollector.port }}
+- name: OTEL_EXPORTER_OTLP_INSECURE
+  value: "true"
+```
+
+### Add checks
+Wrap each monitoring manifest, and the monitoring-specific parts of the deployment file, in a `metrics.enabled` check:
+
+```
+{{ if .Values.metrics.enabled }}
+apiVersion: opentelemetry.io/v1alpha1
+kind: Instrumentation
+metadata:
+  name: feast-instrumentation
+spec:
+  exporter:
+    endpoint: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:4318 # 4318 is the default OTLP/HTTP port of the OpenTelemetry Collector
+  env:
+  propagators:
+    - tracecontext
+    - baggage
+  python:
+    env:
+      - name: OTEL_METRICS_EXPORTER
+        value: console,otlp_proto_http
+      - name: OTEL_LOGS_EXPORTER
+        value: otlp_proto_http
+      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
+        value: "true"
+{{end}}
+```
+
+### Add manifests to the chart
+Add the Instrumentation, OpenTelemetryCollector, ServiceMonitor, Prometheus instance, and RBAC manifests provided in the [samples/](https://github.com/feast-dev/feast/tree/91540703c483f1cd03b534a1a45bc4ccdcf79f81/infra/charts/feast-feature-server/samples) directory.
+
+For the latest updates, refer to the official repository: https://github.com/open-telemetry/opentelemetry-operator
+
+### Deploy Feast
+Deploy Feast with `metrics.enabled` set to `true`.
+
+Example:
+```
+helm install feast-release infra/charts/feast-feature-server --set metrics.enabled=true --set feature_store_yaml_base64=""
+```
+
+## See logs
+Once the OpenTelemetry Collector is deployed, search its logs for the Feast metrics:
+
+```
+oc logs otelcol-collector-0 | grep "Name: feast_feature_server_memory_usage\|Value: 0.*"
+oc logs otelcol-collector-0 | grep "Name: feast_feature_server_cpu_usage\|Value: 0.*"
+```
+```
+-> Name: feast_feature_server_memory_usage
+Value: 0.579426
+```
+```
+-> Name: feast_feature_server_cpu_usage
+Value: 0.000000
+```
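The `grep` commands in the README pull metric names and values out of the collector's detailed logging output. As a rough illustration, the same extraction can be sketched in Python. The sketch assumes the log layout shown in the sample output (a `-> Name:` line followed by a `Value:` line); the function name `extract_metrics` is hypothetical and is not part of Feast or the collector.

```python
import re

# Patterns for the two line shapes shown in the README's sample output.
NAME_RE = re.compile(r"->\s*Name:\s*(\S+)")
VALUE_RE = re.compile(r"Value:\s*([-+]?\d+(?:\.\d+)?)")


def extract_metrics(log_text, prefix="feast_feature_server_"):
    """Return {metric_name: last_seen_value} for metrics starting with prefix."""
    metrics = {}
    current = None  # name of the metric whose value we are waiting for
    for line in log_text.splitlines():
        name_match = NAME_RE.search(line)
        if name_match:
            name = name_match.group(1)
            current = name if name.startswith(prefix) else None
            continue
        value_match = VALUE_RE.search(line)
        if value_match and current is not None:
            metrics[current] = float(value_match.group(1))
            current = None
    return metrics


# Sample text copied from the README's expected log output.
sample = """\
     -> Name: feast_feature_server_memory_usage
Value: 0.579426
     -> Name: feast_feature_server_cpu_usage
Value: 0.000000
"""
print(extract_metrics(sample))
```

Text captured from `oc logs otelcol-collector-0` could be passed to the function in place of `sample`.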
@@ -0,0 +1,19 @@
+apiVersion: opentelemetry.io/v1alpha1
+kind: Instrumentation
+metadata:
+  name: feast-instrumentation
+spec:
+  exporter:
+    endpoint: <endpoint> # eg: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:4318
+  env:
+  propagators:
+    - tracecontext
+    - baggage
+  python:
+    env:
+      - name: OTEL_METRICS_EXPORTER
+        value: console,otlp_proto_http
+      - name: OTEL_LOGS_EXPORTER
+        value: otlp_proto_http
+      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
+        value: "true"
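The `<endpoint>` placeholder follows the pattern in the comment: the collector name with a `-collector` suffix, then the namespace, then the port. A minimal sketch of that composition is below. The helper `collector_endpoint` is hypothetical, and the inputs `otelcol` (the collector name in the sample manifest) and `feast-val` (the namespace in the sample RBAC annotations) are only examples. Port 4318 is the conventional OTLP/HTTP port and 4317 the conventional OTLP/gRPC port.

```python
def collector_endpoint(service_name, namespace, port=4318, scheme="http"):
    """Build the in-cluster URL for a collector Service named <service_name>-collector."""
    return f"{scheme}://{service_name}-collector.{namespace}.svc.cluster.local:{port}"


# Example inputs only: substitute the collector name and namespace of your release.
print(collector_endpoint("otelcol", "feast-val"))
# → http://otelcol-collector.feast-val.svc.cluster.local:4318
```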
@@ -0,0 +1,53 @@
+# API reference https://github.com/open-telemetry/opentelemetry-operator/blob/main/docs/api.md
+# Refs for v1beta1 config: https://github.com/open-telemetry/opentelemetry-operator/issues/3011#issuecomment-2154118998
+apiVersion: opentelemetry.io/v1beta1
+kind: OpenTelemetryCollector
+metadata:
+  name: otelcol
+spec:
+  mode: statefulset
+  image: otel/opentelemetry-collector-contrib:0.102.1
+  targetAllocator:
+    enabled: true
+    serviceAccount: opentelemetry-targetallocator-sa
+    prometheusCR:
+      enabled: true
+      podMonitorSelector: {}
+      serviceMonitorSelector: {}
+      ## If uncommented, only service monitors with this label will get picked up
+      #   app: feast
+  config:
+    receivers:
+      otlp:
+        protocols:
+          grpc: {}
+          http: {}
+      prometheus:
+        config:
+          scrape_configs:
+          - job_name: 'otelcol-collector'
+            scrape_interval: 10s
+            static_configs:
+            - targets: [ '0.0.0.0:8888' ]
+
+    processors:
+      batch: {}
+
+    exporters:
+      logging:
+        verbosity: detailed
+
+    service:
+      pipelines:
+        traces:
+          receivers: [otlp]
+          processors: [batch]
+          exporters: [logging]
+        metrics:
+          receivers: [otlp, prometheus]
+          processors: []
+          exporters: [logging]
+        logs:
+          receivers: [otlp]
+          processors: [batch]
+          exporters: [logging]
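Each pipeline in the collector config refers to receivers, processors, and exporters by name, and every name has to be declared in the matching top-level section. The sketch below checks that wiring for the sample config. It mirrors the config as a plain dict so no YAML parser is needed, and the helper `undeclared_components` is hypothetical rather than part of the collector.

```python
# Component names declared in the sample collector config, written out as a
# plain dict so the sketch needs no YAML parser.
config = {
    "receivers": {"otlp": {}, "prometheus": {}},
    "processors": {"batch": {}},
    "exporters": {"logging": {}},
    "service": {
        "pipelines": {
            "traces": {"receivers": ["otlp"], "processors": ["batch"], "exporters": ["logging"]},
            "metrics": {"receivers": ["otlp", "prometheus"], "processors": [], "exporters": ["logging"]},
            "logs": {"receivers": ["otlp"], "processors": ["batch"], "exporters": ["logging"]},
        }
    },
}


def undeclared_components(cfg):
    """List (pipeline, kind, name) for every pipeline entry with no matching declaration."""
    problems = []
    for pipeline, stages in cfg["service"]["pipelines"].items():
        for kind in ("receivers", "processors", "exporters"):
            for name in stages.get(kind, []):
                if name not in cfg.get(kind, {}):
                    problems.append((pipeline, kind, name))
    return problems


print(undeclared_components(config))  # → []
```

Note that the sample's metrics pipeline runs without the `batch` processor, while traces and logs use it.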
@@ -0,0 +1,16 @@
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  labels:
+    app: feast
+  name: otel-sm-1
+spec:
+  endpoints:
+  - port: metrics
+  namespaceSelector:
+    matchNames:
+    - <namespace> # helm value - {{ .Release.Namespace }}
+  selector:
+    matchLabels:
+      app.kubernetes.io/component: opentelemetry-collector
+      app.kubernetes.io/managed-by: opentelemetry-operator
@@ -0,0 +1,15 @@
+kind: Prometheus
+metadata:
+  name: prometheus
+spec:
+  evaluationInterval: 30s
+  podMonitorSelector:
+    matchLabels:
+      app: feast
+  portName: web
+  replicas: 1
+  scrapeInterval: 30s
+  serviceAccountName: prometheus-k8s
+  serviceMonitorSelector:
+    matchLabels:
+      app: feast
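The Prometheus instance picks up ServiceMonitors through `serviceMonitorSelector.matchLabels`, so the `app: feast` label on the sample ServiceMonitors is what connects them. A simplified model of `matchLabels` matching is sketched below. The `matches` helper is hypothetical, and the `unrelated-sm` entry is invented purely for contrast.

```python
def matches(match_labels, labels):
    """True when every key/value in match_labels is present in labels."""
    return all(labels.get(k) == v for k, v in match_labels.items())


selector = {"app": "feast"}  # serviceMonitorSelector.matchLabels from the Prometheus sample

service_monitors = {
    "otel-sm": {"app": "feast"},
    "otel-sm-1": {"app": "feast"},
    "unrelated-sm": {"app": "other"},  # hypothetical, for contrast
}

selected = [name for name, labels in service_monitors.items() if matches(selector, labels)]
print(selected)  # → ['otel-sm', 'otel-sm-1']
```

In this model an empty `matchLabels` matches every object, which corresponds to the `serviceMonitorSelector: {}` setting in the collector sample.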
@@ -0,0 +1,68 @@
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: opentelemetry-targetallocator-sa
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: opentelemetry-targetallocator-role-1
+  annotations:
+    meta.helm.sh/release-name: "feast-release"
+    meta.helm.sh/release-namespace: "feast-val"
+  labels:
+    app.kubernetes.io/managed-by: "Helm"
+rules:
+  - apiGroups:
+      - monitoring.coreos.com
+    resources:
+      - servicemonitors
+      - podmonitors
+    verbs:
+      - '*'
+  - apiGroups: [""]
+    resources:
+      - namespaces
+    verbs: ["get", "list", "watch"]
+  - apiGroups: [""]
+    resources:
+      - nodes
+      - nodes/metrics
+      - services
+      - endpoints
+      - pods
+    verbs: ["get", "list", "watch"]
+  - apiGroups: [""]
+    resources:
+      - configmaps
+    verbs: ["get"]
+  - apiGroups:
+      - discovery.k8s.io
+    resources:
+      - endpointslices
+    verbs: ["get", "list", "watch"]
+  - apiGroups:
+      - networking.k8s.io
+    resources:
+      - ingresses
+    verbs: ["get", "list", "watch"]
+  - nonResourceURLs: ["/metrics"]
+    verbs: ["get"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: opentelemetry-targetallocator-rb-1
+  annotations:
+    meta.helm.sh/release-name: "feast-release"
+    meta.helm.sh/release-namespace: "feast-val"
+  labels:
+    app.kubernetes.io/managed-by: "Helm"
+subjects:
+  - kind: ServiceAccount
+    name: opentelemetry-targetallocator-sa
+    namespace: <namespace> # helm value - {{ .Release.Namespace }}
+roleRef:
+  kind: ClusterRole
+  name: opentelemetry-targetallocator-role-1
+  apiGroup: rbac.authorization.k8s.io
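The ClusterRole grants the target allocator's service account read access to the objects it discovers, plus full access to ServiceMonitors and PodMonitors. To make the rule semantics concrete, here is a deliberately simplified model of how a request is checked against such rules. It covers only resource rules with `*` wildcards and ignores `nonResourceURLs` and resource names. The `allowed` function is hypothetical and is not how the Kubernetes API server is implemented.

```python
# A subset of the ClusterRole rules from the sample manifest.
rules = [
    {"apiGroups": ["monitoring.coreos.com"], "resources": ["servicemonitors", "podmonitors"], "verbs": ["*"]},
    {"apiGroups": [""], "resources": ["namespaces"], "verbs": ["get", "list", "watch"]},
    {"apiGroups": [""], "resources": ["configmaps"], "verbs": ["get"]},
]


def allowed(rules, api_group, resource, verb):
    """Simplified RBAC check: any rule matching group, resource and verb grants access."""
    def hit(values, wanted):
        return "*" in values or wanted in values

    return any(
        hit(r.get("apiGroups", []), api_group)
        and hit(r.get("resources", []), resource)
        and hit(r.get("verbs", []), verb)
        for r in rules
    )


print(allowed(rules, "monitoring.coreos.com", "servicemonitors", "list"))  # → True
print(allowed(rules, "", "configmaps", "list"))  # → False
```

On a live cluster, `kubectl auth can-i list servicemonitors.monitoring.coreos.com --as=system:serviceaccount:<namespace>:opentelemetry-targetallocator-sa` performs the real check.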
@@ -0,0 +1,16 @@
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  labels:
+    app: feast
+  name: otel-sm
+spec:
+  endpoints:
+  - port: metrics
+  namespaceSelector:
+    matchNames:
+    - <namespace> # helm value - {{ .Release.Namespace }}
+  selector:
+    matchLabels:
+      app.kubernetes.io/component: opentelemetry-collector
+      app.kubernetes.io/managed-by: opentelemetry-operator

infra/charts/feast-feature-server/templates/deployment.yaml

+12-1
@@ -14,6 +14,9 @@ spec:
       {{- with .Values.podAnnotations }}
       annotations:
         {{- toYaml . | nindent 8 }}
+        {{- if .Values.metrics.enabled }}
+        instrumentation.opentelemetry.io/inject-python: "true"
+        {{- end }}
       {{- end }}
       labels:
         {{- include "feast-feature-server.selectorLabels" . | nindent 8 }}
@@ -48,10 +51,18 @@ spec:
             - "feast"
             - "serve_registry"
             {{- else }}
+            {{- if .Values.metrics.enabled }}
             - "feast"
             - "serve"
+            - "--metrics"
             - "-h"
             - "0.0.0.0"
+            {{- else }}
+            - "feast"
+            - "serve"
+            - "-h"
+            - "0.0.0.0"
+            {{- end }}
             {{- end }}
           ports:
             - name: {{ .Values.feast_mode }}
@@ -88,4 +99,4 @@ spec:
       {{- with .Values.tolerations }}
       tolerations:
         {{- toYaml . | nindent 8 }}
-      {{- end }}
+      {{- end }}
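The second hunk makes the container command depend on `metrics.enabled`: the two branches differ only in the `--metrics` flag. The selection logic can be sketched as follows. `serve_command` is a hypothetical name for illustration, and the sketch covers only the final `else` branch of the template, not the registry branch.

```python
def serve_command(metrics_enabled):
    """Container command for the final else branch of the deployment template."""
    command = ["feast", "serve"]
    if metrics_enabled:
        command.append("--metrics")
    command += ["-h", "0.0.0.0"]
    return command


print(serve_command(True))   # → ['feast', 'serve', '--metrics', '-h', '0.0.0.0']
print(serve_command(False))  # → ['feast', 'serve', '-h', '0.0.0.0']
```

Because Helm evaluates a missing key as false, a misspelled key in the `if` condition would silently select the branch without `--metrics`, so the key name must match `metrics.enabled` in values.yaml exactly.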

infra/charts/feast-feature-server/templates/service.yaml

+6
@@ -11,5 +11,11 @@ spec:
       targetPort: {{ .Values.feast_mode }}
       protocol: TCP
       name: http
+  {{- if .Values.metrics.enabled }}
+    - name: metrics
+      port: 8000
+      protocol: TCP
+      targetPort: 8000 # metrics port
+  {{- end }}
   selector:
     {{- include "feast-feature-server.selectorLabels" . | nindent 4 }}
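The new `metrics` port exposes the feature server's metrics on 8000. Assuming the server publishes them in the Prometheus text exposition format (an assumption, since the server code is not shown in this part of the diff), a scraped payload could be read with a small parser like the one below. `parse_exposition` is a hypothetical helper and handles only simple samples. The payload is illustrative: the metric names and values come from the README's sample log output, and the `gauge` type lines are assumed.

```python
import re

# name, optional {labels}, then the sample value; a trailing timestamp is ignored.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def parse_exposition(text):
    """Parse Prometheus text-format samples into {name_with_labels: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        m = SAMPLE_RE.match(line)
        if m:
            name, labels, value = m.group(1), m.group(2) or "", m.group(3)
            samples[name + labels] = float(value)
    return samples


# Illustrative payload, not captured from a running server.
payload = """\
# TYPE feast_feature_server_cpu_usage gauge
feast_feature_server_cpu_usage 0.0
# TYPE feast_feature_server_memory_usage gauge
feast_feature_server_memory_usage 0.579426
"""
print(parse_exposition(payload))
```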

infra/charts/feast-feature-server/values.yaml

+6
@@ -15,6 +15,12 @@ imagePullSecrets: []
 nameOverride: ""
 fullnameOverride: ""
 
+metrics:
+  enabled: false
+  otelCollector:
+    endpoint: "" # sample endpoint: "otel-collector.default.svc.cluster.local:4317"
+    port: 4317
+
 # feature_store_yaml_base64 -- [required] a base64 encoded version of feature_store.yaml
 feature_store_yaml_base64: ""
 
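Because `enabled` is nested under `metrics`, an override has to address it with the dotted path `metrics.enabled`, for example `--set metrics.enabled=true`. The sketch below is a simplified model of how a dotted key maps onto the nested defaults. It is not Helm's implementation, and `set_value` is a hypothetical helper.

```python
import copy

# Chart defaults for the new block, as added to values.yaml.
DEFAULTS = {"metrics": {"enabled": False, "otelCollector": {"endpoint": "", "port": 4317}}}


def set_value(values, dotted_key, value):
    """Return a copy of values with the nested key at dotted_key replaced."""
    result = copy.deepcopy(values)
    node = result
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return result


overridden = set_value(DEFAULTS, "metrics.enabled", True)
print(overridden["metrics"]["enabled"])                # → True
print(overridden["metrics"]["otelCollector"]["port"])  # → 4317
print(DEFAULTS["metrics"]["enabled"])                  # → False
```

In this model, a key such as `metric` creates a new top-level entry and leaves `metrics.enabled` at its default.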
