
Add SpanProcessor for OpenTelemetry #875


Open · wants to merge 43 commits into master from 874-add-spanprocessor-for-otel

Conversation

solnic (Collaborator)

@solnic solnic commented Mar 14, 2025

To make this work, you need to add the OpenTelemetry deps:

      {:opentelemetry, "~> 1.3"},
      {:opentelemetry_api, "~> 1.2"},
      {:opentelemetry_exporter, "~> 1.6"},
      {:opentelemetry_phoenix, "~> 2.0"},
      {:opentelemetry_ecto, "~> 1.2"},
      {:opentelemetry_bandit, "~> 0.2"}

Then configure our span processor:

config :opentelemetry, span_processor: {Sentry.OpenTelemetry.SpanProcessor, []}
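For a typical Phoenix app, the instrumentation libraries also need to be attached at application start. A sketch, assuming a hypothetical app named MyApp with an Ecto telemetry prefix of [:my_app, :repo]; the exact setup calls depend on the versions of the opentelemetry_* packages you use:

```elixir
# Hypothetical application.ex for an app named `MyApp`; adjust module and
# repo names to your project.
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # Attach the OTel instrumentation handlers before the supervision tree starts.
    OpentelemetryBandit.setup()
    OpentelemetryPhoenix.setup(adapter: :bandit)
    OpentelemetryEcto.setup([:my_app, :repo])

    children = [
      MyApp.Repo,
      MyAppWeb.Endpoint
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```

With the span processor configured as above, spans emitted by these instrumentation libraries should then flow through Sentry.OpenTelemetry.SpanProcessor.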

Things should start looking more or less like this:

[Screenshots: Sentry trace and span views, taken 2025-03-19, omitted.]

@solnic solnic linked an issue Mar 14, 2025 that may be closed by this pull request
@solnic solnic force-pushed the 874-add-spanprocessor-for-otel branch from 0ebd132 to dee9540 on March 14, 2025 11:17
@solnic (Collaborator, Author)

solnic commented Mar 14, 2025

1) test loads in_app_module_allow_list (Public.Sentry.ConfigTest)
Error:      apps/public/test/public_test.exs:4
     ** (RuntimeError) the Sentry configuration seems to be not available (while trying to fetch :in_app_module_allow_list). This is likely because the :sentry application has not been started yet. Make sure that you start the :sentry application before using any of its functions.

     code: assert Sentry.Config.in_app_module_allow_list() |> Enum.sort() ==
     stacktrace:
       (sentry 10.8.1) lib/sentry/config.ex:698: Sentry.Config.in_app_module_allow_list/0
       test/public_test.exs:5: (test)


Finished in 0.00 seconds (0.00s async, 0.00s sync)
1 test, 1 failure
==> admin
Running ExUnit with seed: 25951, max_cases: 8



  1) test loads in_app_module_allow_list (Admin.Sentry.ConfigTest)
Error:      apps/admin/test/admin_test.exs:4
     ** (RuntimeError) the Sentry configuration seems to be not available (while trying to fetch :in_app_module_allow_list). This is likely because the :sentry application has not been started yet. Make sure that you start the :sentry application before using any of its functions.

     code: assert Sentry.Config.in_app_module_allow_list() |> Enum.sort() ==
     stacktrace:
       (sentry 10.8.1) lib/sentry/config.ex:698: Sentry.Config.in_app_module_allow_list/0
       test/admin_test.exs:5: (test)

@whatyouhide ☝🏻 I'm having trouble figuring out these failures. Could you tell me how I could debug what exactly causes the :sentry app startup failure? We start it manually in the umbrella test helpers and for whatever reason it started failing when opentelemetry is included.

@whatyouhide (Collaborator)

Adding just the OTel deps without any other changes leads to the same issue?

@sl0thentr0py (Member)

sl0thentr0py commented Mar 17, 2025

please write a minimal description in the PR about:

  • what functionality this adds
  • instructions to add this to a Phoenix app with opentelemetry instrumentation and test it out

@solnic (Collaborator, Author)

solnic commented Mar 18, 2025

Adding just the OTel deps without any other changes leads to the same issue?

@whatyouhide @sl0thentr0py I narrowed it down to the opentelemetry dep. When it's included, the app doesn't start in the umbrella integration test. We need to make the otel deps optional anyway, so it's time to address this. How do we want to approach it, though? Should I create a sentry-opentelemetry package in this repo?

@solnic (Collaborator, Author)

solnic commented Mar 18, 2025

@whatyouhide for the time being I addressed it by using optional deps for the otel packages via 6fdf121, but then one of the tests in event_test.exs started to fail, so I fixed it via d04c5d3 even though I don't understand what's going on there 🙃

@solnic solnic mentioned this pull request Mar 18, 2025
@whatyouhide (Collaborator)

@solnic is this ready for review? It's still a draft

@solnic (Collaborator, Author)

solnic commented Mar 18, 2025

@solnic is this ready for review? It's still a draft

Not yet. I got it working, but Phoenix + Bandit spans are not being processed in a way that would make sense for Sentry, for some reason. I've been investigating how to fix it. It seems like Phoenix spans are not coming in as children of Bandit spans, so there's a disconnect here. I'll figure it out 🙂

@sl0thentr0py (Member)

We need to make otel deps optional anyway, so it's time to address this.

If we're just shipping a SpanProcessor first, having them as peer dependencies is more than fine. We just need to document well how to get otel up and running in parallel to Sentry and hook them up properly.

We will revisit packaging later once we have a proper working prototype.

@solnic solnic force-pushed the 874-add-spanprocessor-for-otel branch 2 times, most recently from b607ac1 to bdcc5de on March 19, 2025 12:45
@solnic solnic marked this pull request as ready for review March 19, 2025 12:59
@solnic (Collaborator, Author)

solnic commented Mar 19, 2025

@sl0thentr0py @whatyouhide this is now open for reviews. I got it deployed to production already and it's working well (see screenshots in the description).

@sl0thentr0py (Member)

okay, I'm taking a preliminary look now

one thing we definitely need is that, instead of the boolean Config.tracing, we need:

without these, it is very hard for people to control quota spend so this is a hard requirement

@sl0thentr0py (Member) left a comment

some suggestions, looks very good otherwise!

@whatyouhide (Collaborator) left a comment

Looking like a good start but still a bunch of work left to do. Let me know if any of the comments are not clear!

@solnic solnic force-pushed the 874-add-spanprocessor-for-otel branch from 9fcc55e to 65c039c on April 2, 2025 13:18
@whatyouhide (Collaborator)

@solnic I’m assuming you're still working on this so re-request my review if this gets ready again.

@solnic (Collaborator, Author)

solnic commented Apr 3, 2025

@solnic I’m assuming you're still working on this so re-request my review if this gets ready again.

@whatyouhide yes, wrapping it up today, still a couple of things to address 🙃

@solnic solnic force-pushed the 874-add-spanprocessor-for-otel branch from c9152c6 to 1654724 on April 3, 2025 10:44
@solnic (Collaborator, Author)

solnic commented Apr 3, 2025

without these, it is very hard for people to control quota spend so this is a hard requirement

@sl0thentr0py I'll add traces_sampler config in a separate PR

@solnic solnic force-pushed the 874-add-spanprocessor-for-otel branch from 01585a8 to 5e8fc99 on April 3, 2025 11:07
@solnic solnic force-pushed the 874-add-spanprocessor-for-otel branch from 0559b9e to 59e55a8 on April 29, 2025 14:04
@sergiotapia

Thank you @solnic, very excited for this release. I need this badly! Appreciate all your hard work!

@whatyouhide (Collaborator) left a comment

Did another round of review.

Comment on lines +645 to +646
@spec tracing?() :: boolean()
def tracing?, do: fetch!(:traces_sample_rate) > 0.0
Collaborator:

I don't think this is the way to go, as it makes the API a little harder to work with IMO. I would rather have two separate configuration options if possible, unless other Sentry SDKs do it this way. If they do, can you point me to examples?

If they don't, let's go with:

  1. tracing: boolean() for enabling and disabling tracing.
  2. traces_sample_rate: float() (0.0 → 1.0), which can default to something other than 0.0 because now :tracing would default to false.

This way users can easily toggle tracing on and off without having to worry about the rate itself.
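In config terms, the proposal above would look something like this (a sketch of the suggested shape only, not an API that exists in the SDK):

```elixir
# Hypothetical shape of the two proposed options (config/config.exs):
config :sentry,
  tracing: true,            # master on/off switch; would default to false
  traces_sample_rate: 0.1   # sampling rate, used only when tracing is enabled
```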

Collaborator (Author):

@whatyouhide actually, that's how this works across the SDKs, so we need to stay consistent over here

Collaborator:

yuck 😞

will be sampled. This value is also used to determine if tracing is enabled: if it's
greater than `0`, tracing is enabled.

This feature requires `opentelemetry` package.
Collaborator:

Let's always phrase this as something like

Suggested change
This feature requires `opentelemetry` package.
Tracing requires OpenTelemetry packages to work. See [the
OpenTelemetry setup documentation](LINK-TODO) for guides on
how to set it up.

@@ -117,7 +117,7 @@ defmodule Sentry.Client do

     result_type = Keyword.get_lazy(opts, :result, &Config.send_result/0)
     client = Keyword.get_lazy(opts, :client, &Config.client/0)
-    sample_rate = Keyword.get_lazy(opts, :sample_rate, &Config.sample_rate/0)
+    sample_rate = Keyword.get_lazy(opts, :sample_rate, &Config.traces_sample_rate/0)
Collaborator:

We should also rename the option that can be passed to this function to be :traces_sample_rate and document that.

Collaborator (Author):

@whatyouhide I'm gonna do a follow-up PR with an improved sampler, which is meant to use this config option. @sl0thentr0py told me we need to sample sooner than in send_transaction, and that the sampler is the correct place, so we'd be dropping spans sooner rather than aggregating them in our storage only to potentially drop them later.

alias Sentry.Interfaces.Span

# This can be a no-op since we can postpone inserting the span into storage until on_end
@impl true
Collaborator:

Super nit, but can we replace these with

Suggested change
@impl true
@impl :otel_span_processor

just for ease of readability?

SpanStorage.store_span(span_record)

if span_record.parent_span_id == nil do
child_span_records = SpanStorage.get_child_spans(span_record.span_id)
Collaborator:

Should this somehow pop the child spans off of the storage? I’m worried about the concurrency of this all. I don't know how on_end/2 is called by OTel.

Collaborator (Author):

@whatyouhide OTel guarantees that on_end is not called for the same span more than once, because it removes the span from its own storage before handing it to the processors' on_end. So even if something caused another on_end call for the same span, that span would not be found and the processor callbacks wouldn't be invoked again.
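On the popping concern itself: if SpanStorage is ETS-backed, `:ets.take/2` reads and deletes all objects under a key in a single atomic call, so child spans cannot be processed twice even under concurrent `on_end` invocations. A standalone sketch (not Sentry's actual SpanStorage implementation):

```elixir
# Demo table; Sentry's real storage layout may differ.
table = :ets.new(:span_storage_demo, [:bag, :public])

# Two child spans stored under their root span's id.
:ets.insert(table, {"root-1", %{span_id: "child-a"}})
:ets.insert(table, {"root-1", %{span_id: "child-b"}})

# :ets.take/2 returns all objects for the key and removes them atomically.
children = :ets.take(table, "root-1")

# A second take finds nothing: the spans were popped, not just read.
[] = :ets.take(table, "root-1")
```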


def instrumented_function do
Tracer.with_span "instrumented_function" do
:timer.sleep(100)
Collaborator:

What's the purpose of these sleeps? They're going to slow down the tests, and I'm not sure they bring any value to them?

Comment on lines 95 to 96
assert nil ==
SpanStorage.get_root_span(transaction.contexts.trace.span_id, table_name: table_name)
Collaborator:

Nit: we generally (almost always) do this

Suggested change
assert nil ==
SpanStorage.get_root_span(transaction.contexts.trace.span_id, table_name: table_name)
assert SpanStorage.get_root_span(transaction.contexts.trace.span_id, table_name: table_name) == nil

Applies below too.


defp assert_valid_trace_id(trace_id) do
assert is_binary(trace_id), "Expected trace_id to be a string"
assert String.length(trace_id) == 32, "Expected trace_id to be 32 characters long #{trace_id}"
Collaborator:

Suggested change
assert String.length(trace_id) == 32, "Expected trace_id to be 32 characters long #{trace_id}"
assert byte_size(trace_id) == 32, "Expected trace_id to be 32 characters long #{trace_id}"

Otherwise this could (unlikely, of course) incorrectly accept Unicode characters.
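To illustrate the distinction with a standalone sketch: `String.length/1` counts graphemes while `byte_size/1` counts UTF-8 bytes, so only the latter guarantees the value is exactly 32 single-byte characters:

```elixir
hex = "0123456789abcdef0123456789abcdef"

# For a pure-ASCII hex string the two agree:
32 = String.length(hex)
32 = byte_size(hex)

# A 32-grapheme Unicode string would pass the String.length check...
uni = String.duplicate("é", 32)
32 = String.length(uni)

# ...but not the byte_size check, since "é" is two bytes in UTF-8:
64 = byte_size(uni)
```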

Comment on lines +120 to +121
assert String.match?(trace_id, ~r/^[a-f0-9]{32}$/),
"Expected trace_id to be a lowercase hex string"
Collaborator:

You could simplify this by asserting on

assert {:ok, _} = Base.decode32(trace_id, case: :lower, padding: false)

?

Collaborator (Author):

No, this actually does not pass.
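This is likely because trace ids are lowercase hex (base16), while `Base.decode32/2` expects the RFC 4648 base32 alphabet (`a`–`z` plus `2`–`7`), which rejects the hex digits `0`, `1`, `8`, and `9`. A sketch of the difference:

```elixir
trace_id = "0123456789abcdef0123456789abcdef"

# A hex trace id decodes fine as base16...
{:ok, _bytes} = Base.decode16(trace_id, case: :lower)

# ...but not as base32, whose lowercase alphabet excludes 0, 1, 8, and 9:
:error = Base.decode32(trace_id, case: :lower, padding: false)
```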

{:sentry, path: "../.."}
{:sentry, path: "../.."},

{:opentelemetry, "~> 1.5"},
Collaborator:

Let's do these changes in the Phoenix test app in a separate PR that also, possibly, adds some tests.

@sl0thentr0py (Member)

hey everyone, I will take one final look at this now, and we are going to prioritize shipping a first version. Other nitpicks can be handled later, and we will iterate on it. Thanks for all the reviews and discussions, but this has been open for a fairly long time, so now we should wrap it up.

@solnic told me he's been running it in his production app for a while so that's also good enough as a signal for me to get this out.

@@ -117,7 +117,7 @@ defmodule Sentry.Client do

     result_type = Keyword.get_lazy(opts, :result, &Config.send_result/0)
     client = Keyword.get_lazy(opts, :client, &Config.client/0)
-    sample_rate = Keyword.get_lazy(opts, :sample_rate, &Config.sample_rate/0)
+    sample_rate = Keyword.get_lazy(opts, :sample_rate, &Config.traces_sample_rate/0)
Member:

the errors' sample_rate and the transactions' traces_sample_rate are two different options and should be kept separate, each used for its relevant sampling decision, so please don't replace the original one.

Collaborator:

This comment is not correct. It's not replacing the original one; it's switching from sample_rate to traces_sample_rate in this function, which reports transactions. We didn't have traces_sample_rate before, so we used sample_rate as a stopgap.

Member:

I'm saying we need both

Member:

the sampler still needs to use traces_sample_rate right?

@whatyouhide (Collaborator)

As discussed in Discord, I won't re-review this. Thanks for all the work @solnic!

@whatyouhide whatyouhide removed their request for review May 14, 2025 10:15

Successfully merging this pull request may close these issues.

Add SpanProcessor for OTel
7 participants