-
-
Notifications
You must be signed in to change notification settings - Fork 200
Add SpanProcessor for OpenTelemetry #875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
0ebd132
to
dee9540
Compare
@whatyouhide ☝🏻 I'm having trouble figuring out these failures. Could you tell me how I could debug what exactly causes the :sentry app startup failure? We start it manually in the umbrella test helpers and for whatever reason it started failing when opentelemetry is included. |
Adding just the OTel deps without any other changes leads to the same issue? |
please write some minimal description in the PR about
|
@whatyouhide @sl0thentr0py I narrowed it down to |
@whatyouhide for the time being I addressed it by using optional deps for otel packages via 6fdf121 but then one of the tests in event_test.exs started to fail so I fixed it via d04c5d3 even though I don't understand what's going on there 🙃 |
@solnic is this ready for review? It's still a draft |
Not yet. I got it working but phoenix + bandit spans are not processed in a way that would make sense for Sentry for some reason. I've been investigating how to fix. It seems like phoenix spans are not coming in as children of bandit spans so there's a disconnect here. I'll figure it out 🙂 |
If we're just shipping a We will revisit packaging later once we have a proper working prototype. |
b607ac1
to
bdcc5de
Compare
@sl0thentr0py @whatyouhide this is now open for reviews. I got it deployed to production already and it's working well (see screenshots in the description). |
okay I'm taking a preliminary look now one thing we definitely need is that instead of the boolean
without these, it is very hard for people to control quota spend so this is a hard requirement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some suggestions, looks very good otherwise!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking like a good start but still a bunch of work left to do. Let me know if any of the comments are not clear!
9fcc55e
to
65c039c
Compare
@solnic I’m assuming you're still working on this so re-request my review if this gets ready again. |
@whatyouhide yes, wrapping it up today, still a couple of things to address 🙃 |
c9152c6
to
1654724
Compare
@sl0thentr0py I'll add traces_sampler config in a separate PR |
01585a8
to
5e8fc99
Compare
0559b9e
to
59e55a8
Compare
Thank you @solnic very excited for this release. I need this badly! Appreciate all your hard work! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did another round of review.
@spec tracing?() :: boolean() | ||
def tracing?, do: fetch!(:traces_sample_rate) > 0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the way to go, as it makes the API a little harder to work with IMO. I would rather have two separate configuration options if possible, unless other Sentry SDKs do it this way. If they do, can you point me to examples?
If they don't, let's go with:
tracing: boolean()
for enabling and disabling tracing.traces_sample_rate: float() (0.0 → 1.0)
, which can default to something other than0.0
because now:tracing
would default tofalse
.
This way users can easily toggle tracing on and off without having to worry about the rate itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whatyouhide actually, that's how this works across the SDKs, so we need to stay consistent over here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yuck 😞
lib/sentry/config.ex
Outdated
will be sampled. This value is also used to determine if tracing is enabled: if it's | ||
greater than `0`, tracing is enabled. | ||
|
||
This feature requires `opentelemetry` package. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's always phrase this as something like
This feature requires `opentelemetry` package. | |
Tracing requires OpenTelemetry packages to work. See [the | |
OpenTelemetry setup documentation](LINK-TODO) for guides on | |
how to set it up. |
@@ -117,7 +117,7 @@ defmodule Sentry.Client do | |||
|
|||
result_type = Keyword.get_lazy(opts, :result, &Config.send_result/0) | |||
client = Keyword.get_lazy(opts, :client, &Config.client/0) | |||
sample_rate = Keyword.get_lazy(opts, :sample_rate, &Config.sample_rate/0) | |||
sample_rate = Keyword.get_lazy(opts, :sample_rate, &Config.traces_sample_rate/0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also rename the option that can be passed to this function to be :traces_sample_rate
and document that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whatyouhide I'm gonna do a follow up PR with improved sampler which is meant to use this config option. @sl0thentr0py told me we need to sample sooner than in send_transaction
and that the sampler is the correct place so we'd be dropping spans sooner rather than aggregating them in our storage to potentially drop later.
alias Sentry.Interfaces.Span | ||
|
||
# This can be a no-op since we can postpone inserting the span into storage until on_end | ||
@impl true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nit, but can we replace these with
@impl true | |
@impl :otel_span_processor |
just for ease of readability?
SpanStorage.store_span(span_record) | ||
|
||
if span_record.parent_span_id == nil do | ||
child_span_records = SpanStorage.get_child_spans(span_record.span_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this somehow pop the child spans off of the storage? I’m worried about the concurrency of this all. I don't know how on_end/2
is called by OTel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@whatyouhide OTel guarantees that on_end
is not called for the same span more than once because it removes the span from its storage before sending a span to processors's on_end
, so even if something may caused another call to on_end
for the same span, that span will not be found and processor callbacks won't be called again.
|
||
def instrumented_function do | ||
Tracer.with_span "instrumented_function" do | ||
:timer.sleep(100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of these sleeps? They're going to slow down tests but I’m not sure that they bring value to the tests?
assert nil == | ||
SpanStorage.get_root_span(transaction.contexts.trace.span_id, table_name: table_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we generally (almost always) do this
assert nil == | |
SpanStorage.get_root_span(transaction.contexts.trace.span_id, table_name: table_name) | |
assert SpanStorage.get_root_span(transaction.contexts.trace.span_id, table_name: table_name) == nil |
Applies below too.
|
||
defp assert_valid_trace_id(trace_id) do | ||
assert is_binary(trace_id), "Expected trace_id to be a string" | ||
assert String.length(trace_id) == 32, "Expected trace_id to be 32 characters long #{trace_id}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert String.length(trace_id) == 32, "Expected trace_id to be 32 characters long #{trace_id}" | |
assert byte_size(trace_id) == 32, "Expected trace_id to be 32 characters long #{trace_id}" |
Otherwise this could (unlikely, ofc) catch Unicode characters.
assert String.match?(trace_id, ~r/^[a-f0-9]{32}$/), | ||
"Expected trace_id to be a lowercase hex string" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could simplify this by asserting on
assert {:ok, _} = Base.decode32(trace_id, case: :lower, padding: false)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this actually does not pass
{:sentry, path: "../.."} | ||
{:sentry, path: "../.."}, | ||
|
||
{:opentelemetry, "~> 1.5"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do these changes in the Phoenix test app in a separate PR that also, possibly, adds some tests.
hey everyone, I will take one final look at this now and we are going to prioritize shipping a first version. Other nitpicks can be handled later and we will iterate on it. Thanks for all the reviews and discussions but this has been open for a fairly long time so now we should wrap it up. @solnic told me he's been running it in his production app for a while so that's also good enough as a signal for me to get this out. |
@@ -117,7 +117,7 @@ defmodule Sentry.Client do | |||
|
|||
result_type = Keyword.get_lazy(opts, :result, &Config.send_result/0) | |||
client = Keyword.get_lazy(opts, :client, &Config.client/0) | |||
sample_rate = Keyword.get_lazy(opts, :sample_rate, &Config.sample_rate/0) | |||
sample_rate = Keyword.get_lazy(opts, :sample_rate, &Config.traces_sample_rate/0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the errors sample_rate
and transactions traces_sample_rate
are two different options and should be kept separate and used for the relevant sampling decision, so please don't replace the original one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not correct. It's not replacing the original one, it's switching from sample_rate
to traces_sample_rate
in this function, which reports transactions. We didn't have traces_sample_rate
before, so we used sample_rate
as a stopgap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm saying we need both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the sampler still needs to use traces_sample_rate
right?
As discussed in Discord, I won't re-review this. Thanks for all the work @solnic! |
To make this work you need to add opentelemetry deps:
Then configure our span processor:
Things should start looking more or less like this: