EDU-1502: Adds bigQuery page #2432
base: main
ddf1bd9 to 2351632
34c30af to b7d2197
e43140b to 54b2537
I think this needs some light restructuring to make it consistent with the other integrations and flow a bit more succinctly.
Links also need updating with `/docs/`, amongst other things, and we can add it to the TOC and the appropriate overview now too.
content/bigquery.textile
Outdated
@@ -0,0 +1,109 @@
---
title: BigQuery rule
I'm not sure what you're doing with the others, but tempted to say drop 'rule' from all of them.
Also consider including GCP or Google like we do for AWS maybe?
content/bigquery.textile
Outdated
meta_description: "Stream realtime event data from Ably into Google BigQuery using the Firehose BigQuery rule. Configure, and analyze your data efficiently."
---

Stream events published to Ably directly into a table in "BigQuery":https://cloud.google.com/bigquery?utm_source=google&utm_medium=cpc&utm_campaign=emea-es-all-en-dr-bkws-all-all-trial-e-gcp-1707574&utm_content=text-ad-none-any-dev_c-cre_574561258287-adgp_Hybrid+%7C+BKWS+-+EXA+%7C+Txt+-+Data+Analytics+-+BigQuery+-+v1-kwid_43700072692462237-kwd-12297987241-userloc_1005419&utm_term=kw_big+query-net_g-plac_&&gad_source=1&gclid=Cj0KCQiAwtu9BhC8ARIsAI9JHanslQbN6f8Ho6rvEvozknlBMbqaea0s6ILK-VA9YpQhRr_IUrVz6rYaAtXeEALw_wcB&gclsrc=aw.ds&hl=en for analytical or archival purposes. General use cases include:
Watch your link here; it includes a lot of unnecessary params, such as UTM tracking parameters.
content/bigquery.textile
Outdated
* Historical auditing of messages.

<aside data-type='note'>
<p>Ably's BigQuery integration rule for "Firehose":/integrations/streaming is in development status.</p>
We need to clarify this in light of the product status changes.
content/bigquery.textile
Outdated
* Create a BigQuery table in that dataset:
** Use the "JSON schema":#schema.
** For large datasets, partition the table by ingestion time, with daily partitioning recommended for optimal performance.
* Create a Google Cloud Platform (GCP) "service account":https://cloud.google.com/iam/docs/service-accounts-create?utm_source=chatgpt.com with the minimal required BigQuery permissions.
UTM link again.
content/bigquery.textile
Outdated
<p>Ably's BigQuery integration rule for "Firehose":/integrations/streaming is in development status.</p>
</aside>

h3(#create-rule). Create a BigQuery rule
This should be an H2, and the first part isn't creating a rule; it's configuring GCP resources.
content/bigquery.textile
Outdated

h4(#api-rule). Create a BigQuery rule using the Control API

The following steps to create a BigQuery rule using the "Control API:":https://ably.com/docs/api#control-api
I'm not sure this should be steps. This could just be an API call?
content/bigquery.textile
Outdated

h3(#schema). JSON Schema

You can run queries directly against the Ably-managed BigQuery table. For example, if the message payloads are stored as raw JSON in the data column, you can parse them using the following query:
This says 'the following query' but is showing a payload.
content/bigquery.textile
Outdated

h3(#queries). Direct queries

Run queries directly against the Ably-managed table. For instance, to parse JSON payloads stored in @data@:
This needs some explanation around the `data` property; otherwise it's a little out of context.
content/bigquery.textile
Outdated
The following explains the components of the query:

|_. Query Function |_. Purpose |
| @CAST(data AS STRING)@ | Converts the data column from BYTES (if applicable) into a STRING format. |
| @PARSE_JSON(…)@ | Parses the string into a structured JSON object for easier querying. |
| @WHERE channel = “my-channel”@ | Filters results to retrieve messages only from a specific Ably channel. |
I'm not sure we need to explain the query like this - we could write this in a short sentence before the example.
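For reference, the three components listed in the table can be mirrored outside BigQuery. The following is a minimal Python sketch, not the query from the PR: the row shape, the sample data, and the function name `parse_channel_messages` are all assumptions made for illustration. It filters rows by channel, converts a bytes payload to a string, and parses that string as JSON.

```python
import json


def parse_channel_messages(rows, channel):
    """Return the parsed JSON payloads for rows on the given channel.

    Hypothetical helper: mirrors the WHERE filter, the CAST to STRING,
    and the JSON parse described in the table, using plain Python.
    """
    parsed = []
    for row in rows:
        # Equivalent of: WHERE channel = "my-channel"
        if row["channel"] != channel:
            continue
        payload = row["data"]
        # Equivalent of: CAST(data AS STRING), only needed for bytes payloads
        if isinstance(payload, bytes):
            payload = payload.decode("utf-8")
        # Equivalent of: PARSE_JSON(...)
        parsed.append(json.loads(payload))
    return parsed


# Sample rows are invented for this sketch.
rows = [
    {"channel": "my-channel", "data": b'{"temp": 21}'},
    {"channel": "other-channel", "data": b'{"temp": 5}'},
]

print(parse_channel_messages(rows, "my-channel"))
```

This kind of one-sentence description (filter by channel, cast to string, parse as JSON) is roughly what a short lead-in before the query example would need to convey.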
content/bigquery.textile
Outdated
<aside data-type='note'>
<p>Parsing JSON at query time can be computationally expensive for large datasets. If your queries need frequent JSON parsing, consider pre-processing and storing structured fields in a secondary table using an ETL pipeline for better performance.</p>
</aside>
I think you could make this part of the ETL section below since it's a sub-section rather than calling it out as a note.
a2cfa41 to 454afaa
56f9b9b to c72f98a
Couple of small comments to improve understanding here.
@@ -139,6 +139,10 @@ export default {
  name: 'Pulsar',
  link: '/docs/integrations/streaming/pulsar',
},
{
  name: 'BigQuery',
Suggested change:
- name: 'BigQuery',
+ name: 'Google BigQuery',
* Choose *BigQuery* from the list of available Firehose integrations.
* "Configure":#configure the rule settings. Then, click *Create*.

h3(#api-rule). Create a rule using the ABly Control API
Suggested change:
- h3(#api-rule). Create a rule using the ABly Control API
+ h3(#api-rule). Create a rule using the Ably Control API

h3(#api-rule). Create a rule using the ABly Control API

The following steps to create a BigQuery rule using the Control API:
This doesn't exist in the Control API spec at the moment, so I think we need to check whether this is possible.

h3(#settings). BigQuery configuration options

The following explains the BigQuery configuration options:
Is this in the Ably dashboard? If so, I think we should mention that.
The following explains the BigQuery configuration options:

|_. Section |_. Purpose |
| *Source* | Defines the type of event(s) for delivery. |
I know you've been working on this in a separate PR, but there should be somewhere to link to for the list of these events now.

```[json]
{
“name”: “id”,
I'm assuming this is describing the message ID? If so, I think we need to explain that the message ID is all this snippet is a snapshot of.
This PR:
EDU-1502