* Change structure of URL
So far the URL structure was heavily inspired by whatwg/url#337. I initially only wanted to make some tweaks to it to improve querying, but I realised I never fully felt comfortable with the field names used here. So I started to look at the URL parsers of different languages like Go, Ruby and Python, and the output they provide is surprisingly similar across languages but not consistent with WHATWG. The change made here brings the field names closer to what most URL parsers output.
* fix description of query field
* switch `url.port` to integer
* Add dot to host.name to make it consistent
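To illustrate the point about parser output, here is a small sketch using Python's standard `urllib.parse.urlsplit` (Python being one of the parsers mentioned above), applied to the example URL from the `url.href` row. The `parser_fields` dict is hypothetical: its keys were chosen to mirror the new ECS names, and only the values come from the parser.

```python
from urllib.parse import urlsplit

# Example URL from the url.href row of the field table.
HREF = "https://elastic.co:443/search?q=elasticsearch#top"

parts = urlsplit(HREF)

# Hypothetical mapping for illustration (not an ECS API): each key is a new
# ECS field name, each value is the component Python's parser exposes.
parser_fields = {
    "url.scheme": parts.scheme,        # "https", without the trailing ":"
    "url.host.name": parts.hostname,   # "elastic.co"
    "url.port": parts.port,            # 443, returned as an int
    "url.path": parts.path,            # "/search"
    "url.query": parts.query,          # "q=elasticsearch", without the "?"
    "url.fragment": parts.fragment,    # "top", without the "#"
}

for name, value in parser_fields.items():
    print(f"{name} = {value!r}")
```

Ruby's `URI` exposes `scheme`, `host`, `port`, `path`, `query` and `fragment` in the same way, and Go's `net/url` has `Scheme`, `Path`, `RawQuery` and `Fragment`.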
README.md
+10 -13
```diff
@@ -347,26 +347,23 @@ A complete URL, with scheme, host, and path.
 
 The URL object can be reused in other prefixes like `host.url.*` for example. It is important that the same structure is used wherever the URL object appears.
 
-`url.href` is a [multi field](https://www.elastic.co/guide/en/elasticsearch/reference/6.2/multi-fields.html#_multi_fields_with_multiple_analyzers), which means the data is stored as keyword in `url.href` and as text in `url.href.analyzed`. The advantage is that a query against only part of the URL still works without having to split the URL into all its parts at ingest time.
-
-Based on whatwg URL definition: https://github.com/whatwg/url/issues/337
+`url.href` is a [multi field](https://www.elastic.co/guide/en/elasticsearch/reference/6.2/multi-fields.html#_multi_fields_with_multiple_analyzers), which means the data is stored as keyword in `url.href` and as text in `url.href.analyzed`. The advantage is that a query against only part of the URL still works without having to split the URL into all its parts at ingest time.
 
 
 | Field | Description | Type | Multi Field | Example |
 |---|---|---|---|---|
-| <a name="url.href"></a>`url.href`| href contains the full url. The field is stored as keyword.<br/>`href` is an analyzed field so the parsed information can be accessed through `href.analyzed` in queries. | keyword ||`https://elastic.co:443/search?q=elasticsearch#top`|
+| <a name="url.href"></a>`url.href`| href contains the full url. The field is stored as keyword.<br/>`href` is an analyzed field so the parsed information can be accessed through `href.analyzed` in queries. | keyword ||`https://elastic.co:443/search?q=elasticsearch#top`|
 | <a name="url.href.analyzed"></a>`url.href.analyzed`|| text | 1 ||
-| <a name="url.protocol"></a>`url.protocol`| The protocol of the request, e.g. "https:". | keyword |||
-| <a name="url.hostname"></a>`url.hostname`| The hostname of the request, e.g. "example.com".<br/>For correlation, this field can be copied into the `host.name` field. | keyword |||
-| <a name="url.port"></a>`url.port`| The port of the request, e.g. 443. |keyword|||
-| <a name="url.pathname"></a>`url.pathname`| The path of the request, e.g. "/search". | text |||
-| <a name="url.pathname.raw"></a>`url.pathname.raw`| The url path. This is a non-analyzed field that is useful for aggregations. | keyword | 1 ||
-| <a name="url.search"></a>`url.search`| The search describes the query string of the request, e.g. "q=elasticsearch". | text |||
-| <a name="url.search.raw"></a>`url.search.raw`| The url search part. This is a non-analyzed field that is useful for aggregations. | keyword | 1 ||
-| <a name="url.hash"></a>`url.hash`| The hash of the request URL, e.g. "top". | keyword |||
+| <a name="url.scheme"></a>`url.scheme`| The scheme of the request, e.g. "https".<br/>Note: The `:` is not part of the scheme. | keyword ||`https`|
+| <a name="url.host.name"></a>`url.host.name`| The hostname of the request, e.g. "example.com".<br/>For correlation, this field can be copied into the `host.name` field. | keyword ||`elastic.co`|
+| <a name="url.port"></a>`url.port`| The port of the request, e.g. 443. |integer||`443`|
+| <a name="url.path"></a>`url.path`| The path of the request, e.g. "/search". | text |||
+| <a name="url.path.raw"></a>`url.path.raw`| The url path. This is a non-analyzed field that is useful for aggregations. | keyword | 1 ||
+| <a name="url.query"></a>`url.query`| The query field describes the query string of the request, e.g. "q=elasticsearch".<br/>The `?` is excluded from the query string. If a URL contains no `?`, the query field is expected to be left out. If there is a `?` but no query, the query field is expected to exist with an empty string. This way the `exists` query can be used to differentiate between the two cases. | text |||
+| <a name="url.query.raw"></a>`url.query.raw`| The url query part. This is a non-analyzed field that is useful for aggregations. | keyword | 1 ||
+| <a name="url.fragment"></a>`url.fragment`| The part of the url after the `#`, e.g. "top".<br/>The `#` is not part of the fragment. | keyword |||
 | <a name="url.username"></a>`url.username`| The username of the request. | keyword |||
 | <a name="url.password"></a>`url.password`| The password of the request. | keyword |||
-| <a name="url.extension"></a>`url.extension`| The url extension field contains the extension of the file associated with the url.<br/>A simple example is `http://localhost/logo.png` where the extension would be `png`. There can also be more complex cases like `http://localhost/content?asset=logo.png&token=XYZ` where the extension could also be `png` but depends on the implementation.<br/>The `extension` field should be left out if the extension is not defined. | keyword ||`png`|
```
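The new `url.query` rule (left out when there is no `?`, empty string when there is a `?` but nothing after it) is easiest to see in code. The following is a minimal sketch, assuming events are built as flat Python dicts; `url_fields` is a hypothetical helper, not part of ECS or any Beat. Python's `urlsplit` returns an empty `query` both for `https://elastic.co/search?` and for `https://elastic.co/search`, so the sketch checks for the `?` itself. The `.raw` and `.analyzed` multi-fields are not set here, because Elasticsearch derives them from the mapping.

```python
from urllib.parse import urlsplit


def url_fields(href: str) -> dict:
    """Build flat url.* fields from a URL string (hypothetical helper)."""
    parts = urlsplit(href)
    fields = {"url.href": href}

    if parts.scheme:
        fields["url.scheme"] = parts.scheme        # stored without ":"
    if parts.hostname:
        fields["url.host.name"] = parts.hostname
    if parts.port is not None:
        fields["url.port"] = parts.port            # integer, not keyword
    if parts.path:
        fields["url.path"] = parts.path
    if parts.username:
        fields["url.username"] = parts.username
    if parts.password:
        fields["url.password"] = parts.password

    # urlsplit cannot tell "no ?" from "? with nothing after it", so look
    # for the "?" directly, ignoring anything that belongs to the fragment.
    before_fragment = href.split("#", 1)[0]
    if "?" in before_fragment:
        fields["url.query"] = parts.query          # may deliberately be ""

    # Assumption: an empty fragment (trailing "#") is left out.
    if parts.fragment:
        fields["url.fragment"] = parts.fragment    # stored without "#"
    return fields
```

With this, `url_fields("https://elastic.co/search?")` contains `"url.query": ""`, while `url_fields("https://elastic.co/search")` has no `url.query` key, so an `exists` query on `url.query` separates the two cases. Leaving out an empty fragment is an assumption; the table does not say how a trailing `#` should be handled.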