
Proposal: add format "url" #233


Closed
gajus opened this issue Jan 25, 2017 · 9 comments

@gajus

gajus commented Jan 25, 2017

JSON schema already has "uri" format.

A string instance is valid against this attribute if it is a valid URI, according to .

URL is a subset of the URI specification. With the "uri" format, /test is a valid URI but not a valid URL. In practice, it is far more common to need to validate that a string is a valid URL than that it is a valid URI. Adding a "url" format would standardise this validation.

There is a defined specification of what a "URL record" is (see WHATWG URL).

For an example implementation, refer to ajv-validator/ajv#402.

@awwright
Member

In my experience of this issue, this particular usage of "URL" is steeped in a lot of history and (very) invested parties; but in short, the only usage (that I'm aware of) is by Web browsers that have to be able to render ancient HTML documents and other documents containing invalid URIs and URI References. It's not used in HTTP, or any RFC to date for that matter.

My own usage of "URL" always follows the authoritative definition in RFC3986:

A URI can be further classified as a locator, a name, or both. The
term "Uniform Resource Locator" (URL) refers to the subset of URIs
that, in addition to identifying a resource, provide a means of
locating the resource by describing its primary access mechanism
(e.g., its network "location").

JSON Schema being an Internet-Draft seeking RFC publication, it would only be appropriate to use the term "URL" as so defined.

The open option would be to pick a different name; however, the list of formats is by no means intended to be exhaustive, and I would err towards not publishing more formats. But we don't really have a good standard by which to pick. To date, it seems to be: a well-defined grammar (e.g. ABNF) intended for short strings (not a media type unto itself), that is either used very broadly or used by JSON Schema itself.

@handrews
Contributor

I agree with @awwright. I'd prefer to stick with RFC 3986 unless/until something else is broadly adopted (I have no personal investment in any definition of URI/URL/URN or the parties to whatever disagreements exist, I would just find mixing RFC 3986 with less-adopted but different definitions confusing).

@epoberezkin
Member

So we could define format "url" that is probably more commonly needed than "uri" as @awwright suggests: a subset of URIs, according to RFC3986.

@handrews
Contributor

@epoberezkin there's no way to reliably distinguish URLs and URNs. Per RFC 3986:

An individual scheme does not have to be classified as being just one
of "name" or "locator". Instances of URIs from any given scheme may
have the characteristics of names or locators or both, often
depending on the persistence and care in the assignment of
identifiers by the naming authority, rather than on any quality of
the scheme. Future specifications and related documentation should
use the general term "URI" rather than the more restrictive terms
"URL" and "URN"

Not to mention that since new schemes are always being created, you wouldn't be able to effectively catalog by scheme anyway.

My feeling is that the shift to URI as the primary term is pretty well-established at this point, at least when it comes to how other specifications discuss the concepts. Something being a URL or a URN is more about usage than syntax. I suppose that, as a semantic format, that could be useful usage documentation, but there's definitely no way to validate it.
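
To illustrate the point about syntax, here is a minimal Python sketch that splits strings using the regular expression given in RFC 3986, Appendix B. The helper name `split_components` is hypothetical and used only for illustration. The generic syntax yields a scheme, authority, path, query, and fragment, and none of those components says whether the identifier is being used as a locator or as a name.

```python
import re

# Regular expression from RFC 3986, Appendix B, for splitting a URI
# reference into its generic components.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_components(value: str) -> dict:
    """Hypothetical helper: return the five generic URI components."""
    # Every part of the expression is optional, so it matches any string.
    match = URI_REFERENCE.match(value)
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


for candidate in ("https://example.com/a?b#c", "urn:isbn:0451450523", "/test"):
    print(candidate, split_components(candidate))
```

Both the `https` and the `urn` example come back with a scheme and a path; the only structural difference is the presence of an authority, which is not what RFC 3986 uses to define "locator".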

@chrisdostert

I agree with @epoberezkin. Sure, "uri" covers "url", but in the land of validation, the more specific you can be, the more valuable the validation. URL per RFC 3986 (and also the opening paragraph of the Wikipedia article) would be fantastic to have; I'd like to use it right now in several schemas of mine that specifically require URLs.

@handrews
Contributor

@chrisdostert what does "URL per RFC 3986" mean to you? RFC 3986 does not, as far as I know, give a validatable definition of "URL". The same URI may in fact be used as either a URL or a URN, and there is no way to tell whether a not-previously-known URI scheme is more likely to be a URL or a URN.

If you want to enforce something based on scheme, then a combination of "format" and "pattern" would be appropriate, e.g.:

{
    "type": "string",
    "format": "uri",
    "pattern": "^(https?|wss?|ftp)://"
}

This is more flexible anyway as it allows the schema author to decide which schemes are significant.
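
As a rough illustration of what the schema above enforces, the following Python sketch uses only the standard library. It is an approximation, not a JSON Schema validator: `looks_like_allowed_url` is a hypothetical helper, and `urllib.parse.urlsplit` stands in loosely for the "uri" format check rather than implementing RFC 3986 validation.

```python
import re
from urllib.parse import urlsplit

# Same regular expression as the "pattern" keyword in the schema above.
SCHEME_PATTERN = re.compile(r"^(https?|wss?|ftp)://")


def looks_like_allowed_url(value: str) -> bool:
    """Hypothetical stand-in for the "format" + "pattern" combination."""
    # JSON Schema "pattern" is not implicitly anchored, so re.search mirrors
    # its semantics; the leading ^ in the expression does the anchoring.
    if SCHEME_PATTERN.search(value) is None:
        return False
    try:
        parts = urlsplit(value)
    except ValueError:
        return False
    # Loose approximation of an absolute URI with a network location:
    # a scheme plus a non-empty authority component.
    return bool(parts.scheme) and bool(parts.netloc)


for candidate in ("https://example.com/", "/test", "urn:isbn:0451450523", "http://"):
    print(candidate, looks_like_allowed_url(candidate))
```

Changing the alternation in `SCHEME_PATTERN` is all it takes to allow or reject other schemes, which is the flexibility the workaround is meant to give schema authors.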

@handrews
Contributor

@gajus @chrisdostert @epoberezkin unless someone can identify a clear and reliable way to distinguish URLs from URNs that is compatible with RFC 3986, I'm going to close this after it's been two weeks since I asked for such a definition in the prior comment (as of right now, it's been just over one week).

I don't see any support for using WHATWG in place of RFC 3986, and I don't see any way to implement this based on RFC 3986. I think the workaround I gave is something we could document on the web site as a best practice if anyone wants to open an issue on that repo about it.

@gajus gajus closed this as completed Feb 20, 2017
tfesenko added a commit to RepreZen/gnostic that referenced this issue Oct 16, 2017
See https://github.com/OAI/OpenAPI-Specification/blob/OpenAPI.next/versions/3.0.0.md#contact-object

> The email address of the contact person/organization. MUST be in the format of an email address.

Added a `"format": "email"`. Similar approach is used in [JSON Schema for Swagger v2](https://github.com/OAI/OpenAPI-Specification/blob/OpenAPI.next/schemas/v2.0/schema.json#L142)

> The URL pointing to the contact information. MUST be in the format of a URL.

JSON Schema Draft 4 does not have a string format for URL (see [Proposal: add format "url"](json-schema-org/json-schema-spec#233)); the best approximation is `"format": "uri"`, which is also used in JSON Schema for [Swagger v2](https://github.com/OAI/OpenAPI-Specification/blob/OpenAPI.next/schemas/v2.0/schema.json#L137)
@631068264

631068264 commented Dec 6, 2019

This does not work for me, @handrews: the test below still passes.

from jsonschema import FormatChecker, ValidationError, validate


def test_url_schema():
    schema = {
        'type': 'object',
        'required': ['url'],
        'properties': {
            'url': {
                'type': 'array',
                'items': {
                    'type': 'string', 'format': 'uri', 'pattern': '^(https?|http?)://',
                    'minLength': 1, 'maxLength': 255,
                },
            }
        },
    }
    row = {
        'url': [
            # 'https://urlregex.com/',
            # 'https://www.w3cschool.cn/tryrun/runcode?lang=python3',
            # 'http://61.191.163.98:8187/'
            # 'http://61.191.163.98:8187/#/login'
            # No comma between the next two literals, so Python joins them
            # into the single item 'http://.com/com'.
            'http://.com/'
            'com'
        ]
    }
    try:
        validate(row, schema=schema, format_checker=FormatChecker())
    except ValidationError:
        # Fail the test if validation rejects the data.
        assert 1 == 2

    assert 1 == 1
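
One possible contributing factor, sketched here with the standard library only: the two string literals in `row` are not separated by a comma, so Python concatenates them into a single item that does match the pattern. The names `items_without_comma`, `items_with_comma`, and `matches` are illustrative and not part of the original test; the "format": "uri" check is not reproduced here.

```python
import re

# Pattern copied from the schema in the test above.
PATTERN = re.compile(r"^(https?|http?)://")

# Adjacent string literals without a comma are concatenated by Python.
items_without_comma = [
    'http://.com/'
    'com'
]
items_with_comma = [
    'http://.com/',
    'com',
]


def matches(items):
    """Apply the pattern to each item using search semantics."""
    return [PATTERN.search(item) is not None for item in items]


print(items_without_comma)           # a single concatenated item
print(matches(items_without_comma))  # the concatenated item matches
print(matches(items_with_comma))     # 'com' alone does not match
```

With the comma in place, the bare `'com'` item no longer satisfies the "pattern" keyword.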

@notEthan
Contributor

notEthan commented Dec 6, 2019

It's not clear what you are expecting or why you're commenting on a years-old issue only tangentially related to your schema. I think you might have better luck asking on Slack or Stack Overflow, or opening a new issue.

A link to your schema and data on jsonschema.dev is also helpful. Here is one: https://jsonschema.dev/s/N4IgziBcLAOgdgAkbEAXAngBwKasiiAPYBGAVjgMZqoA0CyqATjgI4CuAliwCb6IBtBskLsmAG1TCAuvSSEsTIriZpOOMPzjzGIMZKiJtIkaky5+qAIZMmVjHWEnUnNDgC2mw8ZOn02PENUMDQmTngAc0cdX1QAMyImdysaIL0w6N8-LBS3JnhLEAA9AAoACzQ0LDAAfgAfCqqagEpIAHo2zKzCd3CAGRxItDL+AEY5bp6rAA8BoZHDACYAVmWnEQBfda35HY2QWhBOKBhhVH1+IRjURqx2toA6SiJ3ToP11Gf3KXlpBH2NkA
