Addressing SPARQL EXISTS errata #156
Comments
This was discussed during the rdf-star meeting on 26 September 2024.
Transcript: Addressing SPARQL EXISTS errata
ora: Are there people fine with the current syntax?
ora: In any case, chairs will discuss this, let's move on
AndyS: [about SPARQL EXISTS] There are two proposals
AndyS: 1. substitution based on various existing errata
AndyS: 2. another one based on ANTIJOIN. We already have MINUS. Except the behavior with disjoint domains. But outside of it it's ANTIJOIN
AndyS: On another note, there are other things that might go to SPARQL like LATERAL that can be based on substitution. And pure forms of anti join and semi join
AndyS: It's a possibility to move these additions (LATERAL, anti join...) to sparql dev
pchampin: we would add more subtle differences between operators like FILTER NOT EXISTS vs MINUS
pchampin: Your point of having multiple ways might create problems
ora: SPARQL spec spends a bit of time presenting this difference
AndyS: It was quite contentious in SPARQL 1.1
<pchampin> I'm more than happy to let the editors decide on that
AndyS: I am not aware of any outgoing opinion, I think it ends up to a choice on which way to go
tl: is it related to triple terms in any way or is it a SPARQL errata
AndyS: it has nothing to do with triple terms
tl: what is the criteria of SPARQL errata to discuss now?
tl: it's a central issue, is that the argument?
pfps: There are a bunch of problems with SPARQL, the ones with EXISTS are the biggies
pfps: They end up splitting the SPARQL implementation space
pfps: The decision that has to be made is to move SPARQL EXISTS toward a more database-like implementation and keep it more consistent with the existing
AndyS: The current implementation is present in SQL with correlated subqueries
pfps: if you use the semi/anti join interpretation of EXISTS you change SPARQL more than the other option
pfps: In the end people who will see and understand the differences are very few
ora: I would like to know preferences
AndyS: My preference is for substitution and applying errata (option 1)
pfps: I don't have much of a horse in this race
pfps: Ideally I would love to get more SPARQL developers on board
ora: we could talk outside of the group
ktk: I reached out to stardog but did not get any input
gtw: I am not sure there is much value in reaching out to more developers. sparql-dev has been open for a long time
<pchampin> Tpt: I have a significant preference for option 1; option 2 is basically equivalent to MINUS
pfps: One way to check the issue would be to pull some tests
<pfps> which PR?
<gkellogg> w3c/rdf-tests#42
<gb> Issue 42 tests to document current definition of EXISTS in SPARQL (by pfps) [SPARQL]
<gkellogg> w3c/rdf-tests#43
<gb> CLOSED Pull Request 43 Add tests to document current definition of EXISTS (by pfps)
ora: Whatever solution we pick, someone will ask why we picked it
AndyS: picking substitution breaks the fewest queries
ora: That seems to me as good a reason as any, let's make a decision
tl: I would like to ask james about it
ora: Let's vote on it next Thursday
ora: Let's do it |
Here's my 2c as a query engine tech lead at Stardog: I prefer fixing the substitution semantics, making it a part of the spec, and keeping
I have seen lots of queries like this over the years where the bottom-up eval would produce zero results (I know this because Stardog, like many relational query engines, has a de-correlation optimiser, and that has to carefully analyse for this sort of case). Eventually, I would really like to see |
@domel sent us this "questionnaire" by email. I'll paste it below, because I find it a useful roadmap to this topic. The RDF-star Working Group is currently addressing issues related to updating the semantics of SPARQL EXISTS, specifically regarding:
Would you be able to provide your insights on these matters |
This section of SEP-0007 expands on these points: |
Dear all, Hannah (@hannahbast) and Johannes (@joka921) here from the University of Freiburg and developers of QLever. Fascinating and important question. There is a lot to untangle here, so first TLDR: In border cases regarding the semantics of a complex query, we recommend to follow the inner logic of the query language and not what a user thinks the query should do. Following that, we recommend to define Here is the long version:
|
This was discussed in today's meeting |
We need to discuss a Task Force for this |
This was discussed during the #rdf-star meeting on 10 April 2025.
Transcript: Task Force for SPARQL EXISTS
ora: Idea is to prepare for EXISTS - not a priority for the whole WG at this time.
<AndyS> s/agress/agree/
ora: james - would you chair this?
james: insufficient experience of W3C processes
ora: it involves scheduling and making the TF move forward
gkellogg: Agree to TF and maybe other items in the WG for subsets of the participants
james: sub-group decide chair?
AndyS: I can schedule a first meeting
<TallTed> TF(s) will let focused conversation(s) take place in parallel with main group without consuming main group time. I don't think I will have the time to do much if any more than participate (which I will *try* to do).
ktk: can external people participate?
<TallTed> TF participants must be WG members, whether as W3CMember reps or as IEs
james: certainly agree for external participants
pchampin: chairs can invite (and to WG meetings)
<TallTed> IP issues can quickly become quite hairy.
tallted: depends on their input (IP issues need care)
ora: adjourned |
WG members - please let me know (by direct email) of your interest and availability. |
There appear to be two different ways of fixing EXISTS, and they produce different answers. One method adds the current bindings to the FILTER expression, roughly as if a VALUES expression was prepended to it, but evaluates the resulting expression normally. This means the result of any subquery is the same for all the bindings, so each subquery need only be evaluated once. The other method is much more involved, and is described in https://github.com/w3c/sparql-dev/blob/main/SEP/SEP-0007/sep-0007.md. In this method the current bindings are pushed into subqueries, with the result that they may need to be evaluated multiple times. (It may be possible to optimize the evaluation so that the subqueries do not need to be completely re-evaluated for each binding.) A query that shows a difference between the two ways is:
Under the first method this query produces no answers because no binding is available for ?v in the embedded query. Under the second method an answer is produced because the current bindings are pushed all the way down in the expression. Currently MillenniumDB and Blazegraph produce an empty answer set for this query. |
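The contrast between the two methods can be imitated with a small Python sketch. Everything in it is an assumption made for illustration: the toy data, the function names, and the embedded pattern, which stands in for a sub-select of the shape `{ SELECT ?y { ?y :q ?w . FILTER(?w = ?v) } }`. It is not the query from the comment above, and it is not how any engine implements EXISTS.

```python
# Toy data for a hypothetical predicate :q, as (subject, object) pairs.
Q_TRIPLES = [("s1", 1), ("s2", 2)]


def compatible(a, b):
    """Two solution mappings are compatible if they agree on every shared variable."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def subquery(pushed=None):
    """Stand-in for { SELECT ?y { ?y :q ?w . FILTER(?w = ?v) } }.

    ?v is not bound by the pattern itself, so it is only visible
    if the caller pushes outer bindings in.
    """
    pushed = pushed or {}
    out = []
    for y, w in Q_TRIPLES:
        row = {**pushed, "y": y, "w": w}
        # FILTER(?w = ?v): an unbound ?v is an error, which FILTER treats as false.
        if "v" in row and row["w"] == row["v"]:
            out.append({"y": y})  # the projection keeps only ?y
    return out


def exists_values_prepended(mu):
    """Method 1: the subquery is evaluated normally, independent of the current row."""
    inner = subquery()
    return any(compatible(mu, row) for row in inner)


def exists_bindings_pushed_down(mu):
    """Method 2: the current bindings are pushed into the subquery."""
    inner = subquery(pushed=mu)
    return any(compatible(mu, row) for row in inner)


current_row = {"x": "a", "v": 1}
print(exists_values_prepended(current_row))
print(exists_bindings_pushed_down(current_row))
```

In this toy setting the first function never sees a binding for `?v` inside the sub-select, while the second one does, which is the kind of difference described above.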
@pfps -- What does SPARQL 1.1 say? "the pattern formed by replacing every occurrence of a variable v in pattern by μ(v) for each v in dom(μ)" Aside from the projection of ?v (a separate issue that you have already identified), what about:
|
PR 177 has one way in which The principle is that during execution of the row being filtered, variables have their value constrained to the binding in the row. The PR describes this by a process of rewriting the algebra on a per-row basis. It is not a required implementation approach. It is not the only way to communicate the effect. An alternative possibility is to introduce a new algebra operator which evaluates to the row being filtered. This would make the rewrite a static (algebra build time) step. |
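One possible reading of the "new algebra operator" alternative, sketched in Python. The operator name `current_row`, the tuple-based algebra, and the choice to join the operator at the top of the pattern are all assumptions for illustration; the sketch does not reproduce PR 177 and does not settle where in the pattern the operator would be placed.

```python
def compatible(a, b):
    """Two solution mappings are compatible if they agree on every shared variable."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def evaluate(op, current_row):
    """Evaluate a tiny algebra expression given the row currently being filtered."""
    kind = op[0]
    if kind == "table":        # a fixed multiset of solution mappings
        return op[1]
    if kind == "current_row":  # hypothetical operator: evaluates to the row being filtered
        return [current_row]
    if kind == "join":
        left = evaluate(op[1], current_row)
        right = evaluate(op[2], current_row)
        return [{**l, **r} for l in left for r in right if compatible(l, r)]
    raise ValueError(f"unknown operator: {kind}")


def filter_exists(rows, pattern):
    """Keep the rows for which the EXISTS pattern has at least one solution."""
    # The rewrite happens once, when the algebra is built, not once per row.
    rewritten = ("join", ("current_row",), pattern)
    return [row for row in rows if evaluate(rewritten, row)]


pattern = ("table", [{"x": "a", "y": 1}, {"x": "b", "y": 2}])
kept = filter_exists([{"x": "a"}, {"x": "c"}], pattern)
print(kept)
```

The algebra expression stays fixed across rows; only the value the `current_row` operator evaluates to changes.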
I've set up a repo for tests and notes: https://github.com/afs/SPARQL-exists In it, there is the example query, and variations, together with https://github.com/afs/SPARQL-exists/tree/main/tests/exists-filter |
there is now a pull request to explore minimal changes to the recommendation to support the interpretation that it should be implemented analogously to nested-loop bgp mechanisms. the change clarifies that patterns are not to be instantiated, which eliminates the errors otherwise associated with blank nodes. |
It would be useful to describe how this proposal would change common uses of EXISTS. For example what happens with
|
Prepend inside or outside the The second method can be seen as prepending inside the In the syntax to algebra algorithm of SPARQL, there is also always an empty BGP at the start of a group graph pattern. The "simplification" step in the spec replaces "join(Z, X)" and "join(X, Z)" with X for clarity. The values insertion process includes a VALUES block of a single row (actually There is a binding for
there is no join, and So it becomes (algebra)
Filter (algebra) always filters something.
Maybe because it is outside the
|
Added blank node tests (issue 3). |
Dear all, I just reread my comment from 03.10.2024 and still feel the same way. In case this got forgotten, here are the main points again; see the comment for examples and all the details:
|
My view is that embedded queries should be treated inside EXISTS just as they are outside exists.
But that in-scope bindings are available at the top level of the EXISTS pattern. So these two have different results, just like
and
do. |
My recollection of the 1.1 WG is that (NOT-)EXISTS was mostly motivated by the desire for negation support. In that context, I think |
@hannahbast - The comment has not been forgotten. Johannes presented a summary in the meeting on Friday (May 2nd) |
The topic for the SPARQL TF meeting this week is to agree on the 5 issues, with the understanding that new issues may be found. The expectation for a new issue is that it clearly states what the problem is, with evidence, such as query, data and results. WG members - the meetings should now be appearing on your W3C calendar and also in the working group upcoming meetings. |
attached here is a script to run the tests from https://github.com/afs/SPARQL-exists/tree/main/tests against a repository in dydra. the data is present in distinct graphs in a single dataset. |
I've reorganised the tests area with a set of tests per issue as suggested at the last meeting. (There is overlap because the issues are not independent.) Also - I've added a description of using an algebra function to PR #177. The function evaluates to the current solution mapping (mentioned in comment). The spec description now does imply changing the algebra for query execution once algebra expression is initially constructed. |
Recap
TPAC 2023 presentation
Issues: sparql-query/issues for EXISTS
After TPAC 2023, an email was sent to interested parties.
Proposals
1:: Improved substitution
SEP-0007/Substitution
2:: SemiJoin/Antijoin
https://w3c.github.io/sparql-exists/docs/sparql-exists.html#proposal-a
Proposal 1
Proposal 1 is based on errata for the "Substitution" operation.
Full details including relationship to errata: SEP-0007/Substitution.
Proposal 2
Proposal 2 is SemiJoin/AntiJoin.
SPARQL already has MINUS, which is an antijoin with a special condition for the case of disjoint domains (a decision of the SPARQL 1.1 working group).
A way forward.
A compromise way forward:
Replace "Substitute" with the errata-derived fix SEP-0007/Substitution
Plan for adding LATERAL, SEMIJOIN and ANTIJOIN (both pure forms) to the SPARQL language. This may have to be additional features added in the "new features" phase due to timing.
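To make the difference between MINUS and a pure antijoin concrete, here is a small Python sketch over solution mappings represented as dicts. The function names and the sample data are illustrative assumptions. The disjoint-domain condition in `minus` follows the SPARQL 1.1 definition of Minus; `antijoin` is the plain form without that condition.

```python
def compatible(a, b):
    """Two solution mappings are compatible if they agree on every shared variable."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def minus(left, right):
    """SPARQL 1.1 MINUS: a left row is removed only by a compatible right row
    that shares at least one variable with it."""
    return [
        l for l in left
        if not any(compatible(l, r) and bool(l.keys() & r.keys()) for r in right)
    ]


def antijoin(left, right):
    """Pure antijoin: a left row is removed by any compatible right row."""
    return [l for l in left if not any(compatible(l, r) for r in right)]


left = [{"s": "a"}, {"s": "b"}]
right_shared = [{"s": "a"}]     # shares ?s with the left side
right_disjoint = [{"t": "z"}]   # shares no variable with the left side

print(minus(left, right_shared), antijoin(left, right_shared))
print(minus(left, right_disjoint), antijoin(left, right_disjoint))
```

With a shared variable the two operators agree. With disjoint domains, MINUS removes nothing, whereas the pure antijoin removes every left row because any two mappings with no variable in common are compatible.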