Skip to content

Configurable per query or per request context/dataloader batching #192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jmccaull
Copy link
Contributor

@jmccaull jmccaull commented Jun 19, 2019

This is an initial work of up how I think it should look. I'm still working on cleaning it up a little and documenting but I want to get some eyes on it. I decided not to go with Lombok to keep this as lean as possible.
It was a little difficult keeping everything straight without a package structure so I first organized the files a bit. There are a number of breaking changes relating batching and the context. I removed the BatchExecutionHandler replacing it with a pre-processor, the http objects from the context and moved all of the logic handling http objects up into the AbstractServlet. The context from GraphQL's view is really just the subject and the data loader registry. The builder for the context still has access to all the http objects, I felt they could be useful for things like headers etc, when building the context, and the servlet now deals with an interface so consumers should be able to customize this and even add back the http objects if they desire.
The logic for executing a batch deals with an interface and is context independent. The logic for which type of batch input object to create and which dispatching instrumentation to use has been implemented on the enum for the setting to keep this all in once place. The default in the configuration for this is set to PER_QUERY, which most closely resembles the existing behavior.
As implementing the per request setting required wholly rewriting the dispatching instrumentation (and this was what was recommenced by GraphQL java), I figured it made sense to make it as pluggable and reusable as possible. Further, since the GraphQL java defaults don't allow configuring their dispatching instrumentation to use the options object they provide and probably has a bug regarding chained instrumentation, I've completely removed their default and supplied a custom instrumentation with a pluggable approach for both per query and per request settings.
Overall I think this will also make it a lot easier to refactor the abstract servlet and make a service layer that can be inject in custom controllers etc.
I also plan on updating the readme etc.
Thanks!

Copy link
Member

@oliemansm oliemansm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job, thanks a lot!

query(queryInvoker, graphQLObjectMapper, invocationInputFactory.createReadOnly(new GraphQLRequest(query, variables, operationName), request, response), response);
query(queryInvoker, graphQLObjectMapper,
invocationInputFactory.createReadOnly(new GraphQLRequest(query, variables, operationName), request, response),
Optional.of(request), response);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's recommended not to use Optional as parameter, see https://rules.sonarsource.com/java/RSPEC-3553

public class DefaultGraphQLContextBuilder implements GraphQLContextBuilder {

@Override
public GraphQLContext build(HttpServletRequest httpServletRequest, HttpServletResponse httpServletResponse) {
return new GraphQLContext(httpServletRequest, httpServletResponse);
return new DefaultGraphQLServletContext();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it would be good if we would provide another interface layer in between, e.g. GraphQLServletContext and GraphQLWebSocketContext along with concrete implementations containing the respective objects. That way the DefaultGraphQLServletContext could still provide the getFileParts() for example, simplifying file uploads. And this way people could still make use of existing functionality if they simply grab a different type of context from the environment.

Do you agree? Happy to help out with this of course.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious as to how all that was intended to be used, and maybe I should've posted a question about it first. I also didn't run into any examples of it being used till trying to locally build the Spring boot project last night. I'm still digging though the commons project and trying to work out what I think, but I definitely don’t want to remove any functionality. Would it make more sense to wrap the result from the serlvet and allow it to be accessed there? I also ideally would like a way to break the dependency of the context builder on the actual http objects but I'm at a loss. My goal for keeping the graphql objects leaner is to make refactoring the servlet into a service layer easier. Maybe we can extend the execution input to maintain the file parts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you on Gitter by any chance? Might be easier to chat realtime :)

Couple of use cases I have used myself using the HttpServletRequest for the servlet and the Session for the WebSockets from the context:

  • Authentication: usually getting the Authorization header to parse and populate the SecurityContext with the result. We could provide an alternate means for hooking in authentication that doesn't involve exposing the raw request through the context though.
  • File uploads like already mentioned, so that could perhaps be expose differently too. You have to be able to get to them through the DataFetchingEnvironment in your method though. But perhaps a cleaner solution can be constructed for this which provides a nicer API for consumers.
  • WebSocket Session also used for authentication.
  • I use the HttpServletRequest as well to grab the remote IP address to verify subsequent calls come from the same IP address as where the JWT was generated for. Which is an additional security mechanism.

And these are just the ones I'm using, I obviously don't know what other use cases people might have atm...

Copy link
Contributor Author

@jmccaull jmccaull Jun 20, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the spring boot commons example, it looks like the mutation is intended to do something with the file parts (upload)? Should they be passed through the DataFetchingEnvironment somehow instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't believe you can, since the DataFetchingEnvironment is part of graphql-java itself and is agnostic and wants to remain agnostic of underlying technology. So they're not going to support file parts. The only thing under our control to pass along to the resolvers is in fact the GraphQLContext which can be retrieved from the DataFetchingEnvironment. That's why these they were exposed in the context, so the consumer could then in that resolver grab the files and handle them appropriately.

Copy link
Member

@oliemansm oliemansm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've refactored the GraphQLContext implementation a bit to have a DefaultGraphQLContext, a DefaultGraphQLServletContext and a DefaultGraphQLWebSocketContext. Because when you use subscriptions you'll use the GraphQLWebSocketContext and when you use query/mutation you use the GraphQLServletContext.

return new GraphQLContextBuilder() {
@Override
GraphQLContext build(HttpServletRequest httpServletRequest, HttpServletResponse httpServletResponse, Map<String, List<Part>> fileParts) {
new DefaultGraphQLServletContext(registry(), null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are currently the only reason why DefaultGraphQLServletContext needs to be public. Preferrably we can make that implementation class package private. Also it seems currently the context cannot be built using the builder to pass in the registry and subject. I'm guessing that part is still work in progress?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially planned just the one static method that required the registry/subject but added the other and hadn't yet added the with method. Will update. Previously the context objects were public, I could see being able to extend it and not having to rewrite the base interface being useful, do we not want to allow that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right. but then we need to make the constructor protected instead of private. I tend to try to keep things package private usually so it's not part of the public api and I'm free to mess around with it without breaking anything for consumers. But in this case they're in fact very simple POJO's so might as well make them public.

public class DefaultGraphQLContextBuilder implements GraphQLContextBuilder {

@Override
public GraphQLContext build(HttpServletRequest httpServletRequest, HttpServletResponse httpServletResponse, Map<String, List<Part>> fileParts) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we're passing this fileParts all the way down to this builder after we've extracted them with some heavy logic within the AbstractGraphQLHttpServlet. We should extract that part into its own class for starters, but also: we might as well push the extraction all the way down to the builder where we construct the actual context. Seeing that the HttpServletRequest that we retrieve the parts from is passed down as well anyway. That would clean up a lot of intermediate methods that now have to pass along that fileParts parameter.

@jmccaull
Copy link
Contributor Author

I've also found a bug with the dispatching instrumentation where it isn't dispatching as expected past the first level of data fetching. Not sure if this is something I introduced or already existed. Working on enhancing the specification test to catch this.

@jmccaull
Copy link
Contributor Author

jmccaull commented Jun 21, 2019

So I've pushed a rough fix for the dispatching behavior. Need to clean it up and have yet to get to work on the other issues.
I realized performance testing this that there needed to be some sort of check for if each request was at the same "dispatching level" as the others, otherwise one would get fully processed, then as the others were getting there they would prematurely signal they were ready and dispatching essentially went back to one at a time. I can't seem to replicate this behavior in a simple test, as I think everything simply gets resolved too quickly. I also realized that this check needed to take into consideration the shape of the queries, otherwise one that had less depth would cause others to hang. The best case solution I could come up with perfectly dispatches queries that exactly match, or up until they differ. I achieved this by keeping track of the selection set with the dispatch state, and as soon as the selection set differs that comparison can no longer disqualify dispatching. This was fairly easy to create test for and seems to be working.
A side effect of this is that the instrumentation is now a bit beefier. In our system, we are heavily I/O limited so this still ends up still being a huge throughput gain. I could see some not needing this though and I plan on implementing two more context options- one for each context type but that stops the servlet from adding this dispatching instrumentation(or some similar mechanism to avoid adding the instrumentation).
Please look over the instrumentation/tests and let me know what you think. Thanks!

Edit: Actually, removing the completed executions and re-evaluating the dispatches makes tracking the selection set unnecessary, I'm not sure what I was thinking. I still think an option to disable the instrumentation all together is still worthwhile.

case PER_REQUEST_WITHOUT_INSTRUMENTATION:
//Intentional fallthrough
case PER_REQUEST_WITH_INSTRUMENTATION:
return new PerRequestBatchedInvocationInput(requests, schema, contextSupplier.get(), root);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I get the difference between PerQueryBatchedInvocationInput and PerRequestBatchedInvocationInput, but I think it would difficult for contextSupplier to provide different context for PerQueryBatchedInvocationInput, since the contextSupplier does not have visibility into which query the context is required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be up to the person customizing the ContextBuilder to keep in mind which setting they are going to be using. Do you think they need some part of the input object that they can't get from the http objects?

Copy link
Contributor Author

@jmccaull jmccaull Jun 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the intent for the context isn't that they are based on any property of the query, so I'm not sure it would be an issue

@jmccaull jmccaull changed the title WIP: Configurable per query or per request context/dataloader batching Configurable per query or per request context/dataloader batching Jun 25, 2019
@jmccaull
Copy link
Contributor Author

Change log/migration instructions for this PR:

  • Introduced a more defined package structure - import statements will need to be updated
  • BatchExecutionHandler removed - replaced with simpler BatchInputPreProcessor interface
    • BatchInputPreProcessor removes the dependency on http objects and no longer requires implementations to handle writing the result.
      *BatchInputPreProcessResult wraps either the batch to execute or the status code/message to return to the client
  • GraphQLContext has been split into three interfaces: GraphQLContext, GraphQLServletContext and GraphQLWebSocketContext
    • The default context implementations have been made immutable and can be created using builders obtained with static methods. See DeafultGraphQLContextBuilder for examples.
  • Four new context settings introduced to align more closely with Apollo-link-batch. "PER_QUERY_WITH_INSTRUMENTATION" is defaulted because it most closely resembles previous behavior. See readme.md for details

* @return if all managed executions are ready to be dispatched.
*/
public boolean allReady() {
List<Integer> dispatchStack = activeRequests.values().stream().findFirst().map(CallStack::getDispatchedLevels).orElse(Collections.emptyList());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not quite sure about the logic here, if activeRequests contains multiple elements, would findFirst() still make sense ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each time a level of the query hierarchy is started one of the active stacks will get incremented above the others. Because the futures are created serially, once that one finishes resolving they will all evaluate as ready according to the first part of the if. To make sure we only dispatch once each active query has created their load calls for the current level we just need to make sure that they all evaluate to ready and also have the same dispatch level completed, so it doesn't matter which one we start with, they just all need to be equal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess in this context the first part of the if means there are no fetch calls that are unfinished and the second part is that each query has been processed for this step of the hierarchy. The instrumentation removes active requests in the instrumentation call to instrument the result to prevent this from causing deeper queries from hanging waiting on a higher level that will never execute to that depth. Does that help?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I could do something like .stream().distinct().count() <= 1 and it would be clearer

@oliemansm oliemansm merged commit 373869b into graphql-java-kickstart:master Jun 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants