ocrd_zip: drop Manifestation-Depth, disallow fetch.txt #182
This PR removes the `Ocrd-Manifestation-Depth` parameter and disallows the `fetch.txt` mechanism.

We introduced these to allow for iterative ingestions of only the changed files into OCR-D GT endpoints like the OCR-D GT Repo or OLA-HD.
However, I think this flexibility is a premature optimization. Yes, workspaces can become very large and ingesting full manifestations for every update is inefficient. But bandwidth has not been an issue so far and it will be difficult to map these mechanisms to the (messy) real-life data we want to process, e.g. with hard-to-categorize `@xlink:href` (think `file:/` URL from Goobi/Kitodo).

I therefore think it would be better if we focussed on packaging all the OCR-D produced data in a well-defined way and ensured that data consumers don't have to do any extra steps (that might fail!) to create a complete manifestation. "What you see is what you get" is more important than maximum efficiency for re-ingestion.
We should still have a mechanism for updating an ingested OCRD-ZIP, but we should find a more efficient and less ambiguous way than POSTing an incomplete bag (such as a set of patches against the contents of the OCRD-ZIP or an API to PUT specific results in the OCRD-ZIP).
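
To make the practical effect of the two removals concrete, here is a minimal sketch of the kind of check a consumer could run on an incoming OCRD-ZIP. It is not the validator in OCR-D core: the function name `check_ocrd_zip` is hypothetical, and it assumes the bag root is the top level of the ZIP and that `bag-info.txt` is UTF-8 with simple `Key: value` lines.

```python
import io
import zipfile


def check_ocrd_zip(source):
    """Return a list of problems found in an OCRD-ZIP (hypothetical helper).

    `source` is a path or a binary file-like object.
    Assumption: the bag root is the top level of the ZIP archive.
    """
    problems = []
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())

        # With this PR, every file must be shipped in the bag itself,
        # so a fetch.txt listing files to download later is rejected.
        if "fetch.txt" in names:
            problems.append("fetch.txt is not allowed")

        # The removed tag must no longer appear in bag-info.txt.
        if "bag-info.txt" in names:
            info = zf.read("bag-info.txt").decode("utf-8")
            for line in info.splitlines():
                key = line.split(":", 1)[0].strip()
                # Compared case-insensitively to be lenient (an assumption).
                if key.lower() == "ocrd-manifestation-depth":
                    problems.append("Ocrd-Manifestation-Depth is no longer supported")
    return problems


if __name__ == "__main__":
    # Build a small in-memory bag that violates both rules.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("bag-info.txt", "Ocrd-Manifestation-Depth: partial\n")
        zf.writestr("fetch.txt", "http://example.org/img.tif - data/img.tif\n")
    buf.seek(0)
    for problem in check_ocrd_zip(buf):
        print(problem)
```

A bag that passes this check is self-contained: a consumer can unpack it and use the manifestation without any further download step that might fail.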