Fetch partial #106

Closed

VolkerHartmann
Contributor

@VolkerHartmann VolkerHartmann commented Jan 30, 2019

Unfortunately, all files have to be fetched to validate the bag, as required by the BagIt specification (Kunze et al.)!

Only the last (two?) commits please.

@cneud
Member

cneud commented Aug 8, 2019

Closing as this was included with #94.

@cneud cneud closed this Aug 8, 2019
@cneud cneud reopened this Aug 8, 2019
@cneud
Member

cneud commented Aug 8, 2019

Need to adapt according to #106 (comment)

@kba
Member

kba commented Aug 8, 2019

Best to wait for @VolkerHartmann. It's not a high-priority issue and concerns only the validators, AFAICS. In the end, it cannot be an error if that rule is broken, because real data is messy.

@cneud
Member

cneud commented Aug 8, 2019

Exactly. But it would be good if validation caught this. IME most real data is messy ;)

@EEngl52

EEngl52 commented May 20, 2021

@cneud @kba do you still want to adapt this with @VolkerHartmann no longer available?

@cneud
Member

cneud commented May 20, 2021

@EEngl52 This has implications for the spec (BagIt), so we should not close without a solution (even if nobody is currently working on it).

@kba
Member

kba commented May 20, 2021

I don't see the use case for the fetch.txt mechanism or a strict "if in METS then in bag/fetch.txt" rule. We're using relative file paths virtually exclusively anyway, and OCRD-ZIP is meant to store the OCR processing results, not to be a 1:1 mapping of the METS. The main drawback of not using fetch.txt is that the bags are very large. The main advantage is that the bags are fully self-contained: no need for (possibly failing) network requests after receiving the bag.
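To make the trade-off concrete, here is a minimal sketch of what a consumer of a partial bag has to deal with: every fetch.txt entry whose file is not already present needs a network request before the bag is complete. It assumes the BagIt line layout `URL LENGTH FILEPATH` and ignores percent-encoding of paths; the function name `missing_fetch_entries` is made up for illustration and is not part of any OCR-D tool.

```python
from pathlib import Path


def missing_fetch_entries(bag_dir):
    """List (url, path) pairs from fetch.txt whose files are not yet in the bag."""
    bag = Path(bag_dir)
    fetch = bag / "fetch.txt"
    if not fetch.is_file():
        # No fetch.txt: the bag claims to be self-contained.
        return []

    missing = []
    for line in fetch.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Assumed layout: "<url> <length or '-'> <path relative to the bag root>"
        url, _length, rel_path = line.split(None, 2)
        if not (bag / rel_path).is_file():
            missing.append((url, rel_path))
    return missing
```

An empty result means nothing has to be downloaded; anything else is a request that can fail after the bag has been received.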

I think there are other areas that are under- or unspecified and should be a much higher priority, such as the question of how we can map the relative filenames back to HTTP URLs for DFG Viewer compatibility, or how we integrate a future workflow engine/parallelization setup with provenance etc.

@cneud
Member

cneud commented May 20, 2021

100% ACK, but before closing here we should probably still mention this in the related docs. And what about the validation?

@kba
Member

kba commented May 20, 2021

You're right. In fact, I think we should set `Allow-Fetch.txt: false` in the BagIt profile and remove any mention of it and of partial OCRD-ZIP. BTW the proposed change made it into the spec anyway and has been there for two years.

I think the validation is actually easier without fetch.txt because WYSIWYG, no network problems interfering.
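A minimal sketch of what such a network-free check could look like, assuming fetch.txt is disallowed, a single `manifest-sha512.txt` is used, and paths are not percent-encoded. The function name `validate_self_contained_bag` is hypothetical and not taken from the OCR-D validators.

```python
import hashlib
from pathlib import Path


def validate_self_contained_bag(bag_dir, algorithm="sha512"):
    """Return a list of problems; an empty list means these checks passed."""
    bag = Path(bag_dir)
    problems = []

    # Assumption: fetch.txt is disallowed, so its mere presence is a violation.
    if (bag / "fetch.txt").exists():
        problems.append("fetch.txt present but not allowed")

    manifest = bag / f"manifest-{algorithm}.txt"
    if not manifest.is_file():
        problems.append(f"missing {manifest.name}")
        return problems

    listed = set()
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Manifest lines are "<checksum> <path>", separated by whitespace.
        checksum, rel_path = line.split(None, 1)
        listed.add(rel_path)
        payload = bag / rel_path
        if not payload.is_file():
            problems.append(f"listed but not in bag: {rel_path}")
            continue
        digest = hashlib.new(algorithm, payload.read_bytes()).hexdigest()
        if digest != checksum.lower():
            problems.append(f"checksum mismatch: {rel_path}")

    # Every payload file under data/ must also appear in the manifest.
    data_dir = bag / "data"
    if data_dir.is_dir():
        for path in sorted(data_dir.rglob("*")):
            if path.is_file():
                rel = path.relative_to(bag).as_posix()
                if rel not in listed:
                    problems.append(f"in bag but not listed: {rel}")
    return problems
```

Everything the check needs is on disk, so a failure always points at the bag itself rather than at a remote server.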

@bertsky
Collaborator

bertsky commented May 20, 2021

Is this related to OCR-D/core#323 by any chance?

@kba
Member

kba commented May 20, 2021

> Is this related to OCR-D/core#323 by any chance?

Yes, and by extension to OCR-D/core#176.

@cneud
Member

cneud commented Apr 27, 2022

Since afaict OCR-D/core#323 is unaffected, this should also be closed with #182.

@kba
Member

kba commented Apr 28, 2022

Superseded by #182

@kba kba closed this Apr 28, 2022