[test][rfct] propagate pytest #779

M3ssman · 2021-12-29T17:05:32Z

Here are some more refactorings for both test_ocrd_page and test_workspace modules.
The major challenge was to get where paths-n-files actually are located, which is a quite hard for someone not familiar with the pushd_popd functionalities. Further, I divided test cases where beside regular assertions also exceptions were risen to have the exception handling being placed in it's own, separate test case.

One point that still puzzles me is the behavior of the ocrd_workspace remove_...- calls with the force parameter for invalid data (i.e., data which doesn't exist at all). I know we had already discussions about this, but still, I don't get it, since one might check the returned value from these calls, which in will result into 'None' for invalid data but are considered success.

Please take into account all the TODO marks I've placed. They can be dropped if the reasons clear out.

kba · 2022-01-12T11:22:40Z

One point that still puzzles me is the behavior of the ocrd_workspace remove_...- calls with the force parameter for invalid data (i.e., data which doesn't exist at all). I know we had already discussions about this, but still, I don't get it, since one might check the returned value from these calls, which in will result into 'None' for invalid data but are considered success.

There's different conditions here that are relevant for remove_file:

Does the mets:file referenced by the kwargs of remove_file exist or not in the METS XML?
Does the local file in the file system exist

With force=True, the method will succeed if either of these is False. Will only return None for the case that a file was not found but the ID was a regular expression IIRC.

And for remove_file_group:

Does the fileGrp exist or not in the METS-XML
Do all the files match the conditions for remove_file?

With force=True, will succeed if either of these is False.

Without force, you might be left with inconsistent data, e.g. an expected file to be deleted was not there (which might be because using the wrong mets:file/@ID for removal) or one of the AlternativeImages was not available. And for remove_file, without force, might lead to a half-deleted file group because one of the files in it was not found.

Would separating these aspects of concerns into different (and less generically named) kwargs help?

M3ssman · 2022-01-17T21:13:49Z

If there's a fight on two fronts behind the scenes, I guess this can be reflected by a attenzione:just proposing returned tuple of two numbers, where the first counts deleted Metadata-structures an the 2nd counts deleted physical files - both should match in an ideal world.

The use case with the half-life files because of a broken removal and therefore missing or ghost items sounds very reasonable. When processing large batches in automated fashion, this is a rather familiar scenario.

Instead of modifying the API, I prefer to return more meaningful responses.
If you can get along with this, too, I'll do an update.

M3ssman · 2022-02-03T16:10:53Z

To clear out my latest comment about the parameter force on example both remove_file/remove_file_group:

It checks if a) the METS itself matches the provided condition and if b) there are some physical files that would be affected. That's two different things. Now I propose, instead of returning None, to return a tuple where the first element accords to the metadata, and the second to physical files.

Having a file 0001.jpg and a group MAX, you end up with 4 combinations:

file referenced, but not physical existing => file removed from metadata: (1,0, actual using force: success)
file not referenced, but physical existing => file removed from directory: (0,1, actual using force: success)
file referenced, and physical existing => file removed from both: (1,1)
file neither referenced, nor physical existing => nothing done (0,0, actual None)

This reflects all 4 behaviors which can be checked/tested independently.
With the current implementation one cannot check from outside if any inconsistencies were encountered.
But realize inconsistencies might reveal some different problems/errors as well.

If you follow me, I would adjust the implementations from the remove-calls this way. And since you did not differentiate this before, it wouldn't affect existing client-code.

M3ssman · 2022-02-03T16:15:28Z

This holds for removals of complete mets:fileGrp as well, where the behavior could even reflect how many single files belonging to this group were dropped. One might even think of an inconsistent situation, where x files are referenced in the METS, but actually y files do exists physically.

kba · 2022-02-03T18:14:33Z

I am still not 100% sure wether I understand how conditions 2) and 4) might arise - if a file is not referenced in the metadata, it will never be deleted. We don't delete files not related to the workspace, at least we should not.

But you seem to have a clear idea on how you'd go about implementing it and having a more expressive return value instead of a void method is indeed backwards-compatible, so by all means, go ahead, it probably becomes clearer with a (proof-of-concept) implementation 👍

M3ssman · 2022-02-04T11:14:55Z

You're right concerning this, maybe these scenarios are somewhat too academical.

I looked again at Workspace.remove_file , and still the behavior and outcome is very different. The more I gaze, it feels like the parameters force and keep_file do overlap, at least when both are negated. Setting parameter force implicitly to True when the main object's property is set to overwrite feels like this parameter should also be an object property. Further, according to the doc, using the page_recursive and page_same_group one can push out similar images - a behavior that intersects very much with Workspace.remove_file_group. Since from method name I'd expect that a call of Workspace.remove_file removes at most a single file.

To get on with the overall topic of just transforming the tests, I've updated the PR from recent master and I guess we should get along with that before this freezes again.

kba

🎉 Great work! Let's just make sure that all asserts are actually asserting and it runs in Python 3.7.

tests/model/test_ocrd_page.py

tests/test_workspace.py

[test][rfct] propagate pytest

6958f9a

kba mentioned this pull request Jan 13, 2022

Consistently use pytest for testing OCR-D/zenhub#27

Open

2 tasks

kba requested a review from bertsky February 2, 2022 10:13

Merge branch 'master' into feat/rfct/pytest4

13f738b

M3ssman mentioned this pull request Feb 5, 2022

[test][fix] capsys fixture recognizes streams #795

Open

kba requested changes Feb 5, 2022

View reviewed changes

tests/model/test_ocrd_page.py Show resolved Hide resolved

tests/model/test_ocrd_page.py Outdated Show resolved Hide resolved

tests/test_workspace.py Outdated Show resolved Hide resolved

kba added 2 commits February 5, 2022 15:58

wrap shutil.copytree for python <= 3.7

ced76d0

test_ocrd_page: assert with ==

55fd25a

kba approved these changes Feb 5, 2022

View reviewed changes

test_ocrd_page: typo 6 -> 65

08d661c

kba merged commit 197437b into OCR-D:master Feb 5, 2022

kba deleted the feat/rfct/pytest4 branch February 5, 2022 15:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[test][rfct] propagate pytest #779

[test][rfct] propagate pytest #779

M3ssman commented Dec 29, 2021

kba commented Jan 12, 2022 •

edited

Loading

M3ssman commented Jan 17, 2022

M3ssman commented Feb 3, 2022

M3ssman commented Feb 3, 2022

kba commented Feb 3, 2022

M3ssman commented Feb 4, 2022

kba left a comment

[test][rfct] propagate pytest #779

[test][rfct] propagate pytest #779

Conversation

M3ssman commented Dec 29, 2021

kba commented Jan 12, 2022 • edited Loading

M3ssman commented Jan 17, 2022

M3ssman commented Feb 3, 2022

M3ssman commented Feb 3, 2022

kba commented Feb 3, 2022

M3ssman commented Feb 4, 2022

kba left a comment

Choose a reason for hiding this comment

kba commented Jan 12, 2022 •

edited

Loading