Skip to content
This repository was archived by the owner on Nov 11, 2022. It is now read-only.

FileBasedSink removes temp files from successful bundles #484

Merged
merged 1 commit into from
Nov 15, 2016

Conversation

jkff
Copy link
Contributor

@jkff jkff commented Nov 14, 2016

This is a weaker but backward-compatible version of #482.

R: @dhalperi @lukecwik

@@ -248,20 +248,20 @@ private void testRemoveTemporaryFiles(int numFiles, String baseTemporaryFilename
PipelineOptions options = PipelineOptionsFactory.create();
SimpleSink.SimpleWriteOperation writeOp = buildWriteOperation(baseTemporaryFilename);

List<File> temporaryFiles = new ArrayList<>();
List<String> temporaryFiles = new ArrayList<>();
List<File> outputFiles = new ArrayList<>();
for (int i = 0; i < numFiles; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a unit test that specifically passes in temporary files that would have never been matched.

Copy link
Contributor Author

@jkff jkff Nov 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woops. There was a bug (FWIW the bug is not present in beam). Thanks for pushing for the test.

Set<String> allMatches = new HashSet<>(matches);
allMatches.addAll(knownFiles);
LOG.debug(
"Removing {} temporary files matching {} ({} matched glob, {} additional known files)",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log statement might be confusing since X != Y + Z because of the set union. Is there a better way to explain this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean: the last parameter of the log message, Z, is defined as X - Y.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nvm

This is a weaker but backward-compatible version of respective Beam
changes:
apache/beam#1050
apache/beam#1278
@lukecwik lukecwik merged commit 9c3af6e into GoogleCloudPlatform:master Nov 15, 2016
@jkff jkff deleted the file-based-sink branch November 15, 2016 21:35
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants