
Fix HadoopFileSource’s split size estimate #534

Merged

Conversation

igorbernstein2 (Contributor)

No description provided.

@dhalperi (Contributor) left a comment


Thanks!

Can you please make sure to submit these fixes to Apache Beam (https://beam.apache.org/contribute/contribution-guide/) so they are not lost in Dataflow 2.x, which will be based on Beam?

      Job job = Job.getInstance(); // new instance
      for (FileStatus st : listStatus(createFormat(job), job)) {
        size += st.getLen();
      }
    } catch (IOException | NoSuchMethodException | InvocationTargetException
-       | IllegalAccessException | InstantiationException e) {
+       | IllegalAccessException | InstantiationException | InterruptedException e) {
      // ignore, and return 0
Contributor


This change does not look right. At the very least, you should restore the thread interrupted state when catching an InterruptedException.

Maybe move it to its own catch block and handle it specifically?
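
A minimal sketch of that suggestion, assuming the snippet above is the body of the source's size-estimation method (listStatus, createFormat, and the Hadoop Job/FileStatus calls come from the diff; the surrounding structure is illustrative, not the exact SDK code):

    long size = 0;
    try {
      Job job = Job.getInstance(); // new instance
      for (FileStatus st : listStatus(createFormat(job), job)) {
        size += st.getLen();
      }
      return size;
    } catch (InterruptedException e) {
      // Restore the interrupt flag so callers further up the stack can still
      // observe the interruption, then fall back to the default estimate.
      Thread.currentThread().interrupt();
      return 0;
    } catch (IOException | NoSuchMethodException | InvocationTargetException
        | IllegalAccessException | InstantiationException e) {
      // ignore, and return 0 (as before)
      return 0;
    }

Catching InterruptedException in its own block keeps the reflection/IO fallback unchanged while ensuring the interrupt is not silently swallowed.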

@igorbernstein2 (Contributor Author)

Fixed handling of InterruptedException. However, I don't think this fix applies to Beam; I don't think HadoopFileSource ever made it to Apache Beam.

@dhalperi (Contributor)

@igorbernstein2 (Contributor Author)

Thanks for pointing it out. Will submit a PR there as well.

@dhalperi dhalperi merged commit efd33cc into GoogleCloudPlatform:master Jan 26, 2017
@igorbernstein2 igorbernstein2 deleted the fix-hadoop-file-source branch January 26, 2017 20:10