Fix HadoopFileSource’s split size estimate #534
Thanks!
Can you please make sure to submit these fixes to Apache Beam (https://beam.apache.org/contribute/contribution-guide/) so they are not lost in Dataflow 2.x, which will be based on Beam?
      Job job = Job.getInstance(); // new instance
      for (FileStatus st : listStatus(createFormat(job), job)) {
        size += st.getLen();
      }
    } catch (IOException | NoSuchMethodException | InvocationTargetException
-       | IllegalAccessException | InstantiationException e) {
+       | IllegalAccessException | InstantiationException | InterruptedException e) {
      // ignore, and return 0
This change does not look right. At the very least, you should restore the thread interrupted state when catching an InterruptedException.
Maybe move that to its own catch block and handle it specifically?
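A minimal sketch of what this suggestion could look like, assuming a simplified setup: `SizeLister` and `InterruptAwareEstimate` are hypothetical stand-ins for the file-listing logic in the diff, not names from HadoopFileSource. The point is only that `InterruptedException` gets its own catch block, which restores the thread's interrupted flag before falling back to 0.

```java
import java.io.IOException;

final class InterruptAwareEstimate {

    /** Hypothetical stand-in for the code that lists files and sums their lengths. */
    interface SizeLister {
        long totalLength() throws IOException, InterruptedException;
    }

    /** Returns the estimated size, or 0 if the listing fails or is interrupted. */
    static long estimateSize(SizeLister lister) {
        try {
            return lister.totalLength();
        } catch (InterruptedException e) {
            // Restore the interrupted flag so callers further up the stack
            // can still observe that this thread was interrupted.
            Thread.currentThread().interrupt();
            return 0L;
        } catch (IOException e) {
            // Ignore, and return 0 (same fallback as the original code).
            return 0L;
        }
    }

    public static void main(String[] args) {
        long size = estimateSize(() -> {
            throw new InterruptedException("simulated interruption");
        });
        System.out.println("size = " + size);
        // Thread.interrupted() reports the flag and clears it again.
        System.out.println("interrupted flag restored = " + Thread.interrupted());
    }
}
```

Catching `InterruptedException` in the shared multi-catch would swallow the interruption silently; the separate block keeps the "return 0" fallback while leaving the decision about how to react to the caller.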
Fixed handling of InterruptedException. However, I don't think this fix applies to Beam; I don't think HadoopFileSource ever made it to Apache Beam.
It got renamed to
Thanks for pointing it out. Will submit a PR there as well.
No description provided.