
Multithreading (or rather multiprocessing) #80

Open
XorUnison opened this issue May 25, 2020 · 12 comments
Labels
enhancement Additions and improvements in general

Comments

@XorUnison
Collaborator

XorUnison commented May 25, 2020

A small disclaimer: this is more of a long-term issue, and it will need quite a bit of planning before any PR with actual merge intent is attached to it.

First of all, what should we use?
There's the threading module, but it does not use additional cores. In CPython, the global interpreter lock (GIL) lets only one thread execute Python code at a time, so only work that releases the GIL (file I/O, some C extensions, waiting on external processes such as ffmpeg) actually runs concurrently. Using it thus isn't advised: actual performance increases are expected to be somewhere around +10% to +50%. Not irrelevant, but the simple fact is we can do better.

There's the multiprocessing module, which is almost what we should use. There's also the multiprocess module, a fork of multiprocessing that serializes with dill instead of pickle and should therefore be more robust (it can pickle more kinds of objects).
So multiprocess it is.

So, can it work? Yes, it can. I had to jump through some hoops, but I managed to hack something together that multiprocesses 4 simple scenes side by side. The theoretical maximum speedup was +300% (4 processes instead of 1), and the actual one was... +200%. That is nothing to scoff at. With more things processed simultaneously, both the theoretical and the actual speedup go up, capped only by the hardware and by how finely the work can be split. On my hardware with 16 cores, for example, I expect to reach an actual improvement of about +1000%. That's around an order of magnitude, so... it would change things up a lot.

Alright, we've got the intent and the module, now onto the plan.

First, we should enable multiprocessing for scenes. Multithreading is really easy, but multiprocess is a bit pickier. As I said, I have gotten scenes rendered with it, but something about the exact implementation in extract_scene.py still makes it fail while pickling. I'm not sure what it is; I just know that it can be fixed, since I was able to sidestep it.
(Fixed, see next post for that)

Once that is done, we should be able to run manim (assuming 4 scenes in a file) with something like this:
manim project -at
and have each of the 4 scenes dropped into its own process.

Once this is implemented and works well, we can move on to the real prize.
To make manim faster in general, we can split scenes into intervals and have each interval handled by a separate process. For instance, if a scene has 8 animations, we could split them like [1,2], [3,4], [5,6], [7,8] and throw each interval into its own process.
This would mean that some calculations are done multiple times. While the process rendering [1,2] would have no overhead, the one rendering [3,4] would first have to run the calculations of [1,2] (without rendering them) to reach the right scene state before it can start rendering. However, those calculations are usually very, very short compared to rendering, so that's not really much of an issue.
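The interval split described above can be sketched as follows. The function names are hypothetical and not part of manim; the sketch only computes which animations each process renders (1-based and inclusive, matching the [1,2], [3,4] notation) and how many earlier animations it would have to replay without rendering. When the count doesn't divide evenly, the leftover animations go to the first intervals.

```python
from typing import List, Tuple


def split_into_intervals(n_animations: int, n_processes: int) -> List[Tuple[int, int]]:
    """Split animations 1..n_animations into contiguous 1-based, inclusive intervals."""
    if n_animations < 1 or n_processes < 1:
        raise ValueError("need at least one animation and one process")
    # Never create empty intervals when there are more processes than animations.
    n_processes = min(n_processes, n_animations)
    base, extra = divmod(n_animations, n_processes)
    intervals = []
    start = 1
    for i in range(n_processes):
        # The first `extra` intervals absorb one leftover animation each.
        size = base + (1 if i < extra else 0)
        intervals.append((start, start + size - 1))
        start += size
    return intervals


def replay_overhead(intervals: List[Tuple[int, int]]) -> List[int]:
    """Number of earlier animations each process must compute (not render) first."""
    return [start - 1 for start, _ in intervals]


print(split_into_intervals(8, 4))  # [(1, 2), (3, 4), (5, 6), (7, 8)]
print(replay_overhead(split_into_intervals(8, 4)))  # [0, 2, 4, 6]
```

The replay overhead grows with each interval's position in the scene, which is why this only pays off while per-animation calculations stay cheap compared to rendering.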

Last but not least, we'll want multiprocessing to be an option rather than a replacement for the current serial rendering. Aside from making sure we have a fallback if something ever causes issues, some people use manim to render things that are very heavy in calculations, like fractals. In those cases multiprocessing could be actively detrimental: every process would repeat the heavy calculations of all intervals before its own, and besides needlessly hogging computing power, this could also overflow the RAM. So multiprocessing shouldn't be on by default, but when it's available we should make sure everyone knows it's there. Maybe even print a small message every time manim runs with just 1 process.
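A minimal sketch of "off by default, but visible" for whole scenes, assuming a single process count where 1 means today's serial rendering. `render_scene` and `render_all` are hypothetical placeholders, not manim functions. The thread settles on the `multiprocess` package; this sketch uses the stdlib `multiprocessing` instead, whose `Pool` API `multiprocess` mirrors, so it runs without extra dependencies.

```python
import multiprocessing as mp
import sys
from typing import List, Sequence


def render_scene(name: str) -> str:
    """Hypothetical stand-in for rendering one scene; returns its output file name."""
    return f"{name}.mp4"


def render_all(scene_names: Sequence[str], processes: int = 1) -> List[str]:
    """Render scenes serially by default; use a process pool only if processes > 1."""
    if not scene_names:
        return []
    if processes <= 1:
        # Serial rendering stays the default, so heavy scenes behave as before.
        print("Rendering with 1 process; pass processes > 1 to render scenes in parallel.",
              file=sys.stderr)
        return [render_scene(name) for name in scene_names]
    # One scene per task; Pool.map returns results in the same order as scene_names.
    with mp.Pool(processes=min(processes, len(scene_names))) as pool:
        return pool.map(render_scene, scene_names)


if __name__ == "__main__":
    # The __main__ guard is required when workers are started with spawn or forkserver.
    print(render_all(["IntroScene", "GraphScene", "OutroScene", "CreditsScene"], processes=4))
```

Using one number instead of a separate on/off flag keeps the serial path as the natural default and avoids contradictory settings such as "enabled, 1 process".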

@XorUnison XorUnison added the enhancement Additions and improvements in general label May 25, 2020
@XorUnison
Collaborator Author

XorUnison commented May 25, 2020

Alright, the pickling problem has been solved. The issue was the way manim imported the project module: it loaded it by spec from its file path. Importing it normally works.
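The thread doesn't show the exact traceback, so the following is an assumption about the mechanism. pickle stores classes by reference (module name plus qualified name) and re-imports that module to resolve them. A module executed from a file spec but never registered in `sys.modules`, nor reachable via `sys.path`, can't be re-imported, so its scene classes can't be pickled for worker processes. The stdlib-only sketch below reproduces that difference; the module name `demo_project_80` and the helper functions are made up for illustration.

```python
import importlib
import importlib.util
import pickle
import sys
import tempfile
from pathlib import Path

PROJECT_SOURCE = "class MyScene:\n    pass\n"


def import_by_spec(path: Path, name: str):
    """Execute a module from its file path without registering it in sys.modules."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def import_normally(directory: Path, name: str):
    """Import the module by name, which registers it in sys.modules."""
    sys.path.insert(0, str(directory))
    try:
        importlib.invalidate_caches()
        return importlib.import_module(name)
    finally:
        sys.path.remove(str(directory))


def can_pickle(obj) -> bool:
    """Classes are pickled by reference, so this fails if their module can't be re-imported."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, ImportError):
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "demo_project_80.py"
        project.write_text(PROJECT_SOURCE)
        print(can_pickle(import_by_spec(project, "demo_project_80").MyScene))
        print(can_pickle(import_normally(Path(tmp), "demo_project_80").MyScene))
```

This uses plain pickle; multiprocess serializes with dill, which may handle some classes differently, so treat the sketch as an illustration of the by-reference lookup rather than an exact replay of the original failure.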

I've made a new branch with the correspondingly edited file (and again, no PRs or merges for that branch):
Edited extract_scene.py

The changes are simple:

  1. Import the multiprocess library
  2. Change the way the project module is imported
  3. Switch the direct scene rendering call for a process wrapped one
  4. Turn off open_file_if_needed. There's no need to fix this: when someone renders multiple videos at once, opening them all in a preview doesn't make sense anyway (unless we put them into a playlist, but I don't think that's a high priority).

The problems that are still left:

  1. Since this implementation is so simple, it currently just replaces serial rendering instead of adding parallel rendering as an option.
  2. Some of you might remember I said that with threading the visual feedback looked almost okay... That is certainly not the case anymore. The processes now all dump their progress into the same bar, and the info output on created files and played animations usually isn't pretty either.

I really need to do other things for a while, though, so I'm leaving that cleaned-up proof-of-concept file here, both for others and for myself to come back to later.
If someone wants to test the waters with adapting the UI in the meantime, that would be welcome as well. I won't be doing that myself, as it's completely outside my realm of experience.
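One possible direction for the shared progress bar, sketched under assumptions (none of these names exist in manim): instead of each process drawing its own bar, workers push `(scene, frames_done, total_frames)` tuples onto a `multiprocessing` Manager queue, and only the parent process draws, one line per scene. The sketch uses plain text bars; a real UI would redraw these lines in place.

```python
import multiprocessing as mp
import queue
import time
from typing import Dict, Tuple

Progress = Dict[str, Tuple[int, int]]


def render_with_progress(scene: str, total_frames: int, progress) -> str:
    """Hypothetical worker: reports progress instead of drawing its own bar."""
    for done in range(1, total_frames + 1):
        # ... a real worker would render frame number `done` here ...
        progress.put((scene, done, total_frames))
    return f"{scene}.mp4"


def drain(progress, state: Progress) -> Progress:
    """Apply every queued update without blocking; keep the latest count per scene."""
    while True:
        try:
            scene, done, total = progress.get_nowait()
        except queue.Empty:
            return state
        state[scene] = (done, total)


def format_progress(state: Progress, width: int = 20) -> str:
    """One text bar per scene, drawn only by the parent process."""
    lines = []
    for scene in sorted(state):
        done, total = state[scene]
        filled = width * done // total
        lines.append(f"{scene:<12} [{'#' * filled}{'.' * (width - filled)}] {done}/{total}")
    return "\n".join(lines)


def render_all_with_progress(scenes: Dict[str, int], processes: int) -> str:
    state: Progress = {}
    with mp.Manager() as manager:
        progress = manager.Queue()  # a proxy object that can be passed to pool workers
        with mp.Pool(processes) as pool:
            pending = [pool.apply_async(render_with_progress, (name, frames, progress))
                       for name, frames in scenes.items()]
            while not all(p.ready() for p in pending):
                drain(progress, state)  # a real UI would redraw format_progress(state) here
                time.sleep(0.05)
            for p in pending:
                p.get()  # re-raises any exception raised in a worker
        drain(progress, state)
    return format_progress(state)
```

Keeping all drawing in one process means the workers never compete for the terminal, and the file and animation info could be routed through the same queue.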

@XorUnison
Collaborator Author

There's now an issue with the new Tex handling, which will presumably be fixed by #98, so I'm waiting on that to continue.

I've worked a bit more on the structure that will likely be implemented, and I'm leaving it here mostly for myself and as a general reference.

In order to properly implement multiprocessing we'll likely have to do something like this:

  1. Once a final config file is in place, it'll need a bool for whether multiprocessing is used and an argument for the process count. Or, simpler, a process count of 1 could act as False. If multiprocessing isn't enabled (by default or via the CLI), carry on as usual; otherwise switch to the new code.
  2. Initialize a worker pool in extract_scene.py. Each scene will be submitted to the pool using an async method; which one (apply_async, map_async, starmap_async, etc.) is yet to be determined.
  3. The splitting logic will likely have to be placed in scene.py, within self.construct(). Given how it's structured, the only real place I see this working is right in play().
    We'll probably have to wrap and adapt this part in particular:
        animations = self.compile_play_args_to_animation_list(*args, **kwargs)
        self.begin_animations(animations)
        self.progress_through_animations(animations)
        self.finish_animations(animations)
  4. These newly created play processes all need to be tracked, and the scene has to wait for them to finish before moving on from self.construct(); otherwise only evil things can be expected to happen.
  5. Because partial_movie_file_list.txt should be produced only after all processes for play statements have finished, there should be no problems here. If videos are somehow spliced together in the wrong order, this would be the place to look.
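The pool, tracking, and ordering points of the plan can be sketched together with a stdlib pool. This assumes each interval's worker writes one partial movie file and returns its name; `render_interval` and the file naming are hypothetical placeholders. Submitting with `apply_async` and then calling `get()` on each result in submission order both waits for every worker before moving on and keeps the partial files in play() order, whichever worker happens to finish first. The list is written in the style of ffmpeg's concat demuxer input (`file '...'` lines).

```python
import multiprocessing as mp
from typing import List, Sequence, Tuple


def render_interval(interval: Tuple[int, int]) -> str:
    """Hypothetical worker for one interval of play() calls."""
    start, end = interval
    # A real worker would replay animations 1..start-1 without writing frames,
    # then render animations start..end into a single partial movie file.
    return f"partial_{start:05d}_{end:05d}.mp4"


def render_intervals(intervals: Sequence[Tuple[int, int]], processes: int) -> List[str]:
    """Render intervals in parallel and return partial files in submission order."""
    with mp.Pool(processes) as pool:
        pending = [pool.apply_async(render_interval, (interval,)) for interval in intervals]
        # get() blocks until that worker is done and re-raises its exceptions,
        # so nothing moves on before every interval has been rendered.
        return [result.get() for result in pending]


def partial_movie_file_list(files: Sequence[str]) -> str:
    """Concat list contents, built only after all workers have finished."""
    return "".join(f"file '{name}'\n" for name in files)


if __name__ == "__main__":
    files = render_intervals([(1, 2), (3, 4), (5, 6), (7, 8)], processes=4)
    print(partial_movie_file_list(files), end="")
```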

@leotrs
Contributor

leotrs commented Jul 12, 2020

We live in a post-#98 world. Can we revisit this?

Also, RE point 1. above: you don't need a bool AND an argument. You can just set the default number of workers to 1 for no multiprocessing.

@XorUnison
Collaborator Author

We live in a post-#98 world. Can we revisit this?

I haven't forgotten about this, but while #98 was brewing I was working on the 3 new geometry classes (#187), on which I still need some input too. Since both are pretty much entirely on my shoulders, working on one delays the other, and I'm not sure which to prioritize. The classes definitely have more general utility.

Also, RE point 1. above: you don't need a bool AND an argument. You can just set the default number of workers to 1 for no multiprocessing.

Right, like I mentioned there.

@XorUnison
Collaborator Author

Making a note to myself here to also wait for #166 to be finished since it touches the same files multiprocessing has to be spliced into.

@leotrs
Contributor

leotrs commented Sep 2, 2020

Just wanted to say that there are some changes planned down the line for the config system that would work perfectly with multiprocessing but not with multithreading. So in case there is a change from multiprocessing to multithreading down the line, please let me know first so we can discuss.

@leotrs
Contributor

leotrs commented Nov 2, 2020

Update here:

The next TO DO item on the config system is to make it not global anymore, but local to each Scene class. Once this is done, it should be fairly simple to implement parallel scene rendering using multiprocessing.

There have also been refactors in the scene file writer that may make this easier.

@leotrs
Contributor

leotrs commented Dec 3, 2020

Related to multiprocessing: sphinx has an option to run in parallel. If we make sure our docs can build in parallel, then we should be able to speed up sphinx builds by 2x or 4x (assuming the RTD machines have many CPUs...)

@XorUnison
Collaborator Author

Just wanted to say that there are some changes planned down the line for the config system that would work perfectly with multiprocessing but not with multithreading. So in case there is a change from multiprocessing to multithreading down the line, please let me know first so we can discuss.

This isn't ever going to happen though. In terms of performance multiprocessing would be a massive increase in speed for most cases while multithreading would be... well not completely useless, but honestly, mostly.

The next TO DO item on the config system is to make it not global anymore, but local to each Scene class. Once this is done, it should be fairly simple to implement parallel scene rendering using multiprocessing.

There have also been refactors in the scene file writer that may make this easier.

Speaking of which, how far along is this exactly?

Related to multiprocessing: sphinx has an option to run in parallel. If we make sure our docs can build in parallel, then we should be able to speed up sphinx builds by 2x or 4x (assuming the RTD machines have many CPUs...)

Sounds good. Is my still-planned multiprocessing project related to implementing that at all, though? If so, how?

@leotrs
Contributor

leotrs commented Dec 5, 2020

This isn't ever going to happen though. In terms of performance multiprocessing would be a massive increase in speed for most cases while multithreading would be... well not completely useless, but honestly, mostly.

Fair!

The next TO DO item on the config system is to make it not global anymore, but local to each Scene class. Once this is done, it should be fairly simple to implement parallel scene rendering using multiprocessing.
There have also been refactors in the scene file writer that may make this easier.

Speaking of which, how far along is this exactly?

Honestly, the only thing left to do is make each Scene have its own instance of a ManimConfig object. This is not a difficult PR, but it will break basically every scene in existence. So there's that to consider. Is this something that's blocking this project? We could try to speed that up if so.
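A minimal sketch of the per-scene idea, assuming a hypothetical `SceneConfig` dataclass rather than manim's actual `ManimConfig` API. Each scene takes a deep copy of the defaults, so changing one scene's settings can't leak into another, and the config travels with the scene when it is pickled for a worker process.

```python
import copy
import pickle
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SceneConfig:
    """Hypothetical per-scene settings; not manim's ManimConfig."""
    frame_rate: int = 60
    pixel_height: int = 1080
    media_dir: str = "./media"
    extra: Dict[str, str] = field(default_factory=dict)


DEFAULT_CONFIG = SceneConfig()


class Scene:
    def __init__(self, config: Optional[SceneConfig] = None):
        # Deep-copy so mutations stay local to this scene instance.
        self.config = copy.deepcopy(config if config is not None else DEFAULT_CONFIG)


fast = Scene()
fast.config.frame_rate = 15
print(Scene().config.frame_rate, DEFAULT_CONFIG.frame_rate)  # 60 60
print(pickle.loads(pickle.dumps(fast)).config.frame_rate)  # 15
```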

Related to multiprocessing: sphinx has an option to run in parallel. If we make sure our docs can build in parallel, then we should be able to speed up sphinx builds by 2x or 4x (assuming the RTD machines have many CPUs...)

Sounds good. Is my still-planned multiprocessing project related to implementing that at all, though? If so, how?

Well, the manim_directive uses manim to render all the examples. Each Sphinx extension declares whether it is safe or unsafe to parallelize, and right now the manim directive is marked as unsafe because manim has no support for parallelization. So while working on this project, it would be good to keep this in mind for further down the line. (Having said that, I don't actually know whether multiprocessing would count as "safe", or whether it's something Sphinx would even recognize...)

@MrDiver MrDiver moved this to 🆕 New in Dev Board Jun 18, 2022
@jeertmans
Contributor

Hi all! Is there any update on this?

@XorUnison
Collaborator Author

Unfortunately not from my side, at least. I can't live off of this and have other things to attend to, especially right now, so I simply can't tackle it. I'd also have to reconnect with the current state of the repo. What originally kept me tethered to Manim was, at least for a while, severed by the pandemic. I might get back in touch with all of this eventually, but if I do... it will be another year, and that's the optimistic estimate.

vikingout pushed a commit to vikingout/manimCommunity that referenced this issue May 17, 2025
Explicitly add a license : CC BY-NC-SA
Projects
Status: 🆕 New
Development

No branches or pull requests

3 participants