
Reimplementation using Provisioner architecture #6


Open · wants to merge 28 commits into main

Conversation

@caenrigen (Contributor) commented Mar 27, 2025

Hi!

My prototype is ready.

I think the easiest is for you to have a look at the new README on GitHub: https://github.com/caenrigen/sshpyk/blob/provisioner/README.rst and let me know your thoughts.

The meat of the code is in provisioner.py. Docs on the provisioner concept are over here in jupyter_client.

Both the CLI tool and the README were largely AI-written in the interest of time. I did a few tests and it seemed to work as intended.

Hope you find some time to give it a try. I tested it with JupyterLab between macOS machines. Would be good to confirm things work well between other systems.

Fixes #2 in principle

@caenrigen (Contributor, Author)

As a quick update, I am also working on a draft of the sshfs feature that allows mounting local dirs on the remote: https://github.com/caenrigen/sshpyk/tree/sshfs

@schiebel (Contributor)

Hi @caenrigen
Thanks very much for this addition. I appreciate the work you have put in! The kernel provisioner architecture is new since I last looked at the Jupyter documentation. It seems like a great improvement. The sshpyk cli is also very nice! I will test this today.

Looking at the code changes, one piece of functionality seems to have been excluded. The current behavior of ls is to use green text to indicate kernel specifications that seem to be valid (e.g. the local and remote Python executables can still be located) and red text to indicate kernel specifications that seem to be broken. This is useful for conda users because conda environments are created and removed often. The choice of red/green as an indicator may not have been the best choice, but I think this functionality is useful... it is also possible that I missed where this happens in cli.py... :-)
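The red/green indicator described above could be sketched roughly like this. This is a minimal illustration, not the actual cli.py code, and it assumes an ANSI-capable terminal:

```python
import shutil

# ANSI escape codes; assumes the terminal renders ANSI colors.
GREEN, RED, RESET = "\x1b[32m", "\x1b[31m", "\x1b[0m"

def colorize_kernel(name: str, python_path: str) -> str:
    """Color a kernel name green if its Python executable can still be
    located, red otherwise (e.g. after a conda env was removed)."""
    ok = shutil.which(python_path) is not None
    return f"{GREEN if ok else RED}{name}{RESET}"
```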

@caenrigen (Contributor, Author)

Hi,

Thank you for coming back to this!

You are right that some functionality from the previous implementation is indeed missing.
I agree that the ability to verify "kernel integrity" is very useful.
I intend to improve this part, e.g. by doing some validation when adding a new kernel, to prevent adding a configuration that is broken upfront (unless the user forces it).
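The add-time validation with a force override could look roughly like this. The function and parameter names are hypothetical, not the actual sshpyk API:

```python
import shutil

def add_kernel(local_python: str, force: bool = False) -> None:
    """Refuse to add a kernel whose local Python cannot be located,
    unless the user explicitly forces it. Illustrative sketch only."""
    if shutil.which(local_python) is None and not force:
        raise ValueError(
            f"{local_python!r} not found; pass force=True to add anyway"
        )
    # ... write the kernel.json here ...
```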

More to come :)

@caenrigen (Contributor, Author) commented Mar 31, 2025

Hi, in case you did not have a look yet: I just pushed some sanity checks for the remote and split the output into two tables, one for local and one for remote.

Another thing that might be missing in my PR is handling of the remote env. But I think the best way to deal with that might be to give users the option to source some specific file on the remote before executing remote commands, if needed, e.g. one that runs conda activate my_env.

@caenrigen (Contributor, Author) commented Apr 1, 2025

Printing tables does not work well, since several parameters of interest are paths that can get very long.
I changed the output to a vertical approach. You can see a sample in the latest README.

This is also handy for my sshfs branch, which has some more configs in the kernel.json to be displayed when running the sshpyk list command. Same for other configurable features we might want.

@schiebel (Contributor) commented Apr 3, 2025

Thanks very much for the updates. The reoriented listing seems good. I'll try it out in the morning.

@caenrigen (Contributor, Author)

Btw, I ended up making more changes/improvements in my sshfs branch, which I think is also ready now.

Would you mind if I add the sshfs commits in this PR?

@schiebel (Contributor) commented Apr 3, 2025

> Would you mind if I add the sshfs commits in this PR?

Thanks for the update. Let's do the sshfs in a separate PR so that I can look at the changes separately.

@caenrigen (Contributor, Author)

> Let's do the sshfs in a separate PR so that I can look at the changes separately.

Sounds good. In the meantime, I ran into some bugs that I need to fix on that branch.

@caenrigen (Contributor, Author)

I ran into some issues while debugging sshfs, so I ended up simplifying the handling of the stdout/stderr of subprocesses.
These changes are in commit caenrigen@93874e7 on the sshfs branch.

@schiebel (Contributor) commented Apr 3, 2025

Thanks for the sshfs update.
I've been trying to understand how the valid and working kernels are distinguished in the list output. I have some valid kernels (including the last one below) and some invalid kernels (including the next-to-last one below). How does the user know when there is a problem with the kernel state? It is OK if this is something for me to fix, if it is not useful for your application.
[Screenshot: PR006_Problem_002]

@caenrigen (Contributor, Author)

> How does the user know when there is a problem with the kernel state?

Perfectly valid question :)

The "problem" is that my reimplementation changes the way the information about the remote kernel is stored in kernel.json. Effectively this means it is not backwards compatible with the previous implementation. This is because the provisioner facilities require the information to be stored in kernel.json["metadata"]["kernel_provisioner"]["config"].

Here is an example of a new SSH kernel.json

{
  "argv": [],
  "env": {},
  "display_name": "Python3.13 (Remote MacBook Pro)",
  "language": "python",
  "interrupt_mode": "message",
  "metadata": {
    "kernel_provisioner": {
      "provisioner_name": "sshpyk-provisioner",
      "config": {
        "ssh": null,
        "ssh_host_alias": "mbp_ext",
        "remote_python_prefix": "/opt/homebrew/anaconda3/envs/g",
        "remote_kernel_name": "python3",
        "remote_kernel_launch_timeout": 15
      }
    }
  }
}

And here is what my sshpyk list output looks like:

[Screenshot: sshpyk list output, 2025-04-04]

The green (v), red (x), and cyan (?) marks indicate the status per "component". More verifications could be added.

@caenrigen (Contributor, Author)

So in your case, all kernels are considered to be Local, hence no verification is being executed.

I understand this is a bit of a bummer for someone who is using the previous version of sshpyk. You have to add the kernel using the new sshpyk add command, or edit the kernel.json directly to make it compatible with what the code in this PR expects.

As a side effect of the new implementation, detecting which kernels belong to sshpyk is more robust, since the kernel.json must explicitly state the usage of the sshpyk-provisioner.
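The detection idea above boils down to a simple check on the kernel spec. A minimal sketch, not the actual cli.py code, using the kernel.json structure shown earlier:

```python
def is_sshpyk_kernel(kernel_json: dict) -> bool:
    """A kernel belongs to sshpyk iff its kernel.json explicitly
    declares the sshpyk provisioner in the metadata stanza."""
    provisioner = kernel_json.get("metadata", {}).get("kernel_provisioner", {})
    return provisioner.get("provisioner_name") == "sshpyk-provisioner"
```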

The meaning of each verification/check (v) perhaps needs to be made explicit. Thoughts?

Also worth noting that my implementation is intentionally agnostic to how the remote kernel is implemented and what command launches it. It delegates that part to Jupyter itself by running jupyter-kernel --KernelApp.kernel_name=... on the remote. Because of this, for now, the code only checks whether the jupyter-kernel executable is available in the specified Python prefix. I think it would be good to also check the executable of the kernel itself, i.e. its argv[0] (e.g. python), but it is a bit more complex to keep that compatible with any kernel language and to determine the absolute path of the executable when the kernel.json does not specify one.
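The argv[0] check discussed above could, under POSIX assumptions, be sketched like this. The helper name is hypothetical and this check is not part of the PR; `command -v` performs the PATH lookup in POSIX sh, handling both absolute and bare command names:

```python
import shlex

def remote_resolve_cmd(argv0: str) -> str:
    """Build the command to run over ssh to resolve a kernel's argv[0]
    to an absolute path on the remote host."""
    return f"command -v {shlex.quote(argv0)}"
```

The provisioner could then run e.g. `ssh <host> "command -v python"` and treat empty output as a broken kernel spec.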

@schiebel (Contributor) commented Apr 7, 2025

Sorry for the slow reply. I have been on travel since Thursday evening.

Thanks for the explanation. It is probably fine if listing old sshpyk kernels does not work as expected. However, I also find that the new provisioner implementation is not backwards compatible: newly created kernels do not work with nteract. The jupyter_client docs indicate that it is possible to maintain backwards compatibility. It seems like this may be done by populating the argv field (as well as the provisioner stanza of the metadata) of the kernel specification:
[Screenshot: jupyter_client docs excerpt, 2025-04-06]
but I am not completely sure.

These are the errors I see when starting a newly configured kernel in nteract:

[Screenshots: nteract error dialogs, 2025-04-06]

@caenrigen (Contributor, Author) commented Apr 7, 2025

Hi!

I understand the problem, and I ran into it myself in VS Code. The issue is that many of the tools that launch Jupyter kernels assume that only the argv is required to launch the kernel and that the connection file is one of the args. IMO that is a bad assumption, because Jupyter might require extra data from the JSON to launch a kernel correctly.

Nonetheless, I do agree that we want to be able to launch an SSH kernel independently of a full JupyterLab/Jupyter server, which should make it backwards compatible with tools that rely on the argv.

I did a quick test that did not work, but I want to come back to it. The idea is to assemble an argv that invokes some Jupyter (or sshpyk) command that internally runs the provisioner AND uses the provided connection file. But there is some devil in the details: the local connection file needs to be compatible with the remote one, of course, and there are some subtleties about which one is generated first, etc.

@caenrigen (Contributor, Author)

I refactored the code to include the relevant changes from my sshfs branch.

Pull the code of this branch and add an ssh kernel with sshpyk add.
I figured out a command to put in the argv (in the kernel.json) that should make it work with nteract.

Give it a try and let me know.

@caenrigen (Contributor, Author)

Small update: I realised I am not implementing a proper shutdown sequence; it needs some improvements.

@schiebel (Contributor)

Thanks for the update and all of the work! I'll give it a try.

@schiebel (Contributor)

Hi @caenrigen, unfortunately I have had no luck using this new version. Using argv to start the kernel with bin/jupyter-kernel seems like a good approach, but even though nteract shows that the remote kernel is connected, I cannot execute any commands (see below). The initial errors are:

app.js:3664 [KernelApp] [SSHPYK] Remote KernelApp launched, RPID=1335524
app.js:3664 [KernelApp] [SSHPYK] Connection file on remote machine: /users/dschieb/.local/share/jupyter/runtime/kernel-df05541d-58cb-4355-9c45-72c1f0b9834f.json
app.js:3664 [KernelApp] [SSHPYK] SSH tunnels launched, PID=36856
app.js:3664 [KernelApp] ERROR | Session key was not preserved (key_prev=b'81bab72f-a435-4698-ad08-0766911dfba3' vs key_new=b''
Traceback (most recent call last):
  File "/Users/drs/.conda-miniforge3/envs/bokeh/lib/python3.10/site-packages/jupyter_client/manager.py", line 87, in wrapper
    out = await method(self, *args, **kwargs)
  File "/Users/drs/.conda-miniforge3/envs/bokeh/lib/python3.10/site-packages/jupyter_client/manager.py", line 439, in _async_start_kernel
    await self._async_launch_kernel(kernel_cmd, **kw)
  File "/Users/drs/.conda-miniforge3/envs/bokeh/lib/python3.10/site-packages/jupyter_client/manager.py", line 354, in _async_launch_kernel
    connection_info = await self.provisioner.launch_kernel(kernel_cmd, **kw)
  File "/Users/drs/.conda-miniforge3/envs/bokeh/lib/python3.10/site-packages/sshpyk/provisioning.py", line 534, in launch_kernel
    raise RuntimeError(
RuntimeError: Session key was not preserved (key_prev=b'81bab72f-a435-4698-ad08-0766911dfba3' vs key_new=b''
[Screenshot: nteract, 2025-04-13]

I'm not sure what the Session key was not preserved error means.

@schiebel (Contributor) commented Apr 14, 2025

I find that the connection file created on the remote host does not contain a key:

{
  "shell_port": 33353,
  "iopub_port": 40453,
  "stdin_port": 40385,
  "control_port": 60119,
  "hb_port": 53633,
  "ip": "127.0.0.1",
  "key": "",
  "transport": "tcp",
  "signature_scheme": "hmac-sha256",
  "kernel_name": "python3"
}

After the failed (from the local perspective) startup of the remote kernel, two processes are started on the remote system:

drs  1343124  5.6  0.0 366516 49676 ?        Sl   21:48   0:00 /home/zuul06/conda/envs/py310-sshipy/bin/python3.10 /home/zuul06/conda/envs/py310-sshipy/bin/jupyter-kernel --KernelApp.kernel_name=python3 --ConnectionFileMixin.Session.keyfile=/dev/stdin
drs  1343126  7.1  0.0 973456 61260 ?        Ssl  21:48   0:00 /home/zuul06/conda/envs/py310-sshipy/bin/python3.10 -m ipykernel_launcher -f /home/zuul06-2/drs/_tilda-home-dot-local_/share/jupyter/runtime/kernel-f4ff1758-6ec7-4937-8a3a-8996a278633f.json

After shutdown, one still remains:

drs  1343124  0.4  0.0 366516 49676 ?        Sl   21:48   0:00 /home/zuul06/conda/envs/py310-sshipy/bin/python3.10 /home/zuul06/conda/envs/py310-sshipy/bin/jupyter-kernel --KernelApp.kernel_name=python3 --ConnectionFileMixin.Session.keyfile=/dev/stdin

@caenrigen (Contributor, Author)

Hi, thanks for trying it out. I think I came across this problem and fixed it on my sshfs branch; I need to do some more testing and then port it to this branch.

The process remaining on the remote system will be solved too.

@caenrigen (Contributor, Author)

Give it another try; I made some substantial changes to the logic of shutdowns/restarts.

Successfully merging this pull request may close these issues: Questions about contributing