Sunday, June 9, 2019

ipywidgets.Text background-color

This took some time to figure out so I'll post it here in hope it helps someone else. To set the color of an ipywidgets Text box using CSS, you need to:

  • add a CSS class name to the Text widget
  • render an HTML <style> block for an <input> element descendant of that class
  • Set !important on the background-color, to prevent it being overridden by a later Jupyter-declared style.

For my case the Text widget was being included in a VBox, allowing the HTML widget containing the <style> to be included in the list.


data_input_style = "<style>.data_input input { background-color:#D0F0D0 !important; }</style>"
value_entry = ipywidgets.Text(value='')
value_entry.add_class('data_input')

children = [
    ...
    ipywidgets.HTML(data_input_style),
    value_entry,
    ...
]
ipywidgets.VBox(children=children, ...)

Wednesday, June 5, 2019

Discourse as a JupyterHub Service

JupyterHub is a system for managing cloud-hosted Jupyter Notebooks, allowing users to log in and spawning a notebook or Jupyterlab instance for them. JupyterHub has a notion of Services, separate processes either started by or at least managed by JupyterHub alongside the notebook instances. All JupyterHub Services appear at https://jupyterhub.example.com/services/name_of_service.

Discourse is a software package for a discussion forum, and quite nice compared to a number of the alternatives. Discourse is distributed as a Docker container, and strongly recommends using that container not trying to install it any other way. When running by itself on a server, Discourse uses docker-proxy to forward HTTP and HTTPS connections from the external IP address through to the container. In order to run Discourse on the same server as JupyterHub, we need to remove the docker-proxy and let it be handled by handled by JupyterHub's front-end Traefik reverse proxy, which is already bound to ports 80 and 443 on a hub server.

To run alongside JupyterHub we need to reconfigure Discourse to not use docker-proxy. The docker-proxy passes through SSL to be terminated within the container, while Traefik has to be able to see the URL path component in order to route the request, so we're also moving SSL termination out of Discourse and into Traefik.



Some searching turned up several articles looked relevant, but did not turn out to be applicable. To save the trouble:

  • Running other websites on the same machine as Discourse explains how to set up NGINX as a reverse proxy and use a unix domain socket to communicate from NGINX to Discourse. JupyterHub checks the syntax of the URL configured for its services, I didn't find a way to make a Unix socket work within the JupyterHub Services mechanism.
  • Discourse behind Traefik describes how to create Docker networks via Traefik configuration. Though this might have worked, I found it much easier to use HTTP over the docker0 interface.

For bootstrapping, discourse provides a discourse-setup script to ask a few questions and create an app.yml file used to drive construction of the docker container. discourse-setup fails if there is already a webserver on port 80, and I did not find a reasonable alternative to it. In my case, I briefly shut down the JupyterHub server and ran discourse-setup. Running discourse-setup on a separate VM and copying the resulting /var/discourse would likely also work.

Starting from the /var/discourse created by discourse-setup, perform the following steps to make it run as a JupyterHub service.

  1. cd /var/discourse
  2. edit containers/app.yml to let Traefik handle the reverse-proxy function. We comment out the external port in the expose section, which will disable docker-proxy and let us handle the reverse proxy function using traefik.
    ## which TCP/IP ports should this container expose?
    ## If you want Discourse to share a port with another
    ## webserver like Apache or nginx,
    ## see https://meta.discourse.org/t/17247 for details
    expose:
    #  - "80:80"   # http
    #  - "443:443" # https
      - "80"
    in the "env:" section at the bottom:
      ## TODO: The domain name this Discourse instance will respond to
      ## Required. Discourse will not work with a bare IP number.
      DISCOURSE_HOSTNAME: jupyterhub.example.com
    
      # Running Discourse as a JupyterHub Service
      DISCOURSE_RELATIVE_URL_ROOT: /services/discourse
    Replace the "run:" section with the recipe to adjust the URL path for /services/discourse:
    ## Any custom commands to run after building
    run:
        - exec:
            cd: $home
            cmd:
              - mkdir -p public/services/discourse
              - cd public/services/discourse && ln -s ../uploads && ln -s ../backups
        - replace:
           global: true
           filename: /etc/nginx/conf.d/discourse.conf
           from: proxy_pass http://discourse;
           to: |
              rewrite ^/(.*)$ /services/discourse/$1 break;
              proxy_pass http://discourse;
        - replace:
           filename: /etc/nginx/conf.d/discourse.conf
           from: etag off;
           to: |
              etag off;
              location /services/discourse {
                 rewrite ^/services/discourse/?(.*)$ /$1;
              }
        - replace:
             filename: /etc/nginx/conf.d/discourse.conf
             from: $proxy_add_x_forwarded_for
             to: $http_your_original_ip_header
             global: true
  3. Run:
    ./launcher rebuild app
    to construct a new docker container.
  4. Add the configuration for a Discourse service to JupyterHub. I'm using The Littlest JupyterHub, where we create a snippet in /opt/tljh/config/jupyterhub_config.d/discourse-service.py.
    Find the IP address to use within the output of "docker inspect app"; look in NetworkSettings for IpAddress and Ports.
    c.JupyterHub.services = [
        {
            'name': 'discourse',
            'url': 'http://172.17.0.2:80/',
            'api_token': 'no_token',
        }
    ]
  5. Then restart JupyterHub with the new configuration:
    tljh-config reload
    tljh-config reload proxy

Discourse should now appear on https://jupyterhub.example.com/services/discourse

If something doesn't work, logs can be found in:

sudo journalctl --since "1 hour ago" -u jupyterhub
sudo journalctl --since "1 hour ago" -u jupyterhub-proxy
sudo journalctl --since "1 hour ago" -u traefik
/var/discourse/shared/standalone/log/rails/unicorn.stderr.log
/var/discourse/shared/standalone/log/var-log/nginx/access.log

Monday, June 3, 2019

JupyterHub open lab notebook at login

Instructions for JupyterHub configuration state that to start JupyterLab by default, one should use a configuration of:

c.Spawner.args = ['--NotebookApp.default_url=/lab']

To start a classic Notebook by default, use:

c.Spawner.args = ['--NotebookApp.default_url=/tree']

To start the classic Notebook and open a specific ipynb file, use:

c.Spawner.args = ['--NotebookApp.default_url=/tree/path/to/file.ipynb']

One might therefore assume that opening /lab/path/to/file.ipynb would open a specific file in JupyterLab, but this does not work and results in an error. The correct configuration is /lab/tree:

c.Spawner.args = ['--NotebookApp.default_url=/lab/tree/path/to/file.ipynb']

Saturday, June 1, 2019

JupyterHub OAuth via gitlab.com: setting scopes

Recently I set up a JupyterHub instance, a system for cloud-hosting Jupyter notebooks. JupyterHub supports authentication by a number of different mechanisms. As the code for the notebook is hosted on GitLab, I set up OAuth to GitLab as the main authentication mechanism.

Gitlab supports a number of scopes to limit what the granted OAuth token is allowed to do:

apiGrants complete read/write access to the API, including all groups and projects.
read_userGrants read-only access to the authenticated user's profile through the /user API endpoint, which includes username, public email, and full name. Also grants access to read-only API endpoints under /users.
read_repositoryGrants read-only access to repositories on private projects using Git-over-HTTP (not using the API).
write_repositoryGrants read-write access to repositories on private projects using Git-over-HTTP (not using the API).
read_registryGrants read-only access to container registry images on private projects.
sudoGrants permission to perform API actions as any user in the system, when authenticated as an admin user.
openidGrants permission to authenticate with GitLab using OpenID Connect. Also gives read-only access to the user's profile and group memberships.
profileGrants read-only access to the user's profile data using OpenID Connect.
emailGrants read-only access to the user's primary email address using OpenID Connect.

However I found that if I didn't grant api permissions on the gitlab side, the authentication would always fail with "The requested scope is invalid, unknown, or malformed." It appears that the JupyterHub OAuth client was not requesting any specific scope, and that gitlab.com defaults to "api" — which is far too powerful a permission to grant for this purpose, as it allows read/write access to everything when all we really need to know is that the user exists.

Setting the OAuth scope for the JupyterHub client to request turns out to be quite simple to do in the configuration, albeit not documented:

  c.GitLabOAuthenticator.scope = ['read_user']

A pull request to add documentation on this for GitLabOAuthenticator has been submitted.