git - Use Jupyter together with file share or mounted folder -


How can I synchronize notebooks between the Jupyter service and other services (Google Cloud Storage or a Git repository)?

Some background on the question:

I am currently moving Google's Datalab into my own container. The motivation is to have more control over the data region (the Datalab beta is offered in the US) and over the packages, since I want to use the current TensorFlow version.

Based on Google's ideas (see GitHub), I build my own Docker image and run it on a Kubernetes cluster in Google Container Engine. The gcp package can be installed as I have explained elsewhere. Google uses a Node.js server to sync Git with the Datalab instance - I was not able to get that running in a self-deployed container in the EU.

My second try was the gcsfuse driver. This one does not work in the non-privileged containers of Kubernetes v1.0 and Google Container Engine. Full stop.
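A possible workaround, instead of mounting the bucket, is to copy notebooks to Google Cloud Storage with gsutil, which the Cloud SDK installed in the image below provides. This is only a sketch, not what I ended up using, and the bucket name is a placeholder:

import os
from subprocess import check_call
from shlex import split

GCS_BUCKET = 'gs://my-notebook-backup'  # placeholder bucket name, replace with your own

def copy_notebook_to_gcs(os_path):
    """Copy a saved notebook file to a GCS bucket via gsutil (needs SDK credentials)."""
    filename = os.path.basename(os_path)
    check_call(split('gsutil cp {} {}/{}'.format(os_path, GCS_BUCKET, filename)))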

My Dockerfile (based on Google's GCE Datalab image):

FROM debian:jessie

# Setup OS and core packages
RUN apt-get clean
RUN echo "deb-src http://ftp.be.debian.org/debian testing main" >> /etc/apt/sources.list && \
    apt-get update -y && \
    apt-get install --no-install-recommends -y -q \
        curl wget unzip git vim build-essential ca-certificates pkg-config \
        libatlas-base-dev liblapack-dev gfortran \
        libpng-dev libfreetype6-dev libxft-dev \
        libxml2-dev \
        python2.7 python-dev python-pip python-setuptools python-zmq && \
    mkdir -p /tools && \
    mkdir -p /srcs && \
    cd /srcs && apt-get source -d python-zmq && cd

WORKDIR /datalab

# Setup Google Cloud SDK
RUN apt-get install --no-install-recommends -y -q wget unzip git -y
RUN wget -nv https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.zip && \
    unzip -qq google-cloud-sdk.zip -d tools && \
    rm google-cloud-sdk.zip && \
    tools/google-cloud-sdk/install.sh --usage-reporting=false \
        --path-update=false --bash-completion=false \
        --disable-installation-options && \
    tools/google-cloud-sdk/bin/gcloud config set --scope=installation \
        component_manager/fixed_sdk_version 0.9.57 && \
    tools/google-cloud-sdk/bin/gcloud -q components update \
        gcloud core bq gsutil compute preview alpha beta && \
    rm -rf /root/.config/gcloud

# Install the fuse driver for GCS (gcsfuse)
RUN apt-get install -y lsb-release
RUN echo "deb http://packages.cloud.google.com/apt gcsfuse-jessie main" > /etc/apt/sources.list.d/gcsfuse.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
RUN apt-get update && apt-get install -y gcsfuse
RUN mkdir /datalab/mount

# Setup Python packages
RUN pip install -U \
        tornado==4.2.1 pyzmq==14.4.0 jinja2==2.7.3 \
        jsonschema==2.5.1 py-dateutil==2.2 pytz==2015.4 pandocfilters==1.2.4 pygments==2.0.2 \
        argparse==1.2.1 mock==1.2.0 requests==2.4.3 oauth2client==1.4.12 httplib2==0.9.2 \
        futures==3.0.3 && \
    pip install -U numpy==1.9.2 && \
    pip install -U pandas==0.16.2 && \
    pip install -U scikit-learn==0.16.1 && \
    pip install -U scipy==0.15.1 && \
    pip install -U sympy==0.7.6 && \
    pip install -U statsmodels==0.6.1 && \
    pip install -U matplotlib==1.4.3 && \
    pip install -U ggplot==0.6.5 && \
    pip install -U seaborn==0.6.0 && \
    pip install -U notebook==4.0.2 && \
    pip install -U pyyaml==3.11 && \
    easy_install pip && \
    find /usr/local/lib/python2.7 -type d -name tests | xargs rm -rf

# Path configuration
ENV PATH $PATH:/datalab/tools/google-cloud-sdk/bin
ENV PYTHONPATH /env/python

# IPython configuration
WORKDIR /datalab
RUN ipython profile create default
RUN jupyter notebook --generate-config
ADD ipython.py /root/.ipython/profile_default/ipython_config.py

# Install TensorFlow.
RUN wget -nv https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.7.1-cp27-none-linux_x86_64.whl && \
    pip install --upgrade tensorflow-0.7.1-cp27-none-linux_x86_64.whl && \
    rm tensorflow-0.7.1-cp27-none-linux_x86_64.whl

# Add build artifacts
ADD build/lib/gcpdata-0.1.0.tar.gz /datalab/lib/
ADD build/lib/gcpdatalab-0.1.0.tar.gz /datalab/lib/
ADD setup-repo.sh /datalab
ADD setup-env.sh /datalab
ADD run.sh /datalab
RUN chmod 755 /datalab/*

# Install build artifacts
RUN cd /datalab/lib/gcpdata-0.1.0 && python setup.py install
RUN cd /datalab/lib/gcpdatalab-0.1.0 && python setup.py install

RUN mkdir /datalab/content
WORKDIR /datalab/content
EXPOSE 6006
EXPOSE 8123

# See https://github.com/ipython/ipython/issues/7062
CMD ["/datalab/run.sh"]

OK, I solved the problem:

  1. Use post-save hooks as explained in a previous post
  2. Use several Git commands in the hook as explained in a blog post

Here is the code for (2.), the archiving. It goes into ipython.py:

import os
from subprocess import check_call
from shlex import split

...

def post_save(model, os_path, contents_manager):
    """Post-save hook doing a git commit / push"""
    if model['type'] != 'notebook':
        return  # only act on notebooks
    workdir, filename = os.path.split(os_path)
    if filename.startswith('scratch') or filename.startswith('Untitled'):
        return  # skip scratch and untitled notebooks
    # git add / git commit / git push
    check_call(split('git add {}'.format(filename)), cwd=workdir)
    check_call(split('git commit -m "notebook save" {}'.format(filename)), cwd=workdir)
    check_call(split('git push'), cwd=workdir)

c.FileContentsManager.post_save_hook = post_save
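For a quick sanity check outside of Jupyter, the hook can be called by hand with the same three arguments the notebook server passes in (the model dict, the absolute path of the saved file, and the contents manager, which is unused here). The path below is just a made-up example:

# Simulate a save event by hand, e.g. from a Python shell in the container.
fake_model = {'type': 'notebook'}
post_save(fake_model, '/datalab/content/master_branch/example.ipynb', None)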

My run.sh utilizes setup-env.sh and setup-repo.sh from Google Datalab and consequently depends on gcloud commands and the credentials of the Kubernetes deployment. Otherwise, please make sure to extend the Dockerfile with your credentials (see also the note after the script).

cd /datalab/content
. /datalab/setup-env.sh
. /datalab/setup-repo.sh
if [ $? != "0" ]; then
    exit 1
fi
cd /datalab/content/master_branch  # multiple branches are not planned here!
/usr/local/bin/jupyter notebook --ip=* --no-browser --port=8123
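One more note on credentials: the git commit in the post-save hook also needs a Git identity inside the container, otherwise Git refuses to commit. If your setup scripts do not already configure one, something like the following (with a placeholder name and email) could go near the top of ipython.py or into the Dockerfile:

from subprocess import check_call
from shlex import split

# Placeholder identity - replace with your own name and email.
check_call(split('git config --global user.name "Datalab Notebook"'))
check_call(split('git config --global user.email "notebooks@example.com"'))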
