git - Use Jupyter together with file share or mounted folder
How can I synchronize notebooks between the Jupyter service and other services (Google Cloud Storage or a Git repository)?
Some background on the question:
I am currently moving Google's Datalab into my own container. The motivation is to have more control over the data region (the Datalab beta is only offered in the US) and over the packages I want to use, e.g. the current TensorFlow version.
Based on ideas from Google (see GitHub), I build my own Docker image and run it on a Kubernetes cluster in Google Container Engine. The gcp package can be installed as I have explained elsewhere. Google uses a Node.js server to sync Git with the Datalab instance - I was not able to get this running in my self-deployed container in the EU.
My second try was the gcsfuse driver. This one does not work in non-privileged containers on Kubernetes v1.0 and Google Container Engine. Full stop.
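For context on why gcsfuse fails here: FUSE mounts need access to the `/dev/fuse` device, which in Kubernetes requires a privileged container, and Kubernetes v1.0 on Google Container Engine rejected privileged pods. A pod spec would need something like the following sketch (the image name is a placeholder, not from the original setup):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: datalab
spec:
  containers:
  - name: datalab
    image: gcr.io/my-project/datalab   # placeholder image name
    ports:
    - containerPort: 8123
    securityContext:
      privileged: true   # required for FUSE mounts; rejected by GKE on Kubernetes v1.0
```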
My Dockerfile (based on Google's GCE Datalab image):
```dockerfile
FROM debian:jessie

# Setup OS and core packages
RUN apt-get clean
RUN echo "deb-src http://ftp.be.debian.org/debian testing main" >> /etc/apt/sources.list && \
    apt-get update -y && \
    apt-get install --no-install-recommends -y -q \
        curl wget unzip git vim build-essential ca-certificates pkg-config \
        libatlas-base-dev liblapack-dev gfortran \
        libpng-dev libfreetype6-dev libxft-dev \
        libxml2-dev \
        python2.7 python-dev python-pip python-setuptools python-zmq && \
    mkdir -p /tools && \
    mkdir -p /srcs && \
    cd /srcs && apt-get source -d python-zmq && cd

WORKDIR /datalab

# Setup Google Cloud SDK
RUN apt-get install --no-install-recommends -y -q wget unzip git -y
RUN wget -nv https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.zip && \
    unzip -qq google-cloud-sdk.zip -d tools && \
    rm google-cloud-sdk.zip && \
    tools/google-cloud-sdk/install.sh --usage-reporting=false \
        --path-update=false --bash-completion=false \
        --disable-installation-options && \
    tools/google-cloud-sdk/bin/gcloud config set --scope=installation \
        component_manager/fixed_sdk_version 0.9.57 && \
    tools/google-cloud-sdk/bin/gcloud -q components update \
        gcloud core bq gsutil compute preview alpha beta && \
    rm -rf /root/.config/gcloud

# Install the gcsfuse driver for GCE
RUN apt-get install -y lsb-release
RUN echo "deb http://packages.cloud.google.com/apt gcsfuse-jessie main" > /etc/apt/sources.list.d/gcsfuse.list
RUN curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
RUN apt-get update && apt-get install -y gcsfuse
RUN mkdir /datalab/mount

# Setup Python packages
RUN pip install -U \
        tornado==4.2.1 pyzmq==14.4.0 jinja2==2.7.3 \
        jsonschema==2.5.1 py-dateutil==2.2 pytz==2015.4 pandocfilters==1.2.4 pygments==2.0.2 \
        argparse==1.2.1 mock==1.2.0 requests==2.4.3 oauth2client==1.4.12 httplib2==0.9.2 \
        futures==3.0.3 && \
    pip install -U numpy==1.9.2 && \
    pip install -U pandas==0.16.2 && \
    pip install -U scikit-learn==0.16.1 && \
    pip install -U scipy==0.15.1 && \
    pip install -U sympy==0.7.6 && \
    pip install -U statsmodels==0.6.1 && \
    pip install -U matplotlib==1.4.3 && \
    pip install -U ggplot==0.6.5 && \
    pip install -U seaborn==0.6.0 && \
    pip install -U notebook==4.0.2 && \
    pip install -U pyyaml==3.11 && \
    easy_install pip && \
    find /usr/local/lib/python2.7 -type d -name tests | xargs rm -rf

# Path configuration
ENV PATH $PATH:/datalab/tools/google-cloud-sdk/bin
ENV PYTHONPATH /env/python

# IPython configuration
WORKDIR /datalab
RUN ipython profile create default
RUN jupyter notebook --generate-config
ADD ipython.py /root/.ipython/profile_default/ipython_config.py

# Install TensorFlow
RUN wget -nv https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.7.1-cp27-none-linux_x86_64.whl && \
    pip install --upgrade tensorflow-0.7.1-cp27-none-linux_x86_64.whl && \
    rm tensorflow-0.7.1-cp27-none-linux_x86_64.whl

# Add build artifacts
ADD build/lib/gcpdata-0.1.0.tar.gz /datalab/lib/
ADD build/lib/gcpdatalab-0.1.0.tar.gz /datalab/lib/
ADD setup-repo.sh /datalab
ADD setup-env.sh /datalab
ADD run.sh /datalab
RUN chmod 755 /datalab/*

# Install build artifacts
RUN cd /datalab/lib/gcpdata-0.1.0 && python setup.py install
RUN cd /datalab/lib/gcpdatalab-0.1.0 && python setup.py install

RUN mkdir /datalab/content
WORKDIR /datalab/content

EXPOSE 6006
EXPOSE 8123

# See https://github.com/ipython/ipython/issues/7062
CMD ["/datalab/run.sh"]
```
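For completeness, the image can be built and deployed with the usual Container Registry workflow of that time (a sketch only; the project and image names are placeholders, and these commands need gcloud credentials, so they are not runnable as-is):

```
docker build -t gcr.io/my-project/datalab .
gcloud docker push gcr.io/my-project/datalab
kubectl run datalab --image=gcr.io/my-project/datalab --port=8123
```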
OK, I solved the problem:
- Use post-save hooks as explained in a previous post
- Use several git commands in the hook as explained in a blog
Here is the code for (2.), the archiving. It goes into ipython.py:
```python
import os
from subprocess import check_call
from shlex import split

...

def post_save(model, os_path, contents_manager):
    """Post-save hook doing a git commit / push."""
    if model['type'] != 'notebook':
        return  # only notebooks
    workdir, filename = os.path.split(os_path)
    if filename.startswith('scratch') or filename.startswith('Untitled'):
        return  # skip scratch and untitled notebooks
    # git add / git commit / git push
    check_call(split('git add {}'.format(filename)), cwd=workdir)
    check_call(split('git commit -m "notebook save" {}'.format(filename)), cwd=workdir)
    check_call(split('git push'), cwd=workdir)

c.FileContentsManager.post_save_hook = post_save
```
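The filename filter inside the hook can be factored into a small predicate and checked on its own. A minimal sketch (`should_commit` is a name I introduce here, not part of the original hook, and it filters on the `.ipynb` extension rather than the hook's `model['type']` check):

```python
import os

def should_commit(os_path):
    """Mirror the hook's filter: commit only real notebooks,
    skipping scratch and untitled files."""
    filename = os.path.basename(os_path)
    if not filename.endswith('.ipynb'):
        return False
    if filename.startswith('scratch') or filename.startswith('Untitled'):
        return False
    return True

print(should_commit('/datalab/content/analysis.ipynb'))   # True
print(should_commit('/datalab/content/Untitled1.ipynb'))  # False
```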
My run.sh utilizes setup-env.sh and setup-repo.sh from Google Datalab and consequently depends on gcloud commands and the credentials of the Kubernetes deployment. Otherwise, please make sure to extend the Dockerfile with your own credentials.
```shell
cd /datalab/content
. /datalab/setup-env.sh
. /datalab/setup-repo.sh
if [ $? != "0" ]; then
    exit 1
fi
# multiple branches are not planned here!
cd /datalab/content/master_branch
/usr/local/bin/jupyter notebook --ip=* --no-browser --port=8123
```
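The git round trip the post-save hook performs (add, commit, push) can be exercised locally against a throwaway bare repository, without any gcloud setup. A sketch (the sandbox paths and notebook name are invented for the demonstration):

```shell
#!/bin/sh
# Simulate the hook's git add / commit / push in a disposable sandbox.
set -e
SANDBOX=$(mktemp -d)

# A bare repo stands in for the remote the hook pushes to.
git init -q --bare "$SANDBOX/remote.git"
git clone -q "$SANDBOX/remote.git" "$SANDBOX/work"

cd "$SANDBOX/work"
git config user.email "test@example.com"
git config user.name "Test"

# Pretend Jupyter just saved a notebook, then run the hook's commands.
echo '{}' > analysis.ipynb
git add analysis.ipynb
git commit -q -m "notebook save" analysis.ipynb
git push -q origin HEAD

# The commit is now visible in the bare "remote" repository.
git --git-dir="$SANDBOX/remote.git" log --oneline
```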