Deploying BigDL on Microsoft’s Azure* Data Science Virtual Machine

Automated Installation of BigDL Using Deploy to Azure*

To make it easier to deploy BigDL, we created a “Deploy to Azure” button on top of the Linux* (Ubuntu*) edition of the Data Science Virtual Machine (DSVM). This button encapsulates all the necessary installation steps to create a new Azure* DSVM and installs BigDL after the virtual machine (VM) is provisioned.

Azure Virtual Machines provide a mechanism to automatically run a script after provisioning when using Azure Resource Manager (ARM) templates. On GitHub*, we have published the ARM template and the script that installs BigDL on the DSVM for Linux (Ubuntu) when the VM is created on Azure.

Clicking the Deploy to Azure button takes the user to the Azure portal wizard, leads them through the VM creation process, and automatically executes the necessary script to install/configure BigDL so that it is ready for use once the VM is successfully provisioned. The user can directly run /opt/BigDL/run_notebooks.sh to start a Jupyter* notebook server to execute the samples.

Note: It may take as long as 10 minutes to fully provision the DSVM, which makes it the perfect time for a coffee break!

Please note: For ease of use, we suggest selecting the password option rather than the SSH public key option in the DSVM provisioning prompt.

For completeness, we also provide below the manual, step-by-step installation procedure, in case you already have a DSVM (Ubuntu) instance or simply want to understand what the automated steps above do.

Manual Installation of BigDL on the DSVM

Provisioning DSVM

Before you start, you need to provision the Microsoft Data Science Virtual Machine for Linux (Ubuntu) by visiting the Azure product detail page and following the directions in the VM creation wizard.

Once the DSVM is provisioned, make a note of its public IP address or DNS name; you will need it to connect to the DSVM with your connection tool of choice. For a text interface, we recommend SSH or PuTTY. For a graphical interface, Microsoft* recommends an X client called X2Go*.

Note: You may need to configure your proxy server correctly if your network administrators require all connections to go through your network proxy. The only session type supported by default on DSVM is Xfce*.

Building Intel’s BigDL

Switch to root, clone BigDL from GitHub into /opt, and check out the released branch-0.1:

     $ sudo -s
     $ cd /opt
     $ git clone https://github.com/intel-analytics/BigDL.git
     $ git -C BigDL checkout branch-0.1

Building BigDL with Spark* 2.0:

     $ cd BigDL
     $ bash make-dist.sh -P spark_2.0

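If the build succeeds, the packaged distribution is created under /opt/BigDL/dist, including the dist/lib, dist/bin, and dist/conf files that the scripts below rely on.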
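To confirm that the build produced what the launch script expects, a quick file check can help. This is a minimal sketch that assumes the 0.1.0 artifact names referenced in run_notebook.sh; the function name check_bigdl_dist is hypothetical and not part of BigDL.

```shell
#!/bin/bash
# Hypothetical helper: verify that make-dist.sh produced the files that
# run_notebook.sh references. Assumes BigDL 0.1.0 artifact names.
check_bigdl_dist() {
    local home="${1:-/opt/BigDL}"
    local missing=0
    local f
    for f in \
        "${home}/dist/lib/bigdl-0.1.0-jar-with-dependencies.jar" \
        "${home}/dist/lib/bigdl-0.1.0-python-api.zip" \
        "${home}/dist/bin/bigdl.sh" \
        "${home}/dist/conf/spark-bigdl.conf"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2   # report every missing artifact
            missing=1
        fi
    done
    return "$missing"               # 0 only if all files exist
}

# Usage after the build:
#   check_bigdl_dist /opt/BigDL && echo "build looks complete"
```

If a file is reported missing, rerun make-dist.sh and check its output for errors before moving on.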

Examples of DSVM Configuration Steps to Run BigDL

Switch to Python* 2.7.

     $ source /anaconda/bin/activate root

Confirm Python* version.

     $ python --version
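If you want a script to stop early when the wrong interpreter is active, the version string can be tested directly. This is a sketch; is_python27 is a hypothetical helper. Python 2 prints its version to stderr, which is why the usage line redirects 2>&1.

```shell
#!/bin/bash
# Hypothetical helper: succeed only for a "Python 2.7" or "Python 2.7.x"
# version string. Trailing text (e.g. an Anaconda build label) is allowed.
is_python27() {
    case "$1" in
        "Python 2.7"|"Python 2.7."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage, after activating the root environment:
#   is_python27 "$(python --version 2>&1)" || echo "expected Python 2.7" >&2
```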

Install Python Packages

     $ /anaconda/bin/pip install wordcloud
     $ /anaconda/bin/pip install tensorboard

Creating Script Files to Run Jupyter* Notebook and TensorBoard*

In the directory where you cloned the BigDL library (/opt/BigDL), create a script named run_notebook.sh with the following content:

#!/bin/bash
#begin run_notebook.sh
#setup paths (must match the directory BigDL was cloned into)
BigDL_HOME=/opt/BigDL

#this is needed for MSFT DSVM
export PYTHONPATH=${BigDL_HOME}/pyspark/dl:${PYTHONPATH}
#end MSFT DSVM-specific config

#use local mode or cluster mode
#MASTER=spark://xxxx:7077
MASTER="local[4]"
PYTHON_API_ZIP_PATH=${BigDL_HOME}/dist/lib/bigdl-0.1.0-python-api.zip
BigDL_JAR_PATH=${BigDL_HOME}/dist/lib/bigdl-0.1.0-jar-with-dependencies.jar
export PYTHONPATH=${PYTHON_API_ZIP_PATH}:${PYTHONPATH}
export PYSPARK_DRIVER_PYTHON=jupyter
#serve notebooks from the BigDL checkout so the example paths resolve
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --notebook-dir=${BigDL_HOME} --ip=*"

source ${BigDL_HOME}/dist/bin/bigdl.sh

${SPARK_HOME}/bin/pyspark \
    --master "${MASTER}" \
    --driver-cores 5 \
    --driver-memory 10g \
    --total-executor-cores 8 \
    --executor-cores 1 \
    --executor-memory 10g \
    --conf spark.akka.frameSize=64 \
    --properties-file ${BigDL_HOME}/dist/conf/spark-bigdl.conf \
    --py-files ${PYTHON_API_ZIP_PATH} \
    --jars ${BigDL_JAR_PATH} \
    --conf spark.driver.extraClassPath=${BigDL_JAR_PATH} \
    --conf spark.executor.extraClassPath=bigdl-0.1.0-jar-with-dependencies.jar
#end run_notebook.sh

     $ chmod +x run_notebook.sh
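run_notebook.sh hard-codes BigDL_HOME. Because the script is saved inside the BigDL checkout, one alternative is to derive the path from the script's own location. This is a sketch, not part of the original script; it relies on bash's BASH_SOURCE array and assumes the script is run with bash.

```shell
#!/bin/bash
# Sketch: derive BigDL_HOME from the directory containing this script,
# assuming the script lives in the BigDL checkout (e.g. /opt/BigDL).
BigDL_HOME="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "Using BigDL_HOME=${BigDL_HOME}"
```

This line could replace the fixed BigDL_HOME assignment, so the script keeps working if the checkout is moved.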

In the same BigDL directory, create start_tensorboard.sh with the following content:

#!/bin/bash
#begin start_tensorboard.sh
export PYTHONPATH=/anaconda/lib/python2.7/site-packages:$PYTHONPATH
/anaconda/lib/python2.7/site-packages/tensorboard/tensorboard --logdir=/tmp/bigdl_summaries
#end start_tensorboard.sh

Please note that ‘/anaconda/lib/python2.7/site-packages/’ is installation-dependent and may change in future releases of DSVM. Thus, if these instructions do not work for you out of the box, you may need to update this path.
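Because the site-packages path is installation-dependent, start_tensorboard.sh could probe a few candidate locations instead of hard-coding one. This is a sketch; the helper first_existing_dir and the candidate paths in the usage comment are hypothetical examples to adjust for your DSVM image.

```shell
#!/bin/bash
# Hypothetical helper: print the first argument that is an existing
# directory; fail if none of them exists.
first_existing_dir() {
    local d
    for d in "$@"; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

# Example usage in start_tensorboard.sh (candidate paths are assumptions):
#   SITE_PACKAGES=$(first_existing_dir \
#       /anaconda/lib/python2.7/site-packages \
#       /anaconda/envs/py35/lib/python3.5/site-packages) || exit 1
```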

When TensorBoard starts, note the URL printed at the end of its log (for example, http://10.0.2.4:6006). Open it in the browser on your DSVM to see the TensorBoard pane.
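If TensorBoard's output is redirected to a file, the URL can be pulled out with grep. This sketch only assumes that the address appears in the log as a plain http(s) URL, as in the example above; last_url and tensorboard.log are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helper: print the last http(s) URL found on stdin.
last_url() {
    grep -oE 'https?://[^[:space:]]+' | tail -n 1
}

# Example, if TensorBoard output was saved to tensorboard.log:
#   last_url < tensorboard.log
```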

Launching a Text Classification Example

Run run_notebook.sh and start_tensorboard.sh with bash from two different terminals:

     $ bash run_notebook.sh
     $ bash start_tensorboard.sh
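If you would rather use a single terminal, both scripts can be started in the background with nohup, each writing to its own log file. This is a sketch; the helper start_bg and the NAME.log file names are illustrative assumptions, not part of BigDL.

```shell
#!/bin/bash
# Hypothetical helper: run a script in the background, immune to hangups,
# with stdout and stderr written to NAME.log; prints the background PID.
start_bg() {
    local name="$1" script="$2"
    nohup bash "$script" > "${name}.log" 2>&1 &
    echo "$!"
}

# Usage from /opt/BigDL:
#   nb_pid=$(start_bg notebook run_notebook.sh)
#   tb_pid=$(start_bg tensorboard start_tensorboard.sh)
#   # later, to stop both: kill "$nb_pid" "$tb_pid"
```

With this approach, the TensorBoard URL ends up in tensorboard.log rather than on the terminal.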

Open two browser tabs, one for text_classification.ipynb and another for TensorBoard.

Navigate to the text_classification example (check the exact location of the sample in your BigDL checkout, as it may differ between releases):

http://localhost:YOUR_PORT_NUMBER/notebooks/pyspark/dl/example/tutorial/simple_text_classification/text_classfication.ipynb

Run the notebook. This takes a few minutes; when it finishes, the notebook displays a loss graph for the training run.

In TensorBoard, you can follow the same Text Classification run through the summaries written to /tmp/bigdl_summaries.

Automating the Installation of BigDL on DSVM

As described at the beginning of this post, Azure Virtual Machines can automatically run a script after provisioning when using ARM templates. The ARM template and the BigDL installation script are published on GitHub, together with a Deploy to Azure button that takes the user to the Azure portal wizard, leads them through the VM creation, and runs the script automatically so that BigDL is installed and configured once the VM is successfully provisioned. The user can then directly run /opt/BigDL/run_notebooks.sh to start a Jupyter notebook server and execute the samples.

Conclusion

In this blog post, we demonstrated that in just a few small steps one can take advantage of the Intel BigDL library running on Apache Spark* to execute deep learning jobs on Microsoft’s Data Science Virtual Machine. BigDL continues to evolve and enjoys solid support from the open-source community as well as from Intel’s dedicated software engineering team.

Resources

Learn more about Data Science Virtual Machine for Linux on Azure
Learn more about Azure HDInsight
Artificial Intelligence Software and Hardware at Intel
BigDL Introductory Video
Raise your BigDL questions in the BigDL Google Group.

Appendix

Installing and configuring Spark 1.6 for legacy code implementation:

Installing Spark 1.6.1 alongside Spark 2.0:

Go to http://spark.apache.org/downloads.html, select release 1.6.1, and download it. Then extract the archive:

     $ cd Downloads
     $ tar -xzf spark-1.6.1-bin-hadoop2.6.tgz

Move the directory from the download location to where Spark is stored on the system.
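Because run_notebook.sh launches ${SPARK_HOME}/bin/pyspark, one way to keep Spark 1.6.1 next to Spark 2.0 is to extract each release into its own directory and choose between them by setting SPARK_HOME. The sketch below assumes that approach; install_spark and the /opt/spark-1.6.1 target are illustrative, not part of the original instructions.

```shell
#!/bin/bash
# Hypothetical helper: extract a Spark tarball into DEST and print DEST,
# which can then be used as SPARK_HOME.
install_spark() {
    local tarball="$1" dest="$2"
    mkdir -p "$dest" || return 1
    # --strip-components=1 drops the top-level spark-1.6.1-bin-hadoop2.6/ folder
    tar -xzf "$tarball" -C "$dest" --strip-components=1 || return 1
    printf '%s\n' "$dest"
}

# Example (adjust the tarball path to where you downloaded it):
#   export SPARK_HOME=$(install_spark ~/Downloads/spark-1.6.1-bin-hadoop2.6.tgz /opt/spark-1.6.1)
#   "$SPARK_HOME"/bin/spark-submit --version
```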

To switch back to the Python 3.5 environment:

     $ source activate py35

To install Python packages in the Python 3.5 environment:

     $ sudo /anaconda/envs/py35/bin/conda install xxxx

For pip installs, use /anaconda/envs/py35/bin/pip in the same way.

Installing BigDL on the Data Science Virtual Machine for Linux (CentOS*):

To run BigDL on DSVM CentOS* edition, first you need to install Maven* on the DSVM before compiling BigDL.

Installing Maven. Note that on CentOS-based Linux, you need to use yum instead of Ubuntu's apt-get to install new packages:

     $ sudo yum install maven

DSVM's default JAVA_HOME environment variable points to an empty directory, "/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-1.b15.el7_2.x86_64". You need to point it to an existing directory that contains the Java* 8 installation, for example:

     $ export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.121-0.b13.el7_3.x86_64"
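Since the Java 8 directory name changes with each OpenJDK update (the empty default and the working directory above differ only in their version suffix), a small search can locate a usable one instead of hard-coding it. This is a sketch; find_java8_home is a hypothetical helper, and the glob assumes the CentOS naming shown above.

```shell
#!/bin/bash
# Hypothetical helper: print the first java-1.8.0-openjdk-1.8.0* directory
# (in glob/alphabetical order) that contains an executable bin/java.
find_java8_home() {
    local root="${1:-/usr/lib/jvm}"
    local d
    for d in "$root"/java-1.8.0-openjdk-1.8.0*; do
        if [ -x "$d/bin/java" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1   # no match, or only empty directories
}

# Usage:
#   if JAVA_HOME=$(find_java8_home); then export JAVA_HOME; else echo "no Java 8 found" >&2; fi
```

The usage line tests the assignment itself rather than `export VAR=$(...)`, because export would report success even when the search fails.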

Check that Maven is installed correctly:

     $ mvn -v

After this, you should be able to run a build on BigDL following the steps in the main section above.
