import os, glob
import numpy as np
import bebi103
import cmdstanpy
import arviz as az
import bokeh.plotting
import bokeh.io
bokeh.io.output_notebook()
schools_data = {
    "J": 8,
    "y": [28, 8, -3, 7, -1, 1, 18, 12],
    "sigma": [15, 10, 16, 11, 9, 11, 10, 18],
}

schools_code = """
data {
  int<lower=0> J;            // number of schools
  vector[J] y;               // estimated treatment effects
  vector<lower=0>[J] sigma;  // s.e. of effect estimates
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] eta;
}
transformed parameters {
  vector[J] theta = mu + tau * eta;
}
model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
"""

with open("schools_code.stan", "w") as f:
    f.write(schools_code)

with bebi103.stan.disable_logging():
    sm = cmdstanpy.CmdStanModel(stan_file="schools_code.stan")
    samples = sm.sample(data=schools_data, output_dir="./", show_progress=False)

samples = az.from_cmdstanpy(samples)

# Clean up
bebi103.stan.clean_cmdstan()
for fname in glob.glob("schools_code*"):
    os.remove(fname)

# Make a plot of samples
p = bokeh.plotting.figure(
    frame_height=250, frame_width=250, x_axis_label="μ", y_axis_label="τ"
)
p.scatter(
    np.ravel(samples.posterior["mu"]),
    np.ravel(samples.posterior["tau"]),
    alpha=0.1,
)
bokeh.io.show(p)
Appendix B — Configuring your computer to use Python for scientific computing
B.1 Why Python?
There are plenty of programming languages that are widely used in data science and in scientific computing more generally. Some of these, in addition to Python, are Matlab/Octave, Mathematica, R, Julia, Java, JavaScript and C++.
I have chosen to use Python. While I believe language wars are counterproductive and welcome anyone to port the code we use to any language of their choice, I nonetheless feel I should explain this choice.
Python is a flexible programming language that is widely used in many applications. This is in contrast to more domain-specific languages like R and Julia. It is easily extendable, which is in many ways responsible for its breadth of use. We find that there is a decent Python-based tool for many applications we can dream up, certainly in data science. However, the Python-based tool is often not the very best for the particular task at hand, but it is almost always pretty good. Thus, knowing Python is like having a Swiss Army knife; you can wield it to effectively accomplish myriad tasks. Finally, we also find that it has a shallow learning curve with most students.
Perhaps most important, specifically for neuroscience applications, is that Python is widely used in machine learning and AI. The development of packages like TensorFlow, PyTorch, JAX, Keras, and scikit-learn has led to very widespread adoption of Python.
B.2 Jupyter notebooks
The materials of this workshop are constructed from Jupyter notebooks. To quote Jupyter’s documentation,
Jupyter Notebook and its flexible interface extends the notebook beyond code to visualization, multimedia, collaboration, and more. In addition to running your code, it stores code and output, together with markdown notes, in an editable document called a notebook.
This allows for executable documents that have code, but also richly formatted text and graphics, enabling the reader to interact with the material as they read it.
While you read the materials, you can read the HTML-rendered versions of the notebooks. To execute (and even edit!) code in the notebooks, you will need to run them. There are many options available to run Jupyter notebooks. Here are a few we have found useful.
- JupyterLab: This is a browser-based interface to Jupyter notebooks and more (including a terminal application, text editor, file manager, etc.). As of March 2025, Chrome, Firefox, Safari, and Edge are supported. I encourage you to run your code on your own machine. I give instructions below on how to do the necessary installations and launch JupyterLab.
- VSCode: This is an excellent source code editor that supports Jupyter notebooks. Be sure to read the documentation on how to use Jupyter notebooks in VSCode.
- Google Colab: Google offers this service to run notebooks in the cloud on their machines. There are a few caveats, though. First, not all packages and updates are available in Colab. Furthermore, not all interactivity that will work natively in Jupyter notebooks works with Colab. If a notebook sits idle for too long, you will be disconnected from Colab. Finally, there is a limit to resources that are available for free, and as of March 2025, that limit is unpublished and can vary. All of the notebooks in the HTML rendering of this book have an “Open in Colab” button at the upper right that allows you to launch the notebook in Colab. This is a quick-and-easy way to execute the book’s contents.
B.3 Installing a Python distribution
Prior to embarking on this workshop, you need to have a functioning Python distribution installed on your computer. Toward that end, we will use conda, a widely used package management system for data science and scientific computing. Importantly, we can set up environments that have packages with appropriate versions to use for specific tasks.
B.4 Downloading and installing Miniconda
If you already have Anaconda or Miniconda installed on your machine, you can skip this step and proceed to install node.js.
To download and install Miniconda, do the following.
B.4.1 Windows
- Go to the Miniconda page and go to the “Quick command line install” section.
- Click on the “Windows PowerShell” tab.
- Copy all of the contents in the gray box (starting with curl).
- Go to the Start menu and search for “PowerShell.” Click to open a PowerShell window. Alternatively, you can hit Windows + R and type PowerShell in the text box.
- Paste the copied text into the PowerShell window and hit enter.
B.4.2 macOS
- Go to the Miniconda page and go to the “Quick command line install” section.
- Click on the “macOS” tab.
- Copy all of the contents in the gray box (starting with mkdir).
- Open a Terminal window. You can do this by hitting Command-space bar, typing Terminal, and hitting enter. Alternatively, the Terminal application is located in the /System/Applications/Utilities/ folder, which you can navigate to using Finder.
- Paste the copied text into the Terminal window and hit enter.
B.4.3 Linux
- Go to the Miniconda page and go to the “Quick command line install” section.
- Click on the “Linux” tab.
- Copy all of the contents in the gray box (starting with mkdir).
- Open a terminal window. I assume you know how to do this if you are using Linux.
- Paste the copied text into the terminal window and hit enter.
B.5 Install node.js
node.js is a platform that enables you to run JavaScript outside of the browser. We will not use it directly, but it needs to be installed for some of the more sophisticated JupyterLab functionality. Install node.js by downloading the appropriate installer for your machine from the node.js website.
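If you want to confirm that node.js was installed and is visible on your system, a quick check you can run from a Python prompt (a sketch using only the standard library; the exact version string will depend on what you installed) is:

import shutil
import subprocess

# Location of the node executable; None means it is not on your PATH
print(shutil.which("node"))

# Installed node.js version string (this raises FileNotFoundError if node is not found)
print(subprocess.run(["node", "--version"], capture_output=True, text=True).stdout)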
B.6 Setting up a conda environment
I have created a conda environment for use in this workshop. You can install this environment by executing the following on the command line.
conda env create -f https://raw.githubusercontent.com/caltech-datasai/caltech-datasai.github.io/refs/heads/main/datasai.yml
This will build the environment for you (it may take several minutes). To then activate the environment, enter
conda activate datasai
on the command line. You will need to activate the environment every time you open a new terminal (or PowerShell) window. (You can have this happen automatically, if you like, by adding conda activate datasai to your configuration file, e.g., .bashrc.)
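To double-check that the datasai environment is really the one in use, you can look at which Python interpreter is running from a Python prompt or a notebook (the exact path will differ on your machine):

import sys

# With the datasai environment active, this path should contain something like
# .../miniconda3/envs/datasai/bin/python (macOS/Linux) or ...\envs\datasai\python.exe (Windows)
print(sys.executable)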
B.7 Launching JupyterLab
You can launch JupyterLab via your operating system’s terminal program (Terminal on macOS and PowerShell on Windows). If you are on a Mac, open the Terminal program. You can do this by hitting Command + space bar and searching for “terminal.” If you are using Windows, launch PowerShell. You can do this by hitting Windows + R and typing “powershell” in the text box.
You need to make sure you are using the datasai environment whenever you launch JupyterLab, so you should run conda activate datasai each time you open a terminal.
Now that you have activated the datasai environment, you can launch JupyterLab by typing
jupyter lab
on the command line. You will have an instance of JupyterLab running in your default browser. If you want to specify the browser, you can, for example, type
jupyter lab --browser=firefox
on the command line.
Alternatively, if you are using VSCode, you can use its menu system to open .ipynb files.
B.8 Stan installation
We will be using Stan for some of our modeling. Stan has its own probabilistic programming language. Programs written in this language, called Stan programs, are translated into C++ by the Stan parser, and then the C++ code is compiled. As you will see throughout the workshop, there are many advantages to this approach.
There are many interfaces for Stan, including the two most widely used, RStan and PyStan, which are R and Python interfaces, respectively. We will use a simpler interface, CmdStanPy, which has several advantages that will become apparent when you start using it.
Whichever interface you use needs to have Stan installed and functional, which means you have to have an installed C++ toolchain. Installation and compilation can be tricky and varies from operating system to operating system. The instructions below are not guaranteed to work; you may have to do some troubleshooting on your own. Note that you can use Google Colab (or other cloud computing resources) for computing as well, so you do not need to worry if you have trouble installing Stan locally.
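Once you have completed the installation steps below, the translate-compile-sample workflow looks roughly like the following minimal sketch. (The coin-flip model here is just a hypothetical example for illustration; it is not part of the workshop materials.)

import cmdstanpy

# A tiny Stan program: infer the bias theta of a coin from N flips
stan_code = """
data {
  int<lower=0> N;
  array[N] int<lower=0, upper=1> x;
}
parameters {
  real<lower=0, upper=1> theta;
}
model {
  theta ~ beta(1, 1);
  x ~ bernoulli(theta);
}
"""

with open("coin.stan", "w") as f:
    f.write(stan_code)

# Instantiating the model translates the Stan program to C++ and compiles it;
# the compiled executable is cached and reused on subsequent runs.
sm = cmdstanpy.CmdStanModel(stan_file="coin.stan")
print(sm.exe_file)

# Sampling runs the compiled executable with the supplied data
samples = sm.sample(data={"N": 5, "x": [0, 1, 1, 0, 1]}, show_progress=False)
print(samples.summary())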
B.8.1 Configuring a C++ toolchain for macOS
On macOS, you can install the Xcode command line tools by running the following on the command line.
xcode-select --install
B.8.2 Configuring a C++ toolchain for Windows
According to the CmdStanPy documentation, you can skip this step, though I did previously verify that the below worked on a Windows machine.
You need to install a C++ toolchain for Windows. One possibility is to install a MinGW toolchain, and one way to do that is using conda.
conda install libpython m2w64-toolchain -c msys2
When you do this, make sure you are in the datasai environment.
B.8.3 Configuring a C++ toolchain for Linux
If you are using Linux, we assume you already have the C++ utilities installed.
B.8.4 Installing Stan with CmdStanPy
If you have a functioning C++ toolchain, you can use CmdStanPy to install Stan/CmdStan. You can do this by running the following at a Python prompt (either Python, IPython, or in a Jupyter notebook), again making sure you are in the datasai environment.
import cmdstanpy; cmdstanpy.install_cmdstan()
This may take several minutes to run. (I did it on my Raspberry Pi, and it took hours.)
If you are using Windows and you skipped configuration of the C++ toolchain, instead run:
import cmdstanpy; cmdstanpy.install_cmdstan(compiler=True)
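Once the installation finishes, you can check from a Python prompt that CmdStan was found and see which version was installed (your version number may differ):

import cmdstanpy

# Directory of the CmdStan installation that CmdStanPy will use
print(cmdstanpy.cmdstan_path())

# Installed CmdStan version as a tuple, e.g., (2, 36)
print(cmdstanpy.cmdstan_version())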
B.9 Checking your distribution
Let’s now run a quick test to make sure things are working properly. We will make a quick plot that requires some of the scientific libraries we will use.
Launch a Jupyter notebook in JupyterLab. In the first cell (the box next to the [ ]: prompt), paste the test code shown at the top of this appendix. To run the code, press Shift+Enter while the cursor is active inside the cell. You should see a scatter plot of the posterior samples. If you do, you have a functioning Python environment for scientific computing!
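If you would like a quicker smoke test that does not require compiling a Stan model, a minimal sketch like the following, which uses only NumPy and Bokeh, should also produce a plot:

import numpy as np
import bokeh.plotting
import bokeh.io
bokeh.io.output_notebook()

# Plot one period of a sine curve as a quick sanity check of NumPy and Bokeh
x = np.linspace(0, 2 * np.pi, 200)
p = bokeh.plotting.figure(
    frame_height=200, frame_width=300, x_axis_label="x", y_axis_label="sin(x)"
)
p.line(x, np.sin(x), line_width=2)
bokeh.io.show(p)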
Computing environment
%load_ext watermark
%watermark -v -p numpy,cmdstanpy,arviz,bebi103,bokeh,jupyterlab
print("CmdStan : {0:d}.{1:d}".format(*cmdstanpy.cmdstan_version()))
Python implementation: CPython
Python version : 3.12.9
IPython version : 8.30.0
numpy : 2.1.3
cmdstanpy : 1.2.5
arviz : 0.21.0
bebi103 : 0.1.26
bokeh : 3.6.2
jupyterlab: 4.3.6
CmdStan : 2.36