9  Example file format: TDT files

# Colab setup ------------------
import os, sys, subprocess
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade polars iqplot tdt watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    data_path = "https://s3.amazonaws.com/bebi103.caltech.edu/data/"
else:
    data_path = "../data/"
import polars as pl
import numpy as np

import tdt

# HoloViews is a high-level plotting package
import holoviews as hv
import holoviews.operation.datashader
hv.extension('bokeh')

import datashader

import bokeh.io
bokeh.io.output_notebook()

Tucker-Davis Technologies makes hardware and accompanying software for a variety of neuroscience applications. Their data files are specific to their instruments and are not stored in a standard format.

For this example, I will walk through how I figured out how to install the necessary software and load in TDT data sets.

  1. I Googled “TDT data file format Python” and got a hit for the company’s website on offline data analysis tools.
  2. The documentation there recommends installing the software with pip install tdt, but I wanted to check its dependencies first. I could not find the package on GitHub or any other public repository, so I went straight to the Python Package Index for the tdt package.
  3. I downloaded the files for the package. This does not install it, but simply saves the files to my machine.
  4. I unzipped the download and navigated into the directory on the command line. Running ls, I found the following files and directories: PKG-INFO, setup.cfg, tdt/, README.md, setup.py, tdt.egg-info/.
  5. There was no requirements.txt file, so I looked in the setup.cfg and setup.py files to see if there were dependencies. There were not.
  6. I therefore navigated into the tdt/ directory and checked all import statements to find all packages and modules that the tdt package uses. I did this by executing grep import * on the command line.
  7. I found that all imports were from the standard library except for NumPy. I therefore felt safe installing the tdt package in my current conda environment without the need to create a new one, and tdt is accordingly included in the datasai environment. (A quick programmatic check of a package’s declared dependencies is sketched just below this list.)
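
As a complementary check to grepping the source, we can also ask an installed package what dependencies it declares using the standard library’s importlib.metadata module. This is just a minimal sketch of such a check, assuming tdt has already been pip-installed; it was not part of my original workflow.

import importlib.metadata

# Dependencies the tdt distribution declares (None if it declares none)
print(importlib.metadata.requires("tdt"))

# Installed version, for the record
print(importlib.metadata.version("tdt"))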

9.1 The sample data set

To demonstrate how TDT data are loaded and manipulated, we use a sample data set kindly donated by a friendly lab. The data set was acquired as part of a set of experiments looking at dopamine signaling in the nucleus accumbens area of a mouse brain in response to presentation and consumption of a palatable food stimulus, in this case a sucrose pellet. The nucleus accumbens was injected with an adeno-associated virus (AAV) delivering dLight, a fluorescent dopamine sensor. Then, a fiberoptic cannula was inserted into the nucleus accumbens to monitor fluorescence during the experiment.

During the imaging session, each mouse was placed in an experimental arena and allowed to acclimate for 5 minutes. A sucrose pellet was then placed in a designated area of the arena. Once the pellet was consumed, the experimenter waited a few minutes before placing the next pellet. This process was repeated until each mouse received a total of 12 sucrose pellets. Time stamps of pellet consumption were manually recorded based on the video recording of the experiment.

The TDT instrument measured two channels. The Ct1A channel measured fluorescence under 405 nm excitation and served as a dopamine-independent control. The GC1A channel was excited at 465 nm and carried the dopamine-dependent fluorescence.

9.2 Loading in the TDT data

Now, let’s load in the data set! We will start with what we already know how to do: loading the time points at which the pellets were presented from the CSV file the experimenters manually constructed.

df_pellets = pl.read_csv(os.path.join(data_path, 'FB015_timepoints_pellets_V106N.csv'))

# Take a look
df_pellets
shape: (13, 6)
ID Session Timepoint Hours Minutes Seconds
str i64 str i64 i64 f64
"V106N" 1 "start" 13 11 41.31
"V106N" 1 "p01" 13 17 4.58
"V106N" 1 "p02" 13 19 49.54
"V106N" 1 "p03" 13 22 34.12
"V106N" 1 "p04" 13 24 51.67
"V106N" 1 "p08" 13 33 16.24
"V106N" 1 "p09" 13 35 15.04
"V106N" 1 "p10" 13 37 24.36
"V106N" 1 "p11" 13 39 31.08
"V106N" 1 "p12" 13 41 34.59

We would like to convert the Hours, Minutes, and Seconds columns to total seconds.

df_pellets = df_pellets.with_columns(
    (pl.col('Hours') * 3600 + pl.col('Minutes') * 60 + pl.col('Seconds'))
    .alias('total_seconds')
)

# Look at our handiwork
df_pellets
shape: (13, 7)
ID Session Timepoint Hours Minutes Seconds total_seconds
str i64 str i64 i64 f64 f64
"V106N" 1 "start" 13 11 41.31 47501.31
"V106N" 1 "p01" 13 17 4.58 47824.58
"V106N" 1 "p02" 13 19 49.54 47989.54
"V106N" 1 "p03" 13 22 34.12 48154.12
"V106N" 1 "p04" 13 24 51.67 48291.67
"V106N" 1 "p08" 13 33 16.24 48796.24
"V106N" 1 "p09" 13 35 15.04 48915.04
"V106N" 1 "p10" 13 37 24.36 49044.36
"V106N" 1 "p11" 13 39 31.08 49171.08
"V106N" 1 "p12" 13 41 34.59 49294.59

Finally, we want to shift the times so that the start is at zero, and then we can drop the row that records the start time.

df_pellets = df_pellets.with_columns(
    pl.col('total_seconds') - pl.col('total_seconds').min()
).filter(
    pl.col('Timepoint') != 'start'
)

# Take a look
df_pellets
shape: (12, 7)
ID Session Timepoint Hours Minutes Seconds total_seconds
str i64 str i64 i64 f64 f64
"V106N" 1 "p01" 13 17 4.58 323.27
"V106N" 1 "p02" 13 19 49.54 488.23
"V106N" 1 "p03" 13 22 34.12 652.81
"V106N" 1 "p04" 13 24 51.67 790.36
"V106N" 1 "p05" 13 27 12.23 930.92
"V106N" 1 "p08" 13 33 16.24 1294.93
"V106N" 1 "p09" 13 35 15.04 1413.73
"V106N" 1 "p10" 13 37 24.36 1543.05
"V106N" 1 "p11" 13 39 31.08 1669.77
"V106N" 1 "p12" 13 41 34.59 1793.28

Now that the pellet time points are loaded, we can proceed to loading the data from the TDT instrument. The main function to do this, based on the documentation, is tdt.read_block().

tdt_data = tdt.read_block(os.path.join(data_path, 'C57Bl6J_V106N_240910-131134'))

# Take a look
tdt_data
read from t=0s to t=1855.58s
epocs   [struct]
snips   [struct]
streams [struct]
scalars [struct]
info    [struct]
time_ranges:    array([[ 0.],
       [inf]])

This view is useful, but we can get more information about the respective fields by investigating the object structure using the __dict__ attribute that every instance of a Python class has.

tdt_data.__dict__
{'epocs': {},
 'snips': {},
 'streams': GC1A    [struct]
 Ct1A   [struct],
 'scalars': {},
 'info': tankpath:  '../data'
 blockname: 'C57Bl6J_V106N_240910-131134'
 start_date:    datetime.datetime(2024, 9, 10, 13, 11, 39, 999999)
 utc_start_time:    '13:11:39'
 stop_date: datetime.datetime(2024, 9, 10, 13, 42, 35, 576203)
 utc_stop_time: '13:42:35'
 duration:  datetime.timedelta(seconds=1855, microseconds=576204)
 stream_channel:    0
 snip_channel:  0,
 'time_ranges': array([[ 0.],
        [inf]])}

Apparently, the info and streams entries are the only ones with information. Specifically, the info entry has the duration of the experiment, which we can convert to a total number of seconds since it is a datetime.timedelta object.
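
For instance, pulling the duration out as seconds (using the info fields shown in the output above) is a one-liner.

# Total duration of the recording in seconds (duration is a datetime.timedelta object)
duration_s = tdt_data.info.duration.total_seconds()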

Now, we can turn our attention to the streams entry.

tdt_data.streams.__dict__
{'GC1A': name:  'GC1A'
 code:  np.uint32(1093747527)
 size:  np.uint32(138)
 type:  np.uint32(33025)
 type_str:  'streams'
 ucf:   np.False_
 fs:    np.float64(1017.2526245117188)
 dform: np.uint32(0)
 start_time:    np.float64(0.0)
 data:  array([  0.4464025 ,   0.45884764,   0.4714184 , ..., 163.25513   ,
        163.2471    , 163.23853   ], dtype=float32)
 channel:   [1],
 'Ct1A': name:  'Ct1A'
 code:  np.uint32(1093760067)
 size:  np.uint32(138)
 type:  np.uint32(33025)
 type_str:  'streams'
 ucf:   np.False_
 fs:    np.float64(1017.2526245117188)
 dform: np.uint32(0)
 start_time:    np.float64(0.0)
 data:  array([1.6589369e-01, 1.6555177e-01, 1.6559452e-01, ..., 1.6825778e+02,
        1.6824736e+02, 1.6823669e+02], dtype=float32)
 channel:   [1]}

The two data sets are in the tdt_data.streams.Ct1A.data and tdt_data.streams.GC1A.data NumPy arrays. Knowing the lengths of these arrays and the duration of the experiment, we can compute the time points of the measurements.

t = (
    np.arange(len(tdt_data.streams.Ct1A.data)) 
    / len(tdt_data.streams.Ct1A.data) 
    * tdt_data.info.duration.total_seconds()
)
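
As a sanity check, note that each stream also stores its sampling rate in the fs field (about 1017 Hz in the output above). A sketch of an equivalent calculation of the time points directly from the sampling rate, which should agree closely with the duration-based version above, is:

# Time points computed from the sampling rate stored with the stream
t_from_fs = np.arange(len(tdt_data.streams.Ct1A.data)) / tdt_data.streams.Ct1A.fs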

For convenience, we can make an array of the times when the pellets were presented.

t_pellets = df_pellets['total_seconds'].to_numpy()

We should see how many time points we have as we start to think about plotting.

len(t)
1887616

That’s nearly two million time points. We could probably plot that with ordinary Bokeh, but to be safe, we will use Datashader. Let’s build the datashaded plots.

# Plot for control
curve_control = hv.Curve(
    data=(t, tdt_data.streams.Ct1A.data),
    kdims='time (s)',
    vdims='signal (a.u.)'
).opts(
    color='tomato'
)

# Plot for experiment
curve_exp = hv.Curve(
    data=(t, tdt_data.streams.GC1A.data),
    kdims='time (s)',
    vdims='signal (a.u.)'
)

# Datashade the plots
data_shaded_curve_control = hv.operation.datashader.datashade(
    curve_control,
    cmap='tomato',
    aggregator=datashader.any()
)
data_shaded_curve_exp = hv.operation.datashader.datashade(
    curve_exp,
    cmap='#1f77b4',
    aggregator=datashader.any()
)

Next, we should plot vertical lines indicating where the pellets were introduced. This can be done using the hv.VSpans element, passing the same array for both the start and end points of the spans.

pellets_plot = hv.VSpans((t_pellets, t_pellets)).opts(color='orange')

Finally, we can compose the elements together to make our plot.

# Compose the overlaid plot
(data_shaded_curve_control * data_shaded_curve_exp * pellets_plot).opts(
    frame_height=200,
    frame_width=550,
    padding=0.01,
    show_grid=True,
)

The first few seconds show a rapid increase in the signal. Domain experts tell me that this is always the case at the beginning of a measurement as the instrument calibrates. So, we will omit the first couple of seconds.

Apparently, we should also perform background subtraction, which will put the vertical axis on a more appropriate scale, since the first second of the background-subtracted signal does not start from such a low level.

# Subtract background, then center and scale
s = tdt_data.streams.GC1A.data - tdt_data.streams.Ct1A.data
s = (s - s.mean()) / s.std()

# Cut out first couple seconds
s = s[t >= 2]
t = t[t >= 2]

# Plot background subtracted signal
curve_exp = hv.Curve(
    data=(t, s),
    kdims='time (s)',
    vdims='signal (a.u.)'
)

# Datashade
data_shaded_curve_exp = hv.operation.datashader.datashade(
    curve_exp,
    cmap='#1f77b4',
    aggregator=datashader.any()
)

# Compose
(data_shaded_curve_exp * pellets_plot).opts(
    frame_height=200,
    frame_width=550,
    padding=0.01,
    show_grid=True,
)

We now have a nice, explorable visualization!

9.3 Computing environment

%load_ext watermark
%watermark -v -p numpy,polars,tdt,holoviews,datashader,jupyterlab
Python implementation: CPython
Python version       : 3.12.11
IPython version      : 9.1.0

numpy     : 2.1.3
polars    : 1.30.0
tdt       : 0.6.7
holoviews : 1.20.2
datashader: 0.18.0
jupyterlab: 4.3.7