Welcome to deltascope’s documentation!¶
Features¶
- Compare sets of 3D biological images to identify differences
- Automatically align the structure in the image to correct for variation introduced during mounting and imaging
- Generate descriptive graphs that quantify both the average and variation of the data
- Use machine learning techniques to classify samples and identify regions of statistically significant difference
Check out Start Here to find out if deltascope is right for you.
Support¶
- Complete documentation is available on Read the Docs.
- Check out the Frequently Asked Question page.
- Submit an issue describing a problem or question on the project’s Github issue tracker.
Contribute¶
- Issue Tracker: https://github.com/msschwartz21/deltascope/issues
- Source Code: https://github.com/msschwartz21/deltascope
License¶
This project is licensed under the GNU General Public License.
Contents¶
Start Here¶
Is deltascope right for you?¶
deltascope may be able to help you if:
- your data consists of a set of 3D image stacks
- your data contains a clear structure and shape that has consistent gross morphology between control and experimental samples
- you want to identify extreme or subtle differences in the structure between your experimental groups
- you have up to 4 different channels to compare
deltascope cannot help if:
- your data was collected using cryosections that need to be aligned after imaging
- your experiment changes the gross anatomy of the structure between control and experimental samples
Installation¶
deltascope can be installed using pip, Python’s package installer. Some deltascope dependencies are not available through pip, so we recommend that you install Anaconda, which automatically includes and installs the remaining dependencies. See Setting up a Python environment for more details.
$ pip install deltascope
Note
If you are unfamiliar with the command line, check out this tutorial. Additional resources regarding the command line are available here.
Warning
Packages required for deltascope depend on Visual C++ Build Tools, which can be downloaded from the Visual C++ Build Tools page.
Setting up a Python environment¶
If you’re new to scientific computing with Python, we recommend that you install Anaconda to manage your Python installation. Anaconda is a framework for scientific computing with Python that will install important packages (numpy, scipy, and matplotlib).
Warning
deltascope is written in Python 3 and requires the installation of the Python 3 version of Anaconda.
Data Preparation¶
Biological Questions¶
deltascope compares biological structures in three dimensions in order to preserve spatial relationship data that is lost in maximum intensity projections (MIPs). In order to apply deltascope to a new biological structure, certain conditions must be satisfied:
- The gross morphology of the structure must be roughly consistent between samples.
- The x, y, and z dimensions of the structure cannot be too similar in extent. For example, in the spinal cord, the anterior-posterior axis is the longest and the medial-lateral axis is the smallest with the dorsal-ventral axis falling between these two dimensions. The different proportional sizes of these axes enables us to consistently align the structure in 3D space regardless of the sample’s orientation during image collection.
- The gross morphology of the structure can be described with a simple polynomial equation. For example, the spinal cord can be described by a line that falls at the midline of the medial-lateral axis: y = mx + b.
File Format¶
Most microscopes save data in their own proprietary format: for example, Zeiss uses .czi and Leica uses .lif. In order to ensure that image data is legible to all components of the workflow, files need to be converted to the HDF5 (.h5) format specified by ImageJ. This conversion can be easily executed in Fiji using the BioFormats plugin to import proprietary file formats and the HDF5 plugin to export HDF5 files. Multichannel collections need to be split into individual channels before being saved as HDF5 files, with coherent file names used to preserve file history.
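After export, the converted files can be sanity checked from Python. The following is a minimal sketch, assuming the h5py package is installed; the filename and dataset path are placeholders, since the structure inside the file depends on how the Fiji HDF5 plugin was configured during export.
import h5py
#Hypothetical filename following a channel/sample naming convention
with h5py.File('c1_10.h5','r') as f:
    #Print the name of every object in the file to confirm the export worked
    f.visit(print)
    #Placeholder dataset path; use one of the names printed above
    data = f['t0/channel0'][...]
print(data.shape) #expect a 3D array, e.g. (z,y,x)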
Signal Normalization¶
Biological fluorescence microscopy data contains variation in signal intensity due to both biological and technical error. For example, the top of the sample is frequently brighter than the bottom because it is closer to the objective and also has not been as bleached by the collection of previous optical sections. If we were to try to select the set of points that represent ‘true signal’ by applying a single intensity threshold, points that represent background at the top of the stack may have the same intensity as points of true signal at the bottom of the stack.
We have implemented an adaptive thresholding protocol that avoids these challenges, utilizing the open-source software Ilastik. This software uses machine learning principles to predict the likelihood that a particular pixel contains true signal. The probability is calculated based on user annotation of images, in which regions of true signal and background are labeled. This protocol allows the user to apply their domain knowledge of the sample in order to best distinguish signal from background. Tutorials describing how to install and implement Ilastik’s pixel classification workflow are available on Ilastik’s website.
deltascope Data Processing¶
Note
This guide will describe a set of parameters that the user needs to specify when running deltascope. A complete list of all parameters is available on the Parameter Reference page.
Intensity Thresholding¶
Goal¶
At this point, each sample/channel should have been processed by Ilastik to create a new c1_10_Probabilities.h5 file. If you open this file in Fiji using the Fiji HDF5 plugin, it should contain three dimensions and two channels (signal and background, but for our purposes the two are interchangeable). Each pixel should have a value ranging between 0 and 1. If we are inspecting the signal channel, pixels with a value close to 0 (a low p value) are highly likely to be true signal. Correspondingly, pixels with a value close to 1 are likely to be background. The c1_10_Probabilities.h5 file generated by Ilastik contains two channels (signal and background), which are inverse images: for a given pixel, the background intensity value is 1 minus the signal intensity value. In order to simplify our data, we will apply a threshold that divides the data into two sets of pixels: signal and background. In the steps that follow, we will only use the set of pixels that corresponds to true signal. This set of pixels may also be referred to as a set of points or a point cloud. In order to avoid keeping track of which channel is the signal channel, we assume that after applying a threshold more points will fall in the background group than in the signal group. If this is not true, better Ilastik training or a stricter threshold is recommended; otherwise deltascope will end up examining the structure of your background.
Warning
If your data contains more points of signal than background, deltascope will select your background channel as the signal channel. In order to correct this assumption, the inequality in brain.read_data() will need to be changed to associate the larger number of points with signal instead of background. A quick way to check the point counts for your own data is sketched below.
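This is a minimal sketch of that check, not the deltascope implementation; it assumes h5py and numpy are installed, and the file path, dataset name, and channel axis are placeholders to adapt to your export.
import h5py
import numpy as np
#Placeholder path to an Ilastik probability file
with h5py.File('c1_10_Probabilities.h5','r') as f:
    #Placeholder dataset name; inspect the file to find the real one
    probs = f['exported_data'][...]
#Count points falling below the threshold in each channel
#(here the channel axis is assumed to be first)
genthresh = 0.5
counts = [np.count_nonzero(probs[i] < genthresh) for i in range(2)]
#deltascope assumes the signal group is the smaller of the two;
#if the counts are similar, retrain Ilastik or use a stricter threshold
print('points below threshold per channel:',counts)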
Setting Parameters¶
In order to divide the image into two sets of pixels (signal and background), we use a parameter, genthresh, to determine which group each pixel falls into. We have found that a value of 0.5 is sufficient to divide the data, and the results do not vary greatly if it is changed; however, if your data contains a lot of intermediate background values (0.4-0.7), you may benefit from a smaller threshold, e.g. 0.3.
The following code requires a parameter, micron, which specifies the dimensions of the image voxel in microns. It is a list of the form [x,y,z]. This information can typically be found in the metadata of a microscope collection file.
This section of the code also includes a deprecated parameter, scale, which must be set to [1,1,1].
Code Instructions¶
The following instructions apply to processing a single sample. Details regarding function parameters can be found under embryo and brain.preprocess_data(). Raw image data will be imported by embryo.add_channel().
import deltascope
#Parameters described in Setting Parameters above
genthresh = 0.5
scale = [1,1,1] #deprecated, must be [1,1,1]
microns = [0.16,0.16,0.21] #voxel dimensions from the microscope metadata
#Create an embryo object that facilitates data processing
e = deltascope.embryo('experiment-name','01','./output')
#For each channel in your sample, add a channel with a unique name, e.g. 'c1' or 'c2'
e.add_channel('path/to/c1.h5','c1')
e.add_channel('path/to/c2.h5','c2')
#Threshold each channel and scale points according to voxel dimensions in microns
e.chnls['c1'].preprocess_data(genthresh,scale,microns)
e.chnls['c2'].preprocess_data(genthresh,scale,microns)
Sample alignment using principal component analysis (PCA)¶
Goal¶
Principal Component Analysis¶
During typical collections of biological samples, each sample will be oriented slightly differently in relation to the microscope due to variation in the shape and size of the sample as well as human error during the mounting process. As a result of this variation, we cannot directly compare samples in 3D space without realigning them. We have implemented principal component analysis (PCA) in order to automate the process of alignment without the need for human supervision. Fig. 1 below illustrates how PCA can be used to align a set of points in 2D. deltascope uses the same process in 3D to identify biologically meaningful axes present in biological structures.
Fig. 1: Principal component analysis (PCA) can be used to identify and align samples along consistent axes. (1) In 2D, PCA first selects the axis that captures the most variability in the data (1st PC). The 2nd PC is selected orthogonal to the 1st PC, capturing the remaining variation in the data. (2) The data are then rotated so that the 1st and 2nd PCs correspond to the x and y axes respectively. (3) Finally, we add a step that identifies the center of the data and shifts it to the origin.
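The 2D alignment illustrated in Fig. 1 can be reproduced in a few lines. The snippet below is for intuition only (deltascope performs this internally); it assumes scikit-learn is installed and uses synthetic data.
import numpy as np
from sklearn.decomposition import PCA
#Synthetic elongated point cloud rotated 30 degrees off the x axis
rng = np.random.default_rng(0)
pts = rng.normal(size=(500,2)) * [10,2]
theta = np.deg2rad(30)
rot = np.array([[np.cos(theta),-np.sin(theta)],[np.sin(theta),np.cos(theta)]])
pts = pts @ rot.T + [5,3]
#PCA finds the long axis (1st PC) and the orthogonal short axis (2nd PC)
pca = PCA(n_components=2)
aligned = pca.fit_transform(pts) #rotated so PCs map to x and y, centered at origin
print(aligned.mean(axis=0)) #approximately [0,0]: the data is centered at the origin
print(aligned.std(axis=0)) #the largest spread now lies along the new x axis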
Structural Channel Processing¶
We designate one channel the structural channel, which we will use for PCA to align samples. Since we are interested in the gross morphology of this channel, we apply two preprocessing steps to reduce the data down to only essential points. First, we return to brain.raw_data and apply a new threshold, medthresh, which is typically more stringent than genthresh. This step ensures we are only considering points of signal with an extremely high likelihood of being real. Second, we apply a median filter to the data twice, which smooths out the structure and eliminates small points of variation that may interfere with the alignment of the gross structure.
PCA outputs three new dimensions: the 1st, 2nd, and 3rd PCs. These components are reassigned to the X, Y, and Z axes to help the user maintain orientation with regard to their data. In the case of the zebrafish post-optic commissure shown below (Fig. 2), the 1st PC is reassigned to the X axis and the 2nd and 3rd PCs are assigned to Z and Y respectively. These assignments honor the user’s expectation of the sample’s alignment in 3D space. The assignment of components to axes can be modified using the parameter comporder.
Warning
In order for PCA to consistently align your samples in the same orientation, we are assuming that the three dimensions of your structure are of different relative sizes. Since PCA looks for the axes that capture the most variation in your data, a sample that has axes of the same relative size will not have any distinguishing characteristics that PCA can use to identify and separate different axes.
Fig. 2: This example illustrates the efficacy of PCA at correcting the orientation of the zebrafish post-optic commissure. In this case, the 1st PC is significantly longer than the 2nd and 3rd. While these two remaining components are similar in size, the typically greater depth of the 2nd PC distinguishes it from the 3rd PC.
Model Fitting¶
In addition to rotating the orientation of the data in 3D space, we also want to align the center of all samples at the origin. In order to determine the center of the data, we fit a simple mathematical model to the data, which will also be used later in the analysis. The zebrafish post-optic commissure shown above forms a parabolic structure, which can be described by y = ax^2 + bx + c. For simplicity, we fit the model in two dimensions while holding one dimension constant. In the case of the POC, the parabolic structure lies flat in the XZ plane, which means that the structure can be described using exclusively the X and Z dimensions. The dimensions that will be used to fit the 2D model are specified in the parameter fitdim. Additionally, the deg parameter specifies the degree of the function that is fit to the data.
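For intuition, this model-fitting step amounts to a polynomial fit in the chosen plane. The following is a minimal numpy sketch (not the deltascope internals) fitting a deg=2 model in the XZ plane, analogous to fitdim=['x','z'], on synthetic data.
import numpy as np
#Synthetic parabolic point cloud in the XZ plane: z = ax^2 + bx + c plus noise
rng = np.random.default_rng(1)
x = rng.uniform(-50,50,1000)
z = 0.01*x**2 + 0.2*x + 5 + rng.normal(scale=1.0,size=x.size)
#Fit a degree 2 polynomial, analogous to deg=2
a,b,c = np.polyfit(x,z,deg=2)
#The vertex of the parabola gives the center used to shift samples to the origin
xv = -b/(2*a)
zv = np.polyval([a,b,c],xv)
print('vertex:',xv,zv)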
Setting Parameters¶
medthresh is typically set to 0.25, in comparison to a value of 0.5 for genthresh. If your data contains aberrant signal that does not contribute to the gross morphology of the structure, an even lower medthresh may help limit the negative influence of noisy signal. Additionally, the radius of the median filter can be tuned to eliminate noisy signal. The typical value for radius is 20, which refers to the number of neighboring points that are considered in the median filter. A smaller value for radius will preserve small variation in signal, while a larger value will cause more blunting and smoothing of the data. Prior to running the median filter, it is recommended that the user load several HDF5 files containing the structure of interest into Fiji and test several median filter radii to determine which best resolves their structure, then use the best value for radius.
The comporder parameter controls how principal components are reassigned to the typical Cartesian coordinate system (XYZ) that most users are familiar with. It takes the form of an array of length 3 that specifies the index of the component that will be assigned to the X, Y, or Z axis: [x index, y index, z index]. Please note that the index for each principal component starts counting at 0, e.g. 1st PC = 0, 2nd PC = 1, and 3rd PC = 2. For example, if we want to assign the 1st PC to the X axis, the 2nd to the Z axis, and the 3rd to the Y axis, the comporder parameter would be [0,2,1].
Finally, the remaining two parameters determine how the model will be fit to the data. fitdim determines which 2 of the 3 axes will be used to fit the 2D model. It takes the form of a list of 2 of the 3 dimensions specified as lowercase strings, e.g. 'x','y','z'. If we wanted to fit a model in the XZ plane while holding the Y axis constant, the fitdim parameter would be ['x','z']. deg specifies the degree of the function that will be fit to the data. The default is 2, which specifies a parabolic function. A deg of 1 would fit a linear function, e.g. y = mx + b.
Warning
The ability to specify degrees other than 2 is still being developed. Check here for updates.
Code Instructions¶
#Parameters described in Setting Parameters above
medthresh = 0.25
radius = 20
comporder = [0,2,1]
fitdim = ['x','z']
#Run PCA on the structural channel, in this case, c1
e.chnls['c1'].calculate_pca_median(e.chnls['c1'].raw_data,medthresh,radius,microns)
#Save the pca object that includes the transformation matrix
pca = e.chnls['c1'].pcamed
#Transform the structural channel using the saved pca object
e.chnls['c1'].pca_transform_3d(e.chnls['c1'].df_thresh,pca,comporder,fitdim,deg=2)
#Save the mathematical model and vertex (center point) of the structural channel
mm = e.chnls['c1'].mm
vertex = e.chnls['c1'].vertex
#Transform any additional channels using the pca object calculated from the structural channel
e.chnls['c2'].pca_transform_3d(e.chnls['c2'].df_thresh,pca,comporder,fitdim,deg=2,mm=mm,vertex=vertex)
Cylindrical Coordinates¶
Goal¶
The ultimate goal of this data processing workflow is to enable us to study small differences in biological 3D structures when comparing a set of control samples to experimental samples. While our samples are now aligned to all fall in the same region of 3D space, our points are still defined by the xyz coordinates assigned by the microscope. In order to detect changes in our structure, we will define the position of points relative to the structure using a cylindrical coordinate system. We will rely on the previously defined mathematical model, brain.mm, to represent the underlying shape of our structure. From there we will define the position of each point relative to the mathematical model (Fig. 3). The first dimension, R, is defined as the shortest distance between the point and the model. Second, alpha is defined as the distance along the model from the point's intersection with the model to the midline or center of the structure. Third, the position of the point around the model is defined by theta. Following the completion of the transformation, the final dataset is saved to a .psi file.
Fig. 3: To enable analysis of data points relative to a biological structure, points are transformed from a Cartesian coordinate system (x, y, z) into a cylindrical coordinate system (alpha, theta, R) defined relative to the structure.
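For intuition, the transformation amounts to finding, for each data point, the closest point on the fitted curve and then measuring the distance to it (R), the arclength from that intersection to the vertex (alpha), and the rotation around the curve (theta). Below is a simplified sketch of the R calculation in the XZ plane using scipy; it is illustrative only, since brain.transform_coordinates() handles the full 3D case.
import numpy as np
from scipy.optimize import minimize_scalar
#Fitted parabola in the XZ plane: z = a*x^2 + b*x + c
a,b,c = 0.01,0.2,5.0
def dist_sq(t,px,pz):
    #Squared distance between the data point (px,pz) and the curve at x=t
    return (t-px)**2 + (a*t**2 + b*t + c - pz)**2
#Example data point
px,pz = 10.0,8.0
#Find the closest point on the curve
res = minimize_scalar(dist_sq,args=(px,pz))
xc = res.x
zc = a*xc**2 + b*xc + c
R = np.sqrt(dist_sq(xc,px,pz))
print('closest point on curve:',xc,zc,'R:',R)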
Code Instructions¶
This transformation does not require defining any parameters; however, it assumes the data has already been thresholded and aligned using PCA.
#Transform each channel to cylindrical coordinates
e.chnls['c1'].transform_coordinates()
e.chnls['c2'].transform_coordinates()
#Save processed data to .psi file
e.save_psi()
Warning
This processing step is time consuming. We recommend running multiple samples in parallel in order to reduce the total amount of computational time required.
Batch Processing¶
In order to reduce processing time, we have implemented a basic multiprocessing tool that runs 5 samples in parallel at a time. For more information, see Batch Processing: Transformation.
Landmark Calculation¶
Following the steps described in deltascope Data Processing, your data should be saved as a set of .psi files containing 6 values for each point: x, y, z, alpha, r, theta. Depending on the size and resolution of your original image, you will have thousands of points, which are unwieldy when you are trying to compare sample sets with several images. In order to reduce the size of the data and facilitate direct comparison, we calculate a set of landmarks that describe the data.
What is a landmark?¶
Landmark points are frequently used in the study of morphology to describe and compare structures. Classically, an individual with expert knowledge of the structure would define a set of points that are present in all structures, but subject to variation. For example, in the human face, landmarks might be placed at the corners of the eyes and mouth as well as the tip of the nose. If we were to compare many different faces, we could use the difference in the position of the landmarks to describe how the faces varied.
Unbiased Landmarks¶
The challenge with the classical approach to landmark analysis lies in the step of assigning landmarks. If an expert user is selecting regions of the structure to assign landmarks to, they are projecting their own expectations as to where they expect to see variation. We have developed a method of automatically calculating landmarks that describe the structure without bias and allow the user to discover new regions of interest.
How are landmarks calculated?¶
The calculation of unbiased landmarks relies on the Cylindrical Coordinates that were previously defined (Fig. 3). First, the data is divided into equally sized sections along the alpha axis (Fig. 4.1). The user specifies the number of divisions, anum, and the data is divided accordingly. Next, each alpha subdivision is divided into radial wedges (Fig. 4.2) according to the parameter tsize, which specifies the size of each wedge. Finally, the distribution of points along the r axis is calculated according to the percentiles specified by the user in percbins (Fig. 4.3). Following these three steps, each subdivision can be represented by a single point that describes the distribution of the data in all three dimensions (Fig. 4.4). For more information on these parameters, see Landmark Calculation.
Fig. 4: In order to calculate landmarks, we subdivide the data along the alpha and theta axes before calculating the r value that describes the distribution of the data.
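Conceptually, the landmark calculation is a three-step binning operation. The following numpy sketch illustrates the idea on synthetic cylindrical coordinates; the landmarks class shown in the code sample below is the supported interface.
import numpy as np
#Synthetic cylindrical coordinates for 10,000 points
rng = np.random.default_rng(2)
alpha = rng.uniform(-50,50,10000)
theta = rng.uniform(-np.pi,np.pi,10000)
r = rng.gamma(2.0,5.0,10000)
anum,tsize = 30,np.pi/4
abins = np.linspace(alpha.min(),alpha.max(),anum+1) #alpha sections (Fig. 4.1)
tbins = np.arange(-np.pi,np.pi+tsize,tsize) #radial wedges (Fig. 4.2)
#50th percentile of r in each (alpha,theta) bin (Fig. 4.3)
ai = np.digitize(alpha,abins) - 1
ti = np.digitize(theta,tbins) - 1
landmarks = np.full((anum,len(tbins)-1),np.nan)
for i in range(anum):
    for j in range(len(tbins)-1):
        sel = r[(ai == i) & (ti == j)]
        if sel.size:
            landmarks[i,j] = np.percentile(sel,50)
print(landmarks.shape) #one landmark value per subdivision (Fig. 4.4)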
Code Sample¶
import deltascope
import numpy as np
import pandas as pd
#Load a dictionary of sample dataframes, e.g. with read_psi_to_dict
#(the directory and channel key are placeholders)
dfs = deltascope.read_psi_to_dict('./psi-directory','c1')
anum = 30
tstep = np.pi/4
#Create a landmark object
lm = deltascope.landmarks(percbins=[50],rnull=15)
lm.calc_bins(dfs,anum,tstep)
#Calculate landmarks for each sample and append to a single dataframe
outlm = pd.DataFrame()
for k in dfs.keys():
    outlm = lm.calc_perc(dfs[k],k,'stype',outlm)
Selecting anum¶
The anumSelect class can be used to identify the optimum number of sections along alpha. We use two measures of variance to test a range of anum values. The first test compares the variance of adjacent landmark wedges. The second test compares the variability of samples within a landmark. As shown in Fig. 5, the optimum value of anum minimizes the variance of both tests.
Fig. 5: We select the value of anum that minimizes both the bin variance and the sample variance.
Code Sample¶
import deltascope
import numpy as np
#dfs: dictionary of sample dataframes, as in the landmark code sample above
#Create an optimization object
opt = deltascope.anumSelect(dfs)
tstep = np.pi/4
#Initiate parameter sweep
opt.param_sweep(tstep,amn=2,amx=50,step=1,percbins=[50],rnull=15)
#Plot raw data
opt.plot_rawdata()
poly_degree = 4
#Test polynomial fit
opt.plot_fitted(poly_degree)
best_guess = 30
#Find the optimum value of anum
opt.find_optimum_anum(poly_degree,best_guess)
Graphing Landmark Data¶
In order to facilitate easy visualization, the graphSet and graphData classes manage graphing commands and any necessary data transformation.
Todo
Code sample for graphing functions
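In the meantime, landmark output can be inspected directly with matplotlib. This is an interim sketch, assuming outlm from the landmark code sample above; it relies on deltascope.reformat_to_cart() (see the API reference), which returns one row per landmark with columns including x, t, and r.
import matplotlib.pyplot as plt
import deltascope
#Convert the landmark dataframe into one row per landmark
cart = deltascope.reformat_to_cart(outlm)
#Plot the median r against position along the structure, one series per theta wedge
fig,ax = plt.subplots()
for t,grp in cart.groupby('t'):
    grp = grp.sort_values('x')
    ax.plot(grp['x'],grp['r'],marker='o',label='theta=%.2f' % t)
ax.set_xlabel('alpha position')
ax.set_ylabel('median r')
ax.legend()
plt.show()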
Sample Classification¶
Goal¶
Previously, we went through the process of calculating landmarks in order to identify a reduced set of points that are comparable between samples. In order to learn more about which points distinguish control samples from experimental samples, we can build a statistical model that uses landmarks to classify samples as control or experimental. After we have developed a classification model, we can look at which landmarks were most important for classification and use this information to identify regions of biological difference.
Dimensionality Reduction¶
Currently, your dataset may have several hundred landmark points that describe the shape and variability of the data. However, it is likely that the number of landmark points outnumbers the number of samples. Statistical modeling techniques are most effective when the number of points is smaller than the number of samples that are training the model. In this situation, we can think of each landmark point as a dimension of the data. In order to reduce the dimensionality of the data (i.e. the number of landmark points), we will use principal component analysis (PCA) to identify a reduced set of components (new dimensions) that capture the variability present in the landmark points. Each component is a mixture of landmark points, with some points having greater influence than others.
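deltascope's treeClassifier.apply_pca() performs this reduction automatically (see the coding instructions below). For intuition, the underlying idea looks like the following scikit-learn sketch on a synthetic landmark matrix; the shapes and variance cutoff are illustrative assumptions only.
import numpy as np
from sklearn.decomposition import PCA
#Synthetic landmark matrix: 20 samples x 300 landmark values
rng = np.random.default_rng(3)
X = rng.normal(size=(20,300))
#Keep enough components to explain 95% of the variance
pca = PCA(n_components=0.95)
Xr = pca.fit_transform(X)
print(X.shape,'->',Xr.shape) #far fewer dimensions than landmark points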
Coding Instructions¶
import deltascope
import pandas as pd
#Read csv file that contains landmark data for both sample groups (placeholder filename)
df = pd.read_csv('landmark_file.csv')
#Create the tree classifier object
tc = deltascope.treeClassifier(df)
#Apply pca to automatically reduce the dimensionality to the optimum number of dimensions
tc.apply_pca()
#Fit the classifier based on landmarks
tc.fit_classifier()
#Visualize the landmarks that had the highest impact on the classifier
tc.plot_top_components(index=10)
Checklist¶
- Convert data to HDF5 files containing a single sample/channel per file
- Train Ilastik on each channel and process all files to create _Probabilities.h5 files
Parameter Reference¶
Transformation¶
- genthresh¶
This parameter defines the cutoff point that will divide brain.raw_data into a set of true signal points and background points based on the probability that each point is true signal. The _Probabilities.h5 dataset is generated after running the Ilastik pixel classification workflow described in Signal Normalization. Pixels with a value close to 1 are likely to be background. Correspondingly, pixels with a value close to 0 are most likely to be true signal. We have found that a threshold of 0.5 is sufficient to divide true signal from background; however, if your data contains a lot of intermediate background values (0.4-0.7), you may benefit from a smaller threshold, e.g. 0.3.
Recommended: 0.5
- microns¶
This parameter is a list of the form [x,y,z] that specifies the dimensions of the voxel in microns. The data in brain.raw_data is scaled by microns to control for datasets in which the z dimension is larger than the x and y dimensions.
Example: [0.16,0.16,0.21]
- scale¶
This is a deprecated parameter that should always be set to [1,1,1].
Required: [1,1,1]
- medthresh¶
This parameter serves the same purpose as genthresh; however, it is used exclusively on data used for aligning samples with PCA. This threshold is typically more stringent than genthresh to ensure that any noise in the data does not interfere with the alignment process.
Recommended: 0.25
- radius¶
The radius of the median filter can be tuned to eliminate noisy signal. The typical value for radius is 20, which refers to the number of neighboring points that are considered in the median filter. A smaller value for radius will preserve small variation in signal, while a larger value will cause more blunting and smoothing of the data.
Recommended: 20
- comporder¶
This parameter controls how principal components are reassigned to the typical Cartesian coordinate system (XYZ) that most users are familiar with. It takes the form of an array of length 3 that specifies the index of the component that will be assigned to the X, Y, or Z axis: [x index, y index, z index]. Please note that the index for each principal component starts counting at 0, e.g. 1st PC = 0, 2nd PC = 1, and 3rd PC = 2. For example, if we want to assign the 1st PC to the X axis, the 2nd to the Z axis, and the 3rd to the Y axis, the comporder parameter would be [0,2,1].
Example: [0,2,1]
- fitdim¶
This parameter determines which 2 axes will be used to fit the 2D model. It takes the form of a list of 2 of the 3 dimensions specified as lowercase strings, e.g. 'x','y','z'. If we wanted to fit a model in the XZ plane while holding the Y axis constant, the fitdim parameter would be ['x','z'].
Example: ['x','z']
- deg¶
This parameter specifies the degree of the function that will be fit to the data. The default is 2, which specifies a parabolic function. A deg of 1 would fit a linear function.
Default: 2
Warning
The infrastructure to support degrees other than 2 is not currently in place. Check here for updates.
Landmark Calculation¶
- anum¶
This integer specifies the number of divisions along the alpha axis when calculating landmarks. See Selecting anum for guidance on setting this parameter.
Example: 20
- tsize¶
This parameter sets the size of each radial wedge in the landmark calculation. The program works in radians, so this parameter should be a float that evenly divides 2π. We have found that π/4 (45°) is a biologically appropriate division for our typical structures.
Example: 0.79
- percbins¶
This parameter is a list of integers that specifies which percentiles should be used to calculate the distribution of points along r.
Example: [50]
Useful Resources¶
Command line tutorials¶
Pip package installer¶
Version control with Git and Github¶
Jupyter Notebook¶
Frequently Asked Questions¶
Why are my file paths causing errors?¶
Windows paths use backslashes, which Python treats as escape characters. Try doubling each backslash in your path, e.g. "C:\\path\\to\\file.h5" instead of "C:\path\to\file.h5".
What is a .psi file?¶
A .psi file is similar in structure to a comma separated value (.csv) file with the addition of header text that defines metadata for the file. For example:
# PSI Format 1.0
#
# column[0] = "Id"
# column[1] = "x"
# column[2] = "y"
# column[3] = "z"
# column[4] = "ac"
# symbol[4] = "A"
# type[4] = float
# column[5] = "r"
# symbol[5] = "R"
# type[5] = float
# column[6] = "theta"
# symbol[6] = "T"
# type[6] = float
52337 0 0
1 0 0
0 1 0
0 0 1
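Files in this format can be loaded back into Python with deltascope's read_psi() helper (see the API reference); the filename below is a placeholder.
import deltascope
#Load a processed sample back into a pandas DataFrame
df = deltascope.read_psi('experiment_01_c1.psi')
print(df.columns) #expect columns such as x, y, z, ac, r, theta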
Batch Processing: Transformation¶
mp-transformation.py can be run from the command line after setting parameters in mp-transformation-config.json. When you run the script, you only need to provide the path to the config file as an argument:
$ python mp-transformation.py "C:\\path\\to\\mp-transformation-config.json"
In addition to the parameters described in Parameter Reference, mp-transformation-config.json requires a set of additional parameters:
- rootdir¶
Required: String specifying the complete path to a directory where an output folder should be created
- expname¶
Required: String specifying the experiment name that will be incorporated into output files
- c1-dir¶
Required: String specifying the path to the directory containing the _Probabilities.h5 files for the structural channel
- c1-key¶
Required: String that will serve as a key for the structural channel and will name the output files
- twoD¶
A boolean value that specifies which PCA transformation functions will be used.
True: brain.calculate_pca_median_2d() and brain.pca_transform_2d() will be used to hold one axis constant while the other two are realigned with PCA.
False: brain.calculate_pca_median() and brain.pca_transform_3d() will be used to transform and realign samples in all three dimensions.
A sketch of a complete config file follows this list.
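For reference, a hypothetical config might be generated as sketched below. The key names mirror the parameters above and in Parameter Reference, but the exact set of keys expected by the script is validated by paramsClass, and all values are placeholders to adapt to your data.
import json
#Placeholder values; adjust for your data
config = {
    'rootdir': 'C:\\path\\to\\output',
    'expname': 'my-experiment',
    'c1-dir': 'C:\\path\\to\\c1-probabilities',
    'c1-key': 'c1',
    'twoD': False,
    #Parameters described in Parameter Reference
    'genthresh': 0.5,
    'medthresh': 0.25,
    'radius': 20,
    'scale': [1,1,1],
    'microns': [0.16,0.16,0.21],
    'comporder': [0,2,1],
    'fitdim': ['x','z'],
    'deg': 2,
}
with open('mp-transformation-config.json','w') as f:
    json.dump(config,f,indent=2)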
API¶
- class deltascope.mpTransformation.paramsClass(path)[source]¶
A class to read and validate parameters for multiprocessing transformation. Validated parameters can be read as attributes of the object.
API¶
- class deltascope.brain[source]¶
Object to manage biological data and associated functions.
- setup_test_data(size=None, gthresh=0.5, scale=[1, 1, 1], microns=[0.16, 0.16, 0.21], mthresh=0.2, radius=20, comp_order=[0, 2, 1], fit_dim=['x', 'z'], deg=2)[source]¶
Set up a test dataset to use for testing coordinate transformation
Parameters: size (int) – Number of points to sample for the test dataset
- read_data(filepath)[source]¶
Reads 3D data from file and selects the appropriate channel based on the assumption that the channel with the most zeros has zero as the value for no signal
Parameters: filepath (str) – Filepath to hdf5 probability file
Returns: Creates the variable brain.raw_data
- raw_data¶
Array of shape [z,y,x] containing raw probability data
- create_dataframe(data, scale)[source]¶
Creates a pandas dataframe containing the x,y,z and signal/probability value for each point in the brain.raw_data array
Parameters:
- data (array) – Raw probability data in 3D array
- scale (array) – Array of length three containing the micron values for [x,y,z]
Returns: Pandas DataFrame with xyz and probability value for each point
- plot_projections(df, subset)[source]¶
Plots the x, y, and z projections of the input dataframe in a matplotlib plot
Parameters:
- df (pd.DataFrame) – Dataframe with columns: 'x','y','z'
- subset (float) – Value between 0 and 1 indicating what percentage of the df to subsample
Returns: Matplotlib figure with three labeled scatterplots
- preprocess_data(threshold, scale, microns)[source]¶
Thresholds and scales data prior to PCA. Creates brain.threshold, brain.df_thresh, and brain.df_scl
Parameters:
- threshold (float) – Value between 0 and 1 to use as a cutoff for minimum pixel value
- scale (array) – Array with three values representing the constant by which to multiply x,y,z respectively
- microns (array) – Array with three values representing the x,y,z micron dimensions of the voxel
- threshold¶
Value used to threshold the data prior to calculating the model
- df_thresh¶
Dataframe containing only points with values above the specified threshold
- df_scl¶
Dataframe containing data from brain.df_thresh after a scaling value has been applied
- process_alignment_data(data, threshold, radius, microns)[source]¶
Applies a median filter twice to the data, which is used for alignment. Ensures that any noise in the structural data does not interfere with alignment
Parameters:
- data (array) – Raw data imported by the function brain.read_data()
- threshold (float) – Value between 0 and 1 to use as a cutoff for minimum pixel value
- radius (int) – Integer that determines the radius of the circle used for the median filter
- microns (array) – Array with three values representing the x,y,z micron dimensions of the voxel
Returns: Dataframe containing data processed with the median filter and threshold
- calculate_pca_median(data, threshold, radius, microns)[source]¶
Calculate the PCA transformation matrix, brain.pcamed, based on the data (brain.median) after applying a median filter and threshold
Parameters:
- data (array) – 3D array containing raw probability data
- threshold (float) – Value between 0 and 1 indicating the lower cutoff for positive signal
- radius (int) – Radius of the neighborhood that should be considered for the median filter
- microns (array) – Array with three values representing the x,y,z micron dimensions of the voxel
- median¶
Pandas dataframe containing data that has been processed with a median filter twice and thresholded
- pcamed¶
PCA object managing the transformation matrix and any resulting transformations
- calculate_pca_median_2d(data, threshold, radius, microns)[source]¶
Calculate the PCA transformation matrix for 2 dimensions of data, brain.pcamed, based on data after applying a median filter and threshold
Warning
fit_dim is not used to determine which dimensions to fit. Defaults to x and z
Parameters:
- data (array) – 3D array containing raw probability data
- threshold (float) – Value between 0 and 1 indicating the lower cutoff for positive signal
- radius (int) – Radius of the neighborhood that should be considered for the median filter
- microns (array) – Array with three values representing the x,y,z micron dimensions of the voxel
- pca_transform_2d(df, pca, comp_order, fit_dim, deg=2, mm=None, vertex=None, flip=None)[source]¶
Transforms df in 2D based on the PCA object, pca, whose transformation matrix has already been calculated. Calling brain.align_data() creates brain.df_align
Warning
fit_dim is not used to determine which dimensions to fit. Defaults to x and z
Parameters:
- df (pd.DataFrame) – Dataframe containing thresholded xyz data
- pca (pca_object) – A pca object containing a transformation object, e.g. brain.pcamed
- comp_order (array) – Array specifying the assignment of components to x,y,z. Form [x component index, y component index, z component index], e.g. [0,2,1]
- fit_dim (array) – Array of length two containing two strings describing the first and second axes for fitting the model, e.g. ['x','z']
- deg (int or None) – Degree of the function that should be fit to the model. deg=2 by default
- mm (math_model or None) – Math model for the primary channel
- vertex (array or None) – Array of the form [vx,vy,vz] (brain.vertex) indicating the translation values
- flip (bool or None) – Boolean value to determine if the data should be rotated by 180 degrees
- pca_transform_3d(df, pca, comp_order, fit_dim, deg=2, mm=None, vertex=None, flip=None)[source]¶
Transforms df in 3D based on the PCA object, pca, whose transformation matrix has already been calculated
Parameters:
- df (pd.DataFrame) – Dataframe containing thresholded xyz data
- pca (pca_object) – A pca object containing a transformation object, e.g. brain.pcamed
- comp_order (array) – Array specifying the assignment of components to x,y,z. Form [x component index, y component index, z component index], e.g. [0,2,1]
- fit_dim (array) – Array of length two containing two strings describing the first and second axes for fitting the model, e.g. ['x','z']
- deg (int or None) – Degree of the function that should be fit to the model. deg=2 by default
- mm (math_model or None) – Math model for the primary channel
- vertex (array or None) – Array of the form [vx,vy,vz] (brain.vertex) indicating the translation values
- flip (bool or None) – Boolean value to determine if the data should be rotated by 180 degrees
- align_data(df_fit, fit_dim, deg=2, mm=None, vertex=None, flip=None)[source]¶
Apply the PCA transformation matrix and align the data so that the vertex is at the origin. Creates brain.df_align and brain.mm
Parameters:
- df_fit (pd.DataFrame) – Dataframe containing thresholded xyz data
- fit_dim (array) – Array of length two containing two strings describing the first and second axes for fitting the model, e.g. ['x','z']
- deg (int or None) – Degree of the function that should be fit to the model. deg=2 by default
- mm (math_model or None) – Math model for the primary channel
- vertex (array or None) – Array of the form [vx,vy,vz] (brain.vertex) indicating the translation values
- flip (bool or None) – Boolean value to determine if the data should be rotated by 180 degrees
- df_align¶
Dataframe containing point data aligned using PCA
- mm¶
Math model object fit to data in the brain object
- flip_data(df)[source]¶
Rotate data by 180 degrees
Parameters: df (pd.DataFrame) – Pandas dataframe containing x,y,z data
Returns: Rotated dataframe
- fit_model(df, deg, fit_dim)[source]¶
Fit a model to the dataframe
Parameters:
- df (pd.DataFrame) – Dataframe containing at least x,y,z
- deg (int) – Degree of the function that should be fit to the model
- fit_dim (array) – Array of length two containing two strings describing the first and second axes for fitting the model, e.g. ['x','z']
Returns: math model
Return type: math_model
- find_distance(t, point)[source]¶
Find the euclidean distance between the math model at t and a data point in the xy plane
Parameters:
- t (float) – Float value defining a point on the line
- point (array) – Array [x,y] defining a data point
Returns: Distance between the two points
Return type: float
- find_min_distance(row)[source]¶
Find the point on the curve that produces the minimum distance between the curve and the data point using scipy.optimize.minimize(brain.find_distance())
Parameters: row (pd.Series) – Row from a dataframe in the form of a pandas Series
Returns: Point on the curve (xc, yc, zc) and r
Return type: floats
- integrand(x)[source]¶
Function to integrate to calculate arclength
Parameters: x (float) – Value for x
Returns: Arclength value for integrating
Return type: float
- find_arclength(xc)[source]¶
Calculate arclength by integrating the derivative of the math model in the xy plane
Parameters: xc (float) – Position along the x axis on the curve
Returns: Length of the arc along the curve between xc and the vertex
Return type: float
- find_theta(row, zc, yc)[source]¶
Calculate theta for a row containing a data point in relation to the xz plane
Parameters:
- row (pd.Series) – Row from a dataframe in the form of a pandas Series
- yc (float) – Y position of the closest point on the curve to the data point
- zc (float) – Z position of the closest point on the curve to the data point
Returns: theta, the angle between the point and the model plane
Return type: float
- find_r(row, zc, yc, xc)[source]¶
Calculate r using the Pythagorean theorem
Parameters:
- row (pd.Series) – Row from a dataframe in the form of a pandas Series
- yc (float) – Y position of the closest point on the curve to the data point
- zc (float) – Z position of the closest point on the curve to the data point
- xc (float) – X position of the closest point on the curve to the data point
Returns: r, the distance between the point and the model
Return type: float
- calc_coord(row)[source]¶
Calculate alpha, r, theta for a particular row
Parameters: row (pd.Series) – Row from a dataframe in the form of a pandas Series
Returns: pd.Series populated with the coordinate of the closest point on the math model, r, theta, and ac (arclength)
- transform_coordinates()[source]¶
Transform the coordinate system so that each point is defined relative to the math model by (alpha,theta,r) (only applied to brain.df_align)
Returns: Appends columns r, xc, yc, zc, ac, theta to brain.df_align
- subset_data(df, sample_frac=0.5)[source]¶
Takes a random sample of the data based on the value between 0 and 1 defined by sample_frac. Creates the variable brain.subset
Parameters:
- df (pd.DataFrame) – Dataframe which will be sampled
- sample_frac (float or None) – Value between 0 and 1 specifying the proportion of the dataset that should be randomly sampled for plotting
- subset¶
Random sample of the input dataframe
- add_thresh_df(df)[source]¶
Adds a dataframe of thresholded and transformed data to brain.df_thresh
Parameters: df (pd.DataFrame) – Dataframe of thresholded and transformed data
Returns: brain.df_thresh
- add_aligned_df(df)[source]¶
Adds a dataframe of aligned data
Warning
Calculates the model, but assumes that the dimensions of the fit are x and z
Parameters: df (pd.DataFrame) – Dataframe of aligned data
Returns: brain.df_align
- class deltascope.embryo(name, number, outdir)[source]¶
Class to manage multiple brain objects in a multichannel sample
Parameters:
- name (str) – Name of this sample set
- number (str) – Sample number corresponding to this embryo
- outdir (str) – Path to directory for output files
- outdir¶
Path to directory for output files
- name¶
Name of this sample set
- number¶
Sample number corresponding to this embryo
- add_channel(filepath, key)[source]¶
Add a channel to the embryo.chnls dictionary
Parameters:
- filepath (str) – Complete filepath to image
- key (str) – Name of the channel
- process_channels(mthresh, gthresh, radius, scale, microns, deg, primary_key, comp_order, fit_dim)[source]¶
Process all channels through the production of the brain.df_align dataframe
Parameters:
- mthresh (float) – Value between 0 and 1 to use as a cutoff for minimum pixel value for median data
- gthresh (float) – Value between 0 and 1 to use as a cutoff for minimum pixel value for general data
- radius (int) – Size of the neighborhood area to examine with the median filter
- scale (array) – Array with three values representing the constant by which to multiply x,y,z respectively
- microns (array) – Array with three values representing the x,y,z micron dimensions of the voxel
- deg (int) – Degree of the function that should be fit to the model
- primary_key (str) – Key for the primary structural channel to which PCA and the model should be fit
- comp_order (array) – Array specifying the assignment of components to x,y,z. Form [x component index, y component index, z component index], e.g. [0,2,1]
- fit_dim (array) – Array of length two containing two strings describing the first and second axes for fitting the model, e.g. ['x','z']
- save_projections(subset)[source]¶
Save projections of both channels into png files in embryo.outdir following the naming scheme [embryo.name]_[embryo.number]_[channel name]_MIP.png
Parameters: subset (float) – Value between 0 and 1 specifying the fraction of the data to randomly sample for plotting
- save_psi()[source]¶
Save all channels into psi files following the naming scheme [embryo.name]_[embryo.number]_[channel name].psi
- class deltascope.math_model(model)[source]¶
Object to contain attributes associated with the math model of a sample
Parameters: model (array) – Array of coefficients calculated by np.polyfit
- cf¶
Array of coefficients for the math model
- p¶
Poly1d function for the math model to allow calculation and plotting of the model
- class deltascope.landmarks(percbins=[10, 50, 90], rnull=15)[source]¶
Class to handle calculation of landmarks to describe structural data
Parameters:
- percbins (list or None) – Must be a list of integers between 0 and 100
- rnull (int or None) – When the r value cannot be calculated, it will be set to this value
- lm_wt_rf¶
pd.DataFrame to which wildtype landmarks will be added
- lm_mt_rf¶
pd.DataFrame to which mutant landmarks will be added
- rnull¶
Integer specifying the value to which null landmark calculations will be set
- percbins¶
List of integers specifying the percentiles which will be used to calculate landmarks
- calc_bins(Ldf, ac_num, tstep)[source]¶
Calculates alpha and theta bins based on ac_num and tstep. Creates landmarks.acbins and landmarks.tbins
Warning
tstep does not handle scenarios where 2π is not evenly divisible by tstep
Parameters:
- Ldf (dict) – Dictionary of dataframes that are being used for the analysis
- ac_num (int) – Integer indicating the number of divisions that should be made along alpha
- tstep (float) – The size of each bin used for theta
- acbins¶
List containing the boundaries of each bin along alpha based on ac_num
- tbins¶
List containing the boundaries of each bin along theta based on tstep
- calc_perc(df, snum, dtype, out)[source]¶
Calculate landmarks for a dataframe based on the bins and percentiles that have been previously defined
Parameters:
- df (pd.DataFrame) – Dataframe containing columns x,y,z,alpha,r,theta
- snum (str) – String containing a sample identifier that can be converted to an integer
- dtype (str) – String describing the sample group to which the sample belongs, e.g. control or experimental
- out (pd.DataFrame) – Dataframe to which the new landmarks will be appended
Returns: pd.DataFrame with new landmarks appended
- deltascope.reformat_to_cart(df)[source]¶
Take a dataframe in which columns contain the bin parameters and convert to a cartesian coordinate system
Parameters: df (pd.DataFrame) – Dataframe containing columns with string names that contain the bin parameters
Returns: pd.DataFrame with each landmark as a row and columns: x,y,z,r,r_std,t,pts
- deltascope.convert_to_arr(xarr, tarr, DT, mdf, Ldf=[])[source]¶
Convert a pandas dataframe containing landmarks as columns and samples as rows into a 3D numpy array. The columns of mdf determine which landmarks will be saved into the array. Any additional dataframes that need to be converted can be included in Ldf
Parameters:
- xarr (np.array) – Array containing all unique x values of landmarks in the dataset
- tarr (np.array) – Array containing all unique t values of landmarks in the dataset
- DT (str) – Either r or pts, indicating which data type should be saved to the array
- mdf (pd.DataFrame) – Main landmark dataframe containing landmarks as columns and samples as rows
- Ldf (list) – List of additional pd.DataFrames that should also be converted to arrays
Returns: Array of the main dataframe and a list of arrays converted from Ldf
- deltascope.calc_variance(anum, dfs)[source]¶
Calculate the variance between samples according to bin position and the variance between adjacent bins
Parameters:
- anum (int) – Number of bins into which the arclength axis should be divided
- dfs (dict) – Dictionary of dfs which are going to be processed
Returns: Two arrays: svar (anum,tnum) and bvar (anum*tnum,snum)
Return type: np.array
- deltascope.subplot_lmk(ax, p, avg, sem, parr, xarr, tarr, dtype, Pn={'alpha': 0.3, 'cmap': 'Greys_r', 'mtc': 'r', 'tarr': None, 'wtc': 'b', 'xarr': None, 'zfb': 1, 'zln': 2, 'zpt': 3})[source]¶
Plot a ribbon of the average and standard error of the mean onto the subplot, ax
Parameters:
- ax (plt.Subplot) – Matplotlib subplot onto which the data should be plotted
- p (list) – List of two theta values that should be plotted
- avg (np.array) – Array of shape (xvalues,tvalues) containing the average values of the data
- sem (np.array) – Array of shape (xvalues,tvalues) containing the standard error of the mean values of the data
- parr (np.array) – Array of shape (xvalues,tvalues) containing the p values for the data
- dtype (str) – String describing the sample type
- Pn (dict or None) – Dictionary containing the following values: 'zln':2, 'zpt':3, 'zfb':1, 'wtc':'b', 'mtc':'r', 'alpha':0.3, 'cmap':'Greys_r'
- deltascope.write_header(f)[source]¶
Writes the header for a PSI file with columns Id,x,y,z,ac,r,theta
Parameters: f (file) – File object created by open(filename,'w')
- deltascope.write_data(filepath, df)[source]¶
Writes data in PSI format to file after writing the header using write_header(). Closes the file at the conclusion of writing data
Parameters:
- filepath (str) – Complete filepath to output file
- df (pd.DataFrame) – Dataframe containing columns x,y,z,ac,r,theta
- deltascope.read_psi(filepath)[source]¶
Reads the psi file at the given filepath and returns the data in a pandas DataFrame
Parameters: filepath (str) – Complete filepath to file
Returns: pd.DataFrame containing data
- deltascope.read_psi_to_dict(directory, dtype)[source]¶
Read psis from a directory into a dictionary of dfs with filtering based on dtype
Parameters:
- directory (str) – Directory to get psis from
- dtype (str) – Usually 'AT' or 'ZRF1'
Returns: Dictionary of pd.DataFrames
- deltascope.process_sample(num, root, outdir, name, chs, prefixes, threshold, scale, deg, primary_key, comp_order, fit_dim, flip_dim)[source]¶
Process a single sample through the brain class and save the df to csv
Warning
Out of date and will probably fail
Parameters:
- num (str) – Sample number
- root (str) – Complete path to the root directory for this sample set
- name (str) – Name describing this sample set
- outdir (str) – Complete path to output directory
- chs (array) – Array containing strings specifying the directories for each channel
- prefixes (array) – Array containing strings specifying the file prefix for each channel
- threshold (float) – Value between 0 and 1 to use as a cutoff for minimum pixel value
- scale (array) – Array with three values representing the constant by which to multiply x,y,z respectively
- deg (int) – Degree of the function that should be fit to the model
- primary_key (str) – Key for the primary structural channel to which PCA and the model should be fit
- deltascope.calculate_models(Ldf)[source]¶
Calculate a model for each dataframe in the list and add it to a new dataframe
Parameters: Ldf (list) – List of dataframes containing aligned data
Returns: pd.DataFrame with a,b,c values for the parabolic model
- deltascope.generate_kde(data, var, x, absv=False)[source]¶
Generate a list of KDEs from either a dictionary or list of data
Parameters:
- data (dict or list) – pd.DataFrames to convert
- var (str) – Name of the column to select from the dfs
- x (array) – Array of datapoints on which to evaluate the KDE
- absv (bool or None) – Set to True to use the absolute value of the selected data for the KDE calculation
Returns: List of KDE arrays
- deltascope.calculate_area_error(pdf, Lkde, x)[source]¶
Calculate the area between the PDF and each kde in Lkde
Parameters:
- pdf (array) – Array of the probability distribution function that is the same shape as the kdes in Lkde
- Lkde (list) – List of arrays of kdes
- x (array) – Array of datapoints used to generate the pdf and kdes
Returns: List of error values for each kde in Lkde
- deltascope.rescale_variable(Ddfs, var, newvar)[source]¶
Rescale a variable from -1 to 1 and save it in the newvar column on the original dataframe
Parameters:
- Ddfs (dict) – Dictionary of pd.DataFrames
- var (str) – Name of the column to select from the dfs
- newvar (str) – Name to use for the new data in the appended column
Returns: Dictionary of dataframes containing a column of rescaled data
- class deltascope.paramsClass(path=None, dparams=None)[source]¶
A class to read and validate parameters for multiprocessing transformation. Validated parameters can be read as attributes of the object.
(path=None, dparams=None)[source]¶ A class to read and validate parameters for multiprocessing transformation. Validated parameters can be read as attributes of the object