deltascope Data Processing¶
This guide will describe a set of parameters that the user needs to specify when running deltascope. A complete list of all parameters if available on the Parameter Reference page.
At this point, each sample/channel should have been processed by Ilastik to create a new
c1_10_Probabilities.h5 file. If you open this file in Fiji utilizing the fiji HDF5 plugin, it should contain three dimensions and two channels (signal and background, but for our purposes the two are interchangeable). Each pixel should have a value ranging between 0 and 1. If we are inspecting the signal channel, pixels with a value close to 0 (low p value) are highly likely to be true signal. Correspondingly, pixels with a value close to 1 are likely to be background. The
c1_10_Probabilities.h5 file by Ilastik contains two channels (signal and background), which are inverse images. For a given pixel, the background intensity value is 1 minus the signal intensity value. In order to simplify our data, we will apply a threshold that will divide the data into two sets of pixels: signal and background. In the steps that follow, we will only use the set of pixels that correspond to true signal. This set of pixels may be also referred to as a set of points or a point cloud. In order to avoid keeping track of which channel is the signal channel, we assume that after applying a threshold there will be more points will fall in the background group than in the signal group. If this is not true, better Ilastik training or a stricter threshold is reccomended, or deltascope will instead be examining the structure of your background.
If your data contains more points of signal than background, deltascope will select your background channel as the signal channel. In order to correct this assumption, the inequality in
brain.read_data() will need to be changed to equate a large number of points with signal instead of background.
In order to divide the image into two sets of pixels (signal and background), we use a parameter,
genthresh, to determine which group the pixels fall into. We have found that a value of 0.5 is sufficient to divide the data and have not found that the results vary greatly if it is changed; however, if your data contains a lot of intermediate background values (0.4-0.7), you may benefit from a smaller threshold, e.g. 0.3.
The following code requires a parameter
micron, which specifies the dimensions of the image voxel in microns. It is a list of the form
[x,y,z]. This information can typically be found in the metadata of a microscope collection file.
This section of the code also includes a deprecated parameter
scale, which must be set to
The following instructions apply to processing a single sample. Details regarding function parameters can be found under
brain.preprocess_data(). Raw image data will be imported by
import deltascope #Create an embryo object that facilitates data processing e = deltascope.embryo(experiment-name,sample-number,directory) #For each channel in your sample, add a channel with a unique name, e.g. 'c1' or 'c2' e.add_channel(c1-filepath,c1-name) e.add_channel(c2-filepath,c2-name) #Threshold each channel and scale points according to voxel dimension in microns e.chnls[c1-name].preprocess_data(genthresh,scale,microns) e.chnls[c2-name].preprocess_data(genthresh,scale,microns)
Sample alignment using principle component analysis (PCA)¶
Principle Component Analysis¶
During typical collections of biological samples, each sample will be oriented slightly differently in relation to the microscope due to variations in the shape and size of the sample as well as human error during the mounting process. As a result of this variation, we cannot directly compare samples in 3D space without realigning them. We have implemented principle component analysis in order to automate the process of alignment without a need for human supervision. Fig. 1 shown below illustrates how PCA can be used to align a set of points in 2D. deltascope uses the same process in 3D to identify biologically meaningful axes present in biological structures.
Structural Channel Processing¶
We designate one channel, the
structural channel, which we will use for PCA to align samples. Since we are interested in the gross morphology of this channel, we apply two data preprocessing steps to reduce the data down to only essential points. First, we return to
brain.raw_data and apply a new threshold,
medthresh, which is typically more stringent than
genthresh. This step ensures we are only considering points of signal with extremely high likelihood of being real. Second, we apply a median filter to the data twice, which smooths out the structure and eliminates small points of variation that may interfere with the alignment process of the gross structure.
PCA outputs three new dimensions, the 1st, 2nd, and 3rd PCs. These components will be reassigned to the X, Y, and Z axes to help the user maintain orientation in regards to their data. In the case of the zebrafish post-optic commissure shown below (Fig. 2), the 1st PC is reassigned to the X axis and the 2nd and 3rd PCs are assigned to Z and Y respectively. These assignments honor the user’s expectation of the sample’s alignment in 3D space. The assignment of components to axes can be modified using the parameter
In order for PCA to consistently align your samples in the same orientation, we are assuming that the three dimensions of your structure are of different relative sizes. Since PCA looks for the axes that capture the most variation in your data, a sample that has axes of the same relative size will not have any distinguishing characteristics that PCA can use to identify and separate different axes.
In addition to rotating the orientation of the data in 3D space, we also want to align the center of all samples at the origin. In order to determine the center of the data, we fit a simple mathematical model to the data that will also be used later in the analysis. The zebrafish post optic commissure shown above forms a parabolic structure, which can be described by y = ax^2 + bx + c. For simplicity, we fit the model in two dimensions, while holding one dimension constant. In the case of the POC, the parabolic structure lies flat in the XZ plane, which means that the structure can be described using exclusively the X and Z dimensions. The dimensions, which will be used to fit the 2D model, are specified in the parameter
fitdim. Additionally the
deg parameter specifies the degree of the function that fits to the data.
medthresh is typically set to 0.25, in comparison to a value of 0.5 for
genthresh. If your data contains aberrant signal that does not contribute to the gross morphology of the structure, an even lower
medthresh may help limit the negative influence of noisy signal. Additionally, the
radius of the median filter can also be tuned to eliminate noisy signal. The typical value for
radius is 20, which refers to the number of neighboring points that are considered in the median filter. A smaller value for
radius will preserve small variation in signal, while a larger value will cause even more blunting and smoothing of the data. Prior to running the median threshold, it is reccomended that the user load several HDF5 files containing the structure of interest into FIJI and test several median thresholds to determine which best resolves their structure. Utilize the best median threshold radius in
comporder parameter controls how principle components are reassigned to the typical Cartesian coordinate system (XYZ) that most users are familiar with. It takes the form of an array of length 3 that specifies the index of the component that will be assigned to the X, Y, or Z axis:
[x index,y index,z index]. Please note that the index that matches each principle component starts counting at 0, e.g. 1st PC = 0, 2nd PC = 1, and 3rd PC = 2. For example, if we want to assign the 1st PC to the x axis, the 2nd to the Z axis, and the 3rd to the y axis, the
comporder parameter would be
Finally, the remaining two parameters determines how the model will be fit to the data.
fitdim determines which 2 axes will be used to fit the 2D model. It takes the form of a list of 2 of the 3 dimensions specified as a lowercase string, e.g.
'x','y','z'. If we wanted to fit a model in the XZ plane, while holding the Y axis constant, the
fitdim parameter would be
deg specifies the degree of the function that will be fit to the data. The default is
2, which specifies a parabolic function. A deg of
1 would fit a linear function, eg. y=mx + b.
The ability to specify degrees other than 2 is still being developed. Check here for updates.
#Run PCA on the structural channel, in this case, c1 e.chnls['c1'].calculate_pca_median(e.chnls['c1'].raw_data,medthresh,radius,microns) #Save the pca object that includes the transformation matrix pca = e.chnls['c1'].pcamed #Transform the structural channel using the saved pca object e.chnls['c1'].pca_transform_3d(e.chnls['c1'].df_thresh,pca,comporder,fitdim,deg=2) #Save the mathematical model and vertex (center point) of the structural channel mm = e.chnls['AT'].mm vertex = e.chnls['AT'].vertex #Transform any additional channels using the pca object calculated based on the structural channel e.chnls['c2'].pca_transform_3d(e.chnls['c2'].df_thresh,pca,comporder,fitdim,deg=2,mm=mm,vertex=vertex)
The ultimate goal with this workflow of data processing is to enable us to study small differences in biological 3D structures when comparing a set of control samples to experimental samples. While our samples are now aligned to all fall in the same region of 3D space, our points are still defined by the xyz coordinates defined by the microscope. In order to detect changes in our structure, we will define the position of points relative to the structure using a cylindrical coordinate system. We will rely on the previously defined mathematical model,
brain.mm, to represent the underlying shape of our structure. From there we will define the position of each point relative to the mathematical model (Fig. 3). The first dimension, R, is defined as the shortest distance between the point and the model. Second, alpha is defined as the distance from the point’s intersection with the model to the midline or center of the structure. Third, the position of the point around the model is defined in theta. Following the completion of the transformation, the final dataset is saved to a
This transformation does not require defining any parameters; however, it assumes the data has already been thresholded and aligned using PCA.
#Transform each channel to cylindrical coordinates e.chnls['c1'].transform_coordinates() e.chnls['c2'].transform_coordinates() #Save processed data to .psi file e.save_psi()
This processing step is time consuming. We recommend running multiple samples in parallel in order to reduce the total amount of computational time required.