API¶
-
class
deltascope.
brain
[source]¶ Object to manage biological data and associated functions.
-
setup_test_data
(size=None, gthresh=0.5, scale=[1, 1, 1], microns=[0.16, 0.16, 0.21], mthresh=0.2, radius=20, comp_order=[0, 2, 1], fit_dim=['x', 'z'], deg=2)[source]¶ Setup a test dataset to use for testing transform coordinates :param int size: Number of points to sample for the test dataset
-
read_data
(filepath)[source]¶ Reads 3D data from file and selects appropriate channel based on the assumption that the channel with the most zeros has zero as the value for no signal
Parameters: filepath (str) – Filepath to hdf5 probability file Returns: Creates the variable brain.raw_data
-
raw_data
¶ Array of shape [z,y,x] containing raw probability data
-
-
create_dataframe
(data, scale)[source]¶ Creates a pandas dataframe containing the x,y,z and signal/probability value for each point in the
brain.raw_data
arrayParameters: - data (array) – Raw probability data in 3D array
- scale (array) – Array of length three containing the micron values for [x,y,z]
Returns: Pandas DataFrame with xyz and probability value for each point
-
plot_projections
(df, subset)[source]¶ Plots the x, y, and z projections of the input dataframe in a matplotlib plot
Parameters: - df (pd.DataFrame) – Dataframe with columns: ‘x’,’y’,’z’
- subset (float) – Value between 0 and 1 indicating what percentage of the df to subsample
Returns: Matplotlib figure with three labeled scatterplots
-
preprocess_data
(threshold, scale, microns)[source]¶ Thresholds and scales data prior to PCA
Creates
brain.threshold
,brain.df_thresh
, andbrain.df_scl
Parameters: - threshold (float) – Value between 0 and 1 to use as a cutoff for minimum pixel value
- scale (array) – Array with three values representing the constant by which to multiply x,y,z respectively
- microns (array) – Array with three values representing the x,y,z micron dimensions of the voxel
-
threshold
¶ Value used to threshold the data prior to calculating the model
-
df_thresh
¶ Dataframe containing only points with values above the specified threshold
-
df_scl
¶ Dataframe containing data from
brain.df_thresh
after a scaling value has been applied
-
process_alignment_data
(data, threshold, radius, microns)[source]¶ Applies a median filter twice to the data which is used for alignment
Ensures than any noise in the structural data does not interfere with alignment
Parameters: - data (array) – Raw data imported by the function
brain.read_data()
- threshold (float) – Value between 0 and 1 to use as a cutoff for minimum pixel value
- radius (int) – Integer that determines the radius of the circle used for the median filter
- microns (array) – Array with three values representing the x,y,z micron dimensions of the voxel
Returns: Dataframe containing data processed with the median filter and threshold
- data (array) – Raw data imported by the function
-
calculate_pca_median
(data, threshold, radius, microns)[source]¶ Calculate PCA transformation matrix,
brain.pcamed
, based on data (brain.pcamed
) after applying median filter and thresholdParameters: - data (array) – 3D array containing raw probability data
- threshold (float) – Value between 0 and 1 indicating the lower cutoff for positive signal
- radius (int) – Radius of neighborhood that should be considered for the median filter
- microns (array) – Array with three values representing the x,y,z micron dimensions of the voxel
-
median
¶ Pandas dataframe containing data that has been processed with a median filter twice and thresholded
-
pcamed
¶ PCA object managing the transformation matrix and any resulting transformations
-
calculate_pca_median_2d
(data, threshold, radius, microns)[source]¶ Calculate PCA transformation matrix for 2 dimensions of data,
brain.pcamed
, based on data after applying median filter and thresholdWarning
fit_dim is not used to determine which dimensions to fit. Defaults to x and z
Parameters: - data (array) – 3D array containing raw probability data
- threshold (float) – Value between 0 and 1 indicating the lower cutoff for positive signal
- radius (int) – Radius of neighborhood that should be considered for the median filter
- microns (array) – Array with three values representing the x,y,z micron dimensions of the voxel
-
pca_transform_2d
(df, pca, comp_order, fit_dim, deg=2, mm=None, vertex=None, flip=None)[source]¶ Transforms df in 2D based on the PCA object, pca, whose transformation matrix has already been calculated
Calling
brain.align_data()
createsbrain.df_align
Warning
fit_dim is not used to determine which dimensions to fit. Defaults to x and z
Parameters: - df (pd.DataFrame) – Dataframe containing thresholded xyz data
- pca (pca_object) – A pca object containing a transformation object, e.g.
brain.pcamed
- comp_order (array) – Array specifies the assignment of components to x,y,z. Form [x component index, y component index, z component index], e.g. [0,2,1]
- fit_dim (array) – Array of length two containing two strings describing the first and second axis for fitting the model, e.g. [‘x’,’z’]
- deg (int) – (or None) Degree of the function that should be fit to the model. deg=2 by default
- mm – (
math_model
or None) Math model for primary channel - vertex (array) – (or None) Array of type [vx,vy,vz] (
brain.vertex
) indicating the translation values - flip (Bool) – (or None) Boolean value to determine if the data should be rotated by 180 degrees
-
pca_transform_3d
(df, pca, comp_order, fit_dim, deg=2, mm=None, vertex=None, flip=None)[source]¶ Transforms df in 3D based on the PCA object, pca, whose transformation matrix has already been calculated
Parameters: - df (pd.DataFrame) – Dataframe containing thresholded xyz data
- pca (pca_object) – A pca object containing a transformation object, e.g.
brain.pcamed
- comp_order (array) – Array specifies the assignment of components to x,y,z. Form [x component index, y component index, z component index], e.g. [0,2,1]
- fit_dim (array) – Array of length two containing two strings describing the first and second axis for fitting the model, e.g. [‘x’,’z’]
- deg (int) – (or None) Degree of the function that should be fit to the model. deg=2 by default
- mm – (
math_model
or None) Math model for primary channel - vertex (array) – (or None) Array of type [vx,vy,vz] (
brain.vertex
) indicating the translation values - flip (Bool) – (or None) Boolean value to determine if the data should be rotated by 180 degrees
-
align_data
(df_fit, fit_dim, deg=2, mm=None, vertex=None, flip=None)[source]¶ Apply PCA transformation matrix and align data so that the vertex is at the origin
Creates
brain.df_align
andbrain.mm
Parameters: - df (pd.DataFrame) – dataframe containing thresholded xyz data
- comp_order (array) – Array specifies the assignment of components to x,y,z. Form [x component index, y component index, z component index], e.g. [0,2,1]
- fit_dim (array) – Array of length two containing two strings describing the first and second axis for fitting the model, e.g. [‘x’,’z’]
- deg (int) – (or None) Degree of the function that should be fit to the model. deg=2 by default
- mm – (
math_model
or None) Math model for primary channel - vertex (array) – (or None) Array of type [vx,vy,vz] (
brain.vertex
) indicating the translation values - flip (Bool) – (or None) Boolean value to determine if the data should be rotated by 180 degrees
-
df_align
¶ Dataframe containing point data aligned using PCA
-
mm
¶ Math model object fit to data in brain object
-
flip_data
(df)[source]¶ Rotate data by 180 degrees
Parameters: df (dataframe) – Pandas dataframe containing x,y,z data Returns: Rotated dataframe
-
fit_model
(df, deg, fit_dim)[source]¶ Fit model to dataframe
Parameters: - df (pd.DataFrame) – Dataframe containing at least x,y,z
- deg (int) – Degree of the function that should be fit to the model
- fit_dim (array) – Array of length two containing two strings describing the first and second axis for fitting the model, e.g. [‘x’,’z’]
Returns: math model
Return type:
-
find_distance
(t, point)[source]¶ Find euclidean distance between math model(t) and data point in the xy plane
Parameters: - t (float) – float value defining point on the line
- point (array) – array [x,y] defining data point
Returns: distance between the two points
Return type: float
-
find_min_distance
(row)[source]¶ Find the point on the curve that produces the minimum distance between the point and the data point using scipy.optimize.minimize(
brain.find_distance()
)Parameters: row (pd.Series) – row from dataframe in the form of a pandas Series Returns: point in the curve (xc, yc, zc) and r Return type: floats
-
integrand
(x)[source]¶ Function to integrate to calculate arclength
Parameters: x (float) – integer value for x Returns: arclength value for integrating Return type: float
-
find_arclength
(xc)[source]¶ Calculate arclength by integrating the derivative of the math model in xy plane
Parameters: row (float) – Postion in the x axis along the curve Returns: Length of the arc along the curve between the row and the vertex Return type: float
-
find_theta
(row, zc, yc)[source]¶ Calculate theta for a row containing data point in relationship to the xz plane
Parameters: - row (pd.Series) – row from dataframe in the form of a pandas Series
- yc (float) – Y position of the closest point in the curve to the data point
- zc (float) – Z position of the closest point in the curve to the data point
Returns: theta, angle between point and the model plane
Return type: float
-
find_r
(row, zc, yc, xc)[source]¶ Calculate r using the Pythagorean theorem
Parameters: - row (pd.Series) – row from dataframe in the form of a pandas Series
- yc (float) – Y position of the closest point in the curve to the data point
- zc (float) – Z position of the closest point in the curve to the data point
- xc (float) – X position of hte closest point in the curve to the data point
Returns: r, distance between the point and the model
Return type: float
-
calc_coord
(row)[source]¶ Calculate alpah, r, theta for a particular row
Parameters: row (pd.Series) – row from dataframe in the form of a pandas Series Returns: pd.Series populated with coordinate of closest point on the math model, r, theta, and ac (arclength)
-
transform_coordinates
()[source]¶ Transform coordinate system so that each point is defined relative to math model by (alpha,theta,r) (only applied to
brain.df_align
)Returns: appends columns r, xc, yc, zc, ac, theta to brain.df_align
-
subset_data
(df, sample_frac=0.5)[source]¶ Takes a random sample of the data based on the value between 0 and 1 defined for sample_frac
Creates the variable
brain.subset
Parameters: - pd.DataFrame – Dataframe which will be sampled
- sample_frac (float) – (or None) Value between 0 and 1 specifying proportion of the dataset that should be randomly sampled for plotting
-
subset
¶ Random sample of the input dataframe
-
add_thresh_df
(df)[source]¶ Adds dataframe of thresholded and transformed data to
brain.df_thresh
Parameters: df (pd.DataFrame) – dataframe of thesholded and transformed data Returns: brain.df_thresh
-
add_aligned_df
(df)[source]¶ Adds dataframe of aligned data
Warning
Calculates model, but assumes that the dimensions of the fit are x and z
Parameters: df (pd.DataFrame) – Dataframe of aligned data Returns: brain.df_align
-
-
class
deltascope.
embryo
(name, number, outdir)[source]¶ Class to managed multiple brain objects in a multichannel sample
Parameters: - name (str) – Name of this sample set
- number (str) – Sample number corresponding to this embryo
- outdir (str) – Path to directory for output files
-
outdir
¶ Path to directory for output files
-
name
¶ Name of this sample set
-
number
¶ Sample number corresponding to this embryo
-
add_channel
(filepath, key)[source]¶ Add channel to
embryo.chnls
dictionaryParameters: - filepath (str) – Complete filepath to image
- key (str) – Name of the channel
-
process_channels
(mthresh, gthresh, radius, scale, microns, deg, primary_key, comp_order, fit_dim)[source]¶ Process all channels through the production of the
brain.df_align
dataframeParameters: - mthresh (float) – Value between 0 and 1 to use as a cutoff for minimum pixel value for median data
- gthresh (float) – Value between 0 and 1 to use as a cutoff for minimum pixel value for general data
- radius (int) – Size of the neighborhood area to examine with median filter
- scale (array) – Array with three values representing the constant by which to multiply x,y,z respectively
- microns (array) – Array with three values representing the x,y,z micron dimensions of the voxel
- deg (int) – Degree of the function that should be fit to the model
- primary_key (str) – Key for the primary structural channel which PCA and the model should be fit too
- comp_order (array) – Array specifies the assignment of components to x,y,z. Form [x component index, y component index, z component index], e.g. [0,2,1]
- fit_dim (array) – Array of length two containing two strings describing the first and second axis for fitting the model, e.g. [‘x’,’z’]
-
save_projections
(subset)[source]¶ Save projections of both channels into png files in
embryo.outdir
following the naming scheme [embryo.name
]_[embryo.number
]_[channel name]_MIP.pngParameters: subset (float) – Value between 0 and 1 to specify the fraction of the data to randomly sample for plotting
-
save_psi
()[source]¶ Save all channels into psi files following the naming scheme [
embryo.name
]_[embryo.number
]_[channel name].psi
-
class
deltascope.
math_model
(model)[source]¶ Object to contain attributes associated with the math model of a sample
Parameters: model (array) – Array of coefficients calculated by np.polyfit -
cf
¶ Array of coefficients for the math model
-
p
¶ Poly1d function for the math model to allow calculation and plotting of the model
-
-
class
deltascope.
landmarks
(percbins=[10, 50, 90], rnull=15)[source]¶ Class to handle calculation of landmarks to describe structural data
Parameters: - percbins (list) – (or None) Must be a list of integers between 0 and 100
- rnull (int) – (or None) When the r value cannot be calculated it will be set to this value
-
brain.
lm_wt_rf
¶ pd.DataFrame, which wildtype landmarks will be added to
-
brain.
lm_mt_rf
¶ pd.DataFrame, which mutant landmarks will be added to
-
brain.
rnull
¶ Integer specifying the value which null landmark calculations will be set to
-
brain.
percbins
¶ Integer specifying the percentiles which will be used to calculate landmarks
-
calc_bins
(Ldf, ac_num, tstep)[source]¶ Calculates alpha and theta bins based on ac_num and tstep
Creates
landmarks.acbins
andlandmarks.tbins
Warning
tstep does not handle scenarios where 2pi is not evenly divisible by tstep
Parameters: - Ldf (dict) – Dict dataframes that are being used for the analysis
- ac_num (int) – Integer indicating the number of divisions that should be made along alpha
- tstep (float) – The size of each bin used for alpha
-
acbins
¶ List containing the boundaries of each bin along alpha based on ac_num
-
tbins
¶ List containing the boundaries of each bin along theta based on tstep
-
calc_perc
(df, snum, dtype, out)[source]¶ Calculate landmarks for a dataframe based on the bins and percentiles that have been previously defined
Parameters: - df (pd.DataFrame) – Dataframe containing columns x,y,z,alpha,r,theta
- snum (str) – String containing a sample identifier that can be converted to an integer
- dtype (str) – String describing the sample group to which the sample belongs, e.g. control or experimental
Returns: pd.DataFrame with new landmarks appended
-
deltascope.
reformat_to_cart
(df)[source]¶ Take a dataframe in which columns contain the bin parameters and convert to a cartesian coordinate system
Parameters: df (pd.DataFrame) – Dataframe containing columns with string names that contain the bin parameter Returns: pd.DataFrame with each landmark as a row and columns: x,y,z,r,r_std,t,pts
-
deltascope.
convert_to_arr
(xarr, tarr, DT, mdf, Ldf=[])[source]¶ Convert a pandas dataframe containing landmarks as columns and samples as rows into a 3D numpy array
The columns of mdf determine which landmarks will be saved into the array. Any additional dataframes that need to be converted can be included in Ldf
Parameters: - xarr (np.array) – Array containing all unique x values of landmarks in the dataset
- tarr (np.array) – Array containing all unique t values of landmarks in the dataset
- DT (str) – Either
r
orpts
indicating which data type should be saved to the array - mdf (pd.DataFrame) – Main landmark dataframe containing landmarks as columns and samples as rows
- Ldf (list) – List of additional pd.DataFrames that should also be converted to arrays
Returns: Array of the main dataframe and list of arrays converted from Ldf
-
deltascope.
calc_variance
(anum, dfs)[source]¶ Calculate the variance between samples according to bin position and variance between adjacent bins
Parameters: - anum (int) – Number of bins which the arclength axis should be divided into
- dfs (dict) – Dictionary of dfs which are going to be processed
Returns: Two arrays: svar (anum,tnum) and bvar (anum*tnum,snum)
Return type: np.array
-
deltascope.
subplot_lmk
(ax, p, avg, sem, parr, xarr, tarr, dtype, Pn={'alpha': 0.3, 'cmap': 'Greys_r', 'mtc': 'r', 'tarr': None, 'wtc': 'b', 'xarr': None, 'zfb': 1, 'zln': 2, 'zpt': 3})[source]¶ Plot a ribbon of average and standard error of the mean onto the subplot, ax
Parameters: - ax (plt.Subplot) – Matplotlib subplot onto which the data should be plotted
- p (list) – List of two theta values that should be plotted
- avg (np.array) – Array of shape (xvalues,tvalues) containing the average values of the data
- sem (np.array) – Array of shape (xvalues,tvalues) containing the standard error of the mean values of the data
- parr (np.array) – Array of shape (xvalues,tvalues) containing the p values for the data
- dtype (str) – String describing sample type
- Pn – Dictionary containing the following values: ‘zln’:2,’zpt’:3,’zfb’:1,’wtc’:’b’,’mtc’:’r’,’alpha’:0.3,’cmap’:’Greys_r’
Type: dict or None
-
deltascope.
write_header
(f)[source]¶ Writes header for PSI file with columns Id,x,y,z,ac,r,theta
Parameters: f (file) – file object created by ‘open(filename,’w’)`
-
deltascope.
write_data
(filepath, df)[source]¶ Writes data in PSI format to file after writing header using
write_header()
. Closes file at the conclusion of writing data.Parameters: - filepath (str) – Complete filepath to output file
- df (pd.DataFrame) – dataframe containing columns x,y,z,ac,r,theta
-
deltascope.
read_psi
(filepath)[source]¶ Reads psi file at the given filepath and returns data in a pandas DataFrame
Parameters: filepath (str) – Complete filepath to file Returns: pd.Dataframe containing data
-
deltascope.
read_psi_to_dict
(directory, dtype)[source]¶ Read psis from directory into dictionary of dfs with filtering based on dtype
Parameters: - directory (str) – Directory to get psis from
- dtype (str) – Usually ‘AT’ or ‘ZRF1’
Returns: Dictionary of pd.DataFrame
-
deltascope.
process_sample
(num, root, outdir, name, chs, prefixes, threshold, scale, deg, primary_key, comp_order, fit_dim, flip_dim)[source]¶ Process single sample through
brain
class and saves df to csvWarning
Out of date and will probably fail
Parameters: - num (str) – Sample number
- root (str) – Complete path to the root directory for this sample set
- name (str) – Name describing this sample set
- outdir (str) – Complete path to output directory
- chs (array) – Array containing strings specifying the directories for each channel
- prefixes (array) – Array containing strings specifying the file prefix for each channel
- threshold (float) – Value between 0 and 1 to use as a cutoff for minimum pixel value
- scale (array) – Array with three values representing the constant by which to multiply x,y,z respectively
- deg (int) – Degree of the function that should be fit to the model
- primary_key (str) – Key for the primary structural channel which PCA and the model should be fit too
-
deltascope.
calculate_models
(Ldf)[source]¶ Calculate model for each dataframe in list and add to new dataframe
Parameters: Ldf (list) – List of dataframes containing aligned data Returns: pd.Dataframe with a,b,c values for parabolic model
-
deltascope.
generate_kde
(data, var, x, absv=False)[source]¶ Generate list of KDEs from either dictionary or list of data
Parameters: - data – pd.DataFrames to convert
- var (str) – Name of column to select from df
- x (array) – Array of datapoints to evaluate KDE on
- absv (bool) – (or None) Set to True to use absolute value of selected data for KDE calculation
Type: dict or list
Returns: List of KDE arrays
-
deltascope.
calculate_area_error
(pdf, Lkde, x)[source]¶ Calculate area between PDF and each kde in Lkde
Parameters: - pdf (array) – Array of probability distribution function that is the same shape as kdes in Lkde
- Lkde (list) – List of arrays of Kdes
- x (array) – Array of datapoints used to generate pdf and kdes
Returns: List of error values for each kde in Lkde
-
deltascope.
rescale_variable
(Ddfs, var, newvar)[source]¶ Rescale variable from -1 to 1 and save in newvar column on original dataframe
Parameters: - Ddfs (dict) – Dictionary of pd.DataFrames
- var (str) – Name of column to select from dfs
- newvar (str) – Name to use for new data in appended column
Returns: Dictionary of dataframes containing column of rescaled data
-
class
deltascope.
paramsClass
(path=None, dparams=None)[source]¶ A class to read and validate parameters for multiprocessing transformation. Validated parameters can be read as attributes of the object