Function Reference
- class hotspot.Hotspot(adata, layer_key=None, model='danb', latent_obsm_key=None, distances_obsp_key=None, tree=None, umi_counts_obs_key=None)
Initialize a Hotspot object for analysis
One of latent_obsm_key, distances_obsp_key, or tree is required.
- Parameters:
adata (anndata.AnnData) – Count matrix (shape is cells by genes)
layer_key (str) – Key in adata.layers with count data, uses adata.X if None.
model (string, optional) –
Specifies the null model to use for gene expression. Valid choices are:
'danb': Depth-Adjusted Negative Binomial
'bernoulli': Models probability of detection
'normal': Depth-Adjusted Normal
'none': Assumes data has been pre-standardized
latent_obsm_key (string, optional) – Latent space encoding cell-cell similarities with euclidean distances. Shape is (cells x dims). Input is key in adata.obsm
distances_obsp_key (str, optional) – Distances encoding cell-cell similarities directly. Shape is (cells x cells). Input is key in adata.obsp
tree (ete3.coretype.tree.TreeNode) – Root tree node. Can be created using ete3.Tree
umi_counts_obs_key (str) – Total umi count per cell. Used as a size factor. If omitted, the sum over genes in the counts matrix is used
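As a rough illustration of the default size factor described for umi_counts_obs_key, the sketch below sums counts over genes for each cell. It uses plain Python lists in place of an AnnData matrix; the helper name `total_umi_per_cell` and the toy values are made up for illustration and are not part of the hotspot API.

```python
def total_umi_per_cell(counts):
    """Sum counts over genes for each cell (rows are cells, columns are genes)."""
    return [sum(row) for row in counts]


# Toy cells-by-genes count matrix: 3 cells, 4 genes (illustrative values only).
counts = [
    [0, 2, 1, 0],
    [5, 0, 0, 3],
    [1, 1, 1, 1],
]

size_factors = total_umi_per_cell(counts)
print(size_factors)
```

Note the orientation: the AnnData input is cells by genes, so the sum runs along each row. For the genes-by-cells matrix accepted by legacy_init, the same quantity would be a column sum.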
- classmethod legacy_init(counts, model='danb', latent=None, distances=None, tree=None, umi_counts=None)
Initialize a Hotspot object for analysis using legacy method
One of latent, distances, or tree is required.
- Parameters:
counts (pandas.DataFrame) – Count matrix (shape is genes x cells)
model (string, optional) –
Specifies the null model to use for gene expression. Valid choices are:
'danb': Depth-Adjusted Negative Binomial
'bernoulli': Models probability of detection
'normal': Depth-Adjusted Normal
'none': Assumes data has been pre-standardized
latent (pandas.DataFrame, optional) – Latent space encoding cell-cell similarities with euclidean distances. Shape is (cells x dims)
distances (pandas.DataFrame, optional) – Distances encoding cell-cell similarities directly. Shape is (cells x cells)
tree (ete3.coretype.tree.TreeNode) – Root tree node. Can be created using ete3.Tree
umi_counts (pandas.Series, optional) – Total umi count per cell. Used as a size factor. If omitted, the sum over genes in the counts matrix is used
Examples
>>> gene_exp = pd.read_csv(path, index_col=0)  # genes by cells
>>> latent = pd.read_csv(latent_path, index_col=0)  # cells by dims
>>> hs = hotspot.Hotspot.legacy_init(gene_exp, model="normal", latent=latent)
- create_knn_graph(weighted_graph=False, n_neighbors=30, neighborhood_factor=3, approx_neighbors=True)
Creates the KNN graph and graph weights
The resulting matrices containing the neighbors and weights are stored in the object at self.neighbors and self.weights
- Parameters:
weighted_graph (bool) – Whether or not to create a weighted graph
n_neighbors (int) – Neighborhood size
neighborhood_factor (float) – Used when creating a weighted graph. Sets how quickly weights decay relative to the distances within the neighborhood. The weight for a cell with a distance d will decay as exp(-d/D) where D is the distance to the n_neighbors/neighborhood_factor-th neighbor.
approx_neighbors (bool) – Use approximate nearest neighbors (True) or exact scikit-learn neighbors (False). Only applies when the Hotspot object was initialized with a latent space.
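The decay rule described for neighborhood_factor can be sketched as follows. This is a minimal illustration of the formula exp(-d/D) exactly as stated above, not the library's implementation, whose kernel and normalization may differ. The helper name `neighbor_weights` and the choice to round n_neighbors/neighborhood_factor to the nearest 1-based rank are assumptions.

```python
import math


def neighbor_weights(distances, n_neighbors, neighborhood_factor):
    """Weights exp(-d / D) for one cell's ascending neighbor distances.

    D is the distance to the (n_neighbors / neighborhood_factor)-th neighbor.
    Assumption: that rank is rounded to an integer and counted from 1.
    """
    rank = max(1, round(n_neighbors / neighborhood_factor))
    bandwidth = distances[rank - 1]
    return [math.exp(-d / bandwidth) for d in distances]


# Distances from one cell to its 6 nearest neighbors, sorted ascending.
distances = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
weights = neighbor_weights(distances, n_neighbors=6, neighborhood_factor=3)
print(weights)
```

Under this reading, a larger neighborhood_factor selects a closer neighbor for D, so weights fall off faster with distance.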
- compute_autocorrelations(jobs=1)
Perform feature selection using local autocorrelation
In addition to returning output, this also stores the output in self.results
- Parameters:
jobs (int) – Number of parallel jobs to run
- Returns:
results –
A dataframe with four columns:
C: Autocorrelation coefficients, scaled to the range -1 to 1
Z: Z-score for autocorrelation
Pval: P-values computed from Z-scores
FDR: Q-values using the Benjamini-Hochberg procedure
Gene ids are in the index
- Return type:
pandas.DataFrame
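To make the relationship between the Z, Pval, and FDR columns concrete, here is a small sketch using only the standard library. It assumes a one-sided (upper-tail) normal test for converting Z-scores to P-values, which this reference does not state, and the function names `z_to_pvalue` and `benjamini_hochberg` are illustrative rather than part of hotspot.

```python
import math


def z_to_pvalue(z):
    """Upper-tail p-value of a standard normal (one-sided test is an assumption)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone in rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


# Toy Z-scores keyed by gene id (illustrative values only).
z_scores = {"geneA": 4.2, "geneB": 0.3, "geneC": 2.5}
pvals = {gene: z_to_pvalue(z) for gene, z in z_scores.items()}
fdr = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))
print(fdr)
```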
- compute_local_correlations(genes, jobs=1)
Define gene-gene relationships with pair-wise local correlations
In addition to returning output, this method stores its result in self.local_correlation_z
- Parameters:
genes (iterable of str) – Gene identifiers to compute local correlations on. Should be a smaller subset of all genes
jobs (int) – Number of parallel jobs to run
- Returns:
local_correlation_z – Local correlation Z-scores between genes. Shape is genes x genes
- Return type:
pandas.DataFrame
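One way to arrive at the smaller gene subset is to filter the autocorrelation results before calling this method. The sketch below is only an illustration: it stands in a plain dict for the results table, and the helper name `select_genes`, the FDR cutoff, and the cap on the number of genes are all hypothetical choices rather than library defaults.

```python
def select_genes(results, fdr_cutoff=0.05, max_genes=500):
    """Pick significant genes, ranked by descending Z, from autocorrelation results.

    `results` maps gene id -> {"Z": ..., "FDR": ...}; both cutoffs are illustrative.
    """
    significant = [gene for gene, row in results.items() if row["FDR"] < fdr_cutoff]
    significant.sort(key=lambda gene: results[gene]["Z"], reverse=True)
    return significant[:max_genes]


# Toy stand-in for the Z and FDR columns of the autocorrelation output.
results = {
    "geneA": {"Z": 12.1, "FDR": 1e-8},
    "geneB": {"Z": 0.4, "FDR": 0.71},
    "geneC": {"Z": 5.3, "FDR": 0.002},
    "geneD": {"Z": 7.9, "FDR": 1e-4},
}

genes = select_genes(results, max_genes=2)
print(genes)
```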
- create_modules(min_gene_threshold=20, core_only=True, fdr_threshold=0.05)
Groups genes into modules
In addition to being returned, the results of this method are retained in the object at self.modules. Additionally, the linkage matrix (in the same form as that of scipy.cluster.hierarchy.linkage) is saved in self.linkage for plotting or manual clustering.
- Parameters:
min_gene_threshold (int) – Controls how small modules can be. Increase if there are too many modules being formed. Decrease if substructure is not being captured
core_only (bool) – If True, ambiguous genes are left unassigned; if False, they are assigned to a module
fdr_threshold (float) – Correlation threshold at which to stop assigning genes to modules
- Returns:
modules – Maps gene to module number. Unassigned genes are indicated with -1
- Return type:
pandas.Series
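The returned gene-to-module mapping can be regrouped into per-module gene lists, dropping the unassigned genes marked with -1. The sketch below uses a plain dict in place of the returned pandas.Series, and the helper name `genes_by_module` is hypothetical.

```python
from collections import defaultdict


def genes_by_module(modules):
    """Group gene ids by module number, skipping unassigned genes (-1)."""
    grouped = defaultdict(list)
    for gene, module in modules.items():
        if module != -1:
            grouped[module].append(gene)
    return dict(grouped)


# Toy gene -> module mapping (illustrative values only).
modules = {"geneA": 1, "geneB": -1, "geneC": 2, "geneD": 1}

grouped = genes_by_module(modules)
print(grouped)
```

Because pandas.Series also provides an items() method yielding (index, value) pairs, the same helper should work unchanged on a real Series.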
- calculate_module_scores()
Calculate Module Scores
In addition to returning its result, this method stores its output in the object at self.module_scores
- Returns:
module_scores – Scores for each module for each cell. Dimensions are cells x modules
- Return type:
pandas.DataFrame
- plot_local_correlations(mod_cmap='tab10', vmin=-8, vmax=8, z_cmap='RdBu_r', yticklabels=False)
Plots a clustergrid of the local correlation values
- Parameters:
mod_cmap (valid matplotlib colormap str or object) – discrete colormap for module assignments on the left side
vmin (float) – minimum value for colorscale for Z-scores
vmax (float) – maximum value for colorscale for Z-scores
z_cmap (valid matplotlib colormap str or object) – continuous colormap for correlation Z-scores
yticklabels (bool) – Whether or not to plot all gene labels. Default is False, as there are usually too many to read. When using this plot interactively, you may wish to set it to True so you can zoom in and read gene names