clusttraj package
Submodules
clusttraj.classify module
Functions to perform clustering based on the RMSD matrix.
- clusttraj.classify.classify_structures(clust_opt: ClustOptions, distmat: ndarray) → Tuple[ndarray, ndarray]
Classify structures based on clustering options and RMSD matrix.
- Parameters:
clust_opt – The clustering options.
distmat – The RMSD matrix.
- Returns:
A tuple containing the linkage matrix and the clusters.
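A minimal sketch of hierarchical clustering on a condensed RMSD matrix, using SciPy's linkage and fcluster. It assumes that the method option names a SciPy linkage method and that min_rmsd acts as the distance cut-off for the flat clustering. The helper name classify_sketch is hypothetical, and this is an illustration of the technique rather than clusttraj's own code.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def classify_sketch(distmat: np.ndarray, method: str = "average", min_rmsd: float = 1.0):
    """Hierarchically cluster a condensed distance matrix and cut it at min_rmsd.

    Assumption: `method` is a SciPy linkage method and `min_rmsd` is the
    distance threshold used to form flat clusters.
    """
    # The linkage matrix Z encodes the merge tree: n - 1 rows, 4 columns.
    Z = linkage(distmat, method=method)
    # Keep merges whose cophenetic distance does not exceed the threshold.
    clusters = fcluster(Z, min_rmsd, criterion="distance")
    return Z, clusters


# Toy data: two well-separated groups of 1D "structures".
points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
distmat = pdist(points)  # condensed form, length n * (n - 1) / 2
Z, clusters = classify_sketch(distmat, method="average", min_rmsd=1.0)
```

Note that SciPy's "centroid", "median" and "ward" methods assume Euclidean distances, which an RMSD matrix need not be.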
- clusttraj.classify.classify_structures_silhouette(clust_opt: ClustOptions, distmat: ndarray, dstep: float = 0.1) → Tuple[ndarray, ndarray]
Find the threshold that maximizes the silhouette score and perform the classification with it.
- Parameters:
clust_opt – The clustering options.
distmat – The RMSD matrix.
dstep (float, optional) – Interval between threshold values, defaults to 0.1
- Returns:
A tuple containing the linkage matrix and the clusters.
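The silhouette-based selection can be sketched as a scan over thresholds spaced dstep apart, keeping the cut with the highest mean silhouette coefficient. This illustrates the technique under stated assumptions (average linkage, thresholds starting at dstep) and is not the library's implementation. The helpers mean_silhouette and silhouette_threshold_scan are hypothetical; scikit-learn's silhouette_score with metric="precomputed" could replace the hand-written scoring.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def mean_silhouette(dist_sq: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient computed from a square distance matrix."""
    scores = np.zeros(len(labels))
    for i, label in enumerate(labels):
        same = labels == label
        if same.sum() == 1:
            continue  # singleton clusters get s(i) = 0 by convention
        # a(i): mean distance to the other members of the same cluster
        a = dist_sq[i, same].sum() / (same.sum() - 1)
        # b(i): smallest mean distance to the members of another cluster
        b = min(dist_sq[i, labels == other].mean()
                for other in np.unique(labels) if other != label)
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())


def silhouette_threshold_scan(distmat: np.ndarray, method: str = "average", dstep: float = 0.1):
    """Try thresholds dstep, 2 * dstep, ... and keep the best-scoring cut."""
    Z = linkage(distmat, method=method)
    dist_sq = squareform(distmat)
    n = dist_sq.shape[0]
    best_score, best_t, best_labels = -np.inf, None, None
    # Candidate thresholds run up to the height of the last merge.
    for t in np.arange(dstep, Z[:, 2].max() + dstep, dstep):
        labels = fcluster(Z, t, criterion="distance")
        k = len(np.unique(labels))
        if not 2 <= k <= n - 1:
            continue  # the silhouette score is only defined for 2 <= k <= n - 1
        score = mean_silhouette(dist_sq, labels)
        if score > best_score:
            best_score, best_t, best_labels = score, t, labels
    return Z, best_labels, best_t, best_score


points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
Z, labels, threshold, score = silhouette_threshold_scan(pdist(points), dstep=0.1)
```

A smaller dstep tests more candidate cuts at the cost of more silhouette evaluations.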
- clusttraj.classify.find_medoids_from_clusters(distmat: ndarray, clusters: ndarray) → ndarray
Find the medoids of the clusters.
- Parameters:
distmat – The RMSD matrix.
clusters – The clusters.
- Returns:
The indices of the medoids.
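A common definition of a cluster medoid is the member with the smallest summed distance to the other members of its cluster. The sketch below assumes that definition and a condensed input matrix; medoids_sketch is a hypothetical helper whose output follows the sorted order of the cluster labels.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def medoids_sketch(distmat: np.ndarray, clusters: np.ndarray) -> np.ndarray:
    """Return the index of the medoid of each cluster, in sorted label order."""
    dist_sq = squareform(distmat)  # condensed -> square
    medoids = []
    for label in np.unique(clusters):
        members = np.flatnonzero(clusters == label)
        # Summed distance from each member to every member of the same cluster.
        intra = dist_sq[np.ix_(members, members)].sum(axis=1)
        medoids.append(members[np.argmin(intra)])
    return np.array(medoids)


points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
clusters = np.array([1, 1, 1, 2, 2])
medoids = medoids_sketch(pdist(points), clusters)
```

Ties are broken by np.argmin, which picks the lowest frame index.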
- clusttraj.classify.sum_distmat(distmat: ndarray) → ndarray
Sum the RMSD matrix.
- Parameters:
distmat – The RMSD matrix.
- Returns:
The sum of the RMSD matrix.
clusttraj.distmat module
Functions to compute the RMSD matrix based on the provided trajectory.
- clusttraj.distmat.build_distance_matrix(clust_opt: ClustOptions) → ndarray
Compute the RMSD matrix.
- Parameters:
clust_opt (ClustOptions) – The options for clustering.
- Returns:
The computed RMSD matrix.
- Return type:
np.ndarray
- clusttraj.distmat.compute_distmat_line(idx1: int, q_info: tuple, trajfile: str, noh: bool, reorder: Callable[[ndarray, ndarray, ndarray, ndarray], ndarray] | None, reorder_solvent_only: bool, nsatoms: int, weight_solute: float, reorderexcl: ndarray, final_kabsch: bool) → List[float]
Compute the RMSD between molecule idx1 and every molecule with index idx2 > idx1.
- Parameters:
idx1 (int) – The index of the first molecule.
q_info (tuple) – Tuple containing the atomic numbers and the coordinates of all atoms of the first molecule.
trajfile (str) – The path to the trajectory file.
noh (bool) – Whether to exclude hydrogen atoms.
reorder (Union[Callable[[np.ndarray, np.ndarray, np.ndarray, np.ndarray], np.ndarray], None]) – A function to reorder the atoms, if necessary.
reorder_solvent_only (bool) – Whether to reorder only the solvent atoms.
nsatoms (int) – The number of atoms in the solute.
weight_solute (float) – The weight of the solute atoms in the RMSD calculation.
reorderexcl (np.ndarray) – The array defining the atoms excluded during reordering.
final_kabsch (bool) – Whether to perform the final Kabsch rotation.
- Returns:
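The RMSDs between molecule idx1 and every molecule with a higher index, i.e. the idx1 segment of the condensed RMSD matrix.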
- Return type:
List[float]
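Each call fills one segment of the condensed matrix: the RMSDs from molecule idx1 to every later molecule. The sketch below assumes plain Kabsch superposition on (N, 3) coordinate arrays and ignores hydrogen removal, atom reordering and solute weighting; kabsch_rmsd and distmat_row are hypothetical names, not functions of the package.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    # Remove translation by centering both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the covariance matrix yields the optimal rotation.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and compute the root-mean-square deviation.
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def distmat_row(idx1: int, frames: list) -> list:
    """RMSDs between frame idx1 and every frame with a higher index."""
    return [kabsch_rmsd(frames[idx1], frames[idx2]) for idx2 in range(idx1 + 1, len(frames))]


rng = np.random.default_rng(0)
ref = rng.normal(size=(6, 3))
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
# Frame 1 is a rotated and translated copy of frame 0; frame 2 is a noisy copy.
frames = [ref, ref @ rot.T + 1.5, ref + rng.normal(scale=0.1, size=ref.shape)]
# Concatenating the rows in idx1 order gives the condensed matrix.
condensed = np.concatenate([distmat_row(i, frames) for i in range(len(frames))])
```

Because each row is independent of the others, rows can be distributed across worker processes and concatenated afterwards.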
- clusttraj.distmat.get_distmat(clust_opt: ClustOptions) → ndarray
Calculate or read a condensed RMSD matrix based on the given clustering options.
- Parameters:
clust_opt (ClustOptions) – The clustering options.
- Returns:
The condensed RMSD matrix.
- Return type:
np.ndarray
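A condensed matrix stores only the upper triangle of the symmetric n x n RMSD matrix, row by row, in n(n - 1)/2 entries. Assuming it follows SciPy's squareform layout (which matches the idx2 > idx1 ordering described for compute_distmat_line), the position of any pair can be computed directly; condensed_index is a hypothetical helper.

```python
import numpy as np
from scipy.spatial.distance import squareform


def condensed_index(i: int, j: int, n: int) -> int:
    """Position of the pair (i, j), i != j, in a condensed matrix of n items."""
    if i > j:
        i, j = j, i  # the full matrix is symmetric
    # Rows 0..i-1 contribute n - 1, n - 2, ... entries before row i starts.
    return n * i - i * (i + 1) // 2 + (j - i - 1)


n = 4
square = np.array([[0, 1, 2, 3],
                   [1, 0, 4, 5],
                   [2, 4, 0, 6],
                   [3, 5, 6, 0]], dtype=float)
condensed = squareform(square)  # -> [1. 2. 3. 4. 5. 6.]
assert all(condensed[condensed_index(i, j, n)] == square[i, j]
           for i in range(n) for j in range(n) if i != j)
```

The condensed form roughly halves the memory of the square matrix, which matters for long trajectories.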
clusttraj.io module
Input parsing, output information and a class to store the options for clustering.
- class clusttraj.io.ClustOptions(trajfile: str | None = None, min_rmsd: float | None = None, n_workers: int | None = None, method: str | None = None, reorder_alg_name: str | None = None, reorder_alg: Callable[[ndarray, ndarray, ndarray, ndarray], ndarray] | None = None, out_conf_fmt: str | None = None, reorder: bool | None = None, reorder_solvent_only: bool | None = None, exclusions: bool | None = None, no_hydrogen: bool | None = None, input_distmat: bool | None = None, save_confs: bool | None = None, plot: bool | None = None, opt_order: bool | None = None, overwrite: bool | None = None, final_kabsch: bool | None = None, silhouette_score: bool | None = None, metrics: bool | None = None, distmat_name: str | None = None, out_clust_name: str | None = None, evo_name: str | None = None, mds_name: str | None = None, dendrogram_name: str | None = None, out_conf_name: str | None = None, summary_name: str | None = None, solute_natoms: int | None = None, weight_solute: float | None = None, reorder_excl: ndarray | None = None, optimal_cut: ndarray | None = None, verbose: bool | None = None)
Bases: object
Class to store the options for clustering.
- dendrogram_name: str = None
- distmat_name: str = None
- evo_name: str = None
- exclusions: bool = None
- final_kabsch: bool = None
- input_distmat: bool = None
- mds_name: str = None
- method: str = None
- metrics: bool = None
- min_rmsd: float = None
- n_workers: int = None
- no_hydrogen: bool = None
- opt_order: bool = None
- optimal_cut: ndarray = None
- out_clust_name: str = None
- out_conf_fmt: str = None
- out_conf_name: str = None
- overwrite: bool = None
- plot: bool = None
- reorder: bool = None
- reorder_alg: Callable[[ndarray, ndarray, ndarray, ndarray], ndarray] = None
- reorder_alg_name: str = None
- reorder_excl: ndarray = None
- reorder_solvent_only: bool = None
- save_confs: bool = None
- silhouette_score: bool = None
- solute_natoms: int = None
- summary_name: str = None
- trajfile: str = None
- update(new: dict) → None
Update the instance with new values.
- Parameters:
new (dict) – A dictionary containing the new values to update.
- Returns:
None
- verbose: bool = None
- weight_solute: float = None
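The attribute listing above, with every field defaulting to None, is consistent with a dataclass, and update can then be sketched as overwriting fields by name. OptionsSketch is a hypothetical, trimmed-down stand-in with three of the fields; raising KeyError on unknown keys is a choice made for this sketch, not documented behavior.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class OptionsSketch:
    """Hypothetical, trimmed-down stand-in for ClustOptions (three fields only)."""

    trajfile: Optional[str] = None
    min_rmsd: Optional[float] = None
    n_workers: Optional[int] = None

    def update(self, new: dict) -> None:
        """Overwrite the named fields with the values in `new`."""
        known = {f.name for f in fields(self)}
        for key, value in new.items():
            # Rejecting unknown keys is an assumption of this sketch.
            if key not in known:
                raise KeyError(f"unknown option: {key}")
            setattr(self, key, value)


opts = OptionsSketch(trajfile="traj.xyz")
opts.update({"min_rmsd": 1.0, "n_workers": 4})
```

Checking against the declared fields, rather than with hasattr, prevents a key such as "update" from overwriting the method itself.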
- class clusttraj.io.Logger
Bases: object
Logger class.
- formatter = <logging.Formatter object>
- logformat = '%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] <%(funcName)s> %(message)s'
- logger = <Logger clusttraj.io (WARNING)>
- classmethod setup(logfile: str) → None
Set up the logger.
- Parameters:
logfile (str) – The path to the log file.
- Returns:
None
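The Logger attributes above pair a shared logging.Formatter with a module-level logger. A minimal sketch of setup under that reading attaches a file handler that uses the listed logformat; the INFO level, the overwrite file mode and the LoggerSketch name are assumptions, not the package's documented behavior.

```python
import logging
import os
import tempfile


class LoggerSketch:
    """Hypothetical stand-in mirroring the class attributes listed for Logger."""

    logformat = "%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] <%(funcName)s> %(message)s"
    formatter = logging.Formatter(fmt=logformat)
    logger = logging.getLogger("clusttraj_sketch")

    @classmethod
    def setup(cls, logfile: str) -> None:
        """Send log records to `logfile` using the shared formatter."""
        # Assumed level and file mode; the real setup may choose differently.
        cls.logger.setLevel(logging.INFO)
        handler = logging.FileHandler(logfile, mode="w")
        handler.setFormatter(cls.formatter)
        cls.logger.addHandler(handler)


logfile = os.path.join(tempfile.mkdtemp(), "clusttraj.log")
LoggerSketch.setup(logfile)
LoggerSketch.logger.info("Clustering started")
with open(logfile) as fh:
    content = fh.read()
```

Keeping the formatter and logger as class attributes lets every module share one configuration after a single setup call.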
- clusttraj.io.check_positive(value: str) → int
Check if the given value is a positive integer.
- Parameters:
value (str) – The value to be checked.
- Raises:
argparse.ArgumentTypeError – If the value is not a positive integer.
- Returns:
The converted positive integer value.
- Return type:
int
- clusttraj.io.configure_runtime(args_in: List[str]) → ClustOptions
Configure the runtime based on command line arguments.
- Parameters:
args_in (List[str]) – The command line arguments.
- Returns:
An instance of the ClustOptions class with the parsed options.
- Return type:
ClustOptions
- clusttraj.io.extant_file(x: str) → str
Check if a file exists.
- Parameters:
x (str) – The file path to check.
- Raises:
argparse.ArgumentTypeError – If the file does not exist.
- Returns:
The input file path if it exists.
- Return type:
str
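check_positive and extant_file follow the argparse convention for type= callables: convert the string or raise argparse.ArgumentTypeError, which argparse reports as a usage error before exiting. The sketch below assumes that pattern; the function names and the trajfile and --workers arguments are hypothetical, not clusttraj's actual command-line interface.

```python
import argparse
import os
import tempfile


def check_positive_sketch(value: str) -> int:
    """argparse `type=` callable that accepts only positive integers."""
    try:
        ivalue = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value} is not an integer") from None
    if ivalue <= 0:
        raise argparse.ArgumentTypeError(f"{value} is not a positive integer")
    return ivalue


def extant_file_sketch(x: str) -> str:
    """argparse `type=` callable that accepts only paths that exist."""
    if not os.path.exists(x):
        raise argparse.ArgumentTypeError(f"{x} does not exist")
    return x


# Hypothetical parser: the argument names are illustrative only.
parser = argparse.ArgumentParser()
parser.add_argument("trajfile", type=extant_file_sketch)
parser.add_argument("--workers", type=check_positive_sketch, default=1)

with tempfile.NamedTemporaryFile(suffix=".xyz", delete=False) as tmp:
    traj_path = tmp.name
args = parser.parse_args([traj_path, "--workers", "4"])
```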
- clusttraj.io.parse_args(args: Namespace) → ClustOptions
Parse all the information from the argument parser and store it in a ClustOptions instance.
Also define the output file names and set the pointers to the appropriate functions.
- Parameters:
args (Namespace) – The arguments parsed from the argument parser.
- Returns:
An instance of the ClustOptions class with the parsed options.
- Return type:
ClustOptions
- clusttraj.io.save_clusters_config(trajfile: str, clusters: ndarray, distmat: ndarray, noh: bool, reorder: Callable[[ndarray, ndarray, ndarray, ndarray], ndarray] | None, reorder_solvent_only: bool, nsatoms: int, weight_solute: float, outbasename: str, outfmt: str, reorderexcl: ndarray, final_kabsch: bool, overwrite: bool) → None
Save the configurations of each cluster after optimal superposition. The first configuration of each cluster is its medoid.
- Parameters:
trajfile – The trajectory file path.
clusters – An array containing cluster labels.
distmat – The RMSD matrix.
noh – Flag indicating whether to exclude hydrogen atoms.
reorder (Union[Callable[[np.ndarray, np.ndarray, np.ndarray, np.ndarray], np.ndarray], None]) – A function to reorder the atoms, if necessary.
reorder_solvent_only – Flag indicating whether to reorder only the solvent atoms.
nsatoms – The number of atoms in the solute.
weight_solute – The weight of the solute atoms in the RMSD calculation.
outbasename – The base name for the output files.
outfmt – The output file format.
reorderexcl – An array of atom indices to exclude during reordering.
final_kabsch – Flag indicating whether to perform a final Kabsch rotation.
overwrite – Flag indicating whether to overwrite existing output files.
- Returns:
None
clusttraj.main module
Main entry point for clusttraj.
It can be called from the command line or from an external library by passing a list of arguments.
- clusttraj.main.main(args: List[str] | None = None) → None
Main function that performs clustering and generates output.
- Parameters:
args (List) – List of command-line arguments. Defaults to None.
- Returns:
None
clusttraj.plot module
Functions to plot the obtained results.
- clusttraj.plot.plot_clust_evo(clust_opt: ClustOptions, clusters: ndarray) → None
Plot the evolution of cluster classification over the given samples.
- Parameters:
clust_opt (ClustOptions) – The clustering options.
clusters (np.ndarray) – The cluster classifications for each sample.
- Returns:
None
- clusttraj.plot.plot_dendrogram(clust_opt: ClustOptions, clusters: ndarray, Z: ndarray) → None
Plot a dendrogram based on hierarchical clustering.
- Parameters:
clust_opt (ClustOptions) – The options for clustering.
clusters (np.ndarray) – The cluster labels.
Z (np.ndarray) – The linkage matrix.
- Returns:
None
- clusttraj.plot.plot_mds(clust_opt: ClustOptions, clusters: ndarray, distmat: ndarray) → None
Plot the multidimensional scaling (MDS) of the RMSD matrix.
- Parameters:
clust_opt (ClustOptions) – The clustering options.
clusters (np.ndarray) – The cluster labels.
distmat (np.ndarray) – The RMSD matrix.
- Returns:
None
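One way to obtain 2D coordinates from a distance matrix is classical (Torgerson) MDS, sketched below with NumPy. The library may use a different variant (for example an iterative metric MDS), so treat this as an illustration of the idea; classical_mds is a hypothetical helper. RMSD matrices are generally not Euclidean, in which case the embedding is only approximate.

```python
import numpy as np


def classical_mds(dist_sq: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS from a square, symmetric distance matrix."""
    n = dist_sq.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, with J = I - (1/n) 11^T.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (dist_sq ** 2) @ J
    # The largest eigenpairs of B give the low-dimensional coordinates.
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip small negative eigenvalues from round-off or non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


rng = np.random.default_rng(1)
original = rng.normal(size=(6, 2))
dist_sq = np.linalg.norm(original[:, None, :] - original[None, :, :], axis=-1)
coords = classical_mds(dist_sq, n_components=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

For exactly Euclidean 2D input, the recovered pairwise distances match the original ones up to round-off; the coordinates themselves are only determined up to rotation and reflection.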
- clusttraj.plot.plot_tsne(clust_opt: ClustOptions, clusters: ndarray, distmat: ndarray) → None
Plot a 2D t-distributed stochastic neighbor embedding (t-SNE) of the clustering.
- Parameters:
clust_opt (ClustOptions) – The clustering options.
clusters (np.ndarray) – The cluster labels.
distmat (np.ndarray) – The RMSD matrix.
- Returns:
None
clusttraj.utils module
Additional utility functions.
- clusttraj.utils.get_mol_coords(mol: Molecule) → ndarray
Get the coordinates of all atoms in a molecule.
- Parameters:
mol (pybel.Molecule) – The molecule object.
- Returns:
The array of atom coordinates.
- Return type:
np.ndarray
- clusttraj.utils.get_mol_info(mol: Molecule) → Tuple[ndarray, ndarray]
Get the atomic numbers and coordinates of all atoms in a molecule.
- Parameters:
mol (pybel.Molecule) – The molecule object.
- Returns:
The array of atomic numbers and the array of atom coordinates.
- Return type:
Tuple[np.ndarray, np.ndarray]
Module contents
- clusttraj.main(args: List[str] | None = None) → None
Main function that performs clustering and generates output.
- Parameters:
args (List) – List of command-line arguments. Defaults to None.
- Returns:
None