clusttraj package

Submodules

clusttraj.classify module

Functions to perform clustering based on the RMSD matrix.

clusttraj.classify.classify_structures(clust_opt: ClustOptions, distmat: ndarray) → Tuple[ndarray, ndarray]

Classify structures based on clustering options and RMSD matrix.

Parameters:

clust_opt – The clustering options.
distmat – The RMSD matrix.

Returns:

A tuple containing the linkage matrix and the clusters.
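As a rough illustration, this kind of threshold-based classification can be sketched with SciPy's hierarchical clustering. The sketch assumes the RMSD matrix is in condensed form and that the linkage method and distance cut play roughly the roles of the method and min_rmsd options. classify_sketch is a hypothetical helper, not the clusttraj implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def classify_sketch(distmat, method="average", threshold=1.0):
    """Hypothetical stand-in for classify_structures.

    distmat is a condensed distance matrix; returns (Z, clusters).
    """
    # Agglomerative clustering: Z is the (n - 1) x 4 linkage matrix
    Z = linkage(distmat, method=method)
    # Cut the tree at the distance threshold; labels start at 1
    clusters = fcluster(Z, threshold, criterion="distance")
    return Z, clusters


# Toy RMSD matrix: structures 0/1 and 2/3 form two well-separated pairs
square = np.array([
    [0.0, 0.2, 3.0, 3.1],
    [0.2, 0.0, 3.2, 3.0],
    [3.0, 3.2, 0.0, 0.3],
    [3.1, 3.0, 0.3, 0.0],
])
Z, clusters = classify_sketch(squareform(square), threshold=1.0)
print(clusters)  # one label shared by 0 and 1, another shared by 2 and 3
```

Note that SciPy documents the "centroid", "median" and "ward" methods as correctly defined only for Euclidean distances, so applying them to RMSD values is an approximation.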

clusttraj.classify.classify_structures_silhouette(clust_opt: ClustOptions, distmat: ndarray, dstep: float = 0.1) → Tuple[ndarray, ndarray]

Find the optimal threshold according to the silhouette score and perform the classification.

Parameters:

clust_opt – The clustering options.
distmat – The RMSD matrix.
dstep (float, optional) – Interval between threshold values, defaults to 0.1

Returns:

A tuple containing the linkage matrix and the clusters.
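A minimal sketch of such a threshold scan follows. It assumes thresholds run from dstep up to the highest merge distance in the linkage matrix and that cuts yielding a single cluster (or only singletons) are skipped, since the silhouette score is undefined there. The silhouette is computed directly rather than through a library call, and both helper names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def mean_silhouette(square, labels):
    """Mean silhouette score computed from a square distance matrix."""
    scores = np.zeros(len(labels))
    for i, lab in enumerate(labels):
        same = labels == lab
        size = same.sum()
        if size == 1:
            continue  # singletons score 0, as in scikit-learn
        # a: mean distance to the other members of the same cluster
        a = square[i, same].sum() / (size - 1)
        # b: lowest mean distance to the members of any other cluster
        b = min(square[i, labels == other].mean()
                for other in np.unique(labels) if other != lab)
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return float(scores.mean())


def silhouette_sketch(distmat, method="average", dstep=0.1):
    """Scan cut heights in steps of dstep and keep the best silhouette score."""
    Z = linkage(distmat, method=method)
    square = squareform(distmat)
    n = square.shape[0]
    best_score, best_t, best_labels = -np.inf, None, None
    for t in np.arange(dstep, Z[:, 2].max(), dstep):
        labels = fcluster(Z, t, criterion="distance")
        k = len(np.unique(labels))
        # The silhouette score is only defined for 2 <= k <= n - 1 clusters
        if 2 <= k <= n - 1:
            score = mean_silhouette(square, labels)
            if score > best_score:
                best_score, best_t, best_labels = score, t, labels
    # best_labels stays None if no threshold gave a valid number of clusters
    return Z, best_labels, best_t, best_score
```

A smaller dstep explores more cut heights at the cost of more clustering and scoring passes.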

clusttraj.classify.find_medoids_from_clusters(distmat: ndarray, clusters: ndarray) → ndarray

Find the medoids of the clusters.

Parameters:

distmat – The RMSD matrix.
clusters – The clusters.

Returns:

The indices of the medoids.
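A common definition of a medoid is the cluster member with the smallest total distance to the other members. The sketch below assumes a square distance matrix (a condensed one can be expanded with scipy.spatial.distance.squareform first) and integer labels such as those returned by fcluster. medoids_sketch is a hypothetical name, and the real function may order or break ties differently.

```python
import numpy as np


def medoids_sketch(square, clusters):
    """Index of the medoid of each cluster, in sorted label order."""
    labels = np.unique(clusters)  # sorted cluster labels
    medoids = np.empty(len(labels), dtype=int)
    for pos, lab in enumerate(labels):
        members = np.flatnonzero(clusters == lab)
        # Total distance from each member to all members of its cluster
        within = square[np.ix_(members, members)].sum(axis=1)
        # The medoid minimises that total (ties go to the lowest index)
        medoids[pos] = members[np.argmin(within)]
    return medoids


# Five structures on a line: {0, 1, 2} at x = 0..2, {3, 4} at x = 10..11
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0])
square = np.abs(x[:, None] - x[None, :])
clusters = np.array([1, 1, 1, 2, 2])
print(medoids_sketch(square, clusters))  # [1 3]
```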

clusttraj.classify.sum_distmat(distmat: ndarray) → ndarray

Sum the RMSD matrix.

Parameters:

distmat – The RMSD matrix.

Returns:

The sum of the RMSD matrix.

clusttraj.distmat module

Functions to compute the RMSD matrix based on the provided trajectory.

clusttraj.distmat.build_distance_matrix(clust_opt: ClustOptions) → ndarray

Compute the RMSD matrix.

Parameters:

clust_opt (ClustOptions) – The options for clustering.

Returns:

The computed RMSD matrix.

Return type:

np.ndarray

clusttraj.distmat.compute_distmat_line(idx1: int, q_info: tuple, trajfile: str, noh: bool, reorder: Callable[[ndarray, ndarray, ndarray, ndarray], ndarray] | None, reorder_solvent_only: bool, nsatoms: int, weight_solute: float, reorderexcl: ndarray, final_kabsch: bool) → List[float]

Compute the distance between molecule idx1 and molecules with idx2 > idx1.

Parameters:

idx1 (int) – The index of the first molecule.
q_info (tuple) – Tuple containing the atomic numbers and coordinates of the first molecule.
trajfile (str) – The path to the trajectory file.
noh (bool) – Whether to exclude hydrogen atoms.
reorder (Union[Callable[[np.ndarray, np.ndarray, np.ndarray, np.ndarray], np.ndarray], None]) – A function to reorder the atoms, if necessary.
reorder_solvent_only (bool) – Whether to reorder only the solvent atoms.
nsatoms (int) – The number of atoms in the solute.
weight_solute (float) – The weight of the solute atoms in the RMSD calculation.
reorderexcl (np.ndarray) – The indices of the atoms excluded from reordering.
final_kabsch (bool) – Whether to perform the final Kabsch rotation.

Returns:

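The RMSD values between molecule idx1 and every molecule with idx2 > idx1 (one row of the upper triangle of the RMSD matrix).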

Return type:

List[float]
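The core of each line is an RMSD after optimal superposition. The sketch below uses the standard Kabsch algorithm and deliberately ignores the reordering, hydrogen-removal and solute-weighting options. It assumes the coordinates are a list of (n_atoms, 3) arrays with atoms already in matching order; both helper names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (n_atoms, 3) arrays after optimal rigid superposition."""
    # Remove translation by centring both structures on the origin
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Fix the handedness so that R is a proper rotation (det(R) = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and compare atom by atom
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def distmat_line_sketch(idx1, coords):
    """RMSD from structure idx1 to every structure idx2 > idx1."""
    return [kabsch_rmsd(coords[idx1], coords[idx2])
            for idx2 in range(idx1 + 1, len(coords))]
```

Concatenating distmat_line_sketch(i, coords) for i = 0, ..., n - 2 yields all pairwise RMSDs in row-by-row upper-triangle order.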

clusttraj.distmat.get_distmat(clust_opt: ClustOptions) → ndarray

Calculate or read a condensed RMSD matrix based on the given clustering options.

Parameters:

clust_opt (ClustOptions) – The clustering options.

Returns:

The condensed RMSD matrix.

Return type:

np.ndarray
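A condensed matrix stores only the upper triangle, row by row, which is the order obtained by concatenating the lines for idx2 > idx1. Assuming clusttraj follows SciPy's condensed convention (the form scipy.cluster.hierarchy.linkage expects), conversion and indexing can be sketched as follows; condensed_index is a hypothetical helper.

```python
import numpy as np
from scipy.spatial.distance import squareform

# Lines for 4 structures: line i holds the RMSDs d(i, j) for j > i
lines = [[0.2, 3.0, 3.1], [3.2, 3.0], [0.3]]
condensed = np.concatenate(lines)  # length n * (n - 1) / 2 = 6
square = squareform(condensed)     # symmetric (4, 4) matrix, zero diagonal
roundtrip = squareform(square)     # back to the condensed vector


def condensed_index(i, j, n):
    """Position of d(i, j) in the condensed vector, for i != j."""
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)


print(condensed[condensed_index(1, 2, 4)])  # 3.2
```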

clusttraj.io module

Input parsing, output information and a class to store the options for clustering.

class clusttraj.io.ClustOptions

Bases: object

Class to store the options for clustering.

dendrogram_name: str = None

distmat_name: str = None

evo_name: str = None

exclusions: bool = None

final_kabsch: bool = None

input_distmat: bool = None

mds_name: str = None

method: str = None

metrics: bool = None

min_rmsd: float = None

n_workers: int = None

no_hydrogen: bool = None

opt_order: bool = None

optimal_cut: ndarray = None

out_clust_name: str = None

out_conf_fmt: str = None

out_conf_name: str = None

overwrite: bool = None

plot: bool = None

reorder: bool = None

reorder_alg: Callable[[ndarray, ndarray, ndarray, ndarray], ndarray] = None

reorder_alg_name: str = None

reorder_excl: ndarray = None

reorder_solvent_only: bool = None

save_confs: bool = None

silhouette_score: bool = None

solute_natoms: int = None

summary_name: str = None

trajfile: str = None

update(new: dict) → None

Update the instance with new values.

Parameters:

new (dict) – A dictionary containing the new values to update.

Returns:

None

verbose: bool = None

weight_solute: float = None
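The field list above (name, type, default None) resembles a dataclass. Assuming that, a reduced stand-in with an update method can be sketched as below. OptionsSketch is hypothetical, and rejecting unknown keys is a choice made here, not documented behaviour of ClustOptions.update.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OptionsSketch:
    """Hypothetical stand-in holding a few of the ClustOptions fields."""

    trajfile: Optional[str] = None
    min_rmsd: Optional[float] = None
    n_workers: Optional[int] = None
    verbose: Optional[bool] = None

    def update(self, new: dict) -> None:
        """Set each attribute named in ``new`` to its new value."""
        for key, value in new.items():
            # Reject unknown names so that typos do not silently add attributes
            if not hasattr(self, key):
                raise AttributeError(f"unknown option: {key}")
            setattr(self, key, value)


opts = OptionsSketch(trajfile="traj.xyz")
opts.update({"min_rmsd": 1.5, "n_workers": 4})
```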

class clusttraj.io.Logger

Bases: object

Logger class.

formatter = <logging.Formatter object>

logformat = '%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] <%(funcName)s> %(message)s'

logger = <Logger clusttraj.io (WARNING)>

classmethod setup(logfile: str) → None

Set up the logger.

Parameters:

logfile (str) – The path to the log file.

Returns:

None
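A plain logging equivalent of this setup is sketched below, assuming setup attaches a file handler formatted with the logformat string above to the clusttraj.io logger. The INFO level, the overwrite mode and the name setup_logger_sketch are assumptions.

```python
import logging

LOGFORMAT = (
    "%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] "
    "<%(funcName)s> %(message)s"
)


def setup_logger_sketch(logfile: str) -> logging.Logger:
    """Send records from the clusttraj.io logger to logfile (hypothetical helper)."""
    logger = logging.getLogger("clusttraj.io")
    logger.setLevel(logging.INFO)
    # mode="w" starts a fresh log file on every run
    handler = logging.FileHandler(logfile, mode="w")
    handler.setFormatter(logging.Formatter(LOGFORMAT))
    logger.addHandler(handler)
    return logger
```

Calling such a function twice attaches a second handler and duplicates every record, so a production setup routine would usually guard against that.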

clusttraj.io.check_positive(value: str) → int

Check if the given value is a positive integer.

Parameters:

value (str) – The value to be checked.

Raises:

argparse.ArgumentTypeError – If the value is not a positive integer.

Returns:

The converted positive integer value.

Return type:

int
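Validators of this kind are typically passed as the type= argument of argparse.ArgumentParser.add_argument, where an ArgumentTypeError turns into a clean usage error. A minimal sketch, assuming "positive" means strictly greater than zero; the function and option names are hypothetical.

```python
import argparse


def check_positive_sketch(value: str) -> int:
    """argparse type callable that accepts only integers greater than zero."""
    try:
        ivalue = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not an integer") from None
    if ivalue <= 0:
        raise argparse.ArgumentTypeError(f"{value!r} is not a positive integer")
    return ivalue


parser = argparse.ArgumentParser()
# "--workers" is a hypothetical option name used only for this illustration
parser.add_argument("--workers", type=check_positive_sketch, default=1)
args = parser.parse_args(["--workers", "4"])
print(args.workers)  # 4
```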

clusttraj.io.configure_runtime(args_in: List[str]) → ClustOptions

Configure the runtime based on command line arguments.

Parameters:

args_in (List[str]) – The command line arguments.

Returns:

An instance of the ClustOptions class with the parsed options.

Return type:

ClustOptions

clusttraj.io.extant_file(x: str) → str

Check if a file exists.

Parameters:

x (str) – The file path to check.

Raises:

argparse.ArgumentTypeError – If the file does not exist.

Returns:

The input file path if it exists.

Return type:

str
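A sketch of the same check, assuming only existence is tested with os.path.exists (so directories also pass); if the real function requires a regular file, os.path.isfile would be the stricter choice. The function and argument names are hypothetical.

```python
import argparse
import os


def extant_file_sketch(x: str) -> str:
    """argparse type callable that accepts only existing paths."""
    if not os.path.exists(x):
        raise argparse.ArgumentTypeError(f"{x} does not exist")
    return x


parser = argparse.ArgumentParser()
# Positional argument name chosen for this illustration only
parser.add_argument("trajfile", type=extant_file_sketch)
```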

clusttraj.io.parse_args(args: Namespace) → ClustOptions

Parse all the information from the argument parser and store it in an instance of the ClustOptions class.

Also define the file names and set references to the appropriate functions.

Parameters:

args (Namespace) – The arguments parsed from the argument parser.

Returns:

An instance of the ClustOptions class with the parsed options.

Return type:

ClustOptions

clusttraj.io.save_clusters_config(trajfile: str, clusters: ndarray, distmat: ndarray, noh: bool, reorder: Callable[[ndarray, ndarray, ndarray, ndarray], ndarray] | None, reorder_solvent_only: bool, nsatoms: int, weight_solute: float, outbasename: str, outfmt: str, reorderexcl: ndarray, final_kabsch: bool, overwrite: bool) → None

Save the best-superposed configurations for each cluster. The first configuration of each cluster is its medoid.

Parameters:

trajfile – The trajectory file path.
clusters – An array containing cluster labels.
distmat – The RMSD matrix.
noh – Flag indicating whether to exclude hydrogen atoms.
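reorder (Union[Callable[[np.ndarray, np.ndarray, np.ndarray, np.ndarray], np.ndarray], None]) – A function to reorder the atoms, if necessary.
reorder_solvent_only – Flag indicating whether to reorder only the solvent atoms.
nsatoms – The number of atoms in the solute.
weight_solute – The weight of the solute atoms in the RMSD calculation.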
outbasename – The base name for the output files.
outfmt – The output file format.
reorderexcl – An array of atom indices to exclude during reordering.
final_kabsch – Flag indicating whether to perform a final Kabsch rotation.
overwrite – Flag indicating whether to overwrite existing output files.

Returns:

None

clusttraj.main module

Main entry point for clusttraj.

Can be called from the command line or from external code by passing a list of arguments.

clusttraj.main.main(args: List[str] | None = None) → None

Main function that performs clustering and generates output.

Parameters:

args (List) – List of command-line arguments. Defaults to None.

Returns:

None

clusttraj.plot module

Functions to plot the obtained results.

clusttraj.plot.plot_clust_evo(clust_opt: ClustOptions, clusters: ndarray) → None

Plot the evolution of cluster classification over the given samples.

Parameters:

clust_opt (ClustOptions) – The clustering options.
clusters (np.ndarray) – The cluster classifications for each sample.

Returns:

None

clusttraj.plot.plot_dendrogram(clust_opt: ClustOptions, clusters: ndarray, Z: ndarray) → None

Plot a dendrogram based on hierarchical clustering.

Parameters:

clust_opt (ClustOptions) – The options for clustering.
clusters (np.ndarray) – The cluster labels.
Z (np.ndarray) – The linkage matrix.

Returns:

None
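The drawing itself is likely delegated to a dendrogram routine. A minimal sketch with scipy.cluster.hierarchy.dendrogram follows, assuming the colour cut is placed at the clustering threshold; whether plot_dendrogram does exactly this is not stated.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import squareform

square = np.array([
    [0.0, 0.2, 3.0, 3.1],
    [0.2, 0.0, 3.2, 3.0],
    [3.0, 3.2, 0.0, 0.3],
    [3.1, 3.0, 0.3, 0.0],
])
threshold = 1.0
Z = linkage(squareform(square), method="average")
clusters = fcluster(Z, threshold, criterion="distance")

# no_plot=True only computes the layout; drop it (optionally passing
# ax=<matplotlib Axes>) to draw. Links below color_threshold share a colour.
tree = dendrogram(Z, color_threshold=threshold, no_plot=True)
print(tree["ivl"])  # leaf labels in left-to-right plotting order
```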

clusttraj.plot.plot_mds(clust_opt: ClustOptions, clusters: ndarray, distmat: ndarray) → None

Plot the multidimensional scaling (MDS) of the RMSD matrix.

Parameters:

clust_opt (ClustOptions) – The clustering options.
clusters (np.ndarray) – The cluster labels.
distmat (np.ndarray) – The RMSD matrix.

Returns:

None
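For intuition, a 2D MDS embedding can be computed from the RMSD matrix alone. The sketch below uses classical (Torgerson) MDS via an eigendecomposition, which may differ from the MDS algorithm plot_mds actually uses (for example, an iterative SMACOF solver). It assumes a square matrix, and classical_mds is a hypothetical name. A scatter plot of the two returned columns, coloured by cluster label, would then give the figure.

```python
import numpy as np


def classical_mds(square, n_components=2):
    """Classical MDS embedding of a square distance matrix."""
    n = square.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (square ** 2) @ J
    # eigh handles symmetric matrices and returns ascending eigenvalues
    evals, evecs = np.linalg.eigh(B)
    top = np.argsort(evals)[::-1][:n_components]
    # Negative eigenvalues (round-off or non-Euclidean input) are clipped to 0
    scale = np.sqrt(np.clip(evals[top], 0.0, None))
    return evecs[:, top] * scale
```

For distances that come from points in 2D, the embedding reproduces them exactly up to rotation and reflection; RMSD matrices are generally not Euclidean, so some distortion is expected.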

clusttraj.plot.plot_tsne(clust_opt: ClustOptions, clusters: ndarray, distmat: ndarray) → None

Plot a 2D t-distributed Stochastic Neighbor Embedding (t-SNE) of the clustering.

Parameters:

clust_opt (ClustOptions) – The clustering options.
clusters (np.ndarray) – The cluster labels.
distmat (np.ndarray) – The RMSD matrix.

Returns:

None

clusttraj.utils module

Additional utility functions.

clusttraj.utils.get_mol_coords(mol: Molecule) → ndarray

Get the coordinates of all atoms in a molecule.

Parameters:

mol (pybel.Molecule) – The molecule object.

Returns:

The array of atom coordinates.

Return type:

np.ndarray

clusttraj.utils.get_mol_info(mol: Molecule) → Tuple[ndarray, ndarray]

Get the atomic numbers and coordinates of all atoms in a molecule.

Parameters:

mol (pybel.Molecule) – The molecule object.

Returns:

The array of atomic numbers and the array of atom coordinates.

Return type:

Tuple[np.ndarray, np.ndarray]
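In Open Babel's pybel, Molecule.atoms is a list of Atom objects exposing atomicnum and coords. Assuming get_mol_info simply collects those two attributes, a duck-typed sketch looks like this. It is exercised on a stand-in object so that Open Babel is not needed, and mol_info_sketch is a hypothetical name.

```python
from types import SimpleNamespace

import numpy as np


def mol_info_sketch(mol):
    """Return (atomic numbers, coordinates) from a pybel-like molecule."""
    # pybel.Atom exposes atomicnum and coords (an (x, y, z) tuple)
    atomicnums = np.array([atom.atomicnum for atom in mol.atoms], dtype=int)
    coords = np.array([atom.coords for atom in mol.atoms], dtype=float)
    return atomicnums, coords


# A stand-in for a water molecule, so the sketch runs without Open Babel
water = SimpleNamespace(atoms=[
    SimpleNamespace(atomicnum=8, coords=(0.0, 0.0, 0.0)),
    SimpleNamespace(atomicnum=1, coords=(0.757, 0.586, 0.0)),
    SimpleNamespace(atomicnum=1, coords=(-0.757, 0.586, 0.0)),
])
nums, xyz = mol_info_sketch(water)
```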

Module contents

clusttraj.main(args: List[str] | None = None) → None

Main function that performs clustering and generates output.

Parameters:

args (List) – List of command-line arguments. Defaults to None.

Returns:

None