clusttraj package

Submodules

clusttraj.classify module

Functions to perform clustering based on the RMSD matrix.

clusttraj.classify.classify_structures(clust_opt: ClustOptions, distmat: ndarray) → Tuple[ndarray, ndarray]

Classify structures based on clustering options and RMSD matrix.

Parameters:
  • clust_opt – The clustering options.

  • distmat – The RMSD matrix.

Returns:

A tuple containing the linkage matrix and the clusters.
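
As a rough illustration of what such a classification involves, the sketch below clusters a condensed distance matrix with SciPy and cuts the tree at a distance threshold. That this mirrors the package internals is an assumption; the helper classify_sketch is hypothetical, and its method and min_rmsd arguments only borrow the names of the corresponding ClustOptions fields.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def classify_sketch(distmat, method="average", min_rmsd=1.0):
    """Hypothetical helper: cluster a condensed distance matrix."""
    # Build the linkage matrix from the condensed (1-D) distances.
    Z = linkage(distmat, method=method)
    # Cut the tree at min_rmsd: members of one flat cluster have a
    # cophenetic distance no greater than the threshold.
    clusters = fcluster(Z, min_rmsd, criterion="distance")
    return Z, clusters


# Four structures: 0 and 1 are close, 2 and 3 are close.
# Condensed order: d01, d02, d03, d12, d13, d23
distmat = np.array([0.2, 3.0, 3.1, 2.9, 3.0, 0.3])
Z, clusters = classify_sketch(distmat, method="average", min_rmsd=1.0)
```

Here Z has one row per merge and clusters holds one label per structure, matching the tuple described above.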

clusttraj.classify.classify_structures_silhouette(clust_opt: ClustOptions, distmat: ndarray, dstep: float = 0.1) → Tuple[ndarray, ndarray]

Find the optimal threshold following the silhouette score metric and perform the classification.

Parameters:
  • clust_opt – The clustering options.

  • distmat – The RMSD matrix.

  • dstep (float, optional) – Interval between threshold values, defaults to 0.1

Returns:

A tuple containing the linkage matrix and the clusters.
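
The idea of scanning thresholds and keeping the one with the best silhouette score can be sketched as follows. This is not the package's implementation: the scan range (between the smallest and largest merge heights), the hand-written silhouette function, and the helper names mean_silhouette and scan_thresholds are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def mean_silhouette(square, labels):
    """Mean silhouette coefficient from a square distance matrix."""
    scores = []
    for i, lab in enumerate(labels):
        same = labels == lab
        if same.sum() == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = square[i, same].sum() / (same.sum() - 1)
        # b: smallest mean distance to the members of another cluster.
        b = min(square[i, labels == other].mean()
                for other in np.unique(labels) if other != lab)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


def scan_thresholds(distmat, method="average", dstep=0.1):
    """Hypothetical helper: return (best threshold, linkage, clusters)."""
    Z = linkage(distmat, method=method)
    square = squareform(distmat)
    best = (-1.0, None, None)
    # Candidate thresholds span the merge heights in steps of dstep.
    for t in np.arange(Z[:, 2].min(), Z[:, 2].max(), dstep):
        labels = fcluster(Z, t, criterion="distance")
        n_clusters = len(np.unique(labels))
        # The silhouette score needs between 2 and n - 1 clusters.
        if 2 <= n_clusters <= len(labels) - 1:
            score = mean_silhouette(square, labels)
            if score > best[0]:
                best = (score, t, labels)
    return best[1], Z, best[2]


# Condensed order: d01, d02, d03, d12, d13, d23
distmat = np.array([0.2, 3.0, 3.1, 2.9, 3.0, 0.3])
best_t, Z, labels = scan_thresholds(distmat, dstep=0.1)
```

A smaller dstep samples the threshold more finely at the cost of more clustering cuts.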

clusttraj.classify.find_medoids_from_clusters(distmat: ndarray, clusters: ndarray) → ndarray

Find the medoids of the clusters.

Parameters:
  • distmat – The RMSD matrix.

  • clusters – The clusters.

Returns:

The indices of the medoids.
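
For reference, a medoid is the cluster member with the smallest summed distance to the other members. The sketch below shows that definition on a square distance matrix; the helper medoids_sketch is hypothetical, and the assumption of a square (rather than condensed) input is made only to keep the example short.

```python
import numpy as np


def medoids_sketch(square, clusters):
    """Hypothetical helper: one medoid index per cluster label."""
    medoids = []
    for label in np.unique(clusters):
        members = np.where(clusters == label)[0]
        # Pairwise distances restricted to the members of this cluster.
        sub = square[np.ix_(members, members)]
        # The medoid minimises the summed distance to the other members.
        medoids.append(int(members[np.argmin(sub.sum(axis=1))]))
    return np.array(medoids)


square = np.array([
    [0.0, 0.2, 0.4, 3.0],
    [0.2, 0.0, 0.1, 3.0],
    [0.4, 0.1, 0.0, 3.0],
    [3.0, 3.0, 3.0, 0.0],
])
clusters = np.array([1, 1, 1, 2])
medoids = medoids_sketch(square, clusters)
```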

clusttraj.classify.sum_distmat(distmat: ndarray) → ndarray

Sum the RMSD matrix.

Parameters:

distmat – The RMSD matrix.

Returns:

The sum of the RMSD matrix.

clusttraj.distmat module

Functions to compute the RMSD matrix based on the provided trajectory.

clusttraj.distmat.build_distance_matrix(clust_opt: ClustOptions) → ndarray

Compute the RMSD matrix.

Parameters:

clust_opt (ClustOptions) – The options for clustering.

Returns:

The computed RMSD matrix.

Return type:

np.ndarray

clusttraj.distmat.compute_distmat_line(idx1: int, q_info: tuple, trajfile: str, noh: bool, reorder: Callable[[ndarray, ndarray, ndarray, ndarray], ndarray] | None, reorder_solvent_only: bool, nsatoms: int, weight_solute: float, reorderexcl: ndarray, final_kabsch: bool) → List[float]

Compute the distance between molecule idx1 and molecules with idx2 > idx1.

Parameters:
  • idx1 (int) – The index of the first molecule.

  • q_info (tuple) – Tuple containing the atoms and all other information of the first molecule.

  • trajfile (str) – The path to the trajectory file.

  • noh (bool) – Whether to exclude hydrogen atoms.

  • reorder (Union[Callable[[np.ndarray, np.ndarray, np.ndarray, np.ndarray], np.ndarray], None]) – A function to reorder the atoms, if necessary.

  • nsatoms (int) – The number of atoms in the solute.

  • reorderexcl (np.ndarray) – The array defining the excluded atoms during reordering.

  • final_kabsch (bool) – Whether to perform the final Kabsch rotation or not.

Returns:

One line of the RMSD matrix: the distances between molecule idx1 and each molecule with idx2 > idx1.

Return type:

List[float]
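
To show how such lines relate to a condensed matrix, the sketch below builds one line per idx1 and concatenates them. It assumes the lines are joined in order of idx1; toy_distance is a hypothetical stand-in for the RMSD calculation, and condensed_index is an illustrative helper, not part of the package.

```python
from itertools import chain


def condensed_index(i, j, n):
    """Position of the pair (i, j), with i < j, in a condensed matrix."""
    return n * i - i * (i + 1) // 2 + (j - i - 1)


def toy_distance(a, b):
    """Stand-in for an RMSD calculation (hypothetical)."""
    return abs(a - b)


points = [0.0, 1.0, 3.0, 6.0]
n = len(points)
# One line per idx1, holding the distances to every idx2 > idx1.
lines = [[toy_distance(points[i], points[j]) for j in range(i + 1, n)]
         for i in range(n)]
# Concatenating the lines in order gives the condensed matrix.
condensed = list(chain.from_iterable(lines))
```

The last line is empty, and the condensed matrix holds n(n - 1)/2 entries in total.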

clusttraj.distmat.get_distmat(clust_opt: ClustOptions) → ndarray

Calculate or read a condensed RMSD matrix based on the given clustering options.

Parameters:

clust_opt (ClustOptions) – The clustering options.

Returns:

The condensed RMSD matrix.

Return type:

np.ndarray
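
The calculate-or-read pattern can be sketched as below. The storage format (NumPy .npy), the helper get_or_build, and its arguments are assumptions for illustration; the documentation above does not state how the matrix is stored.

```python
import os
import tempfile

import numpy as np


def get_or_build(path, build, read_existing):
    """Hypothetical helper: read a stored matrix or compute and store it."""
    if read_existing:
        # Reuse a matrix written by a previous run.
        return np.load(path)
    distmat = build()
    np.save(path, distmat)
    return distmat


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "distmat.npy")
    # First call computes and stores, second call only reads.
    first = get_or_build(path, lambda: np.array([0.2, 3.0, 0.3]), False)
    second = get_or_build(path, lambda: None, True)
```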

clusttraj.io module

Input parsing, output information and a class to store the options for clustering.

class clusttraj.io.ClustOptions(trajfile: str | None = None, min_rmsd: float | None = None, n_workers: int | None = None, method: str | None = None, reorder_alg_name: str | None = None, reorder_alg: Callable[[ndarray, ndarray, ndarray, ndarray], ndarray] | None = None, out_conf_fmt: str | None = None, reorder: bool | None = None, reorder_solvent_only: bool | None = None, exclusions: bool | None = None, no_hydrogen: bool | None = None, input_distmat: bool | None = None, save_confs: bool | None = None, plot: bool | None = None, opt_order: bool | None = None, overwrite: bool | None = None, final_kabsch: bool | None = None, silhouette_score: bool | None = None, metrics: bool | None = None, distmat_name: str | None = None, out_clust_name: str | None = None, evo_name: str | None = None, mds_name: str | None = None, dendrogram_name: str | None = None, out_conf_name: str | None = None, summary_name: str | None = None, solute_natoms: int | None = None, weight_solute: float | None = None, reorder_excl: ndarray | None = None, optimal_cut: ndarray | None = None, verbose: bool | None = None)

Bases: object

Class to store the options for clustering.

dendrogram_name: str = None
distmat_name: str = None
evo_name: str = None
exclusions: bool = None
final_kabsch: bool = None
input_distmat: bool = None
mds_name: str = None
method: str = None
metrics: bool = None
min_rmsd: float = None
n_workers: int = None
no_hydrogen: bool = None
opt_order: bool = None
optimal_cut: ndarray = None
out_clust_name: str = None
out_conf_fmt: str = None
out_conf_name: str = None
overwrite: bool = None
plot: bool = None
reorder: bool = None
reorder_alg: Callable[[ndarray, ndarray, ndarray, ndarray], ndarray] = None
reorder_alg_name: str = None
reorder_excl: ndarray = None
reorder_solvent_only: bool = None
save_confs: bool = None
silhouette_score: bool = None
solute_natoms: int = None
summary_name: str = None
trajfile: str = None
update(new: dict) → None

Update the instance with new values.

Parameters:

new (dict) – A dictionary containing the new values to update.

Returns:

None
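
A minimal sketch of a dictionary-driven update on an options container is shown below. OptionsSketch is a hypothetical stand-in carrying two of the documented fields, and rejecting unknown keys is a choice of this sketch, not documented behavior of ClustOptions.update.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OptionsSketch:
    """Hypothetical stand-in with two of the documented fields."""
    trajfile: Optional[str] = None
    min_rmsd: Optional[float] = None

    def update(self, new: dict) -> None:
        # Copy each key of the dictionary onto the matching attribute.
        for key, value in new.items():
            if not hasattr(self, key):
                # Sketch-only choice: fail loudly on unknown options.
                raise AttributeError(f"unknown option: {key}")
            setattr(self, key, value)


opts = OptionsSketch(trajfile="traj.xyz")
opts.update({"min_rmsd": 1.5})
```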

verbose: bool = None
weight_solute: float = None
class clusttraj.io.Logger

Bases: object

Logger class.

formatter = <logging.Formatter object>
logformat = '%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] <%(funcName)s> %(message)s'
logger = <Logger clusttraj.io (WARNING)>
classmethod setup(logfile: str) → None

Set up the logger.

Parameters:

logfile (str) – The path to the log file.

Returns:

None
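
The sketch below shows how a log file can be attached to a logger using the logformat string listed above, with the standard logging module only. The helper setup_sketch and the logger name "sketch" are hypothetical; what Logger.setup does beyond taking a log file path is not documented here.

```python
import logging
import os
import tempfile

LOGFORMAT = ("%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d] "
             "<%(funcName)s> %(message)s")


def setup_sketch(logger, logfile):
    """Hypothetical helper: send records to logfile using LOGFORMAT."""
    handler = logging.FileHandler(logfile)
    handler.setFormatter(logging.Formatter(LOGFORMAT))
    logger.addHandler(handler)
    return handler


with tempfile.TemporaryDirectory() as tmp:
    logfile = os.path.join(tmp, "run.log")
    logger = logging.getLogger("sketch")
    logger.setLevel(logging.INFO)
    handler = setup_sketch(logger, logfile)
    logger.info("clustering started")
    # Close the handler so the file is flushed before it is read back.
    logger.removeHandler(handler)
    handler.close()
    with open(logfile) as fh:
        content = fh.read()
```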

clusttraj.io.check_positive(value: str) → int

Check if the given value is a positive integer.

Parameters:

value (str) – The value to be checked.

Raises:

argparse.ArgumentTypeError – If the value is not a positive integer.

Returns:

The converted positive integer value.

Return type:

int
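
Validators of this kind are typically passed to argparse as the type of an argument. The sketch below shows the pattern; positive_int is a hypothetical re-implementation, and "--workers" is an illustrative flag name, not a documented option of the package.

```python
import argparse


def positive_int(value: str) -> int:
    """Hypothetical validator usable as an argparse ``type``."""
    try:
        number = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{value!r} is not an integer")
    if number <= 0:
        raise argparse.ArgumentTypeError(f"{value!r} is not positive")
    return number


parser = argparse.ArgumentParser()
# "--workers" is an illustrative flag name, not a documented option.
parser.add_argument("--workers", type=positive_int, default=1)
args = parser.parse_args(["--workers", "4"])
```

When the validator raises argparse.ArgumentTypeError during parsing, argparse reports the message as a usage error.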

clusttraj.io.configure_runtime(args_in: List[str]) → ClustOptions

Configure the runtime based on command line arguments.

Parameters:

args_in (List[str]) – The command line arguments.

Returns:

The clustering options built from the parsed command line arguments.

Return type:

ClustOptions

clusttraj.io.extant_file(x: str) → str

Check if a file exists.

Parameters:

x (str) – The file path to check.

Raises:

argparse.ArgumentTypeError – If the file does not exist.

Returns:

The input file path if it exists.

Return type:

str

clusttraj.io.parse_args(args: Namespace) → ClustOptions

Parse all the information from the argument parser and store it in a ClustOptions instance.

Define file names and set the pointers to the correct functions.

Parameters:

args (Namespace) – The arguments parsed from the argument parser.

Returns:

An instance of the ClustOptions class with the parsed options.

Return type:

ClustOptions

clusttraj.io.save_clusters_config(trajfile: str, clusters: ndarray, distmat: ndarray, noh: bool, reorder: Callable[[ndarray, ndarray, ndarray, ndarray], ndarray] | None, reorder_solvent_only: bool, nsatoms: int, weight_solute: float, outbasename: str, outfmt: str, reorderexcl: ndarray, final_kabsch: bool, overwrite: bool) → None

Save the best-superposed configurations for each cluster. The first configuration of each cluster is the medoid.

Parameters:
  • trajfile – The trajectory file path.

  • clusters – An array containing cluster labels.

  • distmat – The RMSD matrix.

  • noh – Flag indicating whether to exclude hydrogen atoms.

  • reorder (Union[Callable[[np.ndarray, np.ndarray, np.ndarray, np.ndarray], np.ndarray], None]) – A function to reorder the atoms, if necessary.

  • nsatoms – The number of atoms in the solute.

  • outbasename – The base name for the output files.

  • outfmt – The output file format.

  • reorderexcl – An array of atom indices to exclude during reordering.

  • final_kabsch – Flag indicating whether to perform a final Kabsch rotation.

  • overwrite – Flag indicating whether to overwrite existing output files.

Returns:

None

clusttraj.main module

Main entry point for clusttraj.

It can be called from the command line or from an external library, given a list of arguments.

clusttraj.main.main(args: List[str] | None = None) → None

Main function that performs clustering and generates output.

Parameters:

args (List) – List of command-line arguments. Defaults to None.

Returns:

None

clusttraj.plot module

Functions to plot the obtained results.

clusttraj.plot.plot_clust_evo(clust_opt: ClustOptions, clusters: ndarray) → None

Plot the evolution of cluster classification over the given samples.

Parameters:
  • clust_opt (ClustOptions) – The clustering options.

  • clusters (np.ndarray) – The cluster classifications for each sample.

Returns:

None

clusttraj.plot.plot_dendrogram(clust_opt: ClustOptions, clusters: ndarray, Z: ndarray) → None

Plot a dendrogram based on hierarchical clustering.

Parameters:
  • clust_opt (ClustOptions) – The options for clustering.

  • clusters (np.ndarray) – The cluster labels.

  • Z (np.ndarray) – The linkage matrix.

Returns:

None

clusttraj.plot.plot_mds(clust_opt: ClustOptions, clusters: ndarray, distmat: ndarray) → None

Plot the multidimensional scaling (MDS) of the RMSD matrix.

Parameters:
  • clust_opt (ClustOptions) – The clustering options.

  • clusters (np.ndarray) – The cluster labels.

  • distmat (np.ndarray) – The RMSD matrix.

Returns:

None

clusttraj.plot.plot_tsne(clust_opt: ClustOptions, clusters: ndarray, distmat: ndarray) → None

Plot a 2D t-distributed Stochastic Neighbor Embedding (t-SNE) projection of the clustering.

Parameters:
  • clust_opt (ClustOptions) – The clustering options.

  • clusters (np.ndarray) – The cluster labels.

  • distmat (np.ndarray) – The RMSD matrix.

Returns:

None

clusttraj.utils module

Additional utility functions.

clusttraj.utils.get_mol_coords(mol: Molecule) → ndarray

Get the coordinates of all atoms in a molecule.

Parameters:

mol (pybel.Molecule) – The molecule object.

Returns:

The array of atom coordinates.

Return type:

np.ndarray

clusttraj.utils.get_mol_info(mol: Molecule) → Tuple[ndarray, ndarray]

Get the atomic numbers and coordinates of all atoms in a molecule.

Parameters:

mol (pybel.Molecule) – The molecule object.

Returns:

The array of atomic numbers and the array of atom coordinates.

Return type:

Tuple[np.ndarray, np.ndarray]

Module contents

clusttraj.main(args: List[str] | None = None) → None

Main function that performs clustering and generates output.

Parameters:

args (List) – List of command-line arguments. Defaults to None.

Returns:

None