Core API¶
The core module provides the fundamental abstractions and algorithms for k-means clustering on arbitrary metric spaces.
Abstract Base Classes¶
These classes define the interface for extending kmeanssa-ng to new metric spaces.
kmeanssa_ng.core.abstract.Point ¶
Bases: ABC
Abstract base class for points in a metric space.
A point is an element of a metric space with a fixed location. Concrete implementations must define which space the point belongs to.
kmeanssa_ng.core.abstract.Center ¶
Bases: Point
Abstract base class for cluster centers.
A center is a special type of point that can move through the space using two mechanisms: - Brownian motion: Random exploration - Drift: Directed movement toward a target point
This class is used in simulated annealing for k-means clustering.
brownian_motion
abstractmethod
¶
Perform random Brownian motion in the space.
Args: time_to_travel: Time parameter controlling the magnitude of motion. Typical distance traveled is proportional to sqrt(time_to_travel).
Source code in src/kmeanssa_ng/core/abstract.py
kmeanssa_ng.core.abstract.Space ¶
Bases: ABC
Abstract base class for metric spaces.
A metric space provides: - Distance computation between points - Sampling of random points and centers - Cluster computation and energy calculation
distance
abstractmethod
¶
Compute the distance between two points.
Args: p1: First point. p2: Second point.
Returns: The distance between p1 and p2.
sample_points
abstractmethod
¶
Sample random points from the space.
Args: n: Number of points to sample.
Returns: List of n randomly sampled points.
sample_centers
abstractmethod
¶
Sample random centers from the space.
Args: k: Number of centers to sample.
Returns: List of k randomly sampled centers.
compute_clusters
abstractmethod
¶
Assign points to their nearest center.
This method typically updates internal state or annotations indicating which cluster each point belongs to.
Args: centers: List of cluster centers.
Source code in src/kmeanssa_ng/core/abstract.py
Simulated Annealing Algorithm¶
The main clustering algorithm that works with any metric space implementation.
kmeanssa_ng.core.simulated_annealing.SimulatedAnnealing ¶
SimulatedAnnealing(
observations: list[Point],
k: int,
lambda_param: int = 1,
beta: float = 1.0,
step_size: float = 0.1,
)
Simulated annealing for offline k-means clustering.
This algorithm solves the k-means problem on arbitrary metric spaces using simulated annealing. Centers perform Brownian motion (exploration) and drift toward observations (exploitation), with temperature controlled by an inhomogeneous Poisson process.
Attributes: space: The metric space containing the observations. k: Number of clusters. observations: List of points to cluster. centers: Current cluster centers.
Example:
# Create observations and space
space = QuantumGraph(...)
points = space.sample_points(100)
# Run simulated annealing
sa = SimulatedAnnealing(points, k=5)
centers = sa.run(robust_prop=0.1, initialization="kpp")
Initialize the simulated annealing algorithm.
Args: observations: List of points to cluster, all in the same metric space. k: Number of clusters. lambda_param: Intensity parameter for Poisson process (must be > 0). beta: Inverse temperature parameter (must be > 0, higher = faster convergence). step_size: Time step for updating centers (must be > 0).
Raises: ValueError: If observations is empty, k <= 0, points are in different spaces, or hyperparameters are invalid.
Source code in src/kmeanssa_ng/core/simulated_annealing.py
run ¶
run(
robust_prop: float = 0.0,
robust_points: list[Point] | None = None,
initialization: str = "kpp",
algorithm_version: str = "v1",
) -> list[Center]
Run the simulated annealing algorithm.
Args: robust_prop: Proportion of final iterations for robustification (0 to 1). The best centers from this period are returned. robust_points: Optional dataset for computing energy during robustification. Defaults to the observations. initialization: Initialization method ("kpp" for k-means++, "random" otherwise). algorithm_version: Algorithm variant ("v1" or "v2"). v1 interleaves drift and brownian motion, v2 performs all brownian motion first.
Returns: List of k robust cluster centers.
Raises: ValueError: If robust_prop not in [0, 1] or algorithm_version invalid.