Skip to content

Core API

The core module provides the fundamental abstractions and algorithms for k-means clustering on arbitrary metric spaces.

Abstract Base Classes

These classes define the interface for extending kmeanssa-ng to new metric spaces.

kmeanssa_ng.core.abstract.Point

Bases: ABC

Abstract base class for points in a metric space.

A point is an element of a metric space with a fixed location. Concrete implementations must define which space the point belongs to.

kmeanssa_ng.core.abstract.Center

Bases: Point

Abstract base class for cluster centers.

A center is a special type of point that can move through the space using two mechanisms: - Brownian motion: Random exploration - Drift: Directed movement toward a target point

This class is used in simulated annealing for k-means clustering.

brownian_motion abstractmethod

brownian_motion(time_to_travel: float) -> None

Perform random Brownian motion in the space.

Args: time_to_travel: Time parameter controlling the magnitude of motion. Typical distance traveled is proportional to sqrt(time_to_travel).

Source code in src/kmeanssa_ng/core/abstract.py
@abstractmethod
def brownian_motion(self, time_to_travel: float) -> None:
    """Perform random Brownian motion in the space.

    Args:
        time_to_travel: Time parameter controlling the magnitude of motion.
            Typical distance traveled is proportional to sqrt(time_to_travel).
    """
    pass

kmeanssa_ng.core.abstract.Space

Bases: ABC

Abstract base class for metric spaces.

A metric space provides: - Distance computation between points - Sampling of random points and centers - Cluster computation and energy calculation

distance abstractmethod

distance(p1: Point, p2: Point) -> float

Compute the distance between two points.

Args: p1: First point. p2: Second point.

Returns: The distance between p1 and p2.

Source code in src/kmeanssa_ng/core/abstract.py
@abstractmethod
def distance(self, p1: Point, p2: Point) -> float:
    """Compute the distance between two points.

    Args:
        p1: First point.
        p2: Second point.

    Returns:
        The distance between p1 and p2.
    """
    pass

sample_points abstractmethod

sample_points(n: int) -> list[Point]

Sample random points from the space.

Args: n: Number of points to sample.

Returns: List of n randomly sampled points.

Source code in src/kmeanssa_ng/core/abstract.py
@abstractmethod
def sample_points(self, n: int) -> list[Point]:
    """Sample random points from the space.

    Args:
        n: Number of points to sample.

    Returns:
        List of n randomly sampled points.
    """
    pass

sample_centers abstractmethod

sample_centers(k: int) -> list[Center]

Sample random centers from the space.

Args: k: Number of centers to sample.

Returns: List of k randomly sampled centers.

Source code in src/kmeanssa_ng/core/abstract.py
@abstractmethod
def sample_centers(self, k: int) -> list[Center]:
    """Sample random centers from the space.

    Args:
        k: Number of centers to sample.

    Returns:
        List of k randomly sampled centers.
    """
    pass

compute_clusters abstractmethod

compute_clusters(centers: list[Center]) -> None

Assign points to their nearest center.

This method typically updates internal state or annotations indicating which cluster each point belongs to.

Args: centers: List of cluster centers.

Source code in src/kmeanssa_ng/core/abstract.py
@abstractmethod
def compute_clusters(self, centers: list[Center]) -> None:
    """Assign points to their nearest center.

    This method typically updates internal state or annotations
    indicating which cluster each point belongs to.

    Args:
        centers: List of cluster centers.
    """
    pass

Simulated Annealing Algorithm

The main clustering algorithm that works with any metric space implementation.

kmeanssa_ng.core.simulated_annealing.SimulatedAnnealing

SimulatedAnnealing(
    observations: list[Point],
    k: int,
    lambda_param: int = 1,
    beta: float = 1.0,
    step_size: float = 0.1,
)

Simulated annealing for offline k-means clustering.

This algorithm solves the k-means problem on arbitrary metric spaces using simulated annealing. Centers perform Brownian motion (exploration) and drift toward observations (exploitation), with temperature controlled by an inhomogeneous Poisson process.

Attributes: space: The metric space containing the observations. k: Number of clusters. observations: List of points to cluster. centers: Current cluster centers.

Example:

# Create observations and space
space = QuantumGraph(...)
points = space.sample_points(100)

# Run simulated annealing
sa = SimulatedAnnealing(points, k=5)
centers = sa.run(robust_prop=0.1, initialization="kpp")

Initialize the simulated annealing algorithm.

Args: observations: List of points to cluster, all in the same metric space. k: Number of clusters. lambda_param: Intensity parameter for Poisson process (must be > 0). beta: Inverse temperature parameter (must be > 0, higher = faster convergence). step_size: Time step for updating centers (must be > 0).

Raises: ValueError: If observations is empty, k <= 0, points are in different spaces, or hyperparameters are invalid.

Source code in src/kmeanssa_ng/core/simulated_annealing.py
def __init__(
    self,
    observations: list[Point],
    k: int,
    lambda_param: int = 1,
    beta: float = 1.0,
    step_size: float = 0.1,
) -> None:
    """Initialize the simulated annealing algorithm.

    Args:
        observations: List of points to cluster, all in the same metric space.
        k: Number of clusters.
        lambda_param: Intensity parameter for Poisson process (must be > 0).
        beta: Inverse temperature parameter (must be > 0, higher = faster convergence).
        step_size: Time step for updating centers (must be > 0).

    Raises:
        ValueError: If observations is empty, k <= 0, points are in different spaces,
            or hyperparameters are invalid.
    """
    if not observations:
        raise ValueError("Observations must be a non-empty list of points.")
    if k <= 0:
        raise ValueError("Number of clusters 'k' must be greater than zero.")
    if any(obs.space != observations[0].space for obs in observations):
        raise ValueError("All observations must belong to the same metric space.")

    # Validate lambda_param
    try:
        lambda_float = float(lambda_param)
    except (TypeError, ValueError) as e:
        raise ValueError(
            f"lambda_param must be a number, got {type(lambda_param).__name__}"
        ) from e
    if lambda_float <= 0:
        raise ValueError(f"lambda_param must be positive, got {lambda_float}")

    # Validate beta
    try:
        beta_float = float(beta)
    except (TypeError, ValueError) as e:
        raise ValueError(f"beta must be a number, got {type(beta).__name__}") from e
    if beta_float <= 0:
        raise ValueError(f"beta must be positive, got {beta_float}")

    # Validate step_size
    try:
        step_size_float = float(step_size)
    except (TypeError, ValueError) as e:
        raise ValueError(f"step_size must be a number, got {type(step_size).__name__}") from e
    if step_size_float <= 0:
        raise ValueError(f"step_size must be positive, got {step_size_float}")

    self._space = observations[0].space
    self._observations = observations.copy()
    self._k = k
    self._lambda = lambda_float
    self._beta = beta_float
    self._step_size = step_size_float

    rd.shuffle(self._observations)
    self._centers: list[Center] = []

run

run(
    robust_prop: float = 0.0,
    robust_points: list[Point] | None = None,
    initialization: str = "kpp",
    algorithm_version: str = "v1",
) -> list[Center]

Run the simulated annealing algorithm.

Args: robust_prop: Proportion of final iterations for robustification (0 to 1). The best centers from this period are returned. robust_points: Optional dataset for computing energy during robustification. Defaults to the observations. initialization: Initialization method ("kpp" for k-means++, "random" otherwise). algorithm_version: Algorithm variant ("v1" or "v2"). v1 interleaves drift and brownian motion, v2 performs all brownian motion first.

Returns: List of k robust cluster centers.

Raises: ValueError: If robust_prop not in [0, 1] or algorithm_version invalid.

Source code in src/kmeanssa_ng/core/simulated_annealing.py
def run(
    self,
    robust_prop: float = 0.0,
    robust_points: list[Point] | None = None,
    initialization: str = "kpp",
    algorithm_version: str = "v1",
) -> list[Center]:
    """Run the simulated annealing algorithm.

    Args:
        robust_prop: Proportion of final iterations for robustification (0 to 1).
            The best centers from this period are returned.
        robust_points: Optional dataset for computing energy during robustification.
            Defaults to the observations.
        initialization: Initialization method ("kpp" for k-means++, "random" otherwise).
        algorithm_version: Algorithm variant ("v1" or "v2"). v1 interleaves drift
            and brownian motion, v2 performs all brownian motion first.

    Returns:
        List of k robust cluster centers.

    Raises:
        ValueError: If robust_prop not in [0, 1] or algorithm_version invalid.
    """
    if robust_prop < 0 or robust_prop > 1:
        raise ValueError("The proportion must be in [0,1]")
    if algorithm_version not in ["v1", "v2"]:
        raise ValueError("algorithm_version must be 'v1' or 'v2'")

    if robust_points is None:
        robust_points = self._observations

    i0 = int(np.floor((self.n - 1) * (1 - robust_prop)))

    # Initialize centers
    if initialization == "kpp":
        self._centers = self._initialize_kpp_centers()
    else:
        self._centers = self._initialize_centers()

    best_centers = self._clone_centers(self._centers)
    best_energy = self.space.calculate_energy_graph(best_centers)

    times = self._initialize_times(self.n)

    if algorithm_version == "v1":
        return self._run_v1(times, i0, best_centers, best_energy)
    else:
        return self._run_v2(times, i0, best_centers, best_energy)

_initialize_centers

_initialize_centers() -> list[Center]

Initialize centers randomly in the metric space.

Source code in src/kmeanssa_ng/core/simulated_annealing.py
def _initialize_centers(self) -> list[Center]:
    """Initialize centers randomly in the metric space."""
    return self.space.sample_centers(self._k)