Remove utils/kmeans.rs and migrate disk partitioning to diskann-quantization KMeans #920

Draft
Copilot wants to merge 2 commits into main from copilot/remove-utils-kmeans-rs

Copilot AI commented Apr 8, 2026

diskann-providers/src/utils/kmeans.rs was the last home-grown KMeans implementation left after PQ moved to diskann-quantization. This PR removes it entirely and migrates its remaining callers to the quantization crate's kmeans_plusplus_into + lloyds API.

Changes

  • Deleted diskann-providers/src/utils/kmeans.rs — removes k_means_clustering, k_meanspp_selecting_pivots, run_lloyds, and all helpers (~989 lines).
  • Removed k_means_clustering / k_meanspp_selecting_pivots / run_lloyds from diskann-providers::utils public API.
  • diskann-disk/src/utils/partition.rs — the only live user; now calls kmeans_plusplus_into + lloyds via MatrixBase views:
    let data_view = MatrixBase::try_from(train_data_float.as_slice(), num_train, train_dim)?;
    let mut centers = MatrixBase::try_from(pivot_data.as_mut_slice(), num_parts, train_dim)?;
    kmeans_plusplus_into(centers.as_mut_view(), data_view, rng)
        .map_err(|e| ANNError::log_pq_error(e.to_string()))?;
    lloyds(data_view, centers.as_mut_view(), MAX_K_MEANS_REPS);
  • pq_construction.rs — removed the already-dead generate_optimized_pq_pivots / opq_quantize_all_chunks / copy_chunk_centroids_to_full_table functions (and their test) that were the only remaining callers of the old API within this file.
  • Benchmarks (kmeans_bench.rs, kmeans_bench_iai.rs) — updated to benchmark the diskann-quantization implementation.
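
For reviewers who want the two-step flow in isolation, here is a minimal, dependency-free sketch of k-means++ seeding followed by Lloyd's iterations. It is not the diskann-quantization implementation or its API: Lcg, sq_dist, kmeans_pp_seed, and the lloyds defined here are hypothetical stand-ins, and flat row-major slices take the place of the MatrixBase views used in partition.rs.

```rust
/// Tiny deterministic LCG so the sketch needs no external crates (hypothetical helper).
struct Lcg(u64);

impl Lcg {
    /// Returns a pseudo-random f32 in [0, 1) from the top 24 bits of the state.
    fn next_f32(&mut self) -> f32 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        (self.0 >> 40) as f32 / (1u64 << 24) as f32
    }
}

/// Squared Euclidean distance between two equal-length slices.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// k-means++ seeding: copies `k` rows of `data` (row-major, `dim` columns) into `centers`.
fn kmeans_pp_seed(data: &[f32], dim: usize, k: usize, centers: &mut [f32], rng: &mut Lcg) {
    let n = data.len() / dim;
    // First center: a uniformly chosen row.
    let first = ((rng.next_f32() * n as f32) as usize).min(n - 1);
    centers[..dim].copy_from_slice(&data[first * dim..(first + 1) * dim]);
    // min_d2[i] = squared distance from row i to its nearest chosen center.
    let mut min_d2: Vec<f32> =
        data.chunks_exact(dim).map(|p| sq_dist(p, &centers[..dim])).collect();
    for c in 1..k {
        // Sample the next center with probability proportional to min_d2.
        let total: f32 = min_d2.iter().sum();
        let mut target = rng.next_f32() * total;
        // Fallback for floating-point round-off: the farthest row.
        let mut pick = (0..n).max_by(|&a, &b| min_d2[a].total_cmp(&min_d2[b])).unwrap();
        for (i, &d) in min_d2.iter().enumerate() {
            if target < d {
                pick = i;
                break;
            }
            target -= d;
        }
        centers[c * dim..(c + 1) * dim].copy_from_slice(&data[pick * dim..(pick + 1) * dim]);
        // Refresh nearest-center distances against the new center.
        for (d, p) in min_d2.iter_mut().zip(data.chunks_exact(dim)) {
            *d = (*d).min(sq_dist(p, &centers[c * dim..(c + 1) * dim]));
        }
    }
}

/// Lloyd's iterations: assign each row to its nearest center, then recompute the means.
fn lloyds(data: &[f32], dim: usize, centers: &mut [f32], max_iters: usize) {
    let k = centers.len() / dim;
    for _ in 0..max_iters {
        let mut sums = vec![0.0f32; k * dim];
        let mut counts = vec![0usize; k];
        for p in data.chunks_exact(dim) {
            // Index of the nearest center (ties go to the lower index).
            let (mut best, mut best_d) = (0, f32::INFINITY);
            for (c, center) in centers.chunks_exact(dim).enumerate() {
                let d = sq_dist(p, center);
                if d < best_d {
                    best = c;
                    best_d = d;
                }
            }
            counts[best] += 1;
            for (s, x) in sums[best * dim..(best + 1) * dim].iter_mut().zip(p) {
                *s += x;
            }
        }
        for c in 0..k {
            if counts[c] == 0 {
                continue; // leave empty clusters where they are
            }
            for j in 0..dim {
                centers[c * dim + j] = sums[c * dim + j] / counts[c] as f32;
            }
        }
    }
}

fn main() {
    // Six 2-D points in two well-separated groups (row-major, dim = 2).
    let data = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 100.0, 100.0, 101.0, 101.0, 102.0, 102.0];
    let (dim, k) = (2, 2);
    let mut centers = vec![0.0f32; k * dim];
    let mut rng = Lcg(42);
    kmeans_pp_seed(&data, dim, k, &mut centers, &mut rng);
    lloyds(&data, dim, &mut centers, 10);
    println!("{centers:?}");
}
```

Seeding only ever copies input rows into the center buffer, so Lloyd's starts from real data points. The max_iters argument in this sketch is a fixed iteration cap, which is presumably the role MAX_K_MEANS_REPS plays at the real call site.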

Copilot AI linked an issue Apr 8, 2026 that may be closed by this pull request
1 task
Copilot AI changed the title from "[WIP] Remove utils/kmeans.rs and use diskann-quantization implementation" to "Remove utils/kmeans.rs and migrate disk partitioning to diskann-quantization KMeans" Apr 8, 2026
Copilot AI requested a review from arrayka April 8, 2026 01:08

codecov-commenter commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 63.63636% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.27%. Comparing base (de98ea6) to head (996064c).
⚠️ Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| diskann-disk/src/utils/partition.rs | 63.63% | 4 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #920      +/-   ##
==========================================
- Coverage   89.34%   89.27%   -0.07%     
==========================================
  Files         444      443       -1     
  Lines       83986    83131     -855     
==========================================
- Hits        75036    74217     -819     
+ Misses       8950     8914      -36     
| Flag | Coverage Δ |
|---|---|
| miri | 89.27% <63.63%> (-0.07%) ⬇️ |
| unittests | 89.11% <63.63%> (-0.07%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| diskann-providers/src/model/pq/pq_construction.rs | 91.03% <ø> (-1.12%) ⬇️ |
| diskann-disk/src/utils/partition.rs | 91.91% <63.63%> (-0.64%) ⬇️ |

... and 2 files with indirect coverage changes
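
The headline figures in the report above can be re-derived from the Hits and Lines rows of the coverage diff. The sketch below does that arithmetic; coverage_pct is a hypothetical helper, and the 7-of-11 patch split is an inference from "63.63636% with 4 lines missing", not a number the report states directly.

```rust
/// Coverage as a percentage of instrumented lines (hypothetical helper).
fn coverage_pct(hits: u64, lines: u64) -> f64 {
    100.0 * hits as f64 / lines as f64
}

fn main() {
    let base = coverage_pct(75_036, 83_986); // main (de98ea6)
    let head = coverage_pct(74_217, 83_131); // this PR (996064c)
    println!("base {base:.3}%  head {head:.3}%  delta {:.3}", head - base);
    // Inferred: 4 missing lines at 63.63636% means 11 changed lines, 7 of them hit.
    println!("patch {:.5}%", coverage_pct(7, 11));
}
```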

Development

Successfully merging this pull request may close these issues.

Get rid of utils/kmeans.rs

3 participants