394 changes: 394 additions & 0 deletions lib/node_modules/@stdlib/ml/strided/dkmeans-init-plus-plus/README.md
<!--

@license Apache-2.0

Copyright (c) 2026 The Stdlib Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

-->

# dkmeansInitPlusPlus

> Initializes centroids by performing the [k-means++][kmeans-plus-plus] initialization procedure on double-precision floating-point data points.

<!-- Section to include introductory text. Make sure to keep an empty line after the intro `section` element and another before the `/section` close. -->

<section class="intro">

</section>

<!-- /.intro -->

<!-- Package usage documentation. -->

<section class="usage">

## Usage

```javascript
var dkmeansInitPlusPlus = require( '@stdlib/ml/strided/dkmeans-init-plus-plus' );
```

<!-- lint disable maximum-heading-length -->

#### dkmeansInitPlusPlus( order, M, N, k, trials, metric, X, LDX, out, LDO, W1, sw1, W2, sw2\[, options] )

<!-- lint enable maximum-heading-length -->

Initializes centroids by performing the [k-means++][kmeans-plus-plus] initialization procedure on double-precision floating-point data points.

<!-- eslint-disable max-len -->

```javascript
var Float64Array = require( '@stdlib/array/float64' );
var Int32Array = require( '@stdlib/array/int32' );

var k = 3;
var M = 5;
var N = 2;

// Allocate an output matrix for the centroids:
var out = new Float64Array( k*N );
var W1 = new Float64Array( 2*M );
var W2 = new Int32Array( k );

/*
    X = [
        [  0.0,  0.0 ],
        [  1.0,  1.0 ],
        [  1.0, -1.0 ],
        [ -1.0, -1.0 ],
        [ -1.0,  1.0 ]
    ]
*/
var X = new Float64Array( [ 0.0, 0.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0 ] );

var centroids = dkmeansInitPlusPlus( 'row-major', M, N, k, 3, 'sqeuclidean', X, N, out, N, W1, 1, W2, 1 );
```

The function has the following parameters:

- **order**: storage layout.
- **M**: number of data points.
- **N**: number of features.
- **k**: number of clusters (i.e., the number of centroids to initialize).
- **trials**: number of candidate centroids to sample per iteration (must be `>= 1`).
- **metric**: distance metric. Must be one of the following: `'sqeuclidean'`, `'cosine'`, `'cityblock'`, or `'correlation'`.
- **X**: input matrix stored as a [`Float64Array`][@stdlib/array/float64].
- **LDX**: stride of the first dimension of `X` (a.k.a., leading dimension of the matrix `X`).
- **out**: output matrix stored as a [`Float64Array`][@stdlib/array/float64].
- **LDO**: stride of the first dimension of `out` (a.k.a., leading dimension of the matrix `out`).
- **W1**: first workspace array of size `2*M` for tracking squared distances and probabilities.
- **sw1**: stride length of `W1`.
- **W2**: second workspace array for tracking centroid candidates.
- **sw2**: stride length of `W2`.
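
The `metric` values listed above are named after standard distance functions. As a rough guide, the following plain JavaScript sketch shows their conventional textbook definitions. The function names are hypothetical and are not part of this package's API, and the package's handling of edge cases (e.g., zero-norm vectors) may differ.

```javascript
// Hypothetical reference implementations (NOT the package's internals).

// Squared Euclidean distance: sum of squared differences.
function sqeuclidean( x, y ) {
    var s = 0.0;
    var d;
    var i;
    for ( i = 0; i < x.length; i++ ) {
        d = x[ i ] - y[ i ];
        s += d * d;
    }
    return s;
}

// City block (Manhattan) distance: sum of absolute differences.
function cityblock( x, y ) {
    var s = 0.0;
    var i;
    for ( i = 0; i < x.length; i++ ) {
        s += Math.abs( x[ i ] - y[ i ] );
    }
    return s;
}

// Cosine distance: one minus the cosine of the angle between `x` and `y`.
function cosine( x, y ) {
    var xy = 0.0;
    var xx = 0.0;
    var yy = 0.0;
    var i;
    for ( i = 0; i < x.length; i++ ) {
        xy += x[ i ] * y[ i ];
        xx += x[ i ] * x[ i ];
        yy += y[ i ] * y[ i ];
    }
    // Note: evaluates to NaN when either vector has zero norm.
    return 1.0 - ( xy / Math.sqrt( xx * yy ) );
}

// Correlation distance: cosine distance after subtracting each vector's mean.
function correlation( x, y ) {
    var mx = 0.0;
    var my = 0.0;
    var xc = [];
    var yc = [];
    var i;
    for ( i = 0; i < x.length; i++ ) {
        mx += x[ i ] / x.length;
        my += y[ i ] / y.length;
    }
    for ( i = 0; i < x.length; i++ ) {
        xc.push( x[ i ] - mx );
        yc.push( y[ i ] - my );
    }
    return cosine( xc, yc );
}

console.log( sqeuclidean( [ 1.0, 1.0 ], [ 1.0, -1.0 ] ) );
// => 4

console.log( cityblock( [ 1.0, 1.0 ], [ 1.0, -1.0 ] ) );
// => 2

console.log( cosine( [ 1.0, 1.0 ], [ 1.0, -1.0 ] ) );
// => 1

console.log( correlation( [ 1.0, 2.0, 3.0 ], [ 3.0, 2.0, 1.0 ] ) );
// => 2
```
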

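The input matrix `X` has shape `(M, N)`, with one data point per row, and the output matrix `out` has shape `(k, N)`, with one initialized centroid per row. In the examples here, the leading dimension equals the number of columns for `'row-major'` order (`N` for both `X` and `out`) and the number of rows for `'column-major'` order (`M` for `X` and `k` for `out`).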

```javascript
var Float64Array = require( '@stdlib/array/float64' );
var Int32Array = require( '@stdlib/array/int32' );

var k = 2;
var M = 3;
var N = 2;

// Allocate an output matrix for the centroids:
var out = new Float64Array( k*N );
var W1 = new Float64Array( 2*M );
var W2 = new Int32Array( k );

/*
    X = [
        [ 0.0, 0.0 ],
        [ 1.0, 3.0 ],
        [ 2.0, 4.0 ]
    ]
*/

// Store `X` column by column to match the 'column-major' layout:
var X = new Float64Array( [ 0.0, 1.0, 2.0, 0.0, 3.0, 4.0 ] );

var centroids = dkmeansInitPlusPlus( 'column-major', M, N, k, 3, 'sqeuclidean', X, M, out, k, W1, 1, W2, 1 );
```
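
To make the relationship between `order` and a leading dimension concrete, the following sketch maps a row and column subscript to a linear index. The `index` helper is hypothetical and only illustrates the usual BLAS-style convention; it is not part of this package.

```javascript
// Hypothetical helper: linear index of element (i, j) for a given layout and leading dimension.
function index( order, LD, i, j ) {
    if ( order === 'row-major' ) {
        // Consecutive rows are `LD` elements apart:
        return ( i * LD ) + j;
    }
    // Consecutive columns are `LD` elements apart:
    return i + ( j * LD );
}

var M = 3;
var N = 2;

// The same 3x2 matrix [ [ 0, 0 ], [ 1, 3 ], [ 2, 4 ] ] stored in both layouts:
var rowMajor = new Float64Array( [ 0.0, 0.0, 1.0, 3.0, 2.0, 4.0 ] );
var colMajor = new Float64Array( [ 0.0, 1.0, 2.0, 0.0, 3.0, 4.0 ] );

// Element (1, 1) is the same in both layouts:
var a = rowMajor[ index( 'row-major', N, 1, 1 ) ];
// returns 3.0

var b = colMajor[ index( 'column-major', M, 1, 1 ) ];
// returns 3.0
```
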

Note that indexing is relative to the first index. To introduce an offset, use [`typed array`][mdn-typed-array] views.

<!-- eslint-disable max-len -->

```javascript
var Float64Array = require( '@stdlib/array/float64' );
var Int32Array = require( '@stdlib/array/int32' );

var k = 3;
var M = 5;
var N = 2;

// Allocate an output matrix for the centroids:
var out = new Float64Array( k*N );
var W1 = new Float64Array( 2*M );
var W2 = new Int32Array( k );

/*
    X1 = [
        [  0.0,  0.0 ],
        [  1.0,  1.0 ],
        [  1.0, -1.0 ],
        [ -1.0, -1.0 ],
        [ -1.0,  1.0 ]
    ]
*/

// Initial array, in which the first element is padding to be skipped:
var X0 = new Float64Array( [ 9999.0, 0.0, 0.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0 ] );

// Create an offset view:
var X1 = new Float64Array( X0.buffer, X0.BYTES_PER_ELEMENT*1 ); // start at 2nd element

var centroids = dkmeansInitPlusPlus( 'row-major', M, N, k, 3, 'sqeuclidean', X1, N, out, N, W1, 1, W2, 1 );
```
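
As a quick illustration of what an offset view does, the following sketch uses only the built-in `Float64Array` (the variable names are illustrative). The view skips the leading element and shares memory with the original array, so no data is copied.

```javascript
// A buffer whose first element should be skipped:
var buf = new Float64Array( [ 9999.0, 0.0, 0.0, 1.0, 1.0 ] );

// Create a view which starts at the 2nd element (byte offset of one element):
var view = new Float64Array( buf.buffer, buf.BYTES_PER_ELEMENT*1 );

var len = view.length;
// returns 4

var first = view[ 0 ];
// returns 0.0

// Writes through the view are visible in the original array:
view[ 0 ] = 5.0;

var v = buf[ 1 ];
// returns 5.0
```
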

The function accepts the following `options`:

- **prng**: pseudorandom number generator for generating uniformly distributed pseudorandom numbers. If provided, the function **ignores** both the `state` and `seed` options. In order to seed the underlying pseudorandom number generator, one must seed the provided `prng` (assuming the provided `prng` is seedable).
- **seed**: pseudorandom number generator seed.
- **state**: a [`Uint32Array`][@stdlib/array/uint32] containing pseudorandom number generator state. If provided, the function ignores the `seed` option.
- **copy**: `boolean` indicating whether to copy a provided pseudorandom number generator state. Setting this option to `false` allows sharing state between two or more pseudorandom number generators. Setting this option to `true` ensures that an underlying generator has exclusive control over its internal state. Default: `true`.

To use a custom PRNG as the underlying source of uniformly distributed pseudorandom numbers, set the `prng` option.

<!-- eslint-disable max-len -->

```javascript
var Float64Array = require( '@stdlib/array/float64' );
var Int32Array = require( '@stdlib/array/int32' );
var minstd = require( '@stdlib/random/base/minstd' );

var k = 3;
var M = 5;
var N = 2;

// Allocate an output matrix for the centroids:
var out = new Float64Array( k*N );
var W1 = new Float64Array( 2*M );
var W2 = new Int32Array( k );

/*
    X = [
        [  0.0,  0.0 ],
        [  1.0,  1.0 ],
        [  1.0, -1.0 ],
        [ -1.0, -1.0 ],
        [ -1.0,  1.0 ]
    ]
*/
var X = new Float64Array( [ 0.0, 0.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0 ] );

var opts = {
    'prng': minstd
};

var centroids = dkmeansInitPlusPlus( 'row-major', M, N, k, 3, 'sqeuclidean', X, N, out, N, W1, 1, W2, 1, opts );
```

To seed the underlying pseudorandom number generator, set the `seed` option.

<!-- eslint-disable max-len -->

```javascript
var Float64Array = require( '@stdlib/array/float64' );
var Int32Array = require( '@stdlib/array/int32' );

var k = 3;
var M = 5;
var N = 2;

// Allocate an output matrix for the centroids:
var out = new Float64Array( k*N );
var W1 = new Float64Array( 2*M );
var W2 = new Int32Array( k );

/*
    X = [
        [  0.0,  0.0 ],
        [  1.0,  1.0 ],
        [  1.0, -1.0 ],
        [ -1.0, -1.0 ],
        [ -1.0,  1.0 ]
    ]
*/
var X = new Float64Array( [ 0.0, 0.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0 ] );

var opts = {
    'seed': 12345
};

var centroids = dkmeansInitPlusPlus( 'row-major', M, N, k, 3, 'sqeuclidean', X, N, out, N, W1, 1, W2, 1, opts );
```

<!-- lint disable maximum-heading-length -->

#### dkmeansInitPlusPlus.ndarray( M, N, k, trials, metric, X, sx1, sx2, ox, out, so1, so2, oo, W1, sw1, ow1, W2, sw2, ow2\[, options] )

<!-- lint enable maximum-heading-length -->

Initializes centroids by performing the [k-means++][kmeans-plus-plus] initialization procedure on double-precision floating-point data points using alternative indexing semantics.

```javascript
var Float64Array = require( '@stdlib/array/float64' );
var Int32Array = require( '@stdlib/array/int32' );

var k = 3;
var M = 5;
var N = 2;

// Allocate an output matrix for the centroids:
var out = new Float64Array( k*N );
var W1 = new Float64Array( 2*M );
var W2 = new Int32Array( k );

/*
    X = [
        [  0.0,  0.0 ],
        [  1.0,  1.0 ],
        [  1.0, -1.0 ],
        [ -1.0, -1.0 ],
        [ -1.0,  1.0 ]
    ]
*/
var X = new Float64Array( [ 0.0, 0.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0 ] );

var centroids = dkmeansInitPlusPlus.ndarray( M, N, k, 3, 'sqeuclidean', X, N, 1, 0, out, N, 1, 0, W1, 1, 0, W2, 1, 0 );
```

The function has the following parameters:

- **M**: number of data points.
- **N**: number of features.
- **k**: number of clusters (i.e., the number of centroids to initialize).
- **trials**: number of candidate centroids to sample per iteration (must be `>= 1`).
- **metric**: distance metric. Must be one of the following: `'sqeuclidean'`, `'cosine'`, `'cityblock'`, or `'correlation'`.
- **X**: input matrix stored as a [`Float64Array`][@stdlib/array/float64].
- **sx1**: stride of the first dimension of `X`.
- **sx2**: stride of the second dimension of `X`.
- **ox**: starting index for `X`.
- **out**: output matrix stored as a [`Float64Array`][@stdlib/array/float64].
- **so1**: stride of the first dimension of `out`.
- **so2**: stride of the second dimension of `out`.
- **oo**: starting index for `out`.
- **W1**: first workspace array of size `2*M` for tracking squared distances and probabilities.
- **sw1**: stride length of `W1`.
- **ow1**: starting index for `W1`.
- **W2**: second workspace array for tracking centroid candidates.
- **sw2**: stride length of `W2`.
- **ow2**: starting index for `W2`.
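
The stride and offset parameters follow the usual strided-array convention, in which element `(i, j)` of a matrix lives at linear index `offset + i*s1 + j*s2`. The following sketch illustrates that convention with a hypothetical `index2d` helper, which is not part of this package.

```javascript
// Hypothetical helper: linear index of element (i, j) given two strides and a starting index.
function index2d( s1, s2, offset, i, j ) {
    return offset + ( i * s1 ) + ( j * s2 );
}

// A 3x2 matrix [ [ 0, 0 ], [ 1, 3 ], [ 2, 4 ] ] stored row by row after one padding element:
var rowBuf = new Float64Array( [ 9999.0, 0.0, 0.0, 1.0, 3.0, 2.0, 4.0 ] );

// Row-major storage: strides ( N, 1 ) = ( 2, 1 ), starting index 1:
var a = rowBuf[ index2d( 2, 1, 1, 1, 1 ) ];
// returns 3.0

// The same matrix stored column by column, without padding:
var colBuf = new Float64Array( [ 0.0, 1.0, 2.0, 0.0, 3.0, 4.0 ] );

// Column-major storage: strides ( 1, M ) = ( 1, 3 ), starting index 0:
var b = colBuf[ index2d( 1, 3, 0, 1, 1 ) ];
// returns 3.0
```

Expressing the layout through strides and offsets, rather than an `order` string and typed array views, is what allows the `ndarray` method to omit the `order` argument.
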

</section>

<!-- /.usage -->

<!-- Package usage notes. Make sure to keep an empty line after the `section` element and another before the `/section` close. -->

<section class="notes">

## Notes

- The k-means++ procedure is stochastic. Providing the same `seed`, together with the same data and arguments, yields reproducible centroids.
- Setting `trials` to a value greater than `1` selects the "greedy" k-means++ variant, which samples multiple candidate centroids per iteration and keeps the candidate that minimizes the total squared distance. This generally improves centroid quality at the cost of additional computation.
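
The procedure described in these notes can be sketched in plain JavaScript as follows. This is a minimal illustration under stated assumptions, not this package's implementation: it fixes the metric to squared Euclidean distance, operates on nested arrays rather than strided arrays, returns the indices of the chosen data points, and uses a small Park-Miller generator for seeding. All function names are hypothetical.

```javascript
// Minimal seedable PRNG (Park-Miller LCG) returning values in (0,1):
function lcg( seed ) {
    var state = seed % 2147483647;
    if ( state <= 0 ) {
        state += 2147483646;
    }
    return function rand() {
        state = ( state * 16807 ) % 2147483647;
        return state / 2147483647;
    };
}

// Squared Euclidean distance between two points:
function sqdist( x, y ) {
    var s = 0.0;
    var d;
    var i;
    for ( i = 0; i < x.length; i++ ) {
        d = x[ i ] - y[ i ];
        s += d * d;
    }
    return s;
}

// Sample an index with probability proportional to the weights `w` (which sum to `total`):
function sample( w, total, rand ) {
    var r = rand() * total;
    var i;
    for ( i = 0; i < w.length; i++ ) {
        r -= w[ i ];
        if ( r < 0.0 ) {
            return i;
        }
    }
    return w.length - 1; // guard against round-off
}

// Hypothetical greedy k-means++ returning the indices of the chosen data points:
function kmeansInitPlusPlus( X, k, trials, seed ) {
    var rand = lcg( seed );
    var idx = []; // indices of chosen centroids
    var d = [];   // distance from each point to its nearest chosen centroid
    var bestCost;
    var total;
    var best;
    var cand;
    var cost;
    var i;
    var t;

    // Choose the first centroid uniformly at random:
    best = Math.floor( rand() * X.length );
    idx.push( best );
    total = 0.0;
    for ( i = 0; i < X.length; i++ ) {
        d.push( sqdist( X[ i ], X[ best ] ) );
        total += d[ i ];
    }
    while ( idx.length < k ) {
        bestCost = Infinity;
        for ( t = 0; t < trials; t++ ) {
            // Sample a candidate with probability proportional to its distance:
            cand = sample( d, total, rand );

            // Compute the total cost were the candidate to be added:
            cost = 0.0;
            for ( i = 0; i < X.length; i++ ) {
                cost += Math.min( d[ i ], sqdist( X[ i ], X[ cand ] ) );
            }
            if ( cost < bestCost ) {
                bestCost = cost;
                best = cand;
            }
        }
        idx.push( best );

        // Update the nearest-centroid distances:
        total = 0.0;
        for ( i = 0; i < X.length; i++ ) {
            d[ i ] = Math.min( d[ i ], sqdist( X[ i ], X[ best ] ) );
            total += d[ i ];
        }
    }
    return idx;
}

var X = [ [ 0.0, 0.0 ], [ 1.0, 1.0 ], [ 1.0, -1.0 ], [ -1.0, -1.0 ], [ -1.0, 1.0 ] ];
var indices = kmeansInitPlusPlus( X, 3, 3, 12345 );
console.log( indices );
```

Because already chosen points have zero distance to their nearest centroid, they are sampled with zero probability, and calling the sketch twice with the same seed returns the same indices.
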

</section>

<!-- /.notes -->

<!-- Package usage examples. -->

<section class="examples">

## Examples

<!-- eslint no-undef: "error" -->

```javascript
var Float64Array = require( '@stdlib/array/float64' );
var Int32Array = require( '@stdlib/array/int32' );
var discreteUniform = require( '@stdlib/random/array/discrete-uniform' );
var dkmeansInitPlusPlus = require( '@stdlib/ml/strided/dkmeans-init-plus-plus' );

var k = 3;
var M = 10;
var N = 2;

// Generate a random set of data points:
var X = discreteUniform( M*N, -50, 50, {
    'dtype': 'float64'
});
console.log( X );

// Allocate an output matrix for the centroids:
var out = new Float64Array( k*N );

// Allocate workspace arrays:
var W1 = new Float64Array( 2*M );
var W2 = new Int32Array( k );

// Set PRNG options:
var options = {
    'seed': 1234
};

// Initialize centroids using the k-means++ procedure:
var centroids = dkmeansInitPlusPlus( 'row-major', M, N, k, 3, 'sqeuclidean', X, N, out, N, W1, 1, W2, 1, options );

console.log( centroids );
// => <Float64Array>

// Initialize centroids using the k-means++ procedure using alternative indexing semantics:
centroids = dkmeansInitPlusPlus.ndarray( M, N, k, 3, 'sqeuclidean', X, N, 1, 0, out, N, 1, 0, W1, 1, 0, W2, 1, 0, options );

console.log( centroids );
// => <Float64Array>
```

</section>

<!-- /.examples -->

<section class="references">

## References

- Arthur, David, and Sergei Vassilvitskii. 2007. "K-means++: The Advantages of Careful Seeding." In _Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms_, 1027–35. SODA '07. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics. <http://dl.acm.org/citation.cfm?id=1283383.1283494>.

</section>

<!-- /.references -->

<!-- Section for related `stdlib` packages. Do not manually edit this section, as it is automatically populated. -->

<section class="related">

</section>

<!-- /.related -->

<!-- Section for all links. Make sure to keep an empty line after the `section` element and another before the `/section` close. -->

<section class="links">

[kmeans-plus-plus]: https://en.wikipedia.org/wiki/K-means%2B%2B

[@stdlib/array/float64]: https://github.com/stdlib-js/stdlib/tree/develop/lib/node_modules/%40stdlib/array/float64

[mdn-typed-array]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/TypedArray

[@stdlib/array/uint32]: https://github.com/stdlib-js/stdlib/tree/develop/lib/node_modules/%40stdlib/array/uint32

</section>

<!-- /.links -->