languagechange.models.meaning package

Submodules

languagechange.models.meaning.clustering module

class languagechange.models.meaning.clustering.ClusteringResults(labels)[source]

Bases: object

get_cluster_instances(cluster_id)[source]
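
The listing gives no description for `get_cluster_instances`. As a rough illustration only, the sketch below assumes that `labels` holds one cluster id per instance and that the method returns the positions of the instances assigned to `cluster_id`. `SketchClusteringResults` is a hypothetical stand-in, not the library's class, and the actual return value may differ.

```python
class SketchClusteringResults:
    """Hypothetical stand-in for ClusteringResults (assumed behaviour)."""

    def __init__(self, labels):
        # Assumption: labels[i] is the cluster id assigned to instance i.
        self.labels = list(labels)

    def get_cluster_instances(self, cluster_id):
        # Assumption: return the indices of all instances in that cluster.
        return [i for i, label in enumerate(self.labels) if label == cluster_id]


results = SketchClusteringResults([0, 1, 0, 2, 1])
print(results.get_cluster_instances(1))
```

Under these assumptions, an unknown cluster id simply yields an empty list rather than an error.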

class languagechange.models.meaning.clustering.Clustering(algorithm)[source]

Bases: object

get_cluster_results(embeddings)[source]
Parameters:

embeddings (numpy.array)
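
The constructor takes an `algorithm` and `get_cluster_results` takes an array of embeddings, which suggests a thin wrapper around a scikit-learn-style clusterer. The sketch below illustrates that pattern under the assumption that the wrapped object exposes `fit_predict`. `SketchClustering` and `ThresholdClusterer` are hypothetical names, plain lists stand in for `numpy.array`, and the sketch returns raw labels where the library may well wrap them in a `ClusteringResults` object.

```python
class ThresholdClusterer:
    """Toy algorithm (hypothetical): split points by the sign of feature 0."""

    def fit_predict(self, X):
        return [0 if row[0] < 0 else 1 for row in X]


class SketchClustering:
    """Hypothetical stand-in for Clustering (assumed behaviour)."""

    def __init__(self, algorithm):
        # Any object with a fit_predict(X) method is accepted in this sketch.
        self.algorithm = algorithm

    def get_cluster_results(self, embeddings):
        # Delegate to the wrapped algorithm and return one label per row.
        return list(self.algorithm.fit_predict(embeddings))


embeddings = [[-1.0, 0.2], [0.8, 0.1], [-0.3, 0.9], [1.5, -0.4]]
labels = SketchClustering(ThresholdClusterer()).get_cluster_results(embeddings)
print(labels)
```

Keeping the algorithm as a constructor argument means the same wrapper can be reused with any clusterer that follows the `fit_predict` convention.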

class languagechange.models.meaning.clustering.APosterioriaffinityPropagation(*args, **kwargs)[source]

Bases: ClusterMixin, BaseEstimator

A class that implements the APP clustering algorithm.

This class is compatible with the [scikit-learn](https://scikit-learn.org) ecosystem.

Parameters:
  • damping (float, default=0.9) – Damping factor in the range [0.5, 1.0): the extent to which the current value is maintained relative to incoming values (weighted 1 - damping). This avoids numerical oscillations when updating these values (messages).

  • max_iter (int, default=200) – Maximum number of iterations.

  • convergence_iter (int, default=15) – Number of iterations with no change in the number of estimated clusters after which the algorithm is considered converged and stops.

  • copy (bool, default=True) – Make a copy of input data.

  • preference (array-like of shape (n_samples,) or float, default=None) – Preferences for each point: points with larger preference values are more likely to be chosen as exemplars. The number of exemplars, i.e. of clusters, is influenced by the input preference values. If the preferences are not passed as arguments, they are set to the median of the input similarities.

  • affinity ({'euclidean', 'cosine'}, default='cosine') – Which affinity to use. Currently ‘cosine’ and ‘euclidean’ are supported. ‘euclidean’ uses the negative squared euclidean distance between points.

  • verbose (bool, default=False) – Whether to be verbose.

  • random_state (int, RandomState instance or None, default=42) – Pseudo-random number generator to control the starting state. Use an int for reproducible results across function calls.

  • th_gamma (int, default=0) – Threshold over the aging index gamma. Must be 0 or in [1, ∞). Clustering refinement is not enforced when th_gamma=0.
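
Three of the parameters above describe simple computations: the ‘euclidean’ affinity, the default preference, and the damped message update. The following is a minimal pure-Python sketch of those three pieces, not the class's actual code. The helper names are hypothetical, the cosine affinity is assumed to be plain cosine similarity, and the median is assumed to be taken over all entries of the similarity matrix.

```python
import math
from statistics import median


def neg_sq_euclidean(a, b):
    # 'euclidean' affinity as described above: negative squared distance.
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def cosine(a, b):
    # Assumption: the 'cosine' affinity is plain cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def damped_update(old, incoming, damping=0.9):
    # Keep `damping` of the current message, take (1 - damping) of the new one.
    if not 0.5 <= damping < 1.0:
        raise ValueError("damping must be in [0.5, 1.0)")
    return damping * old + (1.0 - damping) * incoming


points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]

# Pairwise similarity matrix using the 'euclidean' affinity.
S = [[neg_sq_euclidean(p, q) for q in points] for p in points]

# Default preference when none is passed: median of the input similarities.
preference = median(value for row in S for value in row)

print(S)
print(preference)
print(damped_update(old=1.0, incoming=0.0))
```

A damping value closer to 1.0 makes each update smaller, which slows convergence but reduces the oscillations mentioned in the parameter description.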

step_

Iteration number.

Type:

int

cluster_centers_indices_

Indices of cluster centers.

Type:

ndarray of shape (n_clusters,)

cluster_centers_

Cluster centers.

Type:

ndarray of shape (n_clusters, n_features)

labels_

Labels of each point.

Type:

ndarray of shape (n_samples,)

affinity_matrix_

Stores the affinity matrix used in fit.

Type:

ndarray of shape (n_samples, n_samples)

n_iter_

Number of iterations taken to converge.

Type:

int

n_features_in_

Number of features seen during fit.

Type:

int

feature_names_in_

Names of features seen during fit. Defined only when X has feature names that are all strings.

Type:

ndarray of shape (n_features_in_,)

memory_

Memory of all clustering results.

Type:

dict

fit_predict(embs)[source]

fit(X, y=None)[source]

Fit the clustering from features.

Parameters:
  • X ({array-like, sparse matrix} of shape (n_samples, n_features)) – Training instances to cluster. If a sparse feature matrix is provided, it will be converted into a sparse csr_matrix.

  • y (Ignored) – Not used, present here for API consistency by convention.

Returns:

Returns the instance itself.

Return type:

self

languagechange.models.meaning.meaning module

class languagechange.models.meaning.meaning.MeaningModel[source]

Bases: ABC

class languagechange.models.meaning.meaning.WordSenseInduction[source]

Bases: MeaningModel

class languagechange.models.meaning.meaning.StaticEmbedding[source]

Bases: ABC

Placeholder base for static embedding types.

class languagechange.models.meaning.meaning.SGNS[source]

Bases: StaticEmbedding

Module contents