languagechange.models.change package

Submodules

languagechange.models.change.metrics module

class languagechange.models.change.metrics.ChangeModel

Bases: object

class languagechange.models.change.metrics.BinaryChange

Bases: ChangeModel

predict()

class languagechange.models.change.metrics.GradedChange

Bases: ChangeModel

compute_scores()

class languagechange.models.change.metrics.Threshold

Bases: BinaryChange

set_threshold(threshold)

class languagechange.models.change.metrics.AutomaticThrehold

Bases: Threshold

compute_threshold(scores, func=<function AutomaticThrehold.<lambda>>)

class languagechange.models.change.metrics.OptimalThrehold

Bases: Threshold

compute_threshold(scores, vrange=numpy.arange, evaluator=None)
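
The threshold classes above are listed by signature only. As a rough illustration of the idea their names suggest, the sketch below turns graded change scores into binary labels, once with a threshold computed from the scores themselves and once with a threshold picked from a range of candidates by an evaluation function. All helper names (`mean_threshold`, `best_threshold`, `binarize`) and the choice of the mean and of accuracy are assumptions made for this example, not the library's implementation.

```python
import numpy as np


def mean_threshold(scores):
    """Hypothetical automatic rule: use the mean score as the threshold."""
    return float(np.mean(scores))


def best_threshold(scores, gold, candidates):
    """Hypothetical search: keep the candidate with the highest accuracy."""
    scores = np.asarray(scores)
    gold = np.asarray(gold)
    accuracies = [np.mean((scores > t).astype(int) == gold) for t in candidates]
    # np.argmax returns the first maximum, so ties go to the lowest candidate.
    return float(candidates[int(np.argmax(accuracies))])


def binarize(scores, threshold):
    """Label a word as changed (1) when its score exceeds the threshold."""
    return (np.asarray(scores) > threshold).astype(int)


scores = np.array([0.1, 0.2, 0.7, 0.9])   # made-up graded change scores
gold = np.array([0, 0, 1, 1])             # made-up binary gold labels

auto_t = mean_threshold(scores)
auto_labels = binarize(scores, auto_t)

opt_t = best_threshold(scores, gold, np.arange(0.0, 1.0, 0.25))
opt_labels = binarize(scores, opt_t)
```
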
class languagechange.models.change.metrics.APD

Bases: GradedChange

compute_scores(embeddings1, embeddings2, metric='cosine')

class languagechange.models.change.metrics.PRT

Bases: GradedChange

compute_scores(embeddings1, embeddings2, metric='cosine')

class languagechange.models.change.metrics.PJSD

Bases: GradedChange

compute_scores(embeddings1, embeddings2, clustering_algorithm, metric='cosine')
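
This page does not expand the names APD, PRT and PJSD. In the semantic change literature they usually stand for average pairwise distance, prototype distance, and Jensen-Shannon divergence between cluster distributions; that reading is an assumption here. Under that assumption, a minimal NumPy sketch of the three quantities could look as follows. The helper names (`apd`, `prt`, `jsd`) are hypothetical and are not the library's API, and the cluster labels passed to `jsd` are taken as given rather than produced by a clustering algorithm.

```python
import numpy as np


def cosine_distances(a, b):
    """Pairwise cosine distances between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def apd(e1, e2):
    """Average of all pairwise distances between the two sets of embeddings."""
    return float(cosine_distances(e1, e2).mean())


def prt(e1, e2):
    """Distance between the mean (prototype) embeddings of the two sets."""
    p1 = e1.mean(axis=0, keepdims=True)
    p2 = e2.mean(axis=0, keepdims=True)
    return float(cosine_distances(p1, p2)[0, 0])


def jsd(labels1, labels2, n_clusters):
    """Jensen-Shannon divergence (base 2) between two cluster distributions."""
    p = np.bincount(labels1, minlength=n_clusters) / len(labels1)
    q = np.bincount(labels2, minlength=n_clusters) / len(labels2)
    m = 0.5 * (p + q)

    def kl(x, y):
        mask = x > 0  # terms with x == 0 contribute nothing
        return np.sum(x[mask] * np.log2(x[mask] / y[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))


# Two periods with identical usages: individual pairs still differ,
# but the prototypes coincide.
e1 = np.array([[1.0, 0.0], [0.0, 1.0]])
e2 = e1.copy()
apd_score = apd(e1, e2)
prt_score = prt(e1, e2)

# All usages in cluster 0 in the first period, all in cluster 1 in the second.
jsd_score = jsd([0, 0], [1, 1], n_clusters=2)
```

The toy input shows why the measures can disagree: the pairwise average is sensitive to spread within each period, while the prototype distance only sees the two means.
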

languagechange.models.change.timeseries module

languagechange.models.change.timeseries.ma(ts, k)

Computes the moving average of a timeseries.

Parameters:
  • ts (np.array) – a timeseries.

  • k (int) – the window (k timesteps to the left and k to the right).

Returns:

the moving average of the timeseries (not including endpoints)
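
A centered moving average with k steps on each side can be sketched with `numpy.convolve`. This is an illustration of the description above, not the library's code; the function name `centered_moving_average` is hypothetical, and reading "not including endpoints" as dropping the first and last k positions is an assumption.

```python
import numpy as np


def centered_moving_average(ts, k):
    """Average each point with its k left and k right neighbours.

    Positions without a full window (the first and last k) are dropped,
    so the result has len(ts) - 2 * k values.
    """
    ts = np.asarray(ts, dtype=float)
    window = 2 * k + 1
    if len(ts) < window:
        raise ValueError("timeseries is shorter than the window")
    kernel = np.ones(window) / window
    return np.convolve(ts, kernel, mode="valid")


smoothed = centered_moving_average([1, 2, 3, 4, 5], k=1)
```

With five input values and k=1 the result has three values, one for each position that has a neighbour on both sides.
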

class languagechange.models.change.timeseries.TimeSeries(embs=None, series=None, change_metric=None, timeseries_type=None, k=1, time_labels=None, clustering_algorithm=None, distance_metric='cosine')

Bases: object

Parameters:
  • embs (List[numpy.array])

  • series (numpy.array)

  • timeseries_type (str)

  • time_labels (numpy.array | List)

compute_from_embeddings(embs, change_metric, timeseries_type, k=1, time_labels=None, clustering_algorithm=None, distance_metric='cosine')

Parameters:
  • embs ([np.array]) – a list of embeddings, each element of the list contains embeddings from one time period.

  • change_metric (str|object) – the metric to use when comparing embeddings from different time periods (should be one of the classes in languagechange.models.change.metrics).

  • timeseries_type (str) – the kind of timeseries to construct. One of [‘compare_to_first’, ‘compare_to_last’, ‘consecutive’, ‘moving_average’].

  • time_labels (np.array|list) – labels for the x axis of the timeseries.

  • clustering_algorithm – the clustering algorithm to use when PJSD is the change metric, e.g. one of the algorithms in scikit-learn or in languagechange.

  • distance_metric (str) – the distance metric to use when computing change scores.

Returns:
  • series (np.array) – the final timeseries.

  • ts (np.array) – the time values/labels for each value in the final timeseries.
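
The `timeseries_type` values are not described beyond their names. One plausible reading, inferred from the names only, is that each type selects which pairs of time periods are compared. The sketch below spells out that reading; `period_pairs` is a hypothetical helper, and 'moving_average' is left out because this page does not say which underlying series is smoothed.

```python
def period_pairs(n_periods, timeseries_type):
    """Return the (earlier, later) period indices compared for each point."""
    if timeseries_type == "compare_to_first":
        # Every later period is compared with the first one.
        return [(0, t) for t in range(1, n_periods)]
    if timeseries_type == "compare_to_last":
        # Every earlier period is compared with the last one.
        return [(t, n_periods - 1) for t in range(n_periods - 1)]
    if timeseries_type == "consecutive":
        # Each period is compared with the one that follows it.
        return [(t, t + 1) for t in range(n_periods - 1)]
    raise ValueError(f"unsupported timeseries_type: {timeseries_type!r}")


first = period_pairs(4, "compare_to_first")
last = period_pairs(4, "compare_to_last")
consecutive = period_pairs(4, "consecutive")
```

Under this reading each type yields one value fewer than the number of periods, which is why separate time labels for the resulting points are useful.
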

languagechange.models.change.widid module

class languagechange.models.change.widid.WiDiD(algorithm=<class 'languagechange.models.meaning.clustering.APosterioriaffinityPropagation'>, metric='cosine', **args)

Bases: object

A class that implements WiDiD (https://github.com/FrancescoPeriti/WiDiD).

compute_scores(embs_list, timeseries_type='consecutive', k=1, change_metric='apd', time_labels=None)

Performs a-posteriori affinity propagation (APP) clustering and computes the semantic change as the APD (or another metric) between the prototype embeddings in clusters of different time periods.

Parameters:
  • embs_list ([np.array]) – a list of embeddings for a target word, where each element holds the embeddings from one time period.

  • timeseries_type (str) – the type of timeseries (see usage in languagechange.models.change.timeseries).

  • k (int) – the window size, if moving average (see usage in languagechange.models.change.timeseries).

  • change_metric (str) – the change metric (e.g. ‘apd’) to use (see usage in languagechange.models.change.timeseries).

  • time_labels (np.array|list) – labels for the x axis of the timeseries (see usage in languagechange.models.change.timeseries).

Returns:
  • labels ([np.array]) – the labels for each embedding in each time period.

  • prot_embs ([np.array]) – a list of matrices encoding the prototype (average) embedding of each cluster in each time period.

  • change_scores (TimeSeries) – a timeseries (languagechange.models.change.timeseries.TimeSeries) containing the degree of change between the embeddings in different time periods.
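
The prototype embeddings mentioned above are per-cluster averages. For a single time period, that step could be sketched as follows, assuming the cluster labels are already available. `cluster_prototypes` is a hypothetical helper for illustration and does not perform the APP clustering itself.

```python
import numpy as np


def cluster_prototypes(embeddings, labels):
    """Average the embeddings that share a cluster label.

    Returns the sorted cluster ids and a matrix with one prototype per row,
    in the same order as the ids.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)
    prototypes = np.stack(
        [embeddings[labels == c].mean(axis=0) for c in cluster_ids]
    )
    return cluster_ids, prototypes


# Three usages of a word in one period, the first two in the same cluster.
embs = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]])
labels = np.array([0, 0, 1])
ids, prots = cluster_prototypes(embs, labels)
```

Applying the same helper to every period would give one prototype matrix per period, matching the shape of `prot_embs` as described.
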

Module contents