languagechange.models.change package
Submodules
languagechange.models.change.metrics module
- class languagechange.models.change.metrics.BinaryChange
Bases:
ChangeModel
- class languagechange.models.change.metrics.GradedChange
Bases:
ChangeModel
- class languagechange.models.change.metrics.Threshold
Bases:
BinaryChange
- class languagechange.models.change.metrics.APD
Bases:
GradedChange
- class languagechange.models.change.metrics.PRT
Bases:
GradedChange
- class languagechange.models.change.metrics.PJSD
Bases:
GradedChange
languagechange.models.change.timeseries module
- languagechange.models.change.timeseries.ma(ts, k)
Computes the moving average of a timeseries.
- Parameters:
ts (np.array) – a timeseries.
k (int) – the window (k timesteps to the left and k to the right).
- Returns:
the moving average of the timeseries (not including endpoints).
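The windowing described for `ma` can be illustrated with a minimal NumPy sketch. It assumes the window spans `2k + 1` points and that positions without a full window are dropped, which is one reading of "not including endpoints". The name `moving_average_sketch` is hypothetical and this is not the library's own implementation.

```python
import numpy as np


def moving_average_sketch(ts, k):
    """Centered moving average over 2k + 1 points (hypothetical helper).

    Positions that lack k neighbours on both sides are dropped, so the
    result has len(ts) - 2k entries.
    """
    ts = np.asarray(ts, dtype=float)
    window = 2 * k + 1
    if k < 0 or window > ts.size:
        raise ValueError("k must be non-negative and 2k + 1 must not exceed len(ts)")
    # 'valid' keeps only the positions where the full window fits.
    return np.convolve(ts, np.ones(window) / window, mode="valid")


smoothed = moving_average_sketch([1.0, 2.0, 3.0, 4.0, 5.0], k=1)
```

With five input values and `k=1`, the sketch returns three values, one for each interior timestep.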
- class languagechange.models.change.timeseries.TimeSeries(embs=None, series=None, change_metric=None, timeseries_type=None, k=1, time_labels=None, clustering_algorithm=None, distance_metric='cosine')
Bases:
object
- compute_from_embeddings(embs, change_metric, timeseries_type, k=1, time_labels=None, clustering_algorithm=None, distance_metric='cosine')
- Parameters:
embs ([np.array]) – a list of embeddings, each element of the list contains embeddings from one time period.
change_metric (str|object) – the metric to use when comparing embeddings from different time periods (should be one of the classes in languagechange.models.change.metrics).
timeseries_type (str) – the kind of timeseries to construct. One of [‘compare_to_first’, ‘compare_to_last’, ‘consecutive’, ‘moving_average’].
k (int) – the window size, if timeseries_type is ‘moving_average’ (k timesteps to the left and k to the right).
time_labels (np.array|list) – labels for the x axis of the timeseries.
clustering_algorithm – the clustering algorithm to use when PJSD is the change metric, e.g. one of the algorithms in scikit-learn or languagechange.
distance_metric (str) – the distance metric to use when computing change scores.
- Returns:
series (np.array) – the final timeseries.
ts (np.array) – the time values/labels for each value in the final timeseries.
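To make the `timeseries_type` options more concrete, here is a rough NumPy sketch of how periods might be paired for three of them. The pairing logic is an assumption inferred from the option names, the helper names `apd_sketch` and `series_sketch` are hypothetical, and the change score used is a plain average pairwise cosine distance rather than the library's metric classes. The 'moving_average' option is left out.

```python
import numpy as np


def apd_sketch(a, b):
    """Average pairwise cosine distance between two embedding matrices
    (an assumed stand-in for a graded change metric)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(1.0 - a @ b.T))


def series_sketch(embs, timeseries_type, change=apd_sketch):
    """Build a change series from a list of per-period embedding matrices.

    The period pairing below is an assumption, not the library's code.
    """
    n = len(embs)
    if timeseries_type == "compare_to_first":
        pairs = [(0, t) for t in range(1, n)]
    elif timeseries_type == "compare_to_last":
        pairs = [(t, n - 1) for t in range(n - 1)]
    elif timeseries_type == "consecutive":
        pairs = [(t, t + 1) for t in range(n - 1)]
    else:
        raise ValueError(f"unsupported timeseries_type: {timeseries_type}")
    return np.array([change(embs[i], embs[j]) for i, j in pairs])


# Three toy periods: the usage flips direction in period 1 and flips back in period 2.
periods = [
    np.array([[1.0, 0.0], [1.0, 0.0]]),
    np.array([[0.0, 1.0], [0.0, 1.0]]),
    np.array([[1.0, 0.0], [1.0, 0.0]]),
]
consecutive = series_sketch(periods, "consecutive")
to_first = series_sketch(periods, "compare_to_first")
to_last = series_sketch(periods, "compare_to_last")
```

In this toy data, consecutive comparison shows change at both steps, while comparison to the first period shows that the final period has returned to the starting point.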
languagechange.models.change.widid module
- class languagechange.models.change.widid.WiDiD(algorithm=<class 'languagechange.models.meaning.clustering.APosterioriaffinityPropagation'>, metric='cosine', **args)
Bases:
object
A class that implements WiDiD (https://github.com/FrancescoPeriti/WiDiD).
- compute_scores(embs_list, timeseries_type='consecutive', k=1, change_metric='apd', time_labels=None)
Performs a-posteriori affinity propagation (APP) clustering and computes the semantic change as the APD (or another metric) between the prototype embeddings in clusters of different time periods.
- Parameters:
embs_list ([np.array]) – a list of embeddings for a target word, where each element holds the embeddings from one time period.
timeseries_type (str) – the type of timeseries (see usage in languagechange.models.change.timeseries).
k (int) – the window size, if moving average (see usage in languagechange.models.change.timeseries).
change_metric (str) – the change metric (e.g. ‘apd’) to use (see usage in languagechange.models.change.timeseries).
time_labels (np.array|list) – labels for the x axis of the timeseries (see usage in languagechange.models.change.timeseries).
- Returns:
labels ([np.array]) – the labels for each embedding in each time period.
prot_embs ([np.array]) – a list of matrices encoding the prototype (average) embedding of each cluster in each time period.
change_scores (TimeSeries) – a timeseries (languagechange.models.change.timeseries.TimeSeries) containing the degree of change between the embeddings in different time periods.
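The prototype embeddings mentioned in the return values are described as per-cluster averages. A minimal NumPy sketch of that averaging step for a single time period follows. The helper name `prototypes_sketch` is hypothetical, and the cluster labels here are written by hand rather than produced by the APP clustering that `WiDiD` performs.

```python
import numpy as np


def prototypes_sketch(embs, labels):
    """Return one row per cluster: the mean of the embeddings assigned to it.

    Rows are ordered by sorted cluster label (hypothetical helper).
    """
    embs = np.asarray(embs, dtype=float)
    labels = np.asarray(labels)
    return np.stack([embs[labels == c].mean(axis=0) for c in np.unique(labels)])


# Toy period with three usages: two in cluster 0, one in cluster 1.
embs = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 0, 1])
prot = prototypes_sketch(embs, labels)
```

Applying a helper like this to each period would yield a list of matrices with the same shape as the documented `prot_embs` output, one matrix per time period.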