languagechange package¶
Subpackages¶
- languagechange.models package
- languagechange.models.change package
- languagechange.models.meaning package
- languagechange.models.representation package
- Module contents
Submodules¶
languagechange.benchmark module¶
languagechange.cache module¶
Cache manager with atomic write helpers for file-based caches.
- class languagechange.cache.CacheManager(cache_dir=None)[source]¶
Bases:
object
Manages cache files with atomic write operations to prevent data corruption in concurrent environments. Cache files are saved to a directory that can be specified at initialization.
- atomic_write(path)[source]¶
Provides a context manager for writing to cache files in an atomic way. This ensures that partial writes do not corrupt the target file, especially when multiple processes or threads access the same file.
- Parameters:
path (str) – The relative path to the cache file within the cache directory.
- Yields:
file object – A writable file object for the data. The file is temporary and is renamed to the target file path once the write operation completes successfully.
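The write-then-rename pattern described for atomic_write can be sketched with the standard library alone. This is a minimal illustration, not the CacheManager implementation: the function name atomic_write_sketch, the explicit cache_dir argument, and the mode parameter are all assumptions made for the example.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_write_sketch(cache_dir, path, mode="w"):
    """Hypothetical helper: write to a temp file, rename onto the target on success."""
    target = os.path.join(cache_dir, path)
    target_dir = os.path.dirname(target)
    os.makedirs(target_dir, exist_ok=True)
    # Keep the temp file in the target's directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, mode) as handle:
            yield handle
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the finished file into place in a single step.
        os.replace(tmp_path, target)
    except BaseException:
        # A failed write leaves the previous target untouched; drop the temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Readers of the target path see either the old contents or the complete new contents, never a partially written file, because the rename only happens after the temporary file has been closed.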
languagechange.corpora module¶
Corpus utilities for line-level corpora and search helpers.
- class languagechange.corpora.Line(raw_text=None, tokens=None, lemmas=None, pos_tags=None, fname=None, raw_lemma_text=None, raw_pos_text=None, **kwargs)[source]¶
Bases:
object
Wraps a corpus line with token, lemma, and POS metadata.
- search(search_term, time=None)[source]¶
Searches the line for matches of search_term.
- Parameters:
search_term (SearchTerm) – The term to search for.
- Returns:
A TargetUsageList of all matches.
- class languagechange.corpora.Corpus(name, language=None, time=no time specification, time_function=None, skip_lines=0, **args)[source]¶
Bases:
object
Base interface for corpora that support search and tokenization.
- search(search_terms)[source]¶
Searches the corpus by calling Line.search() on every line.
- Parameters:
search_terms (List[str | Pattern | SearchTerm]) – The terms to search for. A str or Pattern is converted to a SearchTerm that matches tokens only, i.e. SearchTerm(word_feature='token').
- Returns:
A UsageDictionary containing all search results for each search term.
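The conversion of plain strings and compiled patterns into token-level search terms can be pictured with a small stand-in. The SearchTermSketch dataclass and normalize_search_terms function below are hypothetical and exist only for this illustration; the real SearchTerm class may carry other fields and behave differently.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass
class SearchTermSketch:
    """Hypothetical stand-in for SearchTerm; only word_feature comes from the docs."""
    term: Union[str, re.Pattern]
    word_feature: str = "token"


def normalize_search_terms(search_terms):
    """Wrap bare str/Pattern items so every entry is a search-term object."""
    normalized = []
    for term in search_terms:
        if isinstance(term, (str, re.Pattern)):
            # Bare strings and patterns match on tokens only.
            normalized.append(SearchTermSketch(term, word_feature="token"))
        else:
            # Already a search-term object: keep its own word_feature.
            normalized.append(term)
    return normalized


terms = normalize_search_terms(
    ["cell", re.compile(r"plane.*"), SearchTermSketch("run", word_feature="lemma")]
)
```

Under this assumption, only callers who need lemma-level or other non-token matching have to construct a search-term object themselves.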
- tokenize(tokenizer='trankit', split_sentences=False, batch_size=128)[source]¶
Yield tokenized sentences using Trankit, optionally splitting sentences.
- lemmatize(lemmatizer='trankit', pretokenized=False, tokenize=False, split_sentences=False, batch_size=128)[source]¶
- pos_tagging(pos_tagger='trankit', pretokenized=False, tokenize=False, split_sentences=False, batch_size=128)[source]¶
- class languagechange.corpora.VerticalCorpus(path, sentence_separator='\n', field_separator='\t', field_map={'lemma': 1, 'pos_tag': 2, 'token': 0}, **args)[source]¶
Bases:
Corpus
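The default field_map in the VerticalCorpus signature suggests one token per row, with columns for token, lemma, and POS tag. The sketch below shows how such a mapping could be applied to the rows of a single sentence. parse_vertical_sentence is a hypothetical helper, not part of the package, and it does not model sentence_separator.

```python
def parse_vertical_sentence(block, field_separator="\t", field_map=None):
    """Hypothetical helper: split one sentence's rows into per-field lists."""
    # Same column layout as the documented default field_map.
    field_map = field_map or {"token": 0, "lemma": 1, "pos_tag": 2}
    parsed = {name: [] for name in field_map}
    for row in block.splitlines():
        if not row.strip():
            continue  # skip blank rows
        columns = row.split(field_separator)
        for name, index in field_map.items():
            parsed[name].append(columns[index])
    return parsed


sentence = "Dogs\tdog\tNOUN\nbark\tbark\tVERB"
fields = parse_vertical_sentence(sentence)
```

A corpus with a different column order would pass its own field_map, for example {"token": 1, "lemma": 0, "pos_tag": 2}.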
- class languagechange.corpora.XMLCorpus(path, sentence_tag='sentence', token_tag='token', is_lemmatized=False, lemma_tag=None, is_pos_tagged=False, pos_tag_tag=None, text_tag='text', **args)[source]¶
Bases:
Corpus
- cast_to_linebyline(linebyline_corpus)[source]¶
- Parameters:
linebyline_corpus (LinebyLineCorpus)
- cast_to_vertical(vertical_corpus)[source]¶
- Parameters:
vertical_corpus (VerticalCorpus)
- class languagechange.corpora.SprakBankenCorpus(path, sentence_tag='sentence', token_tag='token', is_lemmatized=True, lemma_tag='lemma', is_pos_tagged=True, pos_tag_tag='pos', **args)[source]¶
Bases:
XMLCorpus
- class languagechange.corpora.HistoricalCorpus(*args, **kwargs)[source]¶
Bases:
SortedKeyList
- line_iterator()[source]¶
Iterates through all of the corpora and yields every line that can be retrieved.
- search(search_terms, index_by_corpus=False)[source]¶
Searches through all of the corpora by calling search() on each of them.
- Parameters:
search_terms (List[str | Pattern | SearchTerm]) – The terms to search for. A str or Pattern is converted to a SearchTerm that matches tokens only, i.e. SearchTerm(word_feature='token').
index_by_corpus (bool, default False) – Whether the usages for a given word are returned as a dictionary, with corpus names as keys and lists of usages as values, or as a single list of all usages across corpora.
- Returns:
A dictionary containing all search results from the included corpora.
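The two result shapes selected by index_by_corpus can be illustrated with plain dictionaries. This is a sketch of the shapes only, assuming per-corpus results keyed by word; merge_usages and the sample data are hypothetical, and the actual method returns the package's own usage containers.

```python
def merge_usages(per_corpus_results, index_by_corpus=False):
    """Hypothetical helper: combine per-corpus results into one dictionary per word."""
    merged = {}
    for corpus_name, results in per_corpus_results.items():
        for word, usages in results.items():
            if index_by_corpus:
                # word -> {corpus name -> list of usages}
                merged.setdefault(word, {})[corpus_name] = list(usages)
            else:
                # word -> flat list of usages across all corpora
                merged.setdefault(word, []).extend(usages)
    return merged


per_corpus = {
    "corpus_a": {"plane": ["usage 1"]},
    "corpus_b": {"plane": ["usage 2", "usage 3"]},
}
flat = merge_usages(per_corpus)
by_corpus = merge_usages(per_corpus, index_by_corpus=True)
```

The flat form suits analyses that pool all usages of a word, while the per-corpus form keeps the provenance needed to compare time periods.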
languagechange.evaluation module¶
languagechange.resource_manager module¶
Resource manager that downloads and caches datasets and models.
- class languagechange.resource_manager.LanguageChange[source]¶
Bases:
object
- download(resource_type, resource_name, dataset, version)[source]¶
Download and cache a resource from the resource hub.
languagechange.search module¶
Helper utilities for searching corpora for target terms.
languagechange.usages module¶
Target usage helpers and containers for LanguageChange.
- class languagechange.usages.POS(*values)[source]¶
Bases:
Enum
Enumeration of supported parts of speech for targets.
- NOUN = 1¶
- VERB = 2¶
- ADJECTIVE = 3¶
- ADVERB = 4¶
- class languagechange.usages.Target(target)[source]¶
Bases:
object
Stores a target word together with optional metadata.
- Parameters:
target (str)
- class languagechange.usages.TargetUsage(text, offsets, time=None, **kwargs)[source]¶
Bases:
object
Represents an individual usage with offsets and optional time metadata.
- class languagechange.usages.DWUGUsage(target, date, grouping, identifier, description, **args)[source]¶
Bases:
TargetUsage
DWUG-specific usage metadata, including annotator judgments.
languagechange.utils module¶
Simple time representations used across the LanguageChange toolkit.
- class languagechange.utils.LiteralTime(time)[source]¶
Bases:
Time
Represents a literal timestamp or label for usage references.
- Parameters:
time (str)
Module contents¶
Core exports for the LanguageChange toolkit.