Warning

This website is still a work in progress!

Nephological Semantics

Welcome! This is the current home website of the Nephological Semantics Project, developed in the QLVL research group at KU Leuven. You can learn more about the project here.

One of the main products of our project is Nephosem, a Python package with functions to create type- and token-level distributional models, with both bag-of-words and dependency information. On this site you can find the full reference as well as tutorials.

This package has been used in lexical semantics and lectometry studies within the Nephological Semantics project; the derived publications are listed here.

Specific applications

Semasiological workflow

The semasiological workflow looks at the internal structure of individual words based on the contexts of their occurrences. For each word, it creates multiple token-level models (vector representations of each of its instances), combining different parameter settings (i.e. ways of defining context). Then it selects representative models and visualizes them in an interactive tool. A more or less technical explanation of the procedure is given here.

The Nephosem package is at the core of this workflow, but is then expanded with other tools:

  • The semasioFlow Python package, which organizes and compacts Nephosem functions in a way specific to the semasiological workflow;

  • The semcloud R package, which takes the output of semasioFlow and prepares the data for visualization, running dimensionality reduction and clustering and generating annotated concordances (see footnote 1).

  • The NephoVis interactive visualization tool (see link above) for exhaustive, qualitative exploration of the models.

  • The Level 3 ShinyApp for deeper exploration of individual models.

To start, you can take a look at this notebook, which shows the main steps using semasioFlow and Nephosem. It starts with a corpus in CoNLL format (one token per line, with columns for different features) and ends with token-by-token distance matrices as well as a number of metadata registers.
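To make the input and output of that pipeline more concrete, here is a minimal sketch in plain Python. It does not use the Nephosem or semasioFlow API: every function name, the three-column layout (word form, lemma, part of speech), the toy corpus and the window-based context definition are assumptions made for illustration only. It reads a CoNLL-like corpus, builds one bag-of-words count vector per occurrence of a target lemma, and computes a token-by-token cosine distance matrix.

```python
import math
from collections import Counter

# Toy corpus in a CoNLL-like layout: one token per line, tab-separated
# columns (word form, lemma, part of speech), blank line between sentences.
# The column layout is an assumption made for this example.
CONLL = """\
The\tthe\tDET
bank\tbank\tNOUN
approved\tapprove\tVERB
the\tthe\tDET
loan\tloan\tNOUN

We\twe\tPRON
sat\tsit\tVERB
on\ton\tADP
the\tthe\tDET
river\triver\tNOUN
bank\tbank\tNOUN

The\tthe\tDET
bank\tbank\tNOUN
refused\trefuse\tVERB
the\tthe\tDET
loan\tloan\tNOUN
"""


def read_sentences(text):
    """Split CoNLL-like text into sentences, each a list of (form, lemma, pos)."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():  # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        form, lemma, pos = line.split("\t")
        current.append((form, lemma, pos))
    if current:
        sentences.append(current)
    return sentences


def token_vectors(sentences, target, window=3, content_pos=("NOUN", "VERB", "ADJ")):
    """Build one bag-of-words count vector per occurrence of `target`.

    `window` and `content_pos` stand in for the "parameter settings":
    changing them yields a different token-level model.
    """
    vectors = {}
    for s_idx, sent in enumerate(sentences):
        for t_idx, (_, lemma, _) in enumerate(sent):
            if lemma != target:
                continue
            lo, hi = max(0, t_idx - window), t_idx + window + 1
            context = [
                ctx_lemma
                for i, (_, ctx_lemma, ctx_pos) in enumerate(sent[lo:hi], start=lo)
                if i != t_idx and ctx_pos in content_pos
            ]
            # Token ID: lemma / sentence index / position in the sentence
            vectors[f"{target}/{s_idx}/{t_idx}"] = Counter(context)
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def distance_matrix(vectors):
    """Return the sorted token IDs and the token-by-token distance matrix."""
    ids = sorted(vectors)
    return ids, [[cosine_distance(vectors[i], vectors[j]) for j in ids] for i in ids]


if __name__ == "__main__":
    sentences = read_sentences(CONLL)
    ids, matrix = distance_matrix(token_vectors(sentences, "bank", window=3))
    for token_id, row in zip(ids, matrix):
        print(token_id, [round(d, 2) for d in row])
```

In this toy example the two "loan" contexts of bank share a context word and therefore end up closer to each other than to the "river" context. Calling token_vectors with window=1 removes that shared word, which is a small-scale picture of how a different parameter setting produces a different model of the same tokens. The actual workflow applies weighting and richer context definitions that this sketch leaves out.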

Lectometric workflow

Coming soon!

Footnotes

1

These annotations are not semantic but model-related: the context words captured by a given model are highlighted, and their weighting values may be included as superscripts.