{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating a concordance with TokenHandler\n", "\n", "The `type2toks` attribute of a `nephosem.TokenHandler` object is a dictionary with type names as keys and `nephosem.TypeNode` objects as values.\n", "Each `TypeNode` object has a `tokens` attribute: a list of `nephosem.TokenNode` objects, one per collected token, holding the information on that token. From these we can build a concordance, for example with `nephosem.utils.save_concordance()`, as shown below." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import sys\n", "# Make the local nephosem checkout importable\n", "nephosemdir = \"../../nephosem/\"\n", "sys.path.append(nephosemdir)\n", "mydir = \"./\"\n", "from nephosem import ConfigLoader, Vocab, TokenHandler\n", "from nephosem.utils import save_concordance\n", "# Load the settings, updated with the values in config.ini\n", "conf = ConfigLoader()\n", "settings = conf.update_config('config.ini')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Collect tokens" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "query = Vocab({'girl/N' : 0}) # dummy query just for illustration\n", "# alternatively, if you already have a vocabulary, vocab.subvocab(['girl/N'])" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING: Not provide the temporary path!\n", "WARNING: Use the default tmp directory: '~/tmp'!\n", "Scanning tokens of queries in corpus...\n" ] }, { "data": { "text/plain": [ "[21, 39] be/V what/W that/I a/D ,/, ask/V and/C ...\n", "girl/N/StanfDepSents.1/3 NaN NaN NaN NaN NaN NaN NaN ...\n", "girl/N/StanfDepSents.1/13 NaN NaN NaN NaN NaN NaN NaN ...\n", "girl/N/StanfDepSents.1/20 NaN NaN NaN NaN NaN NaN NaN ...\n", "girl/N/StanfDepSents.2/29 -4 NaN NaN NaN NaN NaN NaN ...\n", "girl/N/StanfDepSents.8/3 NaN NaN NaN NaN NaN NaN NaN ...\n", "girl/N/StanfDepSents.8/15 NaN NaN NaN NaN NaN NaN NaN ...\n", "girl/N/StanfDepSents.8/25 
NaN NaN NaN NaN NaN NaN -2 ...\n", "... ... ... ... ... ... ... ... ..." ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Retrieve the tokens of the query types from the corpus\n", "tokhan = TokenHandler(query, settings=settings)\n", "tokens = tokhan.retrieve_tokens()\n", "tokens" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Write the concordance of the collected tokens to a TSV file\n", "outputfile = 'output/concordance.tsv'\n", "save_concordance(outputfile, tokhan.type2toks, colloc_fmt='word')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read concordance\n", "\n", "`nephosem.utils.save_concordance()` writes the concordance directly to `outputfile` as a tab-separated file with no header row, so the column names have to be supplied when the file is read back in." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | token_id | left | target | right |\n", "
|---|---|---|---|---|\n", "
| 0 | girl/N/StanfDepSents.1/3 | The | girl | looks healthy |\n", "
| 1 | girl/N/StanfDepSents.1/13 | boy looks at the | girl | as she eats |\n", "
| 2 | girl/N/StanfDepSents.1/20 | The | girl | eats less healthy food |\n", "
| 3 | girl/N/StanfDepSents.2/29 | are eaten by the | girl | NaN |\n", "
| 4 | girl/N/StanfDepSents.8/3 | The | girl | sat on the apple |\n", "
| 5 | girl/N/StanfDepSents.8/15 | boy looked at the | girl | 's apple |\n", "
| 6 | girl/N/StanfDepSents.8/25 | the boys and the | girls | eat apples |\n", "
| 7 | girl/N/StanfDepSents.4/7 | boy says that the | girl | should eat the apple |\n", "
| 8 | girl/N/StanfDepSents.4/15 | The | girl | eats the apple that |\n", "
| 9 | girl/N/StanfDepSents.9/14 | The older | girl | looks at a boy |\n", "
| 10 | girl/N/StanfDepSents.5/19 | What the | girl | eats was given by |\n", "
| 11 | girl/N/StanfDepSents.11/3 | The | girl | looks at the boy |\n", "
| 12 | girl/N/StanfDepSents.11/19 | the apple which the | girl | gave him |\n", "
| 13 | girl/N/StanfDepSents.11/28 | This year , the | girl | looked at a boy |\n", "
| 14 | girl/N/StanfDepSents.3/21 | The boy and the | girl | eat a healthy and |\n", "
| 15 | girl/N/StanfDepSents.6/6 | The boy gives the | girl | a tasty healthy apple |\n", "
| 16 | girl/N/StanfDepSents.6/21 | The | girl | does n't eat |\n", "
| 17 | girl/N/StanfDepSents.10/13 | The | girl | sits down |\n", "
| 18 | girl/N/StanfDepSents.10/19 | The | girl | eats about ten apples |\n", "
| 19 | girl/N/StanfDepSents.7/7 | old boy gives the | girl | a baby apple |\n", "
| 20 | girl/N/StanfDepSents.7/25 | The boy asked the | girl | about eating apples |\n", "