{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating a concordance with TokenHandler\n", "\n", "The `type2toks` attribute of the `nephosem.TokenHandler` class is a dictionary with type names as keys and `nephosem.TypeNode` objects as values.\n", "Each `TypeNode` object has a `tokens` attribute, a list of `nephosem.TokenNode` objects with information on each collected token. From these we can create a concordance with `nephosem.utils.save_concordance()`, as shown below." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import sys\n", "nephosemdir = \"../../nephosem/\"\n", "sys.path.append(nephosemdir)\n", "mydir = \"./\"\n", "from nephosem import ConfigLoader, Vocab, TokenHandler\n", "from nephosem.utils import save_concordance\n", "conf = ConfigLoader()\n", "settings = conf.update_config('config.ini')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Collect tokens" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "query = Vocab({'girl/N' : 0}) # dummy query just for illustration\n", "# alternatively, if you already have a vocabulary, vocab.subvocab(['girl/N'])" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WARNING: Not provide the temporary path!\n", "WARNING: Use the default tmp directory: '~/tmp'!\n", "Scanning tokens of queries in corpus...\n" ] }, { "data": { "text/plain": [ "[21, 39] be/V what/W that/I a/D ,/, ask/V and/C ...\n", "girl/N/StanfDepSents.1/3 NaN NaN NaN NaN NaN NaN NaN ...\n", "girl/N/StanfDepSents.1/13 NaN NaN NaN NaN NaN NaN NaN ...\n", "girl/N/StanfDepSents.1/20 NaN NaN NaN NaN NaN NaN NaN ...\n", "girl/N/StanfDepSents.2/29 -4 NaN NaN NaN NaN NaN NaN ...\n", "girl/N/StanfDepSents.8/3 NaN NaN NaN NaN NaN NaN NaN ...\n", "girl/N/StanfDepSents.8/15 NaN NaN NaN NaN NaN NaN NaN ...\n", "girl/N/StanfDepSents.8/25 
NaN NaN NaN NaN NaN NaN -2 ...\n", "... ... ... ... ... ... ... ... ..." ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tokhan = TokenHandler(query, settings=settings)\n", "tokens = tokhan.retrieve_tokens()\n", "tokens" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "outputfile = 'output/concordance.tsv'\n", "save_concordance(outputfile, tokhan.type2toks, colloc_fmt='word')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Read concordance\n", "\n", "`nephosem.utils.save_concordance()` writes the concordance directly to `outputfile` as a tab-separated file without a header row, so the column names have to be supplied when reading it back." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"<thead><tr><th></th><th>token_id</th><th>left</th><th>target</th><th>right</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>girl/N/StanfDepSents.1/3</td><td>The</td><td>girl</td><td>looks healthy</td></tr>\n",
"<tr><th>1</th><td>girl/N/StanfDepSents.1/13</td><td>boy looks at the</td><td>girl</td><td>as she eats</td></tr>\n",
"<tr><th>2</th><td>girl/N/StanfDepSents.1/20</td><td>The</td><td>girl</td><td>eats less healthy food</td></tr>\n",
"<tr><th>3</th><td>girl/N/StanfDepSents.2/29</td><td>are eaten by the</td><td>girl</td><td>NaN</td></tr>\n",
"<tr><th>4</th><td>girl/N/StanfDepSents.8/3</td><td>The</td><td>girl</td><td>sat on the apple</td></tr>\n",
"<tr><th>5</th><td>girl/N/StanfDepSents.8/15</td><td>boy looked at the</td><td>girl</td><td>'s apple</td></tr>\n",
"<tr><th>6</th><td>girl/N/StanfDepSents.8/25</td><td>the boys and the</td><td>girls</td><td>eat apples</td></tr>\n",
"<tr><th>7</th><td>girl/N/StanfDepSents.4/7</td><td>boy says that the</td><td>girl</td><td>should eat the apple</td></tr>\n",
"<tr><th>8</th><td>girl/N/StanfDepSents.4/15</td><td>The</td><td>girl</td><td>eats the apple that</td></tr>\n",
"<tr><th>9</th><td>girl/N/StanfDepSents.9/14</td><td>The older</td><td>girl</td><td>looks at a boy</td></tr>\n",
"<tr><th>10</th><td>girl/N/StanfDepSents.5/19</td><td>What the</td><td>girl</td><td>eats was given by</td></tr>\n",
"<tr><th>11</th><td>girl/N/StanfDepSents.11/3</td><td>The</td><td>girl</td><td>looks at the boy</td></tr>\n",
"<tr><th>12</th><td>girl/N/StanfDepSents.11/19</td><td>the apple which the</td><td>girl</td><td>gave him</td></tr>\n",
"<tr><th>13</th><td>girl/N/StanfDepSents.11/28</td><td>This year , the</td><td>girl</td><td>looked at a boy</td></tr>\n",
"<tr><th>14</th><td>girl/N/StanfDepSents.3/21</td><td>The boy and the</td><td>girl</td><td>eat a healthy and</td></tr>\n",
"<tr><th>15</th><td>girl/N/StanfDepSents.6/6</td><td>The boy gives the</td><td>girl</td><td>a tasty healthy apple</td></tr>\n",
"<tr><th>16</th><td>girl/N/StanfDepSents.6/21</td><td>The</td><td>girl</td><td>does n't eat</td></tr>\n",
"<tr><th>17</th><td>girl/N/StanfDepSents.10/13</td><td>The</td><td>girl</td><td>sits down</td></tr>\n",
"<tr><th>18</th><td>girl/N/StanfDepSents.10/19</td><td>The</td><td>girl</td><td>eats about ten apples</td></tr>\n",
"<tr><th>19</th><td>girl/N/StanfDepSents.7/7</td><td>old boy gives the</td><td>girl</td><td>a baby apple</td></tr>\n",
"<tr><th>20</th><td>girl/N/StanfDepSents.7/25</td><td>The boy asked the</td><td>girl</td><td>about eating apples</td></tr>\n",
"</tbody>\n",
"</table>\n", "
" ], "text/plain": [ " token_id left target \\\n", "0 girl/N/StanfDepSents.1/3 The girl \n", "1 girl/N/StanfDepSents.1/13 boy looks at the girl \n", "2 girl/N/StanfDepSents.1/20 The girl \n", "3 girl/N/StanfDepSents.2/29 are eaten by the girl \n", "4 girl/N/StanfDepSents.8/3 The girl \n", "5 girl/N/StanfDepSents.8/15 boy looked at the girl \n", "6 girl/N/StanfDepSents.8/25 the boys and the girls \n", "7 girl/N/StanfDepSents.4/7 boy says that the girl \n", "8 girl/N/StanfDepSents.4/15 The girl \n", "9 girl/N/StanfDepSents.9/14 The older girl \n", "10 girl/N/StanfDepSents.5/19 What the girl \n", "11 girl/N/StanfDepSents.11/3 The girl \n", "12 girl/N/StanfDepSents.11/19 the apple which the girl \n", "13 girl/N/StanfDepSents.11/28 This year , the girl \n", "14 girl/N/StanfDepSents.3/21 The boy and the girl \n", "15 girl/N/StanfDepSents.6/6 The boy gives the girl \n", "16 girl/N/StanfDepSents.6/21 The girl \n", "17 girl/N/StanfDepSents.10/13 The girl \n", "18 girl/N/StanfDepSents.10/19 The girl \n", "19 girl/N/StanfDepSents.7/7 old boy gives the girl \n", "20 girl/N/StanfDepSents.7/25 The boy asked the girl \n", "\n", " right \n", "0 looks healthy \n", "1 as she eats \n", "2 eats less healthy food \n", "3 NaN \n", "4 sat on the apple \n", "5 's apple \n", "6 eat apples \n", "7 should eat the apple \n", "8 eats the apple that \n", "9 looks at a boy \n", "10 eats was given by \n", "11 looks at the boy \n", "12 gave him \n", "13 looked at a boy \n", "14 eat a healthy and \n", "15 a tasty healthy apple \n", "16 does n't eat \n", "17 sits down \n", "18 eats about ten apples \n", "19 a baby apple \n", "20 about eating apples " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "pd.read_csv(outputfile, sep = '\\t', names = ['token_id', 'left', 'target', 'right'])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.9", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { 
"name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.9" } }, "nbformat": 4, "nbformat_minor": 2 }