Creating a concordance with TokenHandler

The type2toks attribute of the nephosem.TokenHandler class is a dictionary with type names as keys and nephosem.TypeNode objects as values. Each TypeNode object has a tokens attribute, which is a list of nephosem.TokenNode objects holding information on each collected token. From these we can create a concordance with nephosem.utils.save_concordance(), as shown below.
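To make the nesting concrete, here is a minimal sketch that walks a dictionary shaped like type2toks. It does not use the real nephosem classes: the SimpleNamespace objects are stand-ins that mimic only the tokens attribute described above, the token payloads are plain strings rather than TokenNode objects, and the names count_tokens and iter_tokens as well as the sample identifiers are hypothetical.

```python
from types import SimpleNamespace

# Stand-in objects (NOT the real nephosem classes): each one mimics only the
# `tokens` attribute of a TypeNode. The token payloads are made-up strings.
type2toks = {
    "girl/N": SimpleNamespace(tokens=["girl/N/file1/3", "girl/N/file1/13"]),
    "boy/N": SimpleNamespace(tokens=["boy/N/file2/5"]),
}


def count_tokens(type2toks):
    """Return a dict mapping each type name to its number of collected tokens."""
    return {name: len(node.tokens) for name, node in type2toks.items()}


def iter_tokens(type2toks):
    """Yield (type name, token) pairs across all types, in dictionary order."""
    for name, node in type2toks.items():
        for token in node.tokens:
            yield name, token


counts = count_tokens(type2toks)
pairs = list(iter_tokens(type2toks))
```

With a real TokenHandler, the same two loops should apply to tokhan.type2toks once the tokens have been retrieved, assuming the structure is as described above.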

[9]:
import sys
nephosemdir = "../../nephosem/"
sys.path.append(nephosemdir)
mydir = "./"
from nephosem import ConfigLoader, Vocab, TokenHandler
from nephosem.utils import save_concordance
conf = ConfigLoader()
settings = conf.update_config('config.ini')

Collect tokens

[2]:
query = Vocab({'girl/N' : 0}) # dummy query just for illustration
# alternatively, if you already have a vocabulary, vocab.subvocab(['girl/N'])
[3]:
tokhan = TokenHandler(query, settings=settings)
tokens = tokhan.retrieve_tokens()
tokens
WARNING: Not provide the temporary path!
WARNING: Use the default tmp directory: '~/tmp'!
Scanning tokens of queries in corpus...
[3]:
[21, 39]                   be/V  what/W  that/I  a/D  ,/,  ask/V  and/C  ...
girl/N/StanfDepSents.1/3   NaN   NaN     NaN     NaN  NaN  NaN    NaN    ...
girl/N/StanfDepSents.1/13  NaN   NaN     NaN     NaN  NaN  NaN    NaN    ...
girl/N/StanfDepSents.1/20  NaN   NaN     NaN     NaN  NaN  NaN    NaN    ...
girl/N/StanfDepSents.2/29  -4    NaN     NaN     NaN  NaN  NaN    NaN    ...
girl/N/StanfDepSents.8/3   NaN   NaN     NaN     NaN  NaN  NaN    NaN    ...
girl/N/StanfDepSents.8/15  NaN   NaN     NaN     NaN  NaN  NaN    NaN    ...
girl/N/StanfDepSents.8/25  NaN   NaN     NaN     NaN  NaN  NaN    -2     ...
...                        ...   ...     ...     ...  ...  ...    ...    ...
[10]:
outputfile = 'output/concordance.tsv'
save_concordance(outputfile, tokhan.type2toks, colloc_fmt='word')

Read concordance

nephosem.utils.save_concordance() writes the concordance directly to outputfile as a tab-separated table without a header row, so the column names have to be supplied when reading it back.

[12]:
import pandas as pd
pd.read_csv(outputfile, sep = '\t', names = ['token_id', 'left', 'target', 'right'])
[12]:
    token_id                    left                 target  right
0   girl/N/StanfDepSents.1/3    The                  girl    looks healthy
1   girl/N/StanfDepSents.1/13   boy looks at the     girl    as she eats
2   girl/N/StanfDepSents.1/20   The                  girl    eats less healthy food
3   girl/N/StanfDepSents.2/29   are eaten by the     girl    NaN
4   girl/N/StanfDepSents.8/3    The                  girl    sat on the apple
5   girl/N/StanfDepSents.8/15   boy looked at the    girl    's apple
6   girl/N/StanfDepSents.8/25   the boys and the     girls   eat apples
7   girl/N/StanfDepSents.4/7    boy says that the    girl    should eat the apple
8   girl/N/StanfDepSents.4/15   The                  girl    eats the apple that
9   girl/N/StanfDepSents.9/14   The older            girl    looks at a boy
10  girl/N/StanfDepSents.5/19   What the             girl    eats was given by
11  girl/N/StanfDepSents.11/3   The                  girl    looks at the boy
12  girl/N/StanfDepSents.11/19  the apple which the  girl    gave him
13  girl/N/StanfDepSents.11/28  This year , the      girl    looked at a boy
14  girl/N/StanfDepSents.3/21   The boy and the      girl    eat a healthy and
15  girl/N/StanfDepSents.6/6    The boy gives the    girl    a tasty healthy apple
16  girl/N/StanfDepSents.6/21   The                  girl    does n't eat
17  girl/N/StanfDepSents.10/13  The                  girl    sits down
18  girl/N/StanfDepSents.10/19  The                  girl    eats about ten apples
19  girl/N/StanfDepSents.7/7    old boy gives the    girl    a baby apple
20  girl/N/StanfDepSents.7/25   The boy asked the    girl    about eating apples
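
Because the saved file has no header row, it can also be read without pandas. The following is a minimal sketch using only the standard library. It assumes the four-column layout shown above (token_id, left, target, right) and assumes that a missing right context is stored as an empty field. The sample rows are copied from the output above; read_concordance and kwic_line are hypothetical helper names, not part of nephosem.

```python
import csv
import io

# Column names are not stored in the file, so we supply them ourselves.
COLUMNS = ["token_id", "left", "target", "right"]

# Three rows in the assumed on-disk layout (tab-separated, no header row).
# The last row has an empty right context (assumption: stored as "").
sample = (
    "girl/N/StanfDepSents.1/3\tThe\tgirl\tlooks healthy\n"
    "girl/N/StanfDepSents.1/13\tboy looks at the\tgirl\tas she eats\n"
    "girl/N/StanfDepSents.2/29\tare eaten by the\tgirl\t\n"
)


def read_concordance(handle):
    """Parse a headerless concordance TSV into a list of dicts."""
    # QUOTE_NONE keeps quote characters in the corpus text as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(zip(COLUMNS, row)) for row in reader]


def kwic_line(row, width=20):
    """Format one row as a keyword-in-context line with the target aligned."""
    left = row["left"][-width:].rjust(width)  # right-align the left context
    right = row["right"][:width]              # truncate the right context
    return f"{left}  [{row['target']}]  {right}".rstrip()


rows = read_concordance(io.StringIO(sample))
lines = [kwic_line(row) for row in rows]
for line in lines:
    print(line)
```

To read the real file instead of the in-memory sample, pass an open file handle, for example `open(outputfile, newline="", encoding="utf-8")`, to read_concordance.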