Creating a concordance with TokenHandler

The type2toks attribute of the nephosem.TokenHandler class is a dictionary with type names as keys and nephosem.TypeNode objects as values. Each TypeNode object has a tokens attribute, which is a list of nephosem.TokenNode objects holding the information on each collected token. From these, we can create a concordance with nephosem.utils.save_concordance(), as shown below.
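To make the nesting concrete, here is a minimal sketch of the structure described above. It does not use nephosem: `SketchTypeNode`, `SketchTokenNode`, their `token_id`, `left`, `target` and `right` fields, and `concordance_rows()` are hypothetical stand-ins invented for illustration. Only the `tokens` attribute comes from the description above, and the real TokenNode objects may store their information differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Hypothetical stand-ins, NOT the real nephosem classes.
@dataclass
class SketchTokenNode:
    token_id: str  # assumed field names, for illustration only
    left: str
    target: str
    right: str


@dataclass
class SketchTypeNode:
    # Mirrors the documented `tokens` attribute: one entry per collected token.
    tokens: List[SketchTokenNode] = field(default_factory=list)


def concordance_rows(type2toks: Dict[str, SketchTypeNode]) -> List[Tuple[str, str, str, str]]:
    """Flatten {type name: type node} into one row per token."""
    rows = []
    for type_name, type_node in type2toks.items():
        for tok in type_node.tokens:
            rows.append((tok.token_id, tok.left, tok.target, tok.right))
    return rows


# A toy dictionary shaped like type2toks: type names as keys, type nodes as values.
type2toks = {
    "girl/N": SketchTypeNode(tokens=[
        SketchTokenNode("girl/N/StanfDepSents.1/3", "The", "girl", "looks healthy"),
        SketchTokenNode("girl/N/StanfDepSents.10/13", "The", "girl", "sits down"),
    ])
}

rows = concordance_rows(type2toks)
for row in rows:
    print("\t".join(row))
```

The point of the sketch is only the two-level loop: one pass over the types, and for each type one pass over its tokens.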

[9]:

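import sys
nephosemdir = "../../nephosem/"  # directory that contains the nephosem package
sys.path.append(nephosemdir)     # make nephosem importable from this notebook
mydir = "./"
from nephosem import ConfigLoader, Vocab, TokenHandler
from nephosem.utils import save_concordance
conf = ConfigLoader()
settings = conf.update_config('config.ini')  # apply the settings from config.ini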

Collect tokens

[2]:

query = Vocab({'girl/N' : 0}) # dummy query just for illustration
# alternatively, if you already have a vocabulary, vocab.subvocab(['girl/N'])

[3]:

tokhan = TokenHandler(query, settings=settings)
tokens = tokhan.retrieve_tokens()
tokens

WARNING: Not provide the temporary path!
WARNING: Use the default tmp directory: '~/tmp'!
Scanning tokens of queries in corpus...

[3]:

[21, 39]                   be/V  what/W  that/I  a/D  ,/,  ask/V  and/C  ...
girl/N/StanfDepSents.1/3   NaN   NaN     NaN     NaN  NaN  NaN    NaN    ...
girl/N/StanfDepSents.1/13  NaN   NaN     NaN     NaN  NaN  NaN    NaN    ...
girl/N/StanfDepSents.1/20  NaN   NaN     NaN     NaN  NaN  NaN    NaN    ...
girl/N/StanfDepSents.2/29  -4    NaN     NaN     NaN  NaN  NaN    NaN    ...
girl/N/StanfDepSents.8/3   NaN   NaN     NaN     NaN  NaN  NaN    NaN    ...
girl/N/StanfDepSents.8/15  NaN   NaN     NaN     NaN  NaN  NaN    NaN    ...
girl/N/StanfDepSents.8/25  NaN   NaN     NaN     NaN  NaN  NaN    -2     ...
...                        ...   ...     ...     ...  ...  ...    ...    ...

[10]:

outputfile = 'output/concordance.tsv'
save_concordance(outputfile, tokhan.type2toks, colloc_fmt='word')

Read concordance

nephosem.utils.save_concordance() writes the concordance directly to outputfile as a tab-separated file without a header row, so the column names have to be supplied when reading it back.

[12]:

import pandas as pd
pd.read_csv(outputfile, sep='\t', names=['token_id', 'left', 'target', 'right'])

[12]:

	token_id	left	target	right
0	girl/N/StanfDepSents.1/3	The	girl	looks healthy
1	girl/N/StanfDepSents.1/13	boy looks at the	girl	as she eats
2	girl/N/StanfDepSents.1/20	The	girl	eats less healthy food
3	girl/N/StanfDepSents.2/29	are eaten by the	girl	NaN
4	girl/N/StanfDepSents.8/3	The	girl	sat on the apple
5	girl/N/StanfDepSents.8/15	boy looked at the	girl	's apple
6	girl/N/StanfDepSents.8/25	the boys and the	girls	eat apples
7	girl/N/StanfDepSents.4/7	boy says that the	girl	should eat the apple
8	girl/N/StanfDepSents.4/15	The	girl	eats the apple that
9	girl/N/StanfDepSents.9/14	The older	girl	looks at a boy
10	girl/N/StanfDepSents.5/19	What the	girl	eats was given by
11	girl/N/StanfDepSents.11/3	The	girl	looks at the boy
12	girl/N/StanfDepSents.11/19	the apple which the	girl	gave him
13	girl/N/StanfDepSents.11/28	This year , the	girl	looked at a boy
14	girl/N/StanfDepSents.3/21	The boy and the	girl	eat a healthy and
15	girl/N/StanfDepSents.6/6	The boy gives the	girl	a tasty healthy apple
16	girl/N/StanfDepSents.6/21	The	girl	does n't eat
17	girl/N/StanfDepSents.10/13	The	girl	sits down
18	girl/N/StanfDepSents.10/19	The	girl	eats about ten apples
19	girl/N/StanfDepSents.7/7	old boy gives the	girl	a baby apple
20	girl/N/StanfDepSents.7/25	The boy asked the	girl	about eating apples
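Row 3 above shows NaN in the right column because pandas reads an empty field as a missing value by default. The following sketch reproduces this on two rows copied from the table, read from an in-memory string instead of the actual output file, and shows one possible way to keep an empty context as an empty string. The variable names are illustrative, and whether NaN or an empty string is preferable depends on the later processing.

```python
import io

import pandas as pd

# Two rows copied from the concordance above; the second has an empty right context.
raw = (
    "girl/N/StanfDepSents.1/3\tThe\tgirl\tlooks healthy\n"
    "girl/N/StanfDepSents.2/29\tare eaten by the\tgirl\t\n"
)
columns = ["token_id", "left", "target", "right"]

# Passing names= tells pandas that the file has no header row.
df_default = pd.read_csv(io.StringIO(raw), sep="\t", names=columns)

# keep_default_na=False keeps empty fields as empty strings instead of NaN.
df_text = pd.read_csv(io.StringIO(raw), sep="\t", names=columns, keep_default_na=False)

print(df_default["right"].isna().tolist())
print(df_text["right"].tolist())
```

With the second variant, the left, target and right columns can be concatenated into a full line without first handling missing values.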