nephosem.deprel package¶
Submodules¶
nephosem.deprel.basic module¶
- class nephosem.deprel.basic.DiGraph¶
Bases:
object
- add_edge(e_id, from_v, to_v, e_label)¶
Add an edge to graph.
- add_node(v_id, v_label)¶
Add a node with id and label (optional) to graph.
- property edges¶
- in_degree(v)¶
- property istree¶
- property nodes¶
- out_degree(v)¶
- predcessors(v)¶
- successors(v)¶
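The degree and tree queries listed for this class can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not the nephosem implementation: the storage layout, the method names and the tree test (exactly one node with in-degree 0, every other node with in-degree 1, and all nodes reachable from the root) are assumptions.

```python
class TinyDiGraph:
    """Hypothetical stand-in for a labelled directed graph (not nephosem's DiGraph)."""

    def __init__(self):
        self.nodes = {}   # node id -> label
        self.edges = {}   # edge id -> (from node, to node, label)

    def add_node(self, v_id, v_label=None):
        self.nodes[v_id] = v_label

    def add_edge(self, e_id, from_v, to_v, e_label):
        self.edges[e_id] = (from_v, to_v, e_label)

    def successors(self, v):
        return [t for f, t, _ in self.edges.values() if f == v]

    def predecessors(self, v):
        return [f for f, t, _ in self.edges.values() if t == v]

    def in_degree(self, v):
        return len(self.predecessors(v))

    def out_degree(self, v):
        return len(self.successors(v))

    def is_tree(self):
        # Assumed definition: one root, every other node has exactly one
        # parent, and all nodes are reachable from the root.
        roots = [v for v in self.nodes if self.in_degree(v) == 0]
        if len(roots) != 1:
            return False
        if any(self.in_degree(v) != 1 for v in self.nodes if v != roots[0]):
            return False
        seen, stack = set(), [roots[0]]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(self.successors(v))
        return seen == set(self.nodes)


g = TinyDiGraph()
g.add_node(1, "eat/VB")
g.add_node(2, "apple/NN")
g.add_node(3, "red/JJ")
g.add_edge(1, 1, 2, "dobj")
g.add_edge(2, 2, 3, "amod")
```

Adding an edge from node 3 back to node 1 would leave no node with in-degree 0, so the tree test in this sketch would then fail.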
- class nephosem.deprel.basic.FeatureGraph(template, target=-1, feature_filter={})¶
Bases:
nephosem.deprel.basic.TemplateGraph
Class representing a feature graph, inherited from the class TemplateGraph, so it has the same structure as the template from which it is generated. A feature object is generated as follows:
1. Replicate the (tree) structure of the template.
2. Set the target node index.
3. Set the feature properties of each node (except the target) and each edge.
The feature properties (i.e. True or False) are stored in attributes of the nodes and edges.
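The three generation steps can be sketched with plain dictionaries. This is only an illustration under stated assumptions: the helper `make_feature`, the dict layout and the way `feature_filter` is keyed (by `("node", idx)` or `("edge", idx)`, defaulting to True) are hypothetical and not taken from nephosem.

```python
import copy


def make_feature(template, target, feature_filter=None):
    """Hypothetical helper: derive a feature dict from a template dict."""
    feature_filter = feature_filter or {}
    # 1. Replicate the (tree) structure of the template.
    feature = copy.deepcopy(template)
    # 2. Set the target node index.
    feature["target"] = target
    # 3. Flag every node except the target, and every edge, as True or False.
    for idx, node in feature["nodes"].items():
        if idx != target:
            node["feature"] = feature_filter.get(("node", idx), True)
    for idx, edge in feature["edges"].items():
        edge["feature"] = feature_filter.get(("edge", idx), True)
    return feature


template = {
    "nodes": {0: {"label": "VB"}, 1: {"label": "NN"}},
    "edges": {0: {"from": 0, "to": 1, "rel": "dobj"}},
}
feature = make_feature(template, target=0, feature_filter={("edge", 0): False})
```

Because the template is deep-copied, the same template can be reused to build several features with different targets.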
- add_match(matched_nodes, matched_edges)¶
Add matched nodes and edges
- Parameters
matched_nodes (dict) – mapping from node index to item string
matched_edges (dict) – mapping from edge index to relation string
- set_feature(feature_filter={})¶
- set_target(target)¶
- show(v_label='label', e_label='rel', figsize=(5.0, 5.0))¶
- show_match(index=1, v_label='label', e_label='rel', figsize=(5.0, 5.0))¶
- property size¶
- class nephosem.deprel.basic.Graph(sentence=None, id2node=None)¶
Bases:
object
- add_edge(e_from_node, e_to_node, e_label)¶
Add an edge to graph.
- add_node(v_id, v_label=None)¶
Add a node with id and label (optional) to graph.
- build_graph(sentence=None, id2node=None)¶
Build a graph
- Parameters
sentence (iterable) – A list of dependency relations.
id2node (dict) – Node id to node string mapping.
- build_graph_raw(sentence)¶
Build a graph from raw text (of a sentence)
- Parameters
sentence (iterable) – A list of strings
- property edges¶
- match(path)¶
Match a graph with path.
- Parameters
path (PathTemplate) –
- Returns
valid matches
- Return type
iterable
- property nodes¶
- class nephosem.deprel.basic.Path(template, matches=None)¶
Bases:
object
Class storing path matches found in corpus
- add_path(match)¶
Add a match
- Parameters
match (iterable) – A list of str
- property len¶
Size of the template, e.g. ‘:NN:amod:VB:’ has a size of one.
- classmethod load(filename)¶
- save(filename, encoding='utf-8')¶
- property size¶
Number of matches.
- class nephosem.deprel.basic.PathTemplate(nodes, edges)¶
Bases:
object
Class representing a path template
- property len¶
- match_edge(rel, index=0, u=0, v=0)¶
- match_node(item, index=0)¶
- class nephosem.deprel.basic.Sentence(s)¶
Bases:
object
- get_content()¶
- parse(s=None)¶
Parse sentence text (raw string from corpus file).
- class nephosem.deprel.basic.SentenceGraph(nodes=None, edges=None, sentence=None)¶
Bases:
nephosem.deprel.basic.DiGraph
- build_graph(sentence)¶
Build a graph from raw text (of a sentence)
- Parameters
sentence (iterable) – A list of strings
- generate_graph(nodes, edges)¶
- match_feature(feature)¶
Match a sentence with a feature (and target pair)
- match_target_feature(feature)¶
Match a graph with a path (a tree/graph object).
- Parameters
feature (FeatureGraph) –
- Returns
valid matches
- Return type
iterable
- show(v_label='label', e_label='rel', figsize=(5.0, 5.0))¶
- class nephosem.deprel.basic.TemplateGraph(nodes=None, edges=None, graph=None)¶
Bases:
nephosem.deprel.basic.DiGraph
Class representing a dependency template tree/graph
- static islinear(template)¶
- match_edge(rel, idx=0)¶
- match_node(item, idx=0)¶
- show(v_label='label', e_label='rel', figsize=(5.0, 5.0))¶
- nephosem.deprel.basic.get_depth(gx)¶
Get the depth of a tree
- nephosem.deprel.basic.get_depth_of_node(gx, v)¶
Get the depth of a node
- nephosem.deprel.basic.get_root(gx)¶
Get the root of a tree
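As an illustration of the three tree helpers, here is a minimal sketch on a tree stored as a dict from each node to its list of children. The function names and the convention that depth is counted in edges (the root has depth 0) are assumptions; nephosem may count levels differently.

```python
def find_root(children):
    """Return the only node that is nobody's child (assumes a valid tree)."""
    all_children = {c for kids in children.values() for c in kids}
    roots = [v for v in children if v not in all_children]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots[0]


def depth_of_node(children, v):
    """Number of edges from the root down to node v (the root has depth 0)."""
    parent = {c: p for p, kids in children.items() for c in kids}
    depth = 0
    while v in parent:
        v = parent[v]
        depth += 1
    return depth


def tree_depth(children):
    """Depth of the deepest node."""
    return max(depth_of_node(children, v) for v in children)


# Every node, including leaves, appears as a key.
tree = {0: [1, 2], 1: [3], 2: [], 3: []}
```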
- nephosem.deprel.basic.match_level(sentence, feature, currmap)¶
Match the next level based on the index mapping (feature index -> sentence index) of current level
- Parameters
sentence (SentenceGraph) –
feature (FeatureGraph) –
currmap (dict) – Index mapping of the current level, from feature node idx to sentence node idx.
- Returns
A list of dicts, each mapping feature node idx -> sentence node idx
- Return type
list
- nephosem.deprel.basic.match_sub_template(sentence=None, feature=None, valid_nodes=None, valid_edges=None)¶
- nephosem.deprel.basic.match_successors(sentence, scur, feature, fcur)¶
Match sentence successors with feature successors based on current sentence node and feature node.
- Parameters
sentence (SentenceGraph) –
scur (int) – Current node index of sentence
feature (FeatureGraph) –
fcur (int) – Current node index of feature
- Returns
A list of dicts, each mapping feature node idx -> sentence node idx
- Return type
list
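The idea of matching the successors of one feature node against the successors of one sentence node can be sketched as follows. Everything here is hypothetical: the function name, the adjacency-dict layout, and the matching rule (relation and label must be equal, and no sentence node may be used twice) are assumptions rather than nephosem's actual logic.

```python
from itertools import product


def match_children(s_children, s_labels, scur, f_children, f_labels, fcur):
    """Hypothetical sketch: map each feature child of fcur to a sentence child of scur."""
    candidates = {}
    for f_rel, f_child in f_children.get(fcur, []):
        # Sentence children reached by the same relation and carrying the same label.
        candidates[f_child] = [
            s_child
            for s_rel, s_child in s_children.get(scur, [])
            if s_rel == f_rel and s_labels[s_child] == f_labels[f_child]
        ]
    keys = list(candidates)
    matches = []
    for combo in product(*(candidates[k] for k in keys)):
        if len(set(combo)) == len(combo):  # one sentence node per feature node
            matches.append(dict(zip(keys, combo)))
    return matches


# Sentence: node 0 (VB) with one nsubj child and two dobj children.
s_labels = {0: "VB", 1: "NN", 2: "NN", 3: "NN"}
s_children = {0: [("nsubj", 1), ("dobj", 2), ("dobj", 3)]}
# Feature: node 0 (VB) with one nsubj child and one dobj child.
f_labels = {0: "VB", 1: "NN", 2: "NN"}
f_children = {0: [("nsubj", 1), ("dobj", 2)]}

matches = match_children(s_children, s_labels, 0, f_children, f_labels, 0)
```

Each returned dict maps a feature node idx to a sentence node idx, which is the shape described for the return value.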
- nephosem.deprel.basic.subtree_match(sentence=None, feature=None, lmatches=None)¶
- Parameters
sentence (SentenceGraph) –
feature (FeatureGraph) –
lmatches (collections.deque) – Queue of possible matches. Each match is a list of levels of the feature (its final length is feature.depth). Element example: feature node idx -> sentence node idx.
- nephosem.deprel.basic.tree_match(sentence, feature)¶
Match the sentence with the feature.
- Parameters
sentence (SentenceGraph) –
feature (FeatureGraph) –
nephosem.deprel.corpus module¶
nephosem.deprel.dephandler module¶
- class nephosem.deprel.dephandler.DepRelManager(settings)¶
Bases:
nephosem.core.handler.BaseHandler
Handler Class for processing dependency relations
- build_dep_rel(fnames=None, multicore=True)¶
The function will treat all different word types as possible target or context words.
- Parameters
fnames (str, optional) – Name of a file listing all the corpus file names the user wants to process. Format: corpus_name + settings[“fnames-ext”]
row_vocab (Vocab) – Target words (types) vocabulary. If a non-empty vocabulary is passed, only target words (types) in this vocab should be processed. Otherwise all possible words (types) should be processed.
col_vocab (Vocab) – Context features vocabulary. If a non-empty vocabulary is passed, only context features in this vocab should be processed. Otherwise all possible contexts should be processed.
multicore (bool) – Whether to use the multicore version of the method.
- do_job_single(fnames, **superkwargs)¶
Method doing job for handler class.
- Parameters
fnames (iterable) – A list of filenames
- merge_results()¶
Merge sub-process matrices into one final matrix. Filename format of sub-process matrices: …/matches.sub.pid
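Merging per-process results into one final result can be illustrated with a small sketch. It assumes, purely for illustration, that each partial result is a sparse count matrix held as a dict from (row, col) to count; the on-disk format of the sub-process files is not covered, and `merge_counts` is a hypothetical name.

```python
from collections import Counter


def merge_counts(partials):
    """Hypothetical sketch: sum sparse count matrices stored as {(row, col): count}."""
    total = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts instead of replacing them
    return total


merged = merge_counts([
    {("apple/NN", "eat/VB"): 2},
    {("apple/NN", "eat/VB"): 1, ("pear/NN", "eat/VB"): 4},
])
```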
- process(fnames, queue_factor=2)¶
- Parameters
fnames –
queue_factor (int, optional) – Multiplier for size of queue -> size = number of workers * queue_factor.
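The sizing rule, size = number of workers * queue_factor, can be sketched with a bounded queue. This is not the handler's code: threads stand in for worker processes, and `run_workers`, the sentinel convention and the placeholder processing step are assumptions made for the example.

```python
import queue
import threading


def run_workers(fnames, n_workers=2, queue_factor=2):
    """Hypothetical sketch of a bounded work queue; threads stand in for processes."""
    tasks = queue.Queue(maxsize=n_workers * queue_factor)
    done = []
    lock = threading.Lock()

    def worker():
        while True:
            fname = tasks.get()
            if fname is None:  # sentinel: no more work
                break
            with lock:
                done.append(fname.upper())  # placeholder for real processing

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for fname in fnames:
        tasks.put(fname)  # blocks while the queue is full
    for _ in threads:
        tasks.put(None)   # one sentinel per worker
    for t in threads:
        t.join()
    return tasks.maxsize, sorted(done)


maxsize, processed = run_workers(["a.txt", "b.txt", "c.txt"], n_workers=2, queue_factor=2)
```

A larger queue_factor lets the producer run further ahead of the workers at the cost of holding more pending items in memory.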
- read_features(fname=None, features=None, encoding='utf-8')¶
- read_template(fname=None, features=None, encoding='utf-8')¶
Read paths from file
- nephosem.deprel.dephandler.update_dep_rel_caller(fnames, tmpdir=None, settings=None)¶
Save the path template matches of a sub-process. Filename format of sub-process objects:
matrix: paths.sub.pid
- Parameters
fnames (iterable) – A list of filenames
tmpdir (str) – Temporary folder
settings (dict) –
nephosem.deprel.deputils module¶
- nephosem.deprel.deputils.cartesian_product(mapping)¶
Expand a mapping of candidate values into one dict per combination, e.g.
{1: (1, 2), 2: (3, 4)} -> [{1:1, 2:3}, {1:2, 2:3}, {1:1, 2:4}, {1:2, 2:4}]
Should perform:
{1: (1, 2), 2: (1, 2)} -> [{1: 1, 2: 2}] (if the target is not in (1, 2)) [{1: 1, 2: 2}, {1: 2, 2: 1}]
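The expansion in the example can be approximated with `itertools.product`. This is a sketch under assumptions, not the library's code: `expand_mapping` and its `distinct` flag are hypothetical, the order of the results may differ from the order shown in the docstring, and the condition on the target is not modelled.

```python
from itertools import product


def expand_mapping(mapping, distinct=False):
    """Hypothetical sketch: one dict per combination of candidate values."""
    keys = list(mapping)
    results = []
    for combo in product(*(mapping[k] for k in keys)):
        if distinct and len(set(combo)) != len(combo):
            continue  # skip combinations that reuse a value
        results.append(dict(zip(keys, combo)))
    return results


plain = expand_mapping({1: (1, 2), 2: (3, 4)})
unique = expand_mapping({1: (1, 2), 2: (1, 2)}, distinct=True)
```

With `distinct=True` the second docstring case yields only the two assignments in which keys 1 and 2 take different values.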
- nephosem.deprel.deputils.group(nodemap)¶
- nephosem.deprel.deputils.judgeIn(sLine)¶
- nephosem.deprel.deputils.judgeIn_v1(sLine)¶
- nephosem.deprel.deputils.judgeOut(sLine)¶
- nephosem.deprel.deputils.outMap(item)¶
- nephosem.deprel.deputils.process_sentence(sLine)¶
Process a sentence line. Line format: ‘sid:edge1;edge2;edge3…’
- Parameters
sLine –
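Splitting such a line can be sketched as below. The sketch assumes the sentence id contains no colon and treats each edge as an opaque string, since the internal edge format is not given here; `split_sentence_line` is a hypothetical name.

```python
def split_sentence_line(line):
    """Hypothetical sketch: split 'sid:edge1;edge2' into an id and a list of edge strings."""
    # Split on the first colon only, so colons inside edges are preserved.
    sid, _, rest = line.strip().partition(":")
    edges = [e for e in rest.split(";") if e]
    return sid, edges


sid, edges = split_sentence_line("s1:edge1;edge2;edge3\n")
```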
- nephosem.deprel.deputils.read_nodes(in_nodes, encoding='utf-8')¶
Read nodes from file, e.g.:
a/DT 4
an/DT 21
…
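Reading such a file can be sketched as follows, assuming one entry per line with the item and an integer separated by whitespace. Both the layout and the reading of the number as a count are assumptions, and `parse_nodes` is a hypothetical name.

```python
import io


def parse_nodes(handle):
    """Hypothetical sketch: read 'item number' lines into a dict."""
    nodes = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace so the item itself stays intact.
        item, number = line.rsplit(None, 1)
        nodes[item] = int(number)
    return nodes


nodes = parse_nodes(io.StringIO("a/DT 4\nan/DT 21\n"))
```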
- nephosem.deprel.deputils.split_large_file(filename, encoding='utf-8')¶
Split a large corpus file into smaller ones for multicore processing.
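One simple way to split a file is by a fixed number of lines per chunk, sketched below. The chunk naming scheme (`filename.0`, `filename.1`, ...) and the `lines_per_chunk` parameter are assumptions; the real function may split on sentence or document boundaries instead.

```python
import os
import tempfile


def split_by_lines(filename, lines_per_chunk, encoding="utf-8"):
    """Hypothetical sketch: write consecutive line chunks to filename.0, filename.1, ..."""
    chunk_names = []
    out = None
    with open(filename, encoding=encoding) as src:
        for i, line in enumerate(src):
            if i % lines_per_chunk == 0:
                # Start a new chunk file, closing the previous one first.
                if out is not None:
                    out.close()
                chunk_names.append(f"{filename}.{len(chunk_names)}")
                out = open(chunk_names[-1], "w", encoding=encoding)
            out.write(line)
    if out is not None:
        out.close()
    return chunk_names


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("".join(f"line {i}\n" for i in range(5)))
    chunks = split_by_lines(path, lines_per_chunk=2)
    chunk_basenames = [os.path.basename(c) for c in chunks]
    chunk_sizes = []
    for c in chunks:
        with open(c, encoding="utf-8") as fh:
            chunk_sizes.append(len(fh.readlines()))
```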