nephosem.deprel package

Submodules

nephosem.deprel.basic module

class nephosem.deprel.basic.DiGraph

Bases: object

add_edge(e_id, from_v, to_v, e_label)

Add an edge to graph.

add_node(v_id, v_label)

Add a node with id and label (optional) to graph.

property edges
in_degree(v)
property istree
property nodes
out_degree(v)
predcessors(v)
successors(v)
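
The members listed above suggest a labelled directed graph with degree and neighbour lookups. The following is a minimal standalone sketch of such a structure, not the nephosem implementation: the class name TinyDiGraph, its internal dictionaries, and the exact tree test used for istree are all assumptions made for illustration.

```python
from collections import defaultdict


class TinyDiGraph:
    """Hypothetical stand-in for a labelled directed graph (not nephosem's DiGraph)."""

    def __init__(self):
        self.nodes = {}                 # v_id -> label
        self.edges = {}                 # e_id -> (from_v, to_v, label)
        self._succ = defaultdict(list)  # v_id -> successor ids
        self._pred = defaultdict(list)  # v_id -> predecessor ids

    def add_node(self, v_id, v_label=None):
        self.nodes[v_id] = v_label

    def add_edge(self, e_id, from_v, to_v, e_label):
        self.edges[e_id] = (from_v, to_v, e_label)
        self._succ[from_v].append(to_v)
        self._pred[to_v].append(from_v)

    def in_degree(self, v):
        return len(self._pred[v])

    def out_degree(self, v):
        return len(self._succ[v])

    def successors(self, v):
        return list(self._succ[v])

    def predecessors(self, v):
        return list(self._pred[v])

    @property
    def istree(self):
        # Assumed definition: exactly one root (in-degree 0), every other
        # node has in-degree 1, and every node is reachable from the root.
        roots = [v for v in self.nodes if self.in_degree(v) == 0]
        if len(roots) != 1:
            return False
        if any(self.in_degree(v) != 1 for v in self.nodes if v != roots[0]):
            return False
        seen, stack = set(), [roots[0]]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            stack.extend(self._succ[v])
        return seen == set(self.nodes)
```
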
class nephosem.deprel.basic.FeatureGraph(template, target=-1, feature_filter={})

Bases: nephosem.deprel.basic.TemplateGraph

Class representing a feature graph, inherited from the class TemplateGraph, so it has the same structure as the template from which it is generated. A feature object is generated as follows:
  1. replicate the (tree) structure of the template
  2. set the target node index
  3. set feature properties for each node (except the target) and each edge

The feature properties (i.e. True or False) are stored in attributes of the nodes and edges.
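
The three generation steps can be sketched with plain dictionaries. This is an illustration only: the function make_feature, the dictionary layout of the template, the way feature_filter is keyed, and the default of True for unlisted nodes and edges are all assumptions, not the behaviour of FeatureGraph itself.

```python
import copy


def make_feature(template, target, feature_filter=None):
    """Build a feature description from a template (hypothetical layout).

    template: {'nodes': {idx: pattern}, 'edges': {(u, v): pattern}}
    feature_filter: {node idx or (u, v) edge: bool}, assumed to default to True
    """
    feature_filter = feature_filter or {}

    # 1. replicate the structure of the template
    feature = copy.deepcopy(template)

    # 2. set the target node index
    if target not in feature['nodes']:
        raise ValueError(f"target {target!r} is not a node of the template")
    feature['target'] = target

    # 3. set a True/False feature property for every node except the
    #    target, and for every edge
    feature['node_is_feature'] = {
        idx: bool(feature_filter.get(idx, True))
        for idx in feature['nodes'] if idx != target
    }
    feature['edge_is_feature'] = {
        edge: bool(feature_filter.get(edge, True))
        for edge in feature['edges']
    }
    return feature
```
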

add_match(matched_nodes, matched_edges)

Add matched nodes and edges

Parameters
  • matched_nodes (dict) – mapping from node index to item string

  • matched_edges (dict) – mapping from edge index to relation string

set_feature(feature_filter={})
set_target(target)
show(v_label='label', e_label='rel', figsize=(5.0, 5.0))
show_match(index=1, v_label='label', e_label='rel', figsize=(5.0, 5.0))
property size
class nephosem.deprel.basic.Graph(sentence=None, id2node=None)

Bases: object

add_edge(e_from_node, e_to_node, e_label)

Add an edge to graph.

add_node(v_id, v_label=None)

Add a node with id and label (optional) to graph.

build_graph(sentence=None, id2node=None)

Build a graph

Parameters
  • sentence (iterable) – A list of dependency relations.

  • id2node (dict) – Node id to node string mapping.

build_graph_raw(sentence)

Build a graph from raw text (of a sentence)

Parameters

sentence (iterable) – A list of strings

property edges
match(path)

Match a graph with path.

Parameters

path (PathTemplate) –

Returns

valid matches

Return type

iterable

property nodes
class nephosem.deprel.basic.Path(template, matches=None)

Bases: object

Class storing path matches found in corpus

add_path(match)

Add a match

Parameters

match (iterable) – A list of str

property len

Size of the template, e.g. ‘:NN:amod:VB:’ has a size of one.
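
One reading consistent with this example is that the size counts the relations in the template. The sketch below assumes that a template string alternates node labels and relation labels separated by colons; that format, and the helper names split_path_template and template_len, are assumptions for illustration and are not confirmed by this reference.

```python
def split_path_template(template):
    """Split ':NN:amod:VB:' into node labels and relation labels.

    Assumes labels alternate node, relation, node, ... between colons.
    """
    parts = template.strip(':').split(':')
    if len(parts) % 2 == 0 or '' in parts:
        raise ValueError(f"unexpected template: {template!r}")
    return parts[0::2], parts[1::2]


def template_len(template):
    """Number of relations (edges) in the template."""
    _, edges = split_path_template(template)
    return len(edges)
```
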

classmethod load(filename)
save(filename, encoding='utf-8')
property size

Number of matches.

class nephosem.deprel.basic.PathTemplate(nodes, edges)

Bases: object

Class representing a path template

property len
match_edge(rel, index=0, u=0, v=0)
match_node(item, index=0)
class nephosem.deprel.basic.Sentence(s)

Bases: object

get_content()
parse(s=None)

Parse sentence text (raw string from corpus file).

class nephosem.deprel.basic.SentenceGraph(nodes=None, edges=None, sentence=None)

Bases: nephosem.deprel.basic.DiGraph

build_graph(sentence)

Build a graph from raw text (of a sentence)

Parameters

sentence (iterable) – A list of strings

generate_graph(nodes, edges)
match_feature(feature)

Match a sentence with a feature (and target pair)

match_target_feature(feature)

Match a graph with a path (a tree/graph object).

Parameters

feature (FeatureGraph) –

Returns

valid matches

Return type

iterable

show(v_label='label', e_label='rel', figsize=(5.0, 5.0))
class nephosem.deprel.basic.TemplateGraph(nodes=None, edges=None, graph=None)

Bases: nephosem.deprel.basic.DiGraph

Class representing a dependency template tree/graph

static islinear(template)
match_edge(rel, idx=0)
match_node(item, idx=0)
show(v_label='label', e_label='rel', figsize=(5.0, 5.0))
nephosem.deprel.basic.get_depth(gx)

Get the depth of a tree

nephosem.deprel.basic.get_depth_of_node(gx, v)

Get the depth of a node

nephosem.deprel.basic.get_root(gx)

Get the root of a tree
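
The three tree helpers above can be illustrated on a plain adjacency dictionary. This sketch is not the library code: the input layout (every node is a key mapping to its list of children), the function names, and the convention that the root has depth 0 are assumptions.

```python
def root_of(succ):
    """Return the single node that is nobody's child."""
    children = {c for kids in succ.values() for c in kids}
    roots = [v for v in succ if v not in children]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots[0]


def depth_of_node(succ, v):
    """Count the edges on the path from the root down to v."""
    parent = {c: p for p, kids in succ.items() for c in kids}
    depth = 0
    while v in parent:
        v = parent[v]
        depth += 1
    return depth


def tree_depth(succ):
    """Depth of the deepest node in the tree."""
    return max(depth_of_node(succ, v) for v in succ)
```
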

nephosem.deprel.basic.match_level(sentence, feature, currmap)

Match the next level based on the index mapping (feature index -> sentence index) of current level

Parameters
  • sentence (SentenceGraph) –

  • feature (FeatureGraph) –

  • currmap (dict) – Index mapping from feature node to sentence node (of the current level), i.e. feature node idx -> sentence node idx

Returns

A list of dicts, each mapping feature node idx -> sentence node idx

Return type

list

nephosem.deprel.basic.match_sub_template(sentence=None, feature=None, valid_nodes=None, valid_edges=None)
nephosem.deprel.basic.match_successors(sentence, scur, feature, fcur)

Match sentence successors with feature successors based on current sentence node and feature node.

Parameters
  • sentence (SentenceGraph) –

  • scur (int) – Current node index of sentence

  • feature (FeatureGraph) –

  • fcur (int) – Current node index of feature

Returns

A list of dicts, each mapping feature node idx -> sentence node idx

Return type

list
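
The idea of matching the successors of one feature node against the successors of one sentence node can be sketched as follows. This is a simplified illustration, not the library function: graphs are plain dictionaries, labels are compared by equality (the real matcher may use patterns), and each sentence node is assumed to be usable at most once per mapping.

```python
from itertools import product


def match_successors_sketch(s_nodes, s_edges, scur, f_nodes, f_edges, fcur):
    """Return a list of dicts: feature child idx -> sentence child idx.

    s_nodes / f_nodes: {idx: label}; s_edges / f_edges: {(u, v): relation}
    """
    f_children = [v for (u, v) in f_edges if u == fcur]
    s_children = [v for (u, v) in s_edges if u == scur]

    # For each feature child, collect the sentence children whose edge
    # relation and node label both agree with the feature.
    candidates = []
    for fc in f_children:
        candidates.append([
            sc for sc in s_children
            if s_edges[(scur, sc)] == f_edges[(fcur, fc)]
            and s_nodes[sc] == f_nodes[fc]
        ])

    # Combine the candidates, keeping only combinations in which no
    # sentence node is used twice. With no feature children this yields
    # a single empty mapping.
    matches = []
    for combo in product(*candidates):
        if len(set(combo)) == len(combo):
            matches.append(dict(zip(f_children, combo)))
    return matches
```
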

nephosem.deprel.basic.subtree_match(sentence=None, feature=None, lmatches=None)
Parameters
  • sentence (SentenceGraph) –

  • feature (FeatureGraph) –

  • lmatches (queue (collections.deque)) – A queue of possible matches. Each match is a list of per-level mappings of the feature (once complete, its length is feature.depth). Element example: feature node idx -> sentence node idx.

nephosem.deprel.basic.tree_match(sentence, feature)

Match the sentence with the feature.

Parameters
  • sentence (SentenceGraph) –

  • feature (FeatureGraph) –

nephosem.deprel.corpus module

nephosem.deprel.dephandler module

class nephosem.deprel.dephandler.DepRelManager(settings)

Bases: nephosem.core.handler.BaseHandler

Handler Class for processing dependency relations

build_dep_rel(fnames=None, multicore=True)

The function will treat all different word types as possible target or context words.

Parameters
  • fnames (str, optional) – Name of a file that lists all the corpus file names the user wants to process. Format: corpus_name + settings[“fnames-ext”]

  • row_vocab (Vocab) – Target words (types) vocabulary. If a non-empty vocabulary is passed, only target words (types) in this vocab should be processed. Otherwise all possible words (types) should be processed.

  • col_vocab (Vocab) – Context features vocabulary. If a non-empty vocabulary is passed, only context features in this vocab should be processed. Otherwise all possible contexts should be processed.

  • multicore (bool) – Use multicore version of the method or not.

do_job_single(fnames, **superkwargs)

Method doing job for handler class.

Parameters

fnames (iterable) – A list of filenames

merge_results()

Merge sub-process matrices into one final matrix. Filename format of sub-process matrices: …/matches.sub.pid

process(fnames, queue_factor=2)
Parameters
  • fnames

  • queue_factor (int, optional) – Multiplier for the queue size: size = number of workers * queue_factor.
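
The sizing rule for the queue can be shown in isolation. In this sketch the helper name make_job_queue is hypothetical, and the standard-library queue.Queue merely stands in for whatever queue the handler actually uses.

```python
import queue


def make_job_queue(n_workers, queue_factor=2):
    """Create a bounded queue whose size scales with the worker count."""
    size = n_workers * queue_factor
    return queue.Queue(maxsize=size)
```

With 4 workers and the default factor, the queue holds at most 8 pending items, so a producer blocks instead of reading the whole file list ahead of the workers.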

read_features(fname=None, features=None, encoding='utf-8')
read_template(fname=None, features=None, encoding='utf-8')

Read paths from file

nephosem.deprel.dephandler.update_dep_rel_caller(fnames, tmpdir=None, settings=None)

Save the path template matches of a sub-process. Filename format of sub-process objects:

matrix: paths.sub.pid

Parameters
  • fnames (iterable) – A list of filenames

  • tmpdir (str) – Temporary folder

  • settings (dict) –

nephosem.deprel.deputils module

nephosem.deprel.deputils.cartesian_product(mapping)

Expand a mapping from keys to candidate values into a list of dicts, one per combination:

{1: (1, 2), 2: (3, 4)} -> [{1:1, 2:3}, {1:2, 2:3}, {1:1, 2:4}, {1:2, 2:4}]

Should perform:

{1: (1, 2), 2: (1, 2)} -> [{1: 1, 2: 2}] (if the target is not in (1, 2)) [{1: 1, 2: 2}, {1: 2, 2: 1}]
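
The first example can be reproduced with itertools.product. The sketch below is an interpretation rather than the library function: the name cartesian_product_sketch and the distinct flag are assumptions, and the flag models only the part of the second example in which no value is assigned to two keys. The order of the results may differ from the order shown above.

```python
from itertools import product


def cartesian_product_sketch(mapping, distinct=False):
    """Expand {key: candidates} into a list of {key: choice} dicts.

    distinct=True drops combinations that reuse a value for two keys.
    """
    keys = list(mapping)
    results = []
    for combo in product(*(mapping[k] for k in keys)):
        if distinct and len(set(combo)) != len(combo):
            continue
        results.append(dict(zip(keys, combo)))
    return results
```
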

nephosem.deprel.deputils.group(nodemap)
nephosem.deprel.deputils.judgeIn(sLine)
nephosem.deprel.deputils.judgeIn_v1(sLine)
nephosem.deprel.deputils.judgeOut(sLine)
nephosem.deprel.deputils.outMap(item)
nephosem.deprel.deputils.process_sentence(sLine)

Process a sentence line. Line format: ‘sid:edge1;edge2;edge3…’

Parameters

sLine
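
Based only on the line format given above, the sentence id and the edge strings can be separated as in this sketch. The helper name split_sentence_line is hypothetical, and the internal format of each edge is not described here, so edges are left as raw strings.

```python
def split_sentence_line(line):
    """Split 'sid:edge1;edge2;...' into (sid, [edge strings])."""
    # Split on the first colon only, in case edge strings contain colons.
    sid, sep, rest = line.strip().partition(':')
    if not sep:
        raise ValueError(f"no ':' separator in line: {line!r}")
    edges = [e for e in rest.split(';') if e]
    return sid, edges
```
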

nephosem.deprel.deputils.read_nodes(in_nodes, encoding='utf-8')

Read nodes from file, e.g.:

a/DT 4
an/DT 21
…
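
A file in this shape could be read as in the sketch below. It assumes one node per line followed by a whitespace-separated integer; what the integer denotes is not stated in this reference, and the helper name read_nodes_sketch is hypothetical.

```python
def read_nodes_sketch(lines):
    """Parse lines such as 'a/DT 4' into {'a/DT': 4}."""
    nodes = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace so the node string may
        # itself contain spaces.
        node, value = line.rsplit(None, 1)
        nodes[node] = int(value)
    return nodes
```
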

nephosem.deprel.deputils.split_large_file(filename, encoding='utf-8')

Split a large corpus file into smaller ones for multicore processing.
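
One simple way to split a file for parallel processing is by line count, as sketched here. This is not the library routine: the function name, the lines_per_part parameter, and the '.partN' naming scheme are assumptions, and a real corpus splitter would likely need to avoid cutting a sentence in half.

```python
def split_by_lines(filename, lines_per_part=1000, encoding='utf-8'):
    """Write consecutive chunks of `filename` to '<filename>.partN' files."""
    part_names = []
    out = None
    try:
        with open(filename, encoding=encoding) as src:
            for i, line in enumerate(src):
                # Start a new part every `lines_per_part` lines.
                if i % lines_per_part == 0:
                    if out is not None:
                        out.close()
                    name = f"{filename}.part{len(part_names)}"
                    out = open(name, 'w', encoding=encoding)
                    part_names.append(name)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return part_names
```
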

nephosem.deprel.tmp module

Module contents