nephosem.specutils package

Submodules

nephosem.specutils.deputils module

nephosem.specutils.deputils.draw_labels(gx, v_labels=None, e_labels=None)

nephosem.specutils.deputils.draw_match(feature, idx=0, figsize=(5.0, 5.0))

nephosem.specutils.deputils.draw_tree(gx, v_label=None, e_label=None, figsize=(5.0, 5.0))

Draw a tree layout of a graph.

Parameters

gx (networkx.DiGraph) –
v_label (str) –
e_label (str) –
figsize (tuple of two values) –

nephosem.specutils.deputils.get_depth(gx)

Get the depth of a tree.

nephosem.specutils.deputils.get_root(gx)

Get the root of a tree.

nephosem.specutils.deputils.parse_pattern(target, feature, larrow='<-', rarrow='->')

Parse a feature regex string into a node dict and an edge dict, e.g. target = '(\w+)/(N)\w*', feature = '<(nsubj) \w+/(V)\w* >[(acomp) (\w+)/(JJ)]' ==>

nephosem.specutils.deputils.tree_match(sentence, macro)

Match the sentence against the subgraph pattern. Outline of the matching:

1. Find the root node of the pattern.
2. Iterate over each sentence node that matches the pattern root.
3. Recursively match the sentence against the pattern, starting from each matched node.

Parameters

sentence (SentenceGraph) –
macro (MacroGraph) –
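
The three matching steps can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: plain dicts stand in for SentenceGraph and MacroGraph, node labels are assumed to be "lemma/pos" strings, pattern nodes are assumed to be regular expressions, and the names find_root, match_from and tree_match_sketch are hypothetical.

```python
import re


def find_root(nodes, edges):
    """Step 1: the root is the only node that never appears as a child."""
    children = {child for kids in edges.values() for child, _ in kids}
    roots = [n for n in nodes if n not in children]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    return roots[0]


def match_from(s_nodes, s_edges, p_nodes, p_edges, s_node, p_node):
    """Step 3: recursively map the pattern subtree at p_node onto s_node."""
    if not re.fullmatch(p_nodes[p_node], s_nodes[s_node]):
        return None
    mapping = {p_node: s_node}
    for p_child, p_rel in p_edges.get(p_node, []):
        for s_child, s_rel in s_edges.get(s_node, []):
            if s_rel != p_rel:
                continue
            sub = match_from(s_nodes, s_edges, p_nodes, p_edges, s_child, p_child)
            if sub is not None:
                # Simplification: take the first sentence child that fits.
                mapping.update(sub)
                break
        else:
            return None  # no sentence child satisfies this pattern child
    return mapping


def tree_match_sketch(s_nodes, s_edges, p_nodes, p_edges):
    p_root = find_root(p_nodes, p_edges)
    matches = []
    for s_node in s_nodes:  # Step 2: try every sentence node as the root
        found = match_from(s_nodes, s_edges, p_nodes, p_edges, s_node, p_root)
        if found is not None:
            matches.append(found)
    return matches


# Hypothetical sentence: node id -> "lemma/pos"; edges: head -> [(child, relation)]
s_nodes = {1: "dog/NN", 2: "be/VB", 3: "happy/JJ"}
s_edges = {2: [(1, "nsubj"), (3, "acomp")]}
# Hypothetical pattern: a verb with a nominal subject
p_nodes = {"v": r"\w+/V\w*", "n": r"\w+/N\w*"}
p_edges = {"v": [("n", "nsubj")]}

matches = tree_match_sketch(s_nodes, s_edges, p_nodes, p_edges)
print(matches)
```

The sketch stops at the first sentence child that satisfies each pattern child, so it does not enumerate alternative matches below the root.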

nephosem.specutils.mxcalc module

Calculations of Matrices

nephosem.specutils.mxcalc.compute_association(freqMTX, nfreq, cfreq, N=None, meas='ppmi')

Compute a matrix of association measures.

The matrix provided can be a submatrix with selected rows and/or columns, but nfreq and cfreq must be marginal frequencies from a reference matrix, i.e. one with co-occurrence frequencies for the full corpus. N should be the sum of that reference matrix; if it is not provided, it is computed as the sum of the row marginal frequencies or of the column marginal frequencies, whichever is larger.

Parameters

freqMTX (TypeTokenMatrix) – Raw co-occurrence frequency matrix.
nfreq (Vocab) – Marginal row frequencies of the reference matrix.
cfreq (Vocab) – Marginal collocate frequencies of the reference matrix.
N (int) – Sum of the reference frequency matrix.
meas (str) – Implemented association measures: ‘pmi’, ‘ppmi’, ‘llik’ (log likelihood), ‘chisq’, ‘zscore’, ‘dice’.

Returns

association measure matrix

Return type

TypeTokenMatrix
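
To make the role of the reference marginals concrete, here is a minimal sketch of the (P)PMI case only. It assumes the standard definition PMI = log(f · N / (nfreq · cfreq)) with a natural logarithm, and uses plain dicts in place of TypeTokenMatrix and Vocab; the function name ppmi_sketch is hypothetical and the library's own computation may differ in detail.

```python
import math


def ppmi_sketch(freq, nfreq, cfreq, N=None, positive=True):
    """freq: row -> {col: count}; nfreq / cfreq: marginal counts of the reference matrix."""
    if N is None:
        # Fallback described in the text: the larger of the two marginal sums.
        N = max(sum(nfreq.values()), sum(cfreq.values()))
    out = {}
    for row, cols in freq.items():
        for col, f in cols.items():
            if f <= 0:
                continue  # PMI is undefined for zero co-occurrence
            pmi = math.log(f * N / (nfreq[row] * cfreq[col]))
            if positive:
                pmi = max(pmi, 0.0)
            out.setdefault(row, {})[col] = pmi
    return out


# Marginals come from a (hypothetical) full reference matrix with rows apple and pear:
nfreq = {"apple": 8, "pear": 8}
cfreq = {"eat": 8, "red": 8}
# The matrix passed in is only a submatrix (the "apple" row).
sub = {"apple": {"eat": 2, "red": 6}}

result = ppmi_sketch(sub, nfreq, cfreq)
print(result)
```

Because the marginals and N describe the reference matrix, the value for a cell is the same whether the full matrix or only a slice of it is passed in.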

nephosem.specutils.mxcalc.compute_cosine(measMTX, axis=0)

Parameters: measMTX (TypeTokenMatrix) –

nephosem.specutils.mxcalc.compute_distance(measMTX, axis=0, metric='cosine')

Compute distance matrix from association measure matrix

Parameters

measMTX (TypeTokenMatrix) –
axis (int) – 0 (row) or 1 (column)
metric (str) – ‘cosine’ (default), ‘euclidean’, ‘cityblock’, ‘l1’, ‘l2’, ‘manhattan’. These are metrics that are valid in sklearn.metrics.pairwise_distances.

Returns

Return type

TypeTokenMatrix
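
As an illustration of the default metric, the following sketch computes pairwise cosine distances between rows (the axis=0 case) in pure Python. It is a stand-in for what sklearn.metrics.pairwise_distances would compute with metric='cosine', not the library's code; the function name and the dict-based input are assumptions, and zero vectors are not handled.

```python
import math


def cosine_distance_matrix(rows):
    """rows: item -> vector (list of floats). Returns (a, b) -> cosine distance."""
    items = sorted(rows)
    norms = {i: math.sqrt(sum(x * x for x in rows[i])) for i in items}
    dist = {}
    for a in items:
        for b in items:
            dot = sum(x * y for x, y in zip(rows[a], rows[b]))
            # Cosine distance is one minus cosine similarity.
            dist[(a, b)] = 1.0 - dot / (norms[a] * norms[b])
    return dist


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 0.0]}
distances = cosine_distance_matrix(vectors)
print(distances[("a", "b")], distances[("a", "c")])
```

Rows "a" and "c" point in the same direction, so their distance is zero regardless of length, while the orthogonal rows "a" and "b" are at distance one.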

nephosem.specutils.mxcalc.compute_ppmi(freqMTX, nfreq=None, cfreq=None, positive=True)

This method is faster than compute_association(). Set positive to False to get PMI values.

nephosem.specutils.mxcalc.compute_simrank(simMTX, reverse=False)

Compute similarity rank matrix.

Parameters

simMTX (cosine similarity SquareMatrix) –
reverse (boolean, default False) – True: rank 1 represents the most similar item of that row. False: the largest rank represents the most similar item of that row.
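
A small sketch of row-wise ranking may help with the reverse flag. It assumes the reading that reverse=True gives rank 1 to the most similar item and reverse=False gives it the largest rank; the function name rank_rows and the dict-of-dict input are hypothetical, and ties are broken by input order rather than by any rule taken from the library.

```python
def rank_rows(sim, reverse=False):
    """sim: row -> {col: similarity}. Returns row -> {col: rank}, ranks start at 1."""
    ranks = {}
    for row, cols in sim.items():
        # Ascending order by default, so the highest similarity gets the largest rank.
        ordered = sorted(cols, key=cols.get, reverse=reverse)
        ranks[row] = {col: pos for pos, col in enumerate(ordered, start=1)}
    return ranks


sim = {"a": {"a": 1.0, "b": 0.2, "c": 0.7}}
print(rank_rows(sim, reverse=True))
print(rank_rows(sim, reverse=False))
```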

nephosem.specutils.mxcalc.compute_token_vectors(tcWeightMTX, soccMTX, operation='addition', normalization='l1')

Compute token vectors. Build token vectors from token weights (a token-by-context weight matrix) and a second-order matrix.

Parameters

tcWeightMTX (TypeTokenMatrix) – Token-Context weight matrix.
soccMTX (TypeTokenMatrix) – Second order collocate matrix.
operation (str) – ‘addition’, ‘multiplication’, ‘weightedmean’
normalization (str) – ‘l1’, ‘l2’, ‘no’

Returns

token vectors

Return type

TypeTokenMatrix

Note

Values for “normalization” are regulated by sklearn.preprocessing.normalize()
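
The sketch below shows one plausible reading of two of the operations, not the library's confirmed behaviour: ‘addition’ is taken to be the weighted sum of the second-order vectors of a token's context words, and ‘weightedmean’ that sum divided by the total weight. ‘multiplication’ is not sketched. Dicts and lists stand in for TypeTokenMatrix, and the function name is hypothetical.

```python
import math


def token_vectors_sketch(weights, socc, operation="addition", normalization="l1"):
    """weights: token -> {context: weight}; socc: context -> second-order vector."""
    dim = len(next(iter(socc.values())))
    vectors = {}
    for token, ctx_weights in weights.items():
        vec = [0.0] * dim
        for ctx, w in ctx_weights.items():
            for k, value in enumerate(socc[ctx]):
                vec[k] += w * value  # weighted sum of context vectors
        if operation == "weightedmean":
            total = sum(ctx_weights.values())
            vec = [v / total for v in vec]
        elif operation != "addition":
            raise ValueError("only 'addition' and 'weightedmean' are sketched")
        if normalization == "l1":
            norm = sum(abs(v) for v in vec)
        elif normalization == "l2":
            norm = math.sqrt(sum(v * v for v in vec))
        else:  # 'no': leave the vector as it is
            norm = 1.0
        vectors[token] = [v / norm for v in vec] if norm else vec
    return vectors


weights = {"tok1": {"red": 1.0, "eat": 3.0}}
socc = {"red": [1.0, 0.0], "eat": [0.0, 1.0]}
print(token_vectors_sketch(weights, socc))
print(token_vectors_sketch(weights, socc, normalization="no"))
```

With l1 normalization the entries of each token vector sum to one in absolute value, which is the behaviour of sklearn.preprocessing.normalize with norm='l1' applied row by row.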

nephosem.specutils.mxcalc.compute_token_weights(tcPositionMTX, twMTX)

Compute token-by-context weight matrix. Build token weights from a token-by-context position/boolean matrix and a type-by-context weight matrix.

Parameters

tcPositionMTX (TypeTokenMatrix) – Token-by-context position matrix. (Layout diagram; axis labels: target words, tokens.)
twMTX (TypeTokenMatrix) – Type-by-context weight matrix, e.g. ‘ppmi’, transposed: rows are context features (types), columns are target words.

Returns

token weight matrix

Return type

TypeTokenMatrix

Notes

This function transforms all explicit zeros in the type-by-context weight matrix into implicit zeros. If those explicit zeros are important, handle them with care.

nephosem.specutils.mxutils module

nephosem.specutils.mxutils.merge_matrices(matrices)

Merge a list of (TypeTokenMatrix) matrices into one.

Parameters: matrices (list) – A list of matrices (TypeTokenMatrix)
Returns: spmatrix, row_items, col_items
Return type: tuple
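
A rough sketch of the merge, under the assumption that merging means summing cells over the union of rows and columns (the source does not spell this out). Dicts of dicts stand in for the sparse matrices, and the return value mirrors the (matrix, row_items, col_items) tuple; the function name is hypothetical.

```python
def merge_dict_matrices(matrices):
    """matrices: list of row -> {col: value}. Returns (merged, row_items, col_items)."""
    merged = {}
    for mtx in matrices:
        for row, cols in mtx.items():
            target = merged.setdefault(row, {})
            for col, value in cols.items():
                # Assumption: overlapping cells are summed.
                target[col] = target.get(col, 0) + value
    row_items = sorted(merged)
    col_items = sorted({col for cols in merged.values() for col in cols})
    return merged, row_items, col_items


m1 = {"apple": {"eat": 2}}
m2 = {"apple": {"eat": 1, "red": 4}, "pear": {"eat": 5}}
merged, row_items, col_items = merge_dict_matrices([m1, m2])
print(merged, row_items, col_items)
```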

nephosem.specutils.mxutils.merge_two_matrices(mtx1, mtx2)

Merge two (TypeTokenMatrix) matrices.

Parameters

mtx1 (TypeTokenMatrix) –
mtx2 (TypeTokenMatrix) –

Returns

merged matrix

Return type

TypeTokenMatrix

nephosem.specutils.mxutils.transform_dict_to_spmatrix(dict_mtx, rowid2item, colid2item, verbose=False)

Generate sparse.csr_matrix from dict of dict, according to row item to id and col item to id mappings

Parameters

dict_mtx (dict of dict) – Matrix represented by a Python dict of dict
rowid2item (list of str) – Alphabetically ascending sorted list of items
colid2item (list of str) – Alphabetically ascending sorted list of items
verbose (bool) –

Returns

Return type

sparse.csr_matrix
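
To show what the conversion involves, this sketch builds the three CSR arrays (data, indices, indptr) from a dict of dicts in pure Python. It is an illustration rather than the library's code; the function name is hypothetical, and every column item in dict_mtx is assumed to appear in colid2item.

```python
def dict_to_csr_arrays(dict_mtx, rowid2item, colid2item):
    """Return (data, indices, indptr) describing dict_mtx in CSR layout."""
    col_index = {item: j for j, item in enumerate(colid2item)}
    data, indices, indptr = [], [], [0]
    for row_item in rowid2item:
        cols = dict_mtx.get(row_item, {})
        # Store each row's entries in ascending column order.
        for col_item in sorted(cols, key=col_index.__getitem__):
            data.append(cols[col_item])
            indices.append(col_index[col_item])
        indptr.append(len(data))  # marks where the next row starts
    return data, indices, indptr


dict_mtx = {"apple": {"red": 4, "eat": 3}, "pear": {"eat": 5}}
rows = ["apple", "pear"]
cols = ["eat", "red"]
data, indices, indptr = dict_to_csr_arrays(dict_mtx, rows, cols)
print(data, indices, indptr)
```

These arrays are in the form accepted by scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(rows), len(cols))).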

nephosem.specutils.mxutils.transform_indices(mtx, new_col_items)

nephosem.specutils.mxutils.transform_nodes_to_matrix(type2toks, colloc_fmt='lemma/pos')

Transform type nodes to token matrix.

Parameters

type2toks (dict or iterable) – Type string -> token nodes of this type
colloc_fmt (str, default="lemma/pos") – Format for the column names

Returns

tokmx

Return type

TypeTokenMatrix

nephosem.specutils.mxutils.transform_spmatrix_to_dict(spmatrix, rowid2item, colid2item, verbose=False)

Generate dict of dict from sparse.csr_matrix, according to row item to id and col item to id mappings

Parameters

spmatrix (csr_matrix) –
rowid2item (iterable (of str)) – Alphabetically ascending sorted list of items
colid2item (iterable (of str)) – Alphabetically ascending sorted list of items
verbose (bool) –

Returns

Return type

Python dict of dict
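
The reverse direction can be sketched the same way: walk the CSR arrays row by row and look each index up in the id-to-item lists. The function name is hypothetical, the CSR arrays are passed as plain lists instead of a scipy matrix, and omitting rows without stored entries is an assumption.

```python
def csr_arrays_to_dict(data, indices, indptr, rowid2item, colid2item):
    """Rebuild row -> {col: value} from CSR arrays and id-to-item lists."""
    dict_mtx = {}
    for i, row_item in enumerate(rowid2item):
        start, end = indptr[i], indptr[i + 1]
        if start == end:
            continue  # assumption: rows without stored entries are left out
        dict_mtx[row_item] = {
            colid2item[indices[k]]: data[k] for k in range(start, end)
        }
    return dict_mtx


result = csr_arrays_to_dict(
    [3, 4, 5], [0, 1, 0], [0, 2, 3], ["apple", "pear"], ["eat", "red"]
)
print(result)
```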
