nephosem.specutils package
Submodules
nephosem.specutils.deputils module
- nephosem.specutils.deputils.draw_labels(gx, v_labels=None, e_labels=None)
- nephosem.specutils.deputils.draw_match(feature, idx=0, figsize=(5.0, 5.0))
- nephosem.specutils.deputils.draw_tree(gx, v_label=None, e_label=None, figsize=(5.0, 5.0))
Draw a tree layout of a graph.
- Parameters
gx (networkx.DiGraph) –
v_label (str) –
e_label (str) –
figsize (tuple of two values) –
- nephosem.specutils.deputils.get_depth(gx)
Get the depth of a tree.
- nephosem.specutils.deputils.get_root(gx)
Get the root of a tree.
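As a rough illustration of what "root" and "depth" mean for a dependency tree, the sketch below uses a plain dict (parent -> list of children) as a stand-in for the networkx.DiGraph that these functions take. The helper names find_root and tree_depth are hypothetical, and the depth convention used here (a lone root has depth 0) is an assumption; the library may count differently.

```python
def find_root(children):
    """Return the single node that never appears as a child."""
    nodes = set(children)
    child_nodes = set()
    for kids in children.values():
        nodes.update(kids)
        child_nodes.update(kids)
    roots = nodes - child_nodes
    if len(roots) != 1:
        raise ValueError("expected exactly one root, found %d" % len(roots))
    return next(iter(roots))


def tree_depth(children, node=None):
    """Count the edges on the longest path from node (default: the root) to a leaf."""
    if node is None:
        node = find_root(children)
    kids = children.get(node, [])
    if not kids:
        return 0  # assumption: a leaf contributes depth 0
    return 1 + max(tree_depth(children, kid) for kid in kids)


# Hypothetical dependency tree: "eats" governs "cat" and "fish"; "cat" governs "the".
tree = {"eats": ["cat", "fish"], "cat": ["the"]}
root = find_root(tree)
depth = tree_depth(tree)
```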
- nephosem.specutils.deputils.parse_pattern(target, feature, larrow='<-', rarrow='->')
Parse a feature regex string into a node dict and an edge dict, e.g. target = ‘(\w+)/(N)\w*’, feature = ‘<(nsubj) \w+/(V)\w* >[(acomp) (\w+)/(JJ)]’ ==>
- nephosem.specutils.deputils.tree_match(sentence, macro)
Match the sentence with the subgraph pattern. Outline of the matching:
1. Find the root node of the pattern.
2. Iterate over each sentence node that matches the pattern root.
3. Recursively match the sentence with the pattern from each matched node.
- Parameters
sentence (SentenceGraph) –
macro (MacroGraph) –
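The three matching steps can be sketched on plain Python data. This is only an illustration under stated assumptions: sentences and patterns are represented here as dicts with "nodes" (id -> label or regex) and "edges" (head, relation, dependent), which are hypothetical stand-ins for SentenceGraph and MacroGraph, and the function names are invented for the example. The sketch takes the first sentence edge that satisfies each pattern edge and does not backtrack.

```python
import re


def find_pattern_root(pattern):
    """Step 1: the pattern root is the only node without an incoming edge."""
    dependents = {dep for _, _, dep in pattern["edges"]}
    (root,) = set(pattern["nodes"]) - dependents
    return root


def match_from(sentence, pattern, s_node, p_node):
    """Step 3: recursively match pattern node p_node against sentence node s_node.

    Returns a dict mapping pattern nodes to sentence nodes, or None on failure.
    """
    if not re.fullmatch(pattern["nodes"][p_node], sentence["nodes"][s_node]):
        return None
    mapping = {p_node: s_node}
    for p_head, p_rel, p_dep in pattern["edges"]:
        if p_head != p_node:
            continue
        for s_head, s_rel, s_dep in sentence["edges"]:
            if s_head == s_node and s_rel == p_rel:
                sub = match_from(sentence, pattern, s_dep, p_dep)
                if sub is not None:
                    mapping.update(sub)
                    break
        else:
            return None  # no sentence edge satisfies this pattern edge
    return mapping


def tree_match_sketch(sentence, pattern):
    """Step 2: try every sentence node as a candidate for the pattern root."""
    root = find_pattern_root(pattern)
    matches = []
    for s_node in sentence["nodes"]:
        mapping = match_from(sentence, pattern, s_node, root)
        if mapping is not None:
            matches.append(mapping)
    return matches


# Hypothetical sentence "cat eats fish" with lemma/POS labels.
sentence = {
    "nodes": {1: "cat/NN", 2: "eats/VB", 3: "fish/NN"},
    "edges": [(2, "nsubj", 1), (2, "dobj", 3)],
}
# Hypothetical pattern: a verb with a nominal subject.
pattern = {
    "nodes": {"v": r"\w+/VB", "n": r"\w+/NN"},
    "edges": [("v", "nsubj", "n")],
}
matches = tree_match_sketch(sentence, pattern)
```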
nephosem.specutils.mxcalc module
Calculations of Matrices
- nephosem.specutils.mxcalc.compute_association(freqMTX, nfreq, cfreq, N=None, meas='ppmi')
Compute a matrix of association measures.
The matrix provided can be a submatrix with selected rows and/or columns, but nfreq and cfreq must be marginal frequencies from a reference matrix, i.e. one with co-occurrence frequencies for the full corpus. N should be the sum of that reference matrix; if it is not provided, it is computed as the sum of the row marginal frequencies or of the column marginal frequencies, whichever is larger.
- Parameters
freqMTX (TypeTokenMatrix) – Raw co-occurrence frequency matrix.
nfreq (Vocab) – Marginal row frequencies of the reference matrix.
cfreq (Vocab) – Marginal collocate frequencies of the reference matrix.
N (int) – Sum of the reference frequency matrix.
meas (str) – Implemented association measures: ‘pmi’, ‘ppmi’, ‘llik’ (log likelihood), ‘chisq’, ‘zscore’, ‘dice’.
- Returns
association measure matrix
- Return type
TypeTokenMatrix
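To make the role of the marginal frequencies concrete, here is a minimal sketch of the standard PMI and PPMI formulas, PMI = log(f(n,c) · N / (f(n) · f(c))) and PPMI = max(0, PMI), on hypothetical counts. The natural logarithm is an assumption (the text above does not state the base), and the helper names and numbers are invented for the example; this is not the library's implementation.

```python
import math


def pmi(joint, row_total, col_total, n_total):
    """Pointwise mutual information from raw counts (natural log assumed)."""
    return math.log(joint * n_total / (row_total * col_total))


def ppmi(joint, row_total, col_total, n_total):
    """Positive PMI: negative associations are clipped to zero."""
    return max(0.0, pmi(joint, row_total, col_total, n_total))


# Hypothetical submatrix of co-occurrence counts for the target "cat".
cooc = {("cat", "purr"): 8, ("cat", "the"): 20}
# Marginals and N come from the full reference matrix, not from the submatrix.
nfreq = {"cat": 100}
cfreq = {"purr": 10, "the": 5000}
N = 10000

scores = {
    pair: ppmi(freq, nfreq[pair[0]], cfreq[pair[1]], N)
    for pair, freq in cooc.items()
}
```

With these counts, "purr" co-occurs with "cat" 80 times more often than chance would predict, while "the" co-occurs less often than chance and is clipped to zero.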
- nephosem.specutils.mxcalc.compute_cosine(measMTX, axis=0)
- Parameters
measMTX (TypeTokenMatrix) –
- nephosem.specutils.mxcalc.compute_distance(measMTX, axis=0, metric='cosine')
Compute a distance matrix from an association measure matrix.
- Parameters
measMTX (TypeTokenMatrix) –
axis (int) – 0 (row) or 1 (column)
metric (str) – ‘cosine’ (default), ‘euclidean’, ‘cityblock’, ‘l1’, ‘l2’ or ‘manhattan’, i.e. metrics that are valid in sklearn.metrics.pairwise_distances
- Returns
- Return type
TypeTokenMatrix
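For the default metric, cosine distance is one minus cosine similarity. The following pure-Python sketch computes pairwise cosine distances between the rows of a small dense matrix; the function names are hypothetical, and it only illustrates the metric, not how compute_distance is implemented.

```python
import math


def cosine_distance(u, v):
    """1 - cos(u, v); both vectors are assumed to be non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_cosine_distance(rows):
    """Square matrix of distances between every pair of row vectors."""
    return [[cosine_distance(u, v) for v in rows] for u in rows]


# Hypothetical association values: rows 0 and 2 point in the same direction.
rows = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
dist = pairwise_cosine_distance(rows)
```

Distances between columns (the axis=1 case) could be obtained the same way after transposing the input.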
- nephosem.specutils.mxcalc.compute_ppmi(freqMTX, nfreq=None, cfreq=None, positive=True)
Compute PPMI values. This function is faster than compute_association(). Set positive to False to get PMI values.
- nephosem.specutils.mxcalc.compute_simrank(simMTX, reverse=False)
Compute similarity rank matrix.
- Parameters
simMTX (cosine similarity SquareMatrix) –
reverse (boolean, default False) – If True, rank 1 represents the most similar item of that row. If False, the largest rank represents the most similar item of that row.
- nephosem.specutils.mxcalc.compute_token_vectors(tcWeightMTX, soccMTX, operation='addition', normalization='l1')
Compute token vectors. Build token vectors from a token-by-context weight matrix and a second-order matrix.
- Parameters
tcWeightMTX (TypeTokenMatrix) – Token-by-context weight matrix.
soccMTX (TypeTokenMatrix) – Second-order collocate matrix.
operation (str) – ‘addition’, ‘multiplication’, ‘weightedmean’
normalization (str) – ‘l1’, ‘l2’, ‘no’
- Returns
token vectors
- Return type
TypeTokenMatrix
Note
Values for “normalization” are regulated by sklearn.preprocessing.normalize()
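The text does not spell out what each operation does, so the sketch below shows only one plausible reading, stated as an assumption: under ‘addition’ a token vector is the weighted sum of the second-order vectors of its context words, ‘weightedmean’ divides that sum by the total weight, and ‘l1’ normalization divides the result by the sum of its absolute values. ‘multiplication’ is not covered. All names and numbers are hypothetical.

```python
def token_vector(weights, socc, operation="addition"):
    """Combine second-order vectors of context words for one token.

    weights: context word -> weight for this token
    socc: context word -> second-order vector (list of floats)
    Assumed semantics: 'addition' is a weighted sum, 'weightedmean'
    divides that sum by the total weight.
    """
    dim = len(next(iter(socc.values())))
    total = [0.0] * dim
    for word, weight in weights.items():
        for i, value in enumerate(socc[word]):
            total[i] += weight * value
    if operation == "weightedmean":
        weight_sum = sum(weights.values())
        total = [value / weight_sum for value in total]
    return total


def l1_normalize(vec):
    """Scale a vector so that its absolute values sum to 1."""
    norm = sum(abs(value) for value in vec)
    return [value / norm for value in vec] if norm else vec


# Hypothetical token with two weighted context words.
weights = {"purr": 2.0, "milk": 1.0}
socc = {"purr": [1.0, 0.0, 1.0], "milk": [0.0, 3.0, 0.0]}

vec = token_vector(weights, socc)
mean_vec = token_vector(weights, socc, operation="weightedmean")
unit = l1_normalize(vec)
```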
- nephosem.specutils.mxcalc.compute_token_weights(tcPositionMTX, twMTX)
Compute a token-by-context weight matrix. Build token weights from a token-by-context position/boolean matrix and a type-by-context weight matrix.
- Parameters
tcPositionMTX (TypeTokenMatrix) – Token-by-context position matrix (row label: tokens; column label: target words).
twMTX (TypeTokenMatrix) – Type-by-context weight matrix, e.g. ‘ppmi’ (transposed), with context features (types) as rows and target words as columns.
- Returns
token weight matrix
- Return type
TypeTokenMatrix
Notes
This function transforms all explicit zeros in the type-by-context weight matrix into implicit zeros, so take care if those explicit zeros are meaningful in your data.
nephosem.specutils.mxutils module
- nephosem.specutils.mxutils.merge_matrices(matrices)
Merge a list of (TypeTokenMatrix) matrices into one.
- Parameters
matrices (list) – A list of matrices (TypeTokenMatrix)
- Returns
spmatrix, row_items, col_items
- Return type
tuple
- nephosem.specutils.mxutils.merge_two_matrices(mtx1, mtx2)
Merge two (TypeTokenMatrix) matrices.
- Parameters
mtx1 (TypeTokenMatrix) –
mtx2 (TypeTokenMatrix) –
- Returns
merged matrix
- Return type
TypeTokenMatrix
- nephosem.specutils.mxutils.transform_dict_to_spmatrix(dict_mtx, rowid2item, colid2item, verbose=False)
Generate a sparse.csr_matrix from a dict of dict, according to the row and column orderings given by rowid2item and colid2item.
- Parameters
dict_mtx (dict of dict) – Matrix represented by a Python dict of dict
rowid2item (list of str) – Alphabetically ascending sorted list of items
colid2item (list of str) – Alphabetically ascending sorted list of items
verbose (bool) –
- Returns
- Return type
sparse.csr_matrix
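The conversion can be pictured as building the three CSR arrays (data, indices, indptr) by walking the rows in rowid2item order. The sketch below does this in pure Python; dict_to_csr_arrays is a hypothetical helper written for illustration and is not the library's code.

```python
def dict_to_csr_arrays(dict_mtx, rowid2item, colid2item):
    """Build CSR arrays from a dict of dict, following the given item orders."""
    col_index = {item: j for j, item in enumerate(colid2item)}
    data, indices, indptr = [], [], [0]
    for row_item in rowid2item:
        row = dict_mtx.get(row_item, {})
        # Emit the stored cells of this row in ascending column order.
        for col_item in sorted(row, key=col_index.__getitem__):
            data.append(row[col_item])
            indices.append(col_index[col_item])
        indptr.append(len(data))  # marks where the next row starts
    return data, indices, indptr


# Hypothetical co-occurrence counts.
dict_mtx = {"cat": {"purr": 8, "the": 20}, "dog": {"the": 15}}
rows = ["cat", "dog"]
cols = ["bark", "purr", "the"]

data, indices, indptr = dict_to_csr_arrays(dict_mtx, rows, cols)
```

These arrays are in the form accepted by scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(rows), len(cols))).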
- nephosem.specutils.mxutils.transform_indices(mtx, new_col_items)
- nephosem.specutils.mxutils.transform_nodes_to_matrix(type2toks, colloc_fmt='lemma/pos')
Transform type nodes to token matrix.
- Parameters
type2toks (dict or iterable) – Type string -> token nodes of this type
colloc_fmt (str, default="lemma/pos") – Format for the column names
- Returns
tokmx
- Return type
- nephosem.specutils.mxutils.transform_spmatrix_to_dict(spmatrix, rowid2item, colid2item, verbose=False)
Generate a dict of dict from a sparse.csr_matrix, according to the row and column orderings given by rowid2item and colid2item.
- Parameters
spmatrix (csr_matrix) –
rowid2item (iterable (of str)) – Alphabetically ascending sorted list of items
colid2item (iterable (of str)) – Alphabetically ascending sorted list of items
verbose (bool) –
- Returns
- Return type
Python dict of dict
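A CSR matrix stores its cells in three arrays, exposed by scipy as the data, indices and indptr attributes. The sketch below walks such arrays in pure Python to rebuild a dict of dict; csr_arrays_to_dict is a hypothetical helper, and skipping rows without stored cells is an assumption about the desired output rather than documented behaviour.

```python
def csr_arrays_to_dict(data, indices, indptr, rowid2item, colid2item):
    """Rebuild a dict of dict from CSR arrays and the row/column item orders."""
    dict_mtx = {}
    for i, row_item in enumerate(rowid2item):
        start, end = indptr[i], indptr[i + 1]
        if start == end:
            continue  # assumption: rows without stored cells are left out
        dict_mtx[row_item] = {
            colid2item[indices[k]]: data[k] for k in range(start, end)
        }
    return dict_mtx


# Hypothetical CSR arrays for a 2 x 3 matrix.
result = csr_arrays_to_dict(
    data=[8, 20, 15],
    indices=[1, 2, 2],
    indptr=[0, 2, 3],
    rowid2item=["cat", "dog"],
    colid2item=["bark", "purr", "the"],
)
```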