nephosem.specutils package

Submodules

nephosem.specutils.deputils module

nephosem.specutils.deputils.draw_labels(gx, v_labels=None, e_labels=None)
nephosem.specutils.deputils.draw_match(feature, idx=0, figsize=(5.0, 5.0))
nephosem.specutils.deputils.draw_tree(gx, v_label=None, e_label=None, figsize=(5.0, 5.0))

Draw a tree layout of a graph.

Parameters
  • gx (networkx.DiGraph) –

  • v_label (str) –

  • e_label (str) –

  • figsize (tuple of two values) –

nephosem.specutils.deputils.get_depth(gx)

Get the depth of a tree

nephosem.specutils.deputils.get_root(gx)

Get the root of a tree
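Neither docstring says exactly how the root or the depth is determined. The following is a minimal sketch of one common reading, not the nephosem implementation. It assumes the root is the only node with no incoming edge and that depth counts edges on the longest root-to-leaf path. It also uses a plain dict of children as a stand-in for networkx.DiGraph.

```python
from typing import Dict, Hashable, List

# Stand-in for a networkx.DiGraph: node -> list of its children (assumption).
Tree = Dict[Hashable, List[Hashable]]


def get_root(tree: Tree) -> Hashable:
    """Return the single node that is nobody's child (in-degree 0)."""
    children = {c for kids in tree.values() for c in kids}
    roots = [n for n in tree if n not in children]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")
    return roots[0]


def get_depth(tree: Tree) -> int:
    """Return the number of edges on the longest root-to-leaf path (assumed definition)."""
    def depth_from(node: Hashable) -> int:
        kids = tree.get(node, [])
        return 0 if not kids else 1 + max(depth_from(k) for k in kids)
    return depth_from(get_root(tree))


# "saw" governs "John" and "dog"; "dog" governs "the".
sent = {"saw": ["John", "dog"], "John": [], "dog": ["the"], "the": []}
print(get_root(sent), get_depth(sent))  # saw 2
```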

nephosem.specutils.deputils.parse_pattern(target, feature, larrow='<-', rarrow='->')

Parse a string of feature regex into a node dict and an edge dict, e.g. target = ‘(\w+)/(N)\w*’, feature = ‘<(nsubj) \w+/(V)\w* >[(acomp) (\w+)/(JJ)]’ ==>
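To show what a target regex like the one in this example captures, here is a small illustration. It assumes the intended pattern is the raw string r'(\w+)/(N)\w*' (the backslashes are easily lost in rendering), that node labels have the form lemma/POS, and that a label must match the whole pattern. The helper match_target is hypothetical and is not part of nephosem.

```python
import re
from typing import Optional, Tuple

# Assumed intended form of the example target pattern: lemma "/" noun tag.
TARGET = r"(\w+)/(N)\w*"


def match_target(pattern: str, label: str) -> Optional[Tuple[str, ...]]:
    """Hypothetical helper: return the captured groups if the whole label matches."""
    m = re.fullmatch(pattern, label)
    return m.groups() if m else None


for label in ["dog/NNS", "run/VB"]:
    print(label, match_target(TARGET, label))
# dog/NNS ('dog', 'N')
# run/VB None
```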

nephosem.specutils.deputils.tree_match(sentence, macro)

Match the sentence against the subgraph pattern. Outline of the matching procedure:

  • find root node of the pattern

  • iterate over each sentence node that matches the pattern root

  • recursively match the sentence with the pattern from each matched node

Parameters
  • sentence (SentenceGraph) –

  • macro (MacroGraph) –
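The three matching steps can be sketched as follows. This is a simplified illustration, not the nephosem code. SentenceGraph and MacroGraph are replaced by a hypothetical dict format (node id -> (label, [(edge label, child id), ...])). Pattern node and edge labels are assumed to be regexes matched against the whole sentence label. Two pattern children are also allowed to map to the same sentence child.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical representation: node id -> (label, [(edge label, child id), ...]).
Graph = Dict[str, Tuple[str, List[Tuple[str, str]]]]


def find_root(graph: Graph) -> str:
    """Step 1: the pattern root is the node that never appears as a child."""
    children = {c for _, kids in graph.values() for _, c in kids}
    return next(n for n in graph if n not in children)


def match_from(sent: Graph, s_node: str, pat: Graph, p_node: str) -> bool:
    """Step 3: recursively check whether the pattern subtree at p_node matches at s_node."""
    s_label, s_kids = sent[s_node]
    p_label, p_kids = pat[p_node]
    if not re.fullmatch(p_label, s_label):
        return False
    # Every pattern edge needs some sentence edge whose relation and subtree match.
    return all(
        any(re.fullmatch(p_rel, s_rel) and match_from(sent, s_child, pat, p_child)
            for s_rel, s_child in s_kids)
        for p_rel, p_child in p_kids
    )


def tree_match(sent: Graph, pat: Graph) -> List[str]:
    """Step 2: try every sentence node whose label matches the pattern root."""
    p_root = find_root(pat)
    p_label = pat[p_root][0]
    candidates = [n for n, (label, _) in sent.items() if re.fullmatch(p_label, label)]
    return [n for n in candidates if match_from(sent, n, pat, p_root)]


sentence = {
    "1": ("dog/NN", []),
    "2": ("bark/VBZ", [("nsubj", "1"), ("advmod", "3")]),
    "3": ("loudly/RB", []),
}
pattern = {"v": (r"\w+/V\w*", [("nsubj", "n")]), "n": (r"\w+/N\w*", [])}
print(tree_match(sentence, pattern))  # ['2']
```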

nephosem.specutils.mxcalc module

Calculations of Matrices

nephosem.specutils.mxcalc.compute_association(freqMTX, nfreq, cfreq, N=None, meas='ppmi')

Compute association measures matrix.

The matrix provided can be a submatrix with selected rows and/or columns, but nfreq and cfreq must be marginal frequencies from a reference matrix, i.e. with co-occurrence frequencies for the full corpus. N should be the sum of that reference matrix; if it is not provided, it is computed as the sum of the row or the column marginal frequencies, whichever is larger.

Parameters
  • freqMTX (TypeTokenMatrix) – Raw co-occurrence frequency matrix.

  • nfreq (Vocab) – Marginal row frequencies of the reference matrix.

  • cfreq (Vocab) – Marginal collocate frequencies of the reference matrix.

  • N (int) – Sum of the reference frequency matrix.

  • meas (str) – Implemented association measures: ‘pmi’, ‘ppmi’, ‘llik’ (log likelihood), ‘chisq’, ‘zscore’, ‘dice’.

Returns

association measure matrix

Return type

TypeTokenMatrix
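The role of the reference marginals and of the default N can be illustrated with a minimal sketch. It uses plain dicts of dicts instead of TypeTokenMatrix and Vocab, assumes natural logarithms, and covers only ‘pmi’, ‘ppmi’ and ‘dice’. The formulas are the textbook definitions and may differ in detail from nephosem's. Because the marginals come from the full reference matrix, scoring a one-row submatrix gives the same values as scoring that row within the full matrix.

```python
import math
from typing import Dict, Optional

Matrix = Dict[str, Dict[str, float]]  # row item -> {column item: value}


def association(freq: Matrix, nfreq: Dict[str, float], cfreq: Dict[str, float],
                N: Optional[float] = None, meas: str = "ppmi") -> Matrix:
    """Score the nonzero cells of a (sub)matrix using reference marginals."""
    if N is None:
        # The larger of the two marginal sums, as described above.
        N = max(sum(nfreq.values()), sum(cfreq.values()))
    out: Matrix = {}
    for r, row in freq.items():
        out[r] = {}
        for c, f in row.items():
            if f <= 0:
                continue
            if meas in ("pmi", "ppmi"):
                # PMI = log(P(r, c) / (P(r) P(c))) = log(f * N / (n_r * n_c))
                score = math.log(f * N / (nfreq[r] * cfreq[c]))
                if meas == "ppmi":
                    score = max(score, 0.0)  # clip negative PMI to zero
            elif meas == "dice":
                score = 2 * f / (nfreq[r] + cfreq[c])
            else:
                raise ValueError(f"measure not covered by this sketch: {meas}")
            out[r][c] = score
    return out


# Reference marginals from the full matrix
# {"dog": {"bark": 8, "cat": 2}, "tree": {"bark": 2, "leaf": 8}}.
nfreq = {"dog": 10, "tree": 10}
cfreq = {"bark": 10, "cat": 2, "leaf": 8}
sub = {"dog": {"bark": 8, "cat": 2}}  # only the "dog" row is scored
print(association(sub, nfreq, cfreq, meas="pmi"))
```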

nephosem.specutils.mxcalc.compute_cosine(measMTX, axis=0)
Parameters

measMTX (TypeTokenMatrix) –

nephosem.specutils.mxcalc.compute_distance(measMTX, axis=0, metric='cosine')

Compute a distance matrix from an association measure matrix.

Parameters
  • measMTX (TypeTokenMatrix) –

  • axis (int) – 0 (row) or 1 (column)

  • metric (str) – Any metric accepted by sklearn.metrics.pairwise_distances, e.g. ‘cosine’ (default), ‘euclidean’, ‘cityblock’, ‘l1’, ‘l2’, ‘manhattan’.

Returns

distance matrix
Return type

TypeTokenMatrix
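For the default ‘cosine’ metric with axis=0, the computation amounts to the pairwise cosine distance (1 minus cosine similarity) between rows. The following pure-Python sketch works on dicts of dicts rather than TypeTokenMatrix. Following sklearn's convention, it gives an all-zero row a similarity of 0. With a dense array, sklearn.metrics.pairwise_distances(X, metric="cosine") computes the same quantity. axis=1 corresponds to applying this to the transposed matrix.

```python
import math
from typing import Dict

Matrix = Dict[str, Dict[str, float]]  # row item -> {column item: value}


def cosine_distance_rows(m: Matrix) -> Matrix:
    """Pairwise cosine distance (1 - cosine similarity) between the rows of m."""
    norms = {r: math.sqrt(sum(v * v for v in row.values())) for r, row in m.items()}
    out: Matrix = {}
    for a, row_a in m.items():
        out[a] = {}
        for b, row_b in m.items():
            # Dot product over the sparse entries of row a.
            dot = sum(v * row_b.get(k, 0.0) for k, v in row_a.items())
            sim = dot / (norms[a] * norms[b]) if norms[a] and norms[b] else 0.0
            out[a][b] = 1.0 - sim
    return out


m = {
    "dog": {"bark": 1.0, "tail": 1.0},
    "cat": {"tail": 1.0, "meow": 1.0},
    "puppy": {"bark": 2.0, "tail": 2.0},
}
d = cosine_distance_rows(m)
print(round(d["dog"]["cat"], 3))
```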

nephosem.specutils.mxcalc.compute_ppmi(freqMTX, nfreq=None, cfreq=None, positive=True)

Compute a PPMI matrix. This function is faster than compute_association(). Set positive=False to get PMI values instead.

nephosem.specutils.mxcalc.compute_simrank(simMTX, reverse=False)

Compute similarity rank matrix.

Parameters
  • simMTX (cosine similarity SquareMatrix) –

  • reverse (boolean, default False) – True: rank 1 represents the most similar item in that row. False: the largest rank represents the most similar item in that row.
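Here is a minimal sketch of the two ranking directions, under the reading that reverse=True gives rank 1 to the most similar item and reverse=False gives the largest rank to it. It works on a dict-of-dicts similarity matrix rather than a SquareMatrix. Ties are broken by column order, and the diagonal (self-similarity) is ranked like any other cell. Both choices are simplifications that may differ from nephosem.

```python
from typing import Dict

Similarity = Dict[str, Dict[str, float]]
Ranks = Dict[str, Dict[str, int]]


def similarity_ranks(sim: Similarity, reverse: bool = False) -> Ranks:
    """Rank the entries of each row of a similarity matrix."""
    out: Ranks = {}
    for r, row in sim.items():
        # Ascending order (least similar first) unless reverse=True.
        order = sorted(row, key=row.get, reverse=reverse)
        out[r] = {c: rank for rank, c in enumerate(order, start=1)}
    return out


sim = {"dog": {"dog": 1.0, "cat": 0.6, "car": 0.1}}
print(similarity_ranks(sim, reverse=True))   # {'dog': {'dog': 1, 'cat': 2, 'car': 3}}
print(similarity_ranks(sim, reverse=False))  # {'dog': {'car': 1, 'cat': 2, 'dog': 3}}
```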

nephosem.specutils.mxcalc.compute_token_vectors(tcWeightMTX, soccMTX, operation='addition', normalization='l1')

Compute token vectors by combining a token-weight matrix (token-by-context weights) with a second-order collocate matrix.

Parameters
  • tcWeightMTX (TypeTokenMatrix) – Token-Context weight matrix.

  • soccMTX (TypeTokenMatrix) – Second order collocate matrix.

  • operation (str) – ‘addition’, ‘multiplication’, ‘weightedmean’

  • normalization (str) – ‘l1’, ‘l2’, ‘no’

Returns

token vectors

Return type

TypeTokenMatrix

Note

The ‘l1’ and ‘l2’ values for “normalization” are those of sklearn.preprocessing.normalize().
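One plausible reading of ‘addition’ and ‘weightedmean’ is sketched below. It assumes that ‘addition’ sums the second-order vectors of a token's context words scaled by the token weights (i.e. a matrix product) and that ‘weightedmean’ further divides by the sum of those weights. Both are assumptions about nephosem's behavior, and ‘multiplication’ is not covered. Plain dicts of dicts stand in for TypeTokenMatrix.

```python
import math
from typing import Dict

Matrix = Dict[str, Dict[str, float]]


def token_vectors(weights: Matrix, socc: Matrix, operation: str = "addition",
                  normalization: str = "l1") -> Matrix:
    """weights: token -> {context word: weight}; socc: context word -> {feature: value}."""
    out: Matrix = {}
    for tok, ctx in weights.items():
        vec: Dict[str, float] = {}
        for c, w in ctx.items():
            for feat, v in socc.get(c, {}).items():
                vec[feat] = vec.get(feat, 0.0) + w * v  # weighted sum of 2nd-order vectors
        if operation == "weightedmean":
            total = sum(ctx.values())
            if total:
                vec = {f: v / total for f, v in vec.items()}
        elif operation != "addition":
            raise ValueError(f"operation not covered by this sketch: {operation}")
        if normalization in ("l1", "l2"):
            norm = (sum(abs(v) for v in vec.values()) if normalization == "l1"
                    else math.sqrt(sum(v * v for v in vec.values())))
            if norm:
                vec = {f: v / norm for f, v in vec.items()}
        elif normalization != "no":
            raise ValueError(f"unknown normalization: {normalization}")
        out[tok] = vec
    return out


weights = {"tok1": {"river": 2.0, "water": 1.0}}
socc = {"river": {"flow": 1.0, "fish": 1.0}, "water": {"flow": 2.0}}
print(token_vectors(weights, socc, normalization="no"))  # {'tok1': {'flow': 4.0, 'fish': 2.0}}
```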

nephosem.specutils.mxcalc.compute_token_weights(tcPositionMTX, twMTX)

Compute token-by-context weight matrix. Build token weights from a token-by-context position/boolean matrix and a type-by-context weight matrix.

Parameters
  • tcPositionMTX (TypeTokenMatrix) –

    token-by-context position matrix for the target words, with one row per token

  • twMTX (TypeTokenMatrix) –

    type-by-context weight matrix, e.g. ‘ppmi’ (transposed), with context features (types) as rows and target words as columns

Returns

token weight matrix

Return type

TypeTokenMatrix

Notes

This function converts all explicit zeros in the type-by-context weight matrix to implicit zeros, so be careful if those explicit zeros are meaningful.
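The following is a minimal sketch of one way to read this, which also shows the explicit-zero caveat. It assumes each token's weight for a context feature is the type-level weight of that feature for the token's target type, kept only where the position matrix is nonzero. The token_type mapping is a hypothetical helper, and dicts of dicts replace TypeTokenMatrix. A zero weight simply never gets stored, so it becomes an implicit zero.

```python
from typing import Dict

Matrix = Dict[str, Dict[str, float]]


def token_weights(positions: Matrix, tw: Matrix, token_type: Dict[str, str]) -> Matrix:
    """positions: token -> {context feature: position/boolean}
    tw: context feature -> {target type: weight}, e.g. transposed PPMI
    token_type: token -> its target type (hypothetical helper mapping)
    """
    out: Matrix = {}
    for tok, ctx in positions.items():
        t = token_type[tok]
        row: Dict[str, float] = {}
        for c, present in ctx.items():
            w = tw.get(c, {}).get(t, 0.0)
            if present and w != 0.0:  # zero weights are dropped, i.e. become implicit
                row[c] = w
        out[tok] = row
    return out


positions = {"dog/NN_1": {"bark/VB": 1, "the/DT": 1}}
tw = {"bark/VB": {"dog/NN": 2.5}, "the/DT": {"dog/NN": 0.0}}  # explicit zero for "the/DT"
token_type = {"dog/NN_1": "dog/NN"}
print(token_weights(positions, tw, token_type))  # {'dog/NN_1': {'bark/VB': 2.5}}
```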

nephosem.specutils.mxutils module

nephosem.specutils.mxutils.merge_matrices(matrices)

Merge a list of (TypeTokenMatrix) matrices into one.

Parameters

matrices (list) – A list of matrices (TypeTokenMatrix)

Returns

spmatrix, row_items, col_items

Return type

tuple
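This sketch mirrors the documented return shape (matrix, row items, column items). It uses dicts of dicts instead of sparse matrices and returns the item lists sorted alphabetically, as the id-to-item lists elsewhere in this module are. How nephosem resolves a cell that appears in more than one input is not stated here. Summing such cells is an assumption made only for illustration.

```python
from typing import Dict, List, Tuple

Matrix = Dict[str, Dict[str, float]]


def merge(matrices: List[Matrix]) -> Tuple[Matrix, List[str], List[str]]:
    """Merge dict-of-dict matrices; overlapping cells are summed (assumption)."""
    merged: Matrix = {}
    for m in matrices:
        for r, row in m.items():
            target = merged.setdefault(r, {})
            for c, v in row.items():
                target[c] = target.get(c, 0.0) + v
    row_items = sorted(merged)
    col_items = sorted({c for row in merged.values() for c in row})
    return merged, row_items, col_items


mtx, rows, cols = merge([{"a": {"x": 1}}, {"a": {"x": 2, "y": 1}, "b": {"y": 3}}])
print(rows, cols)  # ['a', 'b'] ['x', 'y']
```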

nephosem.specutils.mxutils.merge_two_matrices(mtx1, mtx2)

Merge two (TypeTokenMatrix) matrices.

Parameters
  • mtx1 (TypeTokenMatrix) –

  • mtx2 (TypeTokenMatrix) –

Returns

merged matrix

Return type

TypeTokenMatrix

nephosem.specutils.mxutils.transform_dict_to_spmatrix(dict_mtx, rowid2item, colid2item, verbose=False)

Generate a sparse.csr_matrix from a dict of dicts, using the row and column id-to-item mappings.

Parameters
  • dict_mtx (dict of dict) – Matrix represented by a Python dict of dict

  • rowid2item (list of str) – Alphabetically ascending sorted list of items

  • colid2item (list of str) – Alphabetically ascending sorted list of items

  • verbose (bool) –

Returns

Return type

sparse.csr_matrix
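The mapping from a dict of dicts to CSR storage can be sketched without SciPy by building the three CSR arrays directly. Row i holds rowid2item[i], and each column item is looked up in the inverted colid2item list. This is an illustration, not nephosem's implementation. A column item missing from colid2item raises KeyError here. The returned triple can be passed to scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(rowid2item), len(colid2item))).

```python
from typing import Dict, List, Tuple

DictMatrix = Dict[str, Dict[str, float]]


def dict_to_csr(dict_mtx: DictMatrix, rowid2item: List[str],
                colid2item: List[str]) -> Tuple[List[float], List[int], List[int]]:
    """Return the (data, indices, indptr) arrays of a CSR matrix."""
    col2id = {item: i for i, item in enumerate(colid2item)}  # invert the id -> item list
    data: List[float] = []
    indices: List[int] = []
    indptr = [0]
    for row_item in rowid2item:  # row i of the matrix is rowid2item[i]
        row = dict_mtx.get(row_item, {})
        for col_item in sorted(row, key=col2id.__getitem__):  # sorted column ids per row
            data.append(row[col_item])
            indices.append(col2id[col_item])
        indptr.append(len(data))  # end offset of this row
    return data, indices, indptr


dict_mtx = {"cat": {"purr": 2.0}, "dog": {"wag": 1.0, "bark": 3.0}}
print(dict_to_csr(dict_mtx, ["cat", "dog"], ["bark", "purr", "wag"]))
# ([2.0, 3.0, 1.0], [1, 0, 2], [0, 1, 3])
```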

nephosem.specutils.mxutils.transform_indices(mtx, new_col_items)
nephosem.specutils.mxutils.transform_nodes_to_matrix(type2toks, colloc_fmt='lemma/pos')

Transform type nodes into a token matrix.

Parameters
  • type2toks (dict or iterable) – Type string -> token nodes of this type

  • colloc_fmt (str, default="lemma/pos") – Format for the column names

Returns

tokmx

Return type

TypeTokenMatrix

nephosem.specutils.mxutils.transform_spmatrix_to_dict(spmatrix, rowid2item, colid2item, verbose=False)

Generate a dict of dicts from a sparse.csr_matrix, using the row and column id-to-item mappings.

Parameters
  • spmatrix (csr_matrix) –

  • rowid2item (iterable (of str)) – Alphabetically ascending sorted list of items

  • colid2item (iterable (of str)) – Alphabetically ascending sorted list of items

  • verbose (bool) –

Returns

Return type

Python dict of dict
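The reverse direction reads the CSR arrays row by row. A scipy.sparse.csr_matrix exposes them as its .data, .indices and .indptr attributes, so the following sketch takes them directly to stay dependency-free. It keeps empty rows as empty dicts. Whether nephosem keeps or drops empty rows is not stated here.

```python
from typing import Dict, List

DictMatrix = Dict[str, Dict[str, float]]


def csr_to_dict(data: List[float], indices: List[int], indptr: List[int],
                rowid2item: List[str], colid2item: List[str]) -> DictMatrix:
    """Rebuild a dict of dicts from the (data, indices, indptr) arrays of a CSR matrix."""
    out: DictMatrix = {}
    for i, row_item in enumerate(rowid2item):
        start, end = indptr[i], indptr[i + 1]  # stored entries of row i
        out[row_item] = {colid2item[j]: v for j, v in zip(indices[start:end], data[start:end])}
    return out


print(csr_to_dict([2.0, 3.0, 1.0], [1, 0, 2], [0, 1, 3],
                  ["cat", "dog"], ["bark", "purr", "wag"]))
# {'cat': {'purr': 2.0}, 'dog': {'bark': 3.0, 'wag': 1.0}}
```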

Module contents