graph4nlp.graph_construction

Graph Constructor

class graph4nlp.graph_construction.DependencyBasedGraphConstruction(vocab)

Dependency-parsing-tree based graph construction class

Parameters
vocab: VocabModel

Vocabulary containing all words that appear in the graphs.

Methods

add_vocab(g)

Add the node tokens that appear in graph g to the vocabulary.

parsing(raw_text_data, nlp_processor, …)

Parameters

static_topology(raw_text_data, …[, …])

Graph building method.

forward

add_vocab(g)

Add the node tokens that appear in graph g to the vocabulary.

Parameters
g: GraphData

Graph data-structure.

classmethod parsing(raw_text_data, nlp_processor, processor_args)
Parameters
raw_text_data: str
nlp_processor: StanfordCoreNLP
processor_args: dict
Returns
parsed_results: list[dict]

Each sentence is a dict, and all sentences are packed into a list. Each dict has the following keys and values:

"node_num": int

The number of nodes.

"node_content": list[dict]

The list of node information. Each node is a dict with the following keys:

"token": str

The word token.

"position_id": int

The word's position id in the original sentence, e.g. for "I am a dog." the position ids are 0, 1, 2, 3.

"id": int

The node token's id, which will be used in GraphData.

"sentence_id": int

The id of the sentence within the whole text.

"graph_content": list[dict]

The list of edge information. Each edge is a dict with the following keys:

"edge_type": str

The edge type token, e.g. "nsubj".

"src": int

The source node id.

"tgt": int

The target node id.
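As a rough illustration of the return structure described above, the following hand-written sketch builds one sentence entry for "I am a dog." and checks its internal consistency. The edges are illustrative rather than real parser output, the placement of "sentence_id" inside each node dict is an assumption (the listing does not make the nesting explicit), and check_parsed_sentence is a hypothetical helper, not part of graph4nlp.

```python
# Hand-written example of the documented structure; not produced by CoreNLP.
sentence = {
    "node_num": 4,
    "node_content": [
        # "sentence_id" is placed per node here as an assumption.
        {"token": "I", "position_id": 0, "id": 0, "sentence_id": 0},
        {"token": "am", "position_id": 1, "id": 1, "sentence_id": 0},
        {"token": "a", "position_id": 2, "id": 2, "sentence_id": 0},
        {"token": "dog", "position_id": 3, "id": 3, "sentence_id": 0},
    ],
    "graph_content": [
        # Illustrative dependency edges with "dog" as the head word.
        {"edge_type": "nsubj", "src": 3, "tgt": 0},
        {"edge_type": "cop", "src": 3, "tgt": 1},
        {"edge_type": "det", "src": 3, "tgt": 2},
    ],
}

# All sentences are packed into a list.
parsed_results = [sentence]


def check_parsed_sentence(sent):
    """Hypothetical helper: verify node count and that edges use known node ids."""
    ids = {node["id"] for node in sent["node_content"]}
    count_ok = sent["node_num"] == len(sent["node_content"])
    edges_ok = all(e["src"] in ids and e["tgt"] in ids for e in sent["graph_content"])
    return count_ok and edges_ok
```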

classmethod static_topology(raw_text_data, nlp_processor, processor_args, merge_strategy, edge_strategy, sequential_link=True, verbose=0)

Graph building method.

Parameters
raw_text_data: str or list[list]

Raw text data, which may contain multiple sentences. When it is a str, it is the raw text. When it is a list[list], it holds the tokenized sentences as token lists.

nlp_processor: StanfordCoreNLP

NLP parsing tools

processor_args: dict

The configuration dict for StanfordCoreNLP.annotate

merge_strategy: None or str, option=[None, “tailhead”, “user_define”]

Strategy to merge sub-graphs into one graph.

None: The default option; behaves the same as "tailhead".

"tailhead": Link the tail node of sub-graph i with the head node of sub-graph i+1.

"user_define": Left to the user, who can override this method to define a custom merge strategy.
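The "tailhead" merge can be pictured with a minimal sketch over plain edge lists. It assumes that the head of a sub-graph is its first token and the tail is its last token, and that node ids are renumbered consecutively in the merged graph; merge_tailhead is a hypothetical helper, not the library's implementation.

```python
def merge_tailhead(subgraphs):
    """Merge sub-graphs by linking sub-graph i's tail node to sub-graph i+1's head node.

    Each sub-graph is a pair (num_nodes, edges) whose edges use local ids
    0..num_nodes-1. Assumption: head = local id 0, tail = local id num_nodes-1.
    Returns (total_num_nodes, merged_edges) with globally renumbered ids.
    """
    merged_edges = []
    offset = 0          # global id of the current sub-graph's first node
    prev_tail = None    # global id of the previous sub-graph's tail node
    for num_nodes, edges in subgraphs:
        # Shift local ids into the merged graph's id space.
        merged_edges.extend((s + offset, t + offset) for s, t in edges)
        if prev_tail is not None:
            # Link the previous tail to this sub-graph's head.
            merged_edges.append((prev_tail, offset))
        prev_tail = offset + num_nodes - 1
        offset += num_nodes
    return offset, merged_edges


total, edges = merge_tailhead([(3, [(0, 1), (1, 2)]), (2, [(0, 1)])])
```

In this example the first sub-graph keeps ids 0 to 2, the second becomes 3 to 4, and the single added link joins node 2 to node 3.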

edge_strategy: None or str, option=[None, “homogeneous”, “heterogeneous”, “as_node”]

Strategy to process edges.

None: The default option; behaves the same as "homogeneous".

"homogeneous": Drop the edge type information. If node i and node j are linked, an edge with weight 1.0 is added; otherwise there is no edge.

"heterogeneous": Keep the edge type information. An edge will carry type information such as n_subj.

"as_node": Treat each edge as a graph node. If there is an edge of type k between node i and node j, a node k is inserted into the graph and linked as (i, k) and (k, j).
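The three edge strategies can be compared on a toy typed edge list. This is only a sketch under stated assumptions: apply_edge_strategy, the (src, edge_type, tgt) tuple layout, and the attribute dict keys are all hypothetical and chosen for illustration.

```python
def apply_edge_strategy(tokens, typed_edges, edge_strategy=None):
    """Return (nodes, edges) after applying one of the documented edge strategies.

    tokens: list of node labels; typed_edges: list of (src, edge_type, tgt).
    Edges are returned as (src, tgt, attributes) triples.
    """
    nodes = list(tokens)
    edges = []
    for src, edge_type, tgt in typed_edges:
        if edge_strategy in (None, "homogeneous"):
            # Drop the type; every existing link gets weight 1.0.
            edges.append((src, tgt, {"weight": 1.0}))
        elif edge_strategy == "heterogeneous":
            # Keep the type as an edge attribute.
            edges.append((src, tgt, {"edge_type": edge_type}))
        elif edge_strategy == "as_node":
            # Insert a node k for the edge and link (i, k) and (k, j).
            k = len(nodes)
            nodes.append(edge_type)
            edges.append((src, k, {}))
            edges.append((k, tgt, {}))
        else:
            raise ValueError(f"unknown edge_strategy: {edge_strategy!r}")
    return nodes, edges


tokens = ["I", "am", "a", "dog"]
typed = [(3, "nsubj", 0)]
homo = apply_edge_strategy(tokens, typed)
hetero = apply_edge_strategy(tokens, typed, "heterogeneous")
as_node = apply_edge_strategy(tokens, typed, "as_node")
```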

sequential_link: bool, default=True

Whether to link node tokens sequentially (note that the links are bidirectional)

verbose: int, default=0

Whether to output log information. Set to 1 to output more information.

Returns
joint_graph: GraphData

The merged graph data-structure.

class graph4nlp.graph_construction.ConstituencyBasedGraphConstruction(vocab)

Class for constituency graph construction.

Attributes
embedding_styles: dict

Specify embedding styles including single_token_item, emb_strategy, num_rnn_layers, bert_model_name and bert_lower_case.

vocab: (set, optional)

Vocabulary containing all words that appear in the graphs.

Methods

topology(raw_text_data, nlp_processor, merge_strategy=None, edge_strategy=None)

Generate the graph structure with an NLP parser such as CoreNLP.

_construct_static_graph(parsed_object, sub_sentence_id, edge_strategy=None)

Construct a single static graph from a single sentence, to be called by topology function.

_graph_connect(nx_graph_list, merge_strategy=None)

Construct a merged graph from a list of graphs, to be called by topology function.

embedding(node_attributes, edge_attributes)

Generate node/edge embeddings from node/edge attributes through an embedding layer.

forward(raw_text_data, nlp_parser)

Generate graph topology and embeddings.

classmethod parsing(raw_text_data, nlp_processor, processor_args)
Parameters
raw_text_data: str
nlp_processor: StanfordCoreNLP
processor_args: json config for constituency graph construction
classmethod static_topology(raw_text_data, nlp_processor, processor_args, merge_strategy=None, edge_strategy=None, sequential_link=3, top_down=False, prune=2, verbose=True)

Generate a graph structure from raw text data.

Parameters
raw_text_data: string

A string used to construct a static graph; it can be composed of multiple strings.

nlp_processor: object

A parser used to parse a sentence string into a parse tree, such as a dependency parse tree or a constituency parse tree.

merge_strategy: None or str, option=[None, “tailhead”, “user_define”]

Strategy to merge sub-graphs into one graph.

None: The default option; behaves the same as "tailhead".

"tailhead": Link the tail node of sub-graph i with the head node of sub-graph i+1.

"user_define": Left to the user, who can override the method _graph_connect to define a custom merge strategy.

edge_strategy: None or str, option=[None, “homogeneous”, “heterogeneous”, “as_node”]

Strategy to process edges.

None: The default option; behaves the same as "homogeneous".

"homogeneous": Drop the edge type information. If node i and node j are linked, an edge with weight 1.0 is added; otherwise there is no edge.

"heterogeneous": Keep the edge type information. An edge will carry type information such as n_subj. It is not implemented yet.

"as_node": Treat each edge as a graph node. If there is an edge of type k between node i and node j, a node k is inserted into the graph and linked as (i, k) and (k, j). It is not implemented yet.

sequential_link: int, option=[0,1,2,3]

Strategy to add sequential links between word nodes.

0: Do not add sequential links.

1: Add unidirectional links.

2: Add bidirectional links.

3: Do not add sequential links inside each sentence, and add bidirectional links between adjacent sentences.
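The four sequential_link options can be sketched over consecutively numbered word nodes. Note the assumptions: options 1 and 2 are taken here to cover every consecutive word pair in the text, including pairs that cross a sentence boundary, which the description above does not specify; sequential_links is a hypothetical helper.

```python
def sequential_links(sentence_lengths, mode):
    """Return directed (src, tgt) links between word nodes for the given mode.

    Word nodes are numbered consecutively across sentences.
    Assumption: modes 1 and 2 link every consecutive pair, also across sentences.
    """
    if mode not in (0, 1, 2, 3):
        raise ValueError("mode must be 0, 1, 2 or 3")
    # Record the global index of each sentence's first word.
    first_words = set()
    total = 0
    for length in sentence_lengths:
        first_words.add(total)
        total += length
    links = []
    for i in range(total - 1):
        crosses_sentences = (i + 1) in first_words
        if mode == 0 or (mode == 3 and not crosses_sentences):
            continue
        links.append((i, i + 1))
        if mode in (2, 3):
            # Bidirectional: add the reverse link as well.
            links.append((i + 1, i))
    return links


# Two sentences of two words each: nodes 0-1 and 2-3.
links_by_mode = {m: sequential_links([2, 2], m) for m in (0, 1, 2, 3)}
```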

top_down: bool

If true, edges in the constituency tree point from the root toward the leaf nodes. Otherwise, they point from the leaf nodes toward the root.

prune: int, option=[0,1,2]

Strategy for pruning constituency trees.

0: No pruning.

1: Prune POS nodes.

2: Prune nodes with both in-degree and out-degree of 1.
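One possible reading of pruning option 2 is sketched below: every node with exactly one incoming and one outgoing edge is removed and its neighbours are reconnected directly. This is an interpretation under stated assumptions (top-down edges, a tree-shaped input, reconnection of parent to child), and prune_unary_nodes is a hypothetical helper rather than the library's routine.

```python
def prune_unary_nodes(edges):
    """Remove nodes whose in-degree and out-degree are both 1.

    edges: list of directed (src, tgt) pairs. Each removed node's single
    predecessor is linked directly to its single successor.
    """
    edges = list(edges)
    changed = True
    while changed:
        changed = False
        preds, succs = {}, {}
        for s, t in edges:
            succs.setdefault(s, []).append(t)
            preds.setdefault(t, []).append(s)
        for node in sorted(set(preds) & set(succs)):
            if (len(preds[node]) == 1 and len(succs[node]) == 1
                    and preds[node][0] != node):
                parent, child = preds[node][0], succs[node][0]
                # Drop both edges touching the node, then bridge the gap.
                edges = [e for e in edges if node not in e]
                edges.append((parent, child))
                changed = True
                break  # degrees changed; recompute before continuing
    return edges


# Toy top-down constituency tree for "I ran" (hand-written, not parser output).
tree = [("S", "NP"), ("S", "VP"), ("NP", "PRP"), ("PRP", "I"),
        ("VP", "VBD"), ("VBD", "ran")]
pruned = prune_unary_nodes(tree)
```

Under this reading the chain nodes NP, PRP, VP and VBD all disappear, leaving the root connected directly to the two words.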

verbose: bool

A boolean option to decide whether to print out the graph construction process.

Returns
GraphData

A customized graph data structure

class graph4nlp.graph_construction.IEBasedGraphConstruction(vocab)

Information Extraction based graph construction class

Parameters
embedding_style: dict

Specify embedding styles including single_token_item, emb_strategy, num_rnn_layers, bert_model_name and bert_lower_case.

vocab: VocabModel

Vocabulary containing all words that appear in the graphs.

Methods

add_vocab(g)

Add the node tokens that appear in graph g to the vocabulary.

parsing(all_sent_triples_list, edge_strategy)

Parameters

static_topology(raw_text_data, …[, verbose])

Graph building method.

forward

add_vocab(g)

Add the node tokens that appear in graph g to the vocabulary.

Parameters
g: GraphData

Graph data-structure.

classmethod parsing(all_sent_triples_list, edge_strategy)
Parameters
all_sent_triples_list: list
edge_strategy: str
Returns
parsed_results: dict

parsed_results is an intermediate dict that contains all the information of the constructed IE graph for a piece of raw text input.

parsed_results[‘graph_content’] is a list of dict.

Each dict in parsed_results[‘graph_content’] contains information about a triple (src_ent, rel, tgt_ent).

parsed_results[‘graph_nodes’] contains all nodes in the KG graph.

parsed_results[‘node_num’] is the number of nodes in the KG graph.
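A minimal sketch of how such an intermediate dict might be assembled from a list of triples is shown below. The top-level keys follow the description above; the keys inside each graph_content entry ("src", "rel", "tgt") and the helper build_ie_parsed_results are hypothetical, since the listing does not name them.

```python
def build_ie_parsed_results(triples):
    """Build an intermediate dict from (src_ent, rel, tgt_ent) triples.

    The inner key names of graph_content entries are assumptions.
    """
    graph_nodes = []
    graph_content = []
    for src_ent, rel, tgt_ent in triples:
        # Collect each entity once, preserving first-seen order.
        for ent in (src_ent, tgt_ent):
            if ent not in graph_nodes:
                graph_nodes.append(ent)
        graph_content.append({"src": src_ent, "rel": rel, "tgt": tgt_ent})
    return {
        "graph_content": graph_content,
        "graph_nodes": graph_nodes,
        "node_num": len(graph_nodes),
    }


# Hand-written triples for illustration only.
example = build_ie_parsed_results([
    ("Alice", "works at", "Acme"),
    ("Acme", "is in", "Paris"),
])
```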

classmethod static_topology(raw_text_data, nlp_processor, processor_args, merge_strategy, edge_strategy, verbose=True)

Graph building method.

Parameters
raw_text_data: str

Raw text data, which may contain multiple sentences.

nlp_processor: StanfordCoreNLP

NLP parsing tools

merge_strategy: None or str, option=[None, “global”, “user_define”]

Strategy to merge sub-graphs into one graph.

None: Do not add additional nodes and edges.

"global": All subjects in the extracted triples are connected by a "GLOBAL_NODE" using a "global" edge.

"user_define": Left to the user, who can override this method to define a custom merge strategy.

edge_strategy: None or str, option=[None, “as_node”]

Strategy to process edges.

None: The default option. Edge information will be preserved in GraphData.edge_attributes.

"as_node": Treat each edge as a graph node. If there is an edge of type k between node i and node j, a node k is inserted into the graph and linked as (i, k) and (k, j). The type of the original nodes will be set to ent_node, while the type of the edge nodes is edge_node.

Returns
graph: GraphData

The merged graph data-structure.
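The "global" merge strategy of this method can be pictured with a small sketch over triples. The edge direction (from "GLOBAL_NODE" to each subject) is an assumption, as is the helper name merge_global; only the node name "GLOBAL_NODE" and the edge label "global" come from the description of merge_strategy.

```python
def merge_global(triples):
    """Connect every distinct subject to a GLOBAL_NODE with a "global" edge.

    triples: list of (subject, relation, object).
    Assumption: the added edges point from GLOBAL_NODE to the subject.
    """
    merged = list(triples)
    seen = set()
    for subject, _, _ in triples:
        if subject not in seen:
            seen.add(subject)
            merged.append(("GLOBAL_NODE", "global", subject))
    return merged


# Hand-written triples for illustration only.
merged = merge_global([
    ("Alice", "works at", "Acme"),
    ("Acme", "is in", "Paris"),
    ("Alice", "lives in", "Paris"),
])
```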

class graph4nlp.graph_construction.NodeEmbeddingBasedGraphConstruction(**kwargs)

Class for node embedding based dynamic graph construction.

Methods

add_module(name, module)

Adds a child module to the current module.

apply(fn)

Applies fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Returns an iterator over module buffers.

children()

Returns an iterator over immediate children modules.

compute_graph_regularization(adj, node_feat)

Graph regularization loss.

compute_similarity_metric(node_emb[, node_mask])

Compute similarity metric.

cpu()

Moves all model parameters and buffers to the CPU.

cuda([device])

Moves all model parameters and buffers to the GPU.

double()

Casts all floating point parameters and buffers to double datatype.

dynamic_topology(graph)

Compute graph topology.

eval()

Sets the module in evaluation mode.

extra_repr()

Set the extra representation of the module

float()

Casts all floating point parameters and buffers to float datatype.

forward(*input)

Defines the computation performed at every call.

get_buffer(target)

Returns the buffer given by target if it exists, otherwise throws an error.

get_extra_state()

Returns any extra state to include in the module’s state_dict.

get_parameter(target)

Returns the parameter given by target if it exists, otherwise throws an error.

get_submodule(target)

Returns the submodule given by target if it exists, otherwise throws an error.

half()

Casts all floating point parameters and buffers to half datatype.

init_topology(raw_text_data[, lower_case, …])

Convert raw text data to the initial node set graph (i.e., no edge information).

load_state_dict(state_dict[, strict])

Copies parameters and buffers from state_dict into this module and its descendants.

modules()

Returns an iterator over all modules in the network.

named_buffers([prefix, recurse])

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse])

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Returns an iterator over module parameters.

register_backward_hook(hook)

Registers a backward hook on the module.

register_buffer(name, tensor[, persistent])

Adds a buffer to the module.

register_forward_hook(hook)

Registers a forward hook on the module.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the module.

register_full_backward_hook(hook)

Registers a backward hook on the module.

register_parameter(name, param)

Adds a parameter to the module.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

This function is called from load_state_dict() to handle any extra state found within the state_dict.

share_memory()

See torch.Tensor.share_memory_()

sparsify_graph(adj)

Return a sparsified graph of the input graph.

state_dict([destination, prefix, keep_vars])

Returns a dictionary containing a whole state of the module.

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

to_empty(*, device)

Moves the parameters and buffers to the specified device without copying storage.

train([mode])

Sets the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Moves all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Sets gradients of all model parameters to zero.

__call__

dynamic_topology(graph)

Compute graph topology.

Parameters
graph: GraphData

The input graph data.

Returns
GraphData

The constructed graph.
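Conceptually, a node-embedding-based topology is obtained by scoring node pairs by embedding similarity and then sparsifying the resulting dense matrix. The sketch below assumes cosine similarity and a simple threshold for sparsification; both choices, and the helper names, are illustrative assumptions and are not taken from the library.

```python
import math


def cosine_similarity_matrix(node_emb):
    """Pairwise cosine similarity between node embedding vectors (lists of floats)."""
    # Use 1.0 for zero vectors to avoid dividing by zero.
    norms = [math.sqrt(sum(x * x for x in vec)) or 1.0 for vec in node_emb]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv)
         for v, nv in zip(node_emb, norms)]
        for u, nu in zip(node_emb, norms)
    ]


def sparsify_by_threshold(adj, epsilon):
    """Keep only weights strictly above epsilon; set the rest to 0.0."""
    return [[w if w > epsilon else 0.0 for w in row] for row in adj]


# Three toy node embeddings: the first two are identical, the third is orthogonal.
emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
sim = cosine_similarity_matrix(emb)
adj = sparsify_by_threshold(sim, 0.5)
```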

classmethod init_topology(raw_text_data, lower_case=True, tokenizer=<function word_tokenize>)

Convert raw text data to the initial node set graph (i.e., no edge information).

Parameters
raw_text_data: str or list/tuple of str

The raw text data. When a list/tuple of tokens is provided, no tokenization will be conducted and each token is a node; otherwise, tokenization will be conducted on the input string to get a list of tokens.

lower_case: boolean

Specify whether to lower case the input text, default: True.

tokenizer: callable, optional

The tokenization function.

Returns
GraphData

The constructed graph.
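The behaviour described for raw_text_data can be sketched without the library: a string is tokenized, a list or tuple of tokens is used as is, and the result is a node set with no edges. The sketch uses str.split as a stand-in tokenizer and a plain dict in place of GraphData; applying lower-casing to pre-tokenized input is an assumption, and init_node_set is a hypothetical helper.

```python
def init_node_set(raw_text_data, lower_case=True, tokenizer=str.split):
    """Build an edge-free node set from raw text or a pre-tokenized sequence.

    str.split stands in for a real tokenizer; a dict stands in for GraphData.
    """
    if isinstance(raw_text_data, str):
        text = raw_text_data.lower() if lower_case else raw_text_data
        tokens = tokenizer(text)
    else:
        # Pre-tokenized input: no tokenization, each token becomes a node.
        tokens = [t.lower() if lower_case else t for t in raw_text_data]
    nodes = [{"id": i, "token": tok} for i, tok in enumerate(tokens)]
    return {"nodes": nodes, "edges": []}


from_text = init_node_set("I am a dog")
from_tokens = init_node_set(["I", "am", "a", "dog"], lower_case=False)
```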

class graph4nlp.graph_construction.NodeEmbeddingBasedRefinedGraphConstruction(alpha_fusion, **kwargs)

Class for node embedding based refined dynamic graph construction.

Parameters
alpha_fusion: float

Specify the fusion value for combining initial and learned adjacency matrices.
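One common way to combine two adjacency matrices with a single fusion value is a convex combination. The sketch below assumes that form, with alpha_fusion weighting the initial matrix and (1 - alpha_fusion) weighting the learned one; the exact formula used by the class is not stated here, and fuse_adjacency is a hypothetical helper.

```python
def fuse_adjacency(init_adj, learned_adj, alpha_fusion):
    """Convex combination of two equally sized adjacency matrices (lists of lists).

    Assumed form: alpha_fusion * init_adj + (1 - alpha_fusion) * learned_adj.
    """
    return [
        [alpha_fusion * a + (1.0 - alpha_fusion) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(init_adj, learned_adj)
    ]


init_adj = [[0.0, 1.0], [1.0, 0.0]]
learned_adj = [[1.0, 0.0], [0.0, 1.0]]
fused = fuse_adjacency(init_adj, learned_adj, 0.25)
```

With this form, a value of 1.0 would keep only the initial topology and a value of 0.0 only the learned one.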

Methods

add_module(name, module)

Adds a child module to the current module.

apply(fn)

Applies fn recursively to every submodule (as returned by .children()) as well as self.

bfloat16()

Casts all floating point parameters and buffers to bfloat16 datatype.

buffers([recurse])

Returns an iterator over module buffers.

children()

Returns an iterator over immediate children modules.

compute_graph_regularization(adj, node_feat)

Graph regularization loss.

compute_similarity_metric(node_emb[, node_mask])

Compute similarity metric.

cpu()

Moves all model parameters and buffers to the CPU.

cuda([device])

Moves all model parameters and buffers to the GPU.

double()

Casts all floating point parameters and buffers to double datatype.

dynamic_topology(graph)

Compute graph topology.

eval()

Sets the module in evaluation mode.

extra_repr()

Set the extra representation of the module

float()

Casts all floating point parameters and buffers to float datatype.

forward(*input)

Defines the computation performed at every call.

get_buffer(target)

Returns the buffer given by target if it exists, otherwise throws an error.

get_extra_state()

Returns any extra state to include in the module’s state_dict.

get_parameter(target)

Returns the parameter given by target if it exists, otherwise throws an error.

get_submodule(target)

Returns the submodule given by target if it exists, otherwise throws an error.

half()

Casts all floating point parameters and buffers to half datatype.

init_topology(raw_text_data[, lower_case, …])

Convert raw text data to the initial graph.

load_state_dict(state_dict[, strict])

Copies parameters and buffers from state_dict into this module and its descendants.

modules()

Returns an iterator over all modules in the network.

named_buffers([prefix, recurse])

Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.

named_children()

Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.

named_modules([memo, prefix, remove_duplicate])

Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.

named_parameters([prefix, recurse])

Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.

parameters([recurse])

Returns an iterator over module parameters.

register_backward_hook(hook)

Registers a backward hook on the module.

register_buffer(name, tensor[, persistent])

Adds a buffer to the module.

register_forward_hook(hook)

Registers a forward hook on the module.

register_forward_pre_hook(hook)

Registers a forward pre-hook on the module.

register_full_backward_hook(hook)

Registers a backward hook on the module.

register_parameter(name, param)

Adds a parameter to the module.

requires_grad_([requires_grad])

Change if autograd should record operations on parameters in this module.

set_extra_state(state)

This function is called from load_state_dict() to handle any extra state found within the state_dict.

share_memory()

See torch.Tensor.share_memory_()

sparsify_graph(adj)

Return a sparsified graph of the input graph.

state_dict([destination, prefix, keep_vars])

Returns a dictionary containing a whole state of the module.

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

to_empty(*, device)

Moves the parameters and buffers to the specified device without copying storage.

train([mode])

Sets the module in training mode.

type(dst_type)

Casts all parameters and buffers to dst_type.

xpu([device])

Moves all model parameters and buffers to the XPU.

zero_grad([set_to_none])

Sets gradients of all model parameters to zero.

__call__

dynamic_topology(graph)

Compute graph topology.

Parameters
graph: GraphData

The input graph data.

Returns
GraphData

The constructed graph.

classmethod init_topology(raw_text_data, lower_case=True, tokenizer=<function word_tokenize>, nlp_processor=None, processor_args=None, merge_strategy=None, edge_strategy=None, verbose=False, dynamic_init_topology_builder=None, dynamic_init_topology_aux_args=None)

Convert raw text data to the initial graph.

Parameters
raw_text_data: str or list/tuple of str

The raw text data. When a list/tuple of tokens is provided, no tokenization will be conducted and each token is a node (used for line graph builder); otherwise, tokenization will be conducted on the input string to get a list of tokens.

lower_case: boolean

Specify whether to lower case the input text, default: True.

tokenizer: callable, optional

The tokenization function, default: nltk.tokenize.word_tokenize.

nlp_processor: StanfordCoreNLP, optional

The NLP processor, default: None.

processor_args: dict, optional

The NLP processor arguments, default: None.

merge_strategy: str

Strategy to merge sub-graphs into one graph, depends on specific dynamic_init_topology_builder, default: None.

edge_strategy: str

Strategy to process edge, depends on specific dynamic_init_topology_builder, default: None.

verbose: boolean

Whether to print verbose output, default: False.

dynamic_init_topology_builder: class, optional

The initial graph topology builder, default: None.

dynamic_init_topology_aux_args: dict, optional

The auxiliary args for dynamic_init_topology_builder.topology, default: None.

Returns
GraphData

The constructed graph.