graph4nlp.graph_construction

Graph Constructor

class graph4nlp.graph_construction.DependencyBasedGraphConstruction(vocab)

Dependency-parsing-tree based graph construction class

Parameters

vocab: VocabModel: Vocabulary including all words that appear in the graphs.

Methods

add_vocab(g)

Add the node tokens that appear in graph g to the vocabulary.

parsing(raw_text_data, nlp_processor, …)

Parse raw text into per-sentence node and edge information.

static_topology(raw_text_data, …[, …])

Graph building method.

forward

add_vocab(g)

Add the node tokens that appear in graph g to the vocabulary.

Parameters

g: GraphData: Graph data-structure.

classmethod parsing(raw_text_data, nlp_processor, processor_args)

Parameters

raw_text_data: str
nlp_processor: StanfordCoreNLP
processor_args: dict

Returns

parsed_results: list[dict]

Each sentence is a dict, and all sentences are packed into a list. Each sentence dict has the following keys:

“node_num”: int

The number of nodes.

“node_content”: list[dict]

The list of node information. Each node is a dict with the following keys:

‘token’: str: The word token.
‘position_id’: int: The word’s position id in the original sentence, e.g. for I am a dog. the position ids are 0, 1, 2, 3.
‘id’: int: The node token’s id, which will be used in GraphData.
‘sentence_id’: int: The sentence’s id in the whole text.

“graph_content”: list[dict]

The list of edge information. Each edge is a dict with the following keys:

‘edge_type’: str: The edge type token, e.g. ‘nsubj’.
‘src’: int: The source node id.
‘tgt’: int: The target node id.
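
As an illustration of the layout described above, the following is a hand-written sketch of what parsed_results could look like for the one-sentence text "I am a dog". It is not output from a real parser: the dependency labels, the edge direction (head to dependent) and the helper check_sentence are assumptions made for this example only.

```python
# Hand-written example of the documented layout; not produced by StanfordCoreNLP.
parsed_results = [
    {
        "node_num": 4,
        "node_content": [
            {"token": "I", "position_id": 0, "id": 0, "sentence_id": 0},
            {"token": "am", "position_id": 1, "id": 1, "sentence_id": 0},
            {"token": "a", "position_id": 2, "id": 2, "sentence_id": 0},
            {"token": "dog", "position_id": 3, "id": 3, "sentence_id": 0},
        ],
        "graph_content": [
            # Assumption: edges point from the head word to its dependent.
            {"edge_type": "nsubj", "src": 3, "tgt": 0},
            {"edge_type": "cop", "src": 3, "tgt": 1},
            {"edge_type": "det", "src": 3, "tgt": 2},
        ],
    }
]


def check_sentence(sentence):
    """Hypothetical helper: verify that a sentence dict is internally consistent."""
    node_ids = {node["id"] for node in sentence["node_content"]}
    if sentence["node_num"] != len(sentence["node_content"]):
        return False
    # Every edge must connect two known node ids.
    return all(
        edge["src"] in node_ids and edge["tgt"] in node_ids
        for edge in sentence["graph_content"]
    )


all_consistent = all(check_sentence(sentence) for sentence in parsed_results)
```

A check like this can be useful when writing a custom parser that has to produce the same structure.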

classmethod static_topology(raw_text_data, nlp_processor, processor_args, merge_strategy, edge_strategy, sequential_link=True, verbose=0)

Graph building method.

Parameters

raw_text_data: str or list[list]

Raw text data; it can contain multiple sentences. When it is of type str, it is the raw text. When it is of type list[list], it is a list of token lists that are already tokenized.

nlp_processor: StanfordCoreNLP

NLP parsing tools

processor_args: dict

The configuration dict for StanfordCoreNLP.annotate

merge_strategy: None or str, option=[None, “tailhead”, “user_define”]

Strategy to merge sub-graphs into one graph.

None: The default option; it behaves as "tailhead".
"tailhead": Link the tail node of sub-graph i with the head node of sub-graph i+1.
"user_define": Left to the user, who can override this method to define a custom merge strategy.

edge_strategy: None or str, option=[None, “homogeneous”, “heterogeneous”, “as_node”]

Strategy to process edges.

None: The default option; it behaves as "homogeneous".
"homogeneous": Drop the edge type information. If node i and node j are linked, an edge with weight 1.0 is added; otherwise there is no edge.
"heterogeneous": Keep the edge type information. An edge will have type information like n_subj.
"as_node": View the edge as a graph node. If there is an edge whose type is k between node i and node j, a node k is inserted into the graph and linked as (i, k) and (k, j).

sequential_link: bool, default=True

Whether to link node tokens sequentially. Note that the links are bidirectional.

verbose: int, default=0

Whether to output log information. Set to 1 to output more information.
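
The three edge_strategy options can be illustrated with a small stand-alone sketch. It works on plain tuples rather than GraphData, and the function name apply_edge_strategy, the (src, edge_type, tgt) tuple layout and the attribute keys "weight" and "type" are all assumptions made for this example, not part of the library.

```python
def apply_edge_strategy(tokens, typed_edges, edge_strategy=None):
    """Illustrative only: turn typed edges into (nodes, edges) per strategy.

    tokens: list of node labels; node i is tokens[i].
    typed_edges: list of (src, edge_type, tgt) tuples.
    Returns (nodes, edges), where each edge is (src, tgt, attributes).
    """
    strategy = edge_strategy or "homogeneous"  # None falls back to "homogeneous"
    nodes = list(tokens)
    edges = []
    for src, edge_type, tgt in typed_edges:
        if strategy == "homogeneous":
            # Drop the type and keep a unit-weight edge.
            edges.append((src, tgt, {"weight": 1.0}))
        elif strategy == "heterogeneous":
            # Keep the type on the edge.
            edges.append((src, tgt, {"type": edge_type}))
        elif strategy == "as_node":
            # Insert a new node k for the edge and link (i, k) and (k, j).
            k = len(nodes)
            nodes.append(edge_type)
            edges.append((src, k, {}))
            edges.append((k, tgt, {}))
        else:
            raise ValueError(f"unknown edge_strategy: {edge_strategy!r}")
    return nodes, edges


tokens = ["I", "am", "a", "dog"]
typed_edges = [(3, "nsubj", 0), (3, "det", 2)]

homo_nodes, homo_edges = apply_edge_strategy(tokens, typed_edges)
hetero_nodes, hetero_edges = apply_edge_strategy(tokens, typed_edges, "heterogeneous")
as_node_nodes, as_node_edges = apply_edge_strategy(tokens, typed_edges, "as_node")
```

With "as_node", the node count grows by one per original edge and the edge count doubles, which is worth keeping in mind for long documents.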

Returns

joint_graph: GraphData

The merged graph data-structure.
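
The "tailhead" merge described above can be sketched as follows. This is a minimal illustration on plain edge lists, assuming that the head node of a sub-graph is its first token, that the tail node is its last token, and that every sub-graph is non-empty. The name merge_tailhead and the input layout are hypothetical.

```python
def merge_tailhead(subgraphs):
    """Illustrative only: merge per-sentence graphs into one joint graph.

    subgraphs: list of dicts with "node_num" (int) and "edges"
    (list of (src, tgt) pairs using ids local to the sub-graph).
    Returns (total_node_num, edges) with ids shifted into one global id space.
    """
    merged_edges = []
    offset = 0         # global id of the current sub-graph's first node
    prev_tail = None   # global id of the previous sub-graph's last node
    for sub in subgraphs:
        head = offset
        tail = offset + sub["node_num"] - 1
        # Shift local ids into the global id space.
        merged_edges.extend((s + offset, t + offset) for s, t in sub["edges"])
        # Link the previous tail with this sub-graph's head.
        if prev_tail is not None:
            merged_edges.append((prev_tail, head))
        prev_tail = tail
        offset += sub["node_num"]
    return offset, merged_edges


subgraphs = [
    {"node_num": 3, "edges": [(1, 0), (1, 2)]},
    {"node_num": 2, "edges": [(1, 0)]},
]
total_nodes, joint_edges = merge_tailhead(subgraphs)
```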

class graph4nlp.graph_construction.ConstituencyBasedGraphConstruction(vocab)

Class for constituency graph construction.

…

Attributes

embedding_styles: dict: Specify embedding styles including single_token_item, emb_strategy, num_rnn_layers, bert_model_name and bert_lower_case.
vocab: set, optional: Vocabulary including all words that appear in the graphs.

Methods

topology(raw_text_data, nlp_processor, merge_strategy=None, edge_strategy=None)	Generate the graph structure with an NLP parser such as `CoreNLP`.
_construct_static_graph(parsed_object, sub_sentence_id, edge_strategy=None)	Construct a single static graph from a single sentence, to be called by `topology` function.
_graph_connect(nx_graph_list, merge_strategy=None)	Construct a merged graph from a list of graphs, to be called by `topology` function.
embedding(node_attributes, edge_attributes)	Generate node/edge embeddings from node/edge attributes through an embedding layer.
forward(raw_text_data, nlp_parser)	Generate graph topology and embeddings.

classmethod parsing(raw_text_data, nlp_processor, processor_args)

Parameters

raw_text_data: str
nlp_processor: StanfordCoreNLP
processor_args: JSON config for constituency graph construction

classmethod static_topology(raw_text_data, nlp_processor, processor_args, merge_strategy=None, edge_strategy=None, sequential_link=3, top_down=False, prune=2, verbose=True)

This function generates a graph structure from raw text data.

Parameters

raw_text_data: string

A string used to construct a static graph. It can be composed of multiple strings.

nlp_processor: object

A parser used to parse sentence strings into parse trees, such as a dependency parse tree or a constituency parse tree.

merge_strategy: None or str, option=[None, “tailhead”, “user_define”]

Strategy to merge sub-graphs into one graph.

None: The default option; it behaves as "tailhead".
"tailhead": Link the tail node of sub-graph i with the head node of sub-graph i+1.
"user_define": Left to the user, who can override the method _graph_connect to define a custom merge strategy.

edge_strategy: None or str, option=[None, “homogeneous”, “heterogeneous”, “as_node”]

Strategy to process edges.

None: The default option; it behaves as "homogeneous".
"homogeneous": Drop the edge type information. If node i and node j are linked, an edge with weight 1.0 is added; otherwise there is no edge.
"heterogeneous": Keep the edge type information. An edge will have type information like n_subj. It is not implemented yet.
"as_node": View the edge as a graph node. If there is an edge whose type is k between node i and node j, a node k is inserted into the graph and linked as (i, k) and (k, j). It is not implemented yet.

sequential_link: int, option=[0,1,2,3]

Strategy to add sequential links between word nodes.

0: Do not add sequential links.
1: Add unidirectional links.
2: Add bidirectional links.
3: Do not add sequential links inside each sentence, and add bidirectional links between adjacent sentences.

top_down: bool

If true, edges in constituency tree are from root nodes to leaf nodes. Otherwise, from leaf nodes to root nodes.

prune: int, option=[0,1,2]

Strategies for pruning constituency trees.

0: No pruning.
1: Prune POS nodes.
2: Prune nodes with both in-degree and out-degree of 1.

verbose: bool

A boolean option to decide whether to print out the graph construction process.
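
The four sequential_link options listed above can be sketched on word-node indices alone. This is an illustration under stated assumptions: options 1 and 2 are applied within each sentence only, and for option 3 "adjacent sentences" is taken to mean linking the last word of one sentence with the first word of the next. Neither detail is confirmed by the text above, and the function name sequential_links is hypothetical.

```python
def sequential_links(sentence_lengths, sequential_link=3):
    """Illustrative only: list the (src, tgt) word-node links for each option.

    sentence_lengths: number of word nodes in each sentence, in order.
    Word nodes are numbered consecutively across sentences, starting at 0.
    """
    if sequential_link not in (0, 1, 2, 3):
        raise ValueError("sequential_link must be 0, 1, 2 or 3")
    links = []
    offset = 0
    prev_last = None  # last word node of the previous sentence
    for length in sentence_lengths:
        first, last = offset, offset + length - 1
        if sequential_link in (1, 2):
            for i in range(first, last):
                links.append((i, i + 1))          # forward link
                if sequential_link == 2:
                    links.append((i + 1, i))      # backward link
        if sequential_link == 3 and prev_last is not None:
            # Only connect neighbouring sentences, in both directions.
            links.append((prev_last, first))
            links.append((first, prev_last))
        prev_last = last
        offset += length
    return links


lengths = [3, 2]  # two sentences with 3 and 2 word nodes
links_none = sequential_links(lengths, 0)
links_uni = sequential_links(lengths, 1)
links_bi = sequential_links(lengths, 2)
links_cross = sequential_links(lengths, 3)
```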

Returns

GraphData: A customized graph data structure
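
Pruning option 2 above removes nodes whose in-degree and out-degree are both 1. A minimal sketch of that idea on a plain edge list follows. It assumes that the removed node's single parent is reconnected to its single child, which the description above does not spell out, and the name prune_unary_nodes is hypothetical.

```python
def prune_unary_nodes(edges):
    """Illustrative only: remove nodes with in-degree 1 and out-degree 1.

    edges: list of (src, tgt) pairs describing a tree.
    Each removed node is bypassed by an edge from its parent to its child.
    """
    edges = list(edges)
    while True:
        in_edges, out_edges = {}, {}
        for src, tgt in edges:
            out_edges.setdefault(src, []).append((src, tgt))
            in_edges.setdefault(tgt, []).append((src, tgt))
        # Find one node that has exactly one incoming and one outgoing edge.
        target = next(
            (
                node
                for node in in_edges
                if len(in_edges[node]) == 1 and len(out_edges.get(node, [])) == 1
            ),
            None,
        )
        if target is None:
            return edges
        parent = in_edges[target][0][0]
        child = out_edges[target][0][1]
        # Drop both edges touching the node, then bypass it.
        edges = [edge for edge in edges if target not in edge]
        edges.append((parent, child))


# Top-down constituency tree for "I run" (labels chosen for illustration).
tree = [
    ("S", "NP"), ("S", "VP"),
    ("NP", "PRP"), ("PRP", "I"),
    ("VP", "VBP"), ("VBP", "run"),
]
pruned = prune_unary_nodes(tree)
```

In this example every intermediate node forms a unary chain, so only the root and the two words remain.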

class graph4nlp.graph_construction.IEBasedGraphConstruction(vocab)

Information Extraction based graph construction class

Parameters

embedding_style: dict: Specify embedding styles including single_token_item, emb_strategy, num_rnn_layers, bert_model_name and bert_lower_case.
vocab: VocabModel: Vocabulary including all words that appear in the graphs.

Methods

add_vocab(g)

Add the node tokens that appear in graph g to the vocabulary.

parsing(all_sent_triples_list, edge_strategy)

Build the intermediate IE graph dict from the extracted triples.

static_topology(raw_text_data, …[, verbose])

Graph building method.

forward

add_vocab(g)

Add the node tokens that appear in graph g to the vocabulary.

Parameters

g: GraphData: Graph data-structure.

classmethod parsing(all_sent_triples_list, edge_strategy)

Parameters

all_sent_triples_list: list
edge_strategy: str

Returns

parsed_results: dict

parsed_results is an intermediate dict that contains all the information of the constructed IE graph for a piece of raw text input.

parsed_results[‘graph_content’] is a list of dict.

Each dict in parsed_results[‘graph_content’] contains information about a triple (src_ent, rel, tgt_ent).

parsed_results[‘graph_nodes’] contains all nodes in the KG graph.

parsed_results[‘node_num’] is the number of nodes in the KG graph.

classmethod static_topology(raw_text_data, nlp_processor, processor_args, merge_strategy, edge_strategy, verbose=True)

Graph building method.

Parameters

raw_text_data: str

Raw text data; it can contain multiple sentences.

nlp_processor: StanfordCoreNLP

NLP parsing tools

merge_strategy: None or str, option=[None, “global”, “user_define”]

Strategy to merge sub-graphs into one graph.

None: Do not add additional nodes and edges.
"global": All subjects in the extracted triples are connected by a “GLOBAL_NODE” using a “global” edge.
"user_define": Left to the user, who can override this method to define a custom merge strategy.

edge_strategy: None or str, option=[None, “as_node”]

Strategy to process edges.

None: The default option. Edge information will be preserved in GraphData.edge_attributes.
"as_node": View the edge as a graph node. If there is an edge whose type is k between node i and node j, a node k is inserted into the graph and linked as (i, k) and (k, j). The type of the original nodes will be set to `ent_node`, while the type of the edge nodes is `edge_node`.

Returns

graph: GraphData: The merged graph data-structure.
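
The "global" merge strategy described above can be sketched on plain (subject, relation, object) triples. This is an illustration only: the direction of the added edge (from the global node to each subject) is an assumption, and merge_global and the sample triples are made up for the example.

```python
def merge_global(triples):
    """Illustrative only: connect every distinct subject to a global node.

    triples: list of (src_ent, rel, tgt_ent) tuples.
    Returns the original triples followed by one "global" edge per subject.
    """
    merged = list(triples)
    seen_subjects = set()
    for subject, _relation, _object in triples:
        if subject not in seen_subjects:
            seen_subjects.add(subject)
            # Assumption: the edge points from GLOBAL_NODE to the subject.
            merged.append(("GLOBAL_NODE", "global", subject))
    return merged


triples = [
    ("Alice", "works_at", "Acme"),
    ("Acme", "based_in", "Paris"),
    ("Alice", "lives_in", "Paris"),
]
merged_triples = merge_global(triples)
```

Without such a step, triples extracted from different sentences may form disconnected components; the global node ties them into one graph.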

class graph4nlp.graph_construction.NodeEmbeddingBasedGraphConstruction(**kwargs)

Class for node embedding based dynamic graph construction.

Methods

`add_module`(name, module)	Adds a child module to the current module.
`apply`(fn)	Applies `fn` recursively to every submodule (as returned by `.children()`) as well as self.
`bfloat16`()	Casts all floating point parameters and buffers to `bfloat16` datatype.
`buffers`([recurse])	Returns an iterator over module buffers.
`children`()	Returns an iterator over immediate children modules.
`compute_graph_regularization`(adj, node_feat)	Graph regularization loss.
`compute_similarity_metric`(node_emb[, node_mask])	Compute similarity metric.
`cpu`()	Moves all model parameters and buffers to the CPU.
`cuda`([device])	Moves all model parameters and buffers to the GPU.
`double`()	Casts all floating point parameters and buffers to `double` datatype.
`dynamic_topology`(graph)	Compute graph topology.
`eval`()	Sets the module in evaluation mode.
`extra_repr`()	Set the extra representation of the module
`float`()	Casts all floating point parameters and buffers to `float` datatype.
`forward`(*input)	Defines the computation performed at every call.
`get_buffer`(target)	Returns the buffer given by `target` if it exists, otherwise throws an error.
`get_extra_state`()	Returns any extra state to include in the module’s state_dict.
`get_parameter`(target)	Returns the parameter given by `target` if it exists, otherwise throws an error.
`get_submodule`(target)	Returns the submodule given by `target` if it exists, otherwise throws an error.
`half`()	Casts all floating point parameters and buffers to `half` datatype.
`init_topology`(raw_text_data[, lower_case, …])	Convert raw text data to the initial node set graph (i.e., no edge information).
`load_state_dict`(state_dict[, strict])	Copies parameters and buffers from `state_dict` into this module and its descendants.
`modules`()	Returns an iterator over all modules in the network.
`named_buffers`([prefix, recurse])	Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
`named_children`()	Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
`named_modules`([memo, prefix, remove_duplicate])	Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
`named_parameters`([prefix, recurse])	Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
`parameters`([recurse])	Returns an iterator over module parameters.
`register_backward_hook`(hook)	Registers a backward hook on the module.
`register_buffer`(name, tensor[, persistent])	Adds a buffer to the module.
`register_forward_hook`(hook)	Registers a forward hook on the module.
`register_forward_pre_hook`(hook)	Registers a forward pre-hook on the module.
`register_full_backward_hook`(hook)	Registers a backward hook on the module.
`register_parameter`(name, param)	Adds a parameter to the module.
`requires_grad_`([requires_grad])	Change if autograd should record operations on parameters in this module.
`set_extra_state`(state)	This function is called from `load_state_dict()` to handle any extra state found within the state_dict.
`share_memory`()	See `torch.Tensor.share_memory_()`
`sparsify_graph`(adj)	Return a sparsified graph of the input graph.
`state_dict`([destination, prefix, keep_vars])	Returns a dictionary containing a whole state of the module.
`to`(*args, **kwargs)	Moves and/or casts the parameters and buffers.
`to_empty`(*, device)	Moves the parameters and buffers to the specified device without copying storage.
`train`([mode])	Sets the module in training mode.
`type`(dst_type)	Casts all parameters and buffers to `dst_type`.
`xpu`([device])	Moves all model parameters and buffers to the XPU.
`zero_grad`([set_to_none])	Sets gradients of all model parameters to zero.

__call__

dynamic_topology(graph)

Compute graph topology.

Parameters

graph: GraphData: The input graph data.

Returns

GraphData: The constructed graph.
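
The general idea behind an embedding-based dynamic topology (score every node pair by similarity, then sparsify the dense result) can be sketched in plain Python. This is not the library's implementation: cosine similarity and epsilon thresholding are choices assumed for this example, and the function names are hypothetical.

```python
import math


def cosine_similarity_matrix(node_emb):
    """Illustrative only: pairwise cosine similarity between node embeddings."""
    # Guard against zero vectors by falling back to a norm of 1.0.
    norms = [math.sqrt(sum(x * x for x in row)) or 1.0 for row in node_emb]
    return [
        [
            sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
            for v, norm_v in zip(node_emb, norms)
        ]
        for u, norm_u in zip(node_emb, norms)
    ]


def sparsify_by_threshold(adj, epsilon):
    """Illustrative only: keep entries above epsilon and zero out the rest."""
    return [[w if w > epsilon else 0.0 for w in row] for row in adj]


node_emb = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
dense_adj = cosine_similarity_matrix(node_emb)
sparse_adj = sparsify_by_threshold(dense_adj, epsilon=0.5)
strict_adj = sparsify_by_threshold(dense_adj, epsilon=0.8)
```

A higher threshold yields a sparser graph; here nodes 0 and 1 stay connected at epsilon=0.5 but not at epsilon=0.8.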

classmethod init_topology(raw_text_data, lower_case=True, tokenizer=<function word_tokenize>)

Convert raw text data to the initial node set graph (i.e., no edge information).

Parameters

raw_text_data: str or list/tuple of str: The raw text data. When a list/tuple of tokens is provided, no tokenization will be conducted and each token is a node; otherwise, tokenization will be conducted on the input string to get a list of tokens.
lower_case: boolean: Specify whether to lower case the input text, default: True.
tokenizer: callable, optional: The tokenization function.

Returns

GraphData: The constructed graph.
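
The behaviour described above (tokenize a string, or take a pre-tokenized list as-is, and create one node per token with no edges) can be sketched without the library. In this illustration a plain dict stands in for GraphData, str.split stands in for the default tokenizer, and applying lower-casing to pre-tokenized input is an assumption. The name init_node_set is hypothetical.

```python
def init_node_set(raw_text_data, lower_case=True, tokenizer=str.split):
    """Illustrative only: build an edge-less node set from raw text or tokens."""
    if isinstance(raw_text_data, str):
        # A raw string is tokenized first.
        tokens = tokenizer(raw_text_data)
    else:
        # A list/tuple of tokens is used as-is, one node per token.
        tokens = list(raw_text_data)
    if lower_case:
        tokens = [token.lower() for token in tokens]
    return {"nodes": tokens, "edges": []}


from_string = init_node_set("I am a Dog")
from_tokens = init_node_set(("I", "Am"), lower_case=False)
```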

class graph4nlp.graph_construction.NodeEmbeddingBasedRefinedGraphConstruction(alpha_fusion, **kwargs)

Class for node embedding based refined dynamic graph construction.

Parameters

alpha_fusion: float: Specify the fusion value for combining initial and learned adjacency matrices.
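
One common way to combine two adjacency matrices with a single fusion value is a convex combination. The sketch below assumes that form and assumes that alpha_fusion weights the initial matrix; neither is stated above, so treat it as an illustration of the idea rather than the class's actual formula. The function name fuse_adjacency is hypothetical.

```python
def fuse_adjacency(init_adj, learned_adj, alpha_fusion):
    """Illustrative only: alpha * initial + (1 - alpha) * learned, element-wise."""
    if not 0.0 <= alpha_fusion <= 1.0:
        raise ValueError("alpha_fusion is assumed to lie in [0, 1]")
    return [
        [
            alpha_fusion * init_w + (1.0 - alpha_fusion) * learned_w
            for init_w, learned_w in zip(init_row, learned_row)
        ]
        for init_row, learned_row in zip(init_adj, learned_adj)
    ]


init_adj = [[0.0, 1.0], [1.0, 0.0]]
learned_adj = [[0.5, 0.5], [0.5, 0.5]]
fused_adj = fuse_adjacency(init_adj, learned_adj, alpha_fusion=0.5)
```

Under this assumption, alpha_fusion=1.0 keeps only the initial graph and alpha_fusion=0.0 keeps only the learned one.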

Methods

`add_module`(name, module)	Adds a child module to the current module.
`apply`(fn)	Applies `fn` recursively to every submodule (as returned by `.children()`) as well as self.
`bfloat16`()	Casts all floating point parameters and buffers to `bfloat16` datatype.
`buffers`([recurse])	Returns an iterator over module buffers.
`children`()	Returns an iterator over immediate children modules.
`compute_graph_regularization`(adj, node_feat)	Graph regularization loss.
`compute_similarity_metric`(node_emb[, node_mask])	Compute similarity metric.
`cpu`()	Moves all model parameters and buffers to the CPU.
`cuda`([device])	Moves all model parameters and buffers to the GPU.
`double`()	Casts all floating point parameters and buffers to `double` datatype.
`dynamic_topology`(graph)	Compute graph topology.
`eval`()	Sets the module in evaluation mode.
`extra_repr`()	Set the extra representation of the module
`float`()	Casts all floating point parameters and buffers to `float` datatype.
`forward`(*input)	Defines the computation performed at every call.
`get_buffer`(target)	Returns the buffer given by `target` if it exists, otherwise throws an error.
`get_extra_state`()	Returns any extra state to include in the module’s state_dict.
`get_parameter`(target)	Returns the parameter given by `target` if it exists, otherwise throws an error.
`get_submodule`(target)	Returns the submodule given by `target` if it exists, otherwise throws an error.
`half`()	Casts all floating point parameters and buffers to `half` datatype.
`init_topology`(raw_text_data[, lower_case, …])	Convert raw text data to the initial graph.
`load_state_dict`(state_dict[, strict])	Copies parameters and buffers from `state_dict` into this module and its descendants.
`modules`()	Returns an iterator over all modules in the network.
`named_buffers`([prefix, recurse])	Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
`named_children`()	Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
`named_modules`([memo, prefix, remove_duplicate])	Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
`named_parameters`([prefix, recurse])	Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
`parameters`([recurse])	Returns an iterator over module parameters.
`register_backward_hook`(hook)	Registers a backward hook on the module.
`register_buffer`(name, tensor[, persistent])	Adds a buffer to the module.
`register_forward_hook`(hook)	Registers a forward hook on the module.
`register_forward_pre_hook`(hook)	Registers a forward pre-hook on the module.
`register_full_backward_hook`(hook)	Registers a backward hook on the module.
`register_parameter`(name, param)	Adds a parameter to the module.
`requires_grad_`([requires_grad])	Change if autograd should record operations on parameters in this module.
`set_extra_state`(state)	This function is called from `load_state_dict()` to handle any extra state found within the state_dict.
`share_memory`()	See `torch.Tensor.share_memory_()`
`sparsify_graph`(adj)	Return a sparsified graph of the input graph.
`state_dict`([destination, prefix, keep_vars])	Returns a dictionary containing a whole state of the module.
`to`(*args, **kwargs)	Moves and/or casts the parameters and buffers.
`to_empty`(*, device)	Moves the parameters and buffers to the specified device without copying storage.
`train`([mode])	Sets the module in training mode.
`type`(dst_type)	Casts all parameters and buffers to `dst_type`.
`xpu`([device])	Moves all model parameters and buffers to the XPU.
`zero_grad`([set_to_none])	Sets gradients of all model parameters to zero.

__call__

dynamic_topology(graph)

Compute graph topology.

Parameters

graph: GraphData: The input graph data.

Returns

GraphData: The constructed graph.

classmethod init_topology(raw_text_data, lower_case=True, tokenizer=<function word_tokenize>, nlp_processor=None, processor_args=None, merge_strategy=None, edge_strategy=None, verbose=False, dynamic_init_topology_builder=None, dynamic_init_topology_aux_args=None)

Convert raw text data to the initial graph.

Parameters

raw_text_data: str or list/tuple of str: The raw text data. When a list/tuple of tokens is provided, no tokenization will be conducted and each token is a node (used for line graph builder); otherwise, tokenization will be conducted on the input string to get a list of tokens.
lower_case: boolean: Specify whether to lower case the input text, default: True.
tokenizer: callable, optional: The tokenization function, default: nltk.tokenize.word_tokenize.
nlp_processor: StanfordCoreNLP, optional: The NLP processor, default: None.
processor_args: dict, optional: The NLP processor arguments, default: None.
merge_strategy: str: Strategy to merge sub-graphs into one graph, depends on the specific dynamic_init_topology_builder, default: None.
edge_strategy: str: Strategy to process edges, depends on the specific dynamic_init_topology_builder, default: None.
verbose: boolean: Verbose flag, default: False.
dynamic_init_topology_builder: class, optional: The initial graph topology builder, default: None.
dynamic_init_topology_aux_args: dict, optional: The auxiliary args for dynamic_init_topology_builder.topology, default: None.

Returns

GraphData: The constructed graph.