graph4nlp.graph_construction
Graph Constructor
class graph4nlp.graph_construction.DependencyBasedGraphConstruction(vocab)
Dependency-parsing-tree based graph construction class.
- Parameters
- vocab: VocabModel
Vocabulary including all words that appear in the graphs.
Methods
add_vocab(g): Add node tokens that appear in graph g to the vocabulary.
parsing(raw_text_data, nlp_processor, …)
static_topology(raw_text_data, …[, …]): Graph building method.
forward
add_vocab(g)
Add node tokens that appear in graph g to the vocabulary.
- Parameters
- g: GraphData
Graph data-structure.
classmethod parsing(raw_text_data, nlp_processor, processor_args)
- Parameters
- raw_text_data: str
- nlp_processor: StanfordCoreNLP
- processor_args: dict
- Returns
- parsed_results: list[dict]
Each sentence is a dict, and all sentences are packed into a list. Each dict has the following keys:
- "node_num": int
The number of nodes.
- "node_content": list[dict]
The list of node information. Each node is a dict with the following keys:
  - "token": str
  The word token.
  - "position_id": int
  The word's position id in the original sentence, e.g. for "I am a dog." the position ids are 0, 1, 2, 3.
  - "id": int
  The node token's id, which will be used in GraphData.
  - "sentence_id": int
  The sentence's id in the whole text.
- "graph_content": list[dict]
The list of edge information. Each edge is a dict with the following keys:
  - "edge_type": str
  The edge type token, e.g. "nsubj".
  - "src": int
  The source node id.
  - "tgt": int
  The target node id.
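To make the nesting concrete, here is a hand-written value shaped like the return structure described above, for the single sentence "I am a dog." It is only an illustration under stated assumptions: the dependency edges and their directions are made up, placing "sentence_id" on each node is this sketch's reading of the listing, and `check_sentence` is a hypothetical helper rather than part of graph4nlp.

```python
# Hand-written illustration of the documented structure; this is not parser output.
# The edges and their directions are invented for the example.
parsed_results = [
    {
        "node_num": 4,
        "node_content": [
            {"token": "I", "position_id": 0, "id": 0, "sentence_id": 0},
            {"token": "am", "position_id": 1, "id": 1, "sentence_id": 0},
            {"token": "a", "position_id": 2, "id": 2, "sentence_id": 0},
            {"token": "dog", "position_id": 3, "id": 3, "sentence_id": 0},
        ],
        "graph_content": [
            {"edge_type": "nsubj", "src": 3, "tgt": 0},
            {"edge_type": "cop", "src": 3, "tgt": 1},
            {"edge_type": "det", "src": 3, "tgt": 2},
        ],
    }
]


def check_sentence(sentence):
    """Hypothetical helper: node_num matches the nodes and every edge endpoint exists."""
    ids = {node["id"] for node in sentence["node_content"]}
    if sentence["node_num"] != len(ids):
        return False
    return all(
        edge["src"] in ids and edge["tgt"] in ids
        for edge in sentence["graph_content"]
    )


all_valid = all(check_sentence(s) for s in parsed_results)
```

A consistency check of this kind can be handy before handing the parsed result to a graph builder, since a dangling edge endpoint would otherwise only surface later.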
classmethod static_topology(raw_text_data, nlp_processor, processor_args, merge_strategy, edge_strategy, sequential_link=True, verbose=0)
Graph building method.
- Parameters
- raw_text_data: str or list[list]
Raw text data; it can contain multiple sentences. When it is of type str, it is the raw text. When it is of type list[list], it is the tokenized token lists.
- nlp_processor: StanfordCoreNLP
NLP parsing tool.
- processor_args: dict
The configuration dict for StanfordCoreNLP.annotate.
- merge_strategy: None or str, option=[None, "tailhead", "user_define"]
Strategy to merge sub-graphs into one graph.
  None: The default option. It behaves the same as "tailhead".
  "tailhead": Link sub-graph i's tail node with sub-graph i+1's head node.
  "user_define": Left to the user. Users can override this method to define their own merge strategy.
- edge_strategy: None or str, option=[None, "homogeneous", "heterogeneous", "as_node"]
Strategy to process edges.
  None: The default option. It behaves the same as "homogeneous".
  "homogeneous": Drop the edge type information. If there is a link between node i and node j, an edge with weight 1.0 is added; otherwise there is no edge.
  "heterogeneous": Keep the edge type information. An edge carries type information such as n_subj.
  "as_node": Treat the edge as a graph node. If there is an edge of type k between node i and node j, a node k is inserted into the graph and the pairs (i, k) and (k, j) are linked.
- sequential_link: bool, default=True
Whether to link node tokens sequentially (note that the links are bidirectional).
- verbose: int, default=0
Whether to output log information. Set to 1 to output more information.
- Returns
- joint_graph: GraphData
The merged graph data-structure.
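The "tailhead" merge strategy can be pictured with a small sketch over plain edge lists. This is not the library's implementation: `merge_tailhead` and the dict layout are hypothetical, and the sketch assumes that a sub-graph's head node is its first node and its tail node is its last node.

```python
def merge_tailhead(sub_graphs):
    """Merge per-sentence graphs into one node count and one edge list.

    Each sub-graph is a dict with "node_num" (int) and "edges", a list of
    (src, tgt) pairs using local ids 0..node_num-1.
    Assumption: head node = first node, tail node = last node of a sub-graph.
    """
    merged_edges = []
    offset = 0          # number of nodes already placed in the merged graph
    prev_tail = None    # global id of the previous sub-graph's tail node
    for g in sub_graphs:
        # Shift local ids so that node ids stay unique after merging.
        merged_edges.extend((s + offset, t + offset) for s, t in g["edges"])
        head = offset
        tail = offset + g["node_num"] - 1
        if prev_tail is not None:
            # Link sub-graph i's tail with sub-graph i+1's head.
            merged_edges.append((prev_tail, head))
        prev_tail = tail
        offset += g["node_num"]
    return offset, merged_edges


g0 = {"node_num": 3, "edges": [(1, 0), (1, 2)]}
g1 = {"node_num": 2, "edges": [(1, 0)]}
node_num, edges = merge_tailhead([g0, g1])
```

Here the only edge added by the merge is `(2, 3)`, which joins the last node of the first sentence to the first node of the second.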
class graph4nlp.graph_construction.ConstituencyBasedGraphConstruction(vocab)
Class for constituency graph construction.
…
- Attributes
- embedding_styles: dict
Specify embedding styles including single_token_item, emb_strategy, num_rnn_layers, bert_model_name and bert_lower_case.
- vocab: set, optional
Vocabulary including all words that appear in the graphs.
Methods
topology(raw_text_data, nlp_processor, merge_strategy=None, edge_strategy=None)
Generate the graph structure with an NLP parser such as CoreNLP.
_construct_static_graph(parsed_object, sub_sentence_id, edge_strategy=None)
Construct a single static graph from a single sentence; called by the topology function.
_graph_connect(nx_graph_list, merge_strategy=None)
Construct a merged graph from a list of graphs; called by the topology function.
embedding(node_attributes, edge_attributes)
Generate node/edge embeddings from node/edge attributes through an embedding layer.
forward(raw_text_data, nlp_parser)
Generate graph topology and embeddings.
classmethod parsing(raw_text_data, nlp_processor, processor_args)
- Parameters
- raw_text_data: str
- nlp_processor: StanfordCoreNLP
- processor_args: JSON config for constituency graph construction
classmethod static_topology(raw_text_data, nlp_processor, processor_args, merge_strategy=None, edge_strategy=None, sequential_link=3, top_down=False, prune=2, verbose=True)
Generate a graph structure from raw text data.
- Parameters
- raw_text_data: string
A string used to construct a static graph; it can be composed of multiple strings.
- nlp_processor: object
A parser used to parse a sentence string into a parsing tree, such as a dependency parsing tree or a constituency parsing tree.
- merge_strategy: None or str, option=[None, "tailhead", "user_define"]
Strategy to merge sub-graphs into one graph.
  None: The default option. It behaves the same as "tailhead".
  "tailhead": Link sub-graph i's tail node with sub-graph i+1's head node.
  "user_define": Left to the user. Users can override the method _graph_connect to define their own merge strategy.
- edge_strategy: None or str, option=[None, "homogeneous", "heterogeneous", "as_node"]
Strategy to process edges.
  None: The default option. It behaves the same as "homogeneous".
  "homogeneous": Drop the edge type information. If there is a link between node i and node j, an edge with weight 1.0 is added; otherwise there is no edge.
  "heterogeneous": Keep the edge type information. An edge carries type information such as n_subj. It is not implemented yet.
  "as_node": Treat the edge as a graph node. If there is an edge of type k between node i and node j, a node k is inserted into the graph and the pairs (i, k) and (k, j) are linked. It is not implemented yet.
- sequential_link: int, option=[0,1,2,3]
Strategy to add sequential links between word nodes.
  0: Do not add sequential links.
  1: Add unidirectional links.
  2: Add bidirectional links.
  3: Do not add sequential links inside each sentence, and add bidirectional links between adjacent sentences.
- top_down: bool
If true, edges in the constituency tree point from root nodes to leaf nodes; otherwise, from leaf nodes to root nodes.
- prune: int, option=[0,1,2]
Strategies for pruning constituency trees.
  0: No pruning.
  1: Prune POS nodes.
  2: Prune nodes with both in-degree and out-degree of 1.
- verbose: bool
A boolean option to decide whether to print out the graph construction process.
- Returns
- GraphData
A customized graph data structure
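The four sequential_link options can be sketched on plain lists of word-node ids. This is an illustration only, not the library's code: `sequential_links` is a hypothetical helper, options 1 and 2 are assumed to chain consecutive words across the whole text, and option 3 is assumed to join the last word of one sentence to the first word of the next.

```python
def sequential_links(sentences, option):
    """Return directed (src, tgt) edges between word nodes.

    `sentences` is a list of lists of word-node ids in reading order.
    """
    if option not in (0, 1, 2, 3):
        raise ValueError("option must be 0, 1, 2 or 3")
    edges = []
    if option in (1, 2):
        # Assumption: consecutive words are chained over the whole text.
        words = [w for sent in sentences for w in sent]
        for a, b in zip(words, words[1:]):
            edges.append((a, b))
            if option == 2:
                edges.append((b, a))  # reverse direction for bidirectional links
    elif option == 3:
        # Assumption: adjacent sentences are joined via last word / first word.
        non_empty = [s for s in sentences if s]
        for prev, nxt in zip(non_empty, non_empty[1:]):
            edges.append((prev[-1], nxt[0]))
            edges.append((nxt[0], prev[-1]))
    return edges


sents = [[0, 1, 2], [3, 4]]
links_none = sequential_links(sents, 0)
links_uni = sequential_links(sents, 1)
links_bi = sequential_links(sents, 2)
links_between = sequential_links(sents, 3)
```

With the default value 3, the two example sentences are connected only by the pair of edges between nodes 2 and 3, leaving the within-sentence structure to the constituency tree.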
class graph4nlp.graph_construction.IEBasedGraphConstruction(vocab)
Information Extraction based graph construction class.
- Parameters
- embedding_style: dict
Specify embedding styles including single_token_item, emb_strategy, num_rnn_layers, bert_model_name and bert_lower_case.
- vocab: VocabModel
Vocabulary including all words that appear in the graphs.
Methods
add_vocab(g): Add node tokens that appear in graph g to the vocabulary.
parsing(all_sent_triples_list, edge_strategy)
static_topology(raw_text_data, …[, verbose]): Graph building method.
forward
add_vocab(g)
Add node tokens that appear in graph g to the vocabulary.
- Parameters
- g: GraphData
Graph data-structure.
classmethod parsing(all_sent_triples_list, edge_strategy)
- Parameters
- all_sent_triples_list: list
- edge_strategy: str
- Returns
- parsed_results: dict
parsed_results is an intermediate dict that contains all the information of the constructed IE graph for a piece of raw text input.
parsed_results[‘graph_content’] is a list of dict.
Each dict in parsed_results[‘graph_content’] contains information about a triple (src_ent, rel, tgt_ent).
parsed_results[‘graph_nodes’] contains all nodes in the KG graph.
parsed_results[‘node_num’] is the number of nodes in the KG graph.
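As a rough picture of how a list of triples could map onto the three keys described above, consider the following sketch. It is not the library's code: `triples_to_parsed_results` is a hypothetical helper, the inner key names of each graph_content entry ("src", "rel", "tgt") are assumptions, and storing graph_nodes as a list of entity strings is also an assumption.

```python
def triples_to_parsed_results(triples):
    """Build a dict with the keys graph_content, graph_nodes and node_num.

    `triples` is a list of (src_ent, rel, tgt_ent) string tuples.
    """
    node_ids = {}       # entity string -> node id, in first-seen order
    graph_content = []
    for src_ent, rel, tgt_ent in triples:
        for ent in (src_ent, tgt_ent):
            # Give each distinct entity exactly one node id.
            node_ids.setdefault(ent, len(node_ids))
        # Inner key names are hypothetical; the reference only states that
        # each dict carries the information of one triple.
        graph_content.append({"src": src_ent, "rel": rel, "tgt": tgt_ent})
    return {
        "graph_content": graph_content,
        "graph_nodes": list(node_ids),   # dicts keep insertion order
        "node_num": len(node_ids),
    }


triples = [("Alice", "works at", "Acme"), ("Acme", "is in", "Paris")]
parsed = triples_to_parsed_results(triples)
```

Because "Acme" occurs in both triples, it is counted once, so the example yields three nodes for two triples.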
classmethod static_topology(raw_text_data, nlp_processor, processor_args, merge_strategy, edge_strategy, verbose=True)
Graph building method.
- Parameters
- raw_text_data: str
Raw text data; it can contain multiple sentences.
- nlp_processor: StanfordCoreNLP
NLP parsing tool.
- merge_strategy: None or str, option=[None, "global", "user_define"]
Strategy to merge sub-graphs into one graph.
  None: Do not add additional nodes and edges.
  "global": All subjects in the extracted triples are connected by a "GLOBAL_NODE" using a "global" edge.
  "user_define": Left to the user. Users can override this method to define their own merge strategy.
- edge_strategy: None or str, option=[None, "as_node"]
Strategy to process edges.
  None: The default option. Edge information is preserved in GraphData.edge_attributes.
  "as_node": Treat the edge as a graph node. If there is an edge of type k between node i and node j, a node k is inserted into the graph and the pairs (i, k) and (k, j) are linked. The type of the original nodes is set to ent_node, while the type of the edge nodes is edge_node.
- Returns
- graph: GraphData
The merged graph data-structure.
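The "as_node" edge strategy can be sketched on plain Python data. This is a minimal illustration, not the library's implementation: `edges_as_nodes` and the node dict layout are hypothetical, and the sketch assumes that every edge occurrence gets its own new node rather than sharing one node per edge type.

```python
def edges_as_nodes(entities, typed_edges):
    """Replace each typed edge (i, k, j) by a node k linked as i -> k -> j.

    `entities` is a list of entity names; `typed_edges` is a list of
    (src_id, edge_type, tgt_id) tuples over indices into `entities`.
    """
    nodes = [{"name": name, "type": "ent_node"} for name in entities]
    edges = []
    for i, k, j in typed_edges:
        k_id = len(nodes)  # id of the node that stands in for this edge
        nodes.append({"name": k, "type": "edge_node"})
        edges.append((i, k_id))
        edges.append((k_id, j))
    return nodes, edges


nodes, edges = edges_as_nodes(["Alice", "Acme"], [(0, "works at", 1)])
```

After the transformation the graph has no typed edges left; the relation "works at" lives in a node of type edge_node sitting between the two entity nodes.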
class graph4nlp.graph_construction.NodeEmbeddingBasedGraphConstruction(**kwargs)
Class for node embedding based dynamic graph construction.
Methods
add_module(name, module): Adds a child module to the current module.
apply(fn): Applies fn recursively to every submodule (as returned by .children()) as well as self.
bfloat16(): Casts all floating point parameters and buffers to bfloat16 datatype.
buffers([recurse]): Returns an iterator over module buffers.
children(): Returns an iterator over immediate children modules.
compute_graph_regularization(adj, node_feat): Graph regularization loss.
compute_similarity_metric(node_emb[, node_mask]): Compute similarity metric.
cpu(): Moves all model parameters and buffers to the CPU.
cuda([device]): Moves all model parameters and buffers to the GPU.
double(): Casts all floating point parameters and buffers to double datatype.
dynamic_topology(graph): Compute graph topology.
eval(): Sets the module in evaluation mode.
extra_repr(): Set the extra representation of the module.
float(): Casts all floating point parameters and buffers to float datatype.
forward(*input): Defines the computation performed at every call.
get_buffer(target): Returns the buffer given by target if it exists, otherwise throws an error.
get_extra_state(): Returns any extra state to include in the module’s state_dict.
get_parameter(target): Returns the parameter given by target if it exists, otherwise throws an error.
get_submodule(target): Returns the submodule given by target if it exists, otherwise throws an error.
half(): Casts all floating point parameters and buffers to half datatype.
init_topology(raw_text_data[, lower_case, …]): Convert raw text data to the initial node set graph (i.e., no edge information).
load_state_dict(state_dict[, strict]): Copies parameters and buffers from state_dict into this module and its descendants.
modules(): Returns an iterator over all modules in the network.
named_buffers([prefix, recurse]): Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
named_children(): Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
named_modules([memo, prefix, remove_duplicate]): Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
named_parameters([prefix, recurse]): Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
parameters([recurse]): Returns an iterator over module parameters.
register_backward_hook(hook): Registers a backward hook on the module.
register_buffer(name, tensor[, persistent]): Adds a buffer to the module.
register_forward_hook(hook): Registers a forward hook on the module.
register_forward_pre_hook(hook): Registers a forward pre-hook on the module.
register_full_backward_hook(hook): Registers a backward hook on the module.
register_parameter(name, param): Adds a parameter to the module.
requires_grad_([requires_grad]): Change if autograd should record operations on parameters in this module.
set_extra_state(state): This function is called from load_state_dict() to handle any extra state found within the state_dict.
share_memory(): See torch.Tensor.share_memory_().
sparsify_graph(adj): Return a sparsified graph of the input graph.
state_dict([destination, prefix, keep_vars]): Returns a dictionary containing a whole state of the module.
to(*args, **kwargs): Moves and/or casts the parameters and buffers.
to_empty(*, device): Moves the parameters and buffers to the specified device without copying storage.
train([mode]): Sets the module in training mode.
type(dst_type): Casts all parameters and buffers to dst_type.
xpu([device]): Moves all model parameters and buffers to the XPU.
zero_grad([set_to_none]): Sets gradients of all model parameters to zero.
__call__
dynamic_topology(graph)
Compute graph topology.
- Parameters
- graph: GraphData
The input graph data.
The input graph data.
- Returns
- GraphData
The constructed graph.
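The reference does not spell out how the topology is computed from node embeddings. As a generic sketch of the idea, one can score every node pair with a similarity function and then drop weak links. The code below assumes cosine similarity and a fixed threshold; both choices, and the function names `cosine_similarity_matrix` and `sparsify_by_threshold`, are assumptions for illustration and do not describe the class's actual metric or sparsification.

```python
import math


def cosine_similarity_matrix(node_emb):
    """Pairwise cosine similarity between node embedding vectors."""
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in node_emb]
    n = len(node_emb)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(node_emb[i], node_emb[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def sparsify_by_threshold(adj, epsilon):
    """Keep only the weights strictly greater than epsilon; zero out the rest."""
    return [[w if w > epsilon else 0.0 for w in row] for row in adj]


# Three nodes with 2-dimensional embeddings: the first two are identical.
emb = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adj = sparsify_by_threshold(cosine_similarity_matrix(emb), epsilon=0.5)
```

In this toy input the first two nodes end up connected to each other, while the third node keeps only its self-similarity.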
classmethod init_topology(raw_text_data, lower_case=True, tokenizer=<function word_tokenize>)
Convert raw text data to the initial node set graph (i.e., no edge information).
- Parameters
- raw_text_data: str or list/tuple of str
The raw text data. When a list/tuple of tokens is provided, no tokenization is conducted and each token is a node; otherwise, the input string is tokenized to get a list of tokens.
- lower_case: boolean
Specify whether to lower case the input text, default: True.
- tokenizer: callable, optional
The tokenization function.
- Returns
- GraphData
The constructed graph.
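The behaviour described for raw_text_data can be sketched without the library. In this illustration, `init_node_set` is a hypothetical stand-in that returns a plain dict instead of GraphData, `str.split` replaces the default tokenizer so the example has no dependencies, and lower-casing is assumed to apply to pre-tokenized input as well.

```python
def init_node_set(raw_text_data, lower_case=True, tokenizer=str.split):
    """Build a node-only graph (no edges) from a string or a token sequence."""
    if isinstance(raw_text_data, (list, tuple)):
        # Already tokenized: each token becomes a node, no tokenization is run.
        tokens = list(raw_text_data)
    else:
        tokens = tokenizer(raw_text_data)
    if lower_case:
        # Assumption: lower-casing is applied to pre-tokenized input too.
        tokens = [tok.lower() for tok in tokens]
    nodes = [{"id": i, "token": tok} for i, tok in enumerate(tokens)]
    return {"nodes": nodes, "edges": []}


from_text = init_node_set("I am a Dog")
from_tokens = init_node_set(("Graph", "Nodes"), lower_case=False)
```

The edge list stays empty in both cases; in the dynamic setting the edges are meant to be produced later from the node embeddings.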
class graph4nlp.graph_construction.NodeEmbeddingBasedRefinedGraphConstruction(alpha_fusion, **kwargs)
Class for node embedding based refined dynamic graph construction.
- Parameters
- alpha_fusion: float
Specify the fusion value for combining initial and learned adjacency matrices.
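One common way to combine two adjacency matrices with a single fusion value is a convex combination. The sketch below assumes that form, with alpha_fusion weighting the initial matrix and 1 - alpha_fusion weighting the learned one; the reference does not state the exact formula, so both the form and the helper name `fuse_adjacency` are assumptions.

```python
def fuse_adjacency(init_adj, learned_adj, alpha_fusion):
    """Element-wise alpha * init_adj + (1 - alpha) * learned_adj (assumed form)."""
    if not 0.0 <= alpha_fusion <= 1.0:
        raise ValueError("alpha_fusion is expected to lie in [0, 1]")
    return [
        [alpha_fusion * a + (1.0 - alpha_fusion) * b for a, b in zip(row_i, row_l)]
        for row_i, row_l in zip(init_adj, learned_adj)
    ]


init_adj = [[0.0, 1.0], [1.0, 0.0]]
learned_adj = [[0.5, 0.5], [0.5, 0.5]]
fused = fuse_adjacency(init_adj, learned_adj, alpha_fusion=0.5)
```

Under this assumption, a value of 1.0 would keep the initial topology unchanged and a value of 0.0 would rely on the learned matrix alone.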
Methods
add_module(name, module): Adds a child module to the current module.
apply(fn): Applies fn recursively to every submodule (as returned by .children()) as well as self.
bfloat16(): Casts all floating point parameters and buffers to bfloat16 datatype.
buffers([recurse]): Returns an iterator over module buffers.
children(): Returns an iterator over immediate children modules.
compute_graph_regularization(adj, node_feat): Graph regularization loss.
compute_similarity_metric(node_emb[, node_mask]): Compute similarity metric.
cpu(): Moves all model parameters and buffers to the CPU.
cuda([device]): Moves all model parameters and buffers to the GPU.
double(): Casts all floating point parameters and buffers to double datatype.
dynamic_topology(graph): Compute graph topology.
eval(): Sets the module in evaluation mode.
extra_repr(): Set the extra representation of the module.
float(): Casts all floating point parameters and buffers to float datatype.
forward(*input): Defines the computation performed at every call.
get_buffer(target): Returns the buffer given by target if it exists, otherwise throws an error.
get_extra_state(): Returns any extra state to include in the module’s state_dict.
get_parameter(target): Returns the parameter given by target if it exists, otherwise throws an error.
get_submodule(target): Returns the submodule given by target if it exists, otherwise throws an error.
half(): Casts all floating point parameters and buffers to half datatype.
init_topology(raw_text_data[, lower_case, …]): Convert raw text data to the initial graph.
load_state_dict(state_dict[, strict]): Copies parameters and buffers from state_dict into this module and its descendants.
modules(): Returns an iterator over all modules in the network.
named_buffers([prefix, recurse]): Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself.
named_children(): Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself.
named_modules([memo, prefix, remove_duplicate]): Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself.
named_parameters([prefix, recurse]): Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself.
parameters([recurse]): Returns an iterator over module parameters.
register_backward_hook(hook): Registers a backward hook on the module.
register_buffer(name, tensor[, persistent]): Adds a buffer to the module.
register_forward_hook(hook): Registers a forward hook on the module.
register_forward_pre_hook(hook): Registers a forward pre-hook on the module.
register_full_backward_hook(hook): Registers a backward hook on the module.
register_parameter(name, param): Adds a parameter to the module.
requires_grad_([requires_grad]): Change if autograd should record operations on parameters in this module.
set_extra_state(state): This function is called from load_state_dict() to handle any extra state found within the state_dict.
share_memory(): See torch.Tensor.share_memory_().
sparsify_graph(adj): Return a sparsified graph of the input graph.
state_dict([destination, prefix, keep_vars]): Returns a dictionary containing a whole state of the module.
to(*args, **kwargs): Moves and/or casts the parameters and buffers.
to_empty(*, device): Moves the parameters and buffers to the specified device without copying storage.
train([mode]): Sets the module in training mode.
type(dst_type): Casts all parameters and buffers to dst_type.
xpu([device]): Moves all model parameters and buffers to the XPU.
zero_grad([set_to_none]): Sets gradients of all model parameters to zero.
__call__
dynamic_topology(graph)
Compute graph topology.
- Parameters
- graph: GraphData
The input graph data.
The input graph data.
- Returns
- GraphData
The constructed graph.
classmethod init_topology(raw_text_data, lower_case=True, tokenizer=<function word_tokenize>, nlp_processor=None, processor_args=None, merge_strategy=None, edge_strategy=None, verbose=False, dynamic_init_topology_builder=None, dynamic_init_topology_aux_args=None)
Convert raw text data to the initial graph.
- Parameters
- raw_text_data: str or list/tuple of str
The raw text data. When a list/tuple of tokens is provided, no tokenization is conducted and each token is a node (used for the line graph builder); otherwise, the input string is tokenized to get a list of tokens.
- lower_case: boolean
Specify whether to lower case the input text, default: True.
- tokenizer: callable, optional
The tokenization function, default: nltk.tokenize.word_tokenize.
- nlp_processor: StanfordCoreNLP, optional
The NLP processor, default: None.
- processor_args: dict, optional
The NLP processor arguments, default: None.
- merge_strategy: str
Strategy to merge sub-graphs into one graph; depends on the specific dynamic_init_topology_builder, default: None.
- edge_strategy: str
Strategy to process edges; depends on the specific dynamic_init_topology_builder, default: None.
- verbose: boolean
Verbose flag, default: False.
- dynamic_init_topology_builder: class, optional
The initial graph topology builder, default: None.
- dynamic_init_topology_aux_args: dict, optional
The auxiliary args for dynamic_init_topology_builder.topology, default: None.
- Returns
- GraphData
The constructed graph.