# graph4nlp.loss

## Losses

class graph4nlp.loss.CoverageLoss(cover_loss)

The loss function for coverage mechanism.

Parameters

cover_loss (float) – The weight for coverage loss.

Methods

| Method | Description |
| --- | --- |
| `add_module(name, module)` | Adds a child module to the current module. |
| `apply(fn)` | Applies fn recursively to every submodule (as returned by .children()) as well as self. |
| `bfloat16()` | Casts all floating point parameters and buffers to bfloat16 datatype. |
| `buffers([recurse])` | Returns an iterator over module buffers. |
| `children()` | Returns an iterator over immediate children modules. |
| `cpu()` | Moves all model parameters and buffers to the CPU. |
| `cuda([device])` | Moves all model parameters and buffers to the GPU. |
| `double()` | Casts all floating point parameters and buffers to double datatype. |
| `eval()` | Sets the module in evaluation mode. |
| `extra_repr()` | Set the extra representation of the module. |
| `float()` | Casts all floating point parameters and buffers to float datatype. |
| `forward(enc_attn_weights, coverage_vectors)` | The calculation function. |
| `get_buffer(target)` | Returns the buffer given by target if it exists, otherwise throws an error. |
| `get_extra_state()` | Returns any extra state to include in the module’s state_dict. |
| `get_parameter(target)` | Returns the parameter given by target if it exists, otherwise throws an error. |
| `get_submodule(target)` | Returns the submodule given by target if it exists, otherwise throws an error. |
| `half()` | Casts all floating point parameters and buffers to half datatype. |
| `load_state_dict(state_dict[, strict])` | Copies parameters and buffers from state_dict into this module and its descendants. |
| `modules()` | Returns an iterator over all modules in the network. |
| `named_buffers([prefix, recurse])` | Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself. |
| `named_children()` | Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself. |
| `named_modules([memo, prefix, remove_duplicate])` | Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself. |
| `named_parameters([prefix, recurse])` | Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself. |
| `parameters([recurse])` | Returns an iterator over module parameters. |
| `register_backward_hook(hook)` | Registers a backward hook on the module. |
| `register_buffer(name, tensor[, persistent])` | Adds a buffer to the module. |
| `register_forward_hook(hook)` | Registers a forward hook on the module. |
| `register_forward_pre_hook(hook)` | Registers a forward pre-hook on the module. |
| `register_full_backward_hook(hook)` | Registers a backward hook on the module. |
| `register_parameter(name, param)` | Adds a parameter to the module. |
| `requires_grad_([requires_grad])` | Change if autograd should record operations on parameters in this module. |
| `set_extra_state(state)` | This function is called from load_state_dict() to handle any extra state found within the state_dict. |
| `share_memory()` | See torch.Tensor.share_memory_() |
| `state_dict([destination, prefix, keep_vars])` | Returns a dictionary containing a whole state of the module. |
| `to(*args, **kwargs)` | Moves and/or casts the parameters and buffers. |
| `to_empty(*, device)` | Moves the parameters and buffers to the specified device without copying storage. |
| `train([mode])` | Sets the module in training mode. |
| `type(dst_type)` | Casts all parameters and buffers to dst_type. |
| `xpu([device])` | Moves all model parameters and buffers to the XPU. |
| `zero_grad([set_to_none])` | Sets gradients of all model parameters to zero. |
| `__call__` | |

forward(enc_attn_weights, coverage_vectors)

The calculation function.

Parameters
• enc_attn_weights (list[torch.Tensor]) – The attention weights from all decoding steps. The list length should equal the number of decoding steps, and each element should be a tensor.

• coverage_vectors (list[torch.Tensor]) – The list containing all coverage vectors in the decoding module.

Returns

coverage_loss – The loss.

Return type

torch.Tensor
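
This page does not state the formula, so the following is only a minimal sketch of the coverage penalty as it is commonly defined for attention-based decoders: at every decoding step, the element-wise minimum of the attention weights and the coverage vector is summed over source positions. The helper name `coverage_loss_sketch`, the `[batch_size, src_len]` shape of each list element, and the averaging over the batch are assumptions made for illustration, not confirmed behavior of `CoverageLoss`.

```python
import torch


def coverage_loss_sketch(enc_attn_weights, coverage_vectors, cover_loss=0.3):
    """Hypothetical helper: weighted sum over steps of sum_i min(attn_i, coverage_i)."""
    step_losses = []
    for attn, cov in zip(enc_attn_weights, coverage_vectors):
        # Penalize attention mass placed on positions that are already covered.
        step_losses.append(torch.min(attn, cov).sum(dim=-1))  # shape: [batch_size]
    per_example = torch.stack(step_losses, dim=1).sum(dim=1)  # sum over decoding steps
    return cover_loss * per_example.mean()


# Two decoding steps, a batch of one, two source tokens.
attn_steps = [torch.tensor([[0.5, 0.5]]), torch.tensor([[1.0, 0.0]])]
# Assumption: coverage at step t is the sum of attention from earlier steps.
cov_steps = [torch.zeros(1, 2), torch.tensor([[0.5, 0.5]])]

loss = coverage_loss_sketch(attn_steps, cov_steps, cover_loss=0.3)
print(loss)
```

In this toy input the first step contributes nothing (the coverage is still zero) and the second step contributes 0.5, so the weighted result is 0.3 * 0.5 = 0.15.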

class graph4nlp.loss.SeqGenerationLoss(ignore_index, use_coverage=False, coverage_weight=0.3)

The general loss for the Graph2Seq model.

Parameters
• ignore_index (int) – The token index that is ignored when computing the loss. Usually it is the padding index.

• use_coverage (bool, default=False) – Whether to use the coverage mechanism. If True, the coverage loss is added.

• coverage_weight (float, default=0.3) – The weight of coverage loss.

Methods

| Method | Description |
| --- | --- |
| `forward(logits, label[, enc_attn_weights, …])` | The calculation method. |
| `__call__` | |

All other methods are inherited from torch.nn.Module and are identical to those listed under CoverageLoss above.

forward(logits, label, enc_attn_weights=None, coverage_vectors=None)

The calculation method.

Parameters
• logits (torch.Tensor) – The predicted probabilities, with shape [batch_size, max_decoder_step, vocab_size]. Note that these are probabilities computed by softmax, not unnormalized scores.

• label (torch.Tensor) – The ground-truth with the shape of [batch_size, max_decoder_step].

• enc_attn_weights (list[torch.Tensor], default=None) – The attention weights from all decoding steps. The list length should equal the number of decoding steps, and each element should be a tensor.

• coverage_vectors (list[torch.Tensor], default=None) – The list containing all coverage vectors in the decoding module.

Returns

graph2seq_loss – The loss.

Return type

torch.Tensor

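
Because `forward` receives softmax probabilities rather than raw scores, a negative log-likelihood over them needs an explicit logarithm first. The sketch below shows that idea with plain PyTorch; the helper name `seq_nll_sketch`, the clamping constant, and the use of `F.nll_loss` are assumptions for illustration and may differ from what `SeqGenerationLoss` does internally.

```python
import torch
import torch.nn.functional as F


def seq_nll_sketch(probs, label, ignore_index):
    """Hypothetical helper: NLL over softmax outputs, skipping padded positions."""
    vocab_size = probs.size(-1)
    # The inputs are probabilities, so take the log before applying the NLL loss.
    log_probs = torch.log(probs.clamp_min(1e-12))
    return F.nll_loss(
        log_probs.reshape(-1, vocab_size),  # [batch_size * max_decoder_step, vocab_size]
        label.reshape(-1),                  # [batch_size * max_decoder_step]
        ignore_index=ignore_index,
    )


pad_idx = 0
probs = torch.tensor([[[0.25, 0.50, 0.25],    # step 0, target token 1
                       [0.80, 0.10, 0.10]]])  # step 1, padding
label = torch.tensor([[1, pad_idx]])

seq_loss = seq_nll_sketch(probs, label, ignore_index=pad_idx)
print(seq_loss)
```

Only the non-padded step contributes, so the result is -log(0.5), roughly 0.693.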
class graph4nlp.loss.GeneralLoss(loss_type, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', pos_weight=None)

This general loss is backed by the PyTorch loss functions. A detailed description of each loss function can be found at:

[PyTorch loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions)

Parameters
loss_type: str

The loss function to select (NLL, BCEWithLogits, MultiLabelMargin, SoftMargin, CrossEntropy).

[NLL loss](https://pytorch.org/docs/stable/_modules/torch/nn/modules/loss.html#NLLLoss) measures the negative log likelihood loss. It is useful to train a classification problem with C classes.

[BCEWithLogits loss](https://pytorch.org/docs/stable/_modules/torch/nn/modules/loss.html#BCEWithLogitsLoss) combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss because, by combining the operations into one layer, it takes advantage of the log-sum-exp trick for numerical stability.

[BCE loss](https://pytorch.org/docs/stable/_modules/torch/nn/modules/loss.html#BCELoss) creates a criterion that measures the Binary Cross Entropy between the target and the output.

[MultiLabelMargin loss](https://pytorch.org/docs/stable/_modules/torch/nn/modules/loss.html#MultiLabelMarginLoss) creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input $$x$$ (a 2D mini-batch Tensor) and output $$y$$ (which is a 2D Tensor of target class indices).

[SoftMargin loss](https://pytorch.org/docs/stable/_modules/torch/nn/modules/loss.html#SoftMarginLoss) creates a criterion that optimizes a two-class classification logistic loss between input tensor $$x$$ and target tensor $$y$$ (containing 1 or -1).

[CrossEntropy loss](https://pytorch.org/docs/stable/_modules/torch/nn/modules/loss.html#CrossEntropyLoss) combines PyTorch's nn.LogSoftmax and nn.NLLLoss in one single class. It is useful when training a classification problem with C classes.

weight: Tensor, optional

A manual rescaling weight given to the loss of each batch element. If given, it has to be a Tensor of size nbatch. This parameter is not applicable to the SoftMargin loss function.

size_average: bool, optional

By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True.

reduce: bool, optional

By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, returns a loss per batch element instead and ignores size_average. Default: True

reduction: string, optional

Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'.

• 'none': no reduction will be applied.

• 'mean': the sum of the output will be divided by the number of elements in the output.

• 'sum': the output will be summed.

Note: size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. Default: 'mean'

pos_weight: Tensor, optional

A weight of positive examples. Must be a vector with length equal to the number of classes. This parameter is only applicable to the BCEWithLogits loss function.

ignore_index: int, optional

Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets. This parameter is only applicable to the CrossEntropy loss function.
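
The effect of `reduction`, `ignore_index`, and `pos_weight` can be checked on the underlying PyTorch losses. The sketch below calls `torch.nn` directly instead of `GeneralLoss`; that `GeneralLoss` passes these arguments through unchanged is an assumption, and the tensors are made-up toy values.

```python
import torch
import torch.nn as nn

scores = torch.tensor([[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
target = torch.tensor([0, 1, -100])  # the last target equals the default ignore_index

per_item = nn.CrossEntropyLoss(reduction="none")(scores, target)  # shape (N,)
mean_loss = nn.CrossEntropyLoss(reduction="mean")(scores, target)  # scalar
sum_loss = nn.CrossEntropyLoss(reduction="sum")(scores, target)    # scalar

# The ignored target contributes zero and is left out of the mean's denominator,
# so the mean here is the sum divided by 2, not by 3.
print(per_item, mean_loss, sum_loss)

# pos_weight (BCEWithLogits only): one weight per class, scaling the positive term.
multi_scores = torch.zeros(1, 2)
multi_target = torch.tensor([[1.0, 0.0]])
plain = nn.BCEWithLogitsLoss(reduction="none")(multi_scores, multi_target)
weighted = nn.BCEWithLogitsLoss(
    reduction="none", pos_weight=torch.tensor([3.0, 3.0])
)(multi_scores, multi_target)
print(plain, weighted)
```

Only the entry whose target is 1 is scaled by `pos_weight`; the entry whose target is 0 keeps its unweighted value.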

Methods

| Method | Description |
| --- | --- |
| `forward(input, target)` | Compute the loss. |
| `__call__` | |

All other methods are inherited from torch.nn.Module and are identical to those listed under CoverageLoss above.

forward(input, target)

Compute the loss.

Parameters
NLL loss:
Input: tensor.

$$(N, C)$$ where C = number of classes, or $$(N, C, d_1, d_2, ..., d_K)$$ with $$K \geq 1$$ in the case of K-dimensional loss.

Target: tensor.

$$(N)$$ where each value is $$0 \leq \text{targets}[i] \leq C-1$$, or $$(N, d_1, d_2, ..., d_K)$$ with $$K \geq 1$$ in the case of K-dimensional loss.

Output: scalar.

If reduction is 'none', then the same size as the target: $$(N)$$, or $$(N, d_1, d_2, ..., d_K)$$ with $$K \geq 1$$ in the case of K-dimensional loss.

BCE/BCEWithLogits loss:
Input: Tensor.

$$(N, *)$$ where $$*$$ means, any number of additional dimensions

Target: Tensor.

$$(N, *)$$, same shape as the input

Output: scalar.

If reduction is 'none', then $$(N, *)$$, same shape as input.

MultiLabelMargin loss:
Input: Tensor.

$$(C)$$ or $$(N, C)$$ where N is the batch size and C is the number of classes.

Target: Tensor.

$$(C)$$ or $$(N, C)$$, label targets padded by -1 ensuring same shape as the input.

Output: Scalar.

If reduction is 'none', then $$(N)$$.

SoftMargin loss:
Input: Tensor.

$$(*)$$ where $$*$$ means, any number of additional dimensions

Target: Tensor.

$$(*)$$, same shape as the input

Output: scalar.

If reduction is 'none', then same shape as the input

CrossEntropy:
Input: Tensor.

$$(N, C)$$ where C = number of classes, or $$(N, C, d_1, d_2, ..., d_K)$$ with $$K \geq 1$$ in the case of K-dimensional loss.

Target: Tensor.

$$(N)$$ where each value is $$0 \leq \text{targets}[i] \leq C-1$$, or $$(N, d_1, d_2, ..., d_K)$$ with $$K \geq 1$$ in the case of K-dimensional loss.

Output: scalar.

If reduction is 'none', then the same size as the target: $$(N)$$, or $$(N, d_1, d_2, ..., d_K)$$ with $$K \geq 1$$ in the case of K-dimensional loss.
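
The shapes listed above can be verified with a short sketch that uses `torch.nn` directly (assuming `GeneralLoss` forwards `input` and `target` unchanged to the selected PyTorch loss, which this page does not confirm). It also checks the two relationships stated in the class description: CrossEntropy equals LogSoftmax followed by NLL, and BCEWithLogits equals Sigmoid followed by BCE.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C = 4, 3

scores = torch.randn(N, C)          # (N, C) unnormalized scores
target = torch.randint(0, C, (N,))  # (N), each value in [0, C-1]

ce = nn.CrossEntropyLoss()(scores, target)                 # scalar
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(scores), target)   # same value as ce

# K-dimensional case with K = 1: (N, C, d_1) input and (N, d_1) target.
seq_scores = torch.randn(N, C, 5)
seq_target = torch.randint(0, C, (N, 5))
seq_loss_none = nn.CrossEntropyLoss(reduction="none")(seq_scores, seq_target)  # (N, d_1)

# BCEWithLogits takes raw scores; BCE expects probabilities in [0, 1].
raw = torch.randn(N, 2)
binary_target = torch.randint(0, 2, (N, 2)).float()
bce_logits = nn.BCEWithLogitsLoss()(raw, binary_target)
bce = nn.BCELoss()(torch.sigmoid(raw), binary_target)

print(ce.shape, seq_loss_none.shape)
```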

class graph4nlp.loss.KGLoss(loss_type, size_average=None, reduce=None, reduction='mean', adv_temperature=None, weight=None)

In state-of-the-art knowledge graph embedding (KGE) models, loss functions are designed following pointwise, pairwise, and multi-class approaches. See the paper "Loss Functions in Knowledge Graph Embedding Models".

Pointwise Loss Function

MSELoss Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input $$x$$ and target $$y$$.

SoftMarginLoss Creates a criterion that optimizes a two-class classification logistic loss between input tensor $$x$$ and target tensor $$y$$ (containing 1 or -1). Tip: the number of positive and negative samples should be about the same; otherwise the model overfits easily.

$\text{loss}(x, y) = \sum_i \frac{\log(1 + \exp(-y[i]*x[i]))}{\text{x.nelement}()}$
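
As a quick check of the formula above, the sketch below evaluates it by hand and compares the result with `torch.nn.SoftMarginLoss` using its default `'mean'` reduction. The input values are arbitrary.

```python
import torch
import torch.nn as nn

x = torch.tensor([0.5, -1.0, 2.0, 0.0])   # model scores
y = torch.tensor([1.0, -1.0, -1.0, 1.0])  # targets, each 1 or -1

# Direct evaluation of sum_i log(1 + exp(-y[i] * x[i])) / x.nelement()
manual = torch.log(1 + torch.exp(-y * x)).sum() / x.nelement()
builtin = nn.SoftMarginLoss()(x, y)

print(manual, builtin)
```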

Pairwise Loss Function

SoftplusLoss refers to the paper OpenKE: An Open Toolkit for Knowledge Embedding

SigmoidLoss refers to the paper OpenKE: An Open Toolkit for Knowledge Embedding

Multi-Class Loss Function

Binary Cross Entropy Loss Creates a criterion that measures the Binary Cross Entropy between the target and the output. Note that the targets $$y$$ should be numbers between 0 and 1.

Methods

| Method | Description |
| --- | --- |
| `forward([input, target, p_score, n_score])` | |
| `__call__` | |

All other methods are inherited from torch.nn.Module and are identical to those listed under CoverageLoss above.

forward(input=None, target=None, p_score=None, n_score=None)
Parameters
MSELoss
input: Tensor.

$$(N,*)$$ where $$*$$ means any number of additional dimensions

target: Tensor.

$$(N,*)$$, same shape as the input

output:

If reduction is 'none', then same shape as the input

SoftMarginLoss
input: Tensor.

$$(*)$$ where * means, any number of additional dimensions

target: Tensor.

same shape as the input

output: scalar.

If reduction is 'none', then same shape as the input

SoftplusLoss
p_score: Tensor.

$$(*)$$ where * means, any number of additional dimensions

n_score: Tensor.

$$(*)$$ where * means, any number of additional dimensions. The dimension could be different from the p_score dimension.

output: scalar.

SigmoidLoss
p_score: Tensor.

$$(*)$$ where * means, any number of additional dimensions

n_score: Tensor.

$$(*)$$ where * means, any number of additional dimensions. The dimension could be different from the p_score dimension.

output: scalar.

BCELoss:
Input: Tensor.

$$(N, *)$$ where $$*$$ means, any number of additional dimensions

Target: Tensor.

$$(N, *)$$, same shape as the input

Output: scalar.

If reduction is 'none', then $$(N, *)$$, same shape as input.
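
The pairwise losses take a positive score tensor and a negative score tensor whose shapes may differ. This page does not give their formulas, so the sketch below is an assumption modeled on the softplus and log-sigmoid forms used in OpenKE-style toolkits, with higher scores taken to mean more plausible triples. The helper names are hypothetical, and `adv_temperature` weighting is left out.

```python
import torch
import torch.nn.functional as F


def softplus_loss_sketch(p_score, n_score):
    """Hypothetical helper: softplus penalty on low positive and high negative scores."""
    return (F.softplus(-p_score).mean() + F.softplus(n_score).mean()) / 2


def sigmoid_loss_sketch(p_score, n_score):
    """Hypothetical helper: negative log-sigmoid of positive and negated negative scores."""
    return -(F.logsigmoid(p_score).mean() + F.logsigmoid(-n_score).mean()) / 2


p_score = torch.tensor([2.0, 3.0])          # one score per positive triple
n_score = torch.tensor([[-1.0, -2.0, 0.5],  # several negatives per positive,
                        [-3.0, 0.0, -0.5]]) # so the shape differs from p_score

good = softplus_loss_sketch(p_score, n_score)
bad = softplus_loss_sketch(-p_score, -n_score)  # reversed ranking gives a larger loss
print(good, bad, sigmoid_loss_sketch(p_score, n_score))
```

Since -log(sigmoid(x)) equals softplus(-x), the two sketches return the same value when no adversarial weighting is applied; each term is averaged separately, which is why the positive and negative tensors do not need matching shapes.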