.. _guide-manipulate:

Manipulating GraphData
======================

After constructing a ``GraphData`` instance and adding nodes and edges to it, the next step is to attach useful information to the graph for further processing. In ``GraphData``, two kinds of information can be attached to nodes and edges, namely `features` and `attributes`.

Features
--------

Features are PyTorch tensors designated for each node or edge. To access features, users may call ``GraphData.nodes[node_index].features`` or ``GraphData.edges[edge_index].features``. These methods return the features of the specified nodes or edges in a dictionary representation, where the keys are the names of the features and the values are the corresponding tensors. Alternatively, to access features at the whole-graph level, the ``GraphData.node_features`` and ``GraphData.edge_features`` interfaces serve exactly this purpose.

.. code::

    g = GraphData()
    g.add_nodes(10)
    # Note that the first dimension of a feature tensor represents the number of instances (nodes/edges).
    # Any manipulation of the features should keep the first dimension size matched with the number of instances.
    # An invalid example
    g.node_features['node_feat'] = torch.randn((9, 10))
    >>> raise SizeMismatchError
    g.node_features['node_feat'] = torch.randn((10, 10))
    g.node_features['zero'] = torch.zeros(10)
    g.node_features['idx'] = torch.tensor(list(range(10)), dtype=torch.long)
    g.node_features
    >>> {'node_feat': tensor([[-2.2053, -0.9236, -0.4437, -0.7142,  1.5309, -1.5863,  0.6002, -0.6847,  1.3772,  0.1066],
            [ 0.8875,  1.7674, -0.0354, -0.7681, -2.6256, -1.3399, -2.3798, -0.7418,  1.2901,  0.6641],
            [-1.5530,  0.9147,  0.0618, -0.0879,  1.0005,  1.2638, -1.4481,  1.2975, -0.0304,  0.8707],
            [-0.3448, -0.7484, -1.0194, -0.5096, -0.2596,  0.1056,  1.1560,  0.3463, -0.1986,  0.9243],
            [-0.3555, -0.7062, -1.0459,  0.1305, -0.1338,  1.2952,  1.2923, -0.5740, -0.5492, -0.2497],
            [-0.7125,  1.2456, -0.2136,  0.8562,  1.8037, -0.0379, -1.6863,  1.2693, -0.1980, -0.3153],
            [ 0.4099, -0.8295,  0.6984,  0.4125, -0.8396,  1.8205, -1.1458, -0.0837, -0.2388,  0.0552],
            [-1.4068, -1.9334, -0.0367, -1.3297,  1.0705, -0.5606, -0.0458,  0.1358,  1.3042, -0.8282],
            [ 0.7764,  0.1442,  1.6043,  0.1052,  1.4648, -2.1791,  0.6740,  0.2858,  0.0482,  0.9058],
            [-1.5054,  0.8992,  0.0893, -1.2325,  0.8888, -1.2222,  2.0569,  0.0218,  1.5519, -0.8234]]),
         'node_emb': None,
         'zero': tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
         'idx': tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])}

Note that there are some reserved keys for features, which are initialized to ``None``. For node features the reserved keys are `node_feat` and `node_emb`. For edge features the reserved keys are `edge_feat`, `edge_emb` and `edge_weight`. Users are encouraged to use these keys as common feature names. This means that the feature dictionary of an empty graph already contains these items, with the corresponding values being ``None``.

Another thing to notice is that when new nodes or edges are added to a graph whose features are already set (i.e., the values are not ``None``), zero padding is performed on the newly added instances.

.. code::

    g.add_nodes(1)
    g.node_features    # Zero padding is performed
    >>> {'node_feat': tensor([[-2.2053, -0.9236, -0.4437, -0.7142,  1.5309, -1.5863,  0.6002, -0.6847,  1.3772,  0.1066],
            [ 0.8875,  1.7674, -0.0354, -0.7681, -2.6256, -1.3399, -2.3798, -0.7418,  1.2901,  0.6641],
            [-1.5530,  0.9147,  0.0618, -0.0879,  1.0005,  1.2638, -1.4481,  1.2975, -0.0304,  0.8707],
            [-0.3448, -0.7484, -1.0194, -0.5096, -0.2596,  0.1056,  1.1560,  0.3463, -0.1986,  0.9243],
            [-0.3555, -0.7062, -1.0459,  0.1305, -0.1338,  1.2952,  1.2923, -0.5740, -0.5492, -0.2497],
            [-0.7125,  1.2456, -0.2136,  0.8562,  1.8037, -0.0379, -1.6863,  1.2693, -0.1980, -0.3153],
            [ 0.4099, -0.8295,  0.6984,  0.4125, -0.8396,  1.8205, -1.1458, -0.0837, -0.2388,  0.0552],
            [-1.4068, -1.9334, -0.0367, -1.3297,  1.0705, -0.5606, -0.0458,  0.1358,  1.3042, -0.8282],
            [ 0.7764,  0.1442,  1.6043,  0.1052,  1.4648, -2.1791,  0.6740,  0.2858,  0.0482,  0.9058],
            [-1.5054,  0.8992,  0.0893, -1.2325,  0.8888, -1.2222,  2.0569,  0.0218,  1.5519, -0.8234],
            [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]]),
         'node_emb': None,
         'zero': tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
         'idx': tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0])}
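As a quick recap of the two access paths described at the start of this section, the sketch below reads the same feature both through the whole-graph ``node_features`` interface and through the per-node ``nodes[node_index].features`` view. It is a minimal sketch, assuming that the per-node view returns the name-to-tensor dictionary described above, restricted to the selected node; the import path is an assumption and may differ by version.

.. code::

    import torch
    from graph4nlp.pytorch.data.data import GraphData  # assumed import path; may differ by version

    g = GraphData()
    g.add_nodes(3)
    g.node_features['node_feat'] = torch.randn((3, 4))

    # Whole-graph access: feature name first, then the node index into the tensor.
    row_from_graph = g.node_features['node_feat'][1]

    # Instance-level access: node index first, then the feature name.
    row_from_node = g.nodes[1].features['node_feat']

    # Both paths should expose the feature data of node 1.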
Attributes
----------

The other kind of attached information is `attributes`. Like `features`, `attributes` are associated with each node/edge instance, but they are stored as a list of dictionaries. The list index corresponds to the node/edge index, and the dictionary at each position stores the attributes of the corresponding instance. Essentially, `attributes` are designed to make up for the limitation of `features`, which cannot store arbitrary objects. The reserved keys are `node_attr` for node attributes and `edge_attr` for edge attributes.

.. code::

    g = GraphData()
    g.add_nodes(2)  # Add 2 nodes to an empty graph
    g.node_attributes
    >>> [{'node_attr': None}, {'node_attr': None}]
    g.node_attributes[1]['node_attr'] = 'hello'
    g.node_attributes
    >>> [{'node_attr': None}, {'node_attr': 'hello'}]

Features vs. Attributes
-----------------------

To make the distinction clear, this subsection compares features and attributes so that users can better decide which one to use.

1. Type of storage

   ``features`` store only numerical data; in the current version these are PyTorch tensors. The shape of each feature tensor must be consistent with the number of nodes/edges in the graph. Specifically, the first dimension of the tensor corresponds to the number of instances. For example, in a graph with 10 nodes and 20 edges, the shape of any node feature tensor should be ``[10, *]``, and that of any edge feature tensor ``[20, *]``. ``attributes``, on the other hand, can store data of arbitrary type, which do not necessarily have a ``shape``.

2. Order of access

   Both ``features`` and ``attributes`` have two levels of keys: *names* and *indices*. ``features`` are implemented as a dictionary whose keys are strings and whose values are tensors. The first-level key is therefore the feature name, and the second-level key is simply a direct index into the corresponding PyTorch tensor. ``attributes``, on the other hand, are implemented as a list of dictionaries, where the list indices are the node/edge indices. Therefore, when accessing attributes, users should use the index first, as the sketch after this list shows.
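The following sketch contrasts the two access orders side by side. It only uses the ``node_features`` and ``node_attributes`` interfaces shown earlier in this guide; the import path is an assumption and may differ by version.

.. code::

    import torch
    from graph4nlp.pytorch.data.data import GraphData  # assumed import path; may differ by version

    g = GraphData()
    g.add_nodes(2)
    g.node_features['node_feat'] = torch.randn((2, 4))
    g.node_attributes[0]['node_attr'] = {'token': 'hello'}  # attributes may hold arbitrary objects

    # Features: name first, then index into the returned tensor.
    feat_of_node_0 = g.node_features['node_feat'][0]

    # Attributes: index first, then name into the per-node dictionary.
    attr_of_node_0 = g.node_attributes[0]['node_attr']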