Introduction
Most network analysts want to use network analysis to better
understand empirical networks. {manynet} offers several
options for creating or generating, importing or coercing networks into
formats that you can use. This tutorial is going to cover several ways
to get data into the package:
- using data from the {manynet} or other packages
- importing and using data from outside of R
- creating or generating networks using functions in the {manynet} package
Finding and using packaged data
As many R packages do, {manynet} includes a number of
datasets used for teaching and testing the functions contained in the
package. These are sometimes classical network datasets, such as the Southern
Women dataset or Zachary’s Karateka
dataset, and sometimes new data with neat themes, features, or
attributes that make them exemplar teaching or testing data.
Finding the data
To see what data is in the package, you can explore the documentation
available on the website (see here)
or use a command in R to list the data available in the package. The
command, helpfully, is data(package = "manynet"). Type this
into the box below to see what datasets are available in the package.
There are buttons to start over, receive any hints/solutions available,
as well as to run the code you have entered to discover its effects. Try
it out now!
data(package = "_____")
data(package = "manynet")
Calling the data
Ok, so we can see that there are a number of very interesting datasets available in this package. How do we access and use this data?
The easiest way to call the data is just to make sure that the
package is loaded using the command library(manynet), and
then use the selected dataset as named above. 1 Let’s try
calling ison_adolescents by first loading the
{manynet} ‘library’ and then just typing
ison_adolescents to see what happens.
library(______)
ison______
library(manynet)
ison_adolescents
1. Alternatively, the data can be called directly out of the package like this:
example_name <- manynet::ison_adolescents. However, since we think you will probably want all of the other functions available in {manynet}
at your disposal, you may as well just load the package entirely.
Formats
All of the network data available in the package is in a special
tbl_graph format, from the {tidygraph}
package, that makes it compatible, flexible, and transparent. When you
call one of these data objects, some information about the type of
network it is, how many nodes and ties it has, and the first few
examples of nodes and ties is given. 2 Let’s see
whether we can make sense of the main features of this network.
We will see later how we can visualise, describe, and model this network in a variety of ways.
2. You may have noticed when the package was first loaded that it mentioned that the
print_tbl_graph method from that package was overwritten. That’s so that we can make some different choices about what and how networks are described.
Classes
We can do the same with networks from other R packages too. One
challenge there though is that they may not be in the same
tbl_graph format. A commonly used package is
{network}, which also includes a few example datasets. Can
you remember how to find out which data are available in this package,
and call the last one in the list?
# You may need to call the data out of this package directly,
# such as:
data(flo, package = "network")
data(package = "_____")
library(_____)
data(_____)
_____
data(package = "network")
data(flo, package = "network")
flo
This data uses quite a different class to what we encountered above.
It prints out the full adjacency matrix of the Florentine network, but
as a network-class object (i.e. from the
{network} package). This is no problem for
{manynet} (or {migraph}), since every included
function works the same on any of the compatible classes, but in case
you would like to work with a network in a particular class, or it needs
to be in a particular format for further work (e.g. for use with
{ergm}), then {manynet} has you covered for
that too.
Coercing networks between different classes of objects uses the
as_*() functions. These functions will do their best to
coerce data from the current class of the object to the class named in
the function. Some classes have ‘slots’ or recognition for some kinds of
information that others don’t. For example, coercing a
tbl_graph into an edgelist will sacrifice all the
information about nodal attributes. Still, we aim for these functions to
be as lossless as possible and welcome feedback that highlights how
these translations can be improved. Let’s see whether we can coerce our
‘flo’ network into a tbl_graph (‘tidygraph’) class
object.
_____ <- as_t_____(_____)
flo <- as_tidygraph(_____)
flo <- as_tidygraph(flo)
Other packages that include network data include David Schoch’s
{networkdata} package. The data in this package are {igraph}-class
objects. Can you coerce one of the datasets in this package into a
tidygraph format? Into a network format? Into a matrix? Into an
edgelist?
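To make the point about lossy coercion concrete, here is a minimal base R sketch. It does not use the as_*() functions and is not how {manynet} implements them; the three-node network and the age attribute are made up for illustration. It shows that an edgelist has nowhere to store nodal attributes, although the ties themselves survive a round trip.

```r
# A small undirected network as an adjacency matrix (hypothetical data)
node_ids <- c("A", "B", "C")
adj <- matrix(c(0, 1, 1,
                1, 0, 0,
                1, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(node_ids, node_ids))

# Nodal attributes have to be kept in a separate table (hypothetical values)
nodes <- data.frame(name = node_ids, age = c(12, 13, 12),
                    stringsAsFactors = FALSE)

# Matrix -> edgelist: one row per tie, taken from the upper triangle
idx <- which(adj == 1 & upper.tri(adj), arr.ind = TRUE)
edges <- data.frame(from = rownames(adj)[idx[, "row"]],
                    to   = colnames(adj)[idx[, "col"]],
                    stringsAsFactors = FALSE)
edges         # only 'from' and 'to' columns: no place for 'age'

# Edgelist -> matrix: the ties can be recovered, the attributes cannot
back <- matrix(0, nrow = 3, ncol = 3, dimnames = dimnames(adj))
back[cbind(edges$from, edges$to)] <- 1
back <- back + t(back)   # restore symmetry for an undirected network
all(back == adj)
```

In practice you would keep the nodelist alongside the edgelist if you need to rebuild the full network later.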
Finding and importing external data
Finding data
Researchers will regularly find themselves needing to import and work with network data from outside of R. There are a great number of network datasets and data resources available online. 3 See for example:
- UCINET data
- Pajek data
- GML datasets
- UCIrvine Network Data Repository
- KONECT project
- SNAP Stanford Large Network Dataset Collection
Yet these resources contain data in a range of different formats,
some that are specifically made to work with certain software, others
that rely on open standards, and yet others that keep data in a very
standard edgelist (and perhaps nodelist) format in .csv files or
similar. Fortunately, {manynet} has functions to help with
importing data from such formats too.
3. Here we keep a necessarily partial list, but we are happy to update it with additional suggestions.
Importing edgelists
One format most users are long familiar with is Excel. In Excel,
users typically collect network data as edgelists, nodelists, or
both. Recall that edgelists tabulate the senders/from and receivers/to of
each tie in the first two columns and any other edge- or tie-related
attributes as additional columns. There may optionally also be a
nodelist that tabulates the nodes and any nodal attributes. Edgelists are typically the main object to be
imported, and we can import them from an Excel file or a
.csv file. 4 For the sake of this exercise,
we’ll import some data, adols.csv, that I’ve pre-saved
within the package in the data/ folder of this tutorial.
Try the following code chunk.
adolties <- read_edgelist("data/adols.csv")
flonodes <- read_nodelist("data/flonode.csv")
If you do not specify a particular file name, a helpful popup will open that assists you with locating and importing a file from your operating system. Importing a nodelist of nodal attributes operates very similarly.
4. Note that if you import from a .csv file, please specify whether the separation value should be commas (sv = "comma") or semi-colons (sv = "semi-colon"). The function expects comma separated values by default.
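If you are unsure what an edgelist file looks like on disk, the following base R sketch may help. It uses only write.csv(), read.csv() and their semi-colon variants rather than read_edgelist(), and the names, weights, and temporary file paths are invented for illustration.

```r
# A tiny edgelist: sender, receiver, then any tie attributes (made-up data)
edges <- data.frame(from   = c("Betty", "Betty", "Sue"),
                    to     = c("Sue", "Alice", "Alice"),
                    weight = c(1, 2, 1),
                    stringsAsFactors = FALSE)

# Comma-separated version
comma_file <- tempfile(fileext = ".csv")
write.csv(edges, comma_file, row.names = FALSE)
readLines(comma_file)   # inspect the raw text of the file
reread_comma <- read.csv(comma_file, stringsAsFactors = FALSE)

# Semi-colon-separated version, common where the comma is the decimal mark
semi_file <- tempfile(fileext = ".csv")
write.csv2(edges, semi_file, row.names = FALSE)
reread_semi <- read.csv2(semi_file, stringsAsFactors = FALSE)

# Both files describe the same three ties
reread_comma
reread_semi
```

Reading a semi-colon file with the comma reader (or vice versa) typically collapses everything into a single column, which is why the separator needs to be stated when it is not the default.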
Exporting edgelists
In some cases, users will be faced with having to collect data
themselves, or wish to first manipulate the data in Excel before
importing it, but may be uncertain about the expected format of an
edgelist. Here it may be useful to try exporting one of the built-in
datasets in {manynet} to see how complete network data
looks. If that seems too complex, calling
write_edgelist() without any arguments will export a test
file with a barebones structure that you can overwrite with your own
data.
write______(ison_marvel_relationships, "_____/marvedges.xlsx")
write______(ison_marvel_relationships, "_____/marvnodes.xlsx")
write_edgelist(ison_marvel_relationships, "data/marvedges.xlsx")
write_nodelist(ison_marvel_relationships, "data/marvnodes.xlsx")
Importing other formats
There are other functions here too that help import from or export to common external network data formats. Here are some examples:
- read_pajek() and write_pajek() for importing and exporting .net or .paj files
- read_ucinet() and write_ucinet() for importing and exporting .##h files (.##d files are automatically imported alongside them)
For more information on any of these functions, you can also ask for
help by typing ?read_pajek in the console. Whereas
read_edgelist() and read_nodelist() will
import into a tibble/data frame class, read_pajek() and
read_ucinet() will import the network into a tidygraph
format (see above). Of course, any network data that is imported can
quite easily be coerced into any other compatible class. Let’s say we
want to import the adolescents edgelist back in, but we want it in an
igraph format. There are three ways you might do this:
# 1. Separate steps
adols <- read_edgelist("data/adols.csv")
adolsigraph1 <- as_igraph(adols)
adolsigraph1
# 2. Nested steps
adolsigraph2 <- as_igraph(read_edgelist("data/adols.csv"))
adolsigraph2
# 3. Chained steps
adolsigraph3 <- read_edgelist("data/adols.csv") %>% as_igraph()
adolsigraph3
How does it compare to the original?
Working with network data
Reformatting network data
As mentioned above, {manynet} attempts to retain as much
information as possible when converting objects between different
classes. The presumption is that users should explicitly decide to
reduce or simplify their data. {manynet} includes functions
for reformatting, transforming (or removing) certain properties of
network objects. Here we will introduce a few functions used for
‘reformatting’ networks. We call functions ‘reformatting functions’ if
they change the type but not the order (number of nodes) of the network.
A good example is to_undirected(). The astute among you may
have noticed that when we imported the adolescents network, it returned
a directed network instead of the original undirected
network. This was a consequence of a heuristic used during the import,
but gives us a good occasion to try out to_undirected().
Reimport the data/adols.csv file, make it an igraph-class
object, and then make it undirected.
read_edgelist("data/adols.csv") %>% as_igraph() %>% to_undirected()
Try this out with other compatible classes of objects, and reformatting other aspects of the network. For example:
- to_unnamed() removes/anonymises all vertex/node labels
- to_named() adds some random (U.S.) children’s names, which can be useful for identifying particular nodes
- to_undirected() replaces directed ties with an undirected tie (if an arc in either direction is present)
- to_redirected() replaces undirected ties with directed ties (arcs) or, if already directed, swaps arcs’ direction
- to_unweighted() binarises or dichotomises a network around a particular threshold (by default 1)
- to_unsigned() returns just the “positive” or “negative” ties from a signed network, respectively
- to_uniplex() reduces a multigraph or multiplex network to one with a single set of edges or ties
- to_simplex() removes all loops or self-ties from a complex network
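To see what a few of these reformatting steps amount to, here is a rough base R sketch on a plain adjacency matrix. It only illustrates the ideas behind to_undirected(), to_unweighted(), and to_simplex() as described above; it is not the package's implementation, and the matrix is invented.

```r
# A directed, weighted network with one self-tie (hypothetical data)
node_ids <- c("A", "B", "C")
arcs <- matrix(c(0, 1, 0,
                 0, 0, 3,
                 0, 0, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(node_ids, node_ids))

# Undirected: a tie exists if there is an arc in either direction
undirected <- (arcs > 0 | t(arcs) > 0) * 1

# Unweighted: dichotomise around a threshold (1 here, mirroring the default)
threshold <- 1
unweighted <- (arcs >= threshold) * 1

# Simplex: drop loops (self-ties) by zeroing the diagonal
simplex <- arcs
diag(simplex) <- 0

# None of these steps changes the number of nodes
dim(undirected)
dim(unweighted)
dim(simplex)
```

Each result has the same three nodes as the input, which is what distinguishes reformatting from the transformations that change a network's order.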
Transforming network data
These functions are similar to the reformatting functions, and are
also named to_*(), but their operation always changes the
network’s ‘order’ (number of nodes). Good examples of this are
to_mode1() and to_mode2() for transforming a
two-mode network into one of its one-mode projections.
to_mode1() will transform (project) the network to a
one-mode network of shared ties among its first set of nodes, while
to_mode2() will project the original network to a network
of shared ties among its second set of nodes. For more information on
projection, see for example Knoke et al. (2021). Let’s try this out on a
classic two-mode network, ison_southern_women. Assign and
name the transformed networks something sensible using e.g.
women <- to_mode... so that we can continue working with
this data afterwards. To assign and immediately print the result, wrap
the line in parentheses.
ison_southern_women
(s_women <- to_mode1(ison_southern_women))
(s_events <- to_mode2(ison_southern_women))
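For intuition about what projection does, the following base R sketch projects a small two-mode incidence matrix by matrix multiplication, counting shared ties. The four women and three events are made up, and this is only the standard count-based projection, not necessarily the exact code path that to_mode1() and to_mode2() take.

```r
# Rows are the first mode (women), columns the second mode (events)
inc <- matrix(c(1, 1, 0,
                1, 0, 1,
                0, 1, 1,
                1, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("W1", "W2", "W3", "W4"),
                              c("E1", "E2", "E3")))

# Project onto the first mode: number of events each pair of women share
women <- inc %*% t(inc)
diag(women) <- 0    # drop self-ties (a woman's own attendance count)

# Project onto the second mode: number of women each pair of events share
events <- t(inc) %*% inc
diag(events) <- 0

women    # 4 x 4, weighted
events   # 3 x 3, weighted
```

Note that the projections have four and three nodes respectively, fewer than the seven nodes of the two-mode network, and that their ties carry counts rather than simple presence or absence.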
Grabbing key details from network data
We can ask other questions of this data too. {manynet}
(and {migraph}) use a simple function naming convention so
that you always know to what it relates.
- network_*() functions usually return one value for the network or graph, whether that be a string like Evelyn, logical value like TRUE, or some number like 3 or -0.003 5
- node_*() functions always return a vector of values for the network as long as the number of nodes or vertices in the network (of any mode)
- tie_*() functions always return a vector of values for the network as long as the number of ties or edges in the network (of any sign or type)
To find out how many nodes are in the network, use
network_nodes(). To find out how many nodes are in each
mode, use network_dims(). To find out the names of those
nodes, use node_names(). Use such functions to find
out:
- how many nodes are in the ison_southern_women network
- how many nodes are in each mode
- how many ties are in the network
- what nodal attributes there are in the network
- what the names of the nodes are
network_nodes(ison_southern_women)
network_dims(ison_southern_women)
network_ties(ison_southern_women)
network_node_attributes(ison_southern_women)
node_names(ison_southern_women)
Now use these functions on the two projections you have created to find out a) how many nodes there are in each of these networks, b) what the names of the nodes are, and c) what tie attributes there are in the networks.
network_nodes(s_women)
network_nodes(s_events)
node_names(s_women)
node_names(s_events)
network_tie_attributes(s_women)
network_tie_attributes(s_events)
So we can see that the to_mode*() functions have created
a network of only one of the modes in the network. The ties in these
projected networks, representing shared connections to nodes of the
other mode, are weighted. This shows up when listing the network’s tie
attributes, or could be retrieved using tie_weights(), but
can also be checked with the simple logical check
is_weighted().
There are a bunch of logical checks for many common properties or
features of networks. For example, one can check whether a network
is_twomode(), is_directed(), or
is_labelled(). Remember, all these to_*() and
is_*() functions work on any compatible class; the
to_*() functions will also attempt to return that same
class of object, making it even easier to manipulate networks into shape
for analysis.
Retrieve the tie weights from your women projection. Find the average (mean) of this vector of tie weights, and the average (mean) tie weight overall.
tie_weights(_____)
mean(tie_weights(_____))
mean(as_matrix(_____))
tie_weights(s_women)
mean(tie_weights(s_women))
mean(as_matrix(s_women))
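The two means differ because they average over different things. A small base R sketch, using an invented weighted adjacency matrix rather than the projection itself, shows the distinction: one mean is taken over existing ties only, the other over every cell of the matrix, zeros included.

```r
# A small undirected, weighted network (hypothetical weights)
node_ids <- c("A", "B", "C")
w <- matrix(c(0, 2, 0,
              2, 0, 1,
              0, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(node_ids, node_ids))

# Weights of the ties that exist, each undirected tie counted once
existing <- w[upper.tri(w) & w > 0]
mean_over_ties <- mean(existing)    # (2 + 1) / 2

# Mean over all cells, including absent ties and the diagonal
mean_over_cells <- mean(w)          # (2 + 2 + 1 + 1) / 9

mean_over_ties
mean_over_cells
```

The first figure answers "how strong is a typical tie?", while the second also reflects how sparse the network is.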
Ok, so now we know that projection transforms an (unweighted)
two-mode network into a weighted one-mode network and what these weights
represent. Note though that counting the frequency of shared ties to
nodes in the other mode is just one (albeit the default) option for how
ties in the projection are weighted. Other options included in
{manynet} include the Jaccard index, Rand simple matching
coefficient, Pearson coefficient, and Yule’s Q. These may be of interest
if, for example, overlap should be weighted by participation.
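As an illustration of how an alternative weighting can change the picture, here is a base R sketch that compares raw counts with the Jaccard index on a made-up incidence matrix. The Jaccard index is computed from its usual definition (shared events divided by events attended by either node); how {manynet} exposes this option is not shown here.

```r
# Rows are women, columns are events (hypothetical attendance data)
inc <- matrix(c(1, 1, 0,
                1, 0, 1,
                0, 1, 1,
                1, 1, 1),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("W1", "W2", "W3", "W4"),
                              c("E1", "E2", "E3")))

# Count weighting: number of shared events
shared <- inc %*% t(inc)

# Jaccard weighting: shared events / events attended by at least one of the pair
attended <- rowSums(inc)
either <- outer(attended, attended, "+") - shared
jaccard <- shared / either
diag(shared) <- 0
diag(jaccard) <- 0

shared["W1", "W4"]    # raw count of shared events
jaccard["W1", "W4"]   # the same overlap, scaled by joint participation
jaccard["W1", "W2"]
```

W4 attends every event, so her raw counts with others are high; the Jaccard version discounts that by how many events the pair attended between them.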
Other transforming functions include:
- to_giant() identifies and returns only the main component of a network.
- to_no_isolates() identifies and returns a network including only nodes with at least one tie.
- to_subgraph() returns only a subgraph of the network based on e.g. some nodal attribute.
- to_ties() returns a network where the ties in the original network become the nodes, and the ties are shared adjacencies to nodes.
- to_matching() returns a network in which each node is only tied to one of its previously existing ties such that the network’s cardinality is maximised.
5. There are a few exceptions to this in the {manynet} package, or when attributes of the network are listed (see below).
Adding data
If you import one or more edgelists and nodelists, it can be useful to bind these together in an igraph, tidygraph, or network class object.
Adding nodal attributes to a given network is relatively
straightforward. {manynet} offers a more
{igraph}-like syntax,
e.g. add_node_attribute(), as well as a more
{dplyr}-like syntax, e.g. mutate(), for those
already familiar with these tools in R:
ison_adolescents %>%
  mutate(color = "red",
         degree = 1:8) %>%
  mutate_ties(weight = 1:10)
Note that to use {dplyr}-like functions like
mutate(), rename(), filter(),
select(), or join() on network ties, you will
need to append the function name with _ties.