Investigate diffusion through simulation
In this tutorial, we’re going to look at several diffusion processes
and how they operate across various networks. Let’s start off with
creating and visualising a few networks that we might be interested in
here. Let’s take the ison_networkers
dataset from
{manynet}
, and create or generate ring, lattice, random,
scale-free, and small-world versions with the same number of nodes.
# Let's create a new object, "nw", the removes the names of all vertex names
# Hint: We want to use two functions used for reformatting networks, graphs, and matrices
nw <- ____(____(ison_networkers))
# We also want to remove edge direction, so that any pair of nodes with at least
# one directed edge will be connected by an undirected edge in the new network.
nw <- ____(to_unnamed(ison_networkers))
nw <- to_undirected(to_unnamed(ison_networkers))
Creating and visualising different network structures
Now, using the “nw” network from the last section, let’s create or generate ring, lattice, random, scale-free, and small-world versions with the same number of nodes using functions like:
create_ring()
: Creates a ring or chord graph of the given dimensions that loops around and is of a specified width or thickness.create_lattice()
: Creates a graph of the given dimensions with ties to all neighbouring nodesgenerate_random()
: Generates a random network with a particular probability.generate_scalefree()
: Generates a small-world structure following the lattice rewiring model.generate_smallworld()
: Generates a scale-free structure following the preferential attachment model.
# Let's generate a ring structure, "rg", with a width of 2, using the appropriate
# function above
rg <- ____(____, ____)
rg <- create_ring(nw, width = 2)
# Let's generate a lattice structure, "la", using the appropriate function above
la <- ____(____)
la <- create_lattice(nw)
# Let's generate a random structure, "rd", without attributes
rd <- ____(____, ____)
rd <- generate_random(nw, with_attr = FALSE)
# The last two will look similar. For the smallworld structure we call the object "sw"
# and for scalefree, "sf". We will also set the proportion of possible ties to 0.025.
sf <- ____(nw, ____)
sw <- ____(nw, ____)
sf <- generate_scalefree(nw, 0.025)
sw <- generate_smallworld(nw, 0.025)
# Finally, let's plot the respective graphs:
autographr(____) + ggtitle("Networkers") +
autographr(____) + ggtitle("Ring") +
autographr(____) + ggtitle("Lattice") +
autographr(____) + ggtitle("Random") +
autographr(____) + ggtitle("Scale-Free") +
autographr(____) + ggtitle("Small-World")
# Here is the solution:
rg <- create_ring(nw, width = 2)
la <- create_lattice(nw)
rd <- generate_random(nw, with_attr = FALSE)
sf <- generate_scalefree(nw, 0.025)
sw <- generate_smallworld(nw, 0.025)
autographr(nw) + ggtitle("Networkers") +
autographr(rg) + ggtitle("Ring") +
autographr(la) + ggtitle("Lattice") +
autographr(rd) + ggtitle("Random") +
autographr(sf) + ggtitle("Scale-Free") +
autographr(sw) + ggtitle("Small-World")
Examining diffusion across networks of different structure
Now, let’s start off by examining a pretty straight-forward
structure, that of the ring network. To run a basic diffusion model
across this network, simply pass it to play_diffusion()
and
(save and) plot the result.
# Let's call the ring structure from the previous section, "rg", and create a new object
# "rg1" with a seed of 1. Don't forget to plot it!
rg1 <- play_diffusion(____, ____)
____(____)
rg_d <- play_diffusion(rg, seeds = 1)
plot(rg_d)
# visualise the diffusion within the network
autographs(play_diffusion(rg, seeds = 1), node_color = "Infected", layout = "circle") +
ggplot2::guides(colour = guide_legend(""))
The result object, when printed, lists how many of the nodes in the
network, n
, are ‘infected’ (I
) or not
(S
) at each step t
. The plot visualises this,
with the proportion of S
in blue and I
in red.
The bar plot behind shows how many nodes are newly ‘infected’ at each
time point, or the so-called ‘force of infection’ (\(F = \beta I\)).
We can see that there is a pretty constant diffusion across this network, with 2-3 nodes being newly infected at each time-point. The whole network is infected by the eighth time-point.
Varying seed nodes
Since the ring network we constructed is cyclical, then no matter where the ‘infection’ starts, it should diffuse throughout the whole network. To see whether this is true, try choosing the sixteenth (middle) node and see whether the result is any different.
rg_d2 <- play_diffusion(rg, seeds = 16)
plot(rg_d2)
Now what if we seed the network with more than one infected node? Choosing the first four nodes we can see that the process is jump-started, but doesn’t really conclude that much faster.
# Remember we want to see the first four nodes.
plot(play_diffusion(rg, seeds = ____))
rg_d3 <- play_diffusion(rg, seeds = 1:4)
plot(rg_d3)
# visualise the diffusion within the network
autographs(play_diffusion(rg, seeds = 1:4), node_color = "Infected", layout = "circle") +
ggplot2::guides(colour = guide_legend(""))
But what if we seed the network at three different places? Here we
can use node_is_random()
to randomly select some nodes to
seed. Try it with four randomly-selected nodes and see what you get.
# We will be using the node_is_random() within the seed argument to random select
# 4 nodes
plot(play_diffusion(rg, seeds = ____(rg, ____)))
plot(play_diffusion(rg, seeds = node_is_random(rg, 4)))
Where the innovation/disease is optimally seeded to accelerate or decelerate diffusions is a crucial question in network intervention studies.
Varying networks
Now let’s see whether where the infection is seeded matters when the network has a different structure. Here let’s play and plot two diffusion on the lattice network, one with the first node as seed and again one on the last.
plot(play_diffusion(la, seeds = 1))/
plot(play_diffusion(la, seeds = 16))
la %>%
add_node_attribute("color", c(1, rep(0, 14), 2, rep(0, 16))) %>%
autographr(node_color = "color")
# visualise diffusion in lattice graph
autographd(play_diffusion(la, seeds = 16), layout = "grid", keep_isolates = FALSE)
Let’s try one more network type, this time the scale-free network. Play and plot the results over ten steps for node 10, random, maximum, and minimum nodes as seeds.
Similar to the previous examples, we will be using the following functions within the seed argument:
node_is_random()
: Returns a logical vector indicating a random selection of nodes as TRUE.node_is_max()
: Returns logical of which nodes hold the maximum of some measure.node_is_min()
: Returns logical of which nodes hold the minimum of some measure.
sf %>%
as_tidygraph() %>%
mutate(degree = ifelse(node_is_max(node_degree(sf)), "max",
ifelse(node_is_min(node_degree(sf)), "min", "others"))) %>%
autographr(node_color = "degree") + guides(color = "legend") + labs(color = "degree")
plot(play_diffusion(sf, seeds = 10, steps = 10)) /
plot(play_diffusion(sf, seeds = node_is_random(sf), steps = 10)) /
plot(play_diffusion(sf, seeds = node_is_max(node_degree(sf)), steps = 10)) /
plot(play_diffusion(sf, seeds = node_is_min(node_degree(sf)), steps = 10))
# visualise diffusion in scalefree network
autographs(play_diffusion(sf, seeds = node_is_min(node_degree(sf)), steps = 10),
node_color = "Infected") +
ggplot2::guides(colour = guide_legend(""))
autographd(play_diffusion(sf, seeds = 16, steps = 10))
Varying thresholds
So far, we’ve been using a simple diffusion model where each node needs only to be in contact with one infectious individual to be infected. But what if nodes have higher thresholds or even where they vary?
Let’s first start out with our ring network again. Show that whereas a threshold of one will result in complete infection, a threshold of two will not lead to any diffusion process unless there are two seeds and that they are in another nodes neighbourhood.
plot(play_diffusion(rg, seeds = 1, thresholds = 1))/
plot(play_diffusion(rg, seeds = 1, thresholds = 2))/
plot(play_diffusion(rg, seeds = 1:2, thresholds = 2))/
plot(play_diffusion(rg, seeds = c(1,16), thresholds = 2))
In our ring network, all nodes have the same degree. But many typical social networks include some variation in degree. A threshold of 2 would be easy to surpass for particularly well connected nodes, but impossible for pendants. Let’s see what happens when we use this threshold on a scale-free network.
plot(play_diffusion(sf, seeds = 1, thresholds = 2))
That’s because there’s variation in degree in a scale-free network. Let’s try again, but this time we’re going to specify the threshold as a proportion of contacts that should be infected before the node will become infected. Try thresholds of 0.1, 0.25, and 0.5 on two seeds and 10 steps.
plot(play_diffusion(sf, seeds = 1:2, thresholds = ____, steps = ____))/
plot(play_diffusion(sf, seeds = 1:2, thresholds = ____, steps = ____))/
plot(play_diffusion(sf, seeds = 1:2, thresholds = ____, steps = ____))
plot(play_diffusion(sf, seeds = 1:2, thresholds = 0.1, steps = 10))/
plot(play_diffusion(sf, seeds = 1:2, thresholds = 0.25, steps = 10))/
plot(play_diffusion(sf, seeds = 1:2, thresholds = 0.5, steps = 10))
What’s happening here is that the high degree nodes in this scale-free network are obstructing the diffusion process because it is unlikely that many of their branches are already infected.
Lastly, note that it may be that thresholds vary across the network. You could make this depend on some nodal attribute, or just assign some random variation. Try two diffusion models, one where the threshold is 0.1 for the first 10 and 0.25 for the latter group of 22 nodes, and another diffusion where the threshold levels are reversed.
plot(play_diffusion(sf, thresholds = c(rep(____,____), rep(____,____))))/
plot(play_diffusion(sf, thresholds = c(rep(____,____), rep(____,____))))
plot(play_diffusion(sf, thresholds = c(rep(0.1,10), rep(0.25,22))))/
plot(play_diffusion(sf, thresholds = c(rep(0.25,10), rep(0.1,22))))
# visualise the diffusion within the network
autographd(play_diffusion(sf, thresholds = c(rep(0.1,10), rep(0.25,22))))
Since the first ten nodes are the first to join the scale-free network and are preferentially attached by those who follow, they will have a higher degree and only with a lower threshold will we see complete infection.
Investigate epidemiological models
So far we’ve been looking at variations on a pretty straight-forward diffusion process where nodes can only belong to one of two states or ‘compartments’, Susceptible and Infected (the basic SI model). This has been useful, but sometimes what we are interested in, whether disease, innovation, or some other behaviour, has more complicated and probabilistic dynamics. But before we get into that, let’s see how we can play and plot several simulations to see what the range of outcomes might be like.
Running multiple simulations
To do this, we need to use play_diffusions()
(note the
plural). It has all the same arguments as its singular counterpart,
along with a couple of additional parameters to indicate how many
simulations it should run, e.g. times = 50
, whether it
should use strategy = "multisession"
to run the simulations
across multiple cores instead of the default
strategy = "sequential"
, and verbose = TRUE
if
it should inform you of computational progress. Try this out with our
well-mixed random network, 10 steps, 5 times, and with a
transmissibility
parameter set to 0.5 to indicate that in
only 1/2 cases is contagion successful.
# Remember, we are looking at the random network from before, "rd", with
# a transmissibility parameter of 0.5, 5 times, and 10 steps.
plot(play_diffusions(____, transmissibility = ____, times = ____, steps = ____))
plot(play_diffusions(rd, transmissibility = 0.5, times = 5, steps = 10))
Note that in this plot the number of new infections is not plotted because this might vary a bit each time the simulation is run. Instead, the loess line smooths over the varying trajectories and a (hardly distinguishable for this call) grey border to the line represents the standard error. The blue line is the proportion of nodes in the Susceptible compartment, and the red line is the proportion of nodes in the Infected compartment.
SIR models
Let’s start off with an SIR model in which, after
some period in which an infected node is themselves infectious, they
recover and can no longer infect or become reinfected. To add a
recovered component to the model, specify the recovery
argument. Let’s try a rate of recovery of 0.20, which means that it’ll
take an infected node on average 5 steps (days?) to recover.
# Remember, we are still looking at the random network, "rd", with a
# recovery rate of 20 percent.
plot(play_diffusions(____, recovery = ____))
plot(play_diffusions(rd, recovery = 0.2))
What we see in these kinds of models is typically a spike in
infections towards the start, but as these early infections recover and
become immune, then they can provide some herd immunity to those who
remain susceptible. If you get moderately different results each time,
try increasing the number of times
the simulation is run,
which should average out these differences and make the results more
reliable.
plot(play_diffusions(rd, recovery = 0.2, times = 100))
SIRS models
That’s great, but maybe the immunity conferred from having recovered
from the contagion doesn’t last forever. In this kind of model, add an
additional waning
parameter of 0.05. This means that after
twenty steps (on average), a recovered node may lose its recovered
status and become susceptible again. Play a single diffusion so that you
can see what’s going on in a particular run.
plot(play_diffusion(rd, recovery = 0.2, waning = 0.05))
# visualise diffusion within random graph
autographd(play_diffusion(rd, seeds = c(1,10), recovery = 0.2, waning = 0.05))
Depending on your particular simulation, there might be some variation, so let’s run this same diffusion but multiple (100?) times.
plot(play_diffusions(rd, recovery = 0.2, waning = 0.05, times = 100))
SEIR models
Lastly, we’ll consider a compartment for nodes that have been Exposed
but are not yet infectious. This kind of an incubation period is due to
some latency
(\(\sigma\)).
This should also be specified as a proportion, but note that this is
inverted internally. This means that a latency of 0 means that exposure
immediately renders the node infectious. A latency of 0.75 means that it
will take the node approximately 4 days (1/1-0.75 = 1/0.25 = 4) to
become infectious. Play a single diffusion so that you can see what’s
going on in a particular run.
plot(play_diffusion(rd, seeds = 10, latency = 0.25, recovery = 0.2))
# visualise diffusion with latency and recovery
autographd(play_diffusion(rd, seeds = 10, latency = 0.25, recovery = 0.2))
R-nought and herd immunity
So how can we establish the \(R_0\) here?
\(R_0\), or the diffusion’s
reproductive number, is calculated as: \(R_0 =
\min\left(\frac{T}{1/IL}, \bar{k}\right)\) It is measured as the
network’s transmissibility (proportion of susceptible nodes that are
infected at each time period) over the network’s average infection
length (average length of time nodes remain infected).
- Where \(R_0\) > 1, the ‘disease’
will ‘infect’ more and more nodes in the network.
- Where \(R_0\) < 1, the ‘disease’
will not sustain itself and eventually die out.
- Where \(R_0\) = 1, the ‘disease’ will
continue as endemic, if conditions allow.
set.seed(123)
rd_diff <- play_diffusion(rd, seeds = 10, recovery = 0.1)
plot(rd_diff)
# R-nought
network_reproduction(rd_diff)
# Herd Immunity Threshold
network_immunity(rd_diff)
ceiling(network_immunity(rd_diff) * manynet::network_nodes(rd))
The Herd Immunity Threshold or HIT indicates the threshold at which the reduction of susceptible members of the network means that infections will no longer keep increasing, allowing herd immunity to be achieved. In this model, the HIT score of 0.94 indicates that 94% of nodes in the network would need to be vaccinated or otherwise protected to achieve herd immunity. Multiplying the HIT score with the number of nodes in the network (32 in this example) gives the number of nodes that would need to be vaccinated: 31 nodes.
Investigate learning through simulation
Lastly, we’re going to consider a different kind of model: a DeGroot learning model. As you will recall, a network that is strongly connected and aperiodic will converge to a consensus of (any) beliefs entered.
Expectations of convergence and consensus
Let’s try this out on the ison_networkers
dataset
included in the package. First of all, check whether the network is
connected and aperiodic via the following
functions:
is_connected()
: Tests whether network is weakly connected if the network is undirected or strongly connected if directed.is_aperiodic()
: Tests whether network is aperiodic, meaning there is no integer k > 1 that divides the length of every cycle of the graph.
# By default is_connected() will check whether a directed network
# is strongly connected.
is_connected(ison_networkers)
is_aperiodic(ison_networkers)
Playing the DeGroot learning model
Now let’s see whether you are right. We want to see whether some
random distribution of beliefs converges to a consensus in this network
(ison_networkers
). Let’s play the DeGroot learning game on
this network with a vector of random belief probabilities (the same
length as the nodes in the network) drawn from the binomial distribution
with probability 0.25. Create the distribution of
beliefs and graph the network to show where they have
been distributed. Then play the learning model with these beliefs, and
plot the result.
beliefs <- rbinom(network_nodes(____), 1, prob = 0.25)
____ %>% mutate(____ = beliefs) %>% autographr(node_color = "____")
netlearn <- play_learning(____, ____)
plot(____)
beliefs <- rbinom(network_nodes(ison_networkers), 1, prob = 0.25)
ison_networkers %>% mutate(beliefs = beliefs) %>% autographr(node_color = "beliefs")
netlearn <- play_learning(ison_networkers, beliefs)
plot(netlearn)
Each line in this plot represents the belief trajectory of a single node at each step. About a quarter of the nodes begin believing, and the other three quarters do not. Then we can see how responsive these nodes are to the random distribution of beliefs across the network. Some revise their beliefs more significantly than others.