Skip to Tutorial Content

Investigate diffusion through simulation

In this tutorial, we’re going to look at several diffusion processes and how they operate across various networks. Let’s start off with creating and visualising a few networks that we might be interested in here. Let’s take the ison_networkers dataset from {manynet}, and create or generate ring, lattice, random, scale-free, and small-world versions with the same number of nodes.

# Let's create a new object, "nw", the removes the names of all vertex names
# Hint: We want to use two functions used for reformatting networks, graphs, and matrices

nw <- ____(____(ison_networkers))
# We also want to remove edge direction, so that any pair of nodes with at least
# one directed edge will be connected by an undirected edge in the new network.

nw <- ____(to_unnamed(ison_networkers))
nw <- to_undirected(to_unnamed(ison_networkers))

Creating and visualising different network structures

Now, using the “nw” network from the last section, let’s create or generate ring, lattice, random, scale-free, and small-world versions with the same number of nodes using functions like:

  • create_ring(): Creates a ring or chord graph of the given dimensions that loops around and is of a specified width or thickness.
  • create_lattice(): Creates a graph of the given dimensions with ties to all neighbouring nodes
  • generate_random(): Generates a random network with a particular probability.
  • generate_scalefree(): Generates a small-world structure following the lattice rewiring model.
  • generate_smallworld(): Generates a scale-free structure following the preferential attachment model.
# Let's generate a ring structure, "rg", with a width of 2, using the appropriate
# function above

rg <- ____(____, ____)
rg <- create_ring(nw, width = 2)
# Let's generate a lattice structure, "la", using the appropriate function above

la <- ____(____)
la <- create_lattice(nw)
# Let's generate a random structure, "rd", without attributes

rd <- ____(____, ____)
rd <- generate_random(nw, with_attr = FALSE)
# The last two will look similar. For the smallworld structure we call the object "sw" 
# and for scalefree, "sf". We will also set the proportion of possible ties to 0.025.

sf <- ____(nw, ____)
sw <- ____(nw, ____)
sf <- generate_scalefree(nw, 0.025)
sw <- generate_smallworld(nw, 0.025)
# Finally, let's plot the respective graphs:

autographr(____) + ggtitle("Networkers") +
autographr(____) + ggtitle("Ring") +
autographr(____) + ggtitle("Lattice") +
autographr(____) + ggtitle("Random") +
autographr(____) + ggtitle("Scale-Free") +
autographr(____) + ggtitle("Small-World")
# Here is the solution:

rg <- create_ring(nw, width = 2)
la <- create_lattice(nw)
rd <- generate_random(nw, with_attr = FALSE)
sf <- generate_scalefree(nw, 0.025)
sw <- generate_smallworld(nw, 0.025)
autographr(nw) + ggtitle("Networkers") +
autographr(rg) + ggtitle("Ring") +
autographr(la) + ggtitle("Lattice") +
autographr(rd) + ggtitle("Random") +
autographr(sf) + ggtitle("Scale-Free") +
autographr(sw) + ggtitle("Small-World")

Examining diffusion across networks of different structure

Now, let’s start off by examining a pretty straight-forward structure, that of the ring network. To run a basic diffusion model across this network, simply pass it to play_diffusion() and (save and) plot the result.

# Let's call the ring structure from the previous section, "rg", and create a new object
# "rg1" with a seed of 1. Don't forget to plot it!

rg1 <- play_diffusion(____, ____)
____(____)
rg_d <- play_diffusion(rg, seeds = 1)
plot(rg_d)

# visualise the diffusion within the network
autographs(play_diffusion(rg, seeds = 1), node_color = "Infected", layout = "circle") +
  ggplot2::guides(colour = guide_legend(""))

The result object, when printed, lists how many of the nodes in the network, n, are ‘infected’ (I) or not (S) at each step t. The plot visualises this, with the proportion of S in blue and I in red. The bar plot behind shows how many nodes are newly ‘infected’ at each time point, or the so-called ‘force of infection’ (\(F = \beta I\)).

We can see that there is a pretty constant diffusion across this network, with 2-3 nodes being newly infected at each time-point. The whole network is infected by the eighth time-point.

Varying seed nodes

Since the ring network we constructed is cyclical, then no matter where the ‘infection’ starts, it should diffuse throughout the whole network. To see whether this is true, try choosing the sixteenth (middle) node and see whether the result is any different.

rg_d2 <- play_diffusion(rg, seeds = 16)
plot(rg_d2)

Now what if we seed the network with more than one infected node? Choosing the first four nodes we can see that the process is jump-started, but doesn’t really conclude that much faster.

# Remember we want to see the first four nodes.

plot(play_diffusion(rg, seeds = ____))
rg_d3 <- play_diffusion(rg, seeds = 1:4)
plot(rg_d3)

# visualise the diffusion within the network
autographs(play_diffusion(rg, seeds = 1:4), node_color = "Infected", layout = "circle") +
  ggplot2::guides(colour = guide_legend(""))

But what if we seed the network at three different places? Here we can use node_is_random() to randomly select some nodes to seed. Try it with four randomly-selected nodes and see what you get.

# We will be using the node_is_random() within the seed argument to random select 
# 4 nodes

plot(play_diffusion(rg, seeds = ____(rg, ____)))
plot(play_diffusion(rg, seeds = node_is_random(rg, 4)))

Where the innovation/disease is optimally seeded to accelerate or decelerate diffusions is a crucial question in network intervention studies.

Varying networks

Now let’s see whether where the infection is seeded matters when the network has a different structure. Here let’s play and plot two diffusion on the lattice network, one with the first node as seed and again one on the last.

plot(play_diffusion(la, seeds = 1))/
plot(play_diffusion(la, seeds = 16))
la %>%
  add_node_attribute("color", c(1, rep(0, 14), 2, rep(0, 16))) %>%
  autographr(node_color = "color")

# visualise diffusion in lattice graph
autographd(play_diffusion(la, seeds = 16), layout = "grid", keep_isolates = FALSE)

Let’s try one more network type, this time the scale-free network. Play and plot the results over ten steps for node 10, random, maximum, and minimum nodes as seeds.

Similar to the previous examples, we will be using the following functions within the seed argument:

  • node_is_random(): Returns a logical vector indicating a random selection of nodes as TRUE.
  • node_is_max(): Returns logical of which nodes hold the maximum of some measure.
  • node_is_min(): Returns logical of which nodes hold the minimum of some measure.
sf %>%
  as_tidygraph() %>%
  mutate(degree = ifelse(node_is_max(node_degree(sf)), "max",
                      ifelse(node_is_min(node_degree(sf)), "min", "others"))) %>%
  autographr(node_color = "degree") + guides(color = "legend") + labs(color = "degree")
plot(play_diffusion(sf, seeds = 10, steps = 10)) / 
plot(play_diffusion(sf, seeds = node_is_random(sf), steps = 10)) /
plot(play_diffusion(sf, seeds = node_is_max(node_degree(sf)), steps = 10)) /
plot(play_diffusion(sf, seeds = node_is_min(node_degree(sf)), steps = 10))

# visualise diffusion in scalefree network
autographs(play_diffusion(sf, seeds = node_is_min(node_degree(sf)), steps = 10),
           node_color = "Infected") +
  ggplot2::guides(colour = guide_legend(""))
autographd(play_diffusion(sf, seeds = 16, steps = 10))

Varying thresholds

So far, we’ve been using a simple diffusion model where each node needs only to be in contact with one infectious individual to be infected. But what if nodes have higher thresholds or even where they vary?

Let’s first start out with our ring network again. Show that whereas a threshold of one will result in complete infection, a threshold of two will not lead to any diffusion process unless there are two seeds and that they are in another nodes neighbourhood.

plot(play_diffusion(rg, seeds = 1, thresholds = 1))/
plot(play_diffusion(rg, seeds = 1, thresholds = 2))/
plot(play_diffusion(rg, seeds = 1:2, thresholds = 2))/
plot(play_diffusion(rg, seeds = c(1,16), thresholds = 2))

In our ring network, all nodes have the same degree. But many typical social networks include some variation in degree. A threshold of 2 would be easy to surpass for particularly well connected nodes, but impossible for pendants. Let’s see what happens when we use this threshold on a scale-free network.

plot(play_diffusion(sf, seeds = 1, thresholds = 2))

That’s because there’s variation in degree in a scale-free network. Let’s try again, but this time we’re going to specify the threshold as a proportion of contacts that should be infected before the node will become infected. Try thresholds of 0.1, 0.25, and 0.5 on two seeds and 10 steps.

plot(play_diffusion(sf, seeds = 1:2, thresholds = ____, steps = ____))/
plot(play_diffusion(sf, seeds = 1:2, thresholds = ____, steps = ____))/
plot(play_diffusion(sf, seeds = 1:2, thresholds = ____, steps = ____))
plot(play_diffusion(sf, seeds = 1:2, thresholds = 0.1, steps = 10))/
plot(play_diffusion(sf, seeds = 1:2, thresholds = 0.25, steps = 10))/
plot(play_diffusion(sf, seeds = 1:2, thresholds = 0.5, steps = 10))

What’s happening here is that the high degree nodes in this scale-free network are obstructing the diffusion process because it is unlikely that many of their branches are already infected.

Lastly, note that it may be that thresholds vary across the network. You could make this depend on some nodal attribute, or just assign some random variation. Try two diffusion models, one where the threshold is 0.1 for the first 10 and 0.25 for the latter group of 22 nodes, and another diffusion where the threshold levels are reversed.

plot(play_diffusion(sf, thresholds = c(rep(____,____), rep(____,____))))/
plot(play_diffusion(sf, thresholds = c(rep(____,____), rep(____,____))))
plot(play_diffusion(sf, thresholds = c(rep(0.1,10), rep(0.25,22))))/
plot(play_diffusion(sf, thresholds = c(rep(0.25,10), rep(0.1,22))))

# visualise the diffusion within the network
autographd(play_diffusion(sf, thresholds = c(rep(0.1,10), rep(0.25,22))))

Since the first ten nodes are the first to join the scale-free network and are preferentially attached by those who follow, they will have a higher degree and only with a lower threshold will we see complete infection.

Investigate epidemiological models

So far we’ve been looking at variations on a pretty straight-forward diffusion process where nodes can only belong to one of two states or ‘compartments’, Susceptible and Infected (the basic SI model). This has been useful, but sometimes what we are interested in, whether disease, innovation, or some other behaviour, has more complicated and probabilistic dynamics. But before we get into that, let’s see how we can play and plot several simulations to see what the range of outcomes might be like.

Running multiple simulations

To do this, we need to use play_diffusions() (note the plural). It has all the same arguments as its singular counterpart, along with a couple of additional parameters to indicate how many simulations it should run, e.g. times = 50, whether it should use strategy = "multisession" to run the simulations across multiple cores instead of the default strategy = "sequential", and verbose = TRUE if it should inform you of computational progress. Try this out with our well-mixed random network, 10 steps, 5 times, and with a transmissibility parameter set to 0.5 to indicate that in only 1/2 cases is contagion successful.

# Remember, we are looking at the random network from before, "rd", with 
# a transmissibility parameter of 0.5, 5 times, and 10 steps.

plot(play_diffusions(____, transmissibility = ____, times = ____, steps = ____))
plot(play_diffusions(rd, transmissibility = 0.5, times = 5, steps = 10))

Note that in this plot the number of new infections is not plotted because this might vary a bit each time the simulation is run. Instead, the loess line smooths over the varying trajectories and a (hardly distinguishable for this call) grey border to the line represents the standard error. The blue line is the proportion of nodes in the Susceptible compartment, and the red line is the proportion of nodes in the Infected compartment.

SIR models

Let’s start off with an SIR model in which, after some period in which an infected node is themselves infectious, they recover and can no longer infect or become reinfected. To add a recovered component to the model, specify the recovery argument. Let’s try a rate of recovery of 0.20, which means that it’ll take an infected node on average 5 steps (days?) to recover.

# Remember, we are still looking at the random network, "rd", with a 
# recovery rate of 20 percent.

plot(play_diffusions(____, recovery = ____))
plot(play_diffusions(rd, recovery = 0.2))

What we see in these kinds of models is typically a spike in infections towards the start, but as these early infections recover and become immune, then they can provide some herd immunity to those who remain susceptible. If you get moderately different results each time, try increasing the number of times the simulation is run, which should average out these differences and make the results more reliable.

plot(play_diffusions(rd, recovery = 0.2, times = 100))

SIRS models

That’s great, but maybe the immunity conferred from having recovered from the contagion doesn’t last forever. In this kind of model, add an additional waning parameter of 0.05. This means that after twenty steps (on average), a recovered node may lose its recovered status and become susceptible again. Play a single diffusion so that you can see what’s going on in a particular run.

plot(play_diffusion(rd, recovery = 0.2, waning = 0.05))

# visualise diffusion within random graph
autographd(play_diffusion(rd, seeds = c(1,10), recovery = 0.2, waning = 0.05))

Depending on your particular simulation, there might be some variation, so let’s run this same diffusion but multiple (100?) times.

plot(play_diffusions(rd, recovery = 0.2, waning = 0.05, times = 100))

SEIR models

Lastly, we’ll consider a compartment for nodes that have been Exposed but are not yet infectious. This kind of an incubation period is due to some latency (\(\sigma\)). This should also be specified as a proportion, but note that this is inverted internally. This means that a latency of 0 means that exposure immediately renders the node infectious. A latency of 0.75 means that it will take the node approximately 4 days (1/1-0.75 = 1/0.25 = 4) to become infectious. Play a single diffusion so that you can see what’s going on in a particular run.

plot(play_diffusion(rd, seeds = 10, latency = 0.25, recovery = 0.2))

# visualise diffusion with latency and recovery
autographd(play_diffusion(rd, seeds = 10, latency = 0.25, recovery = 0.2))

R-nought and herd immunity

So how can we establish the \(R_0\) here?

\(R_0\), or the diffusion’s reproductive number, is calculated as: \(R_0 = \min\left(\frac{T}{1/IL}, \bar{k}\right)\) It is measured as the network’s transmissibility (proportion of susceptible nodes that are infected at each time period) over the network’s average infection length (average length of time nodes remain infected).
- Where \(R_0\) > 1, the ‘disease’ will ‘infect’ more and more nodes in the network.
- Where \(R_0\) < 1, the ‘disease’ will not sustain itself and eventually die out.
- Where \(R_0\) = 1, the ‘disease’ will continue as endemic, if conditions allow.

set.seed(123)
rd_diff <- play_diffusion(rd, seeds = 10, recovery = 0.1)
plot(rd_diff)
# R-nought
network_reproduction(rd_diff)
# Herd Immunity Threshold
network_immunity(rd_diff)
ceiling(network_immunity(rd_diff) * manynet::network_nodes(rd))

The Herd Immunity Threshold or HIT indicates the threshold at which the reduction of susceptible members of the network means that infections will no longer keep increasing, allowing herd immunity to be achieved. In this model, the HIT score of 0.94 indicates that 94% of nodes in the network would need to be vaccinated or otherwise protected to achieve herd immunity. Multiplying the HIT score with the number of nodes in the network (32 in this example) gives the number of nodes that would need to be vaccinated: 31 nodes.

Investigate learning through simulation

Lastly, we’re going to consider a different kind of model: a DeGroot learning model. As you will recall, a network that is strongly connected and aperiodic will converge to a consensus of (any) beliefs entered.

Expectations of convergence and consensus

Let’s try this out on the ison_networkers dataset included in the package. First of all, check whether the network is connected and aperiodic via the following functions:

  • is_connected(): Tests whether network is weakly connected if the network is undirected or strongly connected if directed.
  • is_aperiodic(): Tests whether network is aperiodic, meaning there is no integer k > 1 that divides the length of every cycle of the graph.
# By default is_connected() will check whether a directed network
# is strongly connected.
is_connected(ison_networkers)
is_aperiodic(ison_networkers)

Playing the DeGroot learning model

Now let’s see whether you are right. We want to see whether some random distribution of beliefs converges to a consensus in this network (ison_networkers). Let’s play the DeGroot learning game on this network with a vector of random belief probabilities (the same length as the nodes in the network) drawn from the binomial distribution with probability 0.25. Create the distribution of beliefs and graph the network to show where they have been distributed. Then play the learning model with these beliefs, and plot the result.

beliefs <- rbinom(network_nodes(____), 1, prob = 0.25)
____ %>% mutate(____ = beliefs) %>% autographr(node_color = "____")
netlearn <- play_learning(____, ____)
plot(____)
beliefs <- rbinom(network_nodes(ison_networkers), 1, prob = 0.25)
ison_networkers %>% mutate(beliefs = beliefs) %>% autographr(node_color = "beliefs")
netlearn <- play_learning(ison_networkers, beliefs)
plot(netlearn)

Each line in this plot represents the belief trajectory of a single node at each step. About a quarter of the nodes begin believing, and the other three quarters do not. Then we can see how responsive these nodes are to the random distribution of beliefs across the network. Some revise their beliefs more significantly than others.

Diffusion

by James Hollway