This vignette covers how to work with network objects, edgelists, and
partnership histories in EpiModel network models with custom extension
modules. It assumes familiarity with setting up and running network
models with netsim() and with the extension API. See the Network Modeling for
Epidemics (NME) course materials and the EpiModel Gallery
for background.
For working with nodal attributes and epidemic summary statistics, see the companion vignette Working with Custom Attributes and Summary Statistics in EpiModel.
EpiModel supports two storage modes for networks, controlled by the
tergmLite parameter in control.net():
Full mode (tergmLite = FALSE, the
default): Networks are stored as networkDynamic objects,
which preserve the complete history of edge activations and
deactivations. This allows extraction of the full dynamic network after
simulation. However, networkDynamic objects consume
substantial memory.
tergmLite mode (tergmLite = TRUE):
Networks are stored as lightweight networkLite objects
containing only the current edgelist and nodal attributes. This provides
a 20–50x performance improvement and much lower memory usage, making it
essential for large-scale research models. The trade-off is that
get_network() returns a networkLite (a
snapshot) rather than a full dynamic network history.
Most extension modules work identically under both modes because they access networks through EpiModel’s accessor functions rather than manipulating network objects directly.
Inside a custom module, use the get_network() and
set_network() accessors to work with network objects:
# Get the network for layer 1
nw <- get_network(dat, network = 1)
# After modifying a network, set it back
dat <- set_network(dat, network = 1, nw = nw)
These accessors handle the tergmLite vs. full-mode distinction
internally, so your module code works under both storage modes. In full
mode, get_network() returns a networkDynamic
object; in tergmLite mode, it returns a networkLite.
In practice, you rarely need to access network objects directly. Instead, use the edgelist accessor functions described below, which also work correctly under both storage modes.
After a netsim call, extract network objects with
get_network():
sim <- netsim(est, param, init, control)
# Extract the network from simulation 1, network layer 1
nw <- get_network(sim, sim = 1, network = 1)
# Collapse to a static cross-section at time step 50 (full mode only)
nw_at_50 <- get_network(sim, sim = 1, collapse = TRUE, at = 50)
In full mode, get_network() returns a
networkDynamic object. In tergmLite mode, it returns a
networkLite object representing the final state. The
collapse and at arguments are only available
in full mode.
Note: Network objects are only saved in the output
when save.network is TRUE in
control.net() (the default for full mode). In tergmLite
mode, there is no network history to save.
The transmission matrix records every transmission event during the simulation. Extract it from a netsim object with get_transmat(), which
returns a data.frame with columns including
at (time step), sus (ID of the newly infected
node), inf (ID of the infecting node), infDur
(duration of infector’s infection), transProb,
actRate, and finalProb. Transmission matrices
are saved by default (save.transmat = TRUE in
control.net()).
Current edgelists are the set of active partnerships at the present time step. These are the most commonly used network data structures inside extension modules.
get_edgelist() returns the current edgelist for a given
network as a two-column matrix of positional IDs, for example get_edgelist(dat, network = 1).
Each row is an active partnership. Column 1 is the positional ID of the “head” node; column 2 is the “tail” node. This function works identically under both storage modes.
get_edgelists_df() combines edgelists from multiple
network layers into a single data.frame with a
network column:
# All networks
el_all <- get_edgelists_df(dat, networks = NULL)
# Specific networks
el_12 <- get_edgelists_df(dat, networks = c(1, 2))
The returned data.frame has columns head,
tail, and network.
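To make the shape of that combined table concrete, here is a minimal base-R sketch that stacks per-network edgelist matrices into one data.frame. The helper stack_edgelists() and the toy matrices are hypothetical and only mimic the described output; this is not EpiModel's implementation.

```r
# Toy per-network edgelists (hypothetical values): two-column matrices of
# positional IDs, one matrix per network layer.
el_list <- list(
  cbind(c(1, 2), c(3, 4)),  # network 1: edges 1-3 and 2-4
  cbind(5, 6)               # network 2: edge 5-6
)

# Hypothetical helper: stack the selected layers and label each row with
# the index of the network it came from. networks = NULL means all layers.
stack_edgelists <- function(el_list, networks = NULL) {
  if (is.null(networks)) networks <- seq_along(el_list)
  parts <- lapply(networks, function(n) {
    el <- el_list[[n]]
    data.frame(head = el[, 1], tail = el[, 2], network = rep(n, nrow(el)))
  })
  do.call(rbind, parts)
}

el_all <- stack_edgelists(el_list)               # all layers
el_2   <- stack_edgelists(el_list, networks = 2) # one layer only
```

Keeping the network index as a column lets downstream code filter or group by layer without carrying a list of matrices around.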
The discordant edgelist identifies partnerships where partners have different values of a status attribute—the key data structure for modeling transmission. For example, in an SI model, discordant edges are those where one partner is susceptible and the other is infected:
disc_el <- get_discordant_edgelist(
dat,
status.attr = "status",
head.status = "i",
tail.status = "s"
)
The returned data.frame has columns head,
tail, head_status, tail_status,
and network. Both orderings are captured: if node A
(infected) is partnered with node B (susceptible), the edge appears
regardless of which is the “head” vs “tail” in the underlying
network.
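The "both orderings" behavior can be illustrated with a small base-R sketch on toy data. The helper discordant_pairs() is hypothetical, and reorienting each row so that head always carries head.status is an assumption of this sketch rather than a statement about how get_discordant_edgelist() is implemented.

```r
# Toy inputs (hypothetical): five nodes with an SI status, four edges.
status <- c("i", "s", "s", "i", "s")
el <- cbind(head = c(1, 2, 3, 5), tail = c(2, 4, 5, 4))

# Hypothetical helper: find discordant edges in either stored orientation
# and return them with the head.status node in the `head` column.
discordant_pairs <- function(el, status, head.status = "i", tail.status = "s") {
  hd <- el[, "head"]
  tl <- el[, "tail"]
  # Edges already stored in the requested orientation
  fwd <- status[hd] == head.status & status[tl] == tail.status
  # Edges stored the other way round; swap the columns for these
  bwd <- status[hd] == tail.status & status[tl] == head.status
  n <- sum(fwd) + sum(bwd)
  data.frame(
    head = c(hd[fwd], tl[bwd]),
    tail = c(tl[fwd], hd[bwd]),
    head_status = rep(head.status, n),
    tail_status = rep(tail.status, n)
  )
}

disc <- discordant_pairs(el, status)
```

In this toy network the edge between nodes 3 and 5 is dropped because both are susceptible, while the edges stored as (2, 4) and (5, 4) are picked up even though the infected node sits in the second column.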
See also discord_edgelist() for the original, simpler
version of this function used in built-in models.
EpiModel uses two ways to reference nodes:
By position: Think of it like a row number in a
spreadsheet. dat$attr$active[3] accesses the third node’s
value directly. This is the standard way to look up node information and
is very fast. In a model with 100 nodes, positions range from 1 to 100.
When nodes depart, they may be dropped from the vectors, freeing their
position for new arrivals.
By unique_id: A globally unique
integer attribute assigned to each node at creation and never reused.
Slower to look up, but allows referencing nodes that have already
departed. Used by cumulative edgelists and attribute histories.
Conversion between the two systems is handled internally by EpiModel.
The get_unique_ids() and get_posit_ids()
functions perform the conversion. See
help("unique_id-tools", package = "EpiModel") for
details.
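The difference between the two reference systems can be sketched in base R with a toy population. The vectors and the two helper functions below are hypothetical stand-ins; inside a module, get_unique_ids() and get_posit_ids() do this work on dat, and returning NA for a departed node is a choice of this sketch only.

```r
# Toy state (hypothetical): five nodes were created with unique IDs 1..5.
# Nodes 2 and 4 have departed and been dropped, so three nodes remain.
unique_id <- c(1L, 3L, 5L)  # unique ID of the node at each position
age <- c(34, 27, 41)        # a nodal attribute, indexed by position

# Position -> unique ID: plain vector indexing (fast).
posit_to_unique <- function(posit_ids) unique_id[posit_ids]

# Unique ID -> position: a lookup; departed nodes have no position (NA).
unique_to_posit <- function(unique_ids) match(unique_ids, unique_id)

uids_of_1_and_3 <- posit_to_unique(c(1, 3))
posits_of_5_and_2 <- unique_to_posit(c(5L, 2L))
age_of_uid_3 <- age[unique_to_posit(3L)]
```

Here unique ID 5 now lives at position 3, and unique ID 2 no longer maps to any position, which is why histories that outlive a node are keyed by unique ID.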
The cumulative edgelist is a historical record of all edges in a network, including the time steps when each edge started and stopped. This allows querying both current and past partnerships—essential for contact tracing, partnership duration analysis, and reachability analysis.
The cumulative edgelist follows a four-step lifecycle. The same data structure is produced once and read at different stages:
1. Configure in control.net(): set cumulative.edgelist = TRUE (in-memory tracking during the run) and, if you want to use the result after netsim() returns, also set save.cumulative.edgelist = TRUE.
2. Update during the run: the automatic update_cumulative_edgelist() calls use control$truncate.el.cuml to decide how much dissolved-edge history to retain. Custom modules that mutate the network outside the TERGM machinery should also call update_cumulative_edgelist() after the change.
3. Query inside modules with get_cumulative_edgelist() / get_cumulative_edgelists_df(), or with one of the derived helpers below (get_partners(), get_cumulative_degree()).
4. Analyze after the run by reading sim$cumulative.edgelist[[s]] directly on the returned netsim object.
The control settings, accessors, and helpers all share the same
column convention: head/tail (or
index/partner) by unique ID,
start/stop time steps (inclusive, with
NA stop for active edges), and
network index for multi-layer models.
Cumulative edgelist tracking must be explicitly enabled in
control.net():
control <- control.net(
type = "SI",
nsims = 1,
nsteps = 100,
cumulative.edgelist = TRUE, # Enable in-memory tracking during the run
truncate.el.cuml = 0, # Drop dissolved edges immediately (default)
save.cumulative.edgelist = TRUE, # Attach the result to the returned netsim
verbose = FALSE
)
Without cumulative.edgelist = TRUE, calls to
update_cumulative_edgelist() silently do nothing, and calls
to get_cumulative_edgelist() raise an error.
The truncate.el.cuml control sets the default truncation
passed to every automatic update_cumulative_edgelist()
call. The default (0) keeps only currently active edges,
which is enough for tracking active-edge start times while keeping
memory low. Use Inf to keep full history, or a positive
integer to retain dissolved edges for that many steps after they
ended.
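The effect of the three kinds of truncation value can be sketched on a toy cumulative edgelist in base R. The helper truncate_el() is hypothetical, and the exact boundary (keeping an edge when at minus stop is at most the truncation value) is an assumption of this sketch rather than a description of EpiModel's internals.

```r
# Toy cumulative edgelist (hypothetical values); stop = NA marks an
# edge that is still active.
el_cuml <- data.frame(
  head  = c(1, 2, 3, 4),
  tail  = c(5, 6, 7, 8),
  start = c(1, 2, 4, 9),
  stop  = c(3, 8, NA, 9)
)

# Hypothetical helper: keep active edges plus edges that ended at most
# `truncate` steps before the current step `at`.
truncate_el <- function(el, at, truncate) {
  keep <- is.na(el$stop) | (at - el$stop) <= truncate
  el[keep, , drop = FALSE]
}

at <- 10
n_full   <- nrow(truncate_el(el_cuml, at, Inf))  # 4: full history
n_active <- nrow(truncate_el(el_cuml, at, 0))    # 1: active edges only
n_recent <- nrow(truncate_el(el_cuml, at, 5))    # 3: active + ended <= 5 steps ago
```

With a finite positive value, the table size is bounded by the number of active edges plus the edges dissolved in the retained window, which is the memory trade-off described above.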
The cumulative edgelist must be updated at each time step after network resimulation. In a custom module or at the end of initialization, call update_cumulative_edgelist() on the network that changed.
The truncate argument controls memory usage:
truncate = Inf: Keep the full history of all edges (no removal).
truncate = 0 (the default): Keep only currently active edges. Use this if you only need to track active edge start times.
truncate = N: Remove edges that ended more than N time steps ago. This balances historical depth with memory use.
To update all networks in a multi-layer model, call update_cumulative_edgelist() once for each network index.
Inside a custom module, where dat is the live netsim_dat object, get_cumulative_edgelist(dat, network = 1) returns the cumulative edgelist for one network.
The returned tibble has four columns:
head: the unique_id of the first node.
tail: the unique_id of the second node.
start: the time step when the edge formed.
stop: the last time step the edge was active, or NA if the edge is still active.
An edge with start = 5 and stop = 12
existed from steps 5 through 12, inclusive. When no edges have been
recorded yet, the function still returns a tibble with
these four columns and zero rows. If
cumulative.edgelist = FALSE, the function raises an error
rather than silently returning an empty result.
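Because start and stop are both inclusive, duration and "active at step t" queries need a little care. The following base-R sketch works on a toy table with the four columns described above; the values, the variable now, and the helper active_at() are hypothetical.

```r
# Toy cumulative edgelist (hypothetical unique IDs and time steps).
el_cuml <- data.frame(
  head  = c(101, 102, 103),
  tail  = c(201, 202, 203),
  start = c(5, 8, 10),
  stop  = c(12, NA, 10)
)
now <- 15  # hypothetical current time step

# Inclusive bounds: an edge with start = 5 and stop = 12 lasted 8 steps.
# Active edges (stop = NA) are measured up to the current step.
last_step <- ifelse(is.na(el_cuml$stop), now, el_cuml$stop)
el_cuml$duration <- last_step - el_cuml$start + 1

# Hypothetical helper: edges that were active at step t.
active_at <- function(el, t) {
  el[el$start <= t & (is.na(el$stop) | el$stop >= t), , drop = FALSE]
}

n_active_10 <- nrow(active_at(el_cuml, 10))
n_active_13 <- nrow(active_at(el_cuml, 13))
```

Note the one-step edge (start = 10, stop = 10): it has duration 1 and is counted as active at step 10 only.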
In get_cumulative_edgelists_df(), the networks argument accepts a vector of network
indices or NULL (all networks). The returned
data.frame adds a network column identifying
which network layer each edge belongs to.
When save.cumulative.edgelist = TRUE, the cumulative
edgelist is attached to the netsim return object as
sim$cumulative.edgelist, a list with one element per
simulation (for example, sim$cumulative.edgelist[[1]] for the first simulation).
Each element is the data.frame produced by
get_cumulative_edgelists_df() (head,
tail, start, stop,
network). Do not call the accessor functions on a processed
netsim object: the live dat$run state they
depend on is dropped during process_out.net(). Note also
that truncate_sim() left-truncates the epidemiological time
series but does not currently trim sim$cumulative.edgelist;
filter it manually by start / stop if you need
a matching time window.
get_partners() extracts the partners of specified nodes
from the cumulative edgelist:
partner_list <- get_partners(
dat,
index_posit_ids,
networks = NULL,
truncate = Inf,
only.active.nodes = FALSE
)
Arguments:
dat: the main list object.
index_posit_ids: a vector of positional IDs for the nodes of interest (the “indexes”).
networks: which network layers to search (NULL for all).
truncate: only include edges that ended within this many steps (filters by edge age).
only.active.nodes: if TRUE, exclude partnerships with inactive (departed) nodes.
The output is similar to get_cumulative_edgelists_df()
but with columns index and partner (both
containing unique IDs) instead of head and
tail. Note that indexes are specified by positional ID but
the output uses unique IDs, since partners may include nodes that have
already departed.
get_cumulative_degree() counts the number of distinct
partners each node has had over the tracked history:
cum_degree <- get_cumulative_degree(
dat,
index_posit_ids = 1:50,
networks = NULL,
truncate = Inf,
only.active.nodes = FALSE
)
This returns a data.frame with columns
index_pid (positional ID) and degree
(cumulative partner count). It wraps get_partners() and
counts unique partners per index.
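The "count unique partners per index" step can be sketched in base R on a toy partner table shaped like the get_partners() output. The data are hypothetical, the index column here holds unique IDs rather than the positional index_pid of the real return value, and indexes with no partners simply do not appear in this sketch.

```r
# Toy partner table (hypothetical values), one row per partnership spell.
partners <- data.frame(
  index   = c(11, 11, 11, 12, 13, 13),
  partner = c(21, 22, 21, 23, 24, 25),
  start   = c(1, 3, 9, 2, 4, 6),
  stop    = c(2, 5, NA, NA, 5, 8)
)

# Count distinct partners per index; the repeat partnership between
# 11 and 21 contributes two spells but only one partner.
n_partners <- tapply(partners$partner, partners$index,
                     function(p) length(unique(p)))

cum_degree <- data.frame(
  index  = as.numeric(names(n_partners)),
  degree = as.vector(n_partners)
)
```

Counting distinct partners rather than rows is what keeps re-formed partnerships from inflating the cumulative degree.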
Reachability functions determine which nodes can be connected through chains of partnerships over a time window. These are useful for outbreak investigation (forward reachability: who could a node have infected?) and source tracing (backward reachability: who could have infected a node?).
These functions operate on cumulative edgelist objects directly—not
on dat—and are typically used for post-simulation analysis.
Most often, the input comes from the saved
sim$cumulative.edgelist[[s]] slot of a netsim
run; see the Converting External networkDynamic
Objects and Deduplicating Across Sources subsections below
for two ways to construct input from other starting points.
el_cuml <- get_cumulative_edgelist(dat, network = 1)
fwd <- get_forward_reachable(
el_cuml,
from_step = 1,
to_step = 52,
nodes = c(10, 25, 42), # NULL for all nodes with edges
dense_optim = "auto"
)
Returns a list with two elements:
reached: a named list where each element contains the set of nodes reachable from each index node through chains of partnerships active during [from_step, to_step].
lengths: a matrix with one row per node and one column per time step (plus an initial column), showing how the reachable set grows over time. This allows back-calculating the distance to each reachable node.
Nodes are identified by unique ID in the output (named as
node_ID). Nodes with no edges during the analysis period
are excluded from the output; their forward reachable set is just
themselves (size 1).
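The idea of a time-respecting forward reachable set can be sketched for a single node in base R. The helper forward_reach() is hypothetical and deliberately naive. It assumes undirected partnerships and at most one hop per time step, which may differ from what get_forward_reachable() does, and it makes no attempt at the optimizations the real function provides.

```r
# Hypothetical helper: nodes reachable from `node` along partnerships
# that are active in time order between from_step and to_step.
forward_reach <- function(el, from_step, to_step, node) {
  reached <- node
  sizes <- length(reached)  # initial entry: the node itself
  for (t in from_step:to_step) {
    active <- el$start <= t & (is.na(el$stop) | el$stop >= t)
    # Partnerships are undirected: follow an active edge from either end.
    via_head <- el$tail[active & el$head %in% reached]
    via_tail <- el$head[active & el$tail %in% reached]
    reached <- union(reached, c(via_head, via_tail))
    sizes <- c(sizes, length(reached))
  }
  list(reached = reached, lengths = sizes)
}

# Toy cumulative edgelist (hypothetical unique IDs and time steps).
el_cuml <- data.frame(
  head  = c(1, 2, 3, 5),
  tail  = c(2, 3, 4, 1),
  start = c(1, 2, 1, 4),
  stop  = c(1, 3, 1, NA)
)

fwd <- forward_reach(el_cuml, from_step = 1, to_step = 4, node = 1)
```

In this toy history, node 1 reaches nodes 2, 3, and 5 but not node 4: the 3-4 partnership was only active at step 1, before node 3 had been reached. The growth of the set over time (1, 2, 3, 3, 4) plays the role of one row of the lengths matrix.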
get_backward_reachable() has the same interface and output structure as
get_forward_reachable(), but follows partnerships backward
in time. This answers the question: which nodes could have reached this
node through a chain of partnerships during the specified period?
Converting External networkDynamic Objects
If you have a networkDynamic object that was produced
outside of a cumulative.edgelist = TRUE run (for example,
from a non-tergmLite simulation, from a saved RDS file, or
built manually), as_cumulative_edgelist() converts it into
the same data.frame shape the reachability functions
expect:
nd <- get_network(sim, sim = 1, network = 1) # full-mode netsim only
el_cuml <- as_cumulative_edgelist(nd)
fwd <- get_forward_reachable(el_cuml, from_step = 1, to_step = 100)
This is the recommended entry point for post-hoc reachability
analysis on simulations that did not enable cumulative-edgelist tracking
up front, provided the run kept the full networkDynamic
history (i.e., not tergmLite).
The reachability functions assume non-overlapping spells per
(head, tail) pair. When concatenating cumulative edgelists
from multiple sources (e.g., separate simulation segments, or several
as_cumulative_edgelist() conversions stitched together),
use dedup_cumulative_edgelist() to merge overlapping spells
into a single row.
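What "merging overlapping spells" means can be sketched with a plain interval merge in base R. The helper merge_spells() below is a hypothetical stand-in written for illustration; it is not the implementation of dedup_cumulative_edgelist(), and it assumes that two spells for the same (head, tail) pair overlap when the later one starts at or before the earlier one's inclusive stop.

```r
# Hypothetical helper: merge overlapping spells per (head, tail) pair.
merge_spells <- function(el) {
  # Treat an active edge (stop = NA) as open-ended while merging.
  el$stop[is.na(el$stop)] <- Inf
  el <- el[order(el$head, el$tail, el$start), ]
  out <- el[0, ]
  for (i in seq_len(nrow(el))) {
    n <- nrow(out)
    same_pair <- n > 0 &&
      out$head[n] == el$head[i] && out$tail[n] == el$tail[i]
    if (same_pair && el$start[i] <= out$stop[n]) {
      # Overlaps the previous spell for this pair: extend it.
      out$stop[n] <- max(out$stop[n], el$stop[i])
    } else {
      out <- rbind(out, el[i, ])
    }
  }
  out$stop[is.infinite(out$stop)] <- NA  # restore NA for active edges
  rownames(out) <- NULL
  out
}

# Toy input (hypothetical): pair (1, 2) has two overlapping spells
# (steps 1-5 and 3-8) plus a later, still-active spell from step 10.
el <- data.frame(
  head  = c(1, 1, 1, 2),
  tail  = c(2, 2, 2, 3),
  start = c(1, 3, 10, 4),
  stop  = c(5, 8, NA, 6)
)
merged <- merge_spells(el)
```

The two overlapping spells collapse into one row covering steps 1 through 8, while the separate spell starting at step 10 stays on its own row because it does not overlap.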
Both reachability functions use the progressr package for
progress reporting. Wrap calls in
progressr::with_progress() to display a progress bar, for example progressr::with_progress(get_forward_reachable(el_cuml, from_step = 1, to_step = 52)).
These functions are efficient for large networks because they operate
on cumulative edgelists (much smaller than full
networkDynamic objects). For sets of more than 5 nodes,
they are faster than iterating over tsna::tPath(). The
dense_optim argument controls an adjacency-list
optimization that helps with dense networks; "auto" enables
it when the number of edges exceeds the number of nodes.