Working with Network Objects in EpiModel

Introduction

This vignette covers how to work with network objects, edgelists, and partnership histories in EpiModel network models with custom extension modules. It assumes familiarity with setting up and running network models with netsim() and with the extension API. See the Network Modeling for Epidemics (NME) course materials and the EpiModel Gallery for background.

For working with nodal attributes and epidemic summary statistics, see the companion vignette Working with Custom Attributes and Summary Statistics in EpiModel.

Network Storage Modes

EpiModel supports two storage modes for networks, controlled by the tergmLite parameter in control.net():

Full mode (tergmLite = FALSE, the default): Networks are stored as networkDynamic objects, which preserve the complete history of edge activations and deactivations. This allows extraction of the full dynamic network after simulation. However, networkDynamic objects consume substantial memory.
tergmLite mode (tergmLite = TRUE): Networks are stored as lightweight networkLite objects containing only the current edgelist and nodal attributes. This provides a 20–50x performance improvement and much lower memory usage, making it essential for large-scale research models. The trade-off is that get_network() returns a networkLite (a snapshot) rather than a full dynamic network history.

Most extension modules work identically under both modes because they access networks through EpiModel’s accessor functions rather than manipulating network objects directly.

Accessing Network Objects

During Simulation (Inside Modules)

Inside a custom module, use the get_network() and set_network() accessors to work with network objects:

# Get the network for layer 1
nw <- get_network(dat, network = 1)

# After modifying a network, set it back
dat <- set_network(dat, network = 1, nw = nw)

These accessors handle the tergmLite vs. full-mode distinction internally, so your module code works under both storage modes. In full mode, get_network() returns a networkDynamic object; in tergmLite mode, it returns a networkLite.

In practice, you rarely need to access network objects directly. Instead, use the edgelist accessor functions described below, which also work correctly under both storage modes.

After Simulation

After a netsim call, extract network objects with get_network():

sim <- netsim(est, param, init, control)

# Extract the network from simulation 1, network layer 1
nw <- get_network(sim, sim = 1, network = 1)

# Collapse to a static cross-section at time step 50 (full mode only)
nw_at_50 <- get_network(sim, sim = 1, collapse = TRUE, at = 50)

In full mode, get_network() returns a networkDynamic object. In tergmLite mode, it returns a networkLite object representing the final state. The collapse and at arguments are only available in full mode.

Note: Network objects are only saved in the output when save.network is TRUE in control.net() (the default for full mode). In tergmLite mode, there is no network history to save.

Transmission Matrix

The transmission matrix records every transmission event during the simulation:

transmat <- get_transmat(sim, sim = 1)

This returns a data.frame with columns including at (time step), sus (ID of the newly infected node), inf (ID of the infecting node), infDur (duration of infector’s infection), transProb, actRate, and finalProb. Transmission matrices are saved by default (save.transmat = TRUE in control.net()).
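To show the kind of summary the transmission matrix supports, the following base R sketch builds a small toy data.frame with the at, sus, and inf columns described above. The values are made up and are not output from a real run. It tabulates onward transmissions per infecting node and new infections per time step. With a real run, the same table() calls apply to the data.frame returned by get_transmat().

```r
# Toy transmission matrix with the columns named above (made-up values)
transmat <- data.frame(
  at  = c(2, 3, 3, 5, 6),
  sus = c(11, 12, 13, 14, 15),
  inf = c(1, 1, 11, 11, 11)
)

# Number of onward transmissions attributed to each infecting node
secondary <- table(transmat$inf)

# Number of new infections at each time step
incidence <- table(transmat$at)

print(secondary)
print(incidence)
```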

Current Edgelists

Current edgelists are the set of active partnerships at the present time step. These are the most commonly used network data structures inside extension modules.

Single Network

get_edgelist() returns the current edgelist for a given network as a two-column matrix of positional IDs:

el <- get_edgelist(dat, network = 1)

Each row is an active partnership. Column 1 is the positional ID of the “head” node; column 2 is the “tail” node. This function works identically under both storage modes.
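Because the edgelist holds positional IDs, a per-node summary computed from it lines up directly with the attribute vectors. The base R sketch below uses a toy matrix in place of get_edgelist() output and an assumed network size of 6 to count each node's current number of partners in an undirected network.

```r
# Toy current edgelist: each row is an active partnership (positional IDs)
el <- matrix(c(1, 2,
               1, 3,
               4, 5),
             ncol = 2, byrow = TRUE)
n_nodes <- 6  # assumed network size; node 6 currently has no partners

# A node's current degree is the number of times it appears in either column
degree <- tabulate(c(el), nbins = n_nodes)
print(degree)
#> [1] 2 1 1 1 1 0
```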

Multiple Networks

get_edgelists_df() combines edgelists from multiple network layers into a single data.frame with a network column:

# All networks
el_all <- get_edgelists_df(dat, networks = NULL)

# Specific networks
el_12 <- get_edgelists_df(dat, networks = c(1, 2))

The returned data.frame has columns head, tail, and network.
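As an illustration of working with the combined data.frame, the base R sketch below uses toy values in the head / tail / network shape. It counts active partnerships per layer and finds nodes with partners in more than one layer. The data are made up. With a live model, el_all would come from get_edgelists_df().

```r
# Toy combined edgelist in the head / tail / network shape (made-up values)
el_all <- data.frame(
  head    = c(1, 2, 1, 3),
  tail    = c(2, 4, 5, 4),
  network = c(1, 1, 2, 2)
)

# Number of active partnerships in each network layer
edges_per_layer <- table(el_all$network)

# Stack head and tail so every node is listed with the layer of each of its edges
layers_by_node <- rbind(
  data.frame(node = el_all$head, network = el_all$network),
  data.frame(node = el_all$tail, network = el_all$network)
)

# Nodes with at least one partner in more than one layer
n_layers <- tapply(layers_by_node$network, layers_by_node$node,
                   function(x) length(unique(x)))
multi_layer_nodes <- as.integer(names(n_layers)[n_layers > 1])
print(multi_layer_nodes)
#> [1] 1 4
```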

Discordant Edgelist

The discordant edgelist identifies partnerships where partners have different values of a status attribute—the key data structure for modeling transmission. For example, in an SI model, discordant edges are those where one partner is susceptible and the other is infected:

disc_el <- get_discordant_edgelist(
  dat,
  status.attr = "status",
  head.status = "i",
  tail.status = "s"
)

The returned data.frame has columns head, tail, head_status, tail_status, and network. Both orderings are captured: if node A (infected) is partnered with node B (susceptible), the edge appears regardless of which is the “head” vs “tail” in the underlying network.
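The logic behind a discordant edgelist can be sketched in a few lines of base R. The helper disc_sketch() below is hypothetical and is not EpiModel's implementation. It takes a toy positional edgelist and a status vector standing in for dat$attr$status, keeps edges with one infected and one susceptible partner in either column order, and reorients them so the infected partner is always the head. It omits the network column.

```r
# Toy inputs: positional edgelist and a status vector indexed by position
el <- matrix(c(1, 2,
               3, 2,
               3, 4),
             ncol = 2, byrow = TRUE)
status <- c("i", "s", "s", "i")

# Hypothetical helper: discordant edges, reoriented so the infected partner is the head
disc_sketch <- function(el, status, inf = "i", sus = "s") {
  hd <- el[, 1]
  tl <- el[, 2]
  is_fwd <- status[hd] == inf & status[tl] == sus  # infected already in head column
  is_rev <- status[hd] == sus & status[tl] == inf  # infected stored in tail column
  n <- sum(is_fwd) + sum(is_rev)
  data.frame(
    head        = c(hd[is_fwd], tl[is_rev]),
    tail        = c(tl[is_fwd], hd[is_rev]),
    head_status = rep(inf, n),
    tail_status = rep(sus, n)
  )
}

disc <- disc_sketch(el, status)
print(disc)
```

The (3, 4) edge is stored with the susceptible node in the head column, but it still appears, flipped to (4, 3).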

See also discord_edgelist() for the original, simpler version of this function used in built-in models.

Positional Indexing and Unique IDs

EpiModel uses two ways to reference nodes:

By position: Think of it like a row number in a spreadsheet. dat$attr$active[3] accesses the third node’s value directly. This is the standard way to look up node information and is very fast. In a model with 100 nodes, positions range from 1 to 100. When nodes depart, they may be dropped from the attribute vectors. The nodes after them then shift to lower positions, and new arrivals are appended at the end, so the same position can refer to different nodes at different time steps.
By unique_id: A globally unique integer attribute assigned to each node at creation and never reused. Slower to look up, but allows referencing nodes that have already departed. Used by cumulative edgelists and attribute histories.

Conversion between the two systems is handled internally by EpiModel. The get_unique_ids() and get_posit_ids() functions perform the conversion. See help("unique_id-tools", package = "EpiModel") for details.
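The difference between the two systems can be seen with plain local vectors standing in for attributes such as dat$attr$unique_id. This is a toy setup, not a real dat object. When a departed node is removed, later nodes shift to lower positions while their unique IDs stay fixed. Here match() plays the role of a unique-ID-to-position lookup for illustration only. It is not how get_posit_ids() is implemented.

```r
# Toy attribute vectors for five nodes; unique IDs happen to equal positions at first
unique_id <- c(1, 2, 3, 4, 5)
status    <- c("s", "i", "s", "s", "i")

# The node at position 2 (unique_id 2) departs and is removed from every attribute vector
departed  <- 2
unique_id <- unique_id[-departed]
status    <- status[-departed]

# A new node arrives and is appended with a fresh, never-reused unique ID
unique_id <- c(unique_id, 6)
status    <- c(status, "s")

# The node with unique_id 5 used to sit at position 5 but now sits at position 4
posit_of_5 <- match(5, unique_id)
print(posit_of_5)
#> [1] 4

# Going from position back to unique ID is plain indexing
print(unique_id[posit_of_5])
#> [1] 5
```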

Cumulative Edgelist

The cumulative edgelist is a historical record of all edges in a network, including the time steps when each edge started and stopped. This allows querying both current and past partnerships—essential for contact tracing, partnership duration analysis, and reachability analysis.

Lifecycle

The cumulative edgelist follows a four-step lifecycle. A single data structure is built up during the run and can be read at different stages:

Enable in control.net(): set cumulative.edgelist = TRUE (in-memory tracking during the run) and, if you want to use the result after netsim() returns, also set save.cumulative.edgelist = TRUE.
Track during the run: the built-in network-resimulation module updates the cumulative edgelist once per network at every time step, using control$truncate.el.cuml to decide how much dissolved-edge history to retain. Custom modules that mutate the network outside the TERGM machinery should also call update_cumulative_edgelist() after the change.
Read during the run (inside a module) with get_cumulative_edgelist() / get_cumulative_edgelists_df(), or with one of the derived helpers below (get_partners(), get_cumulative_degree()).
Read after the run by accessing sim$cumulative.edgelist[[s]] directly on the returned netsim object.

The accessors and helpers all share the same column convention: head/tail (or index/partner) by unique ID, start/stop time steps (inclusive, with NA stop for active edges), and a network index for multi-layer models.

Enabling Cumulative Edgelists

Cumulative edgelist tracking must be explicitly enabled in control.net():

control <- control.net(
  type = "SI",
  nsims = 1,
  nsteps = 100,
  cumulative.edgelist = TRUE,       # Enable in-memory tracking during the run
  truncate.el.cuml = 0,             # Drop dissolved edges immediately (default)
  save.cumulative.edgelist = TRUE,  # Attach the result to the returned netsim
  verbose = FALSE
)

Without cumulative.edgelist = TRUE, calls to update_cumulative_edgelist() silently do nothing, and calls to get_cumulative_edgelist() raise an error.

The truncate.el.cuml control sets the default truncation passed to every automatic update_cumulative_edgelist() call. The default (0) keeps only currently active edges, which is enough for tracking active-edge start times while keeping memory low. Use Inf to keep full history, or a positive integer to retain dissolved edges for that many steps after they ended.

Updating the Cumulative Edgelist

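When cumulative.edgelist = TRUE, the built-in network-resimulation module performs this update after each resimulation. Call update_cumulative_edgelist() yourself in a custom module that changes the network outside the TERGM machinery, or at the end of initialization: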

dat <- update_cumulative_edgelist(dat, network = 1, truncate = Inf)

The truncate argument controls memory usage:

truncate = Inf: Keep the full history of all edges (no removal).
truncate = 0 (the default): Keep only currently active edges. Use this if you only need to track active edge start times.
truncate = N: Remove edges that ended more than N time steps ago. This balances historical depth with memory use.
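To make the three settings concrete, the base R sketch below applies the rule as described above to a toy cumulative edgelist. truncate_sketch() is a hypothetical helper. Measuring an edge's age as the current step minus its stop value is an assumption, and EpiModel's internal bookkeeping may differ in off-by-one details.

```r
# Toy cumulative edgelist (unique IDs); NA stop marks a currently active edge
el_cuml <- data.frame(
  head  = c(1, 2, 3, 4),
  tail  = c(5, 6, 7, 8),
  start = c(1, 3, 10, 15),
  stop  = c(4, 12, 17, NA)
)

# Hypothetical helper: drop edges whose last active step is more than
# `truncate` steps before the current step `at`; active edges are always kept
truncate_sketch <- function(el_cuml, at, truncate) {
  if (is.infinite(truncate)) return(el_cuml)
  age  <- at - el_cuml$stop                 # steps since the edge was last active
  keep <- is.na(el_cuml$stop) | age <= truncate
  el_cuml[keep, , drop = FALSE]
}

at <- 20
nrow(truncate_sketch(el_cuml, at, Inf))  # all 4 edges
nrow(truncate_sketch(el_cuml, at, 0))    # only the active edge
nrow(truncate_sketch(el_cuml, at, 5))    # active edge plus the one that stopped at 17
```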

To update all networks in a multi-layer model:

for (n_network in seq_len(dat$num.nw)) {
  dat <- update_cumulative_edgelist(dat, n_network, truncate = 100)
}

Accessing the Cumulative Edgelist

During Simulation: Specific Network

Inside a custom module, where dat is the live netsim_dat object:

el_cuml <- get_cumulative_edgelist(dat, network = 1)

The returned tibble has four columns:

head: the unique_id of the first node.
tail: the unique_id of the second node.
start: the time step when the edge formed.
stop: the last time step the edge was active, or NA if the edge is still active.

An edge with start = 5 and stop = 12 existed from steps 5 through 12, inclusive. When no edges have been recorded yet, the function still returns a tibble with these four columns and zero rows. If cumulative.edgelist = FALSE, the function raises an error rather than silently returning empty.
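Because start and stop are both inclusive and NA marks an active edge, two common queries reduce to simple comparisons: whether an edge was active at a given step, and how long it has lasted. The base R sketch below uses a toy data.frame in the same four-column shape. active_at() is a hypothetical helper, and censoring active edges at an assumed current step at = 20 is an illustrative choice.

```r
# Toy cumulative edgelist in the head / tail / start / stop shape described above
el_cuml <- data.frame(
  head  = c(1, 1, 2),
  tail  = c(2, 3, 3),
  start = c(5, 8, 11),
  stop  = c(12, NA, 11)
)

# Hypothetical helper: an edge is active at step t if it started at or before t
# and either has not stopped (NA) or stopped at or after t (both ends inclusive)
active_at <- function(el_cuml, t) {
  el_cuml$start <= t & (is.na(el_cuml$stop) | el_cuml$stop >= t)
}

# Inclusive duration: start = 5, stop = 12 spans 8 time steps.
# Edges that are still active are right-censored at the current step `at`.
at <- 20
stop_or_now <- ifelse(is.na(el_cuml$stop), at, el_cuml$stop)
duration <- stop_or_now - el_cuml$start + 1

active_at(el_cuml, 11)  # all three edges are active at step 11
duration                # 8, 13, 1
```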

During Simulation: Multiple Networks

el_cumls <- get_cumulative_edgelists_df(dat, networks = NULL)

The networks argument accepts a vector of network indices or NULL (all networks). The returned data.frame adds a network column identifying which network layer each edge belongs to.

After Simulation

When save.cumulative.edgelist = TRUE, the cumulative edgelist is attached to the netsim return object as sim$cumulative.edgelist, a list with one element per simulation:

el_cuml <- sim$cumulative.edgelist[[1]]

Each element is the data.frame produced by get_cumulative_edgelists_df() (head, tail, start, stop, network). Do not call the accessor functions on a processed netsim object: the live dat$run state they depend on is dropped during process_out.net(). Note also that truncate_sim() left-truncates the epidemiological time series but does not currently trim sim$cumulative.edgelist; filter it manually by start / stop if you need a matching time window.
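One way to do that manual filtering is sketched below in base R. It keeps edges whose inclusive [start, stop] spell overlaps the window of interest and treats an NA stop as still active at the end of the run. The toy data.frame stands in for sim$cumulative.edgelist[[1]], and the window bounds from and to are illustrative values.

```r
# Toy stand-in for sim$cumulative.edgelist[[1]] (made-up values)
el_cuml <- data.frame(
  head    = c(1, 2, 3, 4),
  tail    = c(5, 6, 7, 8),
  start   = c(1, 10, 40, 55),
  stop    = c(20, 60, NA, 58),
  network = c(1, 1, 1, 2)
)

# Window to keep, e.g. matching a left-truncated time series (illustrative values)
from <- 50
to   <- 100

# Keep edges whose inclusive [start, stop] spell overlaps [from, to];
# NA stop means the edge was still active at the end of the run
overlaps  <- el_cuml$start <= to & (is.na(el_cuml$stop) | el_cuml$stop >= from)
el_window <- el_cuml[overlaps, , drop = FALSE]
print(el_window)
```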

Contact Tracing

get_partners() extracts the partners of specified nodes from the cumulative edgelist:

partner_list <- get_partners(
  dat,
  index_posit_ids,
  networks = NULL,
  truncate = Inf,
  only.active.nodes = FALSE
)

Arguments:

dat: the main list object.
index_posit_ids: a vector of positional IDs for the nodes of interest (the “indexes”).
networks: which network layers to search (NULL for all).
truncate: only include edges that ended within this many steps (filters by edge age).
only.active.nodes: if TRUE, exclude partnerships with inactive (departed) nodes.

The output is similar to get_cumulative_edgelists_df() but with columns index and partner (both containing unique IDs) instead of head and tail. Note that indexes are specified by positional ID but the output uses unique IDs, since partners may include nodes that have already departed.
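The core of a partner lookup can be sketched in base R by matching the index nodes against both the head and tail columns. partners_sketch() is a hypothetical helper, not EpiModel's code. Unlike get_partners(), it takes unique IDs directly and does not implement the networks, truncate, or only.active.nodes filters.

```r
# Toy cumulative edgelist keyed by unique ID (made-up values)
el_cuml <- data.frame(
  head  = c(1, 3, 1, 4),
  tail  = c(2, 1, 5, 5),
  start = c(1, 4, 6, 2),
  stop  = c(3, NA, 8, NA)
)

# Hypothetical helper: every partner of each index node, whether the index
# appears in the head or the tail column
partners_sketch <- function(el_cuml, index_uids) {
  as_head <- el_cuml[el_cuml$head %in% index_uids, ]
  as_tail <- el_cuml[el_cuml$tail %in% index_uids, ]
  out <- rbind(
    data.frame(index = as_head$head, partner = as_head$tail,
               start = as_head$start, stop = as_head$stop),
    data.frame(index = as_tail$tail, partner = as_tail$head,
               start = as_tail$start, stop = as_tail$stop)
  )
  out[order(out$index, out$start), ]
}

p <- partners_sketch(el_cuml, index_uids = 1)
print(p)  # node 1's partners 2, 3, 5, ordered by partnership start
```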

Cumulative Degree

get_cumulative_degree() counts the number of distinct partners each node has had over the tracked history:

cum_degree <- get_cumulative_degree(
  dat,
  index_posit_ids = 1:50,
  networks = NULL,
  truncate = Inf,
  only.active.nodes = FALSE
)

This returns a data.frame with columns index_pid (positional ID) and degree (cumulative partner count). It wraps get_partners() and counts unique partners per index.
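Counting distinct partners from an index / partner table can be sketched in base R with tapply(). The toy data below are made up. Node 1's repeated partnership with node 7 shows why the count uses unique partners rather than rows. Nodes with no partners simply do not appear in this sketch's result.

```r
# Toy partner table in the index / partner shape described above (unique IDs)
partners <- data.frame(
  index   = c(1, 1, 1, 2, 2),
  partner = c(7, 8, 7, 9, 9)  # node 1 partnered with 7 twice; node 2 with 9 twice
)

# Cumulative degree counts distinct partners, so repeat spells with the
# same partner are counted once
cum_degree <- tapply(partners$partner, partners$index,
                     function(x) length(unique(x)))
print(cum_degree)
#> 1 2
#> 2 1
```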

Reachability Analysis

Reachability functions determine which nodes can be connected through chains of partnerships over a time window. These are useful for outbreak investigation (forward reachability: who could a node have infected?) and source tracing (backward reachability: who could have infected a node?).

These functions operate on cumulative edgelist objects directly—not on dat—and are typically used for post-simulation analysis. Most often, the input comes from the saved sim$cumulative.edgelist[[s]] slot of a netsim run; see the Converting External networkDynamic Objects and Deduplicating Across Sources subsections below for two ways to construct input from other starting points.

Forward Reachable Set

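# Post-simulation input: the saved cumulative edgelist from simulation 1
el_cuml <- sim$cumulative.edgelist[[1]]
# In multi-layer models, subset to one layer first: el_cuml[el_cuml$network == 1, ]
# (inside a module, get_cumulative_edgelist(dat, network = 1) gives one layer directly)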

fwd <- get_forward_reachable(
  el_cuml,
  from_step = 1,
  to_step = 52,
  nodes = c(10, 25, 42),  # NULL for all nodes with edges
  dense_optim = "auto"
)

Returns a list with two elements:

reached: a named list where each element contains the set of nodes reachable from each index node through chains of partnerships active during [from_step, to_step].
lengths: a matrix with one row per node and one column per time step (plus an initial column), showing how the reachable set grows over time. This allows back-calculating the distance to each reachable node.

Nodes are identified by unique ID in the output (named as node_ID). Nodes with no edges during the analysis period are excluded from the output; their forward reachable set is just themselves (size 1).
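The idea of a time-respecting forward reachable set can be sketched in base R. forward_sketch() is a hypothetical helper, not EpiModel's algorithm. It steps through time and adds the partners of already-reached nodes along edges active at each step. It assumes at most one hop per time step, a convention that may differ from EpiModel's. Its sizes vector loosely mirrors one row of the lengths matrix: an initial entry followed by one entry per step.

```r
# Toy cumulative edgelist (unique IDs, inclusive start/stop, NA = still active)
el_cuml <- data.frame(
  head  = c(1, 2, 3),
  tail  = c(2, 3, 4),
  start = c(1, 3, 1),
  stop  = c(2, 3, 1)
)

# Hypothetical helper: nodes reachable from `node` through time-respecting chains
# of partnerships during [from_step, to_step], with at most one hop per step
forward_sketch <- function(el_cuml, node, from_step, to_step) {
  stop_filled <- ifelse(is.na(el_cuml$stop), Inf, el_cuml$stop)
  reached <- node
  sizes <- length(reached)  # initial size: the node itself
  for (step in from_step:to_step) {
    active <- el_cuml$start <= step & stop_filled >= step
    hd <- el_cuml$head[active]
    tl <- el_cuml$tail[active]
    # Partners of nodes reached before this step (computed before updating)
    newly <- c(tl[hd %in% reached], hd[tl %in% reached])
    reached <- union(reached, newly)
    sizes <- c(sizes, length(reached))
  }
  list(reached = sort(reached), sizes = sizes)
}

fwd <- forward_sketch(el_cuml, node = 1, from_step = 1, to_step = 3)
fwd$reached  # 1 2 3: node 4 met node 3 only at step 1, before 3 was reached
fwd$sizes    # 1 2 2 3
```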

Backward Reachable Set

bkwd <- get_backward_reachable(
  el_cuml,
  from_step = 1,
  to_step = 52,
  nodes = c(10, 25, 42)
)

Same interface and output structure as get_forward_reachable(), but follows partnerships backward in time. This answers the question: which nodes could have reached this node through a chain of partnerships during the specified period?
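Backward reachability can be sketched in base R by walking time from to_step down to from_step and collecting, at each step, nodes that met an already-collected node. backward_sketch() is a hypothetical helper, not EpiModel's algorithm, and it assumes at most one hop per time step.

```r
# Toy cumulative edgelist (unique IDs, inclusive start/stop, NA = still active)
el_cuml <- data.frame(
  head  = c(1, 2, 3),
  tail  = c(2, 3, 4),
  start = c(1, 3, 1),
  stop  = c(2, 3, 1)
)

# Hypothetical helper: nodes that could have reached `node` by to_step,
# found by walking backward in time with at most one hop per step
backward_sketch <- function(el_cuml, node, from_step, to_step) {
  stop_filled <- ifelse(is.na(el_cuml$stop), Inf, el_cuml$stop)
  reached <- node
  for (step in to_step:from_step) {
    active <- el_cuml$start <= step & stop_filled >= step
    hd <- el_cuml$head[active]
    tl <- el_cuml$tail[active]
    reached <- union(reached, c(tl[hd %in% reached], hd[tl %in% reached]))
  }
  sort(reached)
}

backward_sketch(el_cuml, node = 3, from_step = 1, to_step = 3)
# 1 2 3 4: node 2 met node 3 at step 3, node 1 met node 2 earlier,
# and node 4 met node 3 directly at step 1
```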

Converting External networkDynamic Objects

If you have a networkDynamic object that was produced outside of a cumulative.edgelist = TRUE run (for example, from a non-tergmLite simulation, from a saved RDS file, or built manually), as_cumulative_edgelist() converts it into the same data.frame shape the reachability functions expect:

nd <- get_network(sim, sim = 1, network = 1)   # full-mode netsim only
el_cuml <- as_cumulative_edgelist(nd)
fwd <- get_forward_reachable(el_cuml, from_step = 1, to_step = 100)

This is the recommended entry point for post-hoc reachability analysis on simulations that did not enable cumulative-edgelist tracking up front, provided the run kept the full networkDynamic history (i.e., not tergmLite).
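The base R sketch below only illustrates the convention shift involved. It is not how as_cumulative_edgelist() is implemented. It takes a toy spell table in the onset / terminus / tail / head layout that networkDynamic's as.data.frame() method reports and rewrites it in the head / tail / start / stop shape. It assumes networkDynamic's half-open [onset, terminus) spells map to an inclusive stop = terminus - 1, and that an infinite terminus marks a still-active edge. In practice, call as_cumulative_edgelist() rather than converting by hand.

```r
# Toy spell table in the onset / terminus / tail / head layout (made-up values)
spells <- data.frame(
  onset    = c(1, 4, 10),
  terminus = c(4, 9, Inf),
  tail     = c(1, 2, 3),
  head     = c(2, 3, 4)
)

# Convert half-open [onset, terminus) spells to inclusive start/stop,
# using NA stop for spells that never ended (assumed convention)
el_cuml_sketch <- data.frame(
  head  = spells$head,
  tail  = spells$tail,
  start = spells$onset,
  stop  = ifelse(is.infinite(spells$terminus), NA, spells$terminus - 1)
)
print(el_cuml_sketch)
```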

Deduplicating Across Sources

The reachability functions assume non-overlapping spells per (head, tail) pair. When concatenating cumulative edgelists from multiple sources (e.g., separate simulation segments, or several as_cumulative_edgelist() conversions stitched together), use dedup_cumulative_edgelist() to merge overlapping spells into a single row:

el_cuml <- dedup_cumulative_edgelist(rbind(el_run1, el_run2))
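The base R sketch below shows what merging overlapping spells means. dedup_sketch() is a hypothetical helper, not dedup_cumulative_edgelist() itself. Within each (head, tail) pair it sorts spells by start and merges a spell into the previous one when it starts at or before that spell's stop, treating an NA stop as open-ended. It assumes each pair uses the same head/tail orientation in every source. It leaves touching but non-overlapping spells (for example stop = 8 followed by start = 9) as separate rows.

```r
# Toy concatenated edgelist: the (1, 2) pair has two overlapping spells
el_comb <- data.frame(
  head  = c(1, 1, 1, 3),
  tail  = c(2, 2, 2, 4),
  start = c(1, 5, 20, 2),
  stop  = c(8, 12, NA, 6)
)

# Hypothetical helper: merge overlapping spells within each (head, tail) pair
dedup_sketch <- function(el) {
  el <- el[order(el$head, el$tail, el$start), ]
  stop_num <- ifelse(is.na(el$stop), Inf, el$stop)  # NA stop = open-ended
  out <- list()
  i <- 1
  while (i <= nrow(el)) {
    cur_stop <- stop_num[i]
    j <- i + 1
    # Absorb following spells of the same pair that start at or before cur_stop
    while (j <= nrow(el) &&
           el$head[j] == el$head[i] && el$tail[j] == el$tail[i] &&
           el$start[j] <= cur_stop) {
      cur_stop <- max(cur_stop, stop_num[j])
      j <- j + 1
    }
    out[[length(out) + 1]] <- data.frame(
      head  = el$head[i],
      tail  = el$tail[i],
      start = el$start[i],
      stop  = if (is.infinite(cur_stop)) NA_real_ else cur_stop
    )
    i <- j
  }
  do.call(rbind, out)
}

d <- dedup_sketch(el_comb)
print(d)  # spells 1-8 and 5-12 become one row 1-12; the other rows are unchanged
```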

Performance Notes

Both reachability functions use the progressr package for progress reporting. Wrap calls in progressr::with_progress() to display a progress bar:

progressr::with_progress({
  fwd <- get_forward_reachable(el_cuml, from_step = 1, to_step = 260)
})

These functions are efficient for large networks because they operate on cumulative edgelists (much smaller than full networkDynamic objects). For sets of more than 5 nodes, they are faster than iterating over tsna::tPath(). The dense_optim argument controls an adjacency-list optimization that helps with dense networks; "auto" enables it when the number of edges exceeds the number of nodes.
