API for ergm Terms

Pavel N. Krivitsky

options(rmarkdown.html_vignette.check_title = FALSE)

Summary

This document seeks to be the most up-to-date API documentation for ergm terms. Note that it is not intended to be a tutorial as much as a description of what inputs and outputs different parts of the system expect.

Overview

Types of storage and types of term

The storage API defines two types of storage: private storage, which is attached the ModelTerm structure and is specific to each ergm term, and public storage, which is attached to the Model and can be accessed by all terms.

A statistic is a familiar ergm term like “edges” or “nodefactor”: it adds at least one sufficient statistic to the model. Every statistic can have private storage, and it can read from public storage, but it cannot write to public storage.

An auxiliary in an ergm term but not an ERGM term in the mathematical sense: it adds no statistics to the model and exists only to initialize and maintain public storage to be used by statistics. It may not be specified on an ergm formula by the end-user, but only requested by a statistic.

An auxiliary can rely on another auxiliary’s public storage. Note that circular dependencies are not checked.

Code path

For the purposes of this overview, the following information is relevant, and is elaborated formally later:

  1. The user passes a formula to the procedure (ergm, summary, etc.); this formula may only have statistics.
  2. ergm_model() is called.
    1. It iterates through the formula terms, calling InitErgmTerm.<NAME>() functions in turn (or InitWtErgmTerm.<NAME>() for valued ERGMs), adding their output to the model term list. Some terms include auxiliaries formulas in their list.
    2. call.ErgmTerm() when it finds that a term has requested auxiliaries, attaches an attribute attr(., "aux.slots") containing an integer vector for the model’s own and/or requested auxiliaries’ positions on the aux_storage vector.
    3. It calls ergm.auxstorage() with the complete model.
      1. It iterates through initialized model terms, looking at their auxiliaries element for a one-sided formula listing their requested auxiliaries.
      2. It iterates through the auxiliary formulas, calling term initialization on them.
      3. It iterates through the auxiliaries elements of auxiliary terms and initialises those, etc..
      4. It constructs a list of unique initialized auxilariy terms (that is, when two or more terms had requested an identical auxiliary).
      5. It inserts the initialized unique auxiliary terms in model$terms and the index (in the unique list) of the auxiliary requested by each statistic in the aux.slots of the requesting statistic.
      6. For auxiliary itself, the first element of aux.slots is set to the position of the auxiliary itself.
  3. The ergm_state is constructed from an edgelist (state$el), an empty network (state$nw0), a model (state$model), and (optionally) a proposal (state$proposal) and a statistics vector (state$stats).
  4. update.ergm_state() is called.
    1. It iterates through the formula terms, calling term$ext.encode() (if defined) to construct a vector state$ext.state. state$ext.flag is set to reconciled.
  5. The ergm_state is passed to the C code.
  6. Redgelist2Network() initializes the network.
  7. ModelInitialize() is called:
    1. ModelInitialize() initializes all terms (statistic and auxiliaries), also counting up the number of auxiliaries (distinguished by having no c_, d_, or s_ functions). A term can export both a c_ function and a d_ function. In that case, it is responsible for deleting one of them when its i_ function is called.
    2. It counts up the number of auxiliaries and allocates an array of void *s, one for each auxiliary. The mtp->aux_storage pointer for each term is set to point to that (one) array.
    3. It assigns attr(model$terms[[i]], "aux.slots") to mtp->aux_slots.
    4. It assigns the ergm_model SEXP to model->R, in case it’s needed.
    5. It assigns the ergm_model$terms[[i]] SEXP to mtp->R, in case it’s needed.
    6. It calls InitStats(), which calls the initializer (i_ function) of each term or, if not found, an updater (u_ function) with invalid input (i.e., toggle \((0,0)\)) is called in case the term developer prefers a one-function implementation.
    • The terms are initialized in reverse order, so auxiliaries are initialized before the statistics are, and statistics and auxiliaries can count on their auxiliaries being initialized by the time they are initialized.
  8. For each iteration, a proposal is made.
    1. ChangeStats() is called.
      1. It calls the d_ functions, for those terms for which they are initialized.
      2. It iterates through the proposed toggles.
        1. It calls the c_ functions.
        2. It adds its output to the cumulative change.
        3. It calls the u_ functions with the toggle (if more to come).
        4. It makes the toggle provisionally (if more to come).
      3. It undoes the provisional toggles.
    2. If the proposal is accepted, u_ function is called, and network is updated for each toggle.
  9. ModelDestroy() is called:
    1. DestroyStats() is called, iterating through the terms.
      1. Finalizer (f_) function is called, if defined.
      2. If mtp->storage is not NULL, it is freed.
    2. Other parts of the model are freed; in particular those of mtp->aux_storage if not NULL.
  10. ErgmStateRSave() is called:
    1. Network2Redgelist() is called, returning a SEXP with the state.
    2. For each term, a writer (w_ function) is called if defined, returning a SEXP with the extended state.
    3. These are stored in a vector state$ext.state. Element state$ext.flag is set to signal that a change was made on the C side.
  11. NetworkDestroy() is called.
  12. States and other outputs are passed back to R.
  13. update.ergm_state() is called.
    1. It iterates through the formula terms, calling term$ext.decode() (if defined) to update state$nw0 or other aspects of the network. state$ext.flag is set to reconciled.

Formal API definition

Evaluation API

R side

The following is adapted from the header of R/InitErgmTerm.R

InitErgmTerm.* and InitWtErgmTerm.* functions

The following are the minimal arguments of these functions:

function(nw, arglist, ...)

Expects

nw: a network object taken from the LHS of the model formula or the basis= argument (if given); valued networks are instrumented with %ergmlhs% "response" information.

arglist: a list of arguments passed to the term call on the formula. If arguments are passed with names, the list is named. Typically, helper function check.ErgmTerm() is used to preprocess this list with the equivalent of R’s match.call(). Notably, thanks to R’s lazy evaluation, it is possible to obtain the argument expressions without evaluating them by calling substitute(arglist). See InitErgmTerm.: and InitErgmTerm.* in R/InitErgmTerm.interaction.R for examples.

env: an environment, typically of the formula from which the term was extracted.

Any term options are passed as direct arguments (not arglist). See options?ergm for details.

Returns

Three return types are possible:

  1. NULL, to indicate that this term does not add to the model. (E.g., nodefactor("a", levels=FALSE).)
  2. An ergm_model object, in which case its terms are “pasted” into the model. This is useful for some term operators.
  3. A named list defining the properties of the term. The following names have special meanings; but any other elements can be included and will be accessible on the C side.

In the following, let \(p = \dim(\eta) = \dim(g(y))\), the dimension of the statistic \(g(y)\) being computed and of the canonical parameter \(\eta\). If the model is curved, the model parameter vector \(\theta\) is first mapped onto \(\eta=\eta(\theta)\); otherwise, \(\eta\equiv\theta\). Let \(q=\dim(\theta)\).

name (required): a string containing the term’s C-side name: ergm will search for name prepended with "c_" for change statistics, "d_" for difference, "s_" for summary, etc.. This is the only required element.

coef.names: a character vector of length \(p\) of names for the elements of the canonical statistic of the model (and canonical parameters); can be absent or zero-length for auxiliaries.

inputs: a vector of (double-precision) numeric inputs that will be made available to the C-side implementation as a double vector; optionally,inputs may have an attribute named "ParamsBeforeCov", which is used primarily for backwards compatibility, but can also be used to more conveniently separate, e.g., metadata elements from vertex attribute elements.

iinputs: a vector of integer inputs that will be made available to the C-side implementation as an int vector; optionally,iinputs may have an attribute named "ParamsBeforeCov", which is used primarily for backwards compatibility, but can also be used to more conveniently separate, e.g., metadata elements from vertex attribute elements.

pkgname: a string containing the name of the R package containing the C implementation; if not specified, the package in which the Init*ErgmTerm.* function was found is assumed.

dependence: a logical value indicating whether the addition of this term to the model induces dyadic-dependence; if all terms (ignoring auxiliaries) have dependence set to FALSE, the model is inferred to be dyad-independent; if not specified, TRUE is assumed.

emptynwstats: a numeric vector of length \(p\) providing the value of the statistic if evaluated on an empty network; if not specified or NULL, assumed to be a vector of zeros. (See InitErgmTerm.degree() in R/InitEergmTerm.R for an example.)

minpar and maxpar: numeric vectors of length \(q\) giving the bounds on the valid values for the model’s parameters; if not specified, -Inf and +Inf vectors are assumed.

offset: a logical vector of length \(q\) that allows the term to mark some of its own statistics as having fixed parameters.

For curved terms, all of the following must be present except cov:

params: a list whose names correspond to element names of the curved parameter vector \(\theta\); the items in the list are there for historical reasons and are ignored.

map: a function taking at least two arguments, x (a numeric \(q\)-vector containing \(\theta\)), n\(=p\) (the length of the output) and an optional cov parameter; it is to return a numeric vector of length n containing \(\eta(\theta)\).

gradient: a function taking the same arguments as map and returning the gradient of map (\(\eta'(\theta)\)) as a \(q\times p\) numeric matrix.

cov: an optional arbitrary data structure that if present is passed to map and gradient.

C side

ModelTerm and WtModelTerm data structures

The data structure contains information accessible to the C-side statistics. Here, outlist refers to the list returned by the Init*ErgmTerm.* function.

A number of its elements are for internal use and should generally not be considered a part of the API or accessed by the term (with some rare exceptions). These include: c_func, d_func, i_func, u_func, f_func, s_func, w_func, x_func, z_func, statspos, statcache, emptynwstats.

Furthermore, elements aux_storage and aux_slots should not be accessed directly but only with helper described in the auxiliary storage API below.

The following elements are a part of the API:

double *attrib (INPUT_ATTRIB, DINPUT_ATTRIB): contents of outlist$inputs, shifted by ParamsBeforeCov attribute if given. (Rarely used.)

int *iattrib (IINPUT_ATTRIB): contents of outlist$iinputs, shifted by ParamsBeforeCov attribute if given. (Rarely used.)

int nstats (N_CHANGE_STATS): value of \(p\), the length of the statistic vector.

double *dstats (CHANGE_STAT): a \(p\)-vector to be overwritten with statistic value.

int ninputparams (N_INPUT_PARAMS, N_DINPUT_PARAMS): length of outlist$inputs.

double *inputparams: contents of outlist$inputs.

int niinputparams (N_IINPUT_PARAMS): length of outlist$iinputs.

int *iinputparams (IINPUT_PARAMS): contents of outlist$iinputs.

void *storage (STORAGE): a pointer managed by the term to its private storage space.

unsigned int n_aux (N_AUX): number of auxiliaries associated with this term; typically the number requested, plus one if the term is itself an auxiliary.

SEXP R: the contents of outlist as an R expression.

SEXP ext_state: Location of the extended state information for the term. See below.

c_, d_, and s_ functions

d_ functions are the original difference statistics. c_ functions are new, while s_ functions have been around for a long time, but never formally documented.

Change statistic (binary): void c_<NAME>(Vertex tail, Vertex head, ModelTerm *mtp, Network *nwp, Rboolean edgestate)

Change statistic (valued): void c_<NAME>(Vertex tail, Vertex head, double weight, WtModelTerm *mtp, WtNetwork *nwp, double edgestate)

Difference statistic (binary): void d_<NAME>(Vertex *tails, Vertex *heads, ModelTerm *mtp, Network *nwp)

Difference statistic (valued): void d_<NAME>(Vertex *tails, Vertex *heads, double *weights, WtModelTerm *mtp, WtNetwork *nwp)

Summary statistic (binary): void s_<NAME>(ModelTerm *mtp, Network *nwp)

Summary statistic (valued): void s_<NAME>(WtModelTerm *mtp, WtNetwork *nwp)

Expects

Parameters

Note: In undirected networks, it can be assumed that tail < head (and similarly with the multiple toggles). In bipartite networks, tails are in the first partition and heads are in the second.

Edge ntgoggles: Number of edges to be toggled or updated.

Vertex tail: Tail of (1) dyad to be toggled or updated.

Vertex *tails: An array of tails of the dyads to be toggled or updated.

Vertex head: Head of (1) dyad to be toggled or updated.

Vertex *heads: An array of heads of the dyads to be toggled or updated.

double weight: New weight for (1) dyad.

double *weights: An array of new weights for the dyads to be toggled or updated.

ModelTerm *mtp: A pointer to the ModelTerm data structure. See inst/include/ergm_changestat.h and inst/include/ergm_wtchangestat.h for details.

Network *nwp: A pointer to the Network of interest before any toggles are applied.

Rboolean edgestate: An indicator of whether edge (tail,head) is in the network nwp pre-toggle.

double edgestate: The weight of dyad (tail,head) in the network nwp pre-update.

Storage

All functions except for s_ expect any storage they need to be initialized and up to date (consistent with nwp). In particular, if their statistic requested \(k\) auxiliary terms, the \(k\) (mtp->n_aux) elements of its mtp->aux_slots vector will be the indexes of mtp->aux_storage where they can find the respective objects.

Macros

It is worth noting that macros defined for d_ functions that refer to a specific toggle, such as TAIL, HEAD, etc. might not be usable in a c_ function, but it’s made up for by c_ function’s reduced need for bookkeeping: tail, head, etc. can be used directly.

Side-effects

These functions overwrite mtp->dstats (often aliased as CHANGE_STAT) with the following:

  • c_ and d_ functions: change of the value of the statistic they implement relative to nwp due to the toggles.
  • s_ the value of the statistic it implements.

Storage API

Every ergm term has private storage, found at void *mtp->storage, which allows it to store arbitrary information about the state of the network, as well as precalculated values of variables, preallocated memory it needs for its calculations, or any other use. It does so by specifying an updating function (and, optionally, an initialization and a finalization function). This updating function is called every time the network is about to change. The API for these functions is defined below.

Public storage is found at void **mtp->aux_storage. Each auxiliary term gets assigned a slot (i.e., void *nwp->mtp->aux_storage[i]) to manage; its slot number is the first element of its aux_slots vector, and terms requesting it are told which slot to look in in a similar fashion. An auxiliary term that requests other auxiliaries will have its own slot as the first element of aux_slots and the slots of auxiliaries it requests as subsequent elements.

R side

A statistic that only references its private storage or is an auxiliary itself does not need to do anything special on the R side.

To request an auxiliary, a term’s InitErgmTerm call’s output list must include an auxiliaries element containing a one-sided ergm-style formula listing the auxiliary terms it wishes to use separated by the + operator.

C side: Modifying Storage

i_ functions: Initializer/Constructor

This function is optional for using storage: if it’s not provided, the model code will call the u_ function with an invalid toggle first, signaling for it to initialize.

Binary: void i_<NAME>(Model *mtp, Network *nwp)

Valued: void i_<NAME>(WtModel *mtp, WtNetwork *nwp)

Expects

In general, i_ function expects to be called after ModelInitialize() and NetworkInitialize(), before any c_ or d_ functions. That is, the network must be populated with the ties of its initial state and have mtp->aux_storage vector allocated.

Private storage

Network populated with initial ties and initialized model.

Public storage

The first element of mtp->aux_slots is the index of the element of mtp->aux_storage to be managed by this auxiliary. That is mtp->aux_storage[mtp->aux_slots[0]] is a void * to point to the data to be public.

The other data passed from the InitErgmTerm. are shifted over to make room for it.

Side-effects

Private storage

Allocates memory for the information to be stored and overwrites mtp->storage with a pointer to it, then updates the stored information to be consistent with *nwp.

Public storage

Allocates memory for the information to be stored and overwrites mtp->aux_storage[mtp->aux_slots[0]] with a pointer to it, then updates the stored information to be consistent with *nwp.

An auxiliary can also use its private storage as needed.

u_ functions: Updater

Binary: void u_<NAME>(Vertex tail, Vertex head, Model *mtp, Network *nwp, Rboolean edgestate)

Valued: void u_<NAME>(Vertex tail, Vertex head, double weight, Model *mtp, Network *nwp, double edgestate)

Expects

Initialized network. If no i_ function was provided, to be called with a \((0,0)\) toggle as a signal to initialize; otherwise, initialized storage. Any statistic or auxiliary can rely on its auxiliaries having been initialized before it.

Side-effects

If called with an a toggle (0,0) and uninitialized storage, initialize. This will never be done if an i_ function is defined for the term.

Update the state of its storage (mtp->storage and/or mtp->aux_storage[mtp->aux_slots[0]]) to match what the state of *nwp would be after the given dyad had been toggled.

f_ functions: Finalizer/Destructor

This function is optional for using storage: if it’s not provided, the model code will free any pointers to mtp->aux_storage and mtp->storage that are not NULL.

Binary: void f_<NAME>(Model *mtp, Network *nwp)

Valued: void f_<NAME>(WtModel *mtp, WtNetwork *nwp)

Expects

Network and a model.

Side-effects

Deallocates its storage (mtp->storage and/or mtp->aux_storage[mtp->aux_slots[0]]) and sets its pointers to NULL.

C side: Accessing Storage

Private storage

c_, d_, and s_ functions can read from, but not write to, their private storage. c_ and d_ functions can rely on initialization having been called before.

Public storage

Auxilaries must not implement c_, d_, and s_ functions.

Terms requesting one or more auxiliaries will be passed the indices of the element of mtp->aux_storage by inserting them at the start of mtp->aux_slots. That is mtp->aux_storage[mtp->aux_slots[0]] is a void * to point to the data public by the first auxiliary term on the auxiliaries formula, mtp->aux_storage[mtp->aux_slots[1]] is the second, etc..

x_ functions: eXtensions

This interface is intended to be used by packages extending ergm to send arbitrary signals to statistics and auxiliaries. For example, for temporal ERGMs, it may be used to signal to the statistic that the clock is about to advance. It is the responsibility of the extension writer to ensure that everything behaves sensibly.

Binary: void x_<NAME>(unsigned int type, void *data, Model *mtp, Network *nwp)

Valued: void x_<NAME>(unsigned int type, void *data, Model *mtp, Network *nwp)

Expects

Parameters

type: a magic constant identifying the type of signal being sent. Based on it, the function can ignore the signal, or determine how to interpret data.

data: arbitrary data to be sent to the function; it is up to the extension writer to determine how it is formatted and interpreted.

Side-effects

There are no restrictions on side-effects. It is up to the extension writer to ensure that everything works.

Macros

The following helper macros have been defined to date, and can be found in storage.h.

Memory management

These functions are defined in ergm_storage.h and exported.

ALLOC_STORAGE(nmemb, stored_type, store_into): Allocate a vector of nmemb elements of type stored_type, save its pointer to private storage and also to a stored_type *store_into which is also declared. Should be used by the i_ function, but may also be used by the u_ function.

GET_STORAGE(stored_type, store_into): Declare stored_type *store_into and assign the pointer to private storage to it. Can be used by all functions.

ALLOC_AUX_STORAGE(nmemb, stored_type, store_into): Allocate a vector of nmemb elements of type stored_type, and save it to the auxiliary storage slot belonging to the calling auxiliary and into a stored_type *store_into which is also declared. Can be used by the i_ function, but may also be used by the u_ function.

GET_AUX_STORAGE(stored_type, store_into): Declare stored_type *store_into and assign the pointer to the auxiliary storage (either for a statistic or for the auxiliary). Can bn used by all functions.

GET_AUX_STORAGE_NUM(stored_type, store_into, ind): Declare stored_type *store_into and assign the pointer to the indth auxiliary). Can be used by clients of auxiliaries.

Miscellaneous helpers

ALLOC_AUX_SOCIOMATRIX(stored_type, store_into): Allocate an array of appropriate dimension with elements of type stored_type, save it to auxiliary storage, and into **store_type, so that store_into[i][j] returns the value associated with dyad \((i,j)\), with vertices indexed from 1. For bipartite and undirected networks, as little space as possible (resp. a rectangle or a triangle) is allocated.

Note that this term assumes that the private and the public storage of the calling term are not used in any other way.

FREE_AUX_SOCIOMATRIX: Frees the sociomatrix allocated by ALLOC_AUX_SOCIOMATRIX.

MHproposal storage API

Auxiliaries

MHproposals may also request auxiliary terms. An InitErgmProposal.<NAME>() or InitWtErgmProposal.<NAME>()with an auxiliaries formula will similarly receive the positions of its auxiliaries’ slots in the network. However, this appears to have a slight cost in speed and a potentially significant cost in memory, since the auxiliary may need to duplicate the information in the MH_ function.

Private storage

Functions prefixed with Mi_, Mu_, and, Mf_ serve as respectively the initializers, the updaters, and the finalizers of the MHproposal storage, though the old-style call with MHp->ntoggles==0 is also supported. Macros in the ergm_MHstorage.h header file can be used to access storage the same way as for the statistics.

The function called to generate the proposal can have a prefix of either MH_ (for backwards compatibility) or Mp_ for consistency.

One important difference is that Mp_ function is permitted to write to its private storage. This may be useful if, say, a systematic sample is desired.