p* Models & Network Dynamics

Networks in the Global World 2014
9:30-11:30, 12:00-14:00 June 27, 2014
Room 67, 3rd Floor
http://goo.gl/l5XTkp
Benjamin Lind, PhD
Faculty of Sociology, Center for Advanced Studies
International Laboratory for Applied Network Research
Higher School of Economics, Moscow

What are Exponential Random Graph (p*) Models?



"Why Model Social Networks?"

Robins et al. 2007. Social Networks.

  1. To understand regularities in tie formation
  2. Can network structures emerge by chance?
  3. Test competing explanations for the same outcome
  4. Efficiently represent complex network data
  5. Address relationship between local and global properties

Modeling Steps

Robins et al. 2007. Social Networks.

  1. Each edge is a random variable
  2. Propose dependence hypothesis
  3. Only certain forms will meet dependence hypothesis
    1. If there are none, the model is "degenerate"
    2. If there are only a few, it is "near degenerate"
  4. Simplify
    1. Reduce parameters
    2. Homogeneity assumption
  5. Estimate and interpret parameters

Formula

Probability that a random graph is isomorphic to the observed

Pr(Y = y) = (1 / k) * exp{ΣA ηA*gA(Y)}

  • Y = A random graph
  • y = Observed graph
  • A = A network statistic
  • g(Y) = Vector of model statistics
  • η = Parameter vector
  • k = Normalizing constant

Dependence Assumptions

(Robins et al. 2007. Social Networks.)

  • Bernoulli graphs
  • Dyadic dependence
  • Markov random graphs

Which Network Statistics to Model?


  • Edges
  • Nodal attributes
    • Popularity effects
    • Homophily effects
  • Reciprocity
  • Triadic terms
    • The number of triangles
    • Shared partners
      • Dyadic or edge
    • Transitivity
  • Path terms
  • Cycles
  • ...

Software



Be sure to use the most recent version!
We'll be using v. 3.1.0.

Software

statnet
  • Suite
    • network
    • ergm
    • sna
    • tergm
    • ...
  • Uses "network" data objects
  • Written in both R and C
  • Not compatible with other R network packages


install.packages("statnet")
library(statnet)

Example Data


Grey's Anatomy "Hook-Up" Network

  • Popular night time medical drama
  • Actors: fictional hospital staff in Seattle, WA
  • Edges: sexual relationships
  • Actor attributes:
    • Sex, race
    • Birth year, astrological sign
    • Position
    • Season
  • Provided courtesy of Weissman and Lind
    • CC-BY-SA

    Read the Data


    Enter the following URLs into your browser in this order
    1. Sociomatrix: http://goo.gl/Qcqd3u
      1. Should be titled "Grey's Anatomy.tsv"
    2. Attributes: http://goo.gl/vVLhrR
      1. Should be titled "Grey's Anatomy (1).tsv"

    (Download should begin automatically)
    setwd("...") # Enter the folder where the above files were downloaded
    ga.mat <- read.table("Grey's Anatomy.tsv", sep= "\t", header = TRUE, row.names = 1, quote = "\"")
    ga.mat <- as.matrix(ga.mat)
    ga.atts <- read.table("Grey's Anatomy (1).tsv", sep="\t", header=T, quote="\"", stringsAsFactors = FALSE, strip.white = TRUE, as.is = TRUE)
    ga.net <- network(ga.mat, vertex.attr = ga.atts, vertex.attrnames = colnames(ga.atts), directed=F, hyper=F, loops=F, multiple=F, bipartite=F)
    

    Plot the Data


    Verify everything is properly read
    Roughly understand the network

    vcol = c("blue", "pink")[1 + (get.vertex.attribute(ga.net, "sex") == "F")]
    plot(ga.net, vertex.col = vcol, label = get.vertex.attribute(ga.net, "name"), label.cex = 0.75)
    


    Hypotheses?

    Documentation


    # Conduct a fuzzy search through the documentation for the term "ergm"
    ??"ergm"
    
    # Read the documentation for the function 'ergm()'
    ?ergm
    
    # Read the documentation for ergm terms in the ergm package
    ?ergm::ergm.terms

    Let's take a few minutes to review the documentation

    Our Baseline Model

    Two Most Salient Network Characteristics
    • Number of edges (i.e., density effect)
    • Actor's sex (i.e., heterophily)
    ga.base <- ergm(ga.net ~ edges + nodematch("sex")) # Estimate the model
    summary(ga.base) # Summarize the model
    ==========================
    Summary of model fit
    ==========================
    Formula:   ga.net ~ edges + nodematch("sex")
    
    Iterations:  20 
    
    Monte Carlo MLE Results:
                  Estimate Std. Error MCMC % p-value    
    edges          -2.3003     0.1581     NA  <1e-04 ***
    nodematch.sex  -3.1399     0.7260     NA  <1e-04 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  320.5  on 944  degrees of freedom
     
    AIC: 324.5    BIC: 334.2    (Smaller is better.)

    Interpretation

     ==========================
    Summary of model fit
    ==========================
    
    Formula:   ga.net ~ edges + nodematch("sex")
    
    Iterations:  20 
    
    Monte Carlo MLE Results:
                  Estimate Std. Error MCMC % p-value    
    edges          -2.3003     0.1581     NA  <1e-04 ***
    nodematch.sex  -3.1399     0.7260     NA  <1e-04 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  320.5  on 944  degrees of freedom
     
    AIC: 324.5    BIC: 334.2    (Smaller is better.)
    • Coefficients are conditional log-odds
      • Log-odds of a tie forming between two actors
      • I.e., edges + nodematch
    • Probability
      • exp(edges + nodematch) / (1 + exp(edges + nodematch))

    Interpretation

    ==========================
    Summary of model fit
    ==========================
    
    Formula:   ga.net ~ edges + nodematch("sex")
    
    Iterations:  20 
    
    Monte Carlo MLE Results:
                  Estimate Std. Error MCMC % p-value    
    edges          -2.3003     0.1581     NA  <1e-04 ***
    nodematch.sex  -3.1399     0.7260     NA  <1e-04 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  320.5  on 944  degrees of freedom
     
    AIC: 324.5    BIC: 334.2    (Smaller is better.)
    • Probability
      • exp(edges + nodematch) / (1 + exp(edges + nodematch))
      • Opposite-sex = exp(-2.3 + -3.1 * 0) / (1 + exp(-2.3 + -3.1 * 0))
      • Same-sex = exp(-2.3 + -3.1 * 1) / (1 + exp(-2.3 + -3.1 * 1))
      • Correspond to 0.09 and 0.004

    Interpretation

    ==========================
    Summary of model fit
    ==========================
    
    Formula:   ga.net ~ edges + nodematch("sex")
    
    Iterations:  20 
    
    Monte Carlo MLE Results:
                  Estimate Std. Error MCMC % p-value    
    edges          -2.3003     0.1581     NA  <1e-04 ***
    nodematch.sex  -3.1399     0.7260     NA  <1e-04 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  320.5  on 944  degrees of freedom
     
    AIC: 324.5    BIC: 334.2    (Smaller is better.)
    • Standard errors can be used for estimate confidence
    • p-values for statistical significance
    • Use either AIC or BIC to gauge model parsimony
      • Principle of Occam's Razor
      • Better fit with fewer parameters is preferable

    Plot a Simulated Model


    Does it look anything like our original network?
    Is it degenerate?

    plot(simulate(ga.base), vertex.col=vcol)

    How is it different?

    Is it degenerate?

    Goodness of Fit


    1. Simulate the network many times
    2. Assess with one or more unmodeled network statistic
    3. Compare the observed network statistics to the simulated


    ga.base.gof <- gof(ga.base) # Simulate and measure. May take a minute or so.
    summary(ga.base.gof) # Print a summary
    par(mfrow=c(3, 1)) # Set up the plotting window
    plot(ga.base.gof) # Plot the summary
    


    Additional Parameter

    • Keep edges and nodematch(sex)
      • Theoretically motivated and significant
    • Account for vertices with a degree of 1
      • Theoretically meaningful and underrepresented
    ga.base.d1 <- ergm(ga.net ~ edges + nodematch("sex") + degree(1))
    summary(ga.base.d1)
    ==========================
    Summary of model fit
    ==========================
    Formula:   ga.net ~ edges + nodematch("sex") + degree(1)
    
    Iterations:  20 
    
    Monte Carlo MLE Results:
                  Estimate Std. Error MCMC % p-value    
    edges          -1.4292     0.2466      0  <1e-04 ***
    nodematch.sex  -3.1766     0.7332      0  <1e-04 ***
    degree1         2.0807     0.4602      0  <1e-04 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  295.8  on 943  degrees of freedom 
    AIC: 301.8    BIC: 316.3    (Smaller is better.) 

    Additional Parameter

    ga.base.d1 <- ergm(ga.net ~ edges + nodematch("sex") + degree(1))
    summary(ga.base.d1)
    ==========================
    Summary of model fit
    ==========================
    Formula:   ga.net ~ edges + nodematch("sex") + degree(1)
    
    Iterations:  20 
    
    Monte Carlo MLE Results:
                  Estimate Std. Error MCMC % p-value    
    edges          -1.4292     0.2466      0  <1e-04 ***
    nodematch.sex  -3.1766     0.7332      0  <1e-04 ***
    degree1         2.0807     0.4602      0  <1e-04 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  295.8  on 943  degrees of freedom 
    AIC: 301.8    BIC: 316.3    (Smaller is better.) 
    • Iteration 1 of at most 20: 
    • Convergence test P-value: 0e+00 
    • The log-likelihood improved by 0.6971 
    • Iteration 2 of at most 20: ...

    Additional Parameter

    ga.base.d1 <- ergm(ga.net ~ edges + nodematch("sex") + degree(1))
    summary(ga.base.d1)
    ==========================
    Summary of model fit
    ==========================
    Formula:   ga.net ~ edges + nodematch("sex") + degree(1)
    
    Iterations:  20 
    
    Monte Carlo MLE Results:
                  Estimate Std. Error MCMC % p-value    
    edges          -1.4292     0.2466      0  <1e-04 ***
    nodematch.sex  -3.1766     0.7332      0  <1e-04 ***
    degree1         2.0807     0.4602      0  <1e-04 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  295.8  on 943  degrees of freedom 
    AIC: 301.8    BIC: 316.3    (Smaller is better.) 
    Interpretation
    • Previous effects still significant
    • Monogamy effect is positive and significant
    • Improved AIC and BIC

    Assess Fit

    plot(simulate(ga.base.d1), vertex.col = vcol)


    Assess Fit

    plot(simulate(ga.base.d1), vertex.col = vcol)
    ga.base.d1.gof <- gof(ga.base.d1)
    summary(ga.base.d1.gof)
    par(mfrow=c(3,1))
    plot(ga.base.d1.gof)
    


    Markov Chain Monte Carlo

    Did it work properly?
    mcmc.diagnostics(ga.base.d1)
    
    # ...
    Are sample statistics significantly different from observed?
                   edges nodematch.sex    degree1 Overall (Chi^2)
    diff.      0.3728000     0.0201000 -0.1174000              NA
    test stat. 1.0569427     0.7868791 -0.7314219        1.563627
    P-val.     0.2905377     0.4313526  0.4645215        0.667665
    

    That looks okay, moving on...

    Markov Chain Monte Carlo

    Did it work properly?
    Chain 1 
                edges nodematch.sex   degree1
    Lag 0   1.0000000    1.00000000 1.0000000
    Lag 100 0.8609309    0.46789330 0.7446234
    Lag 200 0.7542179    0.25234066 0.6214649
    Lag 300 0.6619003    0.15270586 0.5286271
    Lag 400 0.5806592    0.08059618 0.4508049
    Lag 500 0.5136180    0.04619048 0.3917378
    
    Sample statistics burn-in diagnostic (Geweke):
    Chain 1 
    
    Fraction in 1st window = 0.1
    Fraction in 2nd window = 0.5 
            edges nodematch.sex       degree1 
          -0.6463       -1.4147        0.8218 
    P-values (lower = worse):
            edges nodematch.sex       degree1 
        0.5180537     0.1571526     0.4111837 
    • High autocorrelation across the chains
    • Geweke roughly corresponds to Z-statistic
      • Nodematch's p-value is a bit low.

    Markov Chain Monte Carlo

    Troubleshooting
    • Sample statistics differ from observed
      •  MCMC's default sample size is 10,000
      • Increase it (maybe to ~20,000 or 50,000)
        • (Unnecessary here)
    • Autocorrelation
      • MCMC.interval's default is 100
      • Increase it (here, to ~5,000)
    • Improve the Geweke statistics
      • MCMC.burnin's default is 10,000
      • Increase it (here, to ~50,000)


    Though they improve the results and help models converge, these changes increase estimation time.

    Markov Chain Monte Carlo

    Documentation to fine tune
    ?control.ergm

    In addition to adjusting
    • MCMC.burnin
    • MCMC.samplesize
    • MCMC.interval


    Consider setting "parallel = 4"
    • ... or however many processors you'd like
    • Increases speed
    • Requires library(rlecuyer)

    Markov Chain Monte Carlo


    ?control.ergm
    
    # If we wanted to improve the sample statistics
    # ga.base.d1 <- ergm(ga.net ~ edges + nodematch("sex") + degree(1), control = control.ergm(MCMC.burnin = 50000, MCMC.interval = 5000, MCMC.samplesize = 50000))
    
    ga.base.d1 <- ergm(ga.net ~ edges + nodematch("sex") + degree(1), control = control.ergm(MCMC.burnin = 50000, MCMC.interval = 5000))
    summary(ga.base.d1)
    ==========================
    Summary of model fit
    ==========================
    Formula:   ga.net ~ edges + nodematch("sex") + degree(1)
    Iterations:  20 
    Monte Carlo MLE Results:
                  Estimate Std. Error MCMC % p-value    
    edges          -1.4298     0.2483      0  <1e-04 ***
    nodematch.sex  -3.1697     0.7257      0  <1e-04 ***
    degree1         2.0794     0.4587      0  <1e-04 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  295.7  on 943  degrees of freedom
    

    Markov Chain Monte Carlo

    mcmc.diagnostics(ga.base.d1)
    # ...
    Are sample statistics significantly different from observed?
                     edges nodematch.sex degree1 Overall (Chi^2)
    diff.      -0.00150000     0.0174000       0              NA
    test stat. -0.01690472     1.1955138       0       1.6043633
    P-val.      0.98651262     0.2318864       1       0.6584009
    
    Sample statistics auto-correlation:
    Chain 1 
                     edges nodematch.sex      degree1
    Lag 0      1.000000000   1.000000000  1.000000000
    Lag 5000   0.015469475   0.005331195  0.006627886
    Lag 10000 -0.000798288  -0.004256179 -0.004036665
    Lag 15000 -0.012238958  -0.002511807 -0.008747576
    Lag 20000 -0.001466805   0.004430013  0.001075269
    Lag 25000 -0.026182557  -0.010818715 -0.027159351
    
    Sample statistics burn-in diagnostic (Geweke):
    # ...
    
            edges nodematch.sex       degree1 
           1.0343        0.5097       -1.0479 
    
    P-values (lower = worse):
            edges nodematch.sex       degree1 
        0.3010182     0.6102707     0.2946793

    Alternative Theories


    We now have a decent baseline model
    • Approximates degree distribution
    • Approximates geodesic distribution

    Which theories should we now test?
    • Homophily
      • Attributes: age, race, position
      • Forms
        • Continuous vs. categorical
        • General, differential, mixture
    • Attribute-based popularity
      • Could challenge homophily effects
      • Attributes: age, race, position, sex

    Age Homophily

    Continuous measure, use 'absdiff()' term
    ga.base.d1.age <- ergm(ga.net ~ edges + nodematch("sex") + degree(1) + absdiff("birthyear"), control=control.ergm(MCMC.burnin=50000, MCMC.interval=5000))
    summary(ga.base.d1.age)
    
    ==========================
    Summary of model fit
    ==========================
    Formula:   ga.net ~ edges + nodematch("sex") + degree(1) + absdiff("birthyear")
    
    Iterations:  20 
    
    Monte Carlo MLE Results:
                      Estimate Std. Error MCMC % p-value    
    edges             -0.46543    0.28588      0   0.104    
    nodematch.sex     -3.23784    0.73100      0  <1e-04 ***
    degree1            1.99880    0.44231      0  <1e-04 ***
    absdiff.birthyear -0.12465    0.02929      0  <1e-04 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  270.3  on 942  degrees of freedom
     
    AIC: 278.3    BIC: 297.7    (Smaller is better.)

    Age Homophily

    ==========================
    Summary of model fit
    ==========================
    Formula:   ga.net ~ edges + nodematch("sex") + degree(1) + absdiff("birthyear")
    
    Iterations:  20 
    
    Monte Carlo MLE Results:
                      Estimate Std. Error MCMC % p-value    
    edges             -0.46543    0.28588      0   0.104    
    nodematch.sex     -3.23784    0.73100      0  <1e-04 ***
    degree1            1.99880    0.44231      0  <1e-04 ***
    absdiff.birthyear -0.12465    0.02929      0  <1e-04 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  270.3  on 942  degrees of freedom
     
    AIC: 278.3    BIC: 297.7    (Smaller is better.)
    Interpretation
    • Effects comparable to before
    • Age-based homophily present
    • Improved AIC and BIC

    Age Homophily

    Whenever using the MCMC, check its diagnostics
    mcmc.diagnostics(ga.base.d1.age)
    Does everything look okay?
    If not...
    • Which diagnostic looks wrong?
    • How would we adjust the MCMC to correct it?

    For sake of time, we'll pretend it's all okay.

    Challenge to Age Homophily Hypothesis

    What if younger characters/actors form more ties?

      • Reasons
        • Physiological
        • Audience interest
        • Cultural attitudes
      • Implications if true?
        • High probability of within the youth (homophily)
          • If most of the cast is young, homophily more apparent
        • Low probability of ties within the elderly
        • Medium probability of ties between youth and elderly

    Challenge to Age Homophily Hypothesis

    ga.base.d1.age.covage <- ergm(ga.net ~ edges + nodematch("sex") + degree(1) + absdiff("birthyear") + nodecov("birthyear"), control=control.ergm(MCMC.burnin=50000, MCMC.interval=5000))
    summary(ga.base.d1.age.covage)
    
    ==========================
    Summary of model fit
    ==========================
    
    Monte Carlo MLE Results:
                       Estimate Std. Error MCMC % p-value    
    edges             21.068917  50.797099      0   0.678    
    nodematch.sex     -3.239118   0.740633      0  <1e-04 ***
    degree1            1.984977   0.443934      0  <1e-04 ***
    absdiff.birthyear -0.125028   0.029186      0  <1e-04 ***
    nodecov.birthyear -0.005462   0.012884      0   0.672    
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  270.1  on 941  degrees of freedom
     
    AIC: 280.1    BIC: 304.4    (Smaller is better.)

    Challenge to Age Homophily Hypothesis

    ==========================
    Summary of model fit
    ==========================
    
    Monte Carlo MLE Results:
                       Estimate Std. Error MCMC % p-value    
    edges             21.068917  50.797099      0   0.678    
    nodematch.sex     -3.239118   0.740633      0  <1e-04 ***
    degree1            1.984977   0.443934      0  <1e-04 ***
    absdiff.birthyear -0.125028   0.029186      0  <1e-04 ***
    nodecov.birthyear -0.005462   0.012884      0   0.672    
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  270.1  on 941  degrees of freedom
     
    AIC: 280.1    BIC: 304.4    (Smaller is better.)
    Youth are not more popular
      • Effect not significant
      • Worse AIC, BIC

    Race-based Homophily

    Categorical attribute

    ga.base.d1.age.race <- ergm(ga.net~edges + nodematch("sex") + degree(1) + absdiff("birthyear") + nodematch("race"), control = control.ergm(MCMC.burnin = 50000, MCMC.interval = 5000, parallel = 4))
    mcmc.diagnostics(ga.base.d1.age.race) # How do the diagnostics look?
    ga.base.d1.age.race.gof <- gof(ga.base.d1.age.race)
    par(mfrow=c(3, 1)) # Set up the plotting window
    plot(ga.base.d1.age.race.gof)
    


    Race-based Homophily

    Categorical attribute

    ga.base.d1.age.race <- ergm(ga.net~edges + nodematch("sex") + degree(1) + absdiff("birthyear") + nodematch("race"), control = control.ergm(MCMC.burnin = 50000, MCMC.interval = 5000, parallel = 4))
    mcmc.diagnostics(ga.base.d1.age.race) # How do the diagnostics look?
    ga.base.d1.age.race.gof <- gof(ga.base.d1.age.race)
    par(mfrow=c(3, 1)) # Set up the plotting window
    plot(ga.base.d1.age.race.gof)
    plot(simulate(ga.base.d1.age.race), vertex.col=vcol) # Plot a simulated network
    


    Race-based Homophily


    summary(ga.base.d1.age.race)
    ==========================
    Summary of model fit
    ==========================
    
    Monte Carlo MLE Results:
                      Estimate Std. Error MCMC % p-value    
    edges             -0.98160    0.40479      0  0.0155 *  
    nodematch.sex     -3.28508    0.72837      0  <1e-04 ***
    degree1            1.95555    0.44480      0  <1e-04 ***
    absdiff.birthyear -0.13297    0.03116      0  <1e-04 ***
    nodematch.race     0.82943    0.36020      0  0.0215 *  
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  263.9  on 941  degrees of freedom
     
    AIC: 273.9    BIC: 298.1    (Smaller is better.)
    
    • Effects comparable to before
    • Race-based homophily effect significant
    • Better AIC, worse BIC

    Race-Based Differential Homophily

    • Relax the homogeneity assumption
    • Possibility for homophily to differ between groups
    ga.base.d1.age.racediff <- ergm(ga.net~edges + nodematch("sex")+ degree(1) + absdiff("birthyear") + nodematch("race", diff = TRUE, keep = c(1, 3)), control = control.ergm(MCMC.burnin=50000, MCMC.interval=5000, parallel = 4))
    mcmc.diagnostics(ga.base.d1.age.racediff) # Some issues here. We'll ignore solving them for now.
    summary(ga.base.d1.age.racediff)
    
    ==========================
    Summary of model fit
    ==========================
    Monte Carlo MLE Results:
                         Estimate Std. Error MCMC % p-value    
    edges                -0.97137    0.38115      0  0.0110 *  
    nodematch.sex        -3.32002    0.73217      0  <1e-04 ***
    degree1               1.97079    0.43896      0  <1e-04 ***
    absdiff.birthyear    -0.13112    0.03027      0  <1e-04 ***
    nodematch.race.Black  2.26102    0.74285      0  0.0024 ** 
    nodematch.race.White  0.75956    0.35106      0  0.0307 *  
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  259.4  on 940  degrees of freedom 
    AIC: 271.4    BIC: 300.5    (Smaller is better.)

    Race-Based Differential Homophily

    ==========================
    Summary of model fit
    ==========================
    Monte Carlo MLE Results:
                         Estimate Std. Error MCMC % p-value    
    edges                -0.97137    0.38115      0  0.0110 *  
    nodematch.sex        -3.32002    0.73217      0  <1e-04 ***
    degree1               1.97079    0.43896      0  <1e-04 ***
    absdiff.birthyear    -0.13112    0.03027      0  <1e-04 ***
    nodematch.race.Black  2.26102    0.74285      0  0.0024 ** 
    nodematch.race.White  0.75956    0.35106      0  0.0307 *  
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
         Null Deviance: 1311.4  on 946  degrees of freedom
     Residual Deviance:  259.4  on 940  degrees of freedom 
    AIC: 271.4    BIC: 300.5    (Smaller is better.)
    Implications
    • Effects comparable to before
    • Race-based homophily significant
      • Homophily stronger among Black characters than White
      • 0.78 vs. 0.45 probability
    • Better AIC, worse BIC

    Other Models

    For example
    # Are men or women more popular?
    ga.base.d1.age.sex <- ergm(ga.net ~ edges + nodematch("sex") + degree(1) + absdiff("birthyear") + nodefactor("sex"), control = control.ergm(MCMC.burnin = 100000, MCMC.interval = 5000))
    
    # Are certain occupational pairings more likely than others?
    ga.base.d1.age.rolemix<-ergm(ga.net ~ edges + nodematch("sex") + degree(1) + absdiff("birthyear") + nodemix("position", base = c(3:5, 9, 12:15, 17:21, 24, 27, 28)), control = control.ergm(MCMC.burnin = 50000, MCMC.interval = 5000))
    
    

    Estimating Network Properties from Ego Network Data

    Smith. 2012. "Macrostructure from Microstructure" Sociological Methodology.
    • Uses the "target.stats" term in ergm and stergm
    • Simulates network based upon ego network properties, e.g., 
      • Degree distribution
      • Degree correlates and demographic properties
      • Homophily
      • Transitivity
    • Can provide confidence intervals on network measurements, e.g.,
      • Average path length
      • Number of components
      • Cohesive subgroups
      • etc

    General Modeling Advice


    Start very simple
    Select the best models based upon the BIC


    Does the model reflect reality?
    • Plot simulations for quick diagnosis
    • Compare simulated measurements to the observed
     

    Check MCMC diagnostics when MCMC called

    Extensions


    • Directed networks (same, yet some different terms)
    • Weighted networks
    • Bipartite networks (same, yet some different terms)
    • ...
    • Temporal networks

    Separable Temporal Exponential Random Graph Models

    (Hanneke and Xing 2007, Krivitsky and Handcock 2014)

    • Network model at time t can be represented as an ERGM
      • It is conditioned upon the network at time t - 1
      • Discrete time
    • The difference between two time points represents two networks
      • Tie formation network (Y+)
      • Tie dissolution network (Y-)
      • Separablility: if tie formation is independent of tie dissolution
    • Two separate ERGM formulas for each network


    Theoretical Motivation


    Does the process of tie dissolution differ from tie formation? If so, what are their respective processes?

    Important to Bear in Mind...


    The "tie dissolution" mode refers to tie persistence.

    Example Data

    University Student Network
    "Over the past two weeks, who in our class did you communicate with outside of school?"
    • Universe
      • Second-year undergraduates
      • Honors sociological methods class
      • Classroom size of 22 students
    • Asked at three time points ~3 weeks apart
    • Symmetrized
    • Actor attributes
      • Study group
      • Sex

    Read the Data

    stergm can take three types of data
    • network object
    • networkDynamic object
    • network.list object
    • List of network objects

    library(tergm)
    ?stergm
    ?tergm::ergm.terms
    ?control.stergm
    student.nets <- dget("http://pastebin.com/raw.php?i=f0s1DaD9")
    summary(student.nets[[1]])
    plot(student.nets[[1]], vertex.col = student.nets[[1]] %v% "studygroupcol", label = student.nets[[1]] %v% "vertex.names")
    plot(student.nets[[2]], vertex.col = student.nets[[2]] %v% "studygroupcol", label = student.nets[[2]] %v% "vertex.names")
    plot(student.nets[[3]], vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")
    

    t1

    t2

    t3

    Homophily Model

    studmod1 <- stergm(student.nets, formation = ~edges + nodematch("studygroup"), dissolution = ~edges + nodematch("studygroup"), estimate = "CMLE", times = c(1:3))
    summary(studmod1)
    ==============================
    Summary of formation model fit 
    ==============================
    Monte Carlo MLE Results:
                         Estimate Std. Error MCMC % p-value    
    edges                 -2.3821     0.1941     NA < 1e-04 ***
    nodematch.studygroup   1.2120     0.4281     NA 0.00488 ** 
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
         Null Deviance: 528.2  on 381  degrees of freedom
     Residual Deviance: 240.4  on 379  degrees of freedom 
    AIC: 244.4    BIC: 252.3    (Smaller is better.)
    ================================
    Summary of dissolution model fit
    ================================
    Monte Carlo MLE Results:
                         Estimate Std. Error MCMC % p-value  
    edges                  0.2412     0.4029     NA  0.5512  
    nodematch.studygroup   1.4118     0.5429     NA  0.0111 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
         Null Deviance: 112.29  on 81  degrees of freedom
     Residual Deviance:  83.67  on 79  degrees of freedom 
    AIC: 87.67    BIC: 92.46    (Smaller is better.)

    Homophily + Triangle Model

    studmod2 <- stergm(student.nets, formation = ~edges + nodematch("studygroup") + triangle, dissolution = ~edges + nodematch("studygroup") + triangle, estimate = "CMLE", times = c(1:3))
    mcmc.diagnostics(studmod2) # Some problems here.studmod2 <- stergm(student.nets, formation = ~edges + nodematch("studygroup") + triangle, dissolution = ~edges + nodematch("studygroup") + triangle, estimate = "CMLE", times=c(1:3), control = control.stergm(CMLE.MCMC.interval = 500, CMLE.MCMC.burnin = 50000, parallel = 4))mcmc.diagnostics(studmod2) # Not great on tie formation, but generally better.
    studmod2.gof <- gof(studmod2)
    par(mfrow = c(1, 3))
    plot(studmod2.gof$formation)
    plot(studmod2.gof$dissolution)
    

    Homophily + Triangle Tie Formation

    Homophily + Triangle Tie Dissolution

    Homophily + Triangle Model

    Plot one time slice ahead of each wave
    par(mfrow = c(1, 3))
    plot(simulate.stergm(studmod2, nw.start = student.nets[[1]]), vertex.col = student.nets[[1]] %v% "studygroupcol", label = student.nets[[1]] %v% "vertex.names")
    plot(simulate.stergm(studmod2, nw.start = student.nets[[2]]), vertex.col = student.nets[[2]] %v% "studygroupcol", label = student.nets[[2]] %v% "vertex.names")
    plot(simulate.stergm(studmod2, nw.start = student.nets[[3]]), vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")
    


    Homophily + Triangle Model

    Plot three time slices ahead of the final wave
    par(mfrow = c(1, 3))studmod2.w3sim <- simulate.stergm(studmod2, nw.start = student.nets[[3]], time.slices = 1)plot(studmod2.w3sim, vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")studmod2.w3sim <- simulate.stergm(studmod2, nw.start = as.network(studmod2.w3sim), time.slices = 1)
    plot(studmod2.w3sim, vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")
    studmod2.w3sim <- simulate.stergm(studmod2, nw.start = as.network(studmod2.w3sim), time.slices = 1)
    plot(studmod2.w3sim, vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")

    t4, t5, t6 Simulated from t3

    Homophily + Triangle Model

    summary(studmod2)
    
    ==============================
    Summary of formation model fit 
    ==============================
    Formula:   ~edges + nodematch("studygroup") + triangle
    Monte Carlo MLE Results:
                         Estimate Std. Error MCMC % p-value    
    edges                 -2.6000     0.2794      0  <1e-04 ***
    nodematch.studygroup   1.0710     0.4228      0  0.0117 *  
    triangle               0.1801     0.1585      0  0.2566    
    ---
         Null Deviance: 528.2  on 381  degrees of freedom
     Residual Deviance: 239.2  on 378  degrees of freedom 
    AIC: 245.2    BIC: 257.1    (Smaller is better.)
    ================================
    Summary of dissolution model fit
    ================================
    Formula:   ~edges + nodematch("studygroup") + triangle
    Monte Carlo MLE Results:
                         Estimate Std. Error MCMC % p-value  
    edges                 -0.2232     0.4231      0  0.5993  
    nodematch.studygroup   0.7851     0.5204      0  0.1355  
    triangle               1.0938     0.4417      0  0.0154 *
    ---
         Null Deviance: 112.29  on 81  degrees of freedom
     Residual Deviance:  77.18  on 78  degrees of freedom 
    AIC: 83.18    BIC: 90.36    (Smaller is better.)
    

    Homophily + Triangle Model

    Interpretation
    • AIC, BIC
      • Worse for formation
      • Better for dissolution
    • Homophily by study group
      • Forms ties
      • No effect on preserving ties
    • Triangle
      • No effect on tie formation
      • Preserves ties


    Goodness of Fit Troubles
    • Overestimated 0 edgewise shared partners in formation
    • Underestimated geodesics in dissolution

    "Augmented" Model

    Changes
    • Formation model
      • Drop triangle, add esp(0)
    • Dissolution model
      • Drop nodematch(), add twopath
    studmod3 <- stergm(student.nets, formation = ~edges + nodematch("studygroup") + esp(0), dissolution = ~edges + triangle + twopath, estimate = "CMLE", times = c(1:3), control = control.stergm(CMLE.MCMC.interval = 500, CMLE.MCMC.burnin = 50000, parallel = 4))
    mcmc.diagnostics(studmod3) # Looks okay!
    par(mfrow = c(1, 3))
    studmod3.w3sim <- simulate.stergm(studmod3, nw.start = student.nets[[3]], time.slices = 1)
    plot(studmod3.w3sim, vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")
    studmod3.w3sim <- simulate.stergm(studmod3, nw.start = as.network(studmod3.w3sim), time.slices = 1)
    plot(studmod3.w3sim, vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")
    studmod3.w3sim <- simulate.stergm(studmod3, nw.start = as.network(studmod3.w3sim), time.slices = 1)
    plot(studmod3.w3sim, vertex.col = student.nets[[3]] %v% "studygroupcol", label = student.nets[[3]] %v% "vertex.names")
    

    t4, t5, t6 Simulated from t3

    "Augmented" Model

    studmod3.gof <- gof(studmod3)
    par(mfrow = c(1, 3))
    plot(studmod3.gof$formation)
    plot(studmod3.gof$dissolution)
    

    "Augmented" Model Formation

    "Augmented" Model Dissolution


    "Augmented" Model Summary

    summary(studmod3)
    ==============================
    Summary of formation model fit 
    ==============================
    Formula:   ~edges + nodematch("studygroup") + esp(0)
    Monte Carlo MLE Results:
                         Estimate Std. Error MCMC % p-value    
    edges                 -2.4521     0.1854      0 < 1e-04 ***
    nodematch.studygroup   1.0978     0.4190      0 0.00915 ** 
    esp0                  -0.7903     0.3546      0 0.02641 *  
    ---
         Null Deviance: 528.2  on 381  degrees of freedom
     Residual Deviance: 234.0  on 378  degrees of freedom 
    AIC: 240    BIC: 251.9    (Smaller is better.)
    ================================
    Summary of dissolution model fit
    ================================
    Formula:   ~edges + triangle + twopath
    Monte Carlo MLE Results:
             Estimate Std. Error MCMC %  p-value    
    edges      1.6963     0.7737      0 0.031326 *  
    triangle   1.7505     0.4486      0 0.000201 ***
    twopath   -0.3670     0.1376      0 0.009285 ** 
    ---
         Null Deviance: 112.29  on 81  degrees of freedom
     Residual Deviance:  69.35  on 78  degrees of freedom 
    AIC: 75.35    BIC: 82.53    (Smaller is better.)

    "Augmented" Model Summary

    Interpretation
    • AIC and BIC improved
    • Previous effects retained
    • Tie formation avoids creating edges with 0 shared partners
    • Two-paths are unlikely to last

    Additional Capacities of STERGM

    • Cross-sectional network modeling
      • Tie formation, dissolution inferred from a single network
      • Requires estimate of mean relationship perseverance
        • Edges' term equals log(mean perseverance - 1)
    • Estimation of network properties from ego network statistics
      • See discussion earlier
      • Requires estimation of mean relationship perseverance

    Alternative Approaches to Longitudinal Networks

    Siena

    Snijders, van de Bunt, Steglich. 2010. Social Networks.

    Software: RSiena

    • Stochastic actor-based models
      • Models how actors can change their outgoing ties
    • Network panel data
    • Closely in line with agent-based models
    • Can model social behavior in addition to tie formation

    Relational Event Models

    Butts. 2008. Sociological Methodology.
    Software: relevent

    • Events occur in real time
    • Events are of a certain type
    • An event's occurrence can affect the likelihood of future events
      • Increase or decrease
    • Tie formation can be one such event
    • Events can be transitive, accumulate, reciprocal, etc.
    • Highly useful for communication networks (among others)

    networkDynamic


    Part of statnet

    • Visualization of dynamic networks
    • Storing fully dynamic network data

    NetModelsWS20140627

    By Benjamin Lind

    NetModelsWS20140627

    • 3,163