Saturday, September 15, 2012

Preferential attachment for network

I am currently taking the networked life course on Coursera.org offered by Professor Michael Kearns from the University of Pennsylvania.  I have been took several courses including machine learning, natural language processing since the platform was launched late last year. I have to give my biggest praise to Andrew Ng and his team for creating such platform and enable knowledge sharing to those who have no access to it.

Nevertheless, I want to talk about and demonstrate a type of simulated network model called preferential attachment introduced in the course.

The idea of this particular network is that individuals are more likely to be connected to others who already has a large connection (degree), this is typical for social network and many real life situation. The more connection one has, the more exposed the individual is to the rest of the world.

This model is able to mimic the small diameter, high cluster and heavy-tailed degree distribution of typical large networks.

For brevity, we will just focus on the heavy-tail property and power-law decay of the degree distribution of the network. Where a large number of individuals has small number of connection while a small number of individuals have a much much greater number of connections.

 The following simulation simulates a preferential attachment network and plot the histogram and the log-log relationship.  The simulation is based on the same example that N number of individuals chooses from 1000 night clubs.



library(animation)
saveHTML({
  N = 1000
  for(i in 1:100){
    m = 100 * i
    rlt = rep(1, N)
    for(i in 1:m){
      selection = sample(x = 1:N, size = 1, prob = rlt)
      rlt[selection] = rlt[selection] + 1
    }
    rlt = rlt - 1
        par(mfrow = c(1, 2))
    bin = hist(rlt, breaks =  m/10,
      main = paste("Histogram of preferential network\n with ", m,
        " individuals choosing\nbetween 1000 clubs", sep = ""),
      xlab = "Number of Individuals", ylab = "Frequency")
    plot(log(bin$breaks[-1]), log(bin$counts),
         main = "log-log histogram", xlab = "", ylab = "")
    lm.df = data.frame(lbin = log(bin$breaks[-1]), lcounts = log(bin$counts))
    lm.df = lm.df[lm.df$lcounts != -Inf & lm.df$lcounts != 0 , ]
    fit = lm(lcounts ~lbin, data = lm.df)
    abline(coef = coef(fit))
  }
})



From the simulation, we can see the histogram becomes more and more heavy-tailed as we increase the number of individuals/vertex in the network, but I remain sceptical about the linear relationship in the log-log plot which is an indicator of the power law decay.

No comments:

Post a Comment