On KDD Cup-2012!
Well, the problem stated in Track 1 seems really interesting and data-rich (although Track 2 is more pertinent to my Ph.D. research). I'm not yet sure what I will eventually focus on, but this time around, I am sorta hell-bent on giving it a shot.
The social network presented in the challenge is a Twitter-like network (Tencent Weibo, one of the largest micro-blogging websites in China) that is 'in-degree heavy', which hints at the presence of super-users with large fan followings. I have included plots of the 'IN'- and 'OUT'-degree distributions, which amply illustrate this common-user/super-user dichotomy.
In case someone is interested in the exact stats, the graph has 2421058 vertices (~2.4 million) and 50655143 edges (~50.7 million). Here is an R snippet that will help you plot these.
PS: "user_sns.txt" is the edge list you can download with the data.
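library(igraph)
g <- read.graph("user_sns.txt", format="ncol", directed=TRUE)  # loading step is my assumption: whitespace-separated follower/followee pairs
plot(degree.distribution(g, mode="in"), log="xy")   # in-degree distribution, log-log axes
plot(degree.distribution(g, mode="out"), log="xy")  # out-degree distribution, log-log axes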
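One thing worth keeping in mind when reading the two plots: in a directed graph the mean in-degree and mean out-degree are always equal (both are edges divided by vertices), so the 'in-degree heavy' behaviour has to show up in the tail, not in the average. Here is a minimal base-R sketch of that idea. The toy edge list and all variable names are made up for illustration, and I am assuming each row reads follower then followee; only the two graph counts come from the data.

```r
# Toy edge list (hypothetical data): each row is follower -> followee.
# User 6 plays the role of a super-user followed by almost everyone.
edges <- data.frame(
  follower = c(1, 2, 3, 4, 5, 1, 2),
  followee = c(6, 6, 6, 6, 6, 2, 3)
)

# In-degree: how many times a user appears as followee.
in_deg <- table(edges$followee)
# Out-degree: how many times a user appears as follower.
out_deg <- table(edges$follower)

# The maxima differ a lot even though the edge count is the same on both sides.
max_in  <- max(in_deg)
max_out <- max(out_deg)

# Mean degree for the real graph, from the counts quoted in the post.
n_vertices  <- 2421058
n_edges     <- 50655143
mean_degree <- n_edges / n_vertices   # same value for in- and out-degree

print(c(max_in = max_in, max_out = max_out))
print(mean_degree)
```

For the real graph the mean works out to roughly 21 edges per user in either direction, so any user with thousands of followers sits far out in the tail of the in-degree plot.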