Community Detection in Networks (Modularity) 1

DC大冒险

鲁菜粤菜爱好者，天天运动实践家。

Part 1: History, Keywords, Metric, and Non-overlapping graphs

Granovetter’s theory

To study this theory, we need to know the following concepts.

Strong-Tie: a close circle of family, friends, partners. There are lots of connections and interactions with close social space, which represents distance or space scope.

Weak-Tie: a loose circle of acquaintance. There are many objects with broad social space, but rarely communication.

The strong tie is more stable than the weak tie, but it contributes less than the weak tie to people's careers. Nowadays, the weak tie has high interest and return but costs less.

To better words, Granovetter’s theory, also known as the Weak tie theory, is the proposition that acquaintances are likely to be more influential than close friends, particularly in social networks .

Weak tie theory derives from Nick Granovetter's 1973 article "The Strength of Weak Ties," which was about the spread of information through social networks.

Triadic closure

The principle of triadic closure is that if two people in a social network have a friend in common, then there is an increased probability that they will become friends at some point in the future. For example, if there were three people within a network, A, B, and C, if A and B are friends, and B and C are friends, there is a high possibility that B and C are also friends. If each person were represented as a node, these 3 nodes combined would form a triadic closure.

Clustering Coefficient
Within graph theory, a clustering coefficient is the measure of the degree to which individual nodes within a graph cluster together. For example, the clustering coefficient of a node titled A is the probability that two randomly selected friends of node A, are friends with each other. This is a social network metric used to measure the prevalence of triadic closure.

Another field derives from here, Network Economics .

Modularity

It is proposed by Newman in 2003. He also designed the Girvan-Newman (G-N) algorithm and the Fast Newman (F-B) algorithm, both of which belong to the community division algorithm. Modularity is included in the F-B.

Here, Community is a set of nodes in the network, which have more links or connections within them than that outside nodes. Its drawback is to suit the bigger community.

According to the theory, Modularity is equal to the number of connected edge numbers within the group subtract the expect of that. The sum of the Expect of all node pairs is connected edge number is equal to the total edge number.

The drawback of earlier work: it is hard to decide the best community division. Here, modularity is to judge the quality of the division results.

The following is the calculation formula.

Q=\frac{1}{2m}\sum_{vw}[A_{vw}-\frac{k_{v}k_{w}}{2m}]\delta(C_{v},C_{w}) , the higher the Q is, the better the division is. Usually, Q\in [-0.5, 1) .

Let e_{ij}=\sum_{vw}\frac{A_{vw}}{2m} , a_{i}=\frac{k_{i}}{2m}=\sum_{j}C_{ij} , then Q=\sum^{c}_{i=1}(e_{ij}-a^{2}_{i}) .

Or let B_{ij}=A_{ij}-\frac{k_{i}k_{j}}{2m} , then Q=\frac{1}{2m}Tr(S^{T}BS) .

Currently, Louvain is one of the best community detection algorithms. It derives from the Greedy algorithm and supports the weighted graph. Moreover, it has better division performance and converges faster. The complexity is O(N(logN)) .

The other community detection algorithms include the Kernighan-Lin algorithm, Surprise Community Detection, Leiden Community Detection, and Walktrap Community Detection.

How to Dividing Networks into More than Two Communities

Step 1: Divide the nodes into two parts, delete the links between two groups.

Step 2: Split the groups into four parts until all communities become indivisible subgraphs.

If there aren't better subgraphs with higher network modularity, or they cannot equally provide \Delta Q positive value, the graph doesn't need extra division.

模块度(Modularity)与Fast Newman算法讲解与代码实现

Community Detection – Modularity的概念

Modularity（模块性） - 极客分享

小李飞镖：CS224W 图机器学习自学笔记5 - Communities

网络中的模块化和社区结构(Modularity and community structure in networks)

图算法之Community Detection

Modularity and community structure in networks

Walk modularity and community structure in networks

Community Detection Algorithms - Towards Data Science

Louvain algorithm

We briefly introduced the Louvain algorithm in the above section. Here are more details.

The Louvain algorithm has two steps: Community Merging and Graph Reconstruction. They run alternately unless the whole graph gets merged into one community or the modularity doesn't change any more.

To better words,

Step 1: View each original node as an independent and unique community. The weight of the connection edge within the community is zero too.

Step 2: The algorithm firstly scans all the nodes, then traverses all neighbored nodes and calculates the change of modularity Δ before and after assigning this node into this community. Next, if max \Delta Q>0 , adding the node with the maximum modularity change joins the community, otherwise, we switch the next node. We do this repeatedly on every node until reaching the steady-state.

Step 3: The algorithm folds and compresses the communities generated from step 2 as a node, then calculates the connected edge weight within and between the newly generated communities for the next round.

Trying another format:

First, the method looks for "small" communities by optimizing modularity locally. Second, it aggregates nodes belonging to the same community and builds a new network whose nodes are the communities. These steps are repeated iteratively until a maximum of modularity is attained and a hierarchy of communities is produced.

Related Formula

Q=\sum_{c}[\frac{\sum in}{2m}-(\frac{\sum tot}{2m})^{2}]=\sum_{c}(e_{c}-a^{2}_{c})

\sum in represents the sum of edge weight within the community; \sum tot represents the sum of edge weight that links with the node in the community.

Case: Modularity for Overlapping Community

Q=\frac{1}{2m}\sum_{ij}\frac{1}{O_{i}O_{j}}(A_{ij}-\frac{k_{i}k_{j}}{2m})\delta(C_{i},C_{j}) . This is for undirected unweighted graphs (Shen et al, 2009).

Q=\frac{1}{m}\sum^{n_{c}}_{c=1}\sum_{ij}[r_{ijc}A_{ij}-s_{ijc}\frac{k^{out}_{i}k^{in}_{j}}{m}] . This is for directed unweighted graphs (Nicosia et al, 2009).

Difference between Louvain and Hierarchical clustering

Different traversing orders may generate different community merging results. Experiments show that traversing orders has little influence on the community division result of the whole network.

模块度与Louvain社区发现算法 - CodeMeals - 博客园

马东什么：louvain算法

Louvain 算法原理

Louvain 算法原理及设计实现

An Improved Louvain Algorithm for Community Detection

The Louvain method for community detection in large networks