Community Detection in Networks (Modularity) 1
Part 1: History, Keywords, Metric, and Non-overlapping graphs
Granovetter’s theory
To study this theory, we need to know the following concepts.
Strong-Tie: a close circle of family, friends, partners. There are lots of connections and interactions with close social space, which represents distance or space scope.
Weak-Tie: a loose circle of acquaintance. There are many objects with broad social space, but rarely communication.
The strong tie is more stable than the weak tie, but it contributes less than the weak tie to people's careers. Nowadays, the weak tie has high interest and return but costs less.
To better words, Granovetter’s theory, also known as the Weak tie theory, is the proposition that acquaintances are likely to be more influential than close friends, particularly in social networks .
Weak tie theory derives from Nick Granovetter's 1973 article "The Strength of Weak Ties," which was about the spread of information through social networks.
Triadic closure
The principle of triadic closure is that if two people in a social network have a friend in common, then there is an increased probability that they will become friends at some point in the future. For example, if there were three people within a network, A, B, and C, if A and B are friends, and B and C are friends, there is a high possibility that B and C are also friends. If each person were represented as a node, these 3 nodes combined would form a triadic closure.
Clustering Coefficient
Within graph theory, a clustering coefficient is the measure of the degree to which individual nodes within a graph cluster together. For example, the clustering coefficient of a node titled A is the probability that two randomly selected friends of node A, are friends with each other. This is a social network metric used to measure the prevalence of triadic closure.
Another field derives from here, Network Economics .
Modularity
It is proposed by Newman in 2003. He also designed the Girvan-Newman (G-N) algorithm and the Fast Newman (F-B) algorithm, both of which belong to the community division algorithm. Modularity is included in the F-B.
Here, Community is a set of nodes in the network, which have more links or connections within them than that outside nodes. Its drawback is to suit the bigger community.
According to the theory, Modularity is equal to the number of connected edge numbers within the group subtract the expect of that. The sum of the Expect of all node pairs is connected edge number is equal to the total edge number.
The drawback of earlier work: it is hard to decide the best community division. Here, modularity is to judge the quality of the division results.
The following is the calculation formula.
Q=\frac{1}{2m}\sum_{vw}[A_{vw}-\frac{k_{v}k_{w}}{2m}]\delta(C_{v},C_{w}) , the higher the Q is, the better the division is. Usually, Q\in [-0.5, 1) .
Let e_{ij}=\sum_{vw}\frac{A_{vw}}{2m} , a_{i}=\frac{k_{i}}{2m}=\sum_{j}C_{ij} , then Q=\sum^{c}_{i=1}(e_{ij}-a^{2}_{i}) .
Or let B_{ij}=A_{ij}-\frac{k_{i}k_{j}}{2m} , then Q=\frac{1}{2m}Tr(S^{T}BS) .
Currently, Louvain is one of the best community detection algorithms. It derives from the Greedy algorithm and supports the weighted graph. Moreover, it has better division performance and converges faster. The complexity is O(N(logN)) .
The other community detection algorithms include the Kernighan-Lin algorithm, Surprise Community Detection, Leiden Community Detection, and Walktrap Community Detection.
How to Dividing Networks into More than Two Communities
Step 1: Divide the nodes into two parts, delete the links between two groups.
Step 2: Split the groups into four parts until all communities become indivisible subgraphs.
If there aren't better subgraphs with higher network modularity, or they cannot equally provide \Delta Q positive value, the graph doesn't need extra division.
模块度(Modularity)与Fast Newman算法讲解与代码实现
Community Detection – Modularity的概念
小李飞镖:CS224W 图机器学习 自学笔记5 - Communities
网络中的模块化和社区结构(Modularity and community structure in networks)
Modularity and community structure in networks
Walk modularity and community structure in networks
Community Detection Algorithms - Towards Data Science
Louvain algorithm
We briefly introduced the Louvain algorithm in the above section. Here are more details.
The Louvain algorithm has two steps: Community Merging and Graph Reconstruction. They run alternately unless the whole graph gets merged into one community or the modularity doesn't change any more.
To better words,
Step 1: View each original node as an independent and unique community. The weight of the connection edge within the community is zero too.
Step 2: The algorithm firstly scans all the nodes, then traverses all neighbored nodes and calculates the change of modularity Δ before and after assigning this node into this community. Next, if max \Delta Q>0 , adding the node with the maximum modularity change joins the community, otherwise, we switch the next node. We do this repeatedly on every node until reaching the steady-state.
Step 3: The algorithm folds and compresses the communities generated from step 2 as a node, then calculates the connected edge weight within and between the newly generated communities for the next round.
Trying another format:
First, the method looks for "small" communities by optimizing modularity locally. Second, it aggregates nodes belonging to the same community and builds a new network whose nodes are the communities. These steps are repeated iteratively until a maximum of modularity is attained and a hierarchy of communities is produced.
Related Formula
Q=\sum_{c}[\frac{\sum in}{2m}-(\frac{\sum tot}{2m})^{2}]=\sum_{c}(e_{c}-a^{2}_{c})
\sum in represents the sum of edge weight within the community; \sum tot represents the sum of edge weight that links with the node in the community.
Case: Modularity for Overlapping Community
Q=\frac{1}{2m}\sum_{ij}\frac{1}{O_{i}O_{j}}(A_{ij}-\frac{k_{i}k_{j}}{2m})\delta(C_{i},C_{j}) . This is for undirected unweighted graphs (Shen et al, 2009).
Q=\frac{1}{m}\sum^{n_{c}}_{c=1}\sum_{ij}[r_{ijc}A_{ij}-s_{ijc}\frac{k^{out}_{i}k^{in}_{j}}{m}] . This is for directed unweighted graphs (Nicosia et al, 2009).
Difference between Louvain and Hierarchical clustering
Different traversing orders may generate different community merging results. Experiments show that traversing orders has little influence on the community division result of the whole network.
模块度与Louvain社区发现算法 - CodeMeals - 博客园
An Improved Louvain Algorithm for Community Detection
The Louvain method for community detection in large networks
Glossary
Contemplate | 思考 |