Given a connected graph G=(V,E), where V denotes the set of nodes and E the set of edges of the graph, the length (that is, the number of edges) of the shortest path between two nodes v and w is denoted by d(v,w). The closeness centrality of a vertex v is then defined as the sum of its distances from all other nodes divided by the number of nodes minus one. This measure is widely used in the analysis of real-world complex networks, and the problem of selecting the k most central vertices has been deeply analysed in the last decade. However, this problem is computationally not easy, especially for large networks: in the first part of the paper, we prove that it is not solvable in time O(|E|^{2-epsilon}) on directed graphs, for any constant epsilon>0, under reasonable complexity assumptions. Furthermore, we propose a new algorithm for selecting the k most central nodes in a graph: we experimentally show that this algorithm improves significantly both the textbook algorithm, which is based on computing the distance between all pairs of vertices, and the state of the art. For example, we are able to compute the top k nodes in few dozens of seconds in real-world networks with millions of nodes and edges. Finally, as a case study, we compute the 10 most central actors in the IMDB collaboration network, where two actors are linked if they played together in a movie, and in the Wikipedia citation network, which contains a directed edge from a page p to a page q if p contains a link to q.
Computing top-k Closeness Centrality Faster in Unweighted Graphs
Crescenzi, Pierluigi;
2019-01-01
Abstract
Given a connected graph G=(V,E), where V denotes the set of nodes and E the set of edges of the graph, the length (that is, the number of edges) of the shortest path between two nodes v and w is denoted by d(v,w). The closeness centrality of a vertex v is then defined as the sum of its distances from all other nodes divided by the number of nodes minus one. This measure is widely used in the analysis of real-world complex networks, and the problem of selecting the k most central vertices has been deeply analysed in the last decade. However, this problem is computationally not easy, especially for large networks: in the first part of the paper, we prove that it is not solvable in time O(|E|^{2-epsilon}) on directed graphs, for any constant epsilon>0, under reasonable complexity assumptions. Furthermore, we propose a new algorithm for selecting the k most central nodes in a graph: we experimentally show that this algorithm improves significantly both the textbook algorithm, which is based on computing the distance between all pairs of vertices, and the state of the art. For example, we are able to compute the top k nodes in few dozens of seconds in real-world networks with millions of nodes and edges. Finally, as a case study, we compute the 10 most central actors in the IMDB collaboration network, where two actors are linked if they played together in a movie, and in the Wikipedia citation network, which contains a directed edge from a page p to a page q if p contains a link to q.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.