此作业存在三个问题。

对于问题1,您将使用四个真实世界的网络。其中三个是从Mark Newman收集的网络数据(http://www-personal.umich。edu /〜mejn / netdata /)中选择的网络。集合中的网络具有特定的描述性名称。为此任务选择的三个网络称为:(1)政治博客,(2)神经网络和(3)互联网。阅读那里的简要说明,以便您了解网络正在建模的内容。数据集采用GML格式,这是igraph包可以理解的格式。

您可以为问题1自由选择第四个现实世界网络,但建议将其作为社交或信息网络。示例包括特定社区/上下文(学校,工作场所,监狱等)中的社交网络,Facebook网络,Twitter网络,who-calls-whom网络,who-trusts-whom网络,引文网络等。

对于问题1和2,您应该使用软件工具igraph。对于某些计算和绘图,您可能需要使用Numpy(这是使用Python进行科学计算的程序包),MatLab或类似的环境。问题3不需要任何软件工具。这是一种基于阅读论文的论文型问题。

您提交的文件:将是一个PDF文件,其中包含您的解决方案和一个描述您的过程的附录(例如Python / R / Matlab代码清单)。

问题1(45%)。首先,简要描述您选择用于解决此问题的第四个现实世界网络,并提供指向其位置的URL。对于您研究的四个真实世界网络中的每一个,根据以下集中度度量来确定网络中得分最高的节点:(i)度,(ii)偏心度,(iii)紧密度,(iv )之间的间隔,(v)卡茨指数,(vi)PageRank,(vii)克莱因伯格的权威度得分和(viii)克莱因伯格的Hub得分。

尝试找出通过这些措施确定为最重要的节点在网络中可能很重要的方式。在某些情况下,不同的中心度度量将同一节点识别为最重要的节点吗?讨论您的结果。

问题2(45%)。使用igraph的方法为Erdos-Renyi和Barabasi-Albert(优先附件)模型生成两种类型的随机图。 (Barabasi-Albert模型是一种简单的随机算法,可生成无标度图,该图具有幂律分布。)在每种情况下,使节点数为20。在两个模型中选择参数,以便您生成的两个图的边数大致相同。以相同的方式,生成另一组两个图,这次分别是节点数为40,而不是20。

1.对于您生成的四个图形中的每个图形,计算图形的拉普拉斯算子的所有特征值和相应的特征向量(例如,使用MatLab(或Numpy)的eig

1个

功能)。不要在提交中包含此计算的结果,但是您需要后续子问题的结果。

为每个图形填写下表。 (如果生成的图形包含多个组件,则在填充表时,请考虑最大的连接组件。)

图n m dmin dmax l D ccgλ2λn

其中n是节点数; m是边数; dmin是图中的最小度; dmax是图中的最大度数; l是平均路径长度; D是直径; ccg是全局聚类系数; λ2是第二小的特征值(代数连通性); λn是最大特征值。讨论您从这些数据中得出的观察结果。

对于此子问题,您将专注于两个模型中每个模型的20个节点的图形。您将生成两个图形,每个模型一个。在每个图中,将两个量一起绘制:对应于第二最小特征值(λ2)的特征向量和对应于最大特征值(λn)的特征向量。在图中,x轴将显示顶点id,y轴将显示特征向量的值。比较和对比两个图的图,并讨论您所做的任何观察。

问题3(10%)。阅读David Gleich撰写的SIAM评论文章“超越网络的PageRank”的前四部分(请注意,该文章的副本已发布在Canvas的课程页面上的模块PageRank下)。从本文第4节中讨论的PageRank的广泛应用中,选择三个您个人认为最有趣(或有趣)的内容,并告诉我原因。对于您选择的每个应用程序,可以使用几句话说明您的理由,但是如果您愿意,欢迎再多写下几句话。

CptS 591: Elements of Network Science Spring 2021

Assignment 2: Centrality and Ranking. Due by: Tue, Mar. 16, 2021, 11:59pm

There are three problems in this assignment.

For Problem 1, you will work with four real-world networks. Three of these are networks selected from Mark Newman’s collection of Network Data (http://www-personal.umich. edu/~mejn/netdata/). The networks in the collection are given certain descriptive names. The three networks chosen for this assignment are called: (1) Political blogs, (2) Neural network, and (3) Internet. Read the brief descriptions there so you get an idea for what the networks are modeling. The data sets are in GML format, which is a format the package igraph understands.

You are free to choose the fourth real-world network for Problem 1, but it is recommended to be a social or information network. Examples include a social network in a certain community/context (school, workplace, prison, etc ), a Facebook network, a twitter network, a who-calls-whom network, who-trusts-whom network, a citation network, etc.

For Problems 1 and 2, you are expected to use the software tool igraph. For some of the calculations and plots you may need to use Numpy (which is a package for scientific computing with Python), MatLab or a similar environment. Problem 3 does not require any software tool. It is an essay-type question based on reading a paper.

Your submission: will be one PDF file consisting of your solutions and an appendix describing your procedures (e.g. listing of Python/R/Matlab codes).

Problem 1 (45%). First, briefly describe the fourth real-world network you chose to work with for this problem and provide a URL to where it is located. For each of the four real-world networks you study, identify the node(s) in the network with the highest scores in terms of the following centrality measures: (i) Degree, (ii) Eccentricity, (iii) Closeness, (iv) Betweenness, (v) Katz index, (vi) PageRank, (vii) Kleinberg’s Authority score, and (viii) Kleinberg’s Hub score.

Try to find out in what way the nodes identified by these measures as the most important might be important in the network. Are there cases in which the different centrality measures identify the same node(s) as the most important? Discuss your results.

Problem 2 (45%). Generate two types of random graphs using igraph’s methods for the Erdos-Renyi and Barabasi-Albert (preferential attachment) models. (The Barabasi-Albert model is a simple stochastic algorithm that generates a scale-free graph, a graph with a Power-Law degree distribution.) Let the number of nodes in each case be 20. Choose the parameters in the two models such that the two graphs you generate have roughly the same number of edges. In the same fashion, generate another set of two graphs, this time with the number of nodes in each case being 40, instead of 20.

1. For each of the four graphs you generated, compute all eigenvalues and corresponding eigenvectors of the Laplacian of the graph (e.g. using MatLab’s (or Numpy’s) eig

1

function). Do not include the results of this computation in your submission, but you will need the result for the subsequent subproblems.

  1. Complete the following table for each graph. (In case the graph you generated consists of more than one component, in populating the table, consider the largest connected component.) Graph n m dmin dmax l D ccg λ2 λn

    where n is the number of nodes; m is the number of edges; dmin is the minimum degree in the graph; dmax is the maximum degree in the graph; l is the average path length; D is the diameter; ccg is the global clustering coefficient; λ2 is the second smallest eigenvalue (the algebraic connectivity); and λn is the largest eigenvalue. Discuss the observations you make out of these data.

  2. For this subproblem you will focus on the 20-nodes graphs in each of the two models. You will generate two figures, one for each model. In each figure, plot two quantities together: the eigenvector corresponding to the second-smallest eigenvalue (λ2) and the eigenvector corresponding to the largest eigenvalue (λn). In the plots, the x-axis would show vertex ids and the y-axis shows value of the eigenvector. Compare and contrast the plots of the two graphs, and discuss any observations you make.

Problem 3 (10%). Read the first four sections of the SIAM Review article “PageRank Beyond the Web” by David Gleich (note a copy of the article is posted on the course’s page on Canvas under the Module PageRank). From the very wide range of applications of PageRank discussed in section 4 of the article, pick three that you personally found most fascinating (or interesting) and tell me why. For each application you picked, a couple of sentences stating your reasons is adequate, but you are welcome to write a few more sentences, if you wish.