Graph Exercise

Overview

In this part of the workshop, you will:

Facebook

Taken from Prof. Rodger's problem solving course and Prof. Forbes's CS0 course.

Consider Facebook, an online directory that connects people.

Lets consider the connection by friends. On Facebook, a friend is someone who mutually agrees with you that the two of you are friends. You are listed as one of their friends and they are listed as one of your friends.

Suppose we draw a graph in which each node in the graph is a person at Duke and each edge in the graph is a connection between two people at Duke who are friends (determined by Facebook). If we could collect the data of everyone's list of friends at Duke, we could show a large graph of all the Duke friend connections. Our goal is to determine who is in the center of the graph, and to determine if the center person has the most friends or not.

We will work with a much smaller graph to study this problem. Consider the following eight people and their friend connections.

A: Number of friends

In a graph the degree of a node is the number of edges attached to the node. When the graph represents the connections between friends, then the degree is the same as the number of friends.

For each person in the graph, determine their number of friends.

  1. Arin
  2. Betsy
  3. Chris
  4. Daoxin
  5. Erich
  6. Fran
  7. Geoff
  8. Hannah





B: Shortest Path Distance

The shortest path distance between two nodes in a graph is the fewest number of edges one must walk along to get from one node to the next. For example, to get from Chris to Erich, there are several paths. One path goes through Betsy and then Geoff and is of length 3 (3 edges are traversed: Chris to Betsy, Betsy to Geoff, Geoff to Erich). There is a shorter path of length 2 through Fran (Chris to Fran, Fran to Erich). There are many more paths.

List the shortest path distance between every pair of names. The shortest path between a name and itself is zero and has been labeled for you.
Names: ArinBetsy Chris Daoxin Erich Fran Geoff Hannah
Arin.0.......
Betsy ..0......
Chris...0.....
Daoxin....0....
Erich.....0...
Fran......0..
Geoff.......0.
Hannah........0








C: Algorithm for shortest path

Write an algorithm to find the shortest path from one node in a graph to all the other nodes. Assume there exists a path from that node to all the other nodes in the graph.






















D: Center of a graph

For each node in the graph above, calculate the sum of the shortest distances to all the other nodes in the graph. The node with the smallest sum is the center of the graph.

  1. Arin
  2. Betsy
  3. Chris
  4. Daoxin
  5. Erich
  6. Fran
  7. Geoff
  8. Hannah

Which node is the center of the graph?

Does the center of the graph have the most friends (i.e. more than everybody else)?









E: Does Center of a graph mean the most friends?

Is the center of the graph always the person with the most friends? Draw a graph in which the center of the graph does not have the most friends. Your graph should have fewer than 12 nodes.

















F: Diameter

The diameter of a graph is the longest shortest path between any two vertices. In other words, the diameter of a graph is the greatest distance between any two vertices. The diameter gives a measure of the breadth of a network. For example, the following paper discusses the diameter of the World Wide Web.

Réka Albert, Hawoong Jeong and Albert-László Barabási, Diameter of the World Wide Web, Nature 401, 130 (1999)

What is the diameter of the graph above? How can you efficiently calculate the diameter of a graph?




G: Degree Distribution

Students from the Computer Science 1 course submitted lists of their friends on Facebook. From that, we created a network. Each vertex is a different person/friend on Facebook and there exists an edge between two people if those people are friends. The graph of connections is listed in the friends.gdf file in the DukeGUESS folder. From the 57 students that submitted, there were 12,241 total distinct friend IDs and 15,710 connections among those friends. 4,552 of those IDs belong to Duke people.

  • The degree distribution of a network tells us something about how connected individual vertices and can give us insight into how the relationships between the entities that the vertices represent.

    Below is a list of all of the degrees for each of the 57 submitted vertices from the class.

    13   133   223   274   345   500
    26   168   232   280   345   500
    45   171   246   284   361   501
    45   178   249   296   364   501
    75   182   250   297   404   502
    75   193   253   310   417   502
    86   193   266   317   470   503
    92   194   268   322   473
    98   213   273   323   497
    110  218   273   325   500
    

    This data can also be viewed as a histogram in graphical form.

    Histogram for Duke friends' degree distribution:

    Degree	Frequency
    1	2482
    2	1163
    3	552
    4	198
    5	67
    6	23
    7	6
    8	2
    9	1
    10	0
    11	1
    

    Histogram for non-Duke friends' degree distribution:

    Degree	Frequency
    1	7522
    2	164
    3	3
    
    1. Explain why the degree distribution is so different from the vertices from people in the class, their friends at Duke, and their friends outside of Duke.
      
      
      
      
      
    2. How would you expect the degree distributions to change, if at all, if the number of people in the course grew to 300?
      
      
      
      
  • There are approximately 6,000 students at Duke. Estimate what the degree distribution if all students from the university submitted their list of friends.
    
    
    
    
    
    
    
  • Now consider a graph of just links within the class pictured in the graph below.

    1. Can a student determine what vertex they are within this graph? Why or why not?
      
      
      
      
      
      
    2. A clique is a set of nodes where each is directly connected to another. What is the largest clique in this graph (# nodes)?
      
      
      
      
    3. A social circle or connected component is a set of nodes where there exists a path between all nodes in the component. What is the largest component (i.e. how many nodes?)? What factors, technical or sociological, do you think determine whether someone is in the large connected component?
      
      
      
      

    Jeffrey R.N. Forbes
    Last modified: Tue Jul 11 08:29:20 EDT 2006