Basic Python & Guess

Overview

In this part of the workshop, you will:
  1. Learn some of the basics of Python, so that you can manipulate graphs in GUESS
  2. Write Gython code to calculate shortest paths between nodes and the diameter of the graph

Acknowledgment

This tutorial is taken mostly from Magnus Lie Hetland's excellent Instant Python tutorial and the Python Software foundation's Implementing Graphs article.

Setup

  1. Open up Eclipse.
  2. Open up DukeGUESS
  3. Create a new PyDev project in Eclipse

The Basics

To begin with, think of Python as pseudo-code. It's almost true. Variables don't have types, so you don't have to declare them. They appear when you assign to them, and disappear when you don't use them anymore. Assignment is done by the = operator. Equality is tested by the == operator. You can assign several variables at once:

x,y,z = 1,2,3 first, second = second, first a = b = 123

Blocks are indicated through indentation, and only through indentation. (No BEGIN/END or braces.) Some common control structures are:

    if x < 5 or (x > 10 and x < 20):
        print "The value is OK."

    if x < 5 or 10 < x < 20:
        print "The value is OK."

    for i in [1,2,3,4,5]:
        print "This is iteration number", i


    x = 10
    while x >= 0:
        print "x is still not negative."
        x = x-1

The first two examples are equivalent.

The index variable given in the for loop iterates through the elements of a list (written as in the example). To make an "ordinary" for loop (that is, a counting loop), use the built-in function range().

    # Print out the values from 0 to 99 inclusive.
    for value in range(100):
        print value

(The line beginning with "#" is a comment, and is ignored by the interpreter.)

Okay; now you know enough to (in theory) implement any algorithm in Python. Let's add some basic user interaction. To get input from the user (from a text prompt), use the builtin function input.

    x = input("Please enter a number: ")
    print "The square of that number is", x*x

The input function displays the prompt given (which may be empty) and lets the user enter any valid Python value. In this case we were expecting a number – if something else (like a string) is entered, the program would crash. To avoid that we would need some error checking. I won't go into that here; suffice it to say that if you want the user input stored verbatim as a string (so that anything can be entered), use the function raw_input instead. If you wanted to convert the input string s to an integer, you could then use int(s).

Note: If you want to input a string with input, the user has to write the quotes explicitly. In Python, strings can be enclosed in either single or double quotes.

So, we have control structures, input and output covered – now we need some snazzy data structures. The most important ones are lists and dictionaries. Lists are written with brackets, and can (naturally) be nested:

    name = ["Cleese", "John"]

    x = [[1,2,3],[y,z],[[[]]]]

One of the nice things about lists is that you can access their elements separately or in groups, through indexing and slicing. Indexing is done (as in many other languages) by appending the index in brackets to the list. (Note that the first element has index 0).

    print name[1], name[0] # Prints "John Cleese"

    name[0] = "Smith"

Slicing is almost like indexing, except that you indicate both the start and stop index of the result, with a colon (":") separating them:

    x = ["spam","spam","spam","spam","spam","eggs","and","spam"]

    print x[5:7] # Prints the list ["eggs","and"]

Notice that the end is non-inclusive. If one of the indices is dropped, it is assumed that you want everything in that direction. I.e. list[:3] means "every element from the beginning of list up to element 3, non-inclusive." (It could be argued that it would actually mean element 4, since the counting starts at 0... Oh, well.) list[3:] would, on the other hand, mean "every element from list, starting at element 3 (inclusive) up to, and including, the last one." For really interesting results, you can use negative numbers too: list[-3] is the third element from the end of the list...

While on the subject of indexing, you might be interested to know that the built-in function len gives you the length of a list.

Now, then – what about dictionaries? To put it simply, they are like lists, only their contents are not ordered. How do you index them then? Well, every element has a key, or a "name" which is used to look up the element just like in a real dictionary. A couple of example dictionaries:

    { "Alice" : 23452532, "Boris" : 252336,
      "Clarice" : 2352525, "Doris" : 23624643}

    person = { 'first name': "Robin", 'last name': "Hood",
               'occupation': "Scoundrel" }

Now, to get person's occupation, we use the expression person["occupation"]. If we wanted to change his last name, we could write:

    person['last name'] = "of Locksley"

Simple, isn't it? Like lists, dictionaries can hold other dictionaries. Or lists, for that matter. And naturally lists can hold dictionaries too. That way, you can easily make some quite advanced data structures.

Functions

Next step: Abstraction. We want to give a name to a piece of code, and call it with a couple of parameters. In other words – we want to define a function (or "procedure"). That's easy. Use the keyword def like this:

    def square(x):
        return x*x

    print square(2) # Prints out 4

For those of you who understand it: When you pass a parameter to a function, you bind the parameter to the value, thus creating a new reference. If you change the "contents" of this parameter name (i.e. rebind it) that won't affect the original. This works just like in Java, for instance. Let's take a look at an example:

    def change(some_list):
        some_list[1] = 4

    x = [1,2,3]
    change(x)
    print x # Prints out [1,4,3]

As you can see, it is the original list that is passed in, and if the function changes it, these changes carry over to the place where the function was called. Note, however the behavior in the following example:

    def nochange(x):
        x = 0

    y = 1
    nochange(y)
    print y # Prints out 1

Why is there no change now? Because we don't change the value! The value that is passed in is the number 1 — we can't change a number in the same way that we change a list. The number 1 is (and will always be) the number 1. What we did do is change the contents of the local variable (parameter) x, and this does not carry over to the environment.

One thing that might be useful to know, however, is that functions are values in Python. So if you have a function like square, you could do something like:

    queeble = square
    print queeble(2) # Prints out 4

Representing graphs

Consider the following graph (in simplegraph.gdf):

This graph has six nodes (A-F) and eight edges. It can be represented by the following dictionary:

graph = {'A': ['B', 'C'], 'B': ['C', 'D'], 'C': ['D'], 'D': ['C'], 'E': ['F'], 'F': ['C']} This is a dictionary whose keys are the nodes of the graph. For each key, the corresponding value is a list containing the nodes that are connected by a direct arc from this node.

In Gython, the extension of Python used in GUESS, there is a -> operator that accesses a directed edge between two nodes. Therefore, you can create the graph above using with the following code.

> addNode(A) > addNode(B) > addNode(C) > addNode(D) > addNode(E) > addNode(F) > addDirectedEdge(A,B) # A -> B > addDirectedEdge(A,C) # A -> C > addDirectedEdge(B,C) # B -> C > addDirectedEdge(B,D) # B -> D > addDirectedEdge(C,D) # C -> D > addDirectedEdge(D,C) # D -> C > addDirectedEdge(E,F) # E -> F > addDirectedEdge(F,C) # F -> C

GUESS automatically defines a variable g that represents the entire graph. g.nodes and g.edges are the list of of all nodes and edges in the graph, respectively.

There are actually five different operators that can be used to represent an edge between node1 and node2.

	node1->node2 # directed edge (node 1 to node 2)
	node1<-node2 # directed edge (node2 to node1)
	node1-node2  # undirected edge
	node1<->node2 # undirected edge (or bidirectional)
	node1?node2 # any type of edge between node1 and node2

Nodes are objects. To access node and edge attributes, use the dot operator (".").

node1.color (node1?node2).color (node1->node2).weight node1.name (node2-node1).visible To change Node and Edge attributes, use the dot operator followed by the assignment operator.
node1.color = blue               # Make node1 blue
(node1?node2).color = cyan       # Make all types of edges between node1 and node2 cyan
(node1->node2).weight = 5        # Make the edge from node1 to node2 have a weight of 5
node1.visible = 0                # Make node1 invisible
For numerical attributes, you may access the max or min for that attribute over all nodes or edges by typing the attribute name plus the dot operator and min or max.

For example, weight.min returns the minimum weight for all edges, and salary.max returns the max salary for all nodes. See the GUESS Quick Reference Sheet for more information on node and edge attributes.

Path Finding

Let's write a simple function to determine a path between two nodes. It takes the start and end nodes as arguments. It will return a list of nodes (including the start and end nodes) comprising the path. When no path can be found, it returns None. The same node will not occur more than once on the path returned (i.e. it won't contain cycles). The algorithm uses an important technique called backtracking: it tries each possibility in turn until it finds a solution.

    # returns list of nodes in path from start to end
    #  or None if no path exists
    def find_path(start, end, path=[]):
        path = path + [start]
        if start == end:
            return path
        if start not in g.nodes:
            return None
        adj = start.getNeighbors()
        for node in adj:
            if node not in path:
                newpath = find_path(node, end, path)
                if newpath: return newpath
        return None

One new feature here is the use of default arguments with path=[]. The function can now be called with only two arguments as in: findpath(node1, node2). A sample run (using the graph above):

    > find_path(A, D)
    [A, B, C, D]
    > 
The second 'if' statement is necessary only in case there are nodes that are listed as end points for arcs but that don't have outgoing arcs themselves, and aren't listed in the graph at all. Such nodes could also be contained in the graph, with an empty list of outgoing arcs, but sometimes it is more convenient not to require this.

Note that while the user calls find_graph() with two arguments, it calls itself with a third argument: the path that has already been traversed. The default value for this argument is the empty list, '[]', meaning no nodes have been traversed yet. This argument is used to avoid cycles (the first 'if' inside the 'for' loop). The 'path' argument is not modified: the assignment path = path + [start] creates a new list. If we had written path.append(start) instead, we would have modified the variable path in the caller, with disastrous results.

Exercises

  1. It is simple to change this function to return a list of all paths (without cycles) instead of the first path it finds. Write find_all_paths
    
    A sample run:
    
    
        > find_all_paths(A, D)
        [[A, B, C, D], [A, B, D], [A, C, D]]
        > 
    
  2. Another variant would finds the shortest path, write find_shortest_path based off of the function above to return the shortest path from start to end

    A sample run is below:

        > find_shortest_path(graph, A, D)
        [A, C, D]
        > 
    
  3. The diameter of a graph is the longest shortest path between any two vertices. Write a function findDiameter that returns the longest path between any two vertices.

  4. Consider the following depth-first search function dfs that takes a node as an argument and then visits every node reachable from
    # visit each connected node in the graph (g) start from u in a depth-first manner 
    def dfs(u):
      stack = [] 
      addNodeField("visited",Types.BOOLEAN,FALSE) # add visited field if necessary
      stack.append(u) # append to back works as push
      while len(stack) > 0: 
        v = stack.pop() 
        v.visited = true
        adj = v.getNeighbors()
        for w in adj:
          if not w.visited:
            stack.append(w)
    

    Now, write a function bfs that traverses the graph from a node in a breadth-first manner. Hint: use a queue. You should also be able to keep track of the unweighted distance from the source as you proceed. We have started it for you below.

    # visit each connected node in the graph (g) start from u in a breadth-first manner 
    def bfs(u):
      queue = []
      maxDist = len(g.nodes)   # no path can be longer than the number of vertices
      addNodeField("dist",Types.INTEGER,Integer(maxDist)) # add field to store distance
    
    
Solutions

Extra Material: Objects and Stuff

In Python you define classes with the class keyword, like this:.

    class Basket:

        # Always remember the *self* argument
        def __init__(self,contents=None):
            self.contents = contents or []

        def add(self,element):
            self.contents.append(element)

        def print_me(self):
            result = ""
            for element in self.contents:
                result = result + " " + `element`
            print "Contains:"+result

New things here:

  1. All methods (functions in an object) receive an additional argument at the start of the argument list, containing the object itself. (Called self in this example, which is customary.)
  2. Methods are called like this: object.method(arg1,arg2).
  3. Some method names, like __init__ (with two underscores before and after), are pre-defined, and mean special things. __init__ is the name of the constructor of the class, i.e. it is the function that is called when you create an instance.
  4. Some arguments can be optional and given a default value. This is done by writing the definition like:
            def spam(age=32): ...
    		
    Here, spam can be called with one or zero parameters. If none is used, then the parameter age will have the value 32.
  5. "Short circuit logic." This is a clever bit... See below.
  6. Backquotes convert an object to its string representation. (So if element contains the number 1, then `element` is the same as "1" whereas 'element' is a literal string.)
  7. The addition sign + is used also for concatenating lists, and strings are really just lists of characters (which means that you can use indexing and slicing and the len function on them. Cool, huh?)

No methods or member variables are protected (or private or the like) in Python. Encapsulation is pretty much a matter of programming style. (If you really need it, there are naming-conventions that will allow some privacy :)).

Now, about that short-circuit logic...

All values in Python can be used as logic values. Some of the more "empty" ones, like [], 0, "" and None represent logical falsity, while most other values (like [0], 1 or "Hello, world") represent logical truth.

Now, logical expressions like a and b are evaluated like this: First, check if a is true. If it is not, then simply return it. If it is, then simply return b (which will represent the truth value of the expression.) The corresponding logic for a or b is: If a is true, then return it. If it isn't, then return b.

This mechanism makes and and or behave like the boolean operators they are supposed to implement, but they also let you write short and sweet little conditional expressions. For instance, the statement

    if a:
        print a
    else:
        print b

Could instead be written:

    print a or b

Actually, this is somewhat of a Python idiom, so you might as well get used to it. This is what we do in the method Basket.__init__. The argument contents has a default value of None (which is, among other things, false). Therefore, to check if it had a value, we could write:

    if contents:
        self.contents = contents
    else:
        self.contents = []

Of course, now you know there is a better way. And why don't we give it the default value of [] in the first place? Because of the way Python works, this would give all the Baskets the same empty list as default contents. As soon as one of them started to fill up, they all would contain the same elements, and the default would not be empty anymore... To learn more about this, you should read the documentation and look for the difference between identity and equality.

Another way of doing the above is:

    def __init__(self, contents=[]):
        self.contents = contents[:]

Can you guess how this works? Instead of using the same empty list everywhere, we use the expression contents[:] to make a copy. (We simply slice the entire thing.)

So, to actually make a Basket and to use it (i.e. to call some methods on it) we would do something like this:

    b = Basket(['apple','orange'])
    b.add("lemon")
    b.print_me()

There are other magic methods than __init__. One such method is __str__ which defines how the object wants to look if it is treated like a string. We could use this in our basket instead of print_me:

    def __str__(self):
        result = ""
        for element in self.contents:
            result = result + " " + `element`
        return "Contains:"+result

Subclassing is done like this:

    class SpamBasket(Basket):
        # ... 

Python allows multiple inheritance, so you can have several superclasses in the parentheses, separated by commas. Classes are instantiated like this: x = Basket(). Constructors are, as I said, made by defining the special member function __init__. Let's say that SpamBasket had a constructor __init__(self,type). Then you could make a spam basket thus: y = SpamBasket("apples").