Tuesday 26 May 2015

Learning Python (6) - Mars Rover

My next Kata was from a set of exercises that Amir Rajan describes, the mars rover. The idea is to model a planet as a grid, place a rover there, and send it instructions for moving about. The instructions are a limited set: forward, backwards, turn left, turn right. The planet is a sphere, so once the basics are working the grid needs to wrap around. And then, being Mars, it has rocks in the way. So sometimes the rover will stop and report where it has got to.

This is a nice problem for practising TDD. As I started it felt a lot like revising the skills I'd been developing already. This is a good thing: Katas are supposed to be about practising skills, and practise is really important for reenforcing learning. The tests flow naturally from unpicking the problem, and are a perfect fit to incrementally developing a function because the turns, moves and grid all offer at least two similar possibilities for each action.

'''
Created on 24 May 2015

@author: Dan Chalmers 
'''
import unittest
from mars_rover import Planet, Direction

class Test(unittest.TestCase):

    def setUp(self):
        self.grid = Planet(4)
        self.grid.landRover(1,1,Direction.NORTH)
        self.grid2 = Planet(4)
        self.grid2.setObstacle(1,3)
        self.grid2.landRover(1,1,Direction.NORTH)
      
    def testPutRoverOnPlanet(self):
        self.assertEqual((1,1), self.grid.getLocation())
        
    def testMoveForwardNorth(self):
        (ok, position) = self.grid.move("f")
        self.assertEqual((True,(1,2)), (ok, position))

    def testTurnRightMoveForwardEast(self):
        (ok, position) = self.grid.move("rf")
        self.assertEqual((True,(2,1)), (ok, position))
        
    def testTurnR2MoveForwardSouth(self):
        (ok, position) = self.grid.move("rrf")
        self.assertEqual((True,(1,0)), (ok, position))

    #4 right turns wrap round to north again
    def testTurnR4MoveForwardSouth(self):
        (ok, position) = self.grid.move("rrrrf")
        self.assertEqual((True,(1,2)), (ok, position))
        
    def testMoveBackwardSouth(self):
        (ok, position) = self.grid.move("b")
        self.assertEqual((True,(1,0)), (ok, position))
        
    def testTurnLeftMoveForwardEast(self):
        (ok, position) = self.grid.move("lf")
        self.assertEqual((True,(0,1)), (ok, position))

    def testTurnLeftMoveBackwardWest(self):
        (ok, position) = self.grid.move("lb")
        self.assertEqual((True,(2,1)), (ok, position))
        
    def testWorldWrapsEast(self):
        (ok, position) = self.grid.move("rfff")
        self.assertEqual((True,(0,1)), (ok, position))

    def testWorldWrapsWest(self):
        (ok, position) = self.grid.move("lfff")
        self.assertEqual((True,(2,1)), (ok, position))

    #Poles wraps across the pole, not a jump to opposite end of the world!
    #Wrap over poles have an extra move after wrap to test direction swap
    def testWorldWrapsNorth(self):
        (ok, position) = self.grid.move("ffff")
        self.assertEqual((True,(3,2)), (ok, position))

    def testWorldWrapsSouth(self):
        (ok, position) = self.grid.move("rflbbb")
        self.assertEqual((True,(0,1)), (ok, position))
        
    def testBlockedByObstacle(self):
        (ok, position) = self.grid2.move("ff")
        self.assertEqual((False,(1,2)), (ok, position))

    #Written when debugging the not blocked, passed immediately
    def testWiggle(self):
        (ok, position) = self.grid.move("frflf")
        self.assertEqual((True,(2,3)), (ok, position))
      
    def testNotBlockedByObstacle(self):
        (ok, position) = self.grid2.move("frflf")
        self.assertEqual((True,(2,3)), (ok, position))
     

if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()


The code, similarly, flowed easily from the tests. Much of my earlier looking things up and use of explicit loops was being bypassed. I also found that this Kata led to a simple solution, that was fertile ground for refactoring. The similarity of the left / right, forward / backward, 2 poles, wrap around etc pairings naturally pushed me into eliminating duplication and separating little behaviours out. I was pleased that my tests described the behaviours in the spec, even as my code refactored into something much more subdivided.
'''
Mars Rover Kata
Based on Kata defined at http://amirrajan.net/Blog/code-katas-mars-rover/
Created on 24 May 2015
@author: Dan Chalmers 
'''

class Planet( object ):
  def __init__(self, size):
    self.size = size
    self.obstacles =  [[False for _ in range(size)] for _ in range(size)]  #not the same as [[False]*3]*3]
    
  def landRover(self,x,y,direction):
    self.x = x
    self.y = y
    self.direction = direction
    
  def getLocation(self):
    return (self.x, self.y)
  
  def move(self, instructions):
    for i in instructions:
      ok = self._moveStep_(i)
      if not ok: break # Hit an obstacle, so stop
    return (ok,self.getLocation())
      
  def _moveStep_(self, instruction):
    ok = True
    (dx,dy) = Direction.getMove(self.direction)
    if instruction == 'f': ok = self._moveForward_(dx,dy)
    if instruction == 'b': ok = self._moveBackward_(dx,dy)
    if instruction == 'r': self.direction = Direction.turnRight(self.direction)
    if instruction == 'l': self.direction = Direction.turnLeft(self.direction)
    return ok
    
  def _testAndMove_(self,x,y):
    (x,y, direction) = self._checkCoordinatesWrap_(x,y)
    if self.obstacles[x][y]: return False
    self.x = x
    self.y = y
    self.direction = direction
    return True
      
  def _moveForward_(self,dx,dy):
    x = self.x + dx
    y = self.y + dy
    return self._testAndMove_(x,y)

  def _moveBackward_(self,dx,dy):
    x = self.x - dx
    y = self.y - dy
    return self._testAndMove_(x,y)
    
  def _checkCoordinatesWrap_(self,x,y):
    # Because y coordinate wrap may affect X, checkY has to go first
    (x,y,direction) = self._checkYCoordinateWrap_(x,y)  
    return (self._checkXCoordinateWrap_(x), y, direction)
  
  def _checkXCoordinateWrap_(self,c):
    return c % self.size
  
  def _checkYCoordinateWrap_(self, x, y):
    if y == self.size or y == -1: #Over a pole
      y = self._yOverPole_(y)
      x = (x + self.size/2)%self.size
      direction = Direction.swapDirection(self.direction)
    else:
      direction = self.direction
    return (x, y, direction)

  def _yOverPole_(self, y):
      if y == self.size: y = self.size-1
      elif y == -1: y = 0
      return y
    
  def setObstacle(self,x,y):
    self.obstacles[x][y] = True
    
#Enumeration of directions, with a number behind it allows simple changes    
class Direction:
  NORTH = 0
  EAST = 1
  SOUTH = 2
  WEST = 3
  
  @staticmethod
  def turnRight(direction):
    return (direction+1)%4

  @staticmethod
  def turnLeft(direction):
    return (direction-1)%4
  
  @staticmethod
  def swapDirection(direction):
    return (direction+2)%4

  #Get the delta a move forward makes given a direction the rover faces
  @staticmethod
  def getMove(direction):
    if direction == Direction.NORTH: return (0,1)
    if direction == Direction.EAST: return (1,0)
    if direction == Direction.SOUTH: return (0,-1)
    if direction == Direction.WEST: return (-1,0)


Although the focus was on practise, I did do some exploring, and found three things of note:
The first was the Direction class. Coming from a Java background this felt like an enum. However, in 2.7 Python doesn't have a built in enum. So the labels approach was the next best fit. In the end this facilitated the turn, get move delta, and swap direction methods.
The second was the use of @staticmethod.
The third was that
[[False]*3]*3
and
[[False for _ in range(3)] for _ in range(3)]
build subtly different things. When used to create the obstacles map this difference matters: setObstacle is broken for the first, as shown below.

>>> obstacles = [[False]*3]*3
>>> obstacles
[[False, False, False], [False, False, False], [False, False, False]]
>>> obstacles[1][2] = True
>>> obstacles
[[False, False, True], [False, False, True], [False, False, True]]

>>> obstacles2 = [[False for _ in range(3)] for _ in range(3)]
>>> obstacles2
[[False, False, False], [False, False, False], [False, False, False]]
>>> obstacles2[1][2] = True
>>> obstacles2
[[False, False, False], [False, False, True], [False, False, False]]

Friday 22 May 2015

Learning Python (5) - Monty Hall

This is another Kata from Emily Bache's book. In essence simulating a game:
There are three doors, behind one is a prize you want (I chose a bicycle), behind the other two something you don't (a goat). You pick a door, but don't see behind it. Then the game reveals the goat behind one of the other two doors. You may then either win what is behind your original choice of door, or swap to the other closed door and win what is behind that. The best strategy is to swap doors, (because probability).

As a programming exercise this is interesting. You have to create and manipulate some state: the prizes, the choice, the open door, the final choice. More importantly there is randomisation: where do the prizes go, which door do you choose, which goat is revealed. Because of the randomisation, the acceptance test of "on average, do you win more often if you swap choice" requires many iterations of the swap / don't swap to be a safe test. So, this Kata is perfect for some use of mocks / fakes in testing to make the smaller tests predictable while still using the code under test which relies on random behaviour.

I got to the solution quite quickly. As an observation of how I'm becoming more comfortable with Python as a tool I got to the end result with less refactoring for code style, which is pleasing. There were still techniques I needed to look up - mostly about the random library and passing methods as arguments.

So, the tests. The fake Game class got created early on, once I'd written a first test against the Game. The fake deals with the random stuff by giving just enough predictability without replacing too much real code. Overriding choosePrizeDoor means that the fake is a really thin shim over the real Game implementation. The reveal is still random, but the initial choice and prize are controlled in the tests, which makes the test predictable.
The method passing into test_swappingIsBetter took a couple of passes to get right, I haven't used this much in Python before, but knew I wanted to use it very quickly as the alternative was too much duplication.
'''
Created on 21 May 2015
@author: Dan Chalmers 
'''

import unittest
from montyhall.prizes import Goat, Bike, Game
import random #ONLY for the swapping is better test! *shudder*

class Test(unittest.TestCase):

    def test_isprize(self):
      prize = Goat()
      self.assertFalse(prize.isWin());
      prize = Bike()
      self.assertTrue(prize.isWin());
      
    def test_getRevealChoiceOfTwo(self):
      doors = FakeGameZero()
      doors.choose(0)
      (doorNum, prize) = doors.reveal()
      self.assertNotEqual(0, doorNum)
      self.assertFalse(prize.isWin())

    def test_getRevealChoiceOfOne(self):
      doors = FakeGameOne()
      doors.choose(0)
      (doorNum, prize) = doors.reveal()
      self.assertEqual(2, doorNum)
      self.assertFalse(prize.isWin())

    def test_swapAndGetPrize(self):
      doors = FakeGameOne()
      doors.choose(0)
      doors.reveal()
      (doorNum, prize) = doors.swapChoice()
      self.assertEqual(1, doorNum)
      self.assertTrue(prize.isWin())
      
    def test_swapAndLoose(self):
      doors = FakeGameZero()
      doors.choose(0)
      doors.reveal()
      (doorNum, prize) = doors.swapChoice()
      self.assertNotEqual(0, doorNum)
      self.assertFalse(prize.isWin())
      
    def test_notSwapAndGetPrize(self):
      doors = FakeGameOne()
      doors.choose(1)
      doors.reveal()
      (doorNum, prize) = doors.keepChoice()
      self.assertEqual(1, doorNum)
      self.assertTrue(prize.isWin())
      
    def test_notSwapAndLoose(self):
      doors = FakeGameZero()
      doors.choose(2)
      doors.reveal()
      (doorNum, prize) = doors.keepChoice()
      self.assertEqual(2, doorNum)
      self.assertFalse(prize.isWin())
      
    '''
    The story test of does swapping mean you win more often.  
    This is the first time with Python that passing a method as a parameter has 
    clearly been the right thing to do. I do like higher order functions!  
    '''
    def test_swappingIsBetter(self):  
      swapCount = self.getWinCount(Game.swapChoice)
      keepCount = self.getWinCount(Game.keepChoice)
      message = "swapped wins: %d,  kept wins: %d" % (swapCount, keepCount)
      self.assertTrue(swapCount > keepCount)
      print(message)
      
    #1000 iterations, to avoid being bitten on the bum by random too frequently (and so seeing a fail)  
    def getWinCount(self, method):
      wins = 0
      for _ in range(1000):
        (_, prize) = method(self.makeGame())
        if prize.isWin(): wins+=1
      return wins
    
    def makeGame(self):
        doors = Game()
        doors.choose(random.randrange(3))
        doors.reveal()
        return doors
      
'''
Initially written as part of refactoring of the first test of Game
I had in mind that a mock / fake was part of what I wanted to explore in this Kata, 
but in the end this was a natural way to make a reliable test from a class which normally has some random behaviour
'''
class FakeGameZero(Game):
  def _choosePrizeDoor_(self):
    return 0
  
class FakeGameOne(Game):
  def _choosePrizeDoor_(self):
    return 1
  
if __name__ == "__main__":
    unittest.main()
        

The code is quite straightforward. Two classes to represent the prizes. In Java this would probably have come out as an enum and I'm still not convinced that this is the nicest approach in Python. I avoided having all the state as separate variables in the Game for a while, but in the end having the three variables led to much simpler methods. As state was unavoidable, going with the flow seemed to be the way to go.
'''
Created on 21 May 2015
The Monty Hall Game Kata after Emily Bache's description
@author: Dan Chalmers 
'''

import random

class Goat(object):
  def isWin(self):
    return False
  
class Bike(object):
  def isWin(self):
    return True
  
  
class Game(object):
  DOOR_COUNT = 3
  
  def __init__(self):
    self.doors = [Goat()] * self.DOOR_COUNT
    self.choice = None
    self.revealed = None
    #self.prizeDoor = None #prize door is set next
    self._setPrize_(self._choosePrizeDoor_())
    
  def choose(self, doorNum):
    self.choice = doorNum
      
  def _choosePrizeDoor_(self):
    return random.randrange(self.DOOR_COUNT)
    
  def _setPrize_(self, prizeDoor):
    self.prizeDoor = prizeDoor
    self.doors[prizeDoor] = Bike()
    
  def reveal(self):
    cands = [x for x in range(self.DOOR_COUNT) if x != self.choice and x != self.prizeDoor]
    self.revealed = cands[random.randrange(len(cands))]
    return (self.revealed, self.doors[self.revealed])
  
  def getPrizeDoorNum(self):
    return self.prizeDoor
  
  def swapChoice(self):
    cands = [x for x in range(self.DOOR_COUNT) if x != self.choice and x != self.revealed]
    self.choice = cands[0]
    return self._getChoice_()
  
  def keepChoice(self):
    return self._getChoice_()
    
  def _getChoice_(self):
    return (self.choice, self.doors[self.choice])

This was quite a quick problem to solve, but still taught me new things (random, passing functions). I was pleased with how easy using the fake was in testing. I thought it worked well as a follow-on from Medicine Clash, practising the list comprehensions and use of classes.

Wednesday 20 May 2015

Learning Python (4) - Medicine Clash Kata

Another Kata as part of my learning Python project (I do have some other things going on, just not quite done). Another one from Emily Bache, documented on GitHub. The basic premise is that a patient takes a series of medicines. Each medicine may have one or more prescriptions, and so get taken from one date for a number of days. This is all very well, but some medicines combine in strange and interesting ways. Medicine tries to avoid interesting treatments. So we have a program which looks for clashes (in the past). The exercise comes recommended as practice for building an object graph and assigning responsibilities, use of test doubles (I didn't) and use of dates. A bare bones set of three classes are provided: Patient, Medicine and Prescription. The latter two could be combined for this problem, but part of the set-up is that these classes reflect an underlying relational model and we're just ignoring other details so these classes are fixed.

My solution evolved a bit as the development went on, so the tests don't quite document the development linearly. The first behaviour to test was clearly a simple clash of overlapping dates. After a quick think it was clear that the starting point was going to be comparing lists of dates. So my first passing unit tests extracted these without making tests for a clash. I also set up some prescriptions for our user of dangerous medicines as a test fixture, although much of the setUp code followed later.

import unittest
from medicineclash.medicine import Medicine
from medicineclash.patient import Patient
from medicineclash.prescription import Prescription
from datetime import date, timedelta

class Test(unittest.TestCase):


    def setUp(self):
      prescribe_yesterday = Prescription(date.today()-timedelta(days=1))
      prescribe_one_week_ago = Prescription(date.today()-timedelta(days=7), 14)
      prescribe_two_weeks_ago = Prescription(date.today()-timedelta(days=14), 7)
      prescribe_four_weeks_ago = Prescription(date.today()-timedelta(days=28), 30)
      
      self.patient = Patient()
      self.medicines = []
      
      medicine = Medicine("Aconite")
      medicine.add_prescription(prescribe_yesterday)
      self.medicines.append(medicine)
      
      medicine = Medicine("Belladonna")
      medicine.add_prescription(prescribe_two_weeks_ago)
      self.medicines.append(medicine)      

      medicine = Medicine("Celandine")
      medicine.add_prescription(prescribe_one_week_ago)
      self.medicines.append(medicine)      
      
      medicine = Medicine("Duboisia")
      medicine.add_prescription(prescribe_four_weeks_ago)
      self.medicines.append(medicine)      

      medicine = Medicine("Ergot")
      medicine.add_prescription(prescribe_two_weeks_ago)
      medicine.add_prescription(prescribe_yesterday)
      self.medicines.append(medicine)      
      
      medicine = Medicine("Foxglove")
      medicine.add_prescription(prescribe_four_weeks_ago)
      medicine.add_prescription(prescribe_one_week_ago)
      self.medicines.append(medicine)      
      
     
    def test_simpleclash(self):
      self.patient.add_medicine(self.medicines[0])
      self.patient.add_medicine(self.medicines[1])
      self.patient.add_medicine(self.medicines[2])
      result = self.patient.clash(["Aconite","Celandine"])
      self.assertSetEqual(set([date.today()-timedelta(days=1)]), result)

    def test_get_med_date_lists_single(self):
      self.patient.add_medicine(self.medicines[0])
      result = self.patient.get_med_date_lists(["Aconite"])
      self.assertEqual(1, len(result))
      self.assertEqual(1, len(result[0]))
        
    def test_get_med_date_lists_double(self):
      self.patient.add_medicine(self.medicines[0])
      self.patient.add_medicine(self.medicines[1])
      result = self.patient.get_med_date_lists(["Aconite","Belladonna"])
      self.assertEqual(2, len(result))
      self.assertEqual(1, len(result[0]))
      self.assertEqual(7, len(result[1]))


I got these passing, with some quite long code with explicit loops. There was another unit (rather than behaviour in the spec) test by now. I won't reproduce the extra test or the first version code here, but it is worth noting that there was some refactoring at this point and the test of the removed function went with it. The rest of the tests follow, written one at a time.
    def test_nonoverlap_noclash(self):
      self.patient.add_medicine(self.medicines[0])
      self.patient.add_medicine(self.medicines[1])
      result = self.patient.clash(["Aconite","Belladonna"])
      self.assertSetEqual(set(), result)
 
    def test_doubleclash(self):
      self.patient.add_medicine(self.medicines[0])
      self.patient.add_medicine(self.medicines[1])
      self.patient.add_medicine(self.medicines[2])
      self.patient.add_medicine(self.medicines[3])
      self.patient.add_medicine(self.medicines[4])
      result = self.patient.clash(["Duboisia","Ergot"])
      self.assertEqual(8, len(result))
      self.assertTrue(date.today()-timedelta(days=14) in result)
      self.assertTrue(date.today()-timedelta(days=1) in result)
      
    def test_tripleclash(self):
      self.patient.add_medicine(self.medicines[0])
      self.patient.add_medicine(self.medicines[1])
      self.patient.add_medicine(self.medicines[2])
      self.patient.add_medicine(self.medicines[3])
      self.patient.add_medicine(self.medicines[4])
      result = self.patient.clash(["Aconite","Duboisia","Celandine"])
      self.assertEqual(1, len(result))
      self.assertTrue(date.today()-timedelta(days=1) in result)

    def test_prescription_overlap_handled(self):
      result = self.medicines[5].get_date_list_in_window(40)
      self.assertEqual(28, len(result))
      
    def test_nomedicine_noclash(self):
      result = self.patient.clash(["Aconite"])
      self.assertSetEqual(set(), result)
      
    def test_medicine_notinlist_noclash(self):
      self.patient.add_medicine(self.medicines[2])
      result = self.patient.clash(["Aconite","Belladonna"])
      self.assertSetEqual(set(), result)


After completing what I thought was my solution I took a look at Emily Bache's code, tests first. Quickly I saw that there were tests for a couple of conditions that I hadn't read into the spec. These were added and the code tweaked some more.
    def test_singlemed_clash(self):
      self.patient.add_medicine(self.medicines[0])
      self.patient.add_medicine(self.medicines[1])
      self.patient.add_medicine(self.medicines[2])
      result = self.patient.clash(["Aconite"])
      self.assertNotEqual([], result)
      
    def test_doublemedicine_clashself(self):
      self.patient.add_medicine(self.medicines[1])
      self.patient.add_medicine(self.medicines[1])
      result = self.patient.clash(["Belladonna"])
      self.assertNotEqual([], result)


So, tests done - focussed on exploring the possibilities of the spec. As far as possible they aren't tied to the internal structure; although there are also a couple of unit tests written to understand the details mixed in there.
The code has, it is fair to say, evolved over multiple refactoring passes. So what is here is where it ended up, rather than the instinctive first version. (I might add that the first version was written rather early in the morning, the refactoring at a more civilised time!) The process of refining to this was useful, and got me to explore a few language features. I'll show the code next, and then return to discussing the learning.
First the root of the tree, the patient. It enables medicine lookups and identifies clashes using a set intersection. This was also the first time I had used the "splat" operator to pass a list as a sequence of arguments.

'''
Created on 17 May 2015
Starting point: https://github.com/emilybache/KataMedicineClash/blob/master/Python/patient.py

Search for combinations of medicine from a list, which have all been taken at the same time. Report dates of such clashes.

@author: Dan Chalmers 
'''

class Patient(object):
    
    def __init__(self, medicines = None):
        self._medicines = medicines or []
    
    def add_medicine(self, medicine):
        self._medicines.append(medicine)
    
    #A useful step in building the solution, 
    #gives a list of sets of dates, one set per medicine taken that is in search list of meds
    def get_med_date_lists(self, medicine_names, days_back=90):
      return [ med.get_date_list_in_window(days_back) for med in self._medicines if med in medicine_names ] or [set()]
          
    def clash(self, medicine_names, days_back=90):
      med_lists = self.get_med_date_lists(medicine_names, days_back) 
      return set.intersection(*med_lists)
    

Next the medicine. A double-for list comprehension to get date lists and an overriding of equality to enable comparison with the medicine names, which come as a list of strings.

'''
Created on 17 May 2015
Starting point: https://github.com/emilybache/KataMedicineClash/blob/master/Python/medicine.py

A medicine, with a list of prescriptions for that patient.
Equality satisfied on name to simplify search.
Can search on dates it was taken, not stored pre-computed.

@author: Dan Chalmers 
'''

class Medicine(object):
    
    def __init__(self, name):
        self.name = name
        self.prescriptions = []
        
    def add_prescription(self, prescription):
        self.prescriptions.append(prescription)    
        
    def __eq__(self, other):
      return (self.name == other)

    def get_date_list_in_window(self, days_back):
      return set( [ med for p in self.prescriptions for med in p.get_date_list_in_window(days_back) ] )


Finally, the leaves of the tree, the prescription. This provides the list of dates within the search window. Prescription.get_date_list_in_window doesn't have its own test as it started life in with the Medicine.get_date_list_in_window function, and migrated out during refactoring. The existing tests described what I wanted of the function and carried on passing as I restructured.

'''
Created on 17 May 2015
Starting point: https://github.com/emilybache/KataMedicineClash/blob/master/Python/prescription.py

Prescriptions for medicine, a supply date and duration.
Can get a list of dates that overlap a start..today window.

@author: Dan Chalmers 
'''

from datetime import date, timedelta

class Prescription(object):
    
    def __init__(self, dispense_date=None, days_supply=30):
        self.dispense_date = dispense_date or date.today()
        self.days_supply = days_supply
        
    def get_date_list_in_window(self, window_start):
      start_date = max(self.dispense_date, date.today() - timedelta(window_start))
      end_date = min(self.dispense_date + timedelta(self.days_supply), date.today())
      return [ start_date + timedelta(day) for day in range((end_date - start_date).days) ]


As expected, this explored time a little - in particular the timedelta module, which was new to me. However, I practised the refactoring process and use of list comprehensions extensively, as I moved from explicit loops to comprehensions. The syntax is a little different to Haskell, but that remains my reference point despite being a little rusty. This exercise also explored sets more than I had before, and the set / list relationship. The splat operator and __eq__ function were also firsts. So, a lot of learning in this exercise and the result feels bigger than previous Katas. My solution remains a little different to the example solution, but I'm happy with the allocation of functions to classes and that the code feels readable even if I'm missing a few possibilities. There was quite a lot of referring to the API and refactoring in this. I hope the process of writing the functions will become more smooth with experience. I'll return to this Kata at some point, to help these techniques stick and see if they come easier.

Wednesday 13 May 2015

Learning Python (3) - Bowling Scores Kata

The Bowling Scores Kata is widely discussed. This version is after the description in Emily Bache's book. I note that the Ron Jeffries' description doesn't have the special symbol for spare, which simplifies things. Essentially the goal is to produce a score for a line of frames in 10 pin bowling. Not roll by roll, just the end score. It is just complex enough to allow a couple of different approaches, but still compact enough to solve quickly. It's an exercise I've given programming students, and done myself in Java a couple of times before. This got it onto my list of katas, to look at how my approaches to the two languages differ.

I've taken a couple of different approaches in the Java version to compare them. In Python my instinct was to take a functional approach: consume a frame, produce a score and the unconsumed frames. Consuming a frame is easy to build up test by test, to move from adding scores to handling the more complex features of spares and strikes. Once a frame can be handled, a whole game is easy.

So, tests first, in the order written.

import unittest
import bowlinggame

class Test(unittest.TestCase):

  def testScoreNumberTurn(self):
    (score, remainder) = bowlinggame.processTurn("12")
    self.assertEqual(3, score)
    self.assertEqual(0, len(remainder))

  def testScoreNoHit(self):
    (score, remainder) = bowlinggame.processTurn("--")
    self.assertEqual(0, score)
    self.assertEqual(0, len(remainder))
  #I'm sufficiently confident of the way scoreNormalRoll is being used not to test "1-" and "-1" etc
    
  def testScoreNumberTurnNotLast(self):
    (score, remainder) = bowlinggame.processTurn("1234")
    self.assertEqual(3, score)
    self.assertEqual(2, len(remainder))

  def testScoreSpareTurnNotLast(self):
    (score, remainder) = bowlinggame.processTurn("1/34")
    self.assertEqual(13, score)
    self.assertEqual(2, len(remainder))

  def testScoreStrikeFollowedByTwoBalls(self):
    (score, remainder) = bowlinggame.processTurn("X12")
    self.assertEqual(13, score)
    self.assertEqual(2, len(remainder))

  def testScoreStrikeFollowedBySpare(self):
    (score, remainder) = bowlinggame.processTurn("X1/")
    self.assertEqual(20, score)
    self.assertEqual(2, len(remainder))
    
  def testScoreStrikeFollowedByNumbers(self):
    (score, remainder) = bowlinggame.processTurn("X12")
    self.assertEqual(13, score)
    self.assertEqual(2, len(remainder))

  def testScoreStrikeFollowedByStrikeNumbers(self):
    (score, remainder) = bowlinggame.processTurn("XX23")
    self.assertEqual(22, score)
    self.assertEqual(3, len(remainder))
    
  def testScoreStrikeFollowedByStrikes(self):
    (score, remainder) = bowlinggame.processTurn("XXXX")
    self.assertEqual(30, score)
    self.assertEqual(3, len(remainder))

  def testSimpleGame(self):
    score = bowlinggame.processGame("12345123451234512345")
    self.assertEqual(60, score)

  def testPerfectGame(self):
    score = bowlinggame.processGame("XXXXXXXXXXXX")
    self.assertEqual(300, score)

  def testHeartbreakGame(self):
    score = bowlinggame.processGame("9-9-9-9-9-9-9-9-9-9-")
    self.assertEqual(90, score)

  def testAllSparesGame(self):
    score = bowlinggame.processGame("5/5/5/5/5/5/5/5/5/5/5")
    self.assertEqual(150, score)

if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()

And the code. The various methods were added to as the tests were produced, with a couple of refactoring passes.
'''
Bowling Game Kata
Created on 11 May 2015
@author: Dan Chalmers 
'''


def processGame(line):
  score = 0
  for _ in range(0,10):
    (turnScore, line) = processTurn(line)
    score += turnScore
  return score  
  
def processTurn(throws): 
  if throws[1] == '/':
    return (10+scoreExtraBalls(1, throws[2:]), throws[2:])
  elif throws[0] == 'X':
    return (10+scoreExtraBalls(2, throws[1:]), throws[1:])
  else:
    return (scoreNormalRoll(throws[0]) + scoreNormalRoll(throws[1]), throws[2:])

# Strikes and spares get 2 or 1 extra balls, with no further extras, that just score      
def scoreExtraBalls(extras, throws):
  if extras == 0:
    return 0
  # A spare extra ball will give 10 overall, needs some special code. 
  # But a strike in the extras works as a number!
  elif extras == 2 and throws[1] == '/':
      return 10
  else:
    return scoreNormalRoll(throws[0]) + scoreExtraBalls(extras-1, throws[1:])
    
def scoreNormalRoll(throw):
  if throw == '-':
    return 0
  elif throw == 'X':
    return 10
  else:
    return int(throw)

A version which pre-processes spares to 10-previous (and X to 10 and - to 0) would be almost as fiddly as this, even if the core algorithm is more compact. In this version the connection between the special outcomes and the behaviour is arguably clearer.

I've written using the "consume a turn" approach in Java too, and being able to return a pair rather than making a class to generate Frame objects instantly makes this neater. The Python version has fewer named functions, and these functions are marginally more complex - but, I think, readable entities.

I think that this Kata fits some of Python's strengths neatly: a functional approach without needing to store state, and processing a stream of values (my Phone Numbers solution also played to this).

Friday 8 May 2015

Learning Python (2) - Phone Numbers Kata

Another instalment in the "showing my first attempts" (and being open to any suggestions) series...

Last night, while waiting for the election coverage to warm up, I sat down with a drink and did another kata. This time it was the "phone numbers" Kata, based on the description in The Coding Dojo Handbook by Emily Bache. Which has a nice tidy spec:
Given a list of phone numbers, detect whether any are inconsistent, ie that is one number is a prefix of another.
In addition, the intention was to avoid explicit loops and have it be reasonably efficient for large data sets. In this case a recursively built tree representation of numbers came as the natural approach that I had sketched in my head before starting. (Just enough planning, even when test driven.)

So, tests first. Written top down.
import unittest
import phonenumbers

class Test(unittest.TestCase):

    def setUp(self):
      self.tree = phonenumbers.create_empty()

    def testFirstNodeCreated(self):
      added = self.tree.add_number("1", "Ann")
      self.assertTrue(added,"first number failed to add")

    def testSecondNumber(self):
      self.tree.add_number("1", "Ann")
      added = self.tree.add_number("2", "Bob")
      self.assertTrue(added,"second number failed to add")

    def testSecondNumberDuplicate(self):
      self.tree.add_number("1", "Ann")
      added = self.tree.add_number("1", "Ann2")
      self.assertFalse(added,"second number added but duplicate")

    def testTwoDigitNumbers(self):
      added = self.tree.add_number("11", "Ann")
      self.assertTrue(added,"first two digit number failed to add")
      added = self.tree.add_number("12", "Bob")
      self.assertTrue(added,"second two digit number failed to add")
      
    def testMixedDigitNumbers(self):
      added = self.tree.add_number("11", "Ann")
      self.assertTrue(added,"two digit number failed to add")
      added = self.tree.add_number("124", "Bob")
      self.assertTrue(added,"three digit number failed to add")
      added = self.tree.add_number("1256", "Claire")
      self.assertTrue(added,"four digit number failed to add")
      
    def testAddingPrefixOfExisting(self):
      self.tree.add_number("1234", "Ann")
      added = self.tree.add_number("12", "Bob")
      self.assertFalse(added,"prefix of existing number added incorrectly")
      
    def testAddingNumberExtensionOfExisting(self):
      self.tree.add_number("12", "Ann")
      added = self.tree.add_number("1234", "Bob")
      self.assertFalse(added,"extension of existing number added incorrectly")

    def testIgnoresPunctuation(self):
      self.tree.add_number("12 34-56", "Ann")
      added = self.tree.add_number("1234", "Bob")
      self.assertFalse(added,"did not ignore space")
      added = self.tree.add_number("12 3456", "Bob")
      self.assertFalse(added,"did not ignore dash")

    def testProcessConsistentFile(self):
      filename = 'phone_data.txt'  # Consistent
      consistet = phonenumbers.load_file(filename)
      self.assertTrue(consistet, "file " + filename + " was not consistent, expected it would be")
      
    def testProcessInconsistentFile(self):
      filename = 'phone_data_10000.txt'  # Has inconsistencies
      consistet = phonenumbers.load_file(filename)
      self.assertFalse(consistet, "file " + filename + " was consistent, expected it would not be")
      
    def testProcessLongFile(self):
      filename = 'phone_data_20000.txt'  # Consistent, by removing inconsistencies from example data. Allows rough performance check.
      consistet = phonenumbers.load_file(filename)
      self.assertTrue(consistet, "file " + filename + " was not consistent, expected it would not be")
      
if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()


No big surprises in there. The tests build up the data type that forms the tree, and the processing of input data to build it. The input files are from Emily Bache at gitbub, with the large consistent file made by editing an original. Each test required that I add some functionality, until the second and third tests with a file at the end. There were various refactorings along the way as you'd expect, and a bit more today to tidy up.

The code looks like this:
def create_empty():
  return Number()

def load_file(filename):
  tree = create_empty()
  result = True
  with open(filename, 'r') as f:
    for line in f:
      result = result and _load_line_(tree, line)
  return result
  
def _load_line_(tree, line):
  fields = line.split(',')
  consistent = tree.add_number(fields[1].strip(), fields[0])
  if not consistent:
    print line + "is not consistent!"
  return consistent

class Empty():
  ''' A node in a phone number tree. 
  Empty base type that is used to pack the array of pointers to next branches in a tree 
  where there isn't a branch for that number yet.
  '''
      
class Number():
  '''A number node, which has entries for each digit 0-9 that will contain another Node.
  '''
  
  def __init__(self):
    self.numbers = [Empty()] *10
    
  def add_number(self, number, name):
    '''Add a number to the tree, provided 
    there isn't already a number which is a prefix of this number
    or this number isn't a prefix of an existing number
    '''
    (digit, entry, number) = self._get_digit_entry_(number)
    if isinstance(entry, Empty):
      self.numbers[digit] = self._make_next_node_(number[1:], name)
      return True
    elif isinstance(entry, Name) or len(number[1:]) == 0:
      return False
    else:
      return entry.add_number(number[1:], name)
      
  def _get_digit_entry_(self, number):
    if number[0] == ' ' or number[0] == '-':
      return self._get_digit_entry_(number[1:])
    
    digit = int(number[0])
    entry = self.numbers[digit] 
    return (digit, entry, number)
        
  def _make_next_node_(self, number, name):
    if len(number) > 0:
      node = Number()
      node.add_number(number, name)
      return node
    else:
      return Name(name)
  
  
class Name():
  '''A leaf node in the number tree, representing a person with a complete phone number.
  '''
  def __init__(self, name):
    self.name = name


The use of classes betrays an OO habit, but I got a better idea of making Python classes through this. I had started with something closer to a Haskell data type in my head. I also had a version along the way with an abstract base class, when I had forgotten that lists in Python don't require a common type. That's been refactored out, but was an interesting learning step as it was a language feature I hadn't tried out before. I daresay the Empty and Name classes could be replaced with something built in, but I'm happy with the added expression having them gives.

The add_number function was built up gradually, a clause at a time, with the tests driving additions. But as I said above, I had a plan for where I was going from the start. Handling the punctuation in input files was a requirement that emerged in addition to that plan.

The line by line processing of the file didn't easily submit to using map and list comprehensions, although I did try. Given that the data set may be long I prefer the line by line approach to a "load file then process" approach. So, I have one explicit for loop.

And I was done well before the night got interesting.


Tuesday 5 May 2015

Learning Python (part 1)

My programming language CV is pretty Java centric since about 1999, with some Haskell along the way, a general background of SQL and shell scripts for most of my career and a selection of languages that are very much occasional or in the past and I'm not going to feed to a search engine!

An impending change of day-job sees me needing to get to grips with Python. It's a language I've been aware of for years, but never had cause to use until now. So, how to get familiar? Instalment one of what ought to be a series below. I'm deliberately exposing the learning process here - I'm not going to make claims that these are brilliant examples of python. It will be interesting to return to these as I develop...

Step 1, is obviously to take a look at the python.org web site and get a general feel. Maybe write "hello world".
Step 2, is to run through a tutorial. It isn't one of the "7 languages" so I looked around and found Google's Python Class and worked through that. It was pretty useful, gave me a sense of the strengths and style of the language, and an overview of key features.
Step 3 will be to have a bash at a few Katas. I'm a firm believer in learning by doing and in the case of programming doing comes out as a mix of thinking and typing. I want to do some more or less unguided problem solving, and find an approach that fits the way I like to (behaviour in unit) test drive my code. Katas give a good chance of finding a selection of other solutions out there to compare with and tend to be a tidy quiet-evening self contained activity.
Step 4 will be to build something bigger, probably with Django. But I'm still firmly at step 3.

First up was the "String Addition" kata, as described by Roy Osherove. My code is below, tests are written in the order I wrote them.

import unittest
from string_calc import add

class Test(unittest.TestCase):

    def testAddNothing(self):
      self.assertEqual(0, add(""), "empty string should return zero")
        
    def testAddOne(self): 
      self.assertEqual(1, add("1")) 
       
    def testAddTwo(self):
      self.assertEqual(3, add("1,2"))

    def testAddThree(self):
      self.assertEqual(6, add("1,2,3")) 

    def testAddLongerNumbers(self):
      self.assertEqual(111, add("1,10,100"))
      
    def testHandleWhitespace(self):
      self.assertEqual(10, add("1\n2, 3 4"), "not handling whitespace")
      
    def testShouldRejectNegatives(self):
      self.assertRaises(ValueError, add, "1, -2, 1")
      
if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()

'''
String Calculator Kata
from problem by Roy Osherove
Created on 3 May 2015

@author: Dan Chalmers
'''

import re
from sys import argv

def add(number_string):
  # break the string into a list of number strings
  number_list = re.findall(r'-?\d+', number_string)
  
  # check for bad input
  if any(number[0] == '-' for number in number_string):
    raise ValueError("")
  
  return sum(map(int, number_list))
    
  
if __name__ == '__main__':
    print(add(argv[1]))

So, what did I learn?

  • Some familiarity with PyDev in eclipse. I spent a little while working out what was there and how I'd want to use it in a bigger project, not just pushing at the path of least resistance.
  • A first go with unittest. This was agreeably straightforward, although I'd cheerfully loose all those "self." references.
  • Reminded myself of the basics of python and found that things were easier by the end.
  • There were a few multiple tests failing runs while I sorted the regular expression and use of re out, but generally each move added something or refactored without breaking the earlier cases.
  • I found myself moving from
    split(<separator>) to
    re.findall('\d+', numbers)
    in order to handle newline and commas together. At that point being able to specify delimiters stopped making much sense as well formatted input is promised. Using a number as part of a delimiter felt bloody minded so had a slight spec out of the pram moment!
  • Gary Bernhardt's solution video reminded me of sum, any and map, so I refactored to use them as I do like that more functional style and wanted to get it in my fingers. Before that it used explicit loops.
Next stop, something maybe a little more involved and with a deliberate "take a functional / no loops" constraint.

Moving to Docker

I know, everyone's been excited about Docker lately. Time to join in.
Why?
For this project, I see three advantages:
  1. Between my development environment, CI server, and deployment I've got some very similar things running. But not quite the same, e.g. deployment has an nginx front end, non-development has firefox and Xvfb running for testing. However, I'd really like the common bits to be identical to make the tests as useful as possible. Lightweight builds of platform updates would be good too.
  2. Auto-deploying new web apps from Jenkins is suffering from a little friction. At the moment it involves a manual step. I'm sure there are other ways of solving this, but as I still haven't done so a way of running new versions for low effort and a reasonable security risk remains on my shopping list. Maybe Docker can help with this?
  3. The distribution of services between servers ought to be a flexible thing. Docker does that, and from a first look version controlled Dockerfiles lend themselves to self-documenting, generated from a script containers. 
What I'm not planning to do, first off at least:
  • Docker for my CI master. There's only one, it just has Jenkins and Java, that can stay as a VM.
  • Docker for my data volumes. For the databases there will just be one of each master / slave. All of the web content comes from the project build. The big user generated data files go to S3. So there isn't much need for deploying multiple copies. I'd also like to keep the data as just a disk for now as that ought to make reversing out of docker easier, should I choose to do so.