Thursday 18 June 2015

Learning Python (7) - Consolidation and Django

It's been a few weeks since my last post. So, what news?

I've done more of the Katas I had planned for learning now. The extra ones have been Map Kata and Data Munging. They were interesting, but I'm not going to blog them in full here. They mostly served to consolidate rather than show me new things. There's a couple more on the list, but they're pushed down the to-do list a bit for now. The fact it doesn't all feel new seems like a good sign.

I've been exploring Django. Mostly in the form of the tutorial so far, although I've added some spice to this: First by testing, both the Django tests in the tutorial, but also Selenium end to end smoke tests. Second, by playing with Docker alongside this. Working with a development container of Django and sqlite3 most of the time, but also running a triple-container setup of Django, Postgres and Jenkins. This isn't the first time I've used Docker, but I'm quite pleased with my current explorations of common and dev / test docker-compose setups, with matching Django common, dev and test setups. More on this as I get a setup for one of my projects that I feel is stable in use.

Finally (you may have noticed) I've started using the splashinapond.com domain. I'm winding down one job and ramping another two up. One is development work for a start-up, the other is my own company. This blog will still focus on my own things, but there's now a name for it.

Tuesday 26 May 2015

Learning Python (6) - Mars Rover

My next Kata was from a set of exercises that Amir Rajan describes, the mars rover. The idea is to model a planet as a grid, place a rover there, and send it instructions for moving about. The instructions are a limited set: forward, backwards, turn left, turn right. The planet is a sphere, so once the basics are working the grid needs to wrap around. And then, being Mars, it has rocks in the way. So sometimes the rover will stop and report where it has got to.

This is a nice problem for practising TDD. As I started it felt a lot like revising the skills I'd been developing already. This is a good thing: Katas are supposed to be about practising skills, and practise is really important for reenforcing learning. The tests flow naturally from unpicking the problem, and are a perfect fit to incrementally developing a function because the turns, moves and grid all offer at least two similar possibilities for each action.

'''
Created on 24 May 2015

@author: Dan Chalmers 
'''
import unittest
from mars_rover import Planet, Direction

class Test(unittest.TestCase):

    def setUp(self):
        self.grid = Planet(4)
        self.grid.landRover(1,1,Direction.NORTH)
        self.grid2 = Planet(4)
        self.grid2.setObstacle(1,3)
        self.grid2.landRover(1,1,Direction.NORTH)
      
    def testPutRoverOnPlanet(self):
        self.assertEqual((1,1), self.grid.getLocation())
        
    def testMoveForwardNorth(self):
        (ok, position) = self.grid.move("f")
        self.assertEqual((True,(1,2)), (ok, position))

    def testTurnRightMoveForwardEast(self):
        (ok, position) = self.grid.move("rf")
        self.assertEqual((True,(2,1)), (ok, position))
        
    def testTurnR2MoveForwardSouth(self):
        (ok, position) = self.grid.move("rrf")
        self.assertEqual((True,(1,0)), (ok, position))

    #4 right turns wrap round to north again
    def testTurnR4MoveForwardSouth(self):
        (ok, position) = self.grid.move("rrrrf")
        self.assertEqual((True,(1,2)), (ok, position))
        
    def testMoveBackwardSouth(self):
        (ok, position) = self.grid.move("b")
        self.assertEqual((True,(1,0)), (ok, position))
        
    def testTurnLeftMoveForwardEast(self):
        (ok, position) = self.grid.move("lf")
        self.assertEqual((True,(0,1)), (ok, position))

    def testTurnLeftMoveBackwardWest(self):
        (ok, position) = self.grid.move("lb")
        self.assertEqual((True,(2,1)), (ok, position))
        
    def testWorldWrapsEast(self):
        (ok, position) = self.grid.move("rfff")
        self.assertEqual((True,(0,1)), (ok, position))

    def testWorldWrapsWest(self):
        (ok, position) = self.grid.move("lfff")
        self.assertEqual((True,(2,1)), (ok, position))

    #Poles wraps across the pole, not a jump to opposite end of the world!
    #Wrap over poles have an extra move after wrap to test direction swap
    def testWorldWrapsNorth(self):
        (ok, position) = self.grid.move("ffff")
        self.assertEqual((True,(3,2)), (ok, position))

    def testWorldWrapsSouth(self):
        (ok, position) = self.grid.move("rflbbb")
        self.assertEqual((True,(0,1)), (ok, position))
        
    def testBlockedByObstacle(self):
        (ok, position) = self.grid2.move("ff")
        self.assertEqual((False,(1,2)), (ok, position))

    #Written when debugging the not blocked, passed immediately
    def testWiggle(self):
        (ok, position) = self.grid.move("frflf")
        self.assertEqual((True,(2,3)), (ok, position))
      
    def testNotBlockedByObstacle(self):
        (ok, position) = self.grid2.move("frflf")
        self.assertEqual((True,(2,3)), (ok, position))
     

if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()


The code, similarly, flowed easily from the tests. Much of my earlier looking things up and use of explicit loops was being bypassed. I also found that this Kata led to a simple solution, that was fertile ground for refactoring. The similarity of the left / right, forward / backward, 2 poles, wrap around etc pairings naturally pushed me into eliminating duplication and separating little behaviours out. I was pleased that my tests described the behaviours in the spec, even as my code refactored into something much more subdivided.
'''
Mars Rover Kata
Based on Kata defined at http://amirrajan.net/Blog/code-katas-mars-rover/
Created on 24 May 2015
@author: Dan Chalmers 
'''

class Planet( object ):
  def __init__(self, size):
    self.size = size
    self.obstacles =  [[False for _ in range(size)] for _ in range(size)]  #not the same as [[False]*3]*3]
    
  def landRover(self,x,y,direction):
    self.x = x
    self.y = y
    self.direction = direction
    
  def getLocation(self):
    return (self.x, self.y)
  
  def move(self, instructions):
    for i in instructions:
      ok = self._moveStep_(i)
      if not ok: break # Hit an obstacle, so stop
    return (ok,self.getLocation())
      
  def _moveStep_(self, instruction):
    ok = True
    (dx,dy) = Direction.getMove(self.direction)
    if instruction == 'f': ok = self._moveForward_(dx,dy)
    if instruction == 'b': ok = self._moveBackward_(dx,dy)
    if instruction == 'r': self.direction = Direction.turnRight(self.direction)
    if instruction == 'l': self.direction = Direction.turnLeft(self.direction)
    return ok
    
  def _testAndMove_(self,x,y):
    (x,y, direction) = self._checkCoordinatesWrap_(x,y)
    if self.obstacles[x][y]: return False
    self.x = x
    self.y = y
    self.direction = direction
    return True
      
  def _moveForward_(self,dx,dy):
    x = self.x + dx
    y = self.y + dy
    return self._testAndMove_(x,y)

  def _moveBackward_(self,dx,dy):
    x = self.x - dx
    y = self.y - dy
    return self._testAndMove_(x,y)
    
  def _checkCoordinatesWrap_(self,x,y):
    # Because y coordinate wrap may affect X, checkY has to go first
    (x,y,direction) = self._checkYCoordinateWrap_(x,y)  
    return (self._checkXCoordinateWrap_(x), y, direction)
  
  def _checkXCoordinateWrap_(self,c):
    return c % self.size
  
  def _checkYCoordinateWrap_(self, x, y):
    if y == self.size or y == -1: #Over a pole
      y = self._yOverPole_(y)
      x = (x + self.size/2)%self.size
      direction = Direction.swapDirection(self.direction)
    else:
      direction = self.direction
    return (x, y, direction)

  def _yOverPole_(self, y):
      if y == self.size: y = self.size-1
      elif y == -1: y = 0
      return y
    
  def setObstacle(self,x,y):
    self.obstacles[x][y] = True
    
#Enumeration of directions, with a number behind it allows simple changes    
class Direction:
  NORTH = 0
  EAST = 1
  SOUTH = 2
  WEST = 3
  
  @staticmethod
  def turnRight(direction):
    return (direction+1)%4

  @staticmethod
  def turnLeft(direction):
    return (direction-1)%4
  
  @staticmethod
  def swapDirection(direction):
    return (direction+2)%4

  #Get the delta a move forward makes given a direction the rover faces
  @staticmethod
  def getMove(direction):
    if direction == Direction.NORTH: return (0,1)
    if direction == Direction.EAST: return (1,0)
    if direction == Direction.SOUTH: return (0,-1)
    if direction == Direction.WEST: return (-1,0)


Although the focus was on practise, I did do some exploring, and found three things of note:
The first was the Direction class. Coming from a Java background this felt like an enum. However, in 2.7 Python doesn't have a built in enum. So the labels approach was the next best fit. In the end this facilitated the turn, get move delta, and swap direction methods.
The second was the use of @staticmethod.
The third was that
[[False]*3]*3
and
[[False for _ in range(3)] for _ in range(3)]
build subtly different things. When used to create the obstacles map this difference matters: setObstacle is broken for the first, as shown below.

>>> obstacles = [[False]*3]*3
>>> obstacles
[[False, False, False], [False, False, False], [False, False, False]]
>>> obstacles[1][2] = True
>>> obstacles
[[False, False, True], [False, False, True], [False, False, True]]

>>> obstacles2 = [[False for _ in range(3)] for _ in range(3)]
>>> obstacles2
[[False, False, False], [False, False, False], [False, False, False]]
>>> obstacles2[1][2] = True
>>> obstacles2
[[False, False, False], [False, False, True], [False, False, False]]

Friday 22 May 2015

Learning Python (5) - Monty Hall

This is another Kata from Emily Bache's book. In essence simulating a game:
There are three doors, behind one is a prize you want (I chose a bicycle), behind the other two something you don't (a goat). You pick a door, but don't see behind it. Then the game reveals the goat behind one of the other two doors. You may then either win what is behind your original choice of door, or swap to the other closed door and win what is behind that. The best strategy is to swap doors, (because probability).

As a programming exercise this is interesting. You have to create and manipulate some state: the prizes, the choice, the open door, the final choice. More importantly there is randomisation: where do the prizes go, which door do you choose, which goat is revealed. Because of the randomisation, the acceptance test of "on average, do you win more often if you swap choice" requires many iterations of the swap / don't swap to be a safe test. So, this Kata is perfect for some use of mocks / fakes in testing to make the smaller tests predictable while still using the code under test which relies on random behaviour.

I got to the solution quite quickly. As an observation of how I'm becoming more comfortable with Python as a tool I got to the end result with less refactoring for code style, which is pleasing. There were still techniques I needed to look up - mostly about the random library and passing methods as arguments.

So, the tests. The fake Game class got created early on, once I'd written a first test against the Game. The fake deals with the random stuff by giving just enough predictability without replacing too much real code. Overriding choosePrizeDoor means that the fake is a really thin shim over the real Game implementation. The reveal is still random, but the initial choice and prize are controlled in the tests, which makes the test predictable.
The method passing into test_swappingIsBetter took a couple of passes to get right, I haven't used this much in Python before, but knew I wanted to use it very quickly as the alternative was too much duplication.
'''
Created on 21 May 2015
@author: Dan Chalmers 
'''

import unittest
from montyhall.prizes import Goat, Bike, Game
import random #ONLY for the swapping is better test! *shudder*

class Test(unittest.TestCase):

    def test_isprize(self):
      prize = Goat()
      self.assertFalse(prize.isWin());
      prize = Bike()
      self.assertTrue(prize.isWin());
      
    def test_getRevealChoiceOfTwo(self):
      doors = FakeGameZero()
      doors.choose(0)
      (doorNum, prize) = doors.reveal()
      self.assertNotEqual(0, doorNum)
      self.assertFalse(prize.isWin())

    def test_getRevealChoiceOfOne(self):
      doors = FakeGameOne()
      doors.choose(0)
      (doorNum, prize) = doors.reveal()
      self.assertEqual(2, doorNum)
      self.assertFalse(prize.isWin())

    def test_swapAndGetPrize(self):
      doors = FakeGameOne()
      doors.choose(0)
      doors.reveal()
      (doorNum, prize) = doors.swapChoice()
      self.assertEqual(1, doorNum)
      self.assertTrue(prize.isWin())
      
    def test_swapAndLoose(self):
      doors = FakeGameZero()
      doors.choose(0)
      doors.reveal()
      (doorNum, prize) = doors.swapChoice()
      self.assertNotEqual(0, doorNum)
      self.assertFalse(prize.isWin())
      
    def test_notSwapAndGetPrize(self):
      doors = FakeGameOne()
      doors.choose(1)
      doors.reveal()
      (doorNum, prize) = doors.keepChoice()
      self.assertEqual(1, doorNum)
      self.assertTrue(prize.isWin())
      
    def test_notSwapAndLoose(self):
      doors = FakeGameZero()
      doors.choose(2)
      doors.reveal()
      (doorNum, prize) = doors.keepChoice()
      self.assertEqual(2, doorNum)
      self.assertFalse(prize.isWin())
      
    '''
    The story test of does swapping mean you win more often.  
    This is the first time with Python that passing a method as a parameter has 
    clearly been the right thing to do. I do like higher order functions!  
    '''
    def test_swappingIsBetter(self):  
      swapCount = self.getWinCount(Game.swapChoice)
      keepCount = self.getWinCount(Game.keepChoice)
      message = "swapped wins: %d,  kept wins: %d" % (swapCount, keepCount)
      self.assertTrue(swapCount > keepCount)
      print(message)
      
    #1000 iterations, to avoid being bitten on the bum by random too frequently (and so seeing a fail)  
    def getWinCount(self, method):
      wins = 0
      for _ in range(1000):
        (_, prize) = method(self.makeGame())
        if prize.isWin(): wins+=1
      return wins
    
    def makeGame(self):
        doors = Game()
        doors.choose(random.randrange(3))
        doors.reveal()
        return doors
      
'''
Initially written as part of refactoring of the first test of Game
I had in mind that a mock / fake was part of what I wanted to explore in this Kata, 
but in the end this was a natural way to make a reliable test from a class which normally has some random behaviour
'''
class FakeGameZero(Game):
  def _choosePrizeDoor_(self):
    return 0
  
class FakeGameOne(Game):
  def _choosePrizeDoor_(self):
    return 1
  
if __name__ == "__main__":
    unittest.main()
        

The code is quite straightforward. Two classes to represent the prizes. In Java this would probably have come out as an enum and I'm still not convinced that this is the nicest approach in Python. I avoided having all the state as separate variables in the Game for a while, but in the end having the three variables led to much simpler methods. As state was unavoidable, going with the flow seemed to be the way to go.
'''
Created on 21 May 2015
The Monty Hall Game Kata after Emily Bache's description
@author: Dan Chalmers 
'''

import random

class Goat(object):
  def isWin(self):
    return False
  
class Bike(object):
  def isWin(self):
    return True
  
  
class Game(object):
  DOOR_COUNT = 3
  
  def __init__(self):
    self.doors = [Goat()] * self.DOOR_COUNT
    self.choice = None
    self.revealed = None
    #self.prizeDoor = None #prize door is set next
    self._setPrize_(self._choosePrizeDoor_())
    
  def choose(self, doorNum):
    self.choice = doorNum
      
  def _choosePrizeDoor_(self):
    return random.randrange(self.DOOR_COUNT)
    
  def _setPrize_(self, prizeDoor):
    self.prizeDoor = prizeDoor
    self.doors[prizeDoor] = Bike()
    
  def reveal(self):
    cands = [x for x in range(self.DOOR_COUNT) if x != self.choice and x != self.prizeDoor]
    self.revealed = cands[random.randrange(len(cands))]
    return (self.revealed, self.doors[self.revealed])
  
  def getPrizeDoorNum(self):
    return self.prizeDoor
  
  def swapChoice(self):
    cands = [x for x in range(self.DOOR_COUNT) if x != self.choice and x != self.revealed]
    self.choice = cands[0]
    return self._getChoice_()
  
  def keepChoice(self):
    return self._getChoice_()
    
  def _getChoice_(self):
    return (self.choice, self.doors[self.choice])

This was quite a quick problem to solve, but still taught me new things (random, passing functions). I was pleased with how easy using the fake was in testing. I thought it worked well as a follow-on from Medicine Clash, practising the list comprehensions and use of classes.

Wednesday 20 May 2015

Learning Python (4) - Medicine Clash Kata

Another Kata as part of my learning Python project (I do have some other things going on, just not quite done). Another one from Emily Bache, documented on GitHub. The basic premise is that a patient takes a series of medicines. Each medicine may have one or more prescriptions, and so get taken from one date for a number of days. This is all very well, but some medicines combine in strange and interesting ways. Medicine tries to avoid interesting treatments. So we have a program which looks for clashes (in the past). The exercise comes recommended as practice for building an object graph and assigning responsibilities, use of test doubles (I didn't) and use of dates. A bare bones set of three classes are provided: Patient, Medicine and Prescription. The latter two could be combined for this problem, but part of the set-up is that these classes reflect an underlying relational model and we're just ignoring other details so these classes are fixed.

My solution evolved a bit as the development went on, so the tests don't quite document the development linearly. The first behaviour to test was clearly a simple clash of overlapping dates. After a quick think it was clear that the starting point was going to be comparing lists of dates. So my first passing unit tests extracted these without making tests for a clash. I also set up some prescriptions for our user of dangerous medicines as a test fixture, although much of the setUp code followed later.

import unittest
from medicineclash.medicine import Medicine
from medicineclash.patient import Patient
from medicineclash.prescription import Prescription
from datetime import date, timedelta

class Test(unittest.TestCase):


    def setUp(self):
      prescribe_yesterday = Prescription(date.today()-timedelta(days=1))
      prescribe_one_week_ago = Prescription(date.today()-timedelta(days=7), 14)
      prescribe_two_weeks_ago = Prescription(date.today()-timedelta(days=14), 7)
      prescribe_four_weeks_ago = Prescription(date.today()-timedelta(days=28), 30)
      
      self.patient = Patient()
      self.medicines = []
      
      medicine = Medicine("Aconite")
      medicine.add_prescription(prescribe_yesterday)
      self.medicines.append(medicine)
      
      medicine = Medicine("Belladonna")
      medicine.add_prescription(prescribe_two_weeks_ago)
      self.medicines.append(medicine)      

      medicine = Medicine("Celandine")
      medicine.add_prescription(prescribe_one_week_ago)
      self.medicines.append(medicine)      
      
      medicine = Medicine("Duboisia")
      medicine.add_prescription(prescribe_four_weeks_ago)
      self.medicines.append(medicine)      

      medicine = Medicine("Ergot")
      medicine.add_prescription(prescribe_two_weeks_ago)
      medicine.add_prescription(prescribe_yesterday)
      self.medicines.append(medicine)      
      
      medicine = Medicine("Foxglove")
      medicine.add_prescription(prescribe_four_weeks_ago)
      medicine.add_prescription(prescribe_one_week_ago)
      self.medicines.append(medicine)      
      
     
    def test_simpleclash(self):
      self.patient.add_medicine(self.medicines[0])
      self.patient.add_medicine(self.medicines[1])
      self.patient.add_medicine(self.medicines[2])
      result = self.patient.clash(["Aconite","Celandine"])
      self.assertSetEqual(set([date.today()-timedelta(days=1)]), result)

    def test_get_med_date_lists_single(self):
      self.patient.add_medicine(self.medicines[0])
      result = self.patient.get_med_date_lists(["Aconite"])
      self.assertEqual(1, len(result))
      self.assertEqual(1, len(result[0]))
        
    def test_get_med_date_lists_double(self):
      self.patient.add_medicine(self.medicines[0])
      self.patient.add_medicine(self.medicines[1])
      result = self.patient.get_med_date_lists(["Aconite","Belladonna"])
      self.assertEqual(2, len(result))
      self.assertEqual(1, len(result[0]))
      self.assertEqual(7, len(result[1]))


I got these passing, with some quite long code with explicit loops. There was another unit (rather than behaviour in the spec) test by now. I won't reproduce the extra test or the first version code here, but it is worth noting that there was some refactoring at this point and the test of the removed function went with it. The rest of the tests follow, written one at a time.
    def test_nonoverlap_noclash(self):
      self.patient.add_medicine(self.medicines[0])
      self.patient.add_medicine(self.medicines[1])
      result = self.patient.clash(["Aconite","Belladonna"])
      self.assertSetEqual(set(), result)
 
    def test_doubleclash(self):
      self.patient.add_medicine(self.medicines[0])
      self.patient.add_medicine(self.medicines[1])
      self.patient.add_medicine(self.medicines[2])
      self.patient.add_medicine(self.medicines[3])
      self.patient.add_medicine(self.medicines[4])
      result = self.patient.clash(["Duboisia","Ergot"])
      self.assertEqual(8, len(result))
      self.assertTrue(date.today()-timedelta(days=14) in result)
      self.assertTrue(date.today()-timedelta(days=1) in result)
      
    def test_tripleclash(self):
      self.patient.add_medicine(self.medicines[0])
      self.patient.add_medicine(self.medicines[1])
      self.patient.add_medicine(self.medicines[2])
      self.patient.add_medicine(self.medicines[3])
      self.patient.add_medicine(self.medicines[4])
      result = self.patient.clash(["Aconite","Duboisia","Celandine"])
      self.assertEqual(1, len(result))
      self.assertTrue(date.today()-timedelta(days=1) in result)

    def test_prescription_overlap_handled(self):
      result = self.medicines[5].get_date_list_in_window(40)
      self.assertEqual(28, len(result))
      
    def test_nomedicine_noclash(self):
      result = self.patient.clash(["Aconite"])
      self.assertSetEqual(set(), result)
      
    def test_medicine_notinlist_noclash(self):
      self.patient.add_medicine(self.medicines[2])
      result = self.patient.clash(["Aconite","Belladonna"])
      self.assertSetEqual(set(), result)


After completing what I thought was my solution I took a look at Emily Bache's code, tests first. Quickly I saw that there were tests for a couple of conditions that I hadn't read into the spec. These were added and the code tweaked some more.
    def test_singlemed_clash(self):
      self.patient.add_medicine(self.medicines[0])
      self.patient.add_medicine(self.medicines[1])
      self.patient.add_medicine(self.medicines[2])
      result = self.patient.clash(["Aconite"])
      self.assertNotEqual([], result)
      
    def test_doublemedicine_clashself(self):
      self.patient.add_medicine(self.medicines[1])
      self.patient.add_medicine(self.medicines[1])
      result = self.patient.clash(["Belladonna"])
      self.assertNotEqual([], result)


So, tests done - focussed on exploring the possibilities of the spec. As far as possible they aren't tied to the internal structure; although there are also a couple of unit tests written to understand the details mixed in there.
The code has, it is fair to say, evolved over multiple refactoring passes. So what is here is where it ended up, rather than the instinctive first version. (I might add that the first version was written rather early in the morning, the refactoring at a more civilised time!) The process of refining to this was useful, and got me to explore a few language features. I'll show the code next, and then return to discussing the learning.
First the root of the tree, the patient. It enables medicine lookups and identifies clashes using a set intersection. This was also the first time I had used the "splat" operator to pass a list as a sequence of arguments.

'''
Created on 17 May 2015
Starting point: https://github.com/emilybache/KataMedicineClash/blob/master/Python/patient.py

Search for combinations of medicine from a list, which have all been taken at the same time. Report dates of such clashes.

@author: Dan Chalmers 
'''

class Patient(object):
    
    def __init__(self, medicines = None):
        self._medicines = medicines or []
    
    def add_medicine(self, medicine):
        self._medicines.append(medicine)
    
    #A useful step in building the solution, 
    #gives a list of sets of dates, one set per medicine taken that is in search list of meds
    def get_med_date_lists(self, medicine_names, days_back=90):
      return [ med.get_date_list_in_window(days_back) for med in self._medicines if med in medicine_names ] or [set()]
          
    def clash(self, medicine_names, days_back=90):
      med_lists = self.get_med_date_lists(medicine_names, days_back) 
      return set.intersection(*med_lists)
    

Next the medicine. A double-for list comprehension to get date lists and an overriding of equality to enable comparison with the medicine names, which come as a list of strings.

'''
Created on 17 May 2015
Starting point: https://github.com/emilybache/KataMedicineClash/blob/master/Python/medicine.py

A medicine, with a list of prescriptions for that patient.
Equality satisfied on name to simplify search.
Can search on dates it was taken, not stored pre-computed.

@author: Dan Chalmers 
'''

class Medicine(object):
    
    def __init__(self, name):
        self.name = name
        self.prescriptions = []
        
    def add_prescription(self, prescription):
        self.prescriptions.append(prescription)    
        
    def __eq__(self, other):
      return (self.name == other)

    def get_date_list_in_window(self, days_back):
      return set( [ med for p in self.prescriptions for med in p.get_date_list_in_window(days_back) ] )


Finally, the leaves of the tree, the prescription. This provides the list of dates within the search window. Prescription.get_date_list_in_window doesn't have its own test as it started life in with the Medicine.get_date_list_in_window function, and migrated out during refactoring. The existing tests described what I wanted of the function and carried on passing as I restructured.

'''
Created on 17 May 2015
Starting point: https://github.com/emilybache/KataMedicineClash/blob/master/Python/prescription.py

Prescriptions for medicine, a supply date and duration.
Can get a list of dates that overlap a start..today window.

@author: Dan Chalmers 
'''

from datetime import date, timedelta

class Prescription(object):
    
    def __init__(self, dispense_date=None, days_supply=30):
        self.dispense_date = dispense_date or date.today()
        self.days_supply = days_supply
        
    def get_date_list_in_window(self, window_start):
      start_date = max(self.dispense_date, date.today() - timedelta(window_start))
      end_date = min(self.dispense_date + timedelta(self.days_supply), date.today())
      return [ start_date + timedelta(day) for day in range((end_date - start_date).days) ]


As expected, this explored time a little - in particular the timedelta module, which was new to me. However, I practised the refactoring process and use of list comprehensions extensively, as I moved from explicit loops to comprehensions. The syntax is a little different to Haskell, but that remains my reference point despite being a little rusty. This exercise also explored sets more than I had before, and the set / list relationship. The splat operator and __eq__ function were also firsts. So, a lot of learning in this exercise and the result feels bigger than previous Katas. My solution remains a little different to the example solution, but I'm happy with the allocation of functions to classes and that the code feels readable even if I'm missing a few possibilities. There was quite a lot of referring to the API and refactoring in this. I hope the process of writing the functions will become more smooth with experience. I'll return to this Kata at some point, to help these techniques stick and see if they come easier.

Wednesday 13 May 2015

Learning Python (3) - Bowling Scores Kata

The Bowling Scores Kata is widely discussed. This version is after the description in Emily Bache's book. I note that the Ron Jeffries' description doesn't have the special symbol for spare, which simplifies things. Essentially the goal is to produce a score for a line of frames in 10 pin bowling. Not roll by roll, just the end score. It is just complex enough to allow a couple of different approaches, but still compact enough to solve quickly. It's an exercise I've given programming students, and done myself in Java a couple of times before. This got it onto my list of katas, to look at how my approaches to the two languages differ.

I've taken a couple of different approaches in the Java version to compare them. In Python my instinct was to take a functional approach: consume a frame, produce a score and the unconsumed frames. Consuming a frame is easy to build up test by test, to move from adding scores to handling the more complex features of spares and strikes. Once a frame can be handled, a whole game is easy.

So, tests first, in the order written.

import unittest
import bowlinggame

class Test(unittest.TestCase):

  def testScoreNumberTurn(self):
    (score, remainder) = bowlinggame.processTurn("12")
    self.assertEqual(3, score)
    self.assertEqual(0, len(remainder))

  def testScoreNoHit(self):
    (score, remainder) = bowlinggame.processTurn("--")
    self.assertEqual(0, score)
    self.assertEqual(0, len(remainder))
  #I'm sufficiently confident of the way scoreNormalRoll is being used not to test "1-" and "-1" etc
    
  def testScoreNumberTurnNotLast(self):
    (score, remainder) = bowlinggame.processTurn("1234")
    self.assertEqual(3, score)
    self.assertEqual(2, len(remainder))

  def testScoreSpareTurnNotLast(self):
    (score, remainder) = bowlinggame.processTurn("1/34")
    self.assertEqual(13, score)
    self.assertEqual(2, len(remainder))

  def testScoreStrikeFollowedByTwoBalls(self):
    (score, remainder) = bowlinggame.processTurn("X12")
    self.assertEqual(13, score)
    self.assertEqual(2, len(remainder))

  def testScoreStrikeFollowedBySpare(self):
    (score, remainder) = bowlinggame.processTurn("X1/")
    self.assertEqual(20, score)
    self.assertEqual(2, len(remainder))
    
  def testScoreStrikeFollowedByNumbers(self):
    (score, remainder) = bowlinggame.processTurn("X12")
    self.assertEqual(13, score)
    self.assertEqual(2, len(remainder))

  def testScoreStrikeFollowedByStrikeNumbers(self):
    (score, remainder) = bowlinggame.processTurn("XX23")
    self.assertEqual(22, score)
    self.assertEqual(3, len(remainder))
    
  def testScoreStrikeFollowedByStrikes(self):
    (score, remainder) = bowlinggame.processTurn("XXXX")
    self.assertEqual(30, score)
    self.assertEqual(3, len(remainder))

  def testSimpleGame(self):
    score = bowlinggame.processGame("12345123451234512345")
    self.assertEqual(60, score)

  def testPerfectGame(self):
    score = bowlinggame.processGame("XXXXXXXXXXXX")
    self.assertEqual(300, score)

  def testHeartbreakGame(self):
    score = bowlinggame.processGame("9-9-9-9-9-9-9-9-9-9-")
    self.assertEqual(90, score)

  def testAllSparesGame(self):
    score = bowlinggame.processGame("5/5/5/5/5/5/5/5/5/5/5")
    self.assertEqual(150, score)

if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()

And the code. The various methods were added to as the tests were produced, with a couple of refactoring passes.
'''
Bowling Game Kata
Created on 11 May 2015
@author: Dan Chalmers 
'''


def processGame(line):
  score = 0
  for _ in range(0,10):
    (turnScore, line) = processTurn(line)
    score += turnScore
  return score  
  
def processTurn(throws): 
  if throws[1] == '/':
    return (10+scoreExtraBalls(1, throws[2:]), throws[2:])
  elif throws[0] == 'X':
    return (10+scoreExtraBalls(2, throws[1:]), throws[1:])
  else:
    return (scoreNormalRoll(throws[0]) + scoreNormalRoll(throws[1]), throws[2:])

# Strikes and spares get 2 or 1 extra balls, with no further extras, that just score      
def scoreExtraBalls(extras, throws):
  if extras == 0:
    return 0
  # A spare extra ball will give 10 overall, needs some special code. 
  # But a strike in the extras works as a number!
  elif extras == 2 and throws[1] == '/':
      return 10
  else:
    return scoreNormalRoll(throws[0]) + scoreExtraBalls(extras-1, throws[1:])
    
def scoreNormalRoll(throw):
  if throw == '-':
    return 0
  elif throw == 'X':
    return 10
  else:
    return int(throw)

A version which pre-processes spares to 10-previous (and X to 10 and - to 0) would be almost as fiddly as this, even if the core algorithm is more compact. In this version the connection between the special outcomes and the behaviour is arguably clearer.

I've written using the "consume a turn" approach in Java too, and being able to return a pair rather than making a class to generate Frame objects instantly makes this neater. The Python version has fewer named functions, and these functions are marginally more complex - but, I think, readable entities.

I think that this Kata fits some of Python's strengths neatly: a functional approach without needing to store state, and processing a stream of values (my Phone Numbers solution also played to this).

Friday 8 May 2015

Learning Python (2) - Phone Numbers Kata

Another instalment in the "showing my first attempts" (and being open to any suggestions) series...

Last night, while waiting for the election coverage to warm up, I sat down with a drink and did another kata. This time it was the "phone numbers" Kata, based on the description in The Coding Dojo Handbook by Emily Bache. Which has a nice tidy spec:
Given a list of phone numbers, detect whether any are inconsistent, ie that is one number is a prefix of another.
In addition, the intention was to avoid explicit loops and have it be reasonably efficient for large data sets. In this case a recursively built tree representation of numbers came as the natural approach that I had sketched in my head before starting. (Just enough planning, even when test driven.)

So, tests first. Written top down.
import unittest
import phonenumbers

class Test(unittest.TestCase):

    def setUp(self):
      self.tree = phonenumbers.create_empty()

    def testFirstNodeCreated(self):
      added = self.tree.add_number("1", "Ann")
      self.assertTrue(added,"first number failed to add")

    def testSecondNumber(self):
      self.tree.add_number("1", "Ann")
      added = self.tree.add_number("2", "Bob")
      self.assertTrue(added,"second number failed to add")

    def testSecondNumberDuplicate(self):
      self.tree.add_number("1", "Ann")
      added = self.tree.add_number("1", "Ann2")
      self.assertFalse(added,"second number added but duplicate")

    def testTwoDigitNumbers(self):
      added = self.tree.add_number("11", "Ann")
      self.assertTrue(added,"first two digit number failed to add")
      added = self.tree.add_number("12", "Bob")
      self.assertTrue(added,"second two digit number failed to add")
      
    def testMixedDigitNumbers(self):
      added = self.tree.add_number("11", "Ann")
      self.assertTrue(added,"two digit number failed to add")
      added = self.tree.add_number("124", "Bob")
      self.assertTrue(added,"three digit number failed to add")
      added = self.tree.add_number("1256", "Claire")
      self.assertTrue(added,"four digit number failed to add")
      
    def testAddingPrefixOfExisting(self):
      self.tree.add_number("1234", "Ann")
      added = self.tree.add_number("12", "Bob")
      self.assertFalse(added,"prefix of existing number added incorrectly")
      
    def testAddingNumberExtensionOfExisting(self):
      self.tree.add_number("12", "Ann")
      added = self.tree.add_number("1234", "Bob")
      self.assertFalse(added,"extension of existing number added incorrectly")

    def testIgnoresPunctuation(self):
      self.tree.add_number("12 34-56", "Ann")
      added = self.tree.add_number("1234", "Bob")
      self.assertFalse(added,"did not ignore space")
      added = self.tree.add_number("12 3456", "Bob")
      self.assertFalse(added,"did not ignore dash")

    def testProcessConsistentFile(self):
      filename = 'phone_data.txt'  # Consistent
      consistet = phonenumbers.load_file(filename)
      self.assertTrue(consistet, "file " + filename + " was not consistent, expected it would be")
      
    def testProcessInconsistentFile(self):
      filename = 'phone_data_10000.txt'  # Has inconsistencies
      consistet = phonenumbers.load_file(filename)
      self.assertFalse(consistet, "file " + filename + " was consistent, expected it would not be")
      
    def testProcessLongFile(self):
      filename = 'phone_data_20000.txt'  # Consistent, by removing inconsistencies from example data. Allows rough performance check.
      consistet = phonenumbers.load_file(filename)
      self.assertTrue(consistet, "file " + filename + " was not consistent, expected it would not be")
      
if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()


No big surprises in there. The tests build up the data type that forms the tree, and the processing of input data to build it. The input files are from Emily Bache at gitbub, with the large consistent file made by editing an original. Each test required that I add some functionality, until the second and third tests with a file at the end. There were various refactorings along the way as you'd expect, and a bit more today to tidy up.

The code looks like this:
def create_empty():
  return Number()

def load_file(filename):
  tree = create_empty()
  result = True
  with open(filename, 'r') as f:
    for line in f:
      result = result and _load_line_(tree, line)
  return result
  
def _load_line_(tree, line):
  fields = line.split(',')
  consistent = tree.add_number(fields[1].strip(), fields[0])
  if not consistent:
    print line + "is not consistent!"
  return consistent

class Empty():
  ''' A node in a phone number tree. 
  Empty base type that is used to pack the array of pointers to next branches in a tree 
  where there isn't a branch for that number yet.
  '''
      
class Number():
  '''A number node, which has entries for each digit 0-9 that will contain another Node.
  '''
  
  def __init__(self):
    self.numbers = [Empty()] *10
    
  def add_number(self, number, name):
    '''Add a number to the tree, provided 
    there isn't already a number which is a prefix of this number
    or this number isn't a prefix of an existing number
    '''
    (digit, entry, number) = self._get_digit_entry_(number)
    if isinstance(entry, Empty):
      self.numbers[digit] = self._make_next_node_(number[1:], name)
      return True
    elif isinstance(entry, Name) or len(number[1:]) == 0:
      return False
    else:
      return entry.add_number(number[1:], name)
      
  def _get_digit_entry_(self, number):
    if number[0] == ' ' or number[0] == '-':
      return self._get_digit_entry_(number[1:])
    
    digit = int(number[0])
    entry = self.numbers[digit] 
    return (digit, entry, number)
        
  def _make_next_node_(self, number, name):
    if len(number) > 0:
      node = Number()
      node.add_number(number, name)
      return node
    else:
      return Name(name)
  
  
class Name():
  '''A leaf node in the number tree, representing a person with a complete phone number.
  '''
  def __init__(self, name):
    self.name = name


The use of classes betrays an OO habit, but I got a better idea of making Python classes through this. I had started with something closer to a Haskell data type in my head. I also had a version along the way with an abstract base class, when I had forgotten that lists in Python don't require a common type. That's been refactored out, but was an interesting learning step as it was a language feature I hadn't tried out before. I daresay the Empty and Name classes could be replaced with something built in, but I'm happy with the added expression having them gives.

The add_number function was built up gradually, a clause at a time, with the tests driving additions. But as I said above, I had a plan for where I was going from the start. Handling the punctuation in input files was a requirement that emerged in addition to that plan.

The line by line processing of the file didn't easily submit to using map and list comprehensions, although I did try. Given that the data set may be long I prefer the line by line approach to a "load file then process" approach. So, I have one explicit for loop.

And I was done well before the night got interesting.


Tuesday 5 May 2015

Learning Python (part 1)

My programming language CV is pretty Java centric since about 1999, with some Haskell along the way, a general background of SQL and shell scripts for most of my career and a selection of languages that are very much occasional or in the past and I'm not going to feed to a search engine!

An impending change of day-job sees me needing to get to grips with Python. It's a language I've been aware of for years, but never had cause to use until now. So, how to get familiar? Instalment one of what ought to be a series below. I'm deliberately exposing the learning process here - I'm not going to make claims that these are brilliant examples of python. It will be interesting to return to these as I develop...

Step 1, is obviously to take a look at the python.org web site and get a general feel. Maybe write "hello world".
Step 2, is to run through a tutorial. It isn't one of the "7 languages" so I looked around and found Google's Python Class and worked through that. It was pretty useful, gave me a sense of the strengths and style of the language, and an overview of key features.
Step 3 will be to have a bash at a few Katas. I'm a firm believer in learning by doing and in the case of programming doing comes out as a mix of thinking and typing. I want to do some more or less unguided problem solving, and find an approach that fits the way I like to (behaviour in unit) test drive my code. Katas give a good chance of finding a selection of other solutions out there to compare with and tend to be a tidy quiet-evening self contained activity.
Step 4 will be to build something bigger, probably with Django. But I'm still firmly at step 3.

First up was the "String Addition" kata, as described by Roy Osherove. My code is below, tests are written in the order I wrote them.

import unittest
from string_calc import add

class Test(unittest.TestCase):

    def testAddNothing(self):
      self.assertEqual(0, add(""), "empty string should return zero")
        
    def testAddOne(self): 
      self.assertEqual(1, add("1")) 
       
    def testAddTwo(self):
      self.assertEqual(3, add("1,2"))

    def testAddThree(self):
      self.assertEqual(6, add("1,2,3")) 

    def testAddLongerNumbers(self):
      self.assertEqual(111, add("1,10,100"))
      
    def testHandleWhitespace(self):
      self.assertEqual(10, add("1\n2, 3 4"), "not handling whitespace")
      
    def testShouldRejectNegatives(self):
      self.assertRaises(ValueError, add, "1, -2, 1")
      
if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']
    unittest.main()

'''
String Calculator Kata
from problem by Roy Osherove
Created on 3 May 2015

@author: Dan Chalmers
'''

import re
from sys import argv

def add(number_string):
  # break the string into a list of number strings
  number_list = re.findall(r'-?\d+', number_string)
  
  # check for bad input
  if any(number[0] == '-' for number in number_string):
    raise ValueError("")
  
  return sum(map(int, number_list))
    
  
if __name__ == '__main__':
    print(add(argv[1]))

So, what did I learn?

  • Some familiarity with PyDev in eclipse. I spent a little while working out what was there and how I'd want to use it in a bigger project, not just pushing at the path of least resistance.
  • A first go with unittest. This was agreeably straightforward, although I'd cheerfully loose all those "self." references.
  • Reminded myself of the basics of python and found that things were easier by the end.
  • There were a few multiple tests failing runs while I sorted the regular expression and use of re out, but generally each move added something or refactored without breaking the earlier cases.
  • I found myself moving from
    split(<separator>) to
    re.findall('\d+', numbers)
    in order to handle newline and commas together. At that point being able to specify delimiters stopped making much sense as well formatted input is promised. Using a number as part of a delimiter felt bloody minded so had a slight spec out of the pram moment!
  • Gary Bernhardt's solution video reminded me of sum, any and map, so I refactored to use them as I do like that more functional style and wanted to get it in my fingers. Before that it used explicit loops.
Next stop, something maybe a little more involved and with a deliberate "take a functional / no loops" constraint.

Moving to Docker

I know, everyone's been excited about Docker lately. Time to join in.
Why?
For this project, I see three advantages:
  1. Between my development environment, CI server, and deployment I've got some very similar things running. But not quite the same, e.g. deployment has an nginx front end, non-development has firefox and Xvfb running for testing. However, I'd really like the common bits to be identical to make the tests as useful as possible. Lightweight builds of platform updates would be good too.
  2. Auto-deploying new web apps from Jenkins is suffering from a little friction. At the moment it involves a manual step. I'm sure there are other ways of solving this, but as I still haven't done so a way of running new versions for low effort and a reasonable security risk remains on my shopping list. Maybe Docker can help with this?
  3. The distribution of services between servers ought to be a flexible thing. Docker does that, and from a first look version controlled Dockerfiles lend themselves to self-documenting, generated from a script containers. 
What I'm not planning to do, first off at least:
  • Docker for my CI master. There's only one, it just has Jenkins and Java, that can stay as a VM.
  • Docker for my data volumes. For the databases there will just be one of each master / slave. All of the web content comes from the project build. The big user generated data files go to S3. So there isn't much need for deploying multiple copies. I'd also like to keep the data as just a disk for now as that ought to make reversing out of docker easier, should I choose to do so.


Friday 20 March 2015

Just One

I was eyeing up the current (and forthcoming) mac laptops the other day, as you do. I was struck by the "one port" fascination. I've yet to buy anything that uses thunderbolt, so the USB-C thing makes no difference to me. But I am struck that that having one port isn't a major leap forward - it just offloads the demultiplexing problem somewhere else.
For instance, as I sit here at my desk my laptop has 5 wires coming out of it: power, ethernet (wireless in my office is variable), backup disk (happens to be old enough for firewire but can use USB), pictures to the monitor and USB to the monitor. Before I wrote this I copied a file to a USB memory stick, in case things aren't as expected where I'm going later. The monitor's USB ports are full too: audio interface, keyboard, mouse and fitbit dongle. I also often plug in a cable to access a phone; my laptop connects to a projector on average once a week.
At home there's a similar story: I've got a backup disk (spot the ex-sysadmin who found time machine over wireless unreliable), an external disk of pictures and music (editing photos is just faster on a wire than a wireless network), an ipod, a camera (memory card or USB), occasional connection to the TV for family streaming, occasional USB stick and power. And with no external screen to act as a hub and moving between rooms with the computer no fixed place for a separate box to demultiplex whichever combination of stuff I'm using.
So, if I got a new laptop, could I make one port work? Probably. Though it would involve spending money on new adapters, maybe hardware and some rechargeable batteries for wireless stuff just to get the current, or equivalent, gadgets connected. Would I gain anything? Not as far as I can see. I'd still have boxes and wires, just slightly rearranged. I don't recall ever thinking "I wish I had fewer sockets on my computer".

Which gets me to the software analogy. Message busses are all very well. I've written and used more than one. But if they're the only way of communicating in the whole system, they bring their own challenges.
Scalability for one.
Fit to function for another. I don't mind having a separate channel for audit logs, or system management. If the bus breaks or load goes up I still want these important functions, without creating headaches for security, priority and API clarity.
Does doing all communications (or persistence) in one way buy me anything? I like reuse and patterns as much as the next OO-person. There are limits to how much complexity I want to put up with and a common approach can reduce both local and system complexity by sharing the overhead. But, increasing system complexity needs this payoff. Some connections are special, some adapters are too expensive.
A fundamental point I make when training people to solve problems is that there are many ways to solve even quite simple problems. Some may be better than others, but the concept of a solution space, where each solution has its own properties, is an important one. If you vary the parameters of the problem you vary the point in the solution space that is the best fit. There may be a solution that is a pretty good fit to many problems, but far from ideal for some cases. Shading the whole of this space with the biggest crayon you can find isn't big or clever.

Friday 6 March 2015

Continuous Deployment and Programmers on Call

Two days ago I was attending an interesting Brighton Java talk on continuous deployment by Jose Baena followed by a discussion led by James Stanier. I heard about some new tools; got reassurance that what I was doing was also, roughly, someone else's approach; and that the tricky problems were felt by others too.
One of the discussion points was "Should developers always be on-call for their code?". My gut reaction was "always? No! I've done that - 18 months with a pager - no way, it's soul sapping." In the shower this morning I arrived at a more nuanced answer, that (of course) agrees with my gut reaction. The reasons are intertwined, bear with me while I pull them apart a bit...
First, there's some basic pragmatism: What do you do with the code written by someone that left? If you pair program, who's on call - both of you, take it in turns? If I've maintained a method in a class do I suddenly get the entire class?
Second, and leading from that thought, is the issue of who dials the number of the right person? If an organisation gets beyond about 6 people (arguably 2 if you work in layers of a stack) there will be failures where it isn't immediately apparent who owns it. Now you need 1st and 2nd line support. By the time first line has isolated that it is a particular unit which is at fault, not the database or the user interface it connects to, then they are probably in a position to make the fix faster than they can get the 2nd line on the job.
James offered the thought (bait?) on the night that being on call might prompt a sense of responsibility. The come back from the floor (not me) on that was that developers paid for call outs wouldn't be motivated to produce good code. Getting called becomes a source of income. My experience is that after a while, with a decent wage, the intrusion on my life mattered more than the double time. However, to some extent I think he is right - throwing code over a wall to be someone else's problem isn't how I like to work. Even in the 1990s I mixed development, test and support. However, it does lead me to my third point: given the first two issues, it is clear that matching an individual to a block of code is problematic. Many problems require that the on-call person has the sophistication to move up and down the stack, look at systems issues, understand the relationships between the parts. So, maybe the on-call needs to be at a team level. There's still a responsibility. But there's also the responsibility to your team mates' social life, not to mention being fingered for code which was hard to understand and fix!
And there's the fourth point: Yes, feel responsibility for your code. This isn't ownership* though. As a team the power to fix the code needs to be dynamically allocated.
So, being on call - as a team, but on rotation - motivates responsibility and quality. It gives a reason for good testing, code reviews, not sitting in a corner but getting the whole team's work, and seeing the full development cycle that putting development and ops together implies.


* I don't mean avoiding "ownership" in the sense of "I'm going to own this problem", but in the sense of "this code is my precious, it's mine!" or "I'm leaving the company and taking it with me".

Sunday 18 January 2015

Thinking and Programming Videos

I'm in the process of making some videos (screen casts), showing how I arrive at solutions to programming exercises. They are for students learning programming, rather than experienced programmers, though that would be interesting too. The idea of using video is to expose the thought process. I'm not trying to show "follow these steps to produce this code". The problem solving skills that lead to those steps are the interesting thing.

Producing the videos has been a learning experience for me too. The sustained thinking aloud is quite fun. Recording and editing is quite a time sink, but is gradually becoming smoother - and the results more polished as it becomes more familiar. The unexpected thing was how much it exposes the stuff that interferes with mental flow: being hungry, interruptions, outside noises and so on. Not only am I editing out the interruption, but there's a typically a run of edits required as I make a couple of small mistakes or fail to express myself as clearly as I'd like over the few minutes that it takes to get back into my stride.

The video I'm happiest with so far is below. (It is also the shortest!) This follows the bit of the "Money" problem where inheritance isn't the solution. In some ways this was also the most challenging lab so far. Most of the time the thing to show is "the right way". Here the exercise is, at least partly, about recognising when things have gone wrong and rolling back is the best choice. (It is also about the composition vs inheritance choice, but composition is the next exercise.) Important issues, but in my experience "this is wrong and here's why" is avoided in the classroom.