Friday 8 May 2015

Learning Python (2) - Phone Numbers Kata

Another instalment in the "showing my first attempts" (and being open to any suggestions) series...

Last night, while waiting for the election coverage to warm up, I sat down with a drink and did another kata. This time it was the "phone numbers" Kata, based on the description in The Coding Dojo Handbook by Emily Bache. Which has a nice tidy spec:
Given a list of phone numbers, detect whether any are inconsistent, ie that is one number is a prefix of another.
In addition, the intention was to avoid explicit loops and have it be reasonably efficient for large data sets. In this case a recursively built tree representation of numbers came as the natural approach that I had sketched in my head before starting. (Just enough planning, even when test driven.)

So, tests first. Written top down.
import unittest
import phonenumbers

class Test(unittest.TestCase):

    def setUp(self):
      self.tree = phonenumbers.create_empty()

    def testFirstNodeCreated(self):
      added = self.tree.add_number("1", "Ann")
      self.assertTrue(added,"first number failed to add")

    def testSecondNumber(self):
      self.tree.add_number("1", "Ann")
      added = self.tree.add_number("2", "Bob")
      self.assertTrue(added,"second number failed to add")

    def testSecondNumberDuplicate(self):
      self.tree.add_number("1", "Ann")
      added = self.tree.add_number("1", "Ann2")
      self.assertFalse(added,"second number added but duplicate")

    def testTwoDigitNumbers(self):
      added = self.tree.add_number("11", "Ann")
      self.assertTrue(added,"first two digit number failed to add")
      added = self.tree.add_number("12", "Bob")
      self.assertTrue(added,"second two digit number failed to add")
    def testMixedDigitNumbers(self):
      added = self.tree.add_number("11", "Ann")
      self.assertTrue(added,"two digit number failed to add")
      added = self.tree.add_number("124", "Bob")
      self.assertTrue(added,"three digit number failed to add")
      added = self.tree.add_number("1256", "Claire")
      self.assertTrue(added,"four digit number failed to add")
    def testAddingPrefixOfExisting(self):
      self.tree.add_number("1234", "Ann")
      added = self.tree.add_number("12", "Bob")
      self.assertFalse(added,"prefix of existing number added incorrectly")
    def testAddingNumberExtensionOfExisting(self):
      self.tree.add_number("12", "Ann")
      added = self.tree.add_number("1234", "Bob")
      self.assertFalse(added,"extension of existing number added incorrectly")

    def testIgnoresPunctuation(self):
      self.tree.add_number("12 34-56", "Ann")
      added = self.tree.add_number("1234", "Bob")
      self.assertFalse(added,"did not ignore space")
      added = self.tree.add_number("12 3456", "Bob")
      self.assertFalse(added,"did not ignore dash")

    def testProcessConsistentFile(self):
      filename = 'phone_data.txt'  # Consistent
      consistet = phonenumbers.load_file(filename)
      self.assertTrue(consistet, "file " + filename + " was not consistent, expected it would be")
    def testProcessInconsistentFile(self):
      filename = 'phone_data_10000.txt'  # Has inconsistencies
      consistet = phonenumbers.load_file(filename)
      self.assertFalse(consistet, "file " + filename + " was consistent, expected it would not be")
    def testProcessLongFile(self):
      filename = 'phone_data_20000.txt'  # Consistent, by removing inconsistencies from example data. Allows rough performance check.
      consistet = phonenumbers.load_file(filename)
      self.assertTrue(consistet, "file " + filename + " was not consistent, expected it would not be")
if __name__ == "__main__":
    #import sys;sys.argv = ['', 'Test.testName']

No big surprises in there. The tests build up the data type that forms the tree, and the processing of input data to build it. The input files are from Emily Bache at gitbub, with the large consistent file made by editing an original. Each test required that I add some functionality, until the second and third tests with a file at the end. There were various refactorings along the way as you'd expect, and a bit more today to tidy up.

The code looks like this:
def create_empty():
  return Number()

def load_file(filename):
  tree = create_empty()
  result = True
  with open(filename, 'r') as f:
    for line in f:
      result = result and _load_line_(tree, line)
  return result
def _load_line_(tree, line):
  fields = line.split(',')
  consistent = tree.add_number(fields[1].strip(), fields[0])
  if not consistent:
    print line + "is not consistent!"
  return consistent

class Empty():
  ''' A node in a phone number tree. 
  Empty base type that is used to pack the array of pointers to next branches in a tree 
  where there isn't a branch for that number yet.
class Number():
  '''A number node, which has entries for each digit 0-9 that will contain another Node.
  def __init__(self):
    self.numbers = [Empty()] *10
  def add_number(self, number, name):
    '''Add a number to the tree, provided 
    there isn't already a number which is a prefix of this number
    or this number isn't a prefix of an existing number
    (digit, entry, number) = self._get_digit_entry_(number)
    if isinstance(entry, Empty):
      self.numbers[digit] = self._make_next_node_(number[1:], name)
      return True
    elif isinstance(entry, Name) or len(number[1:]) == 0:
      return False
      return entry.add_number(number[1:], name)
  def _get_digit_entry_(self, number):
    if number[0] == ' ' or number[0] == '-':
      return self._get_digit_entry_(number[1:])
    digit = int(number[0])
    entry = self.numbers[digit] 
    return (digit, entry, number)
  def _make_next_node_(self, number, name):
    if len(number) > 0:
      node = Number()
      node.add_number(number, name)
      return node
      return Name(name)
class Name():
  '''A leaf node in the number tree, representing a person with a complete phone number.
  def __init__(self, name): = name

The use of classes betrays an OO habit, but I got a better idea of making Python classes through this. I had started with something closer to a Haskell data type in my head. I also had a version along the way with an abstract base class, when I had forgotten that lists in Python don't require a common type. That's been refactored out, but was an interesting learning step as it was a language feature I hadn't tried out before. I daresay the Empty and Name classes could be replaced with something built in, but I'm happy with the added expression having them gives.

The add_number function was built up gradually, a clause at a time, with the tests driving additions. But as I said above, I had a plan for where I was going from the start. Handling the punctuation in input files was a requirement that emerged in addition to that plan.

The line by line processing of the file didn't easily submit to using map and list comprehensions, although I did try. Given that the data set may be long I prefer the line by line approach to a "load file then process" approach. So, I have one explicit for loop.

And I was done well before the night got interesting.

No comments:

Post a Comment