Friday, December 26, 2014

Digital Ocean

In addition to working on Haskell, I'd like to start learning some C in 2015. I've started working through the online version of Zed Shaw's Learn C the Hard Way.

Since I work on both Windows and OS X (and carry a Chromebook around), I've been using Digital Ocean's SSD cloud server for $10 / month. So far it's worked well -- I've been running Ubuntu and have had no problems installing packages for Python, Haskell, and C. It's nice to have a consistent environment regardless of where I'm working, and I just rysnc file directories back to Dropbox for backup.

Think like a Fundamentalist, Code like a Hacker...

I've just finished Eric Meijer's Intro to Functional Programming on edX. Since the course was taught in Haskell, I also managed to get my feet wet with the language and now feel prepared to work through Learn You a Haskell and Real World Haskell.

One of the course's mantras has been "think like a fundamentalist, code like a hacker". The idea is to start with an approach that is fundamentally correct and iteratively refine the implementation. This is a pretty convincing philosophy for Haskell (and perhaps functional programming in general), since things like type checking and pattern matching facilitate reasoning about the correctness of programs. An interesting question that follows, however, is how to balance readability/clarity vs the efficiency of solutions that are fundamentally sound but "hacked".

There's a good example in Graham Hutton's Programming in Haskell called "Making Append Vanish" (chapter 13, section 6). We start with an example of a recursive function that reverses the elements of a list:

reverse              :: [a] -> [a]
reverse []            = []
reverse (x:xs)        = reverse xs ++ [x]

Because this solution using the ++ append operator is relatively inefficient (requiring quadratic time), Hutton demonstrates using induction that a linear time solution can be derived:

reverse              :: [a] -> [a]
reverse xs            = reverse' xs []

reverse'             :: [a] -> [a] -> [a]
reverse' [] ys        = ys
reverse' (x:xs) ys    = reverse' xs (x:ys)

At least for me, the second solution takes much longer to digest, and hence is less clear -- but it is demonstrably correct and far more efficient. I'm not sure if it's really fair to call the second solution "hacked". But it does illustrate an interesting aspect of the functional programming paradigm, which I'm looking forward to exploring further.

Wednesday, December 24, 2014

Intro to Git and GitHub

While working through the last two weeks of the Functional Programming course on edX, I took a break to run through the short How to Use Git and GitHub class on Udacity.

I've been using Git for a while, but this was a nice introduction to GitHub -- the course is very well-designed and felt like a good balance of fundamental principles and practical application.

It also pointed me to these two very useful environment tweaks:


Tuesday, November 11, 2014

Hiatus and Haskell

I've been on hiatus as I started a new job this year... but I have resumed some coursework and hope to document progress on a more regular basis.

Erik Meijer's Introduction to Functional Programming on EdX has been an excellent (and gentle) introduction to the subject, and has also served as a good way to dive into Haskell. I suspect Learn You a Haskell will be much easier to digest after this course.

Monday, February 3, 2014

Project Euler 51 - 60

Project Euler Solutions to Questions 51 - 55
Project Euler Solutions to Questions 56 - 60

With some of the problems, I'm starting to run up against the one-minute execution time guideline.

From a computational perspective, I feel as if I'm still reasonably well equipped to handle the problems. Various parts of my Coursera and Udacity education have come in handy, including dynamic programming, basic cryptography, and the general approach to breaking down problems as presented in Peter Norvig's class.

From a number theoretical perspective, though, I'm running into some trouble.  In Problem 60, for example, the naive brute-force approach runs in O(n^5), where n is on the order of 10,000 (and possibly much larger). There are surely ways to cut down the search space, but I need to do some more reading on primes.

On the subject of primes, I need to find a better primality test.  So far I've been able to get by with a combined approach of:
  • generating all primes that are less than 10,000,000 via the Sieve of Erastothenes
  • checking primality of larger numbers by prime factoring and counting the number of factors
Maybe the next step is to implement a faster sieve or something like the Miller-Rabin test.

Saturday, February 1, 2014

Project Euler 36 - 50

Project Euler Solutions to Questions 36 - 40
Project Euler Solutions to Questions 41 - 45
Project Euler Solutions to Questions 46 - 50

Problem 46 is about one of Goldbachs' conjectures that turns out to be false.

The original Goldbach conjecture is an interesting problem.  Following Wikipedia's example, I tried creating a graph that shows the number of ways to represent even integers as the sum of two primes (the number of "Goldbach partitions").  From the small dataset, it looks like the number of Goldbach partitions increases with integer size, and moreover there is some periodicity to the number of partitions. But of course, such observations are meaningless in the context of mathematical proofs.

Monday, January 27, 2014

Project Euler 26 - 35

Project Euler Solutions to Questions 26 - 30
Project Euler Solutions to Questions 31 -35

I find myself using more functional style components these days, e.g. recursion, list comprehension, and functions with single-line returns to check multi-clause conditions.  There's an elegance to the simple, piece-meal constructs.  My next big project is to learn Haskell...

Friday, January 24, 2014

Project Euler 21-25

Project Euler Solutions for Questions 21-25

Two of the problems require finding the sum of all proper divisors of a number.   The algorithm needs to be reasonably efficient, since it is iterated over a non-trivial set of numbers (e.g. all integers from 1 to 10,000).

The approach I've implemented runs well enough (about 1 second for all integers from 1 to 10,000), but I suspect there's a much faster approach to part 2.
  1. As an optimization step, pre-generate a list of primes via a sieve.  Decompose each number into its prime factors, e.g. [2, 2, 7]  for the number 28.

  2. For each number, generate the list of proper divisors from the list of prime factors via the following steps:

    • Generate the power set of the list of prime factors.  Trim the head (the empty element) and the tail (the sum product of which is the number itself).  Using [2, 2, 7] as an example, the trimmed power set  is [[2], [2, 2], [7], [2, 7]].

    • For each list in the power set, find the sum product. Including the number 1 as a divisor, this results in [1, 2, 4, 7, 14].

    Given the list of proper divisors, take its sum, e.g. 1 + 2 + 4 + 7 + 14 = 28.

Project Euler 11-20

Project Euler Solutions for Questions 11-15

Project Euler Solutions for Questions 16-20

Wednesday, January 22, 2014

Project Euler 1 -10 & IPython

I've started working my way through the Project Euler problems.  Even in the first ten problems, it's been interesting to see how brute-force search approaches compare to solutions more oriented in number theory.

For this kind of modular problem solving, IPython has proved a fantastic environment to work in for a number of reasons:
  • Simple text markdown + MathJax support for referencing problem text
  • The "cells" allow for easy recording and reference of intermediate results 
  • The "magic" functions like "%timeit" and "%prun" are useful for simple timing and profiling
  • Integrated packages/languages are easily accessible -- so far I've used SymPy and Cython

Project Euler Solutions for Questions 1 - 10

Thursday, January 9, 2014

Top Four

Evernote has a new programming challenge, which I found to be more interesting than the other three posted previously:
Given an unordered list of N elements, write a function to find the top four elements of the given list. Make sure your function runs in linear time O(N).
The solution is trivial for O(N log N), i.e. the optimal running time of comparison sort algorithms.  It's not possible to sort the entire list any faster in a deterministic way.  But because only the top four elements are required, we just need to identify the fourth largest element in O(N) time.

So, a selection algorithm.  The fourth largest element can be found in linear time by partitioning the input list recursively, using the median of medians to identify optimal pivot points.  I recall that this algorithm was taught in the early weeks of Algorithms, Part I -- and it's still the most surprising application of divide-and-conquer that I've seen so far.

Once the fourth largest element is identified, returning the top four elements follows immediately in linear time.

Tuesday, January 7, 2014

Beauty and Learning

I've just completed Peter Norvig's class on Udacity, the Design of Computer Programs.

The course surprised me more than any other I've started or completed so far, primarily due to the artfulness of Norvig's insights and solutions.  We often think of computer science (and science in general) as highly logical and structured, in contrast to the more creative, fluid nature of the arts.  But programming languages, like natural languages, can produce an infinite set of utterances, and some are bound to be better than others.

What is "better"?  The quality of computer programs can, of course, be measured by objective measures like correctness, running time, and space efficiency.  More often than not, programs that satisfy all three are optimized for simplicity (thought not always readability).  Simplicity is a good measure.  When people talk about quality, though, they also refer to related but distinct things like elegance and beauty. It's even in written into the Python language:
>>> import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
When I think of what I've learned in the class, it's hard to point to specific things like data structures or algorithms.  But the class has enhanced my appreciation for the depth of knowledge and skill that underlies good programming.  There are murmurings of dissatisfaction in the Udacity forums and in various other reviews, primarily around the perceived difficulty and ambiguity of some of the course materials.  Maybe the frustration is only natural; unlike most textbook material, Norvig's solutions are downright unexpected and beautiful, a result of what I can only image is tens of thousands of hours of experience and critical thinking.  Just the sort of learning he advocates in Teach Yourself Programming in Ten Years.