## Sunday, September 20, 2015

### F# and Functional Pearls

After encountering the language last year, I've finally gotten around to studying F# from Expert F#. So far, the language has proven much friendlier than Haskell (though it's not really a fair comparison since I'm now more aware of functional constructs).

As practice, I've started implementing some Project Euler problems in F#. I've also been working on porting some Haskell "functional pearls" from Richard Bird's excellent book Pearls of Functional Algorithm Design. The going is slow but rewarding, as it's turning out to be a great way to practice F# while also learning more about functional programming in general.

One chapter that I've worked on is the "countdown" problem, in which the goal is to use some of the numbers from a given list to build an arithmetic expression that evaluates to a value as close to the target as possible. For example, given the list [1, 3, 7, 10, 25, 50] and a target of 831, the closest expression found is:

```
> display(countdown1 831 [1;3;7;10;25;50]);;

(7+((1+10)*(25+50))) = 832
```
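The original is in F#, but the shape of the search is easy to sketch in Python: recursively combine pairs of values with the four operations (keeping subtraction positive and division exact) and track the expression closest to the target. This is a hypothetical brute-force version for illustration, not the book's algorithm:

```python
def countdown(numbers, target):
    """Brute-force countdown: find the expression closest to target."""
    best = {"dist": float("inf"), "val": None, "expr": None}

    def go(vals):  # vals: list of (value, expression-string) pairs
        for v, e in vals:
            if abs(v - target) < best["dist"]:
                best.update(dist=abs(v - target), val=v, expr=e)
        for i in range(len(vals)):
            for j in range(len(vals)):
                if i == j:
                    continue
                (a, ea), (b, eb) = vals[i], vals[j]
                rest = [vals[k] for k in range(len(vals)) if k not in (i, j)]
                candidates = []
                if i < j:  # + and * are commutative; consider each pair once
                    candidates += [(a + b, f"({ea}+{eb})"),
                                   (a * b, f"({ea}*{eb})")]
                if a > b:  # keep subtraction positive
                    candidates.append((a - b, f"({ea}-{eb})"))
                if b > 1 and a % b == 0:  # only exact division
                    candidates.append((a // b, f"({ea}/{eb})"))
                for c in candidates:
                    go(rest + [c])

    go([(n, str(n)) for n in numbers])
    return best["val"], best["expr"]
```

The search is exponential in the number of inputs, so this is only practical for small lists; on [1, 3, 7, 10] with a target of 31, for instance, it finds an exact expression such as ((3*7)+10).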

## Friday, August 28, 2015

### Longest Subpalindrome in Linear Time

I've finally rewritten the linear time algorithm for finding the longest palindromic substring ("subpalindrome") within a given string. I originally came across this problem in Peter Norvig's class on Udacity and have worked on it quite a bit since then.

I'm not altogether satisfied with the clarity of the code -- it is quite imperative and stateful, with many indexing variables -- but I don't see a way to meaningfully refactor without changing the fundamental structure, so I'll leave that for another day.
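For reference, the core of the linear-time approach (Manacher's algorithm) can be sketched compactly in Python; this is a generic textbook version, not the code in the linked repository:

```python
def longest_subpalindrome(s):
    """Manacher's algorithm: longest palindromic substring in O(n)."""
    t = "|" + "|".join(s) + "|"   # interleave separators to unify odd/even cases
    p = [0] * len(t)              # p[i]: palindrome radius in t centered at i
    center = right = 0            # rightmost palindrome boundary seen so far
    for i in range(len(t)):
        if i < right:             # reuse the mirrored center's radius
            p[i] = min(right - i, p[2 * center - i])
        while (i - p[i] - 1 >= 0 and i + p[i] + 1 < len(t)
               and t[i - p[i] - 1] == t[i + p[i] + 1]):
            p[i] += 1             # expand around center i
        if i + p[i] > right:
            center, right = i, i + p[i]
    k = max(range(len(t)), key=lambda i: p[i])
    start = (k - p[k]) // 2       # map back to an index in s
    return s[start:start + p[k]]
```

The mirroring step is what keeps the total expansion work linear, though it admittedly doesn't make the indexing any less stateful.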

Algorithm Description and Visualization
http://tarokuriyama.com/projects/palindrome2.php

Code
https://github.com/tkuriyama/palindrome

## Saturday, August 15, 2015

### Blacksmith Puzzle Solved

The blacksmith puzzle from a previous blog post is now solved!

I experimented with a number of heuristics, but I wasn't able to come up with a very good way to rank next moves.

In the end, the combinatorial solution works fine -- it's just that the solution space was unnecessarily polluted by permutations of the same visited states. Modifying the hash of visited states to use sorting ensures uniqueness, and the solution now runs in a few seconds. Interestingly, the magic number of 13 chain links appears to be necessary, as the solver didn't find a solution when fed 12 chain links as a starting state.
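The fix can be illustrated with a small sketch (the state layout here is hypothetical, not the gist's): hashing a sorted, immutable view of each state means any permutation of the same multiset of links maps to a single visited entry.

```python
def canonical(state):
    """Sort each component so permutations of the same state compare equal."""
    return tuple(tuple(sorted(part)) for part in state)

# hypothetical state: (links on the ground, left pan, right pan)
s1 = ([10, 30, 20], [10], [])
s2 = ([20, 10, 30], [10], [])   # same state, different ordering

visited = set()
visited.add(canonical(s1))
assert canonical(s2) in visited  # now recognized as already seen
```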

https://gist.github.com/tkuriyama/8de2711675350645349d

## Tuesday, August 11, 2015
### A Train Compartment Puzzle

This was a fun and quick puzzle as the technique from Peter Norvig's solution to the Zebra Puzzle can be reused directly. It's very nice to be able to formulate the conditions in terms that are clear and almost like plain English.

https://gist.github.com/tkuriyama/5a0f5d886a3578a72d58

Six passengers sharing a train compartment are from Moscow, Leningrad, Tula, Kiev, Kharkov, and Odessa.
1. A and the man from Moscow are physicians.
2. E and the Leningrader are teachers.
3. The man from Tula and C are engineers.
4. B and F are WWII veterans, but the man from Tula has never served in the armed forces.
5. The man from Kharkov is older than A.
6. The man from Odessa is older than C.
7. At Kiev, B and the man from Moscow got off.
8. At Vinnitsa, C and the man from Kharkov got off.
Match initials, professions, and cities. Also, are the facts both sufficient and necessary to solve the problem?
from Boris A. Kordemsky, The Moscow Puzzles: 359 Mathematical Recreations, p. 111.
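Encoded in the same generate-and-test style as the Zebra solution, a sketch might look like the following: each numbered fact becomes a pairwise exclusion (reading each age and got-off fact simply as "these are two different people"), plus the profession constraints implied by facts 1-3. The names here are mine, not the gist's:

```python
from itertools import permutations

def solve():
    cities = ['Moscow', 'Leningrad', 'Tula', 'Kiev', 'Kharkov', 'Odessa']
    prof = {'A': 'physician', 'C': 'engineer', 'E': 'teacher'}   # facts 1-3
    city_prof = {'Moscow': 'physician', 'Leningrad': 'teacher',
                 'Tula': 'engineer'}                             # facts 1-3
    solutions = []
    for perm in permutations(cities):
        c = dict(zip('ABCDEF', perm))
        if (c['A'] != 'Moscow'                             # 1: two physicians
                and c['E'] != 'Leningrad'                  # 2: two teachers
                and c['C'] != 'Tula'                       # 3: two engineers
                and c['B'] != 'Tula' and c['F'] != 'Tula'  # 4: veterans
                and c['A'] != 'Kharkov'                    # 5: older than A
                and c['C'] != 'Odessa'                     # 6: older than C
                and c['B'] != 'Moscow'                     # 7: got off together
                and c['C'] != 'Kharkov'                    # 8: got off together
                # a known profession must match the one implied by the city
                and all(city_prof.get(c[p], prof[p]) == prof[p] for p in prof)):
            solutions.append(c)
    return solutions
```

Filtering the 720 permutations leaves exactly one assignment, which also answers half of the sufficiency question: the facts are sufficient.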

## Sunday, August 9, 2015

### Blacksmith Khecho's Ingenuity

Another puzzle from Dover's Moscow Puzzles. Building on structures and ideas similar to those in The Whims of Three Girls, it was much easier to conceive and develop a combinatorial solution for this one.

However... the solution does not terminate!

Presumably this is due to the combinatorial search space being much larger -- even with a small sub-problem with two people weighing 20 and 30 pounds and six of the 10-pound chains, the solver finds 226 solutions.
```
Solution # 1 of 226

[20, 30, 10, 10, 10, 10, 10, 10] | * [] --- []  | []

[10, 10, 10, 10, 10, 20, 30] |  [10] --- [] * | []
[10, 10, 10, 10, 20, 30] |  [10] --- [10] * | []
[10, 10, 10, 20, 30] | * [10] --- [10, 10]  | []
[10, 10, 10, 30] |  [10, 20] --- [10, 10] * | []
[10, 10, 10, 30] | * [10] --- [10, 10]  | [20]
[10, 10, 10, 10, 30] | * [] --- [10, 10]  | [20]
[10, 10, 10, 10] |  [30] --- [10, 10] * | [20]
[10, 10, 10, 10] | * [] --- [10, 10]  | [20, 30]
```

When I use the problem inputs, the solution candidate stack rises steadily to about 1,000 candidates, falls to 700 or so over a minute or two, and then begins to rise steadily again.

The optimization is interesting to think about because it is not obvious how to evaluate the merits of different states. The only trivial optimization I've made is to permit people to jump off the balance and crash it if the move completes the puzzle (a case not explicitly addressed in the problem statement, but a plausible interpretation).

I will come back to this later as I haven't made much progress in the past day.

## Sunday, August 2, 2015

### The Whims of Three Girls

3 girls, each with her father, go for a stroll. They come to a small river. One boat, able to carry 2 persons at a time, is at their disposal. Crossing would be simple for everyone except the girls. None is willing to be in the boat or ashore with 1 or 2 strange fathers, unless her own father is present too. How do they all get across?

I came across the above in a book of puzzles. It's an interesting exercise in modeling a game space and generating correct state progressions. I started working on a Haskell version with a friend but the full script is still well beyond my reach. Meanwhile, my Python version works reasonably well and BFS returns what appears to be a minimum-length solution in approximately 75 ms on my 2014 MacBook Pro.  This served as yet another lesson in the importance of modularity and testing to the end of writing code that's clear and correct. More practice required...
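The BFS core can be sketched along these lines; this is a hypothetical reconstruction of the idea, not the repository code. Each state is the set of people on the left bank plus the boat's side, and a move is only legal if the boat load and both resulting banks satisfy the girls' condition:

```python
from collections import deque
from itertools import combinations

PEOPLE = ('G1', 'G2', 'G3', 'F1', 'F2', 'F3')  # girls and their fathers

def safe(group):
    """No girl may be with a strange father unless her own father is present."""
    for i in '123':
        if 'G' + i in group:
            strangers = [p for p in group if p[0] == 'F' and p != 'F' + i]
            if strangers and 'F' + i not in group:
                return False
    return True

def solve():
    """BFS over (left-bank set, boat side) states; returns a list of crossings."""
    start = (frozenset(PEOPLE), 'L')
    seen = {start}
    queue = deque([(start, [])])
    while queue:
        (left, boat), path = queue.popleft()
        if not left and boat == 'R':
            return path                      # everyone is across
        here = left if boat == 'L' else frozenset(PEOPLE) - left
        for k in (1, 2):                     # the boat carries one or two
            for movers in combinations(sorted(here), k):
                m = frozenset(movers)
                new_left = left - m if boat == 'L' else left | m
                new_right = frozenset(PEOPLE) - new_left
                if not (safe(m) and safe(new_left) and safe(new_right)):
                    continue
                state = (new_left, 'R' if boat == 'L' else 'L')
                if state not in seen:
                    seen.add(state)
                    queue.append((state, path + [movers]))
    return None
```

Since the state graph is unweighted, BFS guarantees the first complete path found is minimal; for three girls and three fathers it comes to 11 crossings.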

https://github.com/tkuriyama/fording-puzzle

## Tuesday, July 28, 2015

### DNA Sequencing

A quick post before July expires...

I've been following Algorithms for DNA Sequencing on Coursera this month. As one might imagine, it's mainly about algorithms related to string / sequence matching and alignment. The "sequence assembly" problem is something I haven't come across before and is quite intriguing -- especially how it relates to the history of the human genome project.

I'm a little surprised by the course because it makes me realize, in comparison, how good almost every other Coursera course I've taken has been. This course is about what you'd expect from a resource that's available for free: the instructor is clearly knowledgeable and interested in teaching, but the details are often vague and the explanations of the algorithms are not very rigorous. It's still excellent as freeware, but I'm not sure I'd be happy if I'd paid Coursera to take the course as a "Signature Track" student.

As an aside, I found myself pulling my hair out over a Python performance issue, and a little bit of line-profiling magic in IPython saved the day.

http://pynash.org/2013/03/06/timing-and-profiling.html

It turns out that calling `in dict.keys()` to check for key presence in a dictionary is silly, as it creates a temporary list; `in dict` is obviously the way to go, as it does what you would expect -- the hash lookup.
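A quick illustration (the Python 2 behavior is described in the comments; in Python 3, keys() returns a lightweight view, so the list-copy trap no longer applies):

```python
d = {i: None for i in range(10000)}

# `key in d` hashes the key and probes the table: O(1) on average.
assert 9999 in d and 10000 not in d

# In Python 2, `key in d.keys()` first built a fresh list of all the keys,
# then scanned it linearly -- an O(n) detour for the same answer.
# (In Python 3, d.keys() is a view and membership is O(1) again,
# but `key in d` remains the idiomatic form.)
assert 9999 in d.keys()
```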

## Wednesday, June 10, 2015

### Mnemosyne

I've been doing some old Google Code Jam problems for fun. One paradigm that comes up repeatedly as a useful tool is Dynamic Programming (DP). In the problem "Welcome to Code Jam", for example, it is fairly obvious that the complexity of naive approaches will increase dramatically with large datasets. A DP algorithm, on the other hand, solves the problem handily in O(mn) time, where m is the number of characters in the problem text and n is the number of characters in "welcome to code jam".
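The problem asks how many times the phrase appears as a subsequence of a line of text, modulo 10,000. The O(mn) recurrence can be sketched as follows (a generic version of the standard technique, not my original submission):

```python
def count_subsequences(text, pattern, mod=10_000):
    """Count occurrences of pattern as a subsequence of text, mod `mod`.

    dp[j] = number of ways to form pattern[:j] as a subsequence of the
    text scanned so far; the answer is dp[len(pattern)].
    """
    dp = [1] + [0] * len(pattern)
    for ch in text:
        # iterate j descending so each text character is used at most once
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == ch:
                dp[j] = (dp[j] + dp[j - 1]) % mod
    return dp[len(pattern)]
```

For example, count_subsequences("abab", "ab") is 3, counting the index pairs (1,2), (1,4), and (3,4).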

So what does the difference in running time actually look like, between a naive and DP algorithm? In this case, one can conceive of visualizing the number of array accesses required for each logical step of the respective algorithms. As a start, I wrote a Python module to help with logging array accesses (in a slightly more generalized manner, with support for tracking basic interactions across a number of types using Abstract Base Classes):

https://github.com/tkuriyama/mnemosyne

## Saturday, May 30, 2015

### Nand to Tetris

After another hiatus, I've finished the six week Nand to Tetris (Part I) course on Coursera.

The premise is to design and build a von Neumann computer from first principles. This has been one of the most interesting courses I've taken so far as it demystifies to a large extent the fundamental workings of a computer and the hardware-software gap. Since it is not an electrical engineering course (and besides, working with hardware would be pretty impractical for Coursera), everything is done in a Java hardware simulator that the instructors wrote.

The only primitives given in Part I are Nand and Data Flip-Flop; pretty much everything else in the "HACK" computer is built from scratch. Part I ends with writing a cross assembler for the HACK machine language in a language of one's choice (no bootstrapping required). Since the machine language is fairly simple, I wrote the assembler in a few hours in Python. It would be a good learning experience to try and rewrite it in Haskell -- maybe using Parsec.
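To give a flavor of how simple the translation is, here's a toy slice of the idea in Python -- just symbol-free instructions, with only a handful of table entries (the real assembler needs the full comp/dest/jump tables from the HACK spec, plus label and symbol handling):

```python
# Partial lookup tables from the HACK spec (first comp bit selects A vs M)
COMP = {'0': '0101010', '1': '0111111', 'D': '0001100', 'A': '0110000',
        'M': '1110000', 'D+A': '0000010', 'D+M': '1000010', 'D-A': '0010011'}
DEST = {'': '000', 'M': '001', 'D': '010', 'MD': '011', 'A': '100'}
JUMP = {'': '000', 'JGT': '001', 'JEQ': '010', 'JLT': '100', 'JMP': '111'}

def assemble_line(line):
    """Translate one symbol-free HACK instruction into a 16-bit word."""
    line = line.strip()
    if line.startswith('@'):                    # A-instruction: @value
        return format(int(line[1:]), '016b')
    dest, rest = line.split('=') if '=' in line else ('', line)
    comp, jump = rest.split(';') if ';' in rest else (rest, '')
    return '111' + COMP[comp] + DEST[dest] + JUMP[jump]
```

Fed the course's Add program (@2, D=A, @3, D=D+A, @0, M=D), this produces the expected words -- D=A, for instance, becomes 1110110000010000.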

Presumably, Part II is all about building Tetris for the HACK computer, which should be an interesting challenge.

## Sunday, February 15, 2015

### February Update

I've been fairly busy with work, so I haven't been able to follow the schedule for the Sedgewick Algorithms Part 1 course as much as I'd like. I've been surprised by how much I've enjoyed it so far -- this week's assignment involves implementing a deque, which I've always taken for granted in Python (in collections.deque). I initially tried a dynamically resizing array, but the assignment's requirement of constant-time deque operations looks like a job for a doubly linked list.
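The course assignment is in Java, but the shape of the doubly-linked-list approach is easy to sketch in Python: sentinel nodes at both ends mean every operation is a constant number of pointer updates, with no special cases for an empty deque.

```python
class _Node:
    __slots__ = ('item', 'prev', 'next')
    def __init__(self, item=None):
        self.item, self.prev, self.next = item, None, None

class Deque:
    """Double-ended queue with O(1) operations at both ends."""
    def __init__(self):
        self._head, self._tail = _Node(), _Node()   # sentinels
        self._head.next, self._tail.prev = self._tail, self._head
        self._size = 0

    def add_first(self, item):
        self._insert(item, self._head, self._head.next)

    def add_last(self, item):
        self._insert(item, self._tail.prev, self._tail)

    def remove_first(self):
        return self._remove(self._head.next)

    def remove_last(self):
        return self._remove(self._tail.prev)

    def _insert(self, item, before, after):
        node = _Node(item)
        node.prev, node.next = before, after
        before.next = after.prev = node
        self._size += 1

    def _remove(self, node):
        if self._size == 0:
            raise IndexError('remove from empty deque')
        node.prev.next, node.next.prev = node.next, node.prev
        self._size -= 1
        return node.item

    def __len__(self):
        return self._size
```

The resizing-array version amortizes to O(1) but can't promise it per operation; the linked version trades memory locality for that worst-case guarantee.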

Something that I read about in Niemeyer and Knudsen's Learning Java before--but didn't understand at the time--is the significance of generics for parametric polymorphism. I assume this is something that one would learn in an introductory programming language theory class, but in my case I've had to arrive at it by being forced to think a great deal about types in learning Haskell.

(Speaking of types -- I attended the Compose Conference in NYC recently and was quite struck by Don Syme's talk on F#. Another language to add to the learning list...)

In other news, while working on a data visualization project (with the new p5.js, which originates from processing.js), I struggled for a day or two with JavaScript before arriving at prototypes and bind()... something that I'll need to revisit more fundamentally in the near future.

## Monday, January 26, 2015

### Sedgewick Algorithms Course

Working through the first twenty or so lessons from Zed Shaw's Learn C the Hard Way has been interesting but slow going -- a completely open-ended autodidactic mode is pretty difficult to maintain.

To review some algorithms, I've joined Sedgewick's recurring Algorithms Part 1 on Coursera. I gather that this class is more focused on implementation details and the specifics of programming in Java than Tim Roughgarden's excellent algorithms classes on Coursera, which use CLRS and tend to focus more on reasoning about correctness and complexity. The contrast should be quite interesting. I haven't written in Java since working with Processing, so this should serve as a good language refresher as well.

## Monday, January 19, 2015

### The "Unfaithful" Sieve of Eratosthenes

I recently came across this bit of Python code on Stack Overflow, in a post that asked for an elucidation of the lambda predicates involved:

```
primes = range(2, 20)
for i in range(2, 8):
    primes = filter(lambda x: x == i or x % i, primes)
```

This is basically functional-esque code for something that resembles a naive implementation of the Sieve of Eratosthenes. Given an input list of integers to start, the algorithm iterates through a second list of integers and uses the lambda predicates to filter the input list such that only primes remain. (The upper bound of 8 for the second list appears arbitrary; it's also a suboptimal upper bound given the input upper bound of 19, but we can ignore that here. Note too that the code is Python 2: in Python 3, filter is lazy, so the loop variable would need to be bound eagerly.)

As far as a naive implementation goes, it's arguably more elegant in a pure functional form like the following (in Haskell):

```
sieve          :: [Integer] -> [Integer]
sieve []        = []
sieve (p:xs)    = p : sieve [x | x <- xs, x `mod` p > 0]
```

The Haskell version can be called like so:

`sieve [2..19]`

As it turns out, though, this is a sieve by trial division rather than the true Sieve of Eratosthenes. Although the underlying logic is similar, the implementation of the algorithm is sufficiently different that the "unfaithful" sieve suffers from noticeably worse time complexity. A quick Google search immediately brings up literature on the subject: The Genuine Sieve of Eratosthenes.
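For contrast, the genuine sieve never divides at all: it crosses off multiples of each prime by stepping through an array directly. A standard imperative version in Python:

```python
def eratosthenes(n):
    """The genuine sieve: cross off multiples of each prime up to sqrt(n)."""
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # start at p*p: smaller multiples were crossed off by smaller primes
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
    return [i for i in range(2, n + 1) if is_prime[i]]
```

eratosthenes(19) returns the same primes as sieve [2..19] above, but each composite is touched only once per prime factor, which is where the familiar O(n log log n) bound comes from; the trial-division version instead tests every surviving candidate against every smaller prime.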