Sunday, June 21, 2020

What goes on in a computer in a millisecond?

In a 2001 Q&A session (around 1h30m) Knuth is asked about unsolved challenges in the field of compute science, à la Hilbert's problems. Knuth's responds by referring to a prompt he posed to the World Computer Congress in 1989, in a keynote titled "Theory and Practice, IV" (Arxiv has a version, though this version appears to have the slides as well):

"Make a thorough analysis of everything your computer does during one second of computation"

Does the prompt require adjustment for context, some thirty years later? Knuth noted the practical challenges associated with monitoring and interpreting the computer's activities, whose difficulty would appear to remain on the same order of magnitude today. On the other hand, Knuth associated 1 second with approximately 250,000 instructions. Given that CPU clock speeds have jumped from standard units in MHz to GHz (wikipedia notes clock speeds in 1989 were in the 16-35 MHz range, in line with Knuth's approximation), maybe today's equivalent of the question is:

"What goes on in a computer in a millisecond"?

In the same keynote, Knuth provides a set of guiding questions, including the correctness of the observed activities and their correspondence to theory -- that is, whether they reflect existing theory, and whether new theory would yield practical improvements.

Despite the daunting scope of the problem, the fundamental idea of mapping theory to some atomic observations of computing practice seems widely applicable and useful for students and practitioners alike.

Sunday, June 7, 2020

Compilers, Project Euler 100

It was sad to see Stanford retire Lagunita (apparently due to a decision to retire their Open edX implementation). Many of their classes appear to be slated for addition to edX, including the Compilers class -- at least according to an email sent to Lagunita participants. I seem to recall the content originally being available on Coursera. It certainly has the look, feel, and substances of the original batch of very high quality courses (like Tim Roughgarden's Algorithms courses, Andrew Ng's AI course, Dan Boneh's Cryptography Part I... we're still waiting on Part II after many years, apparently because he is trying to finish his book). There seems to be a great deal more of content on the MOOCs now, but also maybe some more noise.

In any case, I spent most of my study time in March speeding through the Compilers videos and finishing the first programming assignment on lexing (yacc / flex) to make sure all the mechanics work. Fortunately, there are offline graders that can be run in the class VM, so it will be possible to complete the other programming assignments at a more leisurely pace. Given that there isn't currently an active forum for unaffiliated students like me to ask questions, GitHub has been in an invaluable resource to reference other students' implementations and try to triangulate certain issues.

Speaking of using reference materials... seven years after starting Project Euler and learning about the Sieve of Eratosthenes, I've finally completed more than 100 problems. About 65 problems have been solved in Haskell over the past year (the balance at some point during prior years, in Python).


I had no qualms referencing the internet for many problems, since it's unlikely I would have independently stumbled upon many of the math theorems required to solve some problems efficiently (like the Sierpinski triangle and Lucas' Theorum behind one of the problems on Pascal's triangle). In that regard, I'm glad of folks who have ignored Proect Euler's rules of the road and posted their research online. At least for autodidactic purposes, it seems like more information is (almost?) always better. How to use and internalize the information, how to differentiate research from shortcuts, how to set boundaries of information taken vs internally synthesized -- those are all the responsibility of the student, with decisions ultimately depending on one's goals. My (recent) goals have been in rough descending order to learn functional programming paradigms, to learn Haskell, to enjoy problem solving, and last + least to learn some underlying math. As far as the first two go, continuing with more problems probably offers diminishing returns (especially compared to pursuing orthogonal problem spaces), but I'll likely continue with it on the slow path for the enjoyment. 

Sunday, February 16, 2020

Mini Regex


Hacker News linked to Brian Kenighan's post about a small regex matcher, which seemed like a good bite-sized problem to try implementing -- both in general and for practicing Haskell, particularly with this being a pattern matching problem.

Following Kernighan's specifications, the regex language is limited to:

    c    matches any literal character c
    .    matches any single character
    ^    matches the beginning of the input string
    $    matches the end of the input string
    *    matches zero or more occurrences of the previous character


To make the exercise a bit more stateful, I chose to implement find (return the left-most occurrence of a pattern that matches the regex, if it exists), instead of just indicating the presence of a match. There is minimum lookahead required -- just one character to detect boundaries for the Kleene star.

I first tried to implement find with a lazy map over the tails of the input (returning the head as the left-most match), but ran into the problem of inputs with no matches. Since I couldn't find an out-of-box function, mapUntil implements a map that stops when a non-default value is found, and otherwise returns the default value. This also has the benefit of using an empty string to indicate no match, which is more direct here than, say, using Maybe. Always wrapping and unwrapping the input with the start and end sentinels seemed like a reasonable price to avoid introducing special cases later on.


find :: Regex -> String -> Match
find r s = unwrap $ mapUntil "" (findWord r) (tails $ wrap s)
  where wrap xs = ('^':xs) ++ "$"
        unwrap xs = [c | c <- xs, c `notElem` ['^', '$']]

mapUntil :: Match -> (String -> Match -> Match) -> [String] -> Match
mapUntil d _ [] = d
mapUntil d f (x:xs) = if m /= d then m else mapUntil d f xs
  where m = f x ""


The core matching function covers four patterns: non-star regex, star regex, the regex is consumed (success), or otherwise failure.


findWord :: Regex -> String -> Match -> Match
findWord (x:'*':xs) s@(y:ys) ms = findWord xs ys' (reverse ms'++ ms)
  where (ms', ys') = if not (match x y) then ("", s) else splitWhile (matchStar x xs) s
findWord (x:xs) (y:ys) ms = if match x y then findWord xs ys (y:ms) else ""
findWord "" _ ms = reverse ms
findWord _ _ ms = ""

Match is very simple. MatchStar needs a bit of extra state, since '.*' matching should stop if the next part of the regex matches input.

match :: Char -> Char -> Bool
match x y = x == '.' || x == y

matchStar :: Char -> String -> Char -> Bool
matchStar x [] y = match x y
matchStar x (x':xs) y = (x == '.' && x' /= y) || x == y


SplitWhile seems like it should exist... it calls both takeWhile and dropWhile, so could be rewritten more efficiently to require only single pass over the input.


splitWhile :: (a -> Bool) -> [a] -> ([a], [a])
splitWhile f xs = (takeWhile f xs, dropWhile f xs)


There are a few reverses and concatenations, which weren't immediately obvious to me to refactor into more efficient patterns. Code with some hand-written test cases: https://gist.github.com/tkuriyama/f60210a0e6dfa5bc0872494afaa0030b

Monday, October 7, 2019

Learn You a Haskell, Again

After a few failed attempts in past years, I feel like I've finally collected enough momentum and functional thinking to overcome the initial hurdle in learning Haskell.

Brent Yorgey's 2013 UPenn class was recommended by a few sources, so I worked through that over the past two or three months. The class pulls together material from Learn You a Haskell (LYAH), Real World Haskell (RWH) and lectures notes into 12 weeks of materials. There are 11 weekly assignments that culminate in Functors, Applicatives, and Monads. At this point, I think I need significantly more practice in the last topics... Overall, the reinforcement provided by two independent texts in addition to the course materials was quite helpful in triangulating new topics, both conceptually and through worked examples.

Some good next steps seem to be continue working through some later chapters in LYAH and RWH (with a particular interest in monadic parsing), as well as the next course suggested by https://github.com/bitemyapp/learnhaskell.



Sunday, August 26, 2018

Cryptopals Set 6 -- Bleichenbacher


After another hiatus, I finally managed to wrap up Set 6 with Bleichenbacher's PKCS 1.5 padding oracle attack (original paper from '98 http://archiv.infsec.ethz.ch/education/fs08/secsem/bleichenbacher98.pdf).

https://github.com/tkuriyama/cryptopals/tree/master/set6

All of the padding-based attacks are quite interesting. From first principles, I guess it is difficult to design padding that is cryptographically sound, since by definition padding needs to be deterministic and reasonably easy to parse.

One thing that escapes me is the use of ceiling division for a number of equations in Bleichenbacher's paper -- it doesn't seem to be specified, yet all the implementations I referenced seemed to apply it consistently.

I will probably take a break from the Cryptopals for now (i.e. defer Sets 7 and 8) and tackle some other projects. I feel reasonably comfortable with F# now, but still want to learn Haskell, and meanwhile Python is fully transitioning to Python 3 and Guido has retired as BDFL...

The MOOC landscape seems to have advanced in the past year or two as well, so it will be interesting to take up a few more courses again.

Tuesday, March 20, 2018

Cryptopals Set 6 -- F#


I've worked through Set 6 up to the last two problems involving Bleichenbacher's PKCS 1.5 Padding Oracle attack. With a new job just started, it will likely take a while longer to complete those, so this will be a partial post in the meantime.

Completing the first six problems of the set required me to revisit two shortcuts that I took thus far: a reasonably fast (probabilistic) prime gen algorithm, and a cube root (or n-root) algorithm for BigIntegers. In both cases the implementations are borrowed, but it was a good exercise to think through them.

As a slight non sequitur: working with BigIntegers in F# has given me a deep appreciation for just how convenient it is to work with integers in a language like Python. Thanks to PEP 237, integers are treated for the most part uniformly, so the user never needs to think about size. But of course, big integers that exceed the CPU word size do require implementation. Which is obvious from first principles, but not something that occurred to me in a meaningful way until I had to write a re-implement a number of functions to work for BigIntegers. Convenience and learning are uneasy friends.

Overall in the first six problems, the exercise of forging a 1024-bit RSA signature (when e=3) was both the most interesting and the most challenging.

https://github.com/tkuriyama/cryptopals/tree/master/set6

Thursday, November 23, 2017

Cryptopals Set 5 - F#


Set 5 was advertised as difficult, but I found the implementations to be quicker that previous sets as they involve more number theory and less programming.

https://github.com/tkuriyama/cryptopals/tree/master/set5

I took a few implementation shortcuts (each of them a good exercise to do properly, but I'd rather forge ahead and get through the cryptographic concepts at this point):
  • Client-server interactions are simulated (still haven't sorted out F# package manager yet)
  • Prime numbers are drawn from known list rather than generated 
  • Cube root solving is ignored (p40 just tests the result against the cube of the original message, rather than testing the cube root of the result against the original message)
I spent some time trying to implement Newton's method of approximating roots, which worked fine for square roots but in my implementation almost always did not converge for cube roots (once a guess became negative, it stayed negative... how to constrain to positive numbers while improving the guess in each iteration?). Something else to investigate in the future.