Tuesday, July 28, 2015

DNA Sequencing

A quick post before July expires...

I've been following Algorithms for DNA Sequencing on Coursera this month. As one might imagine, it's mainly about algorithms for string / sequence matching and alignment. The "sequence assembly" problem is something I hadn't come across before and is quite intriguing, especially in how it relates to the history of the Human Genome Project.

The course surprised me a little, mostly because it made me realize how good almost every other Coursera course I've taken has been. This one is about what you'd expect from a resource that's available for free: the instructor is clearly knowledgeable and interested in teaching, but the details are often vague and the explanations of the algorithms are not very rigorous. It's still excellent as freeware, but I'm not sure I'd be happy if I'd paid Coursera to take the course as a "Signature Track" student.

As an aside, I found myself pulling my hair out over a Python performance issue, and a little bit of line-profiling magic in IPython saved the day.

http://pynash.org/2013/03/06/timing-and-profiling.html

It turns out that writing "key in dict.keys()" to check for key presence in a dictionary is silly: in Python 2, keys() creates a temporary list, which then has to be scanned item by item. Plain "key in dict" is obviously the way to go, as it does what you would expect: the hash lookup.
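
For the curious, here is a rough sketch of the difference, not the code from my assignment. The function names and the kmer_counts dictionary are made up for illustration. I'm also assuming Python 3 here, where keys() returns a view rather than a list, so the sketch wraps it in list() to imitate the Python 2 behaviour that bit me.

```python
import timeit


def has_key_via_list(d, key):
    # Imitates Python 2's `key in d.keys()`: build a temporary list of all
    # keys, then scan it linearly for a match.
    return key in list(d.keys())


def has_key_via_hash(d, key):
    # `key in d` hashes the key and probes the table directly;
    # no temporary list is created.
    return key in d


if __name__ == "__main__":
    # Hypothetical dictionary standing in for whatever the real code indexed.
    kmer_counts = {i: 0 for i in range(100000)}
    missing = -1  # worst case for the list scan: the key is absent

    slow = timeit.timeit(lambda: has_key_via_list(kmer_counts, missing), number=100)
    fast = timeit.timeit(lambda: has_key_via_hash(kmer_counts, missing), number=100)
    print("list scan:   %.4f s" % slow)
    print("hash lookup: %.4f s" % fast)
```

Both functions return the same answer; only the cost differs, and the gap should widen as the dictionary grows. Exact timings will depend on the machine, so I haven't quoted any.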