Monday, November 25, 2013

Thanksgiving with D3

I've been studying bits and pieces of D3.js for the past few weeks.  Although I've struggled with data binding, which is at the heart of the framework's data-driven design, the library comes with a large number of interesting features, like support for Voronoi diagrams and dozens of geographical projections.

The creator of the library, Mike Bostock, has published a mind-boggling number of examples, and there's also a text by Scott Murray that O'Reilly makes available for free online.

Since it's almost Thanksgiving, I thought I'd try making some choropleths about turkey production and consumption in the US.  As it turns out, this is fairly straightforward using D3: download some free SVGs of blank maps from Wikimedia Commons; find the relevant US Census and USDA data to create CSV files; and link up the two with a few lines of code.
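The "link up the two" step boils down to reading a CSV and assigning each region a color bucket, similar in spirit to a quantize scale in D3.  Here is a minimal Python sketch of that idea, not the code behind the maps in this post.  The column names, region codes, values, and palette are all placeholders I made up for illustration.

```python
import csv
import io

# Hypothetical CSV contents: region codes and values are placeholders, not real data.
SAMPLE_CSV = """region,value
A,10
B,55
C,100
"""

# A light-to-dark palette; the hex values are arbitrary choices.
PALETTE = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]


def quantize(value, lo, hi, palette):
    """Map a value in [lo, hi] onto one of len(palette) equal-width bins."""
    if hi == lo:
        return palette[0]
    fraction = (value - lo) / (hi - lo)
    # The maximum value would land one past the last bin, so clamp it.
    index = min(int(fraction * len(palette)), len(palette) - 1)
    return palette[index]


def color_by_region(csv_text, palette=PALETTE):
    """Return a dict of region code -> fill color for a choropleth."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(row["value"]) for row in rows]
    lo, hi = min(values), max(values)
    return {
        row["region"]: quantize(float(row["value"]), lo, hi, palette)
        for row in rows
    }


colors = color_by_region(SAMPLE_CSV)
```

Each color would then be applied as the fill of the SVG shape whose id matches the region code.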

  • Top = turkey production by state in 2010[1].  Minnesota and North Carolina produce by far the most turkeys, with a combined total of over 2 billion pounds.
  • Bottom = per capita meat consumption by county in 2011[2].  The relative uniformity of the data isn't surprising, but I wouldn't have guessed the pockets of above-average consumption in the Northeast and California (perhaps a positive correlation with income?).  

[2] USDA, Food Environment Atlas Data, 2011.  This was the closest proxy I could find to turkey consumption.  No data available for Alaska and Hawaii.

Sunday, November 24, 2013

UI Choices: Coursera vs Udacity

Two roads diverged in a yellow wood
And sorry I could not travel both... 

Now that I've spent a considerable amount of time on both Coursera and Udacity, the difference in UI design really stands out to me.

For an imperfect analogy: Coursera is like Windows, whereas Udacity is like OS X.

On Coursera, a course website comprises many sections, full of information like schedules and logistics.  Course videos can be viewed online, but also downloaded easily, often with accompanying written and typed notes provided in a convenient format.  The Coursera experience feels like a natural extension of a brick-and-mortar classroom experience; indeed, some universities offer rudimentary sites of the same nature.

Udacity, on the other hand, provides a single, streamlined interface, integrating course videos and quizzes.  Lecture notes are provided in the form of a Wiki, auto-generated from subtitles and improved by the community at large.  Udacity provides some simple FAQs, but really there's little to understand in terms of course logistics.  You simply complete what you want, or can, at your own pace; there's not much emphasis on grades.

Perhaps the largest difference I've noticed so far is the assignment submission process.  On Udacity, everything I've done is via the rudimentary Python IDE provided inline: complete the code and press "Test Run" or "Submit" for evaluation.  This is very simple and easy to use.

Simplicity, of course, can also be restrictive.  On Coursera, an Algorithms assignment might just expect some numerical values, so the student can write in any language and submit results via the web GUI.  For classes that require additional software, like Octave/MATLAB for Machine Learning, scripts are provided so that assignments may be submitted directly from the software environment in question (e.g. just call "submit" in Octave).

Although I have a slight preference for the Coursera format (mainly for the ability to review material easily), maybe it's not really about what's better or worse.  What we see is the development of a nascent ecosystem, in which initial design decisions can reflect--and maybe also reinforce--something like a genomic difference between types of MOOCs.  It's sort of surprising to think about how design, which is ostensibly cosmetic, reflects a deeper identity. But I guess this is a question of the genre "turtles all the way down"...

Sunday, November 17, 2013

Udacity Enrollment

Udacity has announced that starting in January 2014, some courses will be offered with the option of proper "enrollment".  This is essentially an option to purchase personalized feedback and guidance for a monthly fee.  Enrolled students will also receive a "verified certificate" upon course completion.

It's fantastic that the model remains a freemium one (i.e. all the same course materials will still be available for free).  As long as Udacity remains committed to providing free courseware, it's in the interest of everyone using the platform to root for their success.

What sort of assumptions can we make about revenue potential?

Wikipedia reports 700,000 users on Udacity.  Let's say that 50% of them are periodically active -- maybe they browse one or two courses a year.  Of the 350,000 active users, let's say 1% are willing to pay for enrollment in any given month.

So that makes 3,500 students.  The current enrollment cost is $150 for most classes (ignoring the early adopters' discount), so that translates into revenues of roughly $500,000 a month, or $6 million a year.
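For anyone who wants to play with the assumptions, here is the same back-of-the-envelope estimate as a small Python sketch.  The function and parameter names are my own, and the 50% and 1% shares are the guesses from above, not measured figures.

```python
def estimate_revenue(total_users, active_percent, paying_percent, monthly_fee):
    """Return (paying users, monthly revenue, yearly revenue).

    Percentages are whole numbers so the arithmetic stays in integers.
    """
    active_users = total_users * active_percent // 100
    paying_users = active_users * paying_percent // 100
    monthly = paying_users * monthly_fee
    return paying_users, monthly, monthly * 12


# 700,000 users, 50% active, 1% of those paying, $150 per month.
paying, monthly, yearly = estimate_revenue(700_000, 50, 1, 150)
print(paying, monthly, yearly)  # 3500 525000 6300000
```

The exact figures are $525,000 a month and $6.3 million a year, so the numbers above are rounded down slightly.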

Does that seem reasonable?

It's pretty hard to say without the ability to peek at some user data.  The one thing that strikes me is that 3,500 students is a pretty large number to support for personalized coaching.  The monthly fee model is brilliant, since some users will sign up without taking full advantage of the services offered.  Coupled with user demand that's naturally distributed over time, this means that coaching capacity can be "overbooked" to a certain extent.  Still, it seems like a challenge to find several dozen (if not more) coaches while maintaining high standards of education.

I'm curious to see what happens, and I hope it works!  I may try enrolling in a course for the experience.  It would be interesting to have discussion sections, too, on something like Google Hangouts.

Edit: I see that Peter Norvig's Design of Computer Programs did something like the above -- office hours on Google Hangouts for week 5 of the course.  Very cool!

Saturday, November 16, 2013

Retention Rate

I wonder about the retention rate of students for fixed-term classes, i.e. the breakdown of the number of students who sign up, who actually start the course, and who complete the course.  The behavior of students in this matter obviously impacts the sustainability and revenue model of MOOC offerings (though I hope they can remain free to most students!).

The one class I haven't been able to complete so far has been Martin Odersky's Functional Programming Principles in Scala on Coursera.  There were a few reasons, like the fact that I started a few weeks late, but it ultimately boils down to the fact that functional programming isn't at the top of my current priority queue.

Part of the beauty of free courses, of course, is that the only cost of walking away is the opportunity cost.  Free to come and go -- Udacity seems to embrace this principle the most.

Visualizing Algorithms

Bret Victor suggests in his fascinating talk that the current programming paradigm involves "playing computer", insofar as we must envision how a computer will interpret code as we build up more and more complex ideas.  It's clear that most human minds cannot immediately internalize complex ideas -- this is why teaching is about breaking down subjects into digestible modules, often with visual aids.

Although the materials in a class like Tim Roughgarden's Algorithms: Design and Analysis are extremely well presented, I can't help but think that for topics like graph search and dynamic programming, visualizations could be an additional, effective tool for helping students learn.

I tried my hand at creating one such visualization -- the algorithm is for finding the longest palindromic substrings within a given string, e.g. "racecar" within "red hot racecar".  Here's the link.
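For reference, one common way to solve this problem is to expand outward from every possible center of a palindrome.  The Python sketch below illustrates that approach; it is not necessarily the algorithm used in the visualization, and the function name is my own.

```python
def longest_palindromic_substrings(text):
    """Return every distinct longest palindromic substring, in order of appearance."""
    best_len = 0
    best = []
    n = len(text)
    # There are 2n - 1 centers: n on characters (odd length)
    # and n - 1 between characters (even length).
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        # Grow the window while it stays in bounds and remains a palindrome.
        while left >= 0 and right < n and text[left] == text[right]:
            left -= 1
            right += 1
        # The loop overshoots by one on each side.
        candidate = text[left + 1:right]
        if len(candidate) > best_len:
            best_len = len(candidate)
            best = [candidate]
        elif candidate and len(candidate) == best_len and candidate not in best:
            best.append(candidate)
    return best


print(longest_palindromic_substrings("red hot racecar"))  # ['racecar']
```

This runs in quadratic time in the worst case, which is fine for short strings and has the advantage that each expansion step is easy to animate.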


I've completed Tim Roughgarden's Algorithms: Design and Analysis Part I and Part II, as well as Dan Boneh's Cryptography Part I.

As is to be expected from Stanford, the level of instruction is very high.  A few thoughts on the courses:

  • It's clear that the professors are personally interested in teaching and reaching a wide audience.   Despite the non-trivial material covered, the courses were quite lucid -- more so than many of my undergraduate classes.
  • For me, the difficulty level was just right.  Some of the programming assignments were challenging, requiring 15-20 hours.  But they were doable and satisfying.   So I imagine that anyone who's studied some computer science, with sufficient motivation, would be able to complete the courses.  And I think a topic like cryptography is interesting to more knowledgeable folks, too, as it pertains to almost every aspect of our electronic lives.

As for the Coursera format:

  • The fact that courses run over a fixed period is noteworthy.  At this point, other MOOCs like edX and iversity seem to share this approach, with the notable exception of Udacity. Perhaps I'll address this further in another post, but the clear advantage of the approach is that the forums are in sync. Assuming there's a normal distribution of knowledge among participants, chances are that someone else can answer a question you have.  I found the forum discussions very helpful, especially in completing open-ended assignments like the Traveling Salesman Problem.
  • The evaluation system works well for the programming assignments, given that: (a) there is a unique answer that can be verified easily; (b) a certain degree of optimality is required to find the answer, i.e. if you can get the right answer, your solution is likely decent.  It's nice to see these substantial assignments supplemented by in-video and weekly multiple choice quizzes.  I suppose the larger question for MOOCs is how effective feedback can be provided in the humanities.
  • Breaking up classes into many small modules makes a lot of sense.  This approach seems common at all MOOCs.  It's practical in terms of keeping one's attention, finding specific parts to review, and skipping select topics when appropriate.  There's no one falling asleep in lecture here.


Over the past six months or so, I've been studying computer science in my spare time.  My main resources have been providers of free Massive Open Online Courses (MOOCs), like Coursera and Udacity.  I thought it would be interesting--at the very least for myself--to track my progress and reflections.  MOOCs are quickly becoming a hot topic.  What is their place in the educational ecosystem?  And can they remain free?

I'll write a few catch-up posts to cover what's happened so far, but first a few points of reference:

  • What I know as I begin
    • I took a data visualization class in college. I taught myself some Python and know enough to muddle through basic tasks in VBA, JavaScript, etc.  I worked in financial technology for four years.  
  • What I expect 
    • To survey introductory material in a wide range of topics
    • To learn enough to be able to dive deeper into selected topics
  • What I do NOT expect
    • To become an expert programmer 
    • To learn every fundamental principle 
    • [29 July 2015] A retrospective edit: first principles are the only way to go :)

Sic infit...