Sunday, December 22, 2013

Student Motivation in MOOCs

I've just completed Programming Languages on Udacity with Westley Weimer.  The goal of the class was to build a bare-bones web browser for HTML and JavaScript, thereby introducing lexing, parsing, and interpreting computer languages.  The final project involved building a non-deterministic finite-state machine that accepts the same language as a subset of regular expressions (using Python Lex-Yacc).

This is the first full-length course I've finished on Udacity.  Compared to a Coursera course of similar difficulty and quality, I estimate that it took 50-75% longer to complete. The reason is quite simple: there are no deadlines on Udacity, so I procrastinated more.

I speculated previously about retention rates, and it turns out that this is a big deal for Udacity.  This article in Fast Company mentions statistics from a recent study: the completion rate of MOOC-type classes is only 7%.  As a result, the article describes, Udacity is shifting its focus to courses that are more professionally oriented, many in partnership with tech companies.  Presumably, the idea is that the clear goal of potential employment will increase motivation and create value more directly.

At this early stage in online education, it's hard to say whether such changes are for better or for worse--or even to say just what exactly "better" or "worse" means.  On the topic of completion, though, I do think there's a fundamental human trait at play: we're generally curious and eager to learn, but we're not very good at completing tasks without external motivation.  Indeed, the low MOOC completion rate is hardly surprising, given that it's hard enough to get full-time college students paying $50,000 a year to attend lectures regularly.

For traditional colleges, of course, discipline and motivation constitute an important part of the education.  As this article about historical attempts to create radio-based education puts it:
"While MOOCs expose students to information, that is not the most fundamental dimension of learning. Perhaps most central to an education are the habits of mental discipline and the motivation it instills. Traditional colleges offer engaged professors who care if students attend class, answer their questions, and help them stay focused. Colleges offer spaces for a type of sociability that broadcast radio and MOOCs have yet to replicate."
I'm not sure that there is a "most fundamental dimension of learning" per se, but it will be interesting to see how MOOCs handle the question of student motivation going forward.

Wednesday, December 4, 2013

Classifying Handwritten Digits

Six weeks into Andrew Ng's Machine Learning class on Coursera, I found a Kaggle competition to classify handwritten digits that's almost identical in nature to one of the programming assignments. This seemed like a good chance for further practice in implementing neural networks.

Quick Start: accuracy ~ 93.5%

I loaded the Kaggle training data and modified code I already had for a 3-layer neural network:

  • 1 hidden layer of 100 nodes, regularization parameter lambda = 0.3
    • Accuracy of ~ 92% after 100 iterations 
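
For readers who haven't seen this kind of network, here is a minimal sketch of the forward pass and the regularized cost that training tries to minimize. It is not the code I actually ran; it assumes sigmoid activations, a cross-entropy cost, and an L2 penalty that skips the bias weights. The names `forward` and `regularized_cost` are made up for illustration, and the sizes are tiny (2 inputs, 2 hidden nodes, 3 outputs) so it runs instantly.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, theta1, theta2):
    """Forward pass through input -> hidden -> output.

    Each row of theta1/theta2 is [bias, w1, w2, ...] for one node
    (an assumed layout, chosen for this sketch).
    """
    a1 = [1.0] + list(x)  # prepend the bias unit
    a2 = [1.0] + [sigmoid(sum(w * a for w, a in zip(row, a1))) for row in theta1]
    return [sigmoid(sum(w * a for w, a in zip(row, a2))) for row in theta2]


def regularized_cost(X, Y, theta1, theta2, lam):
    """Cross-entropy cost plus an L2 penalty scaled by lam.

    X is a list of feature lists, Y a list of one-hot label lists.
    Bias weights (column 0 of each row) are left out of the penalty.
    """
    m = len(X)
    data_term = 0.0
    for x, y in zip(X, Y):
        h = forward(x, theta1, theta2)
        data_term -= sum(
            yk * math.log(hk) + (1 - yk) * math.log(1 - hk)
            for yk, hk in zip(y, h)
        )
    penalty = sum(
        w * w for theta in (theta1, theta2) for row in theta for w in row[1:]
    )
    return data_term / m + lam / (2.0 * m) * penalty


# Usage: with all-zero weights every output is 0.5, so the cost is
# 3 * ln(2), about 2.079, regardless of lam.
X = [[0.0, 1.0], [1.0, 0.0]]
Y = [[1, 0, 0], [0, 1, 0]]
zeros1 = [[0.0] * 3 for _ in range(2)]
zeros2 = [[0.0] * 3 for _ in range(3)]
print(regularized_cost(X, Y, zeros1, zeros2, lam=0.3))
```

A larger lambda makes big weights more expensive, which is what pushes the network toward simpler fits.
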
To tune the regularization parameter lambda, I plotted a number of sample values against their corresponding error rates (% misclassifications). With the caveat that I only ran 10 iterations for each value, lambda = 1 appeared to minimize the cross-validation error.  Retraining the network accordingly increased accuracy to ~ 93.5%.  Further training seemed only to increase the variance.
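
The sweep itself is simple enough to sketch. In the snippet below, a one-weight ridge regression stands in for the neural network so the example runs instantly; the data, the candidate values, and the names `pick_lambda`, `train_ridge`, and `cv_mse` are all invented for illustration and are not the values or code I used.

```python
def pick_lambda(candidates, train, cv_error):
    """Train once per candidate lambda and keep the one with the
    lowest error on held-out cross-validation data.

    train(lam) returns a fitted model; cv_error(model) scores it.
    Returns (best_lambda, {lambda: error}).
    """
    errors = {lam: cv_error(train(lam)) for lam in candidates}
    best = min(errors, key=errors.get)
    return best, errors


# Toy stand-in model: fit y = w * x with an L2 penalty on w.
TRAIN = [(1.0, 1.5), (2.0, 2.5), (3.0, 4.5)]  # made-up, noisy training set
CV = [(1.0, 1.0), (2.0, 2.0)]                 # made-up held-out set


def train_ridge(lam):
    # Closed-form minimizer of sum((w*x - y)^2) + lam * w^2.
    numerator = sum(x * y for x, y in TRAIN)
    denominator = sum(x * x for x, _ in TRAIN) + lam
    return numerator / denominator


def cv_mse(w):
    # Mean squared error on the held-out set only.
    return sum((w * x - y) ** 2 for x, y in CV) / len(CV)


best, errors = pick_lambda([0.0, 0.3, 1.0, 3.0, 6.0, 10.0], train_ridge, cv_mse)
for lam in sorted(errors):
    print(lam, errors[lam])
print("best lambda:", best)  # 6.0 for this toy data
```

The important detail is that the error is measured on data the model never trained on; scoring against the training set would always favor the smallest lambda.
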

Larger Networks: accuracy ~ 96%

  • 2 hidden layers of 200 nodes each, lambda = 0.01
    • Accuracy of ~ 94.5% after 130 iterations 
  • 1 hidden layer of 500 nodes, lambda = 0.01
    • Accuracy of ~ 96.5% after 75 iterations
Of the handful of networks that I tried, a single hidden layer of 500 nodes seemed to learn the training examples most efficiently.  The 75 iterations completed in about an hour and a half.

As with the initial network, the regularization parameter lambda was chosen by minimizing the error against cross-validation data.  After 75 iterations, the network had almost perfect accuracy on the training data (>99.5%), but the accuracy on cross-validation and segregated test data hovered around 96%.
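
That gap between training and held-out accuracy is the usual sign of high variance. A rough sketch of the check follows; the thresholds `target` and `max_gap` are arbitrary values picked for illustration, and the function names are hypothetical.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def diagnose(train_acc, cv_acc, target=0.99, max_gap=0.02):
    """Crude bias/variance reading from two accuracy figures.

    target and max_gap are illustrative thresholds, not standard values.
    """
    if train_acc < target:
        return "high bias"      # not even fitting the training data
    if train_acc - cv_acc > max_gap:
        return "high variance"  # fits training data, generalizes worse
    return "ok"


# Usage with figures close to the ones above.
print(diagnose(0.995, 0.96))
```
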

At this point, the network appeared sufficiently large to learn the training data -- it just wasn't generalizing well enough to new data.  I tried to lower the variance by running more iterations and including the previously segregated test data as training data, with little success.  A few last thoughts:

  • I'm not sure how to reason about the optimal size and structure of a neural network, given a dataset.  There must be a good way to approach this by taking a subset of data (for reasonable processing time) and running through a number of informed guesses.
  • I calibrated lambda using random initial parameters.  Does it help to reevaluate the regularization along the way, as the network is trained more and becomes more prone to overfitting?

Submission: accuracy ~ 96%

The standing of 170th place is, of course, nothing to write home about -- but it's gratifying after just six weeks of coursework.