Notes from "Educational Data Mining: Predict the Future, Change the Future"

Friday February 15, 2013

Data Mining doesn't sound as sexy as Data Science these days, but CUNY's initiative has pulled together a fantastic series of talks focusing on whatever you call it, as applied to education. Ryan Baker opened the series earlier today with an excellent overview of the whole field. CUNY will be posting video of the talk, as well as references to papers mentioned, but there was so much good stuff that I wanted to explore and leave some links to interesting things here.

Professor Baker started with a brief history of big data, which has been used for years in physics, biology, and meteorology. Now it's popular across the web, mostly in clearly commercial applications, but it's also becoming very relevant to education, because educational software can collect so much data as students interact with it.

The applications of EDM described were very diverse, from predicting standardized test scores to automatically detecting student (dis)engagement, "gaming" of educational systems, or emotion broadly. An amusing example is detecting "WTF behaviors," where "WTF" stands for "Without Thinking Fastidiously" - for example, if students are exploring a largely unsupervised 3D world, how do you know who's really on task and who's just carrying bananas to the toilet for kicks? (True story.)

It's an exciting time! As Professor Baker says, "The data's all there, you just have to find ways to link it." And, of course, ways to learn from it. What follows will be a collection of things that you can link and learn from right now:

There are two major societies in this field:

Other interesting things:

School District Demographics System: Probably just the tip of the things-I-didn't-know-were-on-the-NCES-web-site iceberg. Neat interactive mapping and downloadable data sets by school district.

Zombie Division: A game, apparently popular in the UK, of the ubiquitous "first-person destroy-numbered-skeletons-with-weapons-corresponding-to-their-divisors" genre.

I keep hearing more about Reasoning Mind, an online math ed thing. I haven't fallen in love with it yet, but who knows.

PSLC DataShop: "The world's largest repository of learning interaction data." Sort of like a cross between the UCI Machine Learning Repository and ICPSR, intersect education. It's at LearnLab, the Pittsburgh Science of Learning Center.

Professor Baker revealed that he's planning a Coursera (MOOC) course in big data and education for the fall 2013 semester, which should be announced in August. I'm looking forward to it!

"Signals" at Purdue: A really neat project that identifies college students at risk of not succeeding, as early as the second week of classes, and automatically recommends interventions as simple as an email lightly customized by a professor. Really shiny web site, too!

It would be nice if projects like Signals published more about their methods. Apparently universities in Europe, particularly Spain, are less competitive and so publish more about the programs they develop to help their students. Of course, it'd be nice if Knewton would publish too. Pearson does, for goodness' sake.

Largely, it seems like the best ed tech follows the "all dashboards should be feeds" advice, not just showing data but identifying the actionable bits and even making recommendations for actions to take. Assistments, for example, will send an email summary of online homework completion, letting a teacher know which question students had the most difficulty with, and other highlights.
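The talk didn't go into how Assistments builds its summaries, so here is only a minimal Python sketch of the "feed, not dashboard" idea: instead of showing every number, it picks out the one actionable fact (the question with the lowest share of correct answers). The record layout, a list of hypothetical `(student, question, correct)` tuples, and the function names are my own assumptions for illustration, not Assistments' actual data format or code.

```python
from collections import defaultdict


def question_success_rates(responses):
    """Fraction of correct answers per question.

    `responses` is an iterable of (student, question, correct) tuples;
    this layout is a hypothetical stand-in, not Assistments' real format.
    """
    attempts = defaultdict(int)
    right = defaultdict(int)
    for _student, question, correct in responses:
        attempts[question] += 1
        right[question] += 1 if correct else 0
    return {q: right[q] / attempts[q] for q in attempts}


def homework_summary(responses):
    """Build a short plain-text digest a teacher might get by email."""
    responses = list(responses)
    if not responses:
        return "No submissions yet."
    students = {student for student, _q, _c in responses}
    rates = question_success_rates(responses)
    # The "actionable bit": the question with the lowest success rate.
    hardest = min(rates, key=rates.get)
    return (
        f"{len(students)} students submitted work.\n"
        f"Most difficult question: {hardest} "
        f"({rates[hardest]:.0%} answered correctly)."
    )


if __name__ == "__main__":
    sample = [
        ("ana", "Q1", True), ("ana", "Q2", False),
        ("ben", "Q1", True), ("ben", "Q2", False),
        ("cy", "Q1", False), ("cy", "Q2", True),
    ]
    print(homework_summary(sample))
```

With the toy data above, two of three students got Q1 right but only one got Q2 right, so the digest flags Q2. A real system would presumably add more highlights, but the design point is the same: lead with the thing the teacher should act on.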

Speaking of Assistments, apparently Science Assistments has spun off and switched to a much less memorable name.

And finally, I found one off-the-cuff list especially interesting: a top three of the kinds of things that Educational Data Mining studies. Here it is as I understood it:

  1. Knowledge modeling: "What do students know?"
  2. Knowledge structure modeling: "What's the deal with these things we know?"
  3. Emotion/engagement modeling: "How are students feeling?"

I'm pretty interested in number two, but they're all good. Looking forward to more talks at CUNY!

This post was originally hosted elsewhere.