Wrap-up from DC Hack and Tell #4

I’ve been putting together these wrap-ups from Hack and Tell in DC for a while now. They go out to the meetup list and they’re archived on github, but I like them so much I thought I’d put them here too. Working through the back-catalog:

DC Hack and Tell

Round 4: The Christmas Invasion

Time to wrap up the Christmas Invasion and put a bow on it… Here are all the good things we saw, in non-random order!

  • Aaron talked about lots of graphs made from NYC test scores.
  • Rick showed a really neat Medicare visualization that he made, which started as a National Day of Civic Hacking project. (Cool!)
  • Julian demoed the next big programming language, MyCoolLang aka Lebowski, rich with Python and LLVM goodness.
  • Chris fought the good fight against light switches, automating his home via his lightbulbs’ port 80 (duh).
  • In addition to inventing languages, Julian also improves existing ones – he showed how he became a Python core dev and improved performance (timing).
  • “So your friendly neighborhood bikeshare station is out of bikes again. What are the odds?” CHRIS WILL SHOW YOU THE ODDS.
  • And Joseph showed some of the magic of saltvagrant, and of course salty vagrant.

Happy solstice, everybody! See you on January 13, 2014!

Wrap-up from DC Hack and Tell #3

DC Hack and Tell

Round 3: Hack… to the Future!

And now, a wrap-up… in random order!

  • Mike showed the excellent audioverb for all your language in situ needs – and it even has a YouTube explanation too!
  • Aaron talked about rjstat, his R package for reading and writing the JSON-stat data format.
  • Fearless leader Jonathan shared a classic Hack and Tell hack for decoding cryptograms using simple language models and SIMULATED ANNEALING! (I know, right?) It’s called cryptosolver. We miss you, Jonathan!
  • Bayan showed how to simulate fantasy football drafts/seasons in R to test theories and impress your friends! With a Prezi!
  • Tom presented not just the JS live-coding mistakes.io but also super fun interactive statistics and simple-statistics!
  • Aaron also showed this Guess the Letter thing. Oh my gosh there’s a blog post.

And there will be even more good stuff coming soon… to the future!

Unnatural Causes

Unnatural Causes is “a seven-part documentary series exploring racial & socioeconomic inequalities in health” from 2008. In a horrible irony, the episodes are not freely available to the public. The cheapest way to see it is to pay $24.95 to FORA.tv for streaming. There doesn’t seem to be an option for buying the DVD from the main site unless you are an organization. I believe this is a mistake. The apparent goals of the producers would be better served by making the complete materials publicly available at no cost. You can watch some clips on their YouTube channel, which is good, but why not release everything, together with information on actions to take or links to further information? I don’t even remember where I heard about the series, and it wasn’t particularly easy to track down viewing options. The audience would be so much bigger if energy were devoted to spreading the videos rather than locking them up.

As I have now been lucky enough to see the complete series, here is a brief summary of the episodes:

1. “In sickness and in wealth”: The Whitehall Study is introduced. The Whitehall Study, which is frequently referenced throughout the series, found that health is associated with wealth, not just in a binary poor-vs.-rich way, but in gradations all along the levels of wealth. The episode points to the importance of a sense of control, and the corresponding stress of social subordination, as it introduces people at varying levels of health and wealth in an American city. Also apparently there was some experiment that gave everybody colds by putting a virus right into their noses – is that seriously an experimental technique that people use?

2. “When the bough breaks”: The stress of institutional and persistent racism is identified as a determinant of health. The example of low birth weights for babies born to black mothers is given. Also I noticed that the series is dedicated to the memory of Judy Crichton.

3. “Becoming American”: It is noted that Latino immigrants to America are initially healthier than other Americans, and tight families are given as a potential explanation. Also the Pennsylvania town that hosts the examples has a community center and a youth center, which seem nice. Then it’s brought to light that immigrant health is much worse after five years, and there’s also some mention of mental illness.

4. “Bad sugar”: A community of Native Americans, relevant because of its very high levels of diabetes, is the example for this episode. The stress of being displaced by US forces, of not being dealt with fairly, and of being essentially forced to eat a radically different and inferior diet, together with the attendant problems of poverty, all contribute.

5. “Place matters”: The biggest takeaway was learning about the original redlining, which gave good home loans almost exclusively to white people from around 1934 to 1962. Grrr. The episode then talks about how bad neighborhoods are stressful; violence, mold, and asthma all suck. Everything is health policy. It also points to the failure of private developers to provide what people really need.

6. “Collateral damage”: This episode centers on the Marshall Islands, where US military involvement no longer sends showers of nuclear fallout, but a base still dominates the economy to ill effect. Overcrowding on the adjacent island, which is essentially a slum compared to the island of the US base, leads to tuberculosis and other ailments. The people of the Marshall Islands can leave their homes and move to the US (Arkansas is a popular destination, it seems), but health problems can continue there.

7. “Not just a paycheck”: Electrolux is a Swedish company that moved one factory from Michigan to Mexico and another from Sweden to Hungary. In Michigan this ruined a lot of lives, while in Sweden it was a comparatively small problem. Americans are less well protected by their government and their unions than the Swedes are by theirs, and the Americans have worse health outcomes. The American setting also illustrates increasing inequality as a family laid off from the factory lives on an old family farm that is increasingly surrounded by huge second homes of the rich.

This post was made possible through the generous support of the B. R. Schumacher Foundation.

Here Comes Everybody

Harlan mentioned this book so I read it.


It came out back in 2008 and was a lot more timely then, I imagine.

There are lots of interesting tidbits in here. It’s largely anecdote-based, and it uses the word “suasion” twice. Here are some quotes:

… large social systems cannot be understood as a simple aggregation of the behaviors of some nonexistent “average” user.

… it’s easier to like people who are odd in the same ways you are odd, but it’s harder to find them.

… trying something is often cheaper than making a formal decision about whether to try it.

… the question “Do the people who like it take care of each other?” turns out to be a better predictor of success than “What’s the business model?”

Shirky also brings up the Bill Joy quote, “No matter who you are, most of the smart people work for someone else.” This made me wonder whether Google agrees, these days.

I like reciprocal altruism a lot: “With reciprocal altruism, favors are exchanged without formal bookkeeping …” (emphasis mine). This is my preferred way of doing things. The problem seems to be the number of people and anonymity online, and so there are systems with formal bookkeeping like eBay’s buyer/seller rating system, or points on StackOverflow. Is this the direction that everything is moving in? If we end up with zero privacy/anonymity online, will that solve the problem of freeloaders and other bad behavior?

Things I hadn’t previously heard of: asmallworld (gross), Dodgeball (people are still doing this stuff). Also Richard Gabriel’s Worse Is Better talk (increasingly it seems LISP people have all the ideas).

Maybe the most interesting bit from the book was this forward-looking claim:

So here’s a hypothesis about the near future, based on little more than a hunch and some tantalizing examples: we’re about to experience a revolution in collective action, and the driver of that revolution will be new legal structures that will support productive collective action.

I don’t know if that has happened, or if it is happening. Shirky pointed out that intellectual property was the main collective product at the time of his writing – things like Linux and Wikipedia, where licenses like the GPL protect the product. The only examples I can think of beyond software and writing are products that get kickstarted, and I don’t know if that counts. Restricting to financial structures seems unfortunate. But crowd-funding and anonymous currencies like Bitcoin might be the closest thing to steps in this direction, as far as I can see. Meetup was in the book, and doesn’t have any special legal structures for organizations as far as I know. What else am I missing?

Quizz Quotes

I was exploring Google Papers the other day and came across Quizz: Targeted Crowdsourcing with a Billion (Potential) Users by Ipeirotis and Gabrilovich. Downside: occasionally reads like a Google ad. Upside: really interesting results from an experimental Q&A system which is still live. It’s very cool. Here are some quotes with my commentary:

… the strong self-selection of high-quality users to continue contributing, while low-quality users self-select to drop out.

… there is little incentive for unpaid users to continue participating when there is no monetary reward and they are not good at the task.

The goal of the system was not educational, so they celebrate the fact that it isn’t fun if you suck.

These results indicate that users may be more interested in learning about the topic rather than just knowing whether they answered correctly.

The results included that people answer more questions when the interface shows the correct answer as “feedback” rather than just showing “correct” or “incorrect.” This section of experimental results was particularly interesting, including commentary on possible failures of leaderboards.

… as more and more users participate, the achievements of the top users are difficult to match, effectively discouraging users from trying harder.

They did say that a leaderboard including only the last week’s worth of results was more effective.
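I don’t know how Quizz actually computes its weekly leaderboard, so this is just a minimal Python sketch under an assumed data shape: each answer produces a hypothetical (user, timestamp, points) record, and the board only counts records from the last seven days. The function name is made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def recent_leaderboard(events, now, window=timedelta(days=7), top_n=10):
    """Rank users by points earned in the `window` ending at `now`.

    `events` is an iterable of (user, timestamp, points) tuples; anything
    older than the window simply doesn't count.
    """
    cutoff = now - window
    totals = defaultdict(int)
    for user, timestamp, points in events:
        if cutoff <= timestamp <= now:
            totals[user] += points
    # Highest score first; ties broken by name so the order is stable.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


events = [
    ("veteran", datetime(2014, 1, 1), 500),  # big score, but over a week old
    ("newcomer", datetime(2014, 1, 10), 12),
    ("regular", datetime(2014, 1, 12), 9),
]
print(recent_leaderboard(events, now=datetime(2014, 1, 13)))
# [('newcomer', 12), ('regular', 9)]
```

The veteran’s 500 points drop off once they’re more than a week old, so a newcomer can reach the top of the board, which is the effect the paper reports.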

I’m less interested in the application of this kind of system for crowd-sourcing information, more interested in educational applications, but there is some clear overlap, and cited papers such as The multidimensional wisdom of crowds seem very interesting. Also through Ipeirotis’ blog I found out about Smarterer, which is interesting as well. There’s some sort of spectrum, or multi-dimensional thing going on, with education, crowdsourcing, and evaluation all in the mix.

The authors’ use of information gain and a Markov Decision Process is also interesting.
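As I understand the paper, it scores a user’s contribution by the information gain of their answers, treating a user who is correct with probability q on an n-choice question like a noisy channel. Here is a minimal Python sketch of that quantity, assuming wrong answers are spread evenly over the other n - 1 choices; the function name is mine, the MDP part isn’t attempted, and the formula should be checked against the paper before relying on it.

```python
import math


def information_gain(q, n):
    """Bits of information per answer from a user who is correct with
    probability q on an n-choice question, assuming wrong answers are
    spread uniformly over the n - 1 incorrect choices."""
    if not 0.0 <= q <= 1.0 or n < 2:
        raise ValueError("need 0 <= q <= 1 and n >= 2")
    gain = math.log2(n)
    if q > 0:
        gain += q * math.log2(q)
    if q < 1:
        gain += (1 - q) * math.log2((1 - q) / (n - 1))
    return gain


print(information_gain(1.0, 4))   # 2.0: a perfect user on 4 choices
print(information_gain(0.25, 4))  # 0.0: no better than random guessing
```

A user who guesses at random contributes nothing, while a reliably correct one contributes the full log2(n) bits per answer, which fits the idea that low-quality users are the ones you least need to keep.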

Writing to think: Questions on the web

I have made some things online that involve “asking and answering questions” in the traditional multiple-choice-test way. Both times I built the software myself: once with Python on Google App Engine, and again, differently, with node.js on Heroku.

Is there any “built-in” web element for questions and answers of the types I’m thinking of? There are HTML forms. HTML forms provide a fair amount of flexibility, and even start to offer some support for different question structures: radio buttons for a single choice vs. checkboxes for multiple selections. But HTML forms, being just HTML, have pretty clear limits. Javascript can add some more functionality, and eventually you need a web server backend of some kind to support more.

There are web services like Google Forms and SurveyMonkey, and the very task-specific Doodle, which take all of HTML/Javascript/backend and run it all for you. This means that the available functionality is whatever they provide, everything is hosted by them, and as far as I know there is little or no mechanism for creating things outside of their web GUIs.

The popular services just mentioned mostly collect information without any feedback; when you want to have a “correct” answer there isn’t much functionality. Where is a good existing solution? There’s internet detritus like MakeaQuiz.net. There’s Quizlet, which seems pretty neat but also isolated perhaps by its attempt to chase education spending. (It also supports, like most education sites, an unhealthy distinction between student and teacher.)

The desire for profit seems to poison projects that could otherwise have a broader positive effect. Projects affiliated with the very cool JiTT methodology disappeared into companies. I’m not even sure what sort of thinking led to the closing of the Khan Academy source.

But it isn’t just the profit motive that keeps question-and-answer technology balkanized; there’s no real standard, and I don’t think it’s very easy to come up with one. The systems I built aren’t easily transferred anywhere for use by others, for example. This is my fault, but I also don’t think it’s a very easy thing to design.

There are some attempts at standards for questions, at least. BlackBoard has a way to load questions from some tab-delimited formats. Moodle has something called GIFT. There’s the Question and Test Interoperability spec, which is such a huge mess you need to employ a stapler guy to support it. And there’s something called QUOX. Oh my.

And these are all purely for assessment, whereas the services mentioned earlier are purely for survey/data collection. It seems to me that the two shouldn’t be so different. Fundamentally, isn’t it all just questions?

Another take on this, I suppose, is sites like Stack Overflow, which represent a different sort of questioning. And there is OSQA, “the Open Source Q&A system”, which is cool. You could run that on your server, or for that matter run Moodle, or some survey platform, most likely. So that’s also another delivery model: the run-your-own-server-with-pre-built-software model. A lot of setup/maintenance overhead, and still not a lot of interoperability as far as I can tell. (OSQA is also available hosted.)

Just one more: There are also frameworks for building assessments, which try to generalize while still providing some structure. I was happy to find out about the one linked, for Rails; I don’t know if there are others or if any are widely used.

Markdown is pretty much the best thing ever. (Note to self: get off wordpress…) Can we come up with a markdown solution to the question problem? Something super light-weight, that blends easily into text files that humans would actually write…

The kramdown (etc.) markdown extension for definition lists seems like a candidate. Here’s how it works:

This is the "term".
: This is the "definition".

It gets rendered something like this, using the standard HTML definition list tags:

This is the “term”.
This is the “definition”.
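kramdown does this conversion itself (along with much more, such as multi-paragraph definitions). Purely to make the mapping concrete, here is a toy Python sketch that handles only one-line terms and one-line definitions; the function name is made up for illustration.

```python
import html


def definition_list_to_html(text):
    """Render simple kramdown-style definition lists as HTML.

    A non-blank line starts a term (<dt>); a line beginning with ':' is a
    definition (<dd>) for the preceding term. Blank lines are skipped.
    """
    parts = ["<dl>"]
    for line in text.splitlines():
        if line.startswith(":"):
            parts.append(f"  <dd>{html.escape(line[1:].strip())}</dd>")
        elif line.strip():
            parts.append(f"  <dt>{html.escape(line.strip())}</dt>")
    parts.append("</dl>")
    return "\n".join(parts)


print(definition_list_to_html('This is the "term".\n: This is the "definition".'))
```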

So let’s say the term is the question, and the (possibly many) definitions are answer choices. Of course we could have a blank definition represent a text box (or text area):

What do you think?
:

A multiple-choice survey could be as easy as this then:

What's your favorite color?
: red
: blue
: green
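To check that this syntax is easy for a program to read as well as for a person to write, here is a minimal Python sketch that turns such text into question records. The record shape (`prompt`, `choices`, `kind`) is a hypothetical choice of mine, and a question whose only definition is blank is treated as a free-text box.

```python
def parse_questions(text):
    """Parse definition-list text into a list of question dicts.

    Each non-blank line not starting with ':' begins a new question; each
    following ': ...' line adds an answer choice. A question with no
    non-blank choices is treated as a free-text question.
    """
    questions = []
    for line in text.splitlines():
        if line.startswith(":"):
            if not questions:
                raise ValueError("choice line before any question")
            choice = line[1:].strip()
            if choice:  # a bare ':' means "text box", not an empty choice
                questions[-1]["choices"].append(choice)
        elif line.strip():
            questions.append({"prompt": line.strip(), "choices": []})
    for question in questions:
        question["kind"] = "multiple_choice" if question["choices"] else "free_text"
    return questions


survey = """What's your favorite color?
: red
: blue
: green

What do you think?
:
"""
for question in parse_questions(survey):
    print(question)
```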

To add correctness functionality, a little more syntax could be added:

Sugar is sweet.
: true*
: false
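Reading the trailing `*` is a small step on top of that. A minimal Python sketch, assuming a single trailing asterisk marks a correct choice and is stripped from the displayed text (function names are hypothetical):

```python
def parse_choice(raw):
    """Split a choice like 'true*' into (text, is_correct).

    A single trailing '*' marks the choice as correct; the marker is removed.
    """
    raw = raw.strip()
    if raw.endswith("*"):
        return raw[:-1].rstrip(), True
    return raw, False


def parse_graded_question(lines):
    """Parse one question: the first non-blank line is the prompt and
    every following ': ...' line is a choice."""
    prompt, *rest = [line for line in lines if line.strip()]
    choices, correct = [], []
    for line in rest:
        if not line.startswith(":"):
            raise ValueError(f"expected a ': choice' line, got {line!r}")
        text, is_correct = parse_choice(line[1:])
        choices.append(text)
        if is_correct:
            correct.append(text)
    return {"prompt": prompt.strip(), "choices": choices, "correct": correct}


print(parse_graded_question(["Sugar is sweet.", ": true*", ": false"]))
```

Keeping `correct` as a list leaves room for questions with more than one right answer. One wrinkle with this syntax: an answer that genuinely ends in `*` would need some escape.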

The idea here is that these text files would be rendered into interactive HTML/Javascript such that you wouldn’t see which was the correct answer – you would select an answer, possibly have a submit button of some kind, and get feedback on whether your answer agreed with the one in the text. I do think that teacherly paranoia about “test security” is one thing that prevents good functionality from spreading much on the web. Nobody wants to share their oh-so-secret correct answers, lest the horrible children cheat. I think this perspective is a disease on society.
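A minimal Python sketch of the rendering and feedback steps, assuming questions already parsed into a hypothetical dict with `prompt`, `choices`, and `correct` keys. The displayed markup doesn’t mark which choice is right, and it uses radio buttons for one correct answer and checkboxes for several, as with plain HTML forms. The `grade` check could run on a server or be ported to Javascript in the page; that choice is left open here.

```python
import html


def render_question(question, name):
    """Render a parsed question as an HTML fieldset of choices.

    Nothing in the markup indicates which choice is correct. Radio buttons
    are used when exactly one choice is correct, checkboxes when several are.
    """
    kind = "checkbox" if len(question["correct"]) > 1 else "radio"
    rows = [f"<fieldset><legend>{html.escape(question['prompt'])}</legend>"]
    for i, choice in enumerate(question["choices"]):
        rows.append(
            f'<label><input type="{kind}" name="{html.escape(name)}" '
            f'value="{i}"> {html.escape(choice)}</label>'
        )
    rows.append("</fieldset>")
    return "\n".join(rows)


def grade(question, selected_indices):
    """True if the selected choices exactly match the set of correct ones."""
    chosen = {question["choices"][i] for i in selected_indices}
    return chosen == set(question["correct"])


sugar = {"prompt": "Sugar is sweet.", "choices": ["true", "false"], "correct": ["true"]}
print(render_question(sugar, "q1"))
print(grade(sugar, [0]), grade(sugar, [1]))  # True False
```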

Maybe this could be a short answer question:

What is the capital city of Wisconsin?

Of course there is the problem of evaluating text answers (is “Madison, WI” also correct? etc.). More generally, there is an awful lot of functionality that you want from questions, and it may be hard to reduce it all down. Some things should be obvious: true/false is a special case of multiple choice. But other things, like scoring and when/whether to show the correct answer, seem difficult to abstract very far.
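One partial answer to the “Madison, WI” problem is to let the question author list acceptable variants and compare after light normalization. A minimal Python sketch, assuming a hypothetical `accepted` list per question and normalization that only lowercases, drops punctuation, and collapses whitespace:

```python
import re


def normalize(answer):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    answer = re.sub(r"[^\w\s]", " ", answer.lower())
    return " ".join(answer.split())


def check_short_answer(response, accepted):
    """True if the response matches any accepted answer after normalization."""
    return normalize(response) in {normalize(a) for a in accepted}


accepted = ["Madison", "Madison, WI", "Madison, Wisconsin"]
print(check_short_answer("madison wi", accepted))
print(check_short_answer("Milwaukee", accepted))
```

This still misses typos and variants nobody thought to list, which is part of why text answers are hard to abstract.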

The text questions could be rendered as stand-alone HTML/Javascript, or to connect with (or even be hosted on) some sort of web system. More details would have to be worked out.

The illustrious Ramnath, who always seems to be doing cool things several years before I know about them, has thought about this markdown question idea to some degree. I want to find out more about what he’s done.

Data done wrong: The only-most-recent data model

It’s not uncommon to encounter a database that only stores the most recent state of things. For example, say the database has one row per Danaus plexippus (monarch butterfly) individual. The database could have a column called stage which would tell you whether an individual is currently a caterpillar or a butterfly.

This kind of design might seem fine for some application, but you have no way of seeing what happened in the past. When did that individual become a butterfly? (Conflate, for the moment, the time of the change in the real world and the time the change is made in the database – and say that the change is instantaneous.) Disturbingly often, you find after running a timeless database for some time that you actually do need to know about how the database changed over time – but you haven’t got that information.
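To make the problem concrete, here is a minimal sketch of the only-most-recent design using Python’s built-in sqlite3 module; the table and column names are hypothetical.

```python
import sqlite3

# One row per individual; `stage` holds only the current state.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE individual (id INTEGER PRIMARY KEY, stage TEXT NOT NULL)")
conn.execute("INSERT INTO individual (id, stage) VALUES (1, 'caterpillar')")

# The transformation overwrites the old value in place...
conn.execute("UPDATE individual SET stage = 'butterfly' WHERE id = 1")

# ...so the table can say what individual 1 is now, but not when it
# changed, or even that it was ever a caterpillar.
current = conn.execute("SELECT stage FROM individual WHERE id = 1").fetchone()[0]
print(current)  # butterfly
```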

There are at least two approaches to this problem. One is to store transactional data. In the plexippus example this could mean storing one row per life event per individual, with a timestamp of when the event was recorded in the database. The current known state of each individual can still be extracted (or maintained as a separate table). Another approach is to use a database that tracks all changes; the idea is something like version control for databases, and one implementation with a philosophy like this is Datomic.
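A minimal sqlite3 sketch of the first approach: one row per life event, with the current state derived by a query rather than stored. Table names and data are hypothetical, and the query assumes no two events for the same individual share a timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per life event per individual, plus when it was recorded.
conn.execute("""
    CREATE TABLE life_event (
        individual_id INTEGER NOT NULL,
        stage         TEXT    NOT NULL,
        recorded_at   TEXT    NOT NULL  -- ISO 8601, so strings sort by time
    )
""")
conn.executemany(
    "INSERT INTO life_event VALUES (?, ?, ?)",
    [
        (1, "caterpillar", "2014-03-20T09:00:00"),
        (1, "butterfly", "2014-04-04T10:30:00"),
        (2, "caterpillar", "2014-03-25T08:15:00"),
    ],
)

# Current state = the most recently recorded event for each individual.
CURRENT_STATE = """
    SELECT e.individual_id, e.stage
    FROM life_event AS e
    WHERE e.recorded_at = (
        SELECT MAX(recorded_at) FROM life_event
        WHERE individual_id = e.individual_id
    )
    ORDER BY e.individual_id
"""
print(conn.execute(CURRENT_STATE).fetchall())
# [(1, 'butterfly'), (2, 'caterpillar')]
```

The "current state" table from before becomes a view over the history, and the history itself is never thrown away.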

With a record of transactional data or a database that stores all transactions, you can query back in time: what was the state of the database at such-and-such time in the past? This is much better than our original setup. We don’t forget what happened in the past, and we can reproduce our work later even if the data is added to or changed. Of course this requires that the historical records not be themselves modified – the transaction logs must be immutable.
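Continuing the same hypothetical event table, a minimal sqlite3 sketch of querying back in time. Immutability is enforced here with SQLite triggers that abort any UPDATE or DELETE, which is one way to keep the log append-only, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE life_event (individual_id INTEGER, stage TEXT, recorded_at TEXT)")
# Make the log append-only: any UPDATE or DELETE fails with an error.
conn.execute("""CREATE TRIGGER life_event_no_update BEFORE UPDATE ON life_event
                BEGIN SELECT RAISE(ABORT, 'life_event is append-only'); END""")
conn.execute("""CREATE TRIGGER life_event_no_delete BEFORE DELETE ON life_event
                BEGIN SELECT RAISE(ABORT, 'life_event is append-only'); END""")
conn.executemany(
    "INSERT INTO life_event VALUES (?, ?, ?)",
    [(1, "caterpillar", "2014-03-20T09:00:00"),
     (1, "butterfly", "2014-04-04T10:30:00")],
)


def state_as_of(conn, individual_id, as_of):
    """What the database said about an individual at time `as_of`: the
    latest event recorded at or before that moment (None if none yet)."""
    row = conn.execute(
        """SELECT stage FROM life_event
           WHERE individual_id = ? AND recorded_at <= ?
           ORDER BY recorded_at DESC LIMIT 1""",
        (individual_id, as_of),
    ).fetchone()
    return row[0] if row else None


print(state_as_of(conn, 1, "2014-04-01T00:00:00"))  # caterpillar
print(state_as_of(conn, 1, "2014-04-05T00:00:00"))  # butterfly
print(state_as_of(conn, 1, "2014-03-01T00:00:00"))  # None
```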

This is where simple transactional designs on traditional databases fail. If someone erroneously enters on April 4th that an individual became a butterfly on April 3rd, when really the transformation occurred on April 2nd, and this mistake is only realized on April 5th, there has to be a way of adding another transaction to indicate the update – not altering the record entered on April 4th. This can quickly become confusing – it can be a little mind-bending to think about data about dates which changes over time. The update problem is a real headache. I would like to find a good solution to this.
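One pattern people use for this is usually called bitemporal data: each row carries both a valid time (when the change happened in the world) and a transaction time (when it was recorded), which drops the earlier simplification of conflating the two. Here is a minimal sqlite3 sketch applying it to the April example; the names are hypothetical, and it isn’t a complete answer to the update problem (it doesn’t handle retracting an event entirely, or two rows recorded at the same instant).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two times per row: when the change happened in the world (valid_at)
# and when we wrote it down (recorded_at). Rows are only ever appended.
conn.execute("""
    CREATE TABLE stage_change (
        individual_id INTEGER NOT NULL,
        stage         TEXT    NOT NULL,
        valid_at      TEXT    NOT NULL,
        recorded_at   TEXT    NOT NULL
    )
""")
# April 4: someone records that individual 1 became a butterfly on April 3.
conn.execute("INSERT INTO stage_change VALUES (1, 'butterfly', '2014-04-03', '2014-04-04')")
# April 5: the mistake is noticed. Instead of editing the April 4 row,
# append a correcting row saying the change really happened on April 2.
conn.execute("INSERT INTO stage_change VALUES (1, 'butterfly', '2014-04-02', '2014-04-05')")


def believed_change_date(conn, individual_id, stage, known_as_of):
    """When did we think `stage` began, using only rows recorded by `known_as_of`?"""
    row = conn.execute(
        """SELECT valid_at FROM stage_change
           WHERE individual_id = ? AND stage = ? AND recorded_at <= ?
           ORDER BY recorded_at DESC LIMIT 1""",
        (individual_id, stage, known_as_of),
    ).fetchone()
    return row[0] if row else None


print(believed_change_date(conn, 1, "butterfly", "2014-04-04"))  # 2014-04-03
print(believed_change_date(conn, 1, "butterfly", "2014-04-05"))  # 2014-04-02
```

Both answers stay reproducible: a report run on April 4 can be re-run later and still say April 3, while current queries see the corrected April 2.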