SoME1 results

Oct 23, 2021

The event

In the last post, I described a contest James Schloss and I organized meant to encourage more people to make online math explainers. This could include videos, blog posts, games, or whatever else people might dream up. We called it the Summer of Math Exposition, or SoME1.

The heart of the event was really a discord community, where people could trade notes, share their work, and otherwise engage with like-minded people interested in explaining some math concept online. The framing of having winners was primarily to align everyone around a common goal, with a common deadline, and a common set of principles for what constitutes a "good" math explainer.

We felt a bit of a push and pull between the goal of making the contest easy to judge and the goal of having people produce lessons that would be genuinely helpful. For example, shorter pieces are easier to judge at high throughput, while quality lessons sometimes (though not always) take meaningful time. Using something like the number of up-votes or likes would make for an easy first-pass filter, but runs the risk of punishing more niche content with a very deliberate and specific target audience, or content which has not yet built up any kind of audience.

We ended up with an event that was actually quite challenging to judge, but I like to think that this reflects a tendency to be relatively uncompromising on the main goal of encouraging people to make meaningful lessons. I can almost guarantee that anyone reading this will be able to find a lesson produced as part of this event which completely and utterly fascinates them.

The peer review process

We received over 1,200 submissions. While we set no explicit constraints on the length of entries, we did say that a judge should be able to assess an entry's quality within 10 minutes. Realistically, I knew that for any entry I was going to consider as a final contender, I would probably want to spend more time on it, and to write some personalized feedback, so looking at all 1,200 was unfortunately untenable.

For a first pass, we had a peer review system. Naturally this raises all sorts of worrying questions. How do you know the system is fair? What if there are bad actors trying to manipulate it? What if people's judgements are not based on the principles we want to emphasize, but are based on more superficial properties like visual pizzazz?

One of the participants in the event actually wrote up a post about this system, which is incredibly on-brand for a summer of math exposition, so I needn't repeat too much here. The system we used is called Gavel, which is based on the Bradley-Terry algorithm for assigning rankings based on pairwise selections. You can read more about its design here.

We told participants that to be considered for the final judgement, they would need to contribute 1 hour to the peer review process. The experience is to look at two pieces, let's call them A and B, then simply decide which is better, according to the four principles laid out in the original announcement of the event. Then you're fed a third entry, C, and told to compare B and C, and so on. Behind the scenes there's an algorithm trying to be smart about which pieces it feeds you, building a model not just for the ranking of entries, but also for the trustworthiness of judges.
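To give a flavor of how pairwise comparisons can turn into a ranking, here is a minimal sketch of the plain Bradley-Terry model, fit with the classic minorization-maximization update. This is an illustration only, not Gavel's actual implementation: Gavel's crowd-BT variant additionally models judge reliability and chooses which pair to show next, neither of which is sketched here. The function name and toy data are my own.

```python
from collections import defaultdict

def bradley_terry(n_items, comparisons, n_iters=200):
    """Estimate Bradley-Terry strengths from pairwise outcomes.

    comparisons: list of (winner, loser) index pairs.
    Uses the minorization-maximization (MM) update; assumes the
    comparison graph is connected and every item wins at least once,
    otherwise the maximum-likelihood estimate degenerates.
    """
    wins = [0] * n_items
    faced = defaultdict(int)  # (i, j) -> number of times i faced j
    for w, l in comparisons:
        wins[w] += 1
        faced[(w, l)] += 1
        faced[(l, w)] += 1

    p = [1.0] * n_items  # latent "strength" of each entry
    for _ in range(n_iters):
        denom = [0.0] * n_items
        for (i, j), count in faced.items():
            denom[i] += count / (p[i] + p[j])
        new_p = [wins[i] / denom[i] for i in range(n_items)]
        total = sum(new_p)
        p = [x / total for x in new_p]  # normalize: only ratios matter
    return p

# Toy example: three entries, five comparisons, entry 0 winning most often.
strengths = bradley_terry(3, [(0, 1), (0, 1), (1, 2), (2, 0), (0, 2)])
best = max(range(3), key=lambda i: strengths[i])
```

Under this model, the probability that entry i beats entry j is p[i] / (p[i] + p[j]), so sorting entries by their fitted strengths gives the ranking.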

During our process, there were around 13,000 total comparisons made in the system, and after about the first 7,000, the entries listed at the top 100 remained relatively stable. This gives some assurance that the rankings generated by this system were at least not random. And having looked at those 100 and more, I feel reassured that lower production quality but highly interesting lessons still managed to bubble toward the top.

I use the phrase "top 100", but to be more precise the goal was to generate a list of 100 such that there's a very high likelihood that the top 5 were in that list. It seems entirely plausible that inefficiencies in the system could cause entries that should be ranked, say, 80-100, to instead mistakenly drop below. The failure mode we're really concerned with, though, is that an entry which should be ranked 5th somehow gets dropped below that top 100.

Then again, the whole task is inherently subjective, so the premise of having an objective ranking, even in principle, is flimsy.

One systematic bias that was definitely at play is that some people thought it was a contest only for videos, and would downvote non-video entries accordingly. I'm not sure where this idea came from. Maybe it's only natural that a contest primarily announced on YouTube generates a YouTube-centric following. In an effort to combat this, James and I did look through many of the non-video entries which did not bubble up to that top 100.

Final judgements

I reached out to a few friends in the community of math explanation to ask if they'd be willing to pitch in a little bit of time to looking through some of the final contenders, with the aim of making that final decision not rest too much on my own subjective opinions. One got back to me mentioning how he'd love to help, but was really too busy that week. But then he watched just one minute of one entry out of curiosity...then a second minute...then the whole thing. Then another video, and another, and before too long had gotten sucked into more than I had originally asked for.

Indeed, looking through these entries turned out to be very fun. I had a spreadsheet going, one row per entry, where in one column I simply wanted to answer "is this likely going to be one of the best?", yes or no. That is, a very high bar of quality should be set for saying yes in that column, so that for a second pass I could have just a small handful to choose between. However, as I started going through them all, that column's entries looked like "yes, yes, probably, yes, maybe, yes, ..." This was highly unhelpful for my future self, but again, indicative of how great these entries were.

None were perfect. I could easily see room for improvement in just about any I looked at (the same can be said for everything I make). For anyone thinking about putting together some kind of math explainer, I do want to emphasize that you shouldn't be worried about a need for it to be excellent in all respects. As long as it's a solid idea, with an engaging explanation, people will resonate with it.

What made this most challenging in the end was just how apples-to-oranges it felt. One entry might be great because of the topic choice alone, while another stands out for a particular visual intuition in the middle, while yet another stands out for an unexpected application of a seemingly useless bit of math. How do you compare these to each other directly? How can you possibly choose 5 "winners" when you know deep down the premise of a winner is absurd?

The winners

I just put out a video announcing winners, which describes in more detail what it is I like about each one. But honestly, each speaks for itself, so I'd highly encourage you to take a look. In no particular order, here they are:

Collectively, these represent many of the qualities which I think make for excellent math exposition, qualities like clarity, strong motivation, compelling choices of topic, and non-obvious insights.

All entries

But there are many, many more beyond these five which deserve your attention as well.

We put together a YouTube playlist of all entries, and listed below is a collection of all non-YouTube entries. In both cases, after an initially random ordering I tried to do some mild curation to put some of the entries I particularly enjoyed toward the top, but otherwise don't read too much into the order of the lists. Neither should be interpreted as a ranking, or anything like that.

YouTube channels

There were some who created a YouTube channel this summer and submitted their entire channel. We were not sure where to place these entries, so we put them here:
