Could Vanilla use a redditesque sorting algorithm for discussions?

Currently, Vanilla sorts by the date of the last comment on a discussion in descending order. This is the classic way forums are structured. Recently, I've been thinking of experimenting with a sorting algorithm inspired by those used on social news sites.

How the sorting algorithm works.

Here are the basics of the sorting algorithm:

sort = (timestamp(dateLastComment) - x) * y
     + log(score)

Notes

  • x and y are constants that we can tune to make the numbers work well together. On Reddit, x is 1134028003, which could just be the timestamp of their first post ever, and y is 1/45000.
  • The score comes from reactions. It is on a logarithmic scale so that an ever-increasing number of reactions gives diminishing returns. This should also keep scores on small forums in the same ballpark as scores on the most active forums.
  • With this algorithm, newer activity always produces a larger sort value, and because the formula never references the current time, a discussion's value never has to be recomputed as it ages. This is great for scaling: we only have to recalculate the sort for an item when someone posts or reacts, and we can store the sort value in an indexed column.
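The basic formula can be sketched in a few lines of Python. This is a minimal sketch, not Vanilla's code: it borrows Reddit's x and y quoted above, and it assumes a base-10 log with scores below 1 clamped to 1 (Reddit does something similar), since the formula leaves the log base and the zero-score case open. The names `sort_value` and `score_term` are hypothetical.

```python
import math
from datetime import datetime, timezone

# Reddit's constants as quoted above; a forum would tune its own.
X = 1134028003   # epoch offset, in seconds
Y = 1 / 45000    # 45,000 seconds (12.5 hours) of recency per sort unit


def score_term(score: int) -> float:
    # Assumption: base-10 log, with scores below 1 clamped to 1 so the log is
    # always defined (Reddit clamps with max(abs(score), 1)).
    return math.log10(max(score, 1))


def sort_value(date_last_comment: datetime, score: int) -> float:
    """sort = (timestamp(dateLastComment) - x) * y + log(score)"""
    return (date_last_comment.timestamp() - X) * Y + score_term(score)


# Example: a discussion last commented on at the start of 2014 with 10 reactions.
example = sort_value(datetime(2014, 1, 1, tzinfo=timezone.utc), 10)
```

Nothing in `sort_value` reads the current clock, so the value computed when a comment or reaction arrives stays valid until the next one.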

Visualizing the sorting algorithm.

With a classic discussion sorting algorithm a comment is always described as bumping the discussion to the top of the list. Imagine a post at the bottom physically bumping to the top of the list. Now picture this same thing happening whenever someone reacts to the post. Both commenting and reacting bump the post in the same way.

Now imagine that each comment or reaction doesn't quite have enough strength to bump the discussion right to the top, but instead bumps it up by a good bit. Brand-new or exceptionally well-liked posts would still bump right to the top, but posts that aren't well liked would just bump up a few spots in the list.
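How far a single reaction "bumps" a discussion can be made concrete by converting the change in the log term back into seconds of recency. This sketch assumes Reddit's y and a base-10 log clamped at 1; `reaction_bump_seconds` is a hypothetical helper name.

```python
import math

Y = 1 / 45000  # time weight from the formula above (Reddit's value)


def reaction_bump_seconds(old_score: int, new_score: int) -> float:
    """How far a change in score moves a discussion, expressed as time.

    The log term rises by log10(new) - log10(old); dividing by Y converts
    that rise into the number of seconds of recency it is worth.
    """
    return (math.log10(max(new_score, 1)) - math.log10(max(old_score, 1))) / Y


for old in (1, 10, 100):
    minutes = reaction_bump_seconds(old, old + 1) / 60
    print(f"going from {old} to {old + 1} reactions: about {minutes:.1f} minutes")
```

With these constants a tenfold increase in score is worth 12.5 hours. The second reaction is worth roughly 3.8 hours, the eleventh about 31 minutes, and the 101st only about 3 minutes, which is the "bumps it up by a good bit" behaviour with diminishing returns.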

Possible issues with the sorting algorithm.

  • At its core, the algorithm isn't going to be that different from the current discussion sort. This could actually be a good thing, since too drastic a change could backfire.
  • Can this be a successful feature without promoting positive reactions more? Social news sites, for example, put their upvotes right on the discussion list.
  • Adding a new column to a big table such as GDN_Discussion is always a problem for huge forums.
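The storage side (a new indexed column on GDN_Discussion that is only rewritten when someone comments or reacts) can be sketched with Python's built-in sqlite3 module. This is a toy model: the column names `DateLastComment`, `Score` and `Sort`, the index name, and the helpers `touch`, `react`, `comment` and `front_page` are all hypothetical, and a real migration on a large MySQL table would be a much heavier operation.

```python
import math
import sqlite3

X, Y = 1134028003, 1 / 45000  # Reddit's constants, as quoted above


def compute_sort(date_last_comment: int, score: int) -> float:
    # Assumption: base-10 log with scores clamped to at least 1.
    return (date_last_comment - X) * Y + math.log10(max(score, 1))


db = sqlite3.connect(":memory:")
# A toy stand-in for GDN_Discussion with only the columns this sketch needs.
db.execute(
    "CREATE TABLE GDN_Discussion ("
    " DiscussionID INTEGER PRIMARY KEY,"
    " DateLastComment INTEGER NOT NULL,"
    " Score INTEGER NOT NULL DEFAULT 0)"
)
# The migration a big forum would have to run: a new column plus an index.
db.execute("ALTER TABLE GDN_Discussion ADD COLUMN Sort REAL NOT NULL DEFAULT 0")
db.execute("CREATE INDEX IX_Discussion_Sort ON GDN_Discussion (Sort)")


def touch(discussion_id: int) -> None:
    """Recompute one row's sort; only needed after a comment or reaction."""
    date_last_comment, score = db.execute(
        "SELECT DateLastComment, Score FROM GDN_Discussion WHERE DiscussionID = ?",
        (discussion_id,),
    ).fetchone()
    db.execute(
        "UPDATE GDN_Discussion SET Sort = ? WHERE DiscussionID = ?",
        (compute_sort(date_last_comment, score), discussion_id),
    )


def react(discussion_id: int) -> None:
    db.execute(
        "UPDATE GDN_Discussion SET Score = Score + 1 WHERE DiscussionID = ?",
        (discussion_id,),
    )
    touch(discussion_id)


def comment(discussion_id: int, when: int) -> None:
    db.execute(
        "UPDATE GDN_Discussion SET DateLastComment = ? WHERE DiscussionID = ?",
        (when, discussion_id),
    )
    touch(discussion_id)


def front_page(limit: int = 30) -> list[int]:
    # Served straight from the index; no per-request scoring.
    rows = db.execute(
        "SELECT DiscussionID FROM GDN_Discussion ORDER BY Sort DESC LIMIT ?",
        (limit,),
    )
    return [row[0] for row in rows]
```

Reading the list is a plain `ORDER BY Sort DESC` on the index; the cost is paid only on writes, one row at a time.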

Variations

Here are some variations that might be worth playing around with.

Use dateInserted instead of dateLastComment.

sort = (timestamp(dateInserted) - x) * y
     + countComments * z
     + log(score)

In this version of the algorithm we start with the date the discussion was posted and then bump it by a fixed amount with each comment. The score still helps out too. Commenting still bumps posts, but not in such an absolute way. This helps prevent old discussions from being resurrected to the front page, but maybe too much. Fiddling around with the numbers a bit could help, though.

This algorithm might also bridge the gap between community forums and social news sites. Social news sites use only dateInserted and score. We know forums need some sort of bumping from comments, and this algorithm might give us just the right amount of bump.
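To see how strongly the dateInserted variant resists resurrecting old discussions, here is a small sketch. The value of z is a hypothetical choice that makes one comment worth 20 minutes of recency, the example figure used in the comments; x, y and the clamped base-10 log are the same assumptions as before, and `comments_to_overtake` is a hypothetical helper.

```python
import math

X, Y = 1134028003, 1 / 45000   # Reddit's constants, as quoted above
# Hypothetical: z is chosen so that one comment is worth 20 minutes (1,200 s)
# of recency.
COMMENT_SECONDS = 1200
Z = COMMENT_SECONDS * Y


def sort_by_inserted(date_inserted: int, count_comments: int, score: int) -> float:
    """sort = (timestamp(dateInserted) - x) * y + countComments * z + log(score)"""
    return (date_inserted - X) * Y + count_comments * Z + math.log10(max(score, 1))


def comments_to_overtake(older_inserted: int, newer_inserted: int) -> int:
    """Comments an older discussion needs to strictly outrank a newer one
    that has the same score and no comments of its own."""
    head_start = newer_inserted - older_inserted  # seconds
    return head_start // COMMENT_SECONDS + 1
```

With z at 20 minutes, a discussion posted one day earlier needs 73 comments to pass a brand-new one (86,400 / 1,200 = 72, plus one to break the tie). Raising z loosens that, which is the kind of number-fiddling mentioned above.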

Using the sort to optimize announcements.

sort = (timestamp(dateLastComment) - x) * y
     + log(score)
     + (announce == 1) * a1
     + (announce == 2) * (a1 + a2)

In this version we make announced discussions always have a larger sort value than discussions that aren't announced. This lets announced discussions naturally float to the top. You'll also notice that discussions announced within a category (announce == 2) are placed above globally announced discussions. This allows us to select all announced discussions within a category, but only globally announced discussions when viewing recent discussions. Pretty cool, huh?
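The announcement trick works because the boosts split sort values into non-overlapping bands, so each view becomes a simple range filter. A sketch, assuming hypothetical a1 = a2 = 1,000,000, Reddit's x and y, a clamped base-10 log, and timestamps later than x (so the base value is never negative):

```python
import math

X, Y = 1134028003, 1 / 45000   # Reddit's constants, as quoted above
# Hypothetical boosts. Each must exceed the largest base sort value any
# discussion can have; with these constants the time term grows by only
# about 700 per year, so 1,000,000 leaves a very wide margin.
A1 = 1_000_000
A2 = 1_000_000


def sort_with_announce(date_last_comment: int, score: int, announce: int) -> float:
    """Base sort plus (announce == 1) * a1 + (announce == 2) * (a1 + a2)."""
    base = (date_last_comment - X) * Y + math.log10(max(score, 1))
    boost = {0: 0, 1: A1, 2: A1 + A2}[announce]
    return base + boost


def category_announcements(sorts: dict[str, float]) -> list[str]:
    """Category view: every announced discussion (announce 1 or 2)."""
    return sorted((d for d, s in sorts.items() if s >= A1), key=sorts.get, reverse=True)


def recent_announcements(sorts: dict[str, float]) -> list[str]:
    """Recent discussions: globally announced only (announce == 1)."""
    return sorted((d for d, s in sorts.items() if A1 <= s < A1 + A2), key=sorts.get, reverse=True)
```

In SQL each filter is a range condition on the indexed sort column (Sort >= a1, or Sort >= a1 AND Sort < a1 + a2), so announcements need no separate query path.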

Why bother?

Why bother changing the sort algorithm anyway? That's something that I hope we can discuss here. To me, doing a slight tweak to an old standby is exciting and could really help engagement in communities. At the same time I don't want to completely throw out a tried and true algorithm that is often underestimated for its sheer utility.

Comments

  • To veer in a tangential direction: I do frequently hear requests like "I want X category to order like a blog" or "I want Y category to be ordered by score". These sorts of cases allow for CMS-style content feeds & contests. What do we think about per-category ordering?

    To the "Variations" section: The first thing that jumped out at me is that something like + log(score + countComments) might be a more appropriate scale, maybe even countComments*2 in there. Commenting is implicit voting in my mind. Worth playing with, anyway.

  • With regard to per-category sorting: that is possible, but a tangent to this idea.

    Putting comments inside the log is also possible and may be worth experimenting with. The reason why I didn't do that is that it's one step further along the drastic change road. Consider this:

    • Right now commenting has absolute power to bump a post. Any comment bumps a post as far as it can be bumped.
    • My variation makes comments not bump a post as much, but still a good deal. For example, in my version a comment might bump a post by 20 minutes rather than however much it takes to move the discussion to the top.
    • Your variation makes comments bump even less. So it is an even more drastic change.

    Still, I think your bumping algorithm is worth experimenting with too. What it says is that all we want is a certain optimal number of comments/reactions, and beyond that it's enough. The risk here is that a discussion that really heats up early will use up all of its bump space right away and then sink. A tonne of positive reactions right away would kill the discussion, which doesn't seem right. A slight tweak would be + log(score) + log(countComments) so that comments are isolated from the reaction effect.

  • Question -- if we made a change like this, would the old way still be usable? I think the ideal situation is making a change like this optional (or even default, but with the option to use the old way) if that's all possible.

  • Unknown
    via Email
    It would probably be possible to go back to the old way as a config option, or even as a user navigation option, since this is a fairly drastic change. In general, though, we don't want to add too many configuration settings to our application, and we want to be careful not to support every admin's heart's desire.
  • I'd like to revive this conversation by adding that Avaaz is willing to contribute towards building these features.

    And I'm curious as to how difficult or time-consuming per-category ordering would be.

    @Linc said:

    To veer in a tangential direction: I do frequently hear requests like "I want X category to order like a blog" or "I want Y category to be ordered by score". These sort of cases allow for CMS-style content feeds & contests. What do we think about per-category ordering?

    To the "Variations" section: The first thing that jumped out at me is that something like + log(score + countComments) might be a more appropriate scale, maybe even countComments*2 in there. Commenting is implicit voting in my mind. Worth playing with, anyway.
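The comment-weighting options debated in this thread (a linear countComments * z, Linc's log(score + countComments * 2), and log(score) + log(countComments)) can be compared by converting the bump from one extra comment back into minutes of recency. This sketch assumes Reddit's y, a hypothetical z worth 20 minutes per comment, and a base-10 log clamped at 1; all function names are hypothetical.

```python
import math

Y = 1 / 45000   # time weight (Reddit's value)
Z = 1200 * Y    # hypothetical: one comment = 20 minutes of recency


def log_term(n: float) -> float:
    return math.log10(max(n, 1))  # assumption: clamp to 1 so the log is defined


def linear(score: int, comments: int) -> float:         # + countComments * z + log(score)
    return comments * Z + log_term(score)


def shared_log(score: int, comments: int) -> float:     # + log(score + countComments * 2)
    return log_term(score + 2 * comments)


def separate_logs(score: int, comments: int) -> float:  # + log(score) + log(countComments)
    return log_term(score) + log_term(comments)


def next_comment_minutes(term, score: int, comments: int) -> float:
    """Bump from one more comment, converted back into minutes of recency."""
    return (term(score, comments + 1) - term(score, comments)) / Y / 60


for name, term in (("linear", linear), ("shared log", shared_log), ("separate logs", separate_logs)):
    row = [next_comment_minutes(term, score=50, comments=c) for c in (1, 10, 100)]
    print(f"{name:>13}: " + ", ".join(f"{m:6.1f} min" for m in row))
```

The linear term is a flat 20 minutes per comment. Both log variants shrink as a thread grows; in the shared-log variant a comment is also worth less the more reactions the discussion already has (the "heats up early, then sinks" risk raised in the thread), while the separate-log variant keeps the comment bump independent of the score.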