Let's Plan "The Columning"

Unknown
edited April 2018 in Dev & Ops

About three years ago there was the "spacening" and the "casening". These were coding standard changes that most of our codebase and took significant planning and implementation. We got through them though and now is the time to plan our next big event: "The Columning".

What is The Columning?

Quite simply: we want to add columns to some of our main tables to support future features. This change won't really affect our codebase that much, but it will absolutely murder our database servers. This is why we've held off on adding features that require new database columns on some of our bigger tables. This is also why I want to get all the new columns we want in at once, which is why I am starting this discussion to plan things out.

New Column Ideas

There aren't too many columns I have in mind, so I want the team to really think of stuff we may need.

GDN_Discussion

  • Locale: The locale of the discussion body. This could be used to aid multilingual sites or be used for machine translation plugins.
  • Trending: A score based on the date, number of comments, and reactions. Sort of like our own Reddit sorting algorithm.
  • StatusID: I'm wondering if we should come up with a core status API so that Q&A, Ideation, Resolved, etc. can all pull from the same status pool. I'm not sure on this one.
  • IsFeatured: Whether or not the discussion has been featured. (charrondev)
  • CountFeatured: The count of featured comments.

GDN_Comment

  • Locale: Same use as GDN_Discussion.
  • ParentCommentID, ThreadCommentID: While I don't want us to necessarily switch to a fully-threaded platform, it would be nice to keep track of reply information and be able to support one level of nested threading in the future. The ParentCommentID would refer to the direct reply, while the ThreadCommentID would refer to the top comment in the reply chain for optimal one-level-threading. Displaying reply information would also allow us to deprecate quoting as a means of replying.
  • IsFeatured: Whether or not a comment is featured to allow us to filter by featured comments or jump between them more easily. My idea is that although a comment may be featured in different areas the fact that it is featured is cached on the comment for easier traversal.
  • Trending: Same use as GDN_Discussion without the comment count.

GDN_Category

  • Attributes: This is one table that is added to a lot, but is missing an attributes column. (charrondev)

GDN_User

  • Locale: The user's preferred locale.

GDN_Activity

  • Attributes: Do we want to ever switch to the standard attributes column? If so we should at least put it there.

Columns I don't think we should add

  • BodyHtml: We had discussed caching the rendered HTML for each body so that display is much quicker. I worry that this will hurt our internal network load though. I think instead we should optimize by storing the rendered HTML in the current Body column and create another backup table for the raw text. If you look at the current APIv2 the raw text is always retrieved on only a single endpoint. Old endpoints will be difficult here though.

Other Ideas

  • Do we want to switch from datetime to timestamp for date columns? Timestamps are nice because they have an implied timezone. They are required fields though.

Think about this

I am posting this so that the team can see my thought process and learn, but I am also conceding that this is a big database change so I want more eyes on this. It would really suck if go through a giant database update and then forget something fairly obvious.

Comments

  • Unknown
    edited April 2018

    Datetime vs timestamp:

    The following assumptions are made in this comment:

    • the issue of dates and times is a view-layer issue (in the browser)
    • all existing dates and times are stored in UTC format
    • we use some type of library on the client to handle date displays with I18N requirements (e.g. moment.js)
    • we do modify the view layer according to user-defined preferences for dates, e.g. what is the following date: 03/04/05 ?
    • we are not concerned about database column space/size for this change

    I've found the easiest date format to work with is the ISO-8601 format that is easily digestible to both PHP and JavaScript, so it could easily be passed unaltered from the DB to the API's JSON payload without modification. It is trivially easy to perform math or comparative functions against this format. It is also easily grokkable for humans in the event debugging / code reasoning is required.

    My 2 cents.

  • I should clarify @chrisdepage that I mean the MySQL "timestamp" column type and not integer timestamps. It has the slight advantage that we see the dates in our proper timezone when looking at data directly in the database which can make certain troubleshooting easier. It also means there is no ambiguity on what date is in the database which we've run into for things like data imports.

    Currently we store most columns as MySQL "datetime" types, but hard-code the connection and PHP to UTC. We do conversions based on the user's timezone.

    In terms of our APIv2 our schema object recognizes date/times and converts them appropriately whether they are given to it as a string, DateTime object, or integer timestamp.

  • Ahhhh that's good. Please disregard the appropriate pieces of my previous comment.

  • I too had been misunderstanding your comments about timestamp to be a Unix timestamp, which I found mildly terrifying for the reasons Chris stated.

    I'm fine with moving to the Timestamp column type, but it sounds like it would be some careful work to pull that off so smoothly that we don't see support requests around it. Most of these other changes are "free" from a development standpoint, so that one stands out.

  • Could we lump into this converting any tables that aren't utf8-mb4?

    Additionally:

    • We should add an attributes column on GDN_Category. That would avoid plugins that provide per-category customizations from needing to add additional columns.
    • If we're adding IsFeatured to GDN_Comment, we should probably add it to GDN_Discussion as well
  • A modified the OP with your suggestions @charrondev.