Code Standard Proposal: Ban modification or creation of HTML in PHP code

The issue

In the last few weeks, I have both created myself or discovered during code review of other's code XSS vulnerabilities while editing existing views/HTML built with PHP.

We already have a backend language configured that prevents XSS. Twig.

Despite it being ready available

  • any available view could have their extension changed to .twig and would be loaded with the appropriate view renderer).
  • Any small inline snippets can be easily pulled out relative to the file and quickly loaded from a non-controller context by using Vanilla\Web\TwigRenderTrait . See this error embed and its accompanying class.

We do not actually make these changes a lot of the time. I have used the justification and seen it given "it's not in scope".

It should never be out of scope to do something securely. We shouldn't be resetting the "days since last XSS introduced" counter every month.

Proposal

I can add lint rule that bans modification or using of HTML strings in PHP code, and modification of any file in a /views/* directory that has the extension *.php.

It would then be part of the scope of any work that requires writing HTML in an existing section of the site or creating a new one to do it in twig.

This would apply to all teams and be enforced in CI for all repos that have it.

Pitfalls

I can see a few pitfalls to this:

  • We may be missing some utilities in twig to easily convert our PHP views. (url, link, firing event to capture extra HTML output, etc).
  • Certain conventions are established yet. (how would you convert just part of a giant helper_functions view for example).

I've created an issue to try and solve these pitfalls before putting the rule in effect.


Comments

  • I'm not a 100% yes on this because I don't think we should saddle devs with switching out entire views just because they want to fix one bug in it.

  • Then that would be a no? The proposal is to do exactly that so that this is a known quantity ahead of time.

    Currently the answer is always "It's not in scope" or "There's not time". Somehow there is time though for Team B to triage the hackerone tickets, pay out a $600 bounty, and backport the fix to multiple branches though.

    To me, it's a similar mentality we have to take to the rest of our lint rules. It's technical debt. If we want to start cleaning it up, we can't add more, and each need to make incremental fixes along the way.

    If you edit a function with no docblocks, our linter requires you to fix them before getting your PR merged.

    Similarly if you touch a really long line, you have to break it up or your PR will fail in CI. You might not have originally written the very long line, wrote the undocumented function, or created the view in PHP, but if we want things to progressively better over time we all need to commit to taking an extra 5 minutes on certain PRs. If something is so critical that there isn't 5-10 minutes for this, we're probably going to want to bypass the normal process anyways.

  • I'm (the only one) in the "no" camp. I 100% agree any and all new views/HTML need to be done in Twig. Where you're losing me is that anytime I touch any view I have to convert it to Twig. I think converting the old views is something we should do, but I don't think I should be responsible for rewriting a view just to touch it. This will likely lead to developers taking the "lazy" route and hacking in some fix that doesn't require an update to the view in order to save time.

    I think it would be better to create real issues (e.g. "Convert all the views in vanilla/categories") and epics that are actually sprinted to convert these older views. We know we need to convert them. Let's make it a priority and convert them in earnest.

  • Same here. I agree we should improve our views and all new views should be in Twig. I think what you're proposing is risky. Also, we have so much customizations that could easily break here.

    I really do want to update our HTML. I think we'll need a strategy to update everyone without breaking the world.

  • My idea for updating things was to make self contained modules. Example, we make a module in react to replace the discussions list. Then we could have a way to swap out the existing view with that. Slowly, we can replace the old with the new.

  • Unknown
    edited September 2019

    @Ryan

    I think it would be better to create real issues (e.g. "Convert all the views in vanilla/categories") and epics that are actually sprinted to convert these older views. We know we need to convert them. Let's make it a priority and convert them in earnest.

    Do you feel similarly about the rest of our coding standards? Those have the same behaviour with the forum codebase, where section you modify need to be brought up to date.

    How would you feel about it if went a little bit differently:

    • Ban modification or adding HTML in any PHP file that is not in a /views/* directory (should be TwigRenderTrait ). These tend to be much smaller snippets anyways.
    • Any view that has an XSS in it fixed gets converted to twig.
    • All new views are in twig (kind of doing it already, but really need to push it in services).
    • Each product team commits to converting 1 view per sprint.

    Without any ongoing commitment to it I don't think it's going to get anywhere in tackling the very risky tech-debt problem that we're dealing with here.

    @slafleche

    Also, we have so much customizations that could easily break here.

    What are you worried would be breaking here? I'm not talking about changing the HTML structure here. I'm talking about 1:1 conversions that keep the same markup and fire the same events. The real difference is

    1. Escaped by default.
    2. Views are clearer to read (this would make converting them in the future easier).
  • What are you worried would be breaking here?

    Everything! There are so many things to worry about including customizations, plugins, overwrites. I don't think it's worth it. There are so many rabbit holes this can lead into.

    I think we're better off looking to the future and finding a way to update clients to new HTML.

  • Do you feel similarly about the rest of our coding standards? Those have the same behaviour with the forum codebase, where section you modify need to be brought up to date.

    I think the time invested in most of those is trivial compared to rewriting a view. If I have to spend 15 seconds slapping a comment block on method or five seconds breaking up an array/argument list across multiple lines, it's whatever. Rewriting templates isn't going to be a huge time sink in most cases, but it'll certainly be incomparable to the amount of time necessary to fix most legacy coding standard issues, now.

    All new views are in twig (kind of doing it already, but really need to push it in services).

    Each product team commits to converting 1 view per sprint.

    These two are definitely inline with what I'm thinking. New stuff should certainly be in Twig. We also need to be proactive, rather than reactive, to updating these templates. I'm all for converting batches of views (e.g. everything under Vanilla's "categories" view folder or everything under Dashboard's "activity" view folder), per sprint. It'll be easier to estimate and plan that way. I think we'll be able to move through these very quickly, once we get started, and we won't need to wait for the next security issue to come in to make updates we already know we need to do.

  • I do agree with @Ryan and @slafleche here and add one more argument that converting legacy views definitely doesn't take 5 minutes if done properly.

    From the services point of view this sort of thing is never measured when writing an estimate. Clients have no interest in paying for refactoring code because for them, essentially it doesn't matter. Let's not forget that Services main goal is to pay for itself.

    My point of view is the same as @Ryan , this sort of thing should be properly scoped and not just "done on the run".

  • @Isis

    From the services point of view this sort of thing is never measured when writing an estimate.

    Maybe it should be? That's part of what this proposal is. Do we have a mandate from management that we need to cut all of the corners to take 30 minutes off of the project time?

    @slafleche

    Everything! There are so many things to worry about including customizations, plugins, overwrites.

    Customizations and plugins have no effect on the original view. I'm saying we should still fire the same events, and plugin overwrites don't care if the view extension is .php, .tpl or .twig. Not sure in what way you expect that to break anything downstream.

    @Isis

    That converting legacy views definitely doesn't take 5 minutes if done properly.

    How much of that is due to unfamiliarity with twig and lack of certain utilities? Maybe the first couple of times would take longer, which is why I've taken some time in my current sprint to convert a few complex views with events, form values, various function calls, etc and see what infrastructure we are missing.

    What about the alternate version I proposed (leaving existing views in the /views folder alone)? It would still require modified HTML outside of the /views directory to be converted. That would mean echo calls in themehooks. I understand some of our core views (especially involving lots of table functions) are particular complex and may need a little bit more effect, but your average random themehooks echo is far simple to replace with a call to $this->renderTwig().

  • It feels like implementing this 100% immediately creates a minefield of potential surprises, while doing it incrementally doesn't seriously impact the benefit while very much mitigating the potential for surprise costs.

    Do we have a mandate from management that we need to cut all of the corners to take 30 minutes off of the project time?

    No, but there would probably be a management mandate that implementing a policy like this would then put R&D on the hook to unblock a services developer that comes across a view they cannot figure out how to safely update, or did not anticipate needing to touch the view when writing the estimate.

    It's a sound goal; the timeline to full compliance feels aggressive for the sake of being aggressive.

  • I think we all want to do this, but it's just a question of priorities. Everyone has already given some incremental ideas in this thread. Given that we would implement this incrementally no matter what, let's order the steps we would take to get to 100% beautiful Twig.

    1. All new views are made in twig. This is insignificant extra work and we should do it.
    2. What about new views that don't have the correct helper functions or other necessary access to our framework? This is the reason why Smarty never took off for us; we didn't have the ability to do the work back then. I think our team is much stronger now than back then. However, in order to do this in agile we must have a commitment from R&D that they will patch the Twig integration layer if any other dev gets blocked by that no matter what. I can do my part in this and negotiate away points in the sprints to accommodate as best as possible, but only the development teams can modify sprints, so it is your decision on whether or not to do this.
    3. Finally, we have our very complex views that are likely difficult to ensure that we faithfully translate them 100%. None of our current views are backed by tests so we have no failsafes against mistakes. Shouldn't new views also require tests? Keep in mind that this whole change is being proposed because of bugs caused by simple human error. Verily, this change has a similar potential for many other simple human errors. That decreases the value in doing this.

    I feel like I've given an intellectually honest appraisal for number three above. Does that mean we shouldn't do it? Absolutely not. I don't want to be a company that doesn't strive to make positive changes like this. The good news is that I think we do make positive changes, and we've been getting better at making those changes recently.

    That being said. I really want you all to think through these tactical decisions with some more points in mind:

    1. What are the alternatives? In business there is a term called "next best alternative" that is useful in decision making. What is the next best thing we can do? For example, we could fairly easily write a lint rule that detects 99% of XSS vulnerabilities in our PHP views. Is doing that provide more net value to us? We need to ask ourselves to weigh alternatives to help us make better decisions.
    2. What else could we be doing instead of this. This is the job of our backlog. I think if there is an impression that management thinks stuff like this is "not in scope" then that is a failing on my part. The backlog is quite clear on product development, but not on the issue of tech debt. I can help sort some tech debt into the backlog. That is primarily the job of the engineering team though, and I think you should include some of the larger pain points of support. I can tell you that our notifications scaling issue is causing a lot more unhappiness at Vanilla right now. We prioritize engineering goals alongside product goals and devs can advocate for any goals during any planning meeting.

    In the end you may decide: "I'm a human and just want to do this or that because it will satisfy me." That's totally fine. I will try my best to support you. Development is an aesthetic exercise, and our creativity and best work can only flourish if we now and then take the time to scratch those itches "just because".

  • Unknown
    edited September 2019

    Objections

    We don't account for tech-debt in estimates

    @Isis

    From the services point of view this sort of thing is never measured when writing an estimate. Clients have no interest in paying for refactoring code because for them, essentially it doesn't matter. Let's not forget that Services main goal is to pay for itself.

    I'm not proposing that everyone continues with the same estimates if they have to modify an existing view or that we add line items to our estimation process for tech-debt.

    I'm proposing that we include scope for the tech-debt in normal deliverable issues.

    For example, delivering the product management UI in the dashboard, there was 1 task for UI implantation. I did not have 5 separate tasks for

    • Fixing our dashboard measurement units that conflicted with our new components from knowledge base.
    • Resolving conflicts between our new modals, and the bootstrap modals used in the dashboard.
    • Creating new view primitives in React for the dashboard that interop with existing stylesheets.
    • Adding these components to the storybook.
    • Creating an interop layer between our old form, and new react elements in the form.
    • Writing unit tests for the data manipulations done.

    These were all part of "the definition of done". They were implementation details of the greater task.

    It's too fast

    @Linc

    It feels like implementing this 100% immediately creates a minefield of potential surprises, while doing it incrementally doesn't seriously impact the benefit while very much mitigating the potential for surprise costs.

    What if, before implementing a hard stop in linter against, we issue warnings, and promote awareness of it through code review for a month before flipping the flag in the linter?

    This would allow additional time for problem areas to be discovered. I don't think leaving a subset of developers to be responsible for this is an "incremental" approach. I think it just compartmentalizes the problem.

    Some views are too complex

    @Linc

    No, but there would probably be a management mandate that implementing a policy like this would then put R&D on the hook to unblock a services developer that comes across a view they cannot figure out how to safely update, or did not anticipate needing to touch the view when writing the estimate.

    From my experience most estimates contain ~4 hours for code review, or at least they did when I was still reviewing estimates. That should leave plenty of time to either

    • Unblock the developer.
    • Decide to make an exception for a particular view because it is too complex for that timeline.

    I would satisfied to keep an exception list for views that are particularly problematic use that is a priority list of views that should be tackled pro-actively by one of the product teams.

    Potential for breakage in views

    I feel like this could be challenging for certain views in particular. Unfortunately almost none of our existing PHP views could be subject to tests without first decoupling some data fetching that can occur in them. With those first moved to the controller we could more easily do snapshot tests for views with complex logic.

    I think we have a lot of other parts of the application that are in more dire need of tests than the view layer though (like portions of React view layer that are untested.)

    We shouldn't refactor existing views. Make new versions instead.

    @slafleche

    I think we're better off looking to the future and finding a way to update clients to new HTML.

    My idea for updating things was to make self contained modules. Example, we make a module in react to replace the discussions list. Then we could have a way to swap out the existing view with that. Slowly, we can replace the old with the new.

    Unfortunately the timeline for this type of thing is far too long for it to be of any use. We would then be maintaining 2 views, and still have potential security holes in the old one.

    Bit of scope creep there too. We need to do this, and may be starting it this year, but it doesn't solve this problem. This is also why I'm not proposing just going in to refactor all existing views in one go. Some views may never need to be touched again until we retire them, and I'm ok with that. The ones we do need to update in the meantime should be updated to follow best practices in my book.