Reverse Proxy - What IS the big deal?
I'm working with Immediate Media. They are currently live with Gardener's World, but are working on:
- Made for Mums
- You and Your Wedding
- Sky at Night
- Who do you think you are?
- Hitched
- Bike Radar
- TBD
Many stakeholders are involved in these projects and so the IM team and I are fielding a lot of questions and concerns, and are struggling to reassure them, and to deter some "out of the box" thinking.
The main question that keeps coming back is around sub-domain vs. subfolder.
This client came on board with the idea of using hubnode, but this was quickly thrown out when it became clear that each community needed its own top-level domain.
Now, they initially agreed that sub-domains should be fine, but stakeholders and SEO teams were a bit iffy about it.
Now with the 2nd, 3rd, and 4th projects underway, the question is coming back again, and this time they've come up with their own solution:
As you know, one of the big areas of concern regarding SEO is the sub-domain / sub-directory issue, which is probably the most visible to our stakeholders. We’ve been having a little play around here and may have found a solution, by using a very light piece of middleware. Below is a summary from one of our lead developers:
I set up a single route on the default domain, with a path of “/forum/”. Any page request that begins with “example.com/forum/” will be handled by this route.
As soon as a page request to this route is received, it breaks off the part of the URL after “/forum/”.
It then uses this to send a page request to “forum.example.com”.
For example:
“example.com/forum/discussion/1010386/how-to-post-a-new-discussion” will send a page request to “forum.example.com/discussion/1010386/how-to-post-a-new-discussion”.
It then takes the content it gets from that page request, and returns it to the user.
There are also a few rules set up to handle things like relative links on the page, as well as absolute links to the original “forum.example.com”, so that the end result closely resembles the original page, with working links that all refer to “/forum/”.
There are further discussions and investigation needed for registering and logging in, as well as the handling of cookies, and existing redirects, so this is by no means a complete solution yet.
What they are describing here is essentially a reverse proxy, from what I've been told.
Ops and R&D have been pretty clear that reverse proxy is not an option, and that it has security problems, but the consensus has been that we need a clear and consistent way of communicating to clients that this is not an option and why. Immediate Media has been told, but they don't understand why, so they are, of course, still asking questions.
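The URL mapping step in the developer's summary can be sketched roughly as follows. This is only an illustration of the idea, not Immediate Media's middleware: the function name and constants are hypothetical, and only the "/forum/" prefix and the "forum.example.com" host come from the description above.

```python
from urllib.parse import urlsplit, urlunsplit

# Values taken from the description above; the names are illustrative only.
PUBLIC_PREFIX = "/forum/"          # route on the main domain
ORIGIN_HOST = "forum.example.com"  # host the request is forwarded to


def to_origin_url(public_url: str) -> str:
    """Map example.com/forum/<rest> to forum.example.com/<rest>."""
    parts = urlsplit(public_url)
    if not parts.path.startswith(PUBLIC_PREFIX):
        raise ValueError("URL is not handled by the /forum/ route")
    # Break off everything after "/forum/" and reuse it as the origin path.
    rest = parts.path[len(PUBLIC_PREFIX):]
    return urlunsplit(
        (parts.scheme, ORIGIN_HOST, "/" + rest, parts.query, parts.fragment)
    )
```

The sketch covers only the address translation. Fetching the page, rewriting links, cookies, login, and redirects are the parts the summary itself flags as unresolved.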
What is the messaging here? Why is reverse proxy not an option?
Comments
-
Sorry, I am new to the team and to the project and don't know the whole project well.
But:
For example:
“example.com/forum/discussion/1010386/how-to-post-a-new-discussion” will send a page request to “forum.example.com/discussion/1010386/how-to-post-a-new-discussion”.
Let's say VF receives the request as
forum.example.com/discussion/1010386/how-to-post-a-new-discussion
That means some links on the page will have forum.example.com as the domain name. Examples:
<link rel="canonical" href="https://forum.example.com/discussions" />
<meta property="og:url" content="https://forum.example.com/discussions" />
<a href="https://forum.example.com/discussion/1/welcome-to-awesome#latest">Welcome to awesome!</a>
And some others. We do use relative links in most cases, but some links still include the domain name, and those are pretty important for SEO. When those few (or many) links point to another domain, it can cause cookie, authorization, or page "behavior" conflicts.
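To get a feel for how many of these absolute links a page carries, something like the sketch below could be run against the rendered HTML. It is a rough illustration using Python's standard html.parser. The class and function names are made up for the example, and the set of attributes it checks (href, content, src) is an assumption, not a complete list.

```python
from html.parser import HTMLParser

ORIGIN_HOST = "forum.example.com"  # host from the example above


class OriginLinkFinder(HTMLParser):
    """Collect (tag, attribute, value) for attributes that name the origin host."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... /> and <meta ... />.
        for name, value in attrs:
            if name in ("href", "content", "src") and value and ORIGIN_HOST in value:
                self.found.append((tag, name, value))


def find_origin_links(html: str):
    finder = OriginLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Every entry it returns is a place where a proxy would have to rewrite the markup, or where the page would still point users and crawlers at the original subdomain.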
0 -
https://github.com/vanilla/internal/tree/master/plugins/reverseproxysupport
^ This handles all the technical problems from our side.
The problem is the technical problems that can happen on the client's side. Setting up a reverse proxy increases the risk that something can go wrong (uptime). It also increases support and many other factors.
1 -
and many other factors.
I think Val is looking for specifics and therefore the reason for the post.
I am lucky that we approved Qualtrics using the reverse proxy plugin on their site as that would have been a painful debate with their Head of SEO (we had numerous other painful conversations). @AlexanderKim If you want to see a live example you can check out that community.
There's also the fact that Qualtrics was really only able to do it because the reverse proxy plugin existed in the first place. It existed because it was approved as a service for Big Fish Games. Therefore, there are at least two occasions where we have said "okay we can do this".
I won't really comment on the support or security because it's not my place to say. So far there really haven't been any major issues with Qualtrics reverse proxy support, to my surprise actually. I know we took a bit of extra time to debug their last release but then ended up accidentally deploying to them anyway on the weekend. I know for one, we aren't allowed to use VF Spoof with Qualtrics because of how it pings their server when we do that and it isn't a 'true' SSO.
Like I mentioned in hipchat the other day, we need to find a way to squash this subdomain vs subfolder debate once and for all. I know from my own Googling it appears you could maybe give a 55-45 ratio to subfolders being 'superior', but ultimately subdomains are the industry standard for any hosted solution, e.g. SalesForce, Hubspot, Lithium.
That said, maybe it's worth figuring out how to make the Reverse Proxy safe and scalable and only offer it on our Enterprise plan as a differentiating factor.
1 -
Thanks @BrendanParm. The customer understands that:
The problem is the technical problems that can happen on the client's side. Setting up a reverse proxy increases the risk that something can go wrong (uptime). It also increases support and many other factors.
But, in order to put the conversation to bed, he wants to know why.
0 -
Reverse proxy means the customer's server is sitting between the user's browser and the response from our servers, acting as a traffic cop intermediary. That means they are inserting themselves as a dependency of our cloud system, which defeats one of the goals of using a cloud provider (independence from self-hosted systems).
There are pretty dramatic downsides:
- Increases response times, because it's making a double hop - first to their servers, then our servers (similar to embedding, really). Response time is known to impact SEO (as announced by Google), whereas the impact of subfolder vs subdomain is hearsay.
- Makes uptime dependent on the customer.
- Makes troubleshooting networking issues more difficult, because we do not control the first hop in the request / response chain.
- Compromises the security of our user spoofing system, used to investigate support issues on communities. We cannot securely use vfspoof on a reverse proxy-enabled site because our credentials would be passed through the customer's servers.
15 -
Would this mean that Qualtrics's SLA guaranteed uptime is not exactly true? I suppose if we deployed code that broke their site we could be liable. However, if they are subject to a DDoS attack and the traffic is being routed through Akamai on their end, then it wouldn't really be our bad? Could be sticky.
0 -
This ticket exists because they are using the Reverse Proxy plugin.
So to fix these caching issues we implemented a fix where we return the Vary Header (https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Vary) with the values "Accept-Encoding, Cookie"
If you request https://www.qualtrics.com/community/ you can see that the Vary Header is set to "Accept-Encoding"
By adding my WAN IP to the ReverseProxySupport.Redirect.ExcludedIPs configuration and by requesting https://qualtrics.vanillacommunities.com/ you can see that the proper values ("Accept-Encoding, Cookie") are returned.
This means that their proxy is dropping that header and setting it to the default value.
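The comparison described above boils down to diffing two Vary header values. A minimal sketch, with hypothetical helper names and only the two header values quoted in this comment as inputs:

```python
def parse_vary(header_value: str) -> set:
    """Split a Vary header into a set of lowercased field names."""
    return {token.strip().lower() for token in header_value.split(",") if token.strip()}


def dropped_vary_fields(origin_vary: str, proxied_vary: str) -> set:
    """Return the fields the origin sent that are missing behind the proxy."""
    return parse_vary(origin_vary) - parse_vary(proxied_vary)


# Header values quoted above: direct from our servers vs. through their proxy.
missing = dropped_vary_fields("Accept-Encoding, Cookie", "Accept-Encoding")
```

With those two values, `missing` contains only `cookie`, which is the field the proxy is dropping.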
We want to update our caching strategy so that we use CloudFlare instead of Varnish for caching, and there's a high chance that we'll either have to specifically exclude Qualtrics in our rules or we'll have to wait for them to update their proxy to make sure that all the headers we need returned are returned properly.
TL;DR; Having clients using the reverse proxy = More support, SLA exception, More work/problem on Dev/Ops side
1 -
Would this mean that Qualtrics's SLA guaranteed uptime is not exactly true?
It's a very big asterisk.
0 -
I've rewritten our "Custom Domains" help page to incorporate all the points from this discussion as well as my arguments to Immediate Media (which I believe were successful in dissuading them from proceeding with such a setup).
1 -
I worked on this issue with Ryan the past week and remembered this thread, so I'm just going to pop this here: https://github.com/vanilla/support/issues/1697
Another interesting thing to do, if someone wants to dig deeper into this rabbit hole, is to see how Reverse Proxy has affected support: https://github.com/vanilla/support/issues?q=is%3Aissue+reverse+proxy
1 -
Not just that. Most of the time we have to dedicate a couple of extra hours of infrastructure work to ensure that this won't break Qualtrics (it ends up breaking it anyway in some cases). The complexity it adds is not worth it.
1