How to embed a URL into another URL

Sebastian G.
6 min read · Jun 28, 2021

Photo by Chantal Lim on Unsplash

Sometimes we need to redirect a user from one webpage to another. This can happen for various reasons; in my experience, there are three main ones:

  1. When navigating, the user first has to perform some action (e.g. log in) before seeing the initially requested resource. Only after the action succeeds is the user forwarded to that resource.
  2. Some sort of tracking is in place, which first routes the user to a tracking URL that fires a bunch of tracking events and then redirects the user to the original resource.
  3. Similar to the second case, there are URL shorteners all over the web, most prominently bitly.com, which try to make very long URLs more readable for human beings.

In all of the above cases, we need to make sure that the user ends up at exactly the URL they tried to access, even though we route them through another URL.

Generally, there are two approaches to doing so:

  1. add the URL to a GET parameter
  2. first transform the URL into a unique ID, which is then handed over via a GET parameter

Both of these approaches have their pros and cons, which I would like to discuss here.

Add URL to a GET-parameter

Different parts of a URL (taken from MDN)

A URL consists of the above parts (scheme, authority, path, parameters, anchor).

We are especially interested in the parameters (highlighted in light blue), because this is where we need to embed our target URL.

An example (with a problem)

Let’s have a look at an example to make this clearer. We want to read article 123 on domain.com in German. We call

https://www.domain.com/articles/123?lang=de_DE 

However, we first need to login to the platform to access the article, so we get redirected to:

https://www.domain.com/login?dest=<target_url> 

Pretty simple, right? Written out in full, this would be

https://www.domain.com/login?dest=https://www.domain.com/articles/123?lang=de_DE

Looks a tad weird, but still readable. This could actually be OK. However, what if the underlying system requires us to add another parameter to the login URL, something like force=1? This would then result in:

https://www.domain.com/login?dest=https://www.domain.com/articles/123?lang=de_DE&force=1 

Now the problem is that nothing in this URL tells a parser whether force=1 belongs to the outer (login) URL or the inner (article) URL, which usually leads to problems in later processing.
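To make the ambiguity concrete, here is a small sketch using the standard URL class (available in browsers and Node.js). The second URL, with an inner page=2 parameter, is a hypothetical variant added purely for illustration.

```javascript
// The unencoded redirect URL from the example above.
const outer = new URL(
  "https://www.domain.com/login?dest=https://www.domain.com/articles/123?lang=de_DE&force=1"
);

// The parser splits the query on "&", so force=1 is attributed to the outer URL.
const dest = outer.searchParams.get("dest");   // "https://www.domain.com/articles/123?lang=de_DE"
const force = outer.searchParams.get("force"); // "1"

// Hypothetical variant: the inner URL carries a second parameter of its own.
const lossy = new URL(
  "https://www.domain.com/login?dest=https://www.domain.com/articles/123?lang=de_DE&page=2"
);

// page=2 was meant for the article, but it is cut off from the inner URL
// and ends up as a parameter of the login URL instead.
const lossyDest = lossy.searchParams.get("dest"); // "https://www.domain.com/articles/123?lang=de_DE"
const strayPage = lossy.searchParams.get("page"); // "2"

console.log({ dest, force, lossyDest, strayPage });
```

The parser always picks the outer URL, which is only correct when that happens to be what we meant.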

The solution

The solution is to encode every special character of the inner URL into “something else”, so that the ambiguity is resolved.

If you are a JavaScript nerd like me, you probably already know the function encodeURIComponent() and its counterpart decodeURIComponent(). Nearly every language has its own version of them.

If we encode our inner URL through the function, it will look like this:

encodeURIComponent("https://www.domain.com/articles/123?lang=de_DE");
// https%3A%2F%2Fwww.domain.com%2Farticles%2F123%3Flang%3Dde_DE

Now this can easily be added as a parameter, because reserved characters such as ?, = and & in the inner URL are encoded and will not interfere with the parameters of the outer URL:

https://www.domain.com/login?dest=https%3A%2F%2Fwww.domain.com%2Farticles%2F123%3Flang%3Dde_DE&force=1
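In practice you rarely have to call the encoding function by hand. As a minimal sketch, the standard URL and URLSearchParams APIs encode a value when it is set and decode it again when it is read. The helper names buildLoginUrl and readDestination are made up for this example.

```javascript
// Build the login URL; searchParams.set() percent-encodes the value for us.
function buildLoginUrl(targetUrl, extraParams = {}) {
  const loginUrl = new URL("https://www.domain.com/login");
  loginUrl.searchParams.set("dest", targetUrl);
  for (const [name, value] of Object.entries(extraParams)) {
    loginUrl.searchParams.set(name, value);
  }
  return loginUrl.toString();
}

// Read the target back; searchParams.get() returns the decoded value.
function readDestination(loginUrl) {
  return new URL(loginUrl).searchParams.get("dest");
}

const target = "https://www.domain.com/articles/123?lang=de_DE";
const redirect = buildLoginUrl(target, { force: "1" });

console.log(redirect);
console.log(readDestination(redirect) === target); // true
```

One thing to keep in mind: URLSearchParams uses form encoding, so it writes a space as + where encodeURIComponent() writes %20. Both round-trip correctly as long as the same API family is used for reading.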

On the pro side, this solution is easy and simple to set up. Unfortunately, the approach has two downsides.

The first downside is the maximum length of a URL (2048 characters, according to Google). This is quite a lot, but depending on the lengths of the outer and inner URLs, the limit can be reached pretty fast, and even faster if the inner URL itself embeds another URL (#inception), because every additional layer of encoding makes the string longer.
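The growth is easy to see in a small sketch. The 2048 figure is the limit quoted above, taken here as an assumption; the /track URL and the helper names embed and fitsLimit are hypothetical.

```javascript
// Limit quoted above; treated as an assumption, not a universal constant.
const MAX_URL_LENGTH = 2048;

// Embed innerUrl as the encoded "dest" parameter of outerBase.
function embed(outerBase, innerUrl) {
  return `${outerBase}?dest=${encodeURIComponent(innerUrl)}`;
}

function fitsLimit(url) {
  return url.length <= MAX_URL_LENGTH;
}

const article = "https://www.domain.com/articles/123?lang=de_DE";
const login = embed("https://www.domain.com/login", article);
// Hypothetical second layer: a tracking URL that wraps the login URL.
const tracked = embed("https://www.domain.com/track", login);

// Each layer encodes the previous one again: ":" becomes "%3A", then "%253A".
console.log(article.length, login.length, tracked.length);
console.log(fitsLimit(tracked));
```

Every special character costs three characters after the first round of encoding and five after the second, so nested URLs grow faster than their raw length suggests.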

The second downside is a security one. The target URL might contain sensitive information, and all parameters of the outer URL can be read by an admin of the server that serves the outer URL, so writing the target URL in plain text could leak that information.

You might say that these days no sensitive information is put into URLs in plain sight, but you’d be surprised how often such things happen. I mean, we live in a time where you can successfully search for credentials on GitHub.

In addition, the solution above does not help with our third use case, where we actually want to shorten a URL…

Transform URL to unique ID

In order to free two birds with one key, that is, to address both the security issue and our third use case, there is a second technique that you can use.

Given that you have access to the code that is running on the server, you could dynamically calculate/define a short identifier for each URL that you want to share via another URL and simply use this identifier instead of the URL.

Staying with our example from above with article 123, we would then first need to transform the URL into a unique identifier:

https://www.domain.com/articles/123?lang=de_DE
transform_into_id(...) -> h6f78bg

We then update our login-redirect URL to:

https://www.domain.com/login?dest=h6f78bg

When you then log in on the page, the server has to look up the given identifier in the database and, if it exists, return the assigned URL.
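As a rough sketch of that flow in Node.js, assuming randomly assigned identifiers: a Map stands in for the database, and transformIntoId and resolveId are illustrative names, not an existing API.

```javascript
const crypto = require("crypto");

// In-memory stand-in for the database table (identifier -> URL).
const urlsById = new Map();

// Assign a random identifier to the URL and store the pair.
function transformIntoId(url) {
  let id;
  do {
    id = crypto.randomBytes(5).toString("hex"); // 10 hex characters
  } while (urlsById.has(id)); // retry in the unlikely case of a collision
  urlsById.set(id, url);
  return id;
}

// Look the identifier up again after a successful login.
// Returns null when the identifier is unknown.
function resolveId(id) {
  return urlsById.get(id) ?? null;
}

const id = transformIntoId("https://www.domain.com/articles/123?lang=de_DE");
const loginUrl = `https://www.domain.com/login?dest=${id}`;

console.log(loginUrl);
console.log(resolveId(id));        // the original article URL
console.log(resolveId("unknown")); // null
```

A real implementation would also have to decide where to send the user when the identifier is unknown, for example to a default landing page.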

The code that generates the identifier lives on the server, so it is not disclosed to outsiders. On the other hand, the client code needs to know at some point which identifier to navigate to. This is usually handled beforehand in a POST request, whose body typically does not show up in the log files, and is hence more secure than our first approach.

Calculate/define such identifier

I do not want to go into too much detail on how this happens. To stay at our current flight level, there are two approaches:

  1. assign identifier randomly to each URL and store in database
  2. calculate a short identifier that can be easily transformed back into the original URL

The implementation of the first option is rather easy, if you set aside the problems that come with maintaining a database of limited physical size (see below).

“What size does my database need?”

This depends heavily on usage: the number of URLs you expect to be shared and their typical length. bit.ly will need more space than someone building a small side project with only a couple of URLs to shorten. You would also have to decide whether to enforce an automatic-deletion policy, e.g. after 90 days, to free up space in your database. But this is a topic of its own, so we will not dive deeper into it. If you would like me to write about this, please drop me a line.
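For a first back-of-the-envelope number, the factors above can be multiplied out. Every figure in this sketch is a made-up assumption (1000 URLs per day, 120 bytes per URL, 50 bytes of per-row overhead), and estimateStorageBytes is a hypothetical helper.

```javascript
// Rough estimate: rows kept at any time, multiplied by bytes per row.
function estimateStorageBytes({ urlsPerDay, retentionDays, avgUrlBytes, idBytes, overheadBytes }) {
  const rowsKept = urlsPerDay * retentionDays;
  const bytesPerRow = avgUrlBytes + idBytes + overheadBytes;
  return rowsKept * bytesPerRow;
}

// All numbers below are illustrative assumptions, not measurements.
const estimate = estimateStorageBytes({
  urlsPerDay: 1000,
  retentionDays: 90, // the automatic-deletion policy mentioned above
  avgUrlBytes: 120,
  idBytes: 10,
  overheadBytes: 50, // indexes, timestamps and similar bookkeeping
});

console.log(`${(estimate / 1e6).toFixed(1)} MB`); // 16.2 MB
```

Without a deletion policy, retentionDays has no upper bound and the estimate simply keeps growing with the age of the service.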

The second option is a symmetric encrypt/decrypt algorithm that calculates an identifier from a long URL and can also calculate the original long URL back from it. While this is in general rather easy to implement and comes with the advantage of not having to care about a database and its size and costs, it results in slightly larger identifiers and is risky if you ever have to update your encryption algorithm in the future.
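One possible shape of this, sketched with Node's built-in crypto module and AES-256-GCM. The choice of cipher, the key handling and the names urlToToken and tokenToUrl are assumptions for illustration, not a recommendation.

```javascript
const crypto = require("crypto");

// Hypothetical secret key; a real service would load it from configuration
// so that tokens survive a restart.
const KEY = crypto.randomBytes(32);

// Encrypt the URL into a URL-safe token: IV (12 bytes) + auth tag (16 bytes) + ciphertext.
function urlToToken(url) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", KEY, iv);
  const encrypted = Buffer.concat([cipher.update(url, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, encrypted]).toString("base64url");
}

// Decrypt the token back into the URL; throws if the token was tampered with.
function tokenToUrl(token) {
  const raw = Buffer.from(token, "base64url");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const encrypted = raw.subarray(28);
  const decipher = crypto.createDecipheriv("aes-256-gcm", KEY, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(encrypted), decipher.final()]).toString("utf8");
}

const articleUrl = "https://www.domain.com/articles/123?lang=de_DE";
const token = urlToToken(articleUrl);

console.log(token.length > articleUrl.length);  // true
console.log(tokenToUrl(token) === articleUrl);  // true
```

The token hides the URL from anyone without the key, but with the IV and authentication tag included it comes out longer than the URL itself, so this variant does not help with shortening. Changing the key or the algorithm later invalidates every token that is already in circulation.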

Long story short

I presented two different ways to embed a URL into another URL in order to redirect a user after a successful action: by encoded embedding or by transformation into an identifier.

I showed the pros and cons of both techniques without a clear winner; it depends on the use case.

Happy coding 🤓

Discussion

I am very interested to hear your thoughts and experiences regarding this or other approaches.

  • Do you have anything to add to the possibilities above?
  • Any fun war-stories that you want to share?

If you are interested in connecting with me, please do so, and drop me a short line about where you are coming from.
