URL Encoding: What Gets Percent-Encoded and Why

A URL can only safely contain a limited set of characters. Everything else, including spaces, accented letters, and any &, ?, or # that is meant as data rather than structure, has to be percent-encoded: each byte is replaced with a % followed by its two-digit hex value. Get the rules wrong and your query parameters silently break, your & splits a value into two, or a + turns into a space where you didn't want one.

What percent-encoding actually does

Each unsafe character is converted to its UTF-8 bytes, and each byte becomes %XX in hexadecimal:

space  → %20
&      → %26
=      → %3D
?      → %3F
#      → %23
é      → %C3%A9   (two bytes in UTF-8)

That last one matters: non-ASCII characters can become multiple %XX pairs, because they're more than one byte in UTF-8. é isn't %E9 — it's %C3%A9.
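To make the byte-by-byte process concrete, here is a minimal sketch of a hand-rolled encoder. The function name percentEncode is hypothetical, and in real code you would normally reach for the built-in encodeURIComponent instead; this version exists only to show the UTF-8 step explicitly.

```javascript
// Hypothetical helper (not a built-in): percent-encode every character
// outside the RFC 3986 unreserved set, one UTF-8 byte at a time.
function percentEncode(text) {
  const unreserved = /^[A-Za-z0-9\-_.~]$/;
  const encoder = new TextEncoder(); // TextEncoder always emits UTF-8
  let out = "";
  for (const ch of text) { // for...of walks code points, not UTF-16 units
    if (unreserved.test(ch)) {
      out += ch; // safe as-is
      continue;
    }
    for (const byte of encoder.encode(ch)) {
      // Each byte becomes % plus two uppercase hex digits
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("\u00e9"));  // é as one code point -> "%C3%A9"
console.log(percentEncode("a b&c"));   // "a%20b%26c"
```

Note that this sketch is stricter than encodeURIComponent, which leaves a few extra characters such as ! and * untouched.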

Reserved vs unreserved characters

The URL spec (RFC 3986) splits characters into groups:

Unreserved — A–Z a–z 0–9 - _ . ~. These are always safe and never need encoding.
Reserved — characters with structural meaning in a URL: : / ? # [ ] @ ! $ & ' ( ) * + , ; =. These are fine when they're doing their job (the / between path segments, the ? before the query) but must be encoded when they appear inside a value.

The whole trick to URL encoding is that last point. A & between two query parameters is structure. A & inside a parameter's value (say, a company name "Tom & Jerry") must become %26, or the parser will think a new parameter started.

?company=Tom%20%26%20Jerry      ✓ value is "Tom & Jerry"
?company=Tom%20&%20Jerry        ✗ parsed as company="Tom ", then a stray parameter named " Jerry"
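You can watch a parser make that mistake. The sketch below feeds both versions of the query string (with the spaces written as %20) to the standard URLSearchParams class; the variable names good and bad are just for illustration.

```javascript
// Same company name, with and without the & encoded.
const good = new URLSearchParams("company=Tom%20%26%20Jerry");
const bad = new URLSearchParams("company=Tom%20&%20Jerry");

console.log(good.get("company")); // "Tom & Jerry"
console.log(bad.get("company"));  // "Tom " (the value was cut at the raw &)
console.log([...bad.keys()]);     // [ "company", " Jerry" ] (a second, unintended parameter)
```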

Why spaces are sometimes %20 and sometimes +

This confuses everyone. There are two encoding contexts:

In the path and most of a URL, a space is %20.
In a query string using application/x-www-form-urlencoded (the classic HTML form format), a space is +, and a literal + must be encoded as %2B.

So %20 and + can both mean "space," depending on where you are. If an HTML form or a form-style serializer builds your query string and the spaces come out as +, that's why, and it's correct for form-encoded data. The flip side: a real + (like in a phone number +1...) must be %2B, or it'll be read as a space.
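In JavaScript the two contexts map onto two different built-ins, which is a handy way to see the difference side by side. This is a small sketch; the phone number is a made-up example value.

```javascript
const phone = "+1 555 0100"; // illustrative value only

// encodeURIComponent: space becomes %20, literal + becomes %2B
const generic = encodeURIComponent(phone);
console.log(generic); // "%2B1%20555%200100"

// URLSearchParams serializes as form-urlencoded: space becomes +, literal + becomes %2B
const form = new URLSearchParams({ phone }).toString();
console.log(form); // "phone=%2B1+555+0100"

// Parsing form-encoded data: an unencoded + is read back as a space
const misread = new URLSearchParams("phone=+1+555+0100").get("phone");
console.log(misread); // " 1 555 0100" (the leading + is gone)

// decodeURIComponent does not apply the form rule; + stays +
console.log(decodeURIComponent("a+b")); // "a+b"
```

The last line is worth remembering: if you decode form-encoded data with decodeURIComponent, the + signs will not turn back into spaces on their own.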

The encodeURI vs encodeURIComponent trap

JavaScript gives you two functions, and picking the wrong one is the most common URL bug:

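encodeURIComponent encodes a single value: it escapes everything except A–Z a–z 0–9 and - _ . ! ~ * ' ( ), so & = ? / # + all become %XX. Use this for each query parameter value or path segment.
encodeURI encodes a whole URL: it deliberately leaves structural characters such as & = ? / : # + alone, so it cannot protect a value that happens to contain them.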

encodeURIComponent("a&b=c");  // "a%26b%3Dc"   ← right for a value
encodeURI("a&b=c");           // "a&b=c"        ← leaves & = alone

Rule of thumb: if you're assembling a parameter value, you almost always want encodeURIComponent. Reach for encodeURI only when you have a complete URL you want to make safe without breaking its structure.
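Put together, assembling a URL from parts might look like the sketch below. The helper name buildUrl, its parameters, and the example.com address are all hypothetical; the only point is that every segment and every parameter passes through encodeURIComponent individually before the structural / ? & = are added.

```javascript
// Hypothetical helper: encode each piece on its own, then join with
// the structural characters.
function buildUrl(base, segments, params) {
  const path = segments.map((s) => encodeURIComponent(s)).join("/");
  const query = Object.entries(params)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join("&");
  return query ? `${base}/${path}?${query}` : `${base}/${path}`;
}

const url = buildUrl(
  "https://example.com",
  ["files", "a/b.txt"],              // the / inside a segment must not split the path
  { q: "Tom & Jerry", tag: "c=d" }   // & and = inside values must not split parameters
);
console.log(url);
// "https://example.com/files/a%2Fb.txt?q=Tom%20%26%20Jerry&tag=c%3Dd"

// encodeURI suits the opposite case: an already-assembled URL with stray spaces
console.log(encodeURI("https://example.com/my file.txt?q=a b"));
// "https://example.com/my%20file.txt?q=a%20b"
```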

Decoding and double-encoding

Decoding reverses the process: %26 → &. Watch for double-encoding — if a % itself got encoded to %25, then %2520 is really an encoded %20, which decodes to the literal text "%20", not a space. When a value comes back looking like %2520 or Tom%2520Jerry, something encoded it twice.
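Double-encoding is easy to reproduce, which also makes it easy to recognize. The sketch below encodes a value twice and then checks for the telltale %25 followed by two hex digits. The looksDoubleEncoded function is a hypothetical heuristic, not a reliable detector: a value whose original text really contained "%20" would trigger it too.

```javascript
const once = encodeURIComponent("Tom Jerry"); // "Tom%20Jerry"
const twice = encodeURIComponent(once);       // "Tom%2520Jerry" (the % became %25)

console.log(decodeURIComponent(twice));                      // "Tom%20Jerry" (literal text, not a space)
console.log(decodeURIComponent(decodeURIComponent(twice)));  // "Tom Jerry"

// Heuristic only: an encoded % (%25) followed by two hex digits
// suggests that an already-encoded sequence was encoded again.
function looksDoubleEncoded(value) {
  return /%25[0-9A-Fa-f]{2}/.test(value);
}

console.log(looksDoubleEncoded(twice)); // true
console.log(looksDoubleEncoded(once));  // false
```

The durable fix is usually upstream: find the layer that encodes a value that was already encoded, and remove one of the two calls, instead of decoding twice on the way out.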

For a one-off — inspecting a messy redirect URL, decoding a parameter from a log, or encoding a value to drop into a query string — paste it into the URL encoder/decoder. It runs locally in your browser and handles the %XX math for you.

Takeaways

Unreserved characters (A–Z a–z 0–9 - _ . ~) never need encoding; reserved characters need it inside values.
Non-ASCII becomes multiple %XX bytes via UTF-8.
Space is %20 in paths, + in form-encoded query strings; a literal + is %2B.
Use encodeURIComponent for values, encodeURI for whole URLs.
If you see %25 showing up unexpectedly, suspect double-encoding.