# How to Read Any Regex, Token by Token

Nobody reads `^(?=.*\d)[a-z0-9-]{3,16}$` at a glance. But every regex, no matter how hostile it looks, is just a sequence of small tokens read left to right — and there are only about six kinds of token. Learn to segment a pattern into them and you can decode anything you find in a codebase.

## The method: segment first, interpret second

Don't try to understand a regex whole. Split it into tokens, then read each one:

```
^          anchor: start of string
[a-z0-9]   character class: one lowercase letter or digit
+          quantifier: ...one or more times
(?:-...)   group: a hyphen followed by...
*          quantifier: ...zero or more times
$          anchor: end of string
```

That's the [slug pattern](/regex/slug) — lowercase words joined by single hyphens — and read this way it's almost prose.
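One way to keep that breakdown attached to the pattern itself is a verbose-mode regex, where whitespace is ignored and every token carries its own comment. The Python sketch below assumes the slug pattern is `^[a-z0-9]+(?:-[a-z0-9]+)*$`, reconstructed from the table above; the linked pattern page may differ in detail, and `is_slug` is a hypothetical helper name.

```python
import re

# Assumed slug pattern, reconstructed from the token breakdown in the text.
SLUG = re.compile(
    r"""
    ^               # anchor: start of string
    [a-z0-9]+       # one or more lowercase letters or digits
    (?:             # non-capturing group:
        -           #   a literal hyphen
        [a-z0-9]+   #   followed by one or more letters or digits
    )*              # ...the whole group zero or more times
    $               # anchor: end of string
    """,
    re.VERBOSE,
)


def is_slug(text: str) -> bool:
    """Return True if text looks like lowercase words joined by single hyphens."""
    return SLUG.search(text) is not None


print(is_slug("hello-world"))   # True
print(is_slug("hello--world"))  # False: a hyphen must be followed by a word
print(is_slug("Hello"))         # False: uppercase is not in the class
```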

## The six token types

**1. Literals.** Most characters just mean themselves. `abc` matches the string `abc`. The moment a pattern stops being scary is the moment you realize that a large share of it is usually plain literal text.

**2. Character classes.** `[a-z0-9_]` means "one character from this set." A leading `^` inside the brackets negates it: `[^\s@]` is "anything except whitespace and @." The shorthands are classes too: `\d` (digit), `\w` (word character), `\s` (whitespace), and `.`, which matches any character except, in most engines' default mode, a newline. That breadth is why a *literal* dot must be escaped as `\.`.
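A few quick checks make the class rules concrete. This is a small sketch using Python's `re` module; the sample strings are made up for illustration.

```python
import re

# Negated class: runs of characters that are neither whitespace nor "@".
parts = re.findall(r"[^\s@]+", "ann@example.com bob")
print(parts)  # ['ann', 'example.com', 'bob']

# Shorthand class: \d matches one digit at a time.
digits = re.findall(r"\d", "a1b22")
print(digits)  # ['1', '2', '2']

# Unescaped dot vs. escaped dot.
loose = re.fullmatch(r"a.c", "abc") is not None    # True: "." accepts "b"
strict = re.fullmatch(r"a\.c", "abc") is not None  # False: needs a literal "."
print(loose, strict)

# In Python, "." skips newlines unless the re.DOTALL flag is set.
dot_newline = re.fullmatch(r"a.c", "a\nc") is not None
print(dot_newline)  # False
```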

**3. Quantifiers.** They attach to whatever came immediately before: `+` (one or more), `*` (zero or more), `?` (zero or one), `{3,16}` (three to sixteen). `[a-z]+` is "one or more lowercase letters"; `https?` is "http, then an optional s."
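The binding rule is easy to confirm by hand. In this Python sketch, `full` is a hypothetical helper that reports whether a pattern matches an entire string.

```python
import re


def full(pattern: str, text: str) -> bool:
    """True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


print(full(r"https?", "http"))      # True: the "s" is optional
print(full(r"https?", "https"))     # True
print(full(r"https?", "httpss"))    # False: "?" allows at most one "s"
print(full(r"[a-z]{3,16}", "ab"))   # False: fewer than three characters
print(full(r"[a-z]{3,16}", "abc"))  # True
```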

**4. Anchors.** `^` and `$` pin the match to the start and end of the string (or of each line, when the multiline flag is on). A validation pattern without both is a bug factory: `\d{4}` *finds* four digits inside `abc12345xyz`, while `^\d{4}$` requires the whole string to be exactly four digits.
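The difference between finding and validating shows up directly in code. The sketch below uses Python's `re` module; note the Python-specific wrinkle that `$` also matches just before a trailing newline, so `re.fullmatch` is the stricter check there. Other engines may behave differently.

```python
import re

text = "abc12345xyz"

# Unanchored: search finds the first run of four digits inside the string.
found = re.search(r"\d{4}", text)
print(found.group())  # 1234

# Anchored: the whole string must be exactly four digits.
anchored_bad = re.search(r"^\d{4}$", text)
anchored_ok = re.search(r"^\d{4}$", "1234")
print(anchored_bad is None, anchored_ok is not None)  # True True

# Python quirk: "$" tolerates one trailing newline, fullmatch does not.
dollar_newline = re.search(r"^\d{4}$", "1234\n") is not None
fullmatch_newline = re.fullmatch(r"\d{4}", "1234\n") is not None
print(dollar_newline, fullmatch_newline)  # True False
```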

**5. Groups and alternation.** Parentheses group tokens so a quantifier or alternation applies to all of them. `(0[1-9]|1[0-2])` reads as "01–09 **or** 10–12" — that's how the [ISO date pattern](/regex/date-yyyy-mm-dd) expresses a valid month, and how the [IPv4 pattern](/regex/ipv4) spells out "a number from 0 to 255," which regex can't say any shorter. `(?:…)` is the same thing without capturing — prefer it unless you need the captured value.
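Here is the month alternation on its own, as a hedged Python sketch. `MONTH` and `is_month` are illustrative names, and the year-month pattern in the second half is a made-up fragment, not the linked ISO date pattern.

```python
import re

MONTH = r"(?:0[1-9]|1[0-2])"  # 01-09 or 10-12


def is_month(text: str) -> bool:
    """True when text is a two-digit month from 01 to 12."""
    return re.fullmatch(MONTH, text) is not None


valid = [m for m in ("00", "01", "09", "10", "12", "13") if is_month(m)]
print(valid)  # ['01', '09', '10', '12']

# Capturing vs. non-capturing: only (...) shows up in groups().
match = re.fullmatch(r"(\d{4})-(?:0[1-9]|1[0-2])", "2024-07")
print(match.groups())  # ('2024',)
```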

**6. Lookarounds.** `(?=…)` peeks ahead without consuming characters; the negative form `(?!…)` and the lookbehinds `(?<=…)` and `(?<!…)` follow the same idea. The [password pattern](/regex/password) chains four lookaheads, `(?=.*[a-z])(?=.*[A-Z])(?=.*\d)…`, each scanning the whole string for one requirement before `.{8,}` does the actual matching. One caveat: lookarounds aren't supported by Go's `regexp` package or Rust's `regex` crate, both RE2-style engines that give them up in exchange for linear-time matching. There you write separate checks instead.
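The two approaches can be compared side by side. This Python sketch uses a simplified variant with only the three lookaheads quoted above (the rest of the linked password pattern is not reproduced here), and both function names are hypothetical. `check_separately` shows the shape you would fall back on in an engine without lookarounds.

```python
import re

# Simplified assumption: three requirements plus a minimum length of 8.
PASSWORD = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$")


def check_with_lookaheads(pw: str) -> bool:
    """One pattern: each lookahead scans from the start for one requirement."""
    return PASSWORD.search(pw) is not None


def check_separately(pw: str) -> bool:
    """Same rules as independent checks, with no lookarounds needed."""
    return (
        len(pw) >= 8
        and re.search(r"[a-z]", pw) is not None
        and re.search(r"[A-Z]", pw) is not None
        and re.search(r"\d", pw) is not None
    )


for pw in ("Passw0rdX", "password1", "Sh0rt", "NoDigitsHere"):
    print(pw, check_with_lookaheads(pw), check_separately(pw))
```

On ordinary single-line input the two functions agree; the separate checks are also easier to turn into specific error messages.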

## Worked example

```
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
```

Segment it: anchor · class+ · literal `@` · class+ · escaped dot · class with `{2,}` · anchor. In words: "start, one or more local-part characters, an @, one or more domain characters, a literal dot, at least two letters, end." That's the [email pattern](/regex/email) — and now it reads like a sentence.
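The segmentation step can itself be roughed out in code. The Python sketch below is a deliberately naive tokenizer, not a real regex parser: it handles the constructs covered in this article but ignores cases such as an escaped `]` inside a class or named groups. The names `TOKEN` and `segment` are made up for illustration.

```python
import re

# Naive tokenizer for a small regex subset; alternatives are tried in order.
TOKEN = re.compile(
    r"""
      (?P<char_class>\[\^?[^\]]*\])             # [...] or [^...]
    | (?P<escape>\\.)                           # backslash plus one character
    | (?P<quantifier>[+*?]|\{\d+(?:,\d*)?\})    # + * ? {n} {n,} {n,m}
    | (?P<anchor>[\^$])                         # ^ or $
    | (?P<group>\(\?(?:[:=!]|<[=!])|\(|\))      # group openers and ")"
    | (?P<alternation>\|)                       # |
    | (?P<literal>.)                            # anything else
    """,
    re.VERBOSE,
)


def segment(pattern: str) -> list[tuple[str, str]]:
    """Split a pattern into (token kind, token text) pairs, left to right."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(pattern)]


EMAIL = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
for kind, text in segment(EMAIL):
    print(f"{kind:<12} {text}")
```

Run on the email pattern, it prints ten tokens in the same order as the segmentation above, with each quantifier listed separately from the class it modifies.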

## The traps that bite everyone

- **The unescaped dot.** `devkult.com` as a pattern also matches `devkultXcom`. Escape it: `devkult\.com`.
- **Missing anchors.** Validation without `^…$` accepts garbage with a valid substring inside.
- **Greedy matching.** `".*"` on `say "a" and "b"` matches from the first quote to the *last*. Use `".*?"` (lazy) or better, `"[^"]*"` (explicit).
- **Alternation scope.** `^http|https$` is *not* "http or https"; it's "starts with http, **or** ends with https." Group the alternatives so both anchors apply to each: `^(?:http|https)$`, or more simply `^https?$`.
- **Quantifier target.** `ab+` matches `abbb` but not the whole of `ababab`: the `+` binds only to `b`. To repeat the pair, write `(?:ab)+`.
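Each of these traps can be reproduced in a line or two. The following Python sketch reuses the examples from the list; the string `httpfoo` is an added, made-up input.

```python
import re

# Unescaped dot: "." accepts the "X".
dot_loose = re.fullmatch(r"devkult.com", "devkultXcom") is not None
dot_strict = re.fullmatch(r"devkult\.com", "devkultXcom") is not None
print(dot_loose, dot_strict)  # True False

# Greedy vs. lazy vs. explicit.
s = 'say "a" and "b"'
greedy = re.findall(r'".*"', s)
lazy = re.findall(r'".*?"', s)
explicit = re.findall(r'"[^"]*"', s)
print(greedy)    # ['"a" and "b"']
print(lazy)      # ['"a"', '"b"']
print(explicit)  # ['"a"', '"b"']

# Alternation scope: the ungrouped version accepts anything starting with http.
alt_loose = re.search(r"^http|https$", "httpfoo") is not None
alt_grouped = re.search(r"^(?:http|https)$", "httpfoo") is not None
print(alt_loose, alt_grouped)  # True False

# Quantifier target: "+" repeats only the "b" unless the pair is grouped.
quant_single = re.fullmatch(r"ab+", "ababab") is not None
quant_grouped = re.fullmatch(r"(?:ab)+", "ababab") is not None
print(quant_single, quant_grouped)  # False True
```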

## Practice on real patterns

The fastest way to internalize this is reading annotated real-world patterns. Every entry in the [regex pattern library](/regex) — [email](/regex/email), [URL](/regex/url), [UUID](/regex/uuid), [IPv4](/regex/ipv4), [phone](/regex/phone-number), and more — comes with exactly this kind of token-by-token table, match/no-match examples, and a pre-loaded [live tester](/tools/regex/regex-tester) so you can break the pattern and watch what changes.

Six token types, read left to right. Every regex is just those, composed.
