Uniform Resource Locator (URL)
A web address that specifies the location of a resource on the internet, like https://example.com/page.
Short Definition
A URL is the address bar text that tells your browser where to find something on the internet. https://example.com/products/123 says: use HTTPS protocol, go to example.com, find the products section, get item 123.
Full Definition
A URL (Uniform Resource Locator) is the complete address to access a resource on the web.
URL Structure:
```
https://user:pass@www.example.com:443/path/to/page?key=value#section
│       │         │               │   │            │         │
│       │         │               │   │            │         └─ Fragment
│       │         │               │   │            └─ Query
│       │         │               │   └─ Path
│       │         │               └─ Port
│       │         └─ Domain (hostname)
│       └─ Credentials (deprecated)
└─ Protocol (scheme)
```
Components:
Protocol: How to access (http, https, ftp, file)
Domain: Where to go (example.com)
Port: Which service (80=HTTP, 443=HTTPS)
Path: What resource (/products/123)
Query: Parameters (?color=red&size=large)
Fragment: Page section (#reviews)
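In Python, the standard-library urllib.parse module splits a URL into exactly these components:

```python
from urllib.parse import urlparse, parse_qs

# Break a URL into the components described above
url = "https://www.example.com:443/products/123?color=red&size=large#reviews"
parsed = urlparse(url)

print(parsed.scheme)           # https
print(parsed.hostname)         # www.example.com
print(parsed.port)             # 443
print(parsed.path)             # /products/123
print(parse_qs(parsed.query))  # {'color': ['red'], 'size': ['large']}
print(parsed.fragment)         # reviews
```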
Why It Matters
- Every web request starts with a URL
- URLs control where data goes
- URL structure affects security
- Search engines rely on clean URLs
- APIs use URLs as interfaces
How Attackers Use It
URL-based attacks:
1. URL Injection/Manipulation:
```
# Original
https://bank.com/transfer?to=user1&amount=100

# Attacker modifies
https://bank.com/transfer?to=attacker&amount=10000
```
2. SSRF through URL:
```
# Normal
https://site.com/fetch?url=https://cdn.example.com/image.jpg

# Malicious
https://site.com/fetch?url=http://169.254.169.254/metadata
```
3. Open Redirect:
```
https://trusted.com/redirect?url=http://evil.com
```
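A common defense is to allow redirects only to relative paths or to an explicit allowlist of hosts. A minimal sketch (the ALLOWED_REDIRECT_HOSTS set is a hypothetical example):

```python
from urllib.parse import urlparse

# Hypothetical allowlist for this example
ALLOWED_REDIRECT_HOSTS = {"trusted.com", "app.trusted.com"}

def is_safe_redirect(url):
    parsed = urlparse(url)
    # Relative URLs (no scheme, no host) stay on the current site
    if not parsed.scheme and not parsed.netloc:
        return True
    # Absolute URLs must point at an allowlisted host over HTTP(S)
    return parsed.scheme in ("http", "https") and parsed.hostname in ALLOWED_REDIRECT_HOSTS

print(is_safe_redirect("/account"))                  # True
print(is_safe_redirect("https://trusted.com/home"))  # True
print(is_safe_redirect("http://evil.com"))           # False
```

Note that checking `netloc` as well as `scheme` also catches protocol-relative URLs like `//evil.com`, which browsers would happily follow off-site.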
4. Path Traversal:
```
https://site.com/download?file=../../../etc/passwd
```
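A typical mitigation resolves the requested filename against a base directory and rejects anything that escapes it. A sketch, assuming a hypothetical /var/www/downloads base directory:

```python
import os

BASE_DIR = "/var/www/downloads"  # assumed example base directory

def safe_path(filename):
    # Resolve ".." segments, then verify the result is still under BASE_DIR
    candidate = os.path.normpath(os.path.join(BASE_DIR, filename))
    if os.path.commonpath([BASE_DIR, candidate]) != BASE_DIR:
        return None
    return candidate

print(safe_path("report.pdf"))           # /var/www/downloads/report.pdf
print(safe_path("../../../etc/passwd"))  # None
```

The `commonpath` check also rejects absolute filenames such as `/etc/passwd`, which `os.path.join` would otherwise let replace the base directory entirely.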
5. URL Parser Confusion:
```
http://evil.com@trusted.com  ← Which domain?
http://trusted.com.evil.com  ← Subdomain trick
```
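Python's standard urlparse, for example, treats everything before the @ as credentials, so the first trick resolves to the trusted host even though a human (or a naive regex) may read evil.com first:

```python
from urllib.parse import urlparse

# "evil.com" is the userinfo part; the real host is trusted.com
p = urlparse("http://evil.com@trusted.com/")
print(p.username)  # evil.com
print(p.hostname)  # trusted.com

# Here the registered domain is evil.com; "trusted.com" is just a subdomain label
p = urlparse("http://trusted.com.evil.com/")
print(p.hostname)  # trusted.com.evil.com
```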
How to Detect or Prevent It
Prevention:
- Validate URLs thoroughly
- Use URL parsing libraries (don't parse URLs with regex)
- Whitelist allowed domains
- Check resolved IP after DNS lookup
- Encode special characters properly
- Use relative URLs when possible
- Disable URL credentials in modern apps
For SSRF specifically:
```python
from urllib.parse import urlparse
import socket

def is_safe_url(url):
    parsed = urlparse(url)

    # Check protocol
    if parsed.scheme not in ['http', 'https']:
        return False

    # Check domain whitelist (ALLOWED_DOMAINS defined elsewhere)
    if parsed.hostname not in ALLOWED_DOMAINS:
        return False

    # Check resolved IP (is_private_ip defined elsewhere)
    try:
        ip = socket.gethostbyname(parsed.hostname)
        if is_private_ip(ip):
            return False
    except socket.gaierror:
        return False

    return True
```
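The snippet above assumes ALLOWED_DOMAINS and is_private_ip are defined elsewhere. A minimal is_private_ip using the standard ipaddress module might look like:

```python
import ipaddress

def is_private_ip(ip):
    # Covers RFC 1918 ranges, loopback, and link-local (169.254.0.0/16,
    # which includes cloud metadata services like 169.254.169.254)
    addr = ipaddress.ip_address(ip)
    return addr.is_private or addr.is_loopback or addr.is_link_local

print(is_private_ip("169.254.169.254"))  # True
print(is_private_ip("93.184.216.34"))    # False
```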
Detection:
- Log all URLs processed by application
- Alert on suspicious patterns:
- Private IP addresses
- Metadata service IPs
- Double encoding
- Unusual protocols (gopher, file)
- Excessive URL length
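The alert patterns above can be sketched as a small scanner for logged URLs (the length threshold and allowed-scheme list are assumptions, not universal rules):

```python
from urllib.parse import urlparse, unquote
import ipaddress

def flag_suspicious_url(url):
    reasons = []
    parsed = urlparse(url)

    # Unusual protocols sometimes abused in SSRF (gopher, file, dict, ...)
    if parsed.scheme not in ("http", "https"):
        reasons.append("unusual scheme: " + parsed.scheme)

    # Hostname that is a private / metadata-range IP literal
    try:
        if ipaddress.ip_address(parsed.hostname or "").is_private:
            reasons.append("private IP host")
    except ValueError:
        pass  # hostname is a name, not an IP literal

    # Double encoding: decoding a second time changes the string again
    once = unquote(url)
    if unquote(once) != once:
        reasons.append("double-encoded characters")

    # Excessive length (2000 is an arbitrary example threshold)
    if len(url) > 2000:
        reasons.append("excessive length")

    return reasons

print(flag_suspicious_url("http://169.254.169.254/metadata"))     # ['private IP host']
print(flag_suspicious_url("gopher://internal:6379/_FLUSHALL"))    # ['unusual scheme: gopher']
```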
Common Misconceptions
- "URL validation is simple" - Parser differences cause issues
- "HTTPS URLs are always safe" - Can still point to malicious content
- "Domain in URL is clear" - Parser confusion attacks exist
- "URL encoding prevents attacks" - Attackers use it too
- "URLs can't contain credentials" - They can (deprecated but works)
Real-World Example
GitHub SSRF (2017)
GitHub's image proxy accepted URLs:
```
https://github.com/proxy?url=...
```
Intended: External images only
Vulnerability: No validation of destination
Exploit:
```
url=http://127.0.0.1:6379/
```
Result: Accessed internal Redis server, executed commands.
URL Parser Confusion - Multiple Cases
Different parsers can extract different hosts from the same URL:

```
http://evil.com@good.com

RFC-compliant parser (e.g. PHP parse_url): host = good.com
Naive regex or string-split parser:        host = evil.com
```

If the component validating the URL extracts good.com (check passes) while the component making the request extracts evil.com, the allowlist is bypassed.
Related Terms
URI, Domain, Endpoint, HTTP Request, URL Parser