How to Handle Broken Links in Selenium Python

July 07, 2025

Broken links are a common issue in web applications, and they can significantly impact user experience, SEO, and website credibility. As a QA engineer or automation tester, verifying that all links on a web page are functional is an important task. With Selenium in Python, you can automate the detection of broken links by combining Selenium for web element interaction and Python’s HTTP libraries for link validation. In this blog, we'll walk through how to handle broken links in Selenium Python effectively.

🧠 What Are Broken Links?

A broken link (also called a dead link) is a hyperlink that leads to a page that doesn’t exist or returns an error such as:

404 Not Found

500 Internal Server Error

Timeout

DNS failures

These issues usually arise due to deleted pages, incorrect URLs, or expired domains.

🔧 Tools Required

To detect broken links using Selenium Python, you’ll need the following:

Selenium – For web automation

requests – For sending HTTP requests and checking response codes

Install the necessary packages:

bash

pip install selenium requests

Set up your Selenium WebDriver:

python

from selenium import webdriver

driver = webdriver.Chrome()

driver.get("https://example.com")

🔍 Step-by-Step: Detecting Broken Links

1. Collect All Links

Use Selenium to find all <a> tags (hyperlinks) on the page:

python

links = driver.find_elements("tag name", "a")

urls = [link.get_attribute("href") for link in links if link.get_attribute("href") is not None]

This ensures you collect all valid URLs from anchor tags.

2. Filter and Validate URLs

Use the requests library to check the status code of each URL:

python

import requests

for url in urls:

try:

response = requests.head(url, allow_redirects=True, timeout=5)

if response.status_code >= 400:

print(f"[BROKEN] {url} returned status code: {response.status_code}")

else:

print(f"[OK] {url} returned status code: {response.status_code}")

except requests.exceptions.RequestException as e:

print(f"[ERROR] {url} failed due to: {e}")

requests.head() is faster than requests.get() as it fetches only headers.

Use allow_redirects=True to follow redirects (e.g., 301, 302).

Handle exceptions for timeouts or network issues.

✅ Best Practices

Avoid External URLs (Optional)

You might want to skip third-party links to focus only on internal site URLs:

python

base_url = "https://example.com"

internal_urls = [url for url in urls if base_url in url]

Ignore Empty and JavaScript Links

Filter out anchors like #, javascript:void(0), etc.

Multi-threading for Speed

Use the concurrent.futures module to check multiple URLs in parallel for faster execution.

📊 Sample Output

csharp

[OK] https://example.com/about returned status code: 200

[BROKEN] https://example.com/contact-us returned status code: 404

[ERROR] https://invalidlink.com failed due to: ConnectionError

🧪 Integration with Test Cases

You can turn this into an automated test that fails if broken links are found:

python

assert all(response.status_code < 400 for response in responses), "Broken links detected!"

This is especially useful in CI pipelines to prevent deploying broken links to production.

🚀 Conclusion

Handling broken links using Selenium Python is a crucial step toward ensuring web application quality. By leveraging Selenium to extract links and requests to validate them, you can automate the detection of dead links efficiently. This practice not only enhances user experience but also boosts SEO and brand trust. Add it to your automation suite and keep your websites healthy and error-free!

Learn Selenium with Pyhton Training Hyderabad

Visit IHUB Talent Institute Hyderabad
Get Direction

Search This Blog

IHUB Talent Training Institute