How to Handle Broken Links in Selenium Python

Broken links are a common issue in web applications, and they can significantly impact user experience, SEO, and website credibility. As a QA engineer or automation tester, verifying that all links on a web page are functional is an important task. With Selenium in Python, you can automate the detection of broken links by combining Selenium for web element interaction and Python’s HTTP libraries for link validation. In this blog, we'll walk through how to handle broken links in Selenium Python effectively.


๐Ÿง  What Are Broken Links?

A broken link (also called a dead link) is a hyperlink that leads to a page that doesn’t exist or returns an error such as:

404 Not Found

500 Internal Server Error

Timeout

DNS failures

These issues usually arise due to deleted pages, incorrect URLs, or expired domains.


๐Ÿ”ง Tools Required

To detect broken links using Selenium Python, you’ll need the following:

Selenium – For web automation

requests – For sending HTTP requests and checking response codes

Install the necessary packages:


bash


pip install selenium requests

Set up your Selenium WebDriver:


python


from selenium import webdriver


driver = webdriver.Chrome()

driver.get("https://example.com")


๐Ÿ” Step-by-Step: Detecting Broken Links

1. Collect All Links

Use Selenium to find all <a> tags (hyperlinks) on the page:

python


links = driver.find_elements("tag name", "a")

urls = [link.get_attribute("href") for link in links if link.get_attribute("href") is not None]

This ensures you collect all valid URLs from anchor tags.


2. Filter and Validate URLs

Use the requests library to check the status code of each URL:

python


import requests


for url in urls:

    try:

        response = requests.head(url, allow_redirects=True, timeout=5)

        if response.status_code >= 400:

            print(f"[BROKEN] {url} returned status code: {response.status_code}")

        else:

            print(f"[OK] {url} returned status code: {response.status_code}")

    except requests.exceptions.RequestException as e:

        print(f"[ERROR] {url} failed due to: {e}")

requests.head() is faster than requests.get() as it fetches only headers.


Use allow_redirects=True to follow redirects (e.g., 301, 302).


Handle exceptions for timeouts or network issues.


✅ Best Practices

Avoid External URLs (Optional)

You might want to skip third-party links to focus only on internal site URLs:


python



base_url = "https://example.com"

internal_urls = [url for url in urls if base_url in url]

Ignore Empty and JavaScript Links

Filter out anchors like #, javascript:void(0), etc.


Multi-threading for Speed

Use the concurrent.futures module to check multiple URLs in parallel for faster execution.


๐Ÿ“Š Sample Output

csharp


[OK] https://example.com/about returned status code: 200

[BROKEN] https://example.com/contact-us returned status code: 404

[ERROR] https://invalidlink.com failed due to: ConnectionError


๐Ÿงช Integration with Test Cases

You can turn this into an automated test that fails if broken links are found:


python


assert all(response.status_code < 400 for response in responses), "Broken links detected!"

This is especially useful in CI pipelines to prevent deploying broken links to production.


๐Ÿš€ Conclusion

Handling broken links using Selenium Python is a crucial step toward ensuring web application quality. By leveraging Selenium to extract links and requests to validate them, you can automate the detection of dead links efficiently. This practice not only enhances user experience but also boosts SEO and brand trust. Add it to your automation suite and keep your websites healthy and error-free!


Learn Selenium with Pyhton Training Hyderabad

Read More: Capturing Console Logs using Selenium Python

Read More: Debugging Selenium Python Scripts

Read More:  Advanced XPath Techniques in Selenium Python

Visit IHUB Talent Institute Hyderabad
Get Direction

Comments

Popular posts from this blog

How to Use Tosca's Test Configuration Parameters

Top 5 UX Portfolios You Should Learn From

Using Playwright with Electron-Based Applications