How to Handle Broken Links in Selenium Python
Broken links are a common issue in web applications, and they can significantly impact user experience, SEO, and website credibility. As a QA engineer or automation tester, verifying that all links on a web page are functional is an important task. With Selenium in Python, you can automate the detection of broken links by combining Selenium for web element interaction and Python’s HTTP libraries for link validation. In this blog, we'll walk through how to handle broken links in Selenium Python effectively.
๐ง What Are Broken Links?
A broken link (also called a dead link) is a hyperlink that leads to a page that doesn’t exist or returns an error such as:
404 Not Found
500 Internal Server Error
Timeout
DNS failures
These issues usually arise due to deleted pages, incorrect URLs, or expired domains.
๐ง Tools Required
To detect broken links using Selenium Python, you’ll need the following:
Selenium – For web automation
requests – For sending HTTP requests and checking response codes
Install the necessary packages:
bash
pip install selenium requests
Set up your Selenium WebDriver:
python
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://example.com")
๐ Step-by-Step: Detecting Broken Links
1. Collect All Links
Use Selenium to find all <a> tags (hyperlinks) on the page:
python
links = driver.find_elements("tag name", "a")
urls = [link.get_attribute("href") for link in links if link.get_attribute("href") is not None]
This ensures you collect all valid URLs from anchor tags.
2. Filter and Validate URLs
Use the requests library to check the status code of each URL:
python
import requests
for url in urls:
try:
response = requests.head(url, allow_redirects=True, timeout=5)
if response.status_code >= 400:
print(f"[BROKEN] {url} returned status code: {response.status_code}")
else:
print(f"[OK] {url} returned status code: {response.status_code}")
except requests.exceptions.RequestException as e:
print(f"[ERROR] {url} failed due to: {e}")
requests.head() is faster than requests.get() as it fetches only headers.
Use allow_redirects=True to follow redirects (e.g., 301, 302).
Handle exceptions for timeouts or network issues.
✅ Best Practices
Avoid External URLs (Optional)
You might want to skip third-party links to focus only on internal site URLs:
python
base_url = "https://example.com"
internal_urls = [url for url in urls if base_url in url]
Ignore Empty and JavaScript Links
Filter out anchors like #, javascript:void(0), etc.
Multi-threading for Speed
Use the concurrent.futures module to check multiple URLs in parallel for faster execution.
๐ Sample Output
csharp
[OK] https://example.com/about returned status code: 200
[BROKEN] https://example.com/contact-us returned status code: 404
[ERROR] https://invalidlink.com failed due to: ConnectionError
๐งช Integration with Test Cases
You can turn this into an automated test that fails if broken links are found:
python
assert all(response.status_code < 400 for response in responses), "Broken links detected!"
This is especially useful in CI pipelines to prevent deploying broken links to production.
๐ Conclusion
Handling broken links using Selenium Python is a crucial step toward ensuring web application quality. By leveraging Selenium to extract links and requests to validate them, you can automate the detection of dead links efficiently. This practice not only enhances user experience but also boosts SEO and brand trust. Add it to your automation suite and keep your websites healthy and error-free!
Learn Selenium with Pyhton Training Hyderabad
Read More: Capturing Console Logs using Selenium Python
Read More: Debugging Selenium Python Scripts
Read More: Advanced XPath Techniques in Selenium Python
Visit IHUB Talent Institute Hyderabad
Get Direction
Comments
Post a Comment