Using Sikuli with Selenium Java for Image-Based Automation

June 30, 2025

Automating web applications with Selenium is now standard practice in testing. However, Selenium has its limitations, especially when it comes to interacting with things outside the DOM, such as image-rendered content, Flash components, native dialogs, or legacy Java applets. That's where Sikuli comes into play. Sikuli is an image-recognition automation tool that lets you automate anything you see on the screen using screenshots.

In this blog, we’ll explore how Sikuli integrates with Selenium in Java to create a powerful image-based automation framework.

What is Sikuli?

Sikuli is an open-source visual automation tool that uses image recognition to automate graphical user interfaces (GUIs). Instead of locating web elements using IDs or XPath (as in Selenium), Sikuli identifies elements based on screenshots.

You provide an image of a button or text field, and Sikuli searches the screen for that image and interacts with whatever it finds, just as a human would.
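To make the idea concrete, here is a minimal sketch of image search in plain Java. It is not Sikuli's actual implementation (SikuliX relies on OpenCV for matching), and the class NaiveImageFinder and its methods are hypothetical names used only for illustration. The sketch slides a small template image over a larger "screen" image and returns the first position where the share of identical pixels reaches a threshold.

```java
import java.awt.Point;
import java.awt.image.BufferedImage;

/** Naive template matching, for illustration only (hypothetical helper, not part of Sikuli). */
final class NaiveImageFinder {

    /**
     * Scans the screen row by row and returns the top-left corner of the first position
     * whose similarity is at least minSimilarity (0.0 to 1.0), or null if none qualifies.
     */
    static Point find(BufferedImage screen, BufferedImage template, double minSimilarity) {
        int tw = template.getWidth();
        int th = template.getHeight();
        for (int y = 0; y + th <= screen.getHeight(); y++) {
            for (int x = 0; x + tw <= screen.getWidth(); x++) {
                if (similarityAt(screen, template, x, y) >= minSimilarity) {
                    return new Point(x, y);
                }
            }
        }
        return null;
    }

    /** Fraction of template pixels that are identical to the screen pixels at this offset. */
    static double similarityAt(BufferedImage screen, BufferedImage template, int left, int top) {
        int matching = 0;
        for (int y = 0; y < template.getHeight(); y++) {
            for (int x = 0; x < template.getWidth(); x++) {
                if (screen.getRGB(left + x, top + y) == template.getRGB(x, y)) {
                    matching++;
                }
            }
        }
        return (double) matching / (template.getWidth() * template.getHeight());
    }

    /** Paints a solid rectangle; used here to build small test images. */
    static void fill(BufferedImage image, int left, int top, int width, int height, int rgb) {
        for (int y = top; y < top + height; y++) {
            for (int x = left; x < left + width; x++) {
                image.setRGB(x, y, rgb);
            }
        }
    }

    public static void main(String[] args) {
        // A black 20x10 "screen" with a red 3x2 "button" painted at (5, 4)
        BufferedImage screen = new BufferedImage(20, 10, BufferedImage.TYPE_INT_RGB);
        BufferedImage button = new BufferedImage(3, 2, BufferedImage.TYPE_INT_RGB);
        fill(button, 0, 0, 3, 2, 0xFF0000);
        fill(screen, 5, 4, 3, 2, 0xFF0000);

        Point hit = find(screen, button, 1.0);
        System.out.println("Found at " + hit.x + "," + hit.y);
    }
}
```

A threshold below 1.0 tolerates small rendering differences, which is the same trade-off an image-based tool has to make: too strict and a slightly different theme or font breaks the match, too loose and the wrong element gets clicked.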

Why Combine Sikuli with Selenium?

While Selenium is excellent for automating browser-based UI elements, it cannot handle:

Captcha-based elements

Native OS file dialogs (like file upload/download windows)

Flash or Silverlight components

Images, charts, and non-DOM elements

Sikuli fills this gap by interacting with screen elements based on visual cues. When combined, Selenium handles DOM-based automation and Sikuli handles image-based operations.

Setting Up Sikuli with Selenium in Java

Step 1: Add Sikuli to Your Project

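Download the SikuliX API from https://raiman.github.io/SikuliX1/ and add the JAR to your Java project's classpath. If you're using Maven, the API is published under the coordinates com.sikulix:sikulixapi; check the SikuliX site for the current version, or install the downloaded JAR into your local repository manually.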

Step 2: Import Packages

java

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.sikuli.script.Pattern;
import org.sikuli.script.Screen;

Step 3: Write a Test Using Selenium + Sikuli

java

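public class SeleniumSikuliTest {

    public static void main(String[] args) throws Exception {
        // Path to the ChromeDriver binary (placeholder)
        System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://example.com/upload");

            // Trigger file upload button (Selenium); this opens the native file dialog
            driver.findElement(By.id("uploadBtn")).click();

            // Reference screenshots of the dialog's file name field and Open button
            Screen screen = new Screen();
            Pattern fileInput = new Pattern("C:\\images\\file_input.png");
            Pattern openButton = new Pattern("C:\\images\\open_button.png");

            // Wait up to 10 seconds for the native dialog; throws FindFailed if it never appears
            screen.wait(fileInput, 10);
            // Click the file name field, type the path, then confirm with the Open button
            screen.type(fileInput, "C:\\path\\to\\file.txt");
            screen.click(openButton);
        } finally {
            // Close the browser even if an image is not found
            driver.quit();
        }
    }
}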

In this example, Selenium clicks the upload button, and Sikuli interacts with the OS-level file dialog using stored screenshots.

Best Practices

Use high-resolution, clear screenshots for accurate detection.

Store images in a separate folder and name them descriptively.

Call wait() with an explicit timeout before interacting with an element, so a test does not fail just because a dialog was slow to appear.

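Use Sikuli only when Selenium cannot handle the task. A standard file input element, for example, can often be filled with Selenium's sendKeys, without opening the native dialog at all.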
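The wait-with-timeout advice applies beyond image patterns. Sikuli's own wait(pattern, seconds), used in the example above, already polls for an image; the sketch below shows the same idea in plain Java for other conditions a hybrid test might need, such as a file appearing on disk. PollingWait and its until method are hypothetical helpers, not part of Selenium or Sikuli.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Generic poll-until-true helper with a timeout (hypothetical, for illustration). */
final class PollingWait {

    /**
     * Checks the condition immediately, then every interval, until it returns true
     * or the timeout has elapsed. Returns true if the condition was met in time.
     */
    static boolean until(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition that becomes true after roughly 300 ms
        boolean ready = until(
                () -> System.currentTimeMillis() - start >= 300,
                Duration.ofSeconds(2),
                Duration.ofMillis(50));
        System.out.println("ready = " + ready);
    }
}
```

Returning a boolean instead of throwing lets the test decide whether a timeout is a failure or simply a signal to try a fallback path.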

Limitations

Sikuli scripts may break when the screen resolution, display scaling, or the look of the UI changes.

It depends on what is actually visible on screen, so it needs a real display, and other windows or activity on the machine during execution may interfere.

Image matching is slower than DOM-based interactions.
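One way to soften the resolution problem is to fail fast with a clear message instead of letting image searches time out one by one. The sketch below is a hypothetical helper (ResolutionGuard is not a Sikuli class) that compares the resolution the screenshots were captured at with the current primary screen size; the 1920x1080 value is an assumed example.

```java
import java.awt.Dimension;
import java.awt.GraphicsEnvironment;
import java.awt.Toolkit;

/** Compares the capture resolution with the current screen (hypothetical helper). */
final class ResolutionGuard {

    /** True when the current screen has the same size as the one the screenshots were taken on. */
    static boolean matches(Dimension captured, Dimension current) {
        return captured.equals(current);
    }

    /** Builds a message that explains the mismatch. */
    static String describe(Dimension captured, Dimension current) {
        return "screenshots captured at " + captured.width + "x" + captured.height
                + ", current screen is " + current.width + "x" + current.height;
    }

    public static void main(String[] args) {
        Dimension captured = new Dimension(1920, 1080); // assumed capture resolution

        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("No display available; image-based steps cannot run here.");
            return;
        }
        // Size of the primary screen as reported by AWT
        Dimension current = Toolkit.getDefaultToolkit().getScreenSize();
        if (matches(captured, current)) {
            System.out.println("Resolution OK");
        } else {
            System.err.println("Warning: " + describe(captured, current));
        }
    }
}
```

Running a check like this once at the start of a suite turns a confusing series of "image not found" failures into a single, readable warning.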

Conclusion

By combining Selenium and Sikuli, testers can automate both web-based and GUI-based components seamlessly. This hybrid approach is especially valuable for applications involving native OS interactions or legacy systems. With proper setup and clear image patterns, Sikuli adds an extra layer of power to your automation toolkit.
