SikuliX: the last chance for automation

I am publishing this post in the API section of my blog, although it is about the situation where an application has no API at all. Suppose that we have to use some graphical application or web service in our work, and that we constantly need to repeat some very routine and annoying operations in it. This often happens when the application developers have not thought enough about the real-life cases their end users will deal with. What can we do in such a scenario?

  • First of all, check whether there is an open and documented API
  • If there is no API and it is an installed application, maybe you can use it in console mode
  • If it is a web service, maybe you can figure out how it works and automate it using tools like Firebug

But sometimes none of this is possible. It is even sadder when the routine task is really elementary and you can easily explain the logic: which menu to choose, which button to push, where to enter text, and so on.

At this point, you give up on everything else and turn to your last resort: SikuliX.

SikuliX Script window

With this tool, you can automate almost anything. It doesn’t matter whether it is a web service or a GUI application, or which operating system it runs on. That is because SikuliX works at the highest level, the screen itself. In fact, it just takes screenshots and analyses them as images, trying to find the graphical elements it should act on.
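To make the "find a graphical element in a screenshot" idea concrete, here is a toy sketch in plain Python. It is not SikuliX's actual matcher: the "images" are small grids of numbers, the score is simply the fraction of identical pixels, and the `best_match` name and the sample data are made up for this illustration. The default threshold of 0.7 mirrors SikuliX's default minimum similarity, but the scoring itself is a simplifying assumption.

```python
def best_match(screen, pattern, min_similarity=0.7):
    """Slide `pattern` over `screen` and return (top, left, score) of the
    best position whose score reaches `min_similarity`, or None."""
    ph, pw = len(pattern), len(pattern[0])
    total = float(ph * pw)
    best = None
    for top in range(len(screen) - ph + 1):
        for left in range(len(screen[0]) - pw + 1):
            # Count the pixels that are identical at this position.
            same = sum(
                1
                for i in range(ph)
                for j in range(pw)
                if screen[top + i][left + j] == pattern[i][j]
            )
            score = same / total
            if score >= min_similarity and (best is None or score > best[2]):
                best = (top, left, score)
    return best


# Toy data: a 4x5 "screenshot" containing a slightly damaged copy of the icon.
SCREEN = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
ICON = [
    [1, 2],
    [3, 4],
]

print(best_match(SCREEN, ICON))
```

Three of the four icon pixels agree with the screen at row 1, column 1, so that position passes the default threshold; raising the threshold to 0.9 makes the search come back empty.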

Sikuli is God’s Eye. In Huichol Indian culture: the power to see and understand things unknown.

Sikuli was started sometime in 2009 as an open-source research project at the User Interface Design Group at MIT by Tsung-Hsiang Chang and Tom Yeh. Both left the project at Sikuli-X-1.0rc3 during 2012, when RaiMan decided to take over development and support and named it SikuliX.

In a SikuliX script you can also specify the actions that you want to perform on a found graphical element: click on it, type some text into it, or wait until it appears on the screen or disappears. This is visual programming in every sense. =) To add elements, we just take screenshots. It is actually very easy and does not require much effort. SikuliX scripts also use the very popular syntax of the Python language, so it is very easy to learn.
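As a rough illustration of these actions, here is a minimal sketch in SikuliX's Python syntax. The image file names, the `submit_text` helper and the `DryRunScreen` class are all made up for this example; `wait`, `click`, `type` and `waitVanish` are the SikuliX calls. The helper takes the screen object as a parameter, so the sketch can also be tried outside SikuliX with the recording stand-in.

```python
class DryRunScreen(object):
    """Hypothetical stand-in for SikuliX's Screen: it records the calls
    instead of touching the GUI, so the sketch runs anywhere."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Any unknown method (wait, click, type, ...) is simply recorded.
        def record(*args):
            self.calls.append((name,) + args)
        return record


def submit_text(screen, field_image, text, busy_image, timeout=10):
    """Type `text` into the element shown in `field_image`."""
    screen.wait(field_image, timeout)       # block until the field is visible
    screen.click(field_image)               # click on it to give it focus
    screen.type(text)                       # send keystrokes to the focused element
    screen.waitVanish(busy_image, timeout)  # block until the busy indicator is gone


# Image names below are placeholders, not real screenshots.
demo = DryRunScreen()
submit_text(demo, "search_field.png", "hello\n", "spinner.png")
for call in demo.calls:
    print(call)
```

Inside the SikuliX IDE you would pass a real `Screen()` instead of `DryRunScreen()`, and the image names would be the screenshots you captured.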

How to install SikuliX

The project has an official site, and you can get the stable version, Sikuli 1.1.0 “SikuliX”, on Launchpad. Download sikulixsetup-1.1.0.jar and run it.

Sikulix setup jar file

If any dependencies are missing, you will see errors in SikuliX-1.1.0-SetupLog.txt

I installed SikuliX on Ubuntu and needed these additional packages:

sudo apt-get install tesseract-ocr wmctrl xdotool

Of course, you should also have Java:

sudo apt-get install default-jdk default-jre
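If you want to check the prerequisites before running the setup jar, something like the following sketch might help. It is meant for an ordinary Python 3 interpreter, not for SikuliX itself, and it assumes that the packages above put executables named `java`, `tesseract`, `wmctrl` and `xdotool` on the PATH; the `missing_tools` helper is my own name, not part of SikuliX.

```python
import shutil

# Executable names assumed to be provided by the packages listed above.
REQUIRED_TOOLS = ["java", "tesseract", "wmctrl", "xdotool"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```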

If everything is fine, you will see the startup shell script runsikulix

SikuliX files after successful installation

The main screen of the application:

SikuliX clean window

On the left we can select the most common operations (there are many more; read the docs). On the right is the editor area, and under it you can see the output area. Everything is like in any IDE.


Here is an example of a very routine operation that often drives me crazy. Gmail is a very convenient email service, but for some reason it is impossible to forward multiple emails at once there.

gmail interface

OK, Google, why can’t we end users select multiple messages and choose “Forward” from the menu, the same way it works in local email clients such as Outlook?

In fact, there is no such option. Therefore, in order to forward a lot of emails, you need to do it manually or use third-party browser plug-ins, and it is not clear whether they are safe to use. You can also do it with SikuliX.

What my SikuliX script does:

SikuliX script example 1

  • Find all the check boxes for emails
  • Shift coordinate
  • Click on the subject line
  • Wait for everything to load: SikuliX script Example Gmail Forward
  • Select the option Forward in the list of options

SikuliX script example 2

  • Scroll down the forward message editor: SikuliX script Gmail Forward message editor
  • Add email address
  • Wait until the message is sent successfully
  • Click on the Back button and process the next email
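The steps for one opened email could be sketched roughly like this. It is not the script from the screenshots: every image name is a placeholder, pressing Send is a step I am assuming between adding the address and waiting for the confirmation, and `forward_steps` and `run_steps` are hypothetical helpers. The steps are kept as plain data, and `run_steps` calls the method with that name on whatever screen object it is given.

```python
def forward_steps(address, wheel_down, timeout=15):
    """Return the per-email steps as (method name, arguments...) tuples.

    `wheel_down` should be SikuliX's WHEEL_DOWN constant when run for real.
    """
    return [
        ("wait", "more_button.png", timeout),   # wait for the opened email to load
        ("click", "more_button.png"),           # open the list of options
        ("click", "forward_option.png"),        # select Forward
        ("wheel", wheel_down, 5),               # scroll down the forward editor
        ("click", "to_field.png"),              # focus the recipient field
        ("type", address + "\n"),               # add the email address
        ("click", "send_button.png"),           # assumed step: press Send
        ("wait", "message_sent.png", timeout),  # wait until the message is sent
        ("click", "back_button.png"),           # go back to the message list
    ]


def run_steps(screen, steps):
    """Call each named method on `screen` with the step's arguments."""
    for step in steps:
        getattr(screen, step[0])(*step[1:])


# Print the plan without touching any GUI; 1 stands in for WHEEL_DOWN here.
for step in forward_steps("someone@example.com", 1):
    print(step)
```

In the SikuliX IDE this would be run as `run_steps(Screen(), forward_steps("someone@example.com", WHEEL_DOWN))`. Keeping the steps in a list means that a changed button only requires replacing one screenshot name.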

As you can see, you can define a function and iterate the same way you do in Python:

SikuliX script function
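A function plus a loop of this kind might look like the sketch below. It is my own approximation, not the script in the screenshot: `forward_all` and the three `Fake...` classes are hypothetical, and the fakes exist only so the loop can be tried without a screen. The calls on the real objects would be `findAll`, `getY`, `getTarget`, `offset` and `click`; I am assuming that the check boxes stay at the same coordinates after each email is forwarded.

```python
class FakeLocation(object):
    """Stand-in for SikuliX's Location (illustration only)."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def offset(self, dx, dy):
        return FakeLocation(self.x + dx, self.y + dy)


class FakeMatch(object):
    """Stand-in for SikuliX's Match (illustration only)."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def getY(self):
        return self.y
    def getTarget(self):
        return FakeLocation(self.x, self.y)


class FakeScreen(object):
    """Stand-in for SikuliX's Screen that records where it was clicked."""
    def __init__(self, matches):
        self.matches, self.clicked = matches, []
    def findAll(self, image):
        return iter(self.matches)
    def click(self, target):
        self.clicked.append((target.x, target.y))


def forward_all(screen, checkbox_image, subject_dx, forward_one):
    """Open every email in the list and hand it to `forward_one`."""
    # Collect and sort the matches top to bottom before the screen changes.
    matches = sorted(screen.findAll(checkbox_image), key=lambda m: m.getY())
    for match in matches:
        subject = match.getTarget().offset(subject_dx, 0)  # shift coordinate
        screen.click(subject)                               # click the subject line
        forward_one(screen)                                 # per-email steps
    return len(matches)


demo = FakeScreen([FakeMatch(20, 300), FakeMatch(20, 100)])
print(forward_all(demo, "checkbox.png", 200, lambda screen: None))
print(demo.clicked)
```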

For each image in the code (a pattern) you can check how it will match elements on the screen and configure the desired accuracy. You can also specify a target offset if needed:

SikuliX matching preview
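In script form, these two settings correspond to something like `Pattern("checkbox.png").similar(0.8).targetOffset(150, 0)`, where the image name and the numbers are only examples. The target offset is counted from the center of the matched area, which the small sketch below spells out in plain Python. The `click_point` function is my own illustration, and the rounding of the center is an assumption.

```python
def click_point(x, y, w, h, dx=0, dy=0):
    """Return where a click lands for a match at (x, y) with size w x h.

    Without an offset the click goes to the center of the match;
    dx and dy shift it from there, like a pattern's target offset.
    """
    center_x = x + w // 2
    center_y = y + h // 2
    return (center_x + dx, center_y + dy)


# A 20x20 check box found at (100, 200); click 150 pixels to its right.
print(click_point(100, 200, 20, 20, dx=150))
```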

In conclusion

It should be noted that this method certainly has its disadvantages. If the graphic design of the application changes, you will have to adjust the script. As you can see, this is not so hard. For further automation, you can run scripts not only in the IDE but from the console as well.

Quite an interesting topic is how to use SikuliX in application security testing (AST). It is likely that sometimes it will be easier to detect the input areas and supply the attacking input in this simple graphical mode than to use other methods based on application code analysis and specially crafted queries.
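For example, a console run could be wrapped in a small Python launcher like the sketch below, which you could then call from cron or another script. It assumes that the `runsikulix` script sits in the current directory and accepts `-r` followed by the script folder, as I understand the 1.1.0 command line to work; the script name `gmail_forward.sikuli` and both helper functions are made up.

```python
import subprocess


def sikulix_command(script_path, launcher="./runsikulix"):
    """Build the command line that runs one SikuliX script from the console."""
    return [launcher, "-r", script_path]


def run_script(script_path, launcher="./runsikulix"):
    """Run the script and return the launcher's exit code."""
    return subprocess.call(sikulix_command(script_path, launcher))


# Only print the command here; call run_script() to actually start SikuliX.
print(" ".join(sikulix_command("gmail_forward.sikuli")))
```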
