I am publishing this post in the API section of my blog, although it is about the situation where an application has no API. Suppose we have to use some graphical application or web service in our work and, unfortunately, we constantly need to repeat some very routine and annoying operations in it. This often happens when the application developers have not thought enough about the real-life cases their end users will deal with. What can we do in such a scenario?
- First of all, check whether there is an open and documented API
- If there is no API and it is an installed application, maybe you can use it in console mode
- If it is a web service, maybe you can figure out how it works and automate it with the help of tools like Firebug
But sometimes none of this is possible. It is even sadder when the routine task is really elementary and you can easily explain the logic: which menu to choose, which button to push, where to enter text and so on.
At this point, you just give up on everything else and use your last resort: SikuliX.
With this tool, you can automate almost anything. It doesn’t matter whether it is a web service or a GUI application, which operating system it runs on and so on. That is because SikuliX works at the highest level: it takes screenshots, analyses them as images, and tries to find the graphical elements it should act on.
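To give a feeling for what "finding a graphical element in a screenshot" means, here is a toy sketch in plain Python. It is not how SikuliX is implemented (real tools rely on optimized image libraries); the images are tiny hand-made grids and all the names are my own assumptions for illustration.

```python
# Toy illustration of "find a small image inside a screenshot".
# Images are 2D lists of grayscale values (0-255). This brute-force
# search is only for intuition, not a real image-matching engine.


def similarity(screen, template, top, left):
    """Return a 0..1 score; 1.0 means the template matches exactly here."""
    height, width = len(template), len(template[0])
    total_diff = 0
    for y in range(height):
        for x in range(width):
            total_diff += abs(screen[top + y][left + x] - template[y][x])
    return 1.0 - total_diff / (255.0 * height * width)


def find_best_match(screen, template, min_similarity=0.7):
    """Slide the template over the screen; return (top, left, score) or None."""
    best = None
    rows = len(screen) - len(template) + 1
    cols = len(screen[0]) - len(template[0]) + 1
    for top in range(rows):
        for left in range(cols):
            score = similarity(screen, template, top, left)
            if best is None or score > best[2]:
                best = (top, left, score)
    if best is not None and best[2] >= min_similarity:
        return best
    return None


# A 4x5 "screenshot" with a bright 2x2 "button" at row 1, column 2.
SCREEN = [
    [0, 0, 0, 0, 0],
    [0, 0, 255, 255, 0],
    [0, 0, 255, 255, 0],
    [0, 0, 0, 0, 0],
]
BUTTON = [
    [255, 255],
    [255, 255],
]

print(find_best_match(SCREEN, BUTTON))  # prints (1, 2, 1.0)
```

The min_similarity threshold plays the same role as the accuracy setting of a pattern: if nothing on the screen looks enough like the picture, the search reports no match.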
Sikuli is God’s Eye. In Huichol Indian culture: the power to see and understand things unknown.
Sikuli was started sometime in 2009 as an open-source research project at the User Interface Design Group at MIT by Tsung-Hsiang Chang and Tom Yeh. Both left the project at Sikuli-X-1.0rc3 during 2012, when RaiMan decided to take over development and support and name it SikuliX.
In a SikuliX script you can also specify the actions you want to perform on a found graphical element: click on it, type some text, wait until it appears on the screen or disappears. This is visual programming in every sense. =) To add elements, we just take screenshots. It is actually very easy and does not require much effort. SikuliX scripts also use the very popular syntax of the Python language, so it is very easy to learn.
How to install SikuliX
The official project site is sikulix.com. You can get the stable version, Sikuli 1.1.0 “SikuliX”, on Launchpad. Download sikulixsetup-1.1.0.jar and run it.
If some dependencies are missing, you will see errors in SikuliX-1.1.0-SetupLog.txt
I installed SikuliX on Ubuntu and needed these additional packages:
sudo apt-get install tesseract-ocr wmctrl xdotool
Of course, you also need Java:
sudo apt-get install default-jdk default-jre
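Before running the setup, it can be handy to check that the tools from these packages are actually on your PATH. Below is a minimal sketch; the executable names (java, tesseract, wmctrl, xdotool) are my assumption about what the packages above install, so adjust the list for your system.

```python
import shutil

# Executables expected from the packages mentioned above (assumed names).
REQUIRED_COMMANDS = ["java", "tesseract", "wmctrl", "xdotool"]


def missing_commands(commands, lookup=shutil.which):
    """Return the commands that cannot be found on PATH."""
    return [name for name in commands if lookup(name) is None]


missing = missing_commands(REQUIRED_COMMANDS)
if missing:
    print("Missing: " + ", ".join(missing))
else:
    print("All required commands were found.")
```

The lookup parameter exists only so the function can be tried out without touching the real PATH.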
If everything is fine, you will get a startup shell script, runsikulix
The main screen of the application:
On the left we can select the most common operations (there are many more; read the docs). On the right is the editor area, and under it is the output area. Everything is like in any IDE.
Example
Here is an example of a very routine operation that can drive you crazy. Gmail is a very convenient email service, but for some reason it is impossible to forward multiple emails at once there.
OK, Google, why can’t we end users select multiple messages and choose “Forward” from the menu, the same way it works in local e-mail clients such as Outlook?
In fact there is no such option. Therefore, to forward a lot of emails you need to do it manually or use third-party browser plug-ins, and it is not clear whether they are safe to use. You can also do it with SikuliX.
What my SikuliX script does:
- Find all the check boxes for emails
- Shift the coordinates from the check box to the subject line
- Click on the subject line
- Wait for everything to load
- Select the option Forward in the list of options
- Scroll down the forward message editor
- Add email address
- Wait until the message is sent successfully
- Click on the Back button and process the next letter
In a SikuliX script you can define functions and iterate over results the same way you do in Python.
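The steps listed above can be sketched roughly like this. This is not my original script: the image file names, the offset, the Send button step and the address are hypothetical placeholders. The method names (findAll, click, wait, wheel, type, getTarget, offset) mirror the SikuliX ones as I understand them, and the fake screen only records the actions, so the sketch can be tried in plain Python without touching a real mailbox.

```python
# All image names, the offset and the address are hypothetical placeholders.
CHECKBOX = "checkbox.png"
OPTIONS_MENU = "options_menu.png"
FORWARD_ITEM = "forward_item.png"
TO_FIELD = "to_field.png"
SEND_BUTTON = "send_button.png"      # assumption: sending needs a click
SENT_NOTICE = "message_sent.png"
BACK_BUTTON = "back_button.png"
SUBJECT_OFFSET_X = 150               # assumed distance from check box to subject
WHEEL_DOWN = 1                       # stand-in; SikuliX has its own constant


def forward_email(screen, subject_point, address):
    """Forward one email, following the steps listed above."""
    screen.click(subject_point)        # click on the subject line
    screen.wait(OPTIONS_MENU, 10)      # wait for everything to load
    screen.click(OPTIONS_MENU)
    screen.click(FORWARD_ITEM)         # select Forward in the list of options
    screen.wheel(WHEEL_DOWN, 5)        # scroll down the forward editor
    screen.type(TO_FIELD, address)     # add the email address
    screen.click(SEND_BUTTON)
    screen.wait(SENT_NOTICE, 30)       # wait until the message is sent
    screen.click(BACK_BUTTON)          # go back and process the next letter


def forward_all(screen, address):
    """Forward every email with a visible check box; return how many."""
    matches = list(screen.findAll(CHECKBOX))  # snapshot before the page changes
    for match in matches:
        subject_point = match.getTarget().offset(SUBJECT_OFFSET_X, 0)
        forward_email(screen, subject_point, address)
    return len(matches)


class FakeLocation(object):
    """Minimal stand-in for a screen location."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def offset(self, dx, dy):
        return FakeLocation(self.x + dx, self.y + dy)


class FakeMatch(object):
    """Minimal stand-in for a found element."""
    def __init__(self, x, y):
        self._target = FakeLocation(x, y)

    def getTarget(self):
        return self._target


class DryRunScreen(object):
    """Records actions in a log instead of touching the real screen."""
    def __init__(self, checkbox_points):
        self.checkbox_points = checkbox_points
        self.log = []

    def findAll(self, image):
        self.log.append(("findAll", image))
        return [FakeMatch(x, y) for x, y in self.checkbox_points]

    def click(self, target):
        if isinstance(target, FakeLocation):
            target = (target.x, target.y)
        self.log.append(("click", target))

    def wait(self, image, seconds):
        self.log.append(("wait", image))

    def wheel(self, direction, steps):
        self.log.append(("wheel", direction, steps))

    def type(self, target, text):
        self.log.append(("type", target, text))


screen = DryRunScreen([(40, 200), (40, 230)])
print(forward_all(screen, "someone@example.com"))  # prints 2
for entry in screen.log[:3]:
    print(entry)
```

Collecting the matches into a list first is deliberate: once you click into a message the screen changes, so the check boxes are located once, up front.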
For each image in the code (a pattern) you can check how it will match elements on the screen and configure the desired accuracy. You can also specify a target offset if needed.
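In a SikuliX script this is, as far as I know, written as something like Pattern("checkbox.png").similar(0.9).targetOffset(150, 0), where the file name and the numbers are just examples. The sketch below is plain Python, not SikuliX code: it only illustrates the idea that a match is accepted when its score reaches the configured accuracy, and that the click lands at the centre of the matched rectangle shifted by the offset. The 0.7 default is what I recall SikuliX using; treat it as an assumption.

```python
def accept(score, min_similarity=0.7):
    """A match counts only if its score reaches the configured accuracy."""
    return score >= min_similarity


def click_point(match_x, match_y, match_w, match_h, offset=(0, 0)):
    """Centre of the matched rectangle, shifted by the target offset."""
    centre_x = match_x + match_w // 2
    centre_y = match_y + match_h // 2
    return (centre_x + offset[0], centre_y + offset[1])


# A 20x20 check box found at (30, 190); the subject is 150 px to the right.
print(accept(0.93, min_similarity=0.9))               # prints True
print(click_point(30, 190, 20, 20, offset=(150, 0)))  # prints (190, 200)
```

A higher accuracy means fewer false clicks but more breakage when the page is rendered slightly differently, so it is worth checking each pattern against the real screen.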
In conclusion
It should be noted that this method certainly has its disadvantages. If the graphic design of the application changes, you will have to adjust the script. As you can see, this is not so hard. For further automation, you can run scripts not only in the IDE but from the console as well.
Quite an interesting topic is how to use SikuliX in application security testing (AST). It is likely that sometimes it will be easier to detect the input areas and supply the attacking input in this simple graphical mode than with other methods based on application code analysis and crafted queries.
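If you want to start a script from another program (a cron job, a wrapper and so on), a small Python helper could look like this. It is only a sketch: the -r option is how I remember the 1.1.x launcher running a script without the IDE, so check it against your version, and the script name forward_emails.sikuli is a made-up example.

```python
import subprocess


def build_command(script_path, launcher="./runsikulix"):
    """Command line that runs a .sikuli script without opening the IDE."""
    # Assumption: "-r <script>" is the run option of the 1.1.x launcher.
    return [launcher, "-r", script_path]


def run_script(script_path, launcher="./runsikulix"):
    """Run the script and return the launcher's exit code."""
    return subprocess.call(build_command(script_path, launcher))


# Only show the command here; call run_script() to actually execute it.
print(" ".join(build_command("forward_emails.sikuli")))
```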
Hi! My name is Alexander and I am a Vulnerability Management specialist. You can read more about me here. Currently, the best way to follow me is my Telegram channel @avleonovcom. I update it more often than this site. If you haven’t used Telegram yet, give it a try. It’s great. You can discuss my posts or ask questions at @avleonovchat.
I also invite all Russian speakers to one more Telegram channel, @avleonovrus; I now post there first.