PowerShell & Selenium: Automate Web Browser Interactions – Part I

Automation Creativity FTW!

There comes a time when APIs or support from web app owners are just a no-go. When that happens, browser automation tactics come in handy! Selenium and PowerShell complement each other very well. At first glance this match may not sound as fancy as other marketed automation tools, but oh boy, it works, and works very well, with no need to spend money or run any 3rd-party installers.

Selenium is primarily known for automated web app testing, but why limit ourselves? It's a fantastic tool to manage production boring -errr, I mean, repetitive web-based tasks, and by taking advantage of PowerShell's .NET nature we can easily assimilate it into a script!

Depiction of PowerShell assimilating Selenium's powers…(Kirby is Awesome!)

Here in Part I, we'll cover the basics just to get familiar: we'll write a script that performs a Google search with Chrome. The purpose is not to run a Google search (there are simpler and better ways to do that); the purpose is to tap into Selenium's basics for interacting with websites. In Part II and Part III we'll increase the complexity ("headless" mode and login page automation).

To give some insight into the possibilities this offers: I once had to create a tool that logged into an intranet website to pull data from a table, then parsed around 100 variables from that table to build hyperlinks and extract data from yet another table behind each one. The table got updated dynamically and this had to be done weekly, so doing it by hand simply did not scale. There was no API, and my attempts with curl, wget and Invoke-WebRequest failed miserably. Finally, when I was getting somewhere by controlling Internet Explorer via the IE COM object, I got stuck trying to manipulate the textbox elements on the website. According to my google-fu, this COM problem was related to a bug with IE 11 (probably).

Frustrated a bit…? Nah…

Just when I was about to give up hope, I found out about Selenium and came across Newspaint's blog post. PowerShell and Selenium provided everything I needed for the job.

Procuring ingredients for our Awesome Automation Ale

My suggestion is to update Chrome and get the latest of everything. These are the versions I currently have (as of November 2018):

PowerShell: 5.1.14409.1012 (anything above version 2 is probably fine)
Google Chrome: 70.0.3538.77
ChromeDriver: 2.43 (Supports Chrome v69-71)
Selenium Webdriver (.NET Framework v4.5): 3.14.0
Selenium Webdriver Support (.NET Framework v4.5): 3.14.0

This method may work with different or older versions, but I wanted to share what I am using in case you get stuck on compatibility. If you are using anything above Windows 7, PowerShell is not a concern. For the rest:

  1. Update Chrome or download the latest version here
  2. Get Chromedriver from here
  3. Get the Selenium Webdriver dll and Support dll from here
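
Before downloading anything, it can help to see what is already installed, since the ChromeDriver release has to support your Chrome version. The following is a minimal sketch, assuming Chrome lives in one of its usual install folders (the two paths are assumptions, adjust them for your machine). `Get-ChromeVersion` is a hypothetical helper name, not a built-in cmdlet.

```powershell
# Print the PowerShell version of the current session
$PSVersionTable.PSVersion

# Hypothetical helper: returns Chrome's version string, or $null if chrome.exe is not found
function Get-ChromeVersion {
    param(
        # Assumed default install locations; adjust for your machine
        [string[]]$Candidates = @(
            "$env:ProgramFiles\Google\Chrome\Application\chrome.exe",
            "${env:ProgramFiles(x86)}\Google\Chrome\Application\chrome.exe"
        )
    )
    foreach ($path in $Candidates) {
        if ($path -and (Test-Path -LiteralPath $path)) {
            # The version is stored in the file's metadata
            return (Get-Item -LiteralPath $path).VersionInfo.ProductVersion
        }
    }
    return $null
}

Get-ChromeVersion
```

If the function prints nothing, Chrome is probably installed somewhere else; pass the right location with `-Candidates`.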

For your convenience, I've gathered the dlls and the chromedriver version I'm using inside a zip folder. Feel free to skip the above steps and download the zip from here:

DISCLAIMER: It's everyone's responsibility to scan any files downloaded from the internet with their AV of choice.

Click for download

Aside from having Chrome and PowerShell installed on your machine, you should now have the following items: chromedriver.exe, WebDriver.dll and WebDriver.Support.dll.

A few extra spices!

That's "YO HO BEER" Chili Powder, yo

The url we'll use is: https://www.google.com/

In order to tell Selenium what to do, we need some kind of tag that identifies the elements we'll interact with. In this case, those elements are the Search Bar and the Search Button.

So, for this exercise let's go grab the element names of the "search bar" input textbox and the Google "search button".

Getting the search textbox's element name can be achieved easily with Chrome, here is how:

  1. Open the site in Chrome and access the developer tools

  2. Click on the specified icon to inspect website elements

  3. Click on the search input textbox element and copy its name (that'd be "q")

  4. Click on the search button element and copy its name (that'd be "btnK")

You can close Chrome now, we have all the ingredients we need.

Let's cook!

Put on some comfy clothes, we are about to get started.

For the sake of this tutorial I'll create a folder named "PSL" in C:\Temp (C:\Temp\PSL\). Place chromedriver.exe, WebDriver.dll and WebDriver.Support.dll inside it. You'll be saving your PowerShell script here too.
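
A quick way to confirm the folder is set up is to check for the three files from PowerShell. This is only a sketch: `Get-MissingSeleniumFile` is a hypothetical helper name, and the default folder is the C:\Temp\PSL location used in this tutorial.

```powershell
# Hypothetical helper: lists which of the required files are missing from a folder
function Get-MissingSeleniumFile {
    param(
        [string]$Folder = 'C:\Temp\PSL',
        [string[]]$Required = @('chromedriver.exe', 'WebDriver.dll', 'WebDriver.Support.dll')
    )
    foreach ($name in $Required) {
        if (-not (Test-Path -LiteralPath (Join-Path $Folder $name))) {
            $name   # emit the name of each file that is not there
        }
    }
}

$missing = @(Get-MissingSeleniumFile)
if ($missing.Count -eq 0) {
    'All three files are in place.'
} else {
    "Missing: $($missing -join ', ')"
}
```

If you keep the files somewhere else, pass that location with `-Folder`.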

Now open up PowerShell ISE or your IDE of choice. We'll start our script by defining the website we want to automate. As mentioned earlier, I will use Google's search page url and store it in a variable, which makes it easy to edit later if you want to reuse this script with a different site.

$YourURL = "https://www.google.com/"

Now we'll add the files' directory to the PATH environment variable, load Selenium's .NET assembly into this PowerShell session, and finally create an instance of Selenium's ChromeDriver class with the New-Object cmdlet and store it in a variable. …If you are new to this it sounds more complicated than it is; they've made it really easy for us! Here's how it goes:

$env:PATH += ";C:\Temp\PSL\" # Adds the folder containing chromedriver.exe to the PATH environment variable (this session only)
Add-Type -Path "C:\Temp\PSL\WebDriver.dll" # Loads Selenium's .NET assembly (dll) so its classes are available in this PowerShell session
$ChromeDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver # Creates an instance of this class to control Selenium and stores it in an easy-to-handle variable

That takes care of "assimilating" Selenium into our script.
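
One optional refinement: because `+=` appends every time it runs, re-running the script in the same ISE session keeps adding the same folder to PATH. The sketch below appends the folder only when it is not already listed. `Add-PathEntry` is a hypothetical helper name, and the `;` separator assumes Windows.

```powershell
# Hypothetical helper: appends a folder to PATH (this session only) if it is not already listed
function Add-PathEntry {
    param([Parameter(Mandatory)][string]$Folder)

    $separator = ';'                       # Windows PATH separator (assumption: running on Windows)
    $clean     = $Folder.TrimEnd('\')      # ignore a trailing backslash when comparing
    $entries   = @($env:PATH.Split($separator) | ForEach-Object { $_.TrimEnd('\') })

    # -notcontains compares case-insensitively, which matches how Windows treats paths
    if ($entries -notcontains $clean) {
        $env:PATH = $env:PATH.TrimEnd($separator) + $separator + $clean
    }
}

Add-PathEntry -Folder 'C:\Temp\PSL\'
Add-PathEntry -Folder 'C:\Temp\PSL\'   # second call leaves PATH unchanged
```

The plain `$env:PATH +=` line works fine too; this only keeps PATH tidy across repeated runs.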

Make sure you save it as a .ps1 file under "C:\Temp\PSL\". I will name mine "psl.ps1". Your script should look like this so far:

$YourURL = "https://www.google.com/" # Website we'll browse to
# Invoke Selenium into our script!
$env:PATH += ";C:\Temp\PSL\" # Adds the folder containing chromedriver.exe to the PATH environment variable (this session only)
Add-Type -Path "C:\Temp\PSL\WebDriver.dll" # Loads Selenium's .NET assembly (dll) so its classes are available in this PowerShell session
$ChromeDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver # Creates an instance of this class to control Selenium and stores it in an easy-to-handle variable


With that out of the way, we can start tapping into Selenium's methods. Broadly speaking, methods are actions, like run, jump, walk and such. Our methods here will be Navigate, GoToUrl, SendKeys and Submit. We can call these methods on the object we created, which is stored in the $ChromeDriver variable:

$ChromeDriver.Navigate().GoToURL($YourURL) # Browse to the specified website
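
If you are curious what else an object can do, PowerShell's Get-Member cmdlet lists its methods. The sketch below uses a built-in .NET object as a stand-in so it runs without a browser; in the tutorial script you would pipe $ChromeDriver instead.

```powershell
# Stand-in object so the example runs without Selenium loaded
$builder = New-Object System.Text.StringBuilder

# List the names of the methods the object exposes
$builder | Get-Member -MemberType Method | Select-Object -ExpandProperty Name

# With the tutorial's object, the equivalent call would be:
# $ChromeDriver | Get-Member -MemberType Method
```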

Now we can start doing some testing. If you run it, it should take you straight to the Google search page. This is what we have so far:

$YourURL = "https://www.google.com/" # Website we'll browse to

# Invoke Selenium into our script!
$env:PATH += ";C:\Temp\PSL\" # Adds the folder containing chromedriver.exe to the PATH environment variable (this session only)
Add-Type -Path "C:\Temp\PSL\WebDriver.dll" # Loads Selenium's .NET assembly (dll) so its classes are available in this PowerShell session
$ChromeDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver # Creates an instance of this class to control Selenium and stores it in an easy-to-handle variable

$ChromeDriver.Navigate().GoToURL($YourURL) # Browse to the specified website

Let's take it for a test drive:


Ok, so honestly the above could have been achieved in a script simply by running "start chrome www.google.com", but this is just the beginning. From now on it gets more and more interesting.

Remember when we looked for the search bar and search button elements? We will use them now:

$ChromeDriver.FindElementByName("q").SendKeys("mavericksevmont tech blog") # Finds the Google search input textbox and types something into it
$ChromeDriver.FindElementByName("btnK").Submit() # Finds the search button and submits the form it belongs to

Here is what we are telling it to do:
1: Find the search input textbox "q" and type something
2: Find the search button and submit our request
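
On slower pages an element may not exist yet when the script looks for it, and the lookup then fails with an error. One common remedy is to retry for a few seconds. The sketch below is a generic polling helper in plain PowerShell; `Wait-Until` is a hypothetical name and not part of Selenium (the WebDriver.Support.dll we downloaded ships a WebDriverWait class for the same purpose, which this tutorial does not cover).

```powershell
# Hypothetical helper (not part of Selenium): re-runs a script block until it
# returns something truthy, or throws once the timeout has passed
function Wait-Until {
    param(
        [Parameter(Mandatory)][scriptblock]$Condition,
        [int]$TimeoutSeconds = 10,
        [int]$PollMilliseconds = 250
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        try {
            $result = & $Condition
            if ($result) { return $result }
        } catch {
            # The condition failed (e.g. element not found yet); try again until the deadline
        }
        Start-Sleep -Milliseconds $PollMilliseconds
    } while ((Get-Date) -lt $deadline)
    throw "Condition was not met within $TimeoutSeconds seconds."
}

# Stand-alone demo: the condition becomes true on the third attempt
$script:attempts = 0
Wait-Until -Condition { $script:attempts++; $script:attempts -ge 3 } -TimeoutSeconds 5

# In the tutorial script, the same helper could wrap the element lookup:
# $searchBox = Wait-Until { $ChromeDriver.FindElementByName("q") }
# $searchBox.SendKeys("mavericksevmont tech blog")
```

For Google's front page the plain lookup usually works as is; the helper matters more on pages that build their content with JavaScript after loading.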

I'll also add a few extra lines at the end, just to pause the script before we close Chrome and to remove the chromedriver instances. It's a good habit to clean up after ourselves!

# Cleaning up after ourselves!
Pause
Function Stop-ChromeDriver {Get-Process -Name chromedriver -ErrorAction SilentlyContinue | Stop-Process -ErrorAction SilentlyContinue}
$ChromeDriver.Close() # Closes the current browser window
$ChromeDriver.Quit() # Ends the browser session and the ChromeDriver process
Stop-ChromeDriver # Function to make double sure the chromedriver process is finito (double-tap!)
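
If the script hits an error halfway through (say, an element name changed), cleanup lines at the bottom of a script may never run and chromedriver.exe stays behind. A try/finally block is one way to guard against that. The sketch below uses a stand-in object with a Quit method so it runs without a browser; `$fakeDriver` is purely illustrative and would be $ChromeDriver in the real script.

```powershell
# Stand-in for $ChromeDriver so the example runs without a browser (illustrative only)
$fakeDriver = New-Object PSObject -Property @{ HasQuit = $false }
$fakeDriver | Add-Member -MemberType ScriptMethod -Name Quit -Value { $this.HasQuit = $true }

try {
    # Browser work goes here; this line simulates a failure halfway through
    throw "Simulated failure: element not found"
}
catch {
    Write-Warning "Something went wrong: $($_.Exception.Message)"
}
finally {
    # Runs whether or not the work above succeeded
    $fakeDriver.Quit()
}

$fakeDriver.HasQuit   # True: cleanup ran despite the error
```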

Our code so far should look like this:

# Website variable
$YourURL = "https://www.google.com" # Website we'll access

# Invoke Selenium into our script!
$env:PATH += ";C:\Temp\PSL\" # Adds the folder containing chromedriver.exe to the PATH environment variable (this session only)
Add-Type -Path "C:\Temp\PSL\WebDriver.dll" # Loads Selenium's .NET assembly (dll) so its classes are available in this PowerShell session
$ChromeDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver # Creates an instance of this class to control Selenium and stores it in an easy-to-handle variable

# Make use of Selenium's class methods to manage our browser at will
$ChromeDriver.Navigate().GoToURL($YourURL) # Browse to the specified website
$ChromeDriver.FindElementByName("q").SendKeys("mavericksevmont tech blog") # Finds the Google search input textbox and types something into it
$ChromeDriver.FindElementByName("btnK").Submit() # Finds the search button and submits the form it belongs to

# Cleaning up after ourselves!
Pause
Function Stop-ChromeDriver {Get-Process -Name chromedriver -ErrorAction SilentlyContinue | Stop-Process -ErrorAction SilentlyContinue}
$ChromeDriver.Close() # Closes the current browser window
$ChromeDriver.Quit() # Ends the browser session and the ChromeDriver process
Stop-ChromeDriver # Function to make double sure the chromedriver process is finito (double-tap!)

And here is how it looks in action, working on its own:


Not too shabby, huh? Remember, the purpose of this was not to google something but to start tapping into the potential that web browser automation has to offer. You can make your script click around, extract page source data, download, upload, update, send messages and pretty much whatever else you want or require from any web-based tool. This can effectively bridge gaps between scripts and RPA tools, APIs, HTTP/HTTPS web requests and other UI automation methods.

For now that's it! You are ready to start automating your browser with PowerShell and Selenium.

In Part II we'll look at performing browser automation "headless" or "hidden", which means everything runs in the background while you use your computer freely, minding your own business.

Thank you for reading this post!

Go to:
PowerShell & Selenium: Automate Web Browser Interactions – Part II
PowerShell & Selenium: Automate Web Browser Interactions – Part III

5 Replies to “PowerShell & Selenium: Automate Web Browser Interactions – Part I”

  1. Great Post!!! Just what I was looking for.
    How do I download the latest WebDriver.dll's? I can find them zipped on the seleniumhq site.
    I run the latest Chrome browser 73.x and Chromedriver but the script will not work with the old dll's from your package.
    My PC runs dotnet 4.7.

    I will appreciate your help. Thank you

    1. Hi Lars! Thank you for your comment, I understand you have downloaded the latest Chromedriver, correct? Then to get the latest WebDriver and Support dll files, you can find the latest here: https://www.seleniumhq.org/download/ (look for C#) or download version 3.14.0 directly from here: https://goo.gl/uJJ5Sc . Inside the ".zip" you will find ".nupkg" files, those are glorified zip files, just change the extension to ".zip" and you will be able to access the contents. What you are looking for is:

      Selenium.Support.3.14.0.zip\lib\net45\WebDriver.Support.dll
      Selenium.WebDriver.3.14.0.zip\lib\net45\WebDriver.dll

      Also, make sure you right-click->Security->Unblock the zip folders you download, sometimes Windows blocks them and you won't know, they will only fail. I think the 4.5 .NET versions should work, but if not, there's older versions available in the same lib folder within the zip downloads, either way if you keep having problems, you may install older .NET versions, you can stack them and have different versions installed at the same time, before that, check your .NET versions currently installed in cmd:

      reg query "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP"
      reg query "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\full" /v version

      Feel free to reach out if you are still having any problems.

        1. Glad to hear that! Thank you for your comment, it's always a pleasure to hear I'm not the only one reading my own posts haha. Just checked out your blog, great posts. For anyone else reading this comment, be sure to check out Mads' blog: https://ifconfig.dk/
