PowerShell & Selenium: Automate Web Browser Interactions – Part I

Automation Creativity FTW!

There comes a time when APIs or support from web app owners are just a no-go. When this happens, browser automation tactics come in handy! Selenium and PowerShell complement each other very well. At first glance this match may not sound as fancy as other marketed automation tools, but oh boy, it works, and it works very well: no need to spend money or run any third-party installers.

Selenium is primarily known for automated web app testing, but why limit ourselves? It's a fantastic tool to manage production boring -errr, I mean, repetitive web-based tasks, and by taking advantage of PowerShell's .NET nature we can easily assimilate it into a script!

Depiction of PowerShell assimilating Selenium's powers…(Kirby is Awesome!)

Here in Part I, we'll cover the basics just to get familiar: write a script that performs a Google search with Chrome. The purpose is not to run a Google search (there are simpler and better ways to do that), but to tap into Selenium's basics for interacting with websites. In Part II and Part III we'll increase the complexity ("headless" mode and login page automation).

To give you some insight into the possibilities this offers: once I had to create a tool that logged into an intranet website to pull data from a table, then parse around 100 variables from that table to build hyperlinks and extract data from yet another table behind each of them. The scale made it impossible to do manually, the table was updated dynamically, and this had to be done weekly. There was no API, and my attempts with curl, wget and Invoke-WebRequest failed miserably. When I was finally getting somewhere by controlling Internet Explorer through the IE COM object, I got stuck trying to manipulate the textbox elements on the website. According to my google-fu, this COM problem was (probably) related to a bug in IE 11.

Frustrated a bit…? Nah…

Just when I was about to give up hope, I found out about Selenium and came across Newspaint's blog post. PowerShell and Selenium provided everything I needed for the job.

Procuring ingredients for our Awesome Automation Ale

My suggestion is to update Chrome and get the latest version of everything. These are the versions I'm currently using (as of November 2018):

PowerShell: 5.1.14409.1012 (anything above version 2 is probably fine)
Google Chrome: 70.0.3538.77
ChromeDriver: 2.43 (supports Chrome v69-71)
Selenium WebDriver (.NET Framework v4.5): 3.14.0
Selenium WebDriver Support (.NET Framework v4.5): 3.14.0

This method may work with different or older versions, but I wanted to share what I'm using in case you run into compatibility problems. If you are using anything newer than Windows 7, PowerShell is not a concern. For the rest:

  1. Update Chrome or download the latest version here
  2. Get Chromedriver from here
  3. Get the Selenium Webdriver dll and Support dll from here

For your convenience, I've gathered the DLLs and the chromedriver version I'm using inside a zip file; feel free to skip the steps above and download the zip from here:

DISCLAIMER: It's your responsibility to scan any files downloaded from the internet with your AV of choice.

Click for download

Aside from Chrome and PowerShell installed on your machine, you should now have the following items:

A few extra spices!

That's "YO HO BEER" Chili Powder, yo

The URL we'll use is https://www.google.com/. In order to tell Selenium what to do, we need some kind of tag to identify the elements we'll interact with; in this case, those elements are the search bar and the search button.

So, for this exercise let's go grab the "search bar" input textbox and the google "search button" element names.

Getting the search textbox's element name can be achieved easily with Chrome, here is how:

  1. Open the site in Chrome and access the developer tools

  2. Click on the specified icon to inspect website elements

  3. Click on the search input textbox element and copy its name (that'd be "q")

  4. Click on the search button element and copy its name (that'd be "btnK")

You can close Chrome now, we have all the ingredients we need.

Let's cook!

Put on some comfy clothes, we are about to get started.

For the sake of this tutorial, I'll create a folder named "PSL" in C:\Temp (C:\Temp\PSL\). Place chromedriver.exe, WebDriver.dll and WebDriver.Support.dll inside it. You'll be saving your PowerShell script here too, so the folder should end up holding those three files plus your script.
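Before writing the script itself, you could have PowerShell confirm that the files are where it expects them, so a missing DLL fails with a clear message instead of a confusing Add-Type error. This is a sketch: `Test-SeleniumFolder` is a hypothetical helper name, and the default file list assumes the three file names mentioned above.

```powershell
# Hypothetical helper: reports which of the expected Selenium files are missing from a folder.
function Test-SeleniumFolder {
    param(
        [Parameter(Mandatory)] [string] $Path,
        # Assumed file names, matching the ones copied into C:\Temp\PSL\ above
        [string[]] $RequiredFiles = @('chromedriver.exe', 'WebDriver.dll', 'WebDriver.Support.dll')
    )
    # Collect every required file that Test-Path cannot find in the folder
    $missing = @(foreach ($file in $RequiredFiles) {
        if (-not (Test-Path -LiteralPath (Join-Path $Path $file))) { $file }
    })
    # Return a small object so callers can check .IsReady or list .Missing
    [pscustomobject]@{
        Path    = $Path
        IsReady = ($missing.Count -eq 0)
        Missing = $missing
    }
}
```

For example, `if (-not (Test-SeleniumFolder -Path 'C:\Temp\PSL\').IsReady) { throw 'Selenium files missing' }` at the top of the script would stop it before Chrome is ever launched.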

Now open up PowerShell ISE or your IDE of choice. We'll start our script by defining the website we want to automate; as mentioned earlier, I will use Google's search page URL and store it in a variable, which makes it easy to edit later if you want to reuse this script with a different site.

$YourURL = "https://www.google.com/"

Next, we'll add the folder to the PATH environment variable so Selenium can find chromedriver.exe, load Selenium's .NET assembly into this PowerShell session, and finally use the New-Object cmdlet to create an instance of Selenium's ChromeDriver class and store it in a variable. If you are new to this, it sounds more complicated than it is; they've made it really easy for us! Here's how it goes:

$env:PATH += ";C:\Temp\PSL\" # Adds the folder containing chromedriver.exe to the PATH environment variable for this session
Add-Type -Path "C:\Temp\PSL\WebDriver.dll" # Loads Selenium's .NET assembly (dll) so its classes can be used in this PowerShell session
$ChromeDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver # Creates an instance of the ChromeDriver class (this launches Chrome) and stores it in an easy-to-handle variable

That takes care of "assimilating" Selenium into our script.
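One thing to keep in mind while testing in ISE: every run of `$env:PATH += ";C:\Temp\PSL\"` in the same session appends the folder again. A guarded version could look like the sketch below; `Add-PathEntry` is a hypothetical helper, and like the original line it only changes PATH for the current PowerShell process.

```powershell
# Hypothetical helper: appends a folder to PATH for the current session only if it isn't already listed.
function Add-PathEntry {
    param([Parameter(Mandatory)] [string] $Folder)
    $separator = [IO.Path]::PathSeparator          # ';' on Windows, ':' on Linux/macOS
    $trimChars = [char[]]('\', '/')
    # Compare without trailing slashes so 'C:\Temp\PSL' and 'C:\Temp\PSL\' count as the same entry
    $wanted  = $Folder.TrimEnd($trimChars)
    $entries = @($env:PATH -split [regex]::Escape([string]$separator) | ForEach-Object { $_.TrimEnd($trimChars) })
    if ($entries -notcontains $wanted) {
        $env:PATH = $env:PATH.TrimEnd($separator) + $separator + $wanted
    }
}
```

With this in place, `Add-PathEntry -Folder 'C:\Temp\PSL\'` could replace the `$env:PATH +=` line, and re-running the script would leave PATH unchanged.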

Make sure you save it as a .ps1 file under "C:\Temp\PSL\"; I will name mine "psl.ps1". Your script should look like this so far:

$YourURL = "https://www.google.com/" # Website we'll browse to
# Invoke Selenium into our script!
$env:PATH += ";C:\Temp\PSL\" # Adds the folder containing chromedriver.exe to the PATH environment variable for this session
Add-Type -Path "C:\Temp\PSL\WebDriver.dll" # Loads Selenium's .NET assembly (dll) so its classes can be used in this PowerShell session
$ChromeDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver # Creates an instance of the ChromeDriver class (this launches Chrome) and stores it in an easy-to-handle variable


With that out of the way, we can start tapping into Selenium's methods. Broadly speaking, methods are actions, like run, jump, walk and the like. Our methods here will be Navigate, GoToUrl, FindElementByName, SendKeys and Submit. We can access these methods from the object we stored in the $ChromeDriver variable:

$ChromeDriver.Navigate().GoToURL($YourURL) # Browse to the specified website

Now we can start doing some testing: if you run the script, it should take you straight to the Google search page. This is what we have so far:

$YourURL = "https://www.google.com/" # Website we'll browse to

# Invoke Selenium into our script!
$env:PATH += ";C:\Temp\PSL\" # Adds the folder containing chromedriver.exe to the PATH environment variable for this session
Add-Type -Path "C:\Temp\PSL\WebDriver.dll" # Loads Selenium's .NET assembly (dll) so its classes can be used in this PowerShell session
$ChromeDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver # Creates an instance of the ChromeDriver class (this launches Chrome) and stores it in an easy-to-handle variable

$ChromeDriver.Navigate().GoToURL($YourURL) # Browse to the specified website

Let's take it for a test drive:


OK, so honestly the above could have been achieved simply by running "start chrome www.google.com" in a script, but this is just the beginning; from here on it gets more and more interesting.

Remember when we looked for the search bar and search button elements? We will use them now:

$ChromeDriver.FindElementByName("q").SendKeys("mavericksevmont tech blog") # Finds the Google search input textbox by its name and types into it
$ChromeDriver.FindElementByName("btnK").Submit() # Finds the search button by its name and submits the form it belongs to

Here is what we are telling it to do:
1: Find the search input textbox "q" and type something
2: Find the search button and submit our request

I'll also add a few extra lines at the end to pause the script before we close Chrome and to clean up the chromedriver processes; it's a good habit to clean up after ourselves!

# Cleaning up after ourselves!
Pause
Function Stop-ChromeDriver {Get-Process -Name chromedriver -ErrorAction SilentlyContinue | Stop-Process -ErrorAction SilentlyContinue} # Note: stops every chromedriver process, not just this script's
$ChromeDriver.Close() # Closes the current browser window
$ChromeDriver.Quit() # Ends the WebDriver session, closing any remaining windows and the chromedriver process
Stop-ChromeDriver # Makes double sure the chromedriver process is finito (double-tap!)
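These cleanup lines only run if the script actually reaches them. If a step throws a terminating error (for instance with `$ErrorActionPreference = 'Stop'`) or you stop the script with Ctrl+C, Chrome and chromedriver can be left running. A `try`/`finally` block guards against that. The sketch below wraps it in a hypothetical `Invoke-WithDriver` helper that takes the driver-creating code as a script block, so the same pattern works with a real `ChromeDriver` or with a stand-in object for testing.

```powershell
# Hypothetical helper: always calls Quit() on the driver, even if the work in $Action fails.
function Invoke-WithDriver {
    param(
        [Parameter(Mandatory)] [scriptblock] $CreateDriver,  # returns the driver object
        [Parameter(Mandatory)] [scriptblock] $Action         # receives the driver as its first argument
    )
    $driver = & $CreateDriver
    try {
        & $Action $driver
    }
    finally {
        # Runs after success, after a terminating error, and when the script is stopped with Ctrl+C
        if ($null -ne $driver) { $driver.Quit() }
    }
}

# Usage with the objects from this post:
# Invoke-WithDriver -CreateDriver { New-Object OpenQA.Selenium.Chrome.ChromeDriver } -Action {
#     param($Driver)
#     $Driver.Navigate().GoToURL($YourURL)
#     $Driver.FindElementByName("q").SendKeys("mavericksevmont tech blog")
#     Pause
# }
```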

Our code so far should look like this:

# Website variable
$YourURL = "https://www.google.com" # Website we'll access

# Invoke Selenium into our script!
$env:PATH += ";C:\Temp\PSL\" # Adds the folder containing chromedriver.exe to the PATH environment variable for this session
Add-Type -Path "C:\Temp\PSL\WebDriver.dll" # Loads Selenium's .NET assembly (dll) so its classes can be used in this PowerShell session
$ChromeDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver # Creates an instance of the ChromeDriver class (this launches Chrome) and stores it in an easy-to-handle variable

# Make use of Selenium's class methods to manage our browser at will
$ChromeDriver.Navigate().GoToURL($YourURL) # Browse to the specified website
$ChromeDriver.FindElementByName("q").SendKeys("mavericksevmont tech blog") # Finds the Google search input textbox by its name and types into it
$ChromeDriver.FindElementByName("btnK").Submit() # Finds the search button by its name and submits the form it belongs to

# Cleaning up after ourselves!
Pause
Function Stop-ChromeDriver {Get-Process -Name chromedriver -ErrorAction SilentlyContinue | Stop-Process -ErrorAction SilentlyContinue} # Note: stops every chromedriver process, not just this script's
$ChromeDriver.Close() # Closes the current browser window
$ChromeDriver.Quit() # Ends the WebDriver session, closing any remaining windows and the chromedriver process
Stop-ChromeDriver # Makes double sure the chromedriver process is finito (double-tap!)

And here is how it looks in action, working on its own:


Not too shabby, huh? Remember, the purpose of this was not to google something, but to start tapping into the potential that web browser automation has to offer: you can make your script click around, extract page source data, download, upload, update, send messages and pretty much whatever you require from any web-based tool. This approach can effectively bridge the gaps between scripts, RPA tools, APIs, HTTP/HTTPS web requests and other UI automation methods.

For now that's it! You are ready to start automating your browser with PowerShell and Selenium.

In Part II we'll look at performing browser automation "headless" (hidden), meaning everything runs in the background while you use your computer freely, minding your own business.

Thank you for reading this post!

Go to:
PowerShell & Selenium: Automate Web Browser Interactions – Part II
PowerShell & Selenium: Automate Web Browser Interactions – Part III

38 Replies to “PowerShell & Selenium: Automate Web Browser Interactions – Part I”

  1. Great Post!!! Just what I was looking for.
    How do I download the latest WebDriver.dll's? I can find them zipped on the seleniumhq site.
    I run the latest Chrome browser 73.x and Chromedriver but the script will not work with the old dll's from your package.
    My PC runs dotnet 4.7.

    I will appreciate your help. Thank you

    1. Hi Lars! Thank you for your comment, I understand you have downloaded the latest Chromedriver, correct? Then to get the latest WebDriver and Support dll files, you can find the latest here: https://www.seleniumhq.org/download/ (look for C#) or download version 3.14.0 directly from here: https://goo.gl/uJJ5Sc . Inside the ".zip" you will find ".nupkg" files, those are glorified zip files, just change the extension to ".zip" and you will be able to access the contents. What you are looking for is:

      Selenium.Support.3.14.0.zip\lib\net45\WebDriver.Support.dll
      Selenium.WebDriver.3.14.0.zip\lib\net45\WebDriver.dll

      Also, make sure you unblock the zip files you download (right-click > Properties > Unblock, in the Security section); sometimes Windows blocks them without telling you and they will just fail. I think the .NET 4.5 versions should work, but if not, there are builds for older frameworks in the same lib folder within the zip downloads. Either way, if you keep having problems, older .NET Framework versions such as 3.5 can be installed side by side with 4.x. Before that, check which .NET versions are currently installed, from cmd:

      reg query "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP"
      reg query "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\full" /v version
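From PowerShell, the `Release` value under that same `v4\Full` key can also be turned into a readable version number. The thresholds below are the minimum `Release` values Microsoft documents for each .NET Framework 4.5+ release; `Get-NetFrameworkVersion` is a hypothetical helper name, and the registry read in the usage line only works on Windows.

```powershell
# Hypothetical helper: maps the 'Release' DWORD from
# HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full to a .NET Framework version.
function Get-NetFrameworkVersion {
    param([Parameter(Mandatory)] [int] $Release)
    # Minimum Release value for each version, newest first
    $thresholds = [ordered]@{
        '4.8.1' = 533320; '4.8' = 528040; '4.7.2' = 461808; '4.7.1' = 461308
        '4.7' = 460798; '4.6.2' = 394802; '4.6.1' = 394254; '4.6' = 393295
        '4.5.2' = 379893; '4.5.1' = 378675; '4.5' = 378389
    }
    # The first threshold the value reaches is the installed version
    foreach ($entry in $thresholds.GetEnumerator()) {
        if ($Release -ge $entry.Value) { return $entry.Key }
    }
    'Older than 4.5'
}

# Usage on Windows:
# Get-NetFrameworkVersion (Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full').Release
```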

      Feel free to reach out if you are still having any problems.

        1. Glad to hear that! Thank you for your comment, it's always a pleasure to hear I'm not the only one reading my own posts haha. Just checked out your blog, great posts. For anyone else reading this comment, be sure to check out Mads' blog: https://ifconfig.dk/

  2. Awesooooome !!! – you described exactly a situation I was facing too.

    Man , you deserve a medal for making this post and clearly describing each step , you made everything so easy.

    Thank you !!!!

      1. Oh nice, that is very useful, thank you. Hmm, I just tried something like this: getElementsByClassName("svg-icon") and it returns this: Exception calling "FindElementsByClassName" with "1" argument(s): "invalid selector: Compound class names not permitted… In the help it says only one string; do I need other arguments there? Sorry to bother you, but I think you are the only source on this 😀 Thanks for the tutorials again, I love the movie reference gifs :))

        1. Hi, I was unable to replicate the problem getting class names with a hyphen, using "https://www.google.com" as a test site, getting "ctr-p" or "spch-dlg" the method finds the class names just fine.

          However, I get that same exception you talk about when trying to get elements separated by spaces, from the same site, e.g. "hp vasq" or "tsf nj".

          The reason I get an exception is that the class attribute itself is space-separated, meaning the element belongs to 2 different classes, and Selenium interprets it as a compound class name.

          In case it helps you, there's a couple of workarounds for dealing with this, one is to search for a partial name. e.g. FindElementsByClassName("vasq") or FindElementsByClassName("tsf") –just make sure it's grabbing the element you want, it might show multiple results if other classes share similar names. Another and more precise option is to use other methods to find the element you are looking for, such as xpath or css selector, e.g.

          FindElementByXPath("//*[@class='hp vasq']")
          FindElementByCssSelector(".hp.vasq")
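If you run into this often, both selectors can be built from the class attribute text itself. The sketch below is a hypothetical helper (`ConvertTo-ClassSelector` is not part of Selenium) and assumes plain class names without characters that need CSS escaping. It turns a space-separated class attribute into a CSS selector that requires every class, plus an XPath that matches each class as a whole word, so it still works if the classes appear in a different order.

```powershell
# Hypothetical helper: builds CSS and XPath selectors from a space-separated class attribute.
function ConvertTo-ClassSelector {
    param([Parameter(Mandatory)] [string] $ClassAttribute)
    # Split on any run of whitespace and drop empty pieces, e.g. "hp  vasq" -> hp, vasq
    $classes = @($ClassAttribute -split '\s+' | Where-Object { $_ })
    # CSS: chaining ".a.b" means "has class a AND class b", in any order
    $css = ($classes | ForEach-Object { '.' + $_ }) -join ''
    # XPath: pad the normalised attribute with spaces so each class matches as a whole token
    $conditions = $classes | ForEach-Object {
        "contains(concat(' ', normalize-space(@class), ' '), ' $_ ')"
    }
    $xpath = '//*[' + ($conditions -join ' and ') + ']'
    [pscustomobject]@{ Css = $css; XPath = $xpath }
}

# Usage (assumes a $Chrome session as in the replies):
# $Chrome.FindElementByCssSelector((ConvertTo-ClassSelector 'hp vasq').Css)
# $Chrome.FindElementByXPath((ConvertTo-ClassSelector 'hp vasq').XPath)
```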

          If this doesn't help and you can replicate the problem in a site I can access, feel free to share it and I'll have a look.

          It's no bother, glad you like the movie graphics 🙂

          1. Hi, this finally worked 😀 with xpath. Specifying the class or the full path works that way perfectly. Thank you so much for the help. Is the //* there to specify to look all children under the root? Thanks

          2. Glad to hear that!!! 🙂

            You are entering absolute vs relative XPaths and related territory, in this case:

            //* = Select all element nodes in the whole document, regardless of the name, if you wanted to specify, you could do for example:
            //div = Select all elements named "div" in the whole document

            For reference on the construction:

            / = An absolute location path, starting at the root
            . = A relative location path, starting at the context node
            /div = A div root element
            ./div = All div child elements of the current node
            /* = The root element, regardless of name
            ./* or * = All child elements of the context node, regardless of name
            //div = All div elements in a document
            .//div = All div elements beneath (descendants of) the context node
            //* = All elements in a document, regardless of name
            .//* = All elements, regardless of name, beneath (descendants of) the context node
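To see the absolute and relative forms side by side without opening a browser, you could try the same expressions against a small XML document with .NET's `System.Xml.XmlDocument`, which also evaluates XPath 1.0. The sample markup and the `Show-Match` helper are made up for illustration; a browser evaluates XPath against the page's DOM, but the path syntax is the same.

```powershell
# Made-up sample document: a root div containing a div "a", which holds a span "s" and a div "b"
$xml = New-Object System.Xml.XmlDocument
$xml.LoadXml(@'
<div id="root">
  <div id="a"><span id="s"/><div id="b"/></div>
</div>
'@)

# The context node for the relative ("./" and ".//") expressions below
$context = $xml.SelectSingleNode('//div[@id="a"]')

# Hypothetical helper that prints the ids of the nodes an expression returns
function Show-Match($node, [string] $xpath) {
    $ids = @($node.SelectNodes($xpath) | ForEach-Object { $_.GetAttribute('id') })
    '{0,-8} -> {1}' -f $xpath, ($ids -join ', ')
}

Show-Match $xml '/div'        # root element named div       -> root
Show-Match $xml '//div'       # every div in the document     -> root, a, b
Show-Match $xml '//*'         # every element                 -> root, a, s, b
Show-Match $context './div'   # div children of the context   -> b
Show-Match $context './*'     # all children of the context   -> s, b
Show-Match $context './/div'  # div descendants of context    -> b
```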

  3. Thank you so much for the clarification :D. Have you been able to get the mouseMove to work at all? I tried a few combinations and nothing is working. $value = New-Object System.Management.Automation.PSObject -Property @{Value='120, 120'}
    $ChromeDriver.Mouse.MouseMove = $value

    Also , is there an easy way to get process ID of a newly created chromeDriver object. Or create a new object from an already running chromeDriver process 😐
    Thanks!

    1. Whoa! Curious one ain't ya? You are already getting better than me at this, that's cool!!! I bet your school teachers remember you a lot, I was like that too (still am I guess :P).

      To tell the truth, I've never had to use the mouse with Selenium other than for clicking. I tried playing around with it and failed miserably, so I may have to get back to you on that one; if you figure it out before me, please let me know! Any mouse interactions other than clicking that I've had to automate, I've done with AutoIt over the OS itself (I will write an AutoIt series soon :)). However, I do know that with Selenium you can simulate mouse-over, mouse dragging, etc. on websites to uncover and test interactions, but since I'm not a tester I haven't had the need yet.

      This is how far I got today, perhaps try playing around with that a little bit, keep in mind you won't actually see your mouse hovering over, as Selenium can't control the OS UI itself, just whatever happens inside your webpage via your browser. Using google.com as the URL:

      $Coordinates = $Chrome.FindElementByName('q').Coordinates
      $Coordinates # This is just to visualize the coordinates
      $Chrome.Mouse.MouseMove($Coordinates)

      Regarding the process, right after you create the process you can run the following:
      (Get-Process -Name chromedriver).Id
      You may have a few, in which it'd be useful to know the creation time to sort them out:
      Get-Process -Name chromedriver | select Name, Id, StartTime
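If several chromedriver processes are running, another option is to snapshot their IDs right before creating the driver and compare with a snapshot taken right after; whatever is new should belong to the driver you just started, assuming nothing else launches chromedriver at the same moment. `Invoke-AndCaptureProcessId` is a hypothetical helper name, not a Selenium feature.

```powershell
# Hypothetical helper: runs a script block and reports which processes named $Name appeared meanwhile.
function Invoke-AndCaptureProcessId {
    param(
        [Parameter(Mandatory)] [string] $Name,          # process name without extension, e.g. 'chromedriver'
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock
    )
    # IDs of matching processes before and after running the script block
    $before = @(Get-Process -Name $Name -ErrorAction SilentlyContinue | ForEach-Object { $_.Id })
    $result = & $ScriptBlock
    $after  = @(Get-Process -Name $Name -ErrorAction SilentlyContinue | ForEach-Object { $_.Id })
    [pscustomobject]@{
        Result    = $result                                            # whatever the script block returned
        ProcessId = @($after | Where-Object { $before -notcontains $_ }) # IDs that are new
    }
}

# Usage:
# $run = Invoke-AndCaptureProcessId -Name chromedriver -ScriptBlock { New-Object OpenQA.Selenium.Chrome.ChromeDriver }
# $Chrome = $run.Result
# $run.ProcessId
```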

      1. Ah nice, I actually tried all of that too inside autoit before going back to powershell… Autoit is pretty neat, the issue I got with autoit, is that it does use actual mouse movement, which would not work on a server. Also I only need to test link highlighting, without moving the actual mouse. You have the exact idea I had about the get process ID. I was just hoping chromeDriver would have a function that directly grab the instance created. I also used the size of get process and get the extra element after chrome object is out. I am really a beginner at this and thanks alot for your help.

        1. No prob Emily, glad I can help, I'm happy to see I can alleviate some struggle for others, it's definitely been a bumpy ride for me since I started, but it's been worth the headaches. Feel free to reach out if you get stuck anytime, also to share here if you figure anything else out, if you don't have a blog yourself I will document for another blog post and give you credit for your findings.

  4. Great article. I've been trying automate with this but I'm having issues identifying a specific element to click and sendkeys on when i use "byElements" instead of "byElement". How do I specify a second or third one?

    1. Thanks John!

      Not sure if this is what you are looking for, using google.com as an example, this works for me, now, I'd rather use the findelementsbyname etc instead, but using byelement() and byelements() you'd do it like this if say, you were searching for a "name" element (which is a textbox we can sendkeys to):

      $Chrome.FindElement([OpenQA.Selenium.By]::Name('q'))
      $Chrome.FindElements([OpenQA.Selenium.By]::Name('q'))

      $Chrome.FindElement([OpenQA.Selenium.By]::Name('q')).SendKeys('IamTheLaw')
      $Chrome.FindElements([OpenQA.Selenium.By]::Name('q')).SendKeys('AreYouLookinAtMe?')

      If there is more than one element with the same name, id or whatever, and you lack a more precise way to identify the one you want to interact with, you can add [0], [1], [2] etc. to pick one by its order, e.g.

      $Chrome.FindElements([OpenQA.Selenium.By]::Name('q'))[0].SendKeys('Im all out of bubblegum') # [0] indexes the collection returned by FindElements

      That'd point to the first item. Increase the number to move along. Does this help?

      1. Yes, that helps. Thanks a lot. Before I was trying Selenium with PowerShell, I was using SeleniumBasic with AutoHotKey. I had a period before something like [1] so the script wasn't accepting it. There are minor differences between the Selenium C# and the version I was using before.

  5. I noticed that there isn't a lot of Selenium/PowerShell resources. About half involve using the Selenium module in PowerShell. I've been able to create a lot of scripts without the module.

    I haven't figured out if I have the option of using get() for the URL instead of navigate. It hasn't worked for me so far. From what I've read, using get() will allow the webpage to load completely before going onto the next line. With navigate, I sometimes have to put in wait times so that the script won't execute the next line. Is there a way to use get()?

    1. Hi John,

      I didn't find a Get() method in C#, the only alternative I found to Navigate().GoToUrl() was this:

      $Chrome.Url = 'https://www.google.com/'

      Let me know if that works for you as expected.

      It seems like .Url has get;set; accessors, "set" lets it go to any url like navigate() does, not sure if this will wait for the page to load though, I don't have a site to test it accurately at the moment. An alternative as you mention would be to set the wait values for page and/or elements:

      #Page loading
      $Chrome.Manage().Timeouts().PageLoad = [TimeSpan]::FromSeconds(180)

      #Element loading
      $Chrome.Manage().Timeouts().ImplicitWait = [TimeSpan]::FromSeconds(180)
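For cases where a fixed `Start-Sleep` is either too slow or too short, a small polling loop is another option: it checks a condition every few hundred milliseconds and moves on as soon as the condition is met. `Wait-Until` is a hypothetical helper, not a Selenium API; WebDriver.Support.dll also ships a `WebDriverWait` class (namespace `OpenQA.Selenium.Support.UI`) built for the same purpose. If you poll with `FindElements`, keep `ImplicitWait` low, since each empty lookup waits for the implicit timeout first.

```powershell
# Hypothetical helper: polls $Condition until it returns a truthy value or the timeout expires.
function Wait-Until {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSeconds = 10,
        [int] $PollMilliseconds = 250
    )
    $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
    while ($stopwatch.Elapsed.TotalSeconds -lt $TimeoutSeconds) {
        try {
            $value = & $Condition
            if ($value) { return $value }   # condition met: hand back whatever it produced
        }
        catch {
            # Errors such as NoSuchElementException just mean "not ready yet"; keep polling
        }
        Start-Sleep -Milliseconds $PollMilliseconds
    }
    throw "Condition was not met within $TimeoutSeconds seconds."
}

# Usage (assumes the $Chrome session from this reply): wait for a visible "btnK" button, then click it
# $button = Wait-Until -TimeoutSeconds 15 -Condition {
#     $Chrome.FindElementsByName('btnK') | Where-Object { $_.Displayed } | Select-Object -First 1
# }
# $button.Click()
```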

      I think I'll be expanding the series to at least another post soon to explore a few more intermediate to advanced things we can do with Selenium + PowerShell as I don't see much around either, I usually look for C# docs and Q&A's when I get stuck and try to adapt to PowerShell best of my knowledge.

      Personally I really like Adam Driscoll's module, but to tell the truth I don't use the cmdlets :P I prefer using the methods since I'm more familiar with them, or maybe because I'm just lazy; at some point I'd like to contribute to the module. So, for now what I do is import the module, use the Start-SeChrome cmdlet to start Chrome and save it in a variable (that saves a chunk of code and maintenance), and from then on I drive it my way.

      1. Thanks. I'm sorry I didn't get back to you sooner. I did not notice any difference between get and navigate. I'm trying to make configurations on a web GUI for a Cisco product and there are popups that come up as I click on buttons so maybe that complicates things. Also, thanks for the code for wait time. In some cases, that worked better than the PowerShell "start-sleep -s 5".

  6. Hello,

    Thanks for your blog, I try on my side but it's not working ( anymore ? )

    I have this error : Exception calling "Submit" with "0" argument(s): "element not interactable (Session info: chrome=77.0.3865.90)"
    $ChromeDriver.FindElementByName("btnK").Submit() # Method to submit r …

    Do you have an idea ?
    Regards

    1. Hi Richard, I can't reproduce the error, it's working for me, are you using the latest Chrome and Drivers? Does it throw back the element when using just the following?:

      $ChromeDriver.FindElementByName("btnK")

  7. This is a great post, thank you!
    I am getting this error: Exception calling "Submit" with "0" argument(s): "element not interactable
    Seems to not want to Submit on the btnK element. I've tried the Click() method also, same error.
    Having a Google around, some commentors on StackOverflow say one should pass options to Chrome. How would one do this in the PowerShell example?
    The options are "start-maximized", "disable-infobars" and "--disable-extensions"

    1. Hi Marko, it seems weird that I can't repro; you are not the first to report this! Are you using the latest drivers and Chrome version?
      To add arguments:

      $ChromeOptions = New-Object OpenQA.Selenium.Chrome.ChromeOptions # Creates a ChromeOptions object
      $ChromeOptions.AddArgument('yourargument') # Adds one command-line switch for Chrome
      $ChromeDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver($ChromeOptions) # Starts Chrome with those options

      e.g.

      $ChromeOptions = New-Object OpenQA.Selenium.Chrome.ChromeOptions # Creates a ChromeOptions object

      $ChromeOptions.AddArguments('headless')
      $ChromeOptions.AddArguments('incognito')
      $ChromeOptions.AddArguments('start-maximized')
      $ChromeOptions.AddArguments('start-fullscreen')

      $ChromeDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver($ChromeOptions) # This $ChromeDriver picks up the arguments.

      Let me know how that works for you. A quick-and-dirty workaround, in case you just need to get the job done and Submit/Click are not working, would be to send the "Enter" key:

      $ChromeDriver.FindElementByName("btnK").SendKeys([OpenQA.Selenium.Keys]::Enter)

      1. A bit of google-fu threw this:
        https://github.com/SeleniumHQ/selenium/issues/4491

        It's also possible you are looking at the main google search page from a different country (e.g. google.co.uk) and that might vary slightly on its components, also there might be some kind of duplicate with the same name that is not interactable or that you might have to give it a few seconds before it becomes interactable, again it's hard for me to tell since I can't repro :/

  8. This is so very helpful! I have automated a bunch of websites already. But, its one button that I've never managed to click!

    What do you need in order to help me out?

    Anyhow, thanks alot for all the things you've learned me already!

        1. Try recording it with the Katalon Recorder Chrome plugin, then use the code button and pick C# or Python, it should give you a good pointer on how to interact with the element, if you get stuck, paste the code here and I can help you translate it to PowerShell

  9. Hi MaverickSevmont,

    This is very useful and worked for me , Just tried this patiently because I want to automatically go to the power BI url and click on … and Export Data in CSV.

    Your explanation is very clear and gif's are funny.

    Thank you

  10. Hi MaverickSevmont,

    Well explained procedure to start with PowerShell & Selenium Automate Web Browser Interactions.

    Thank a Lot

  11. I'm in Canada, and was getting the following error for "$ChromeDriver.FindElementByName("btnK").Submit()":

    Exception calling "Submit" with "0" argument(s): "element not interactable
    (Session info: chrome=78.0.3904.97)"
    At C:\Temp\PSL\psl.ps1:18 char:1
    + $ChromeDriver.FindElementByName("btnK").Submit() # Method to submit r …
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : ElementNotInteractableException

    For me the fix was to replace "$ChromeDriver.FindElementByName("btnK").Submit()" with "$ChromeDriver.FindElementsByName("btnK")[1].Submit()", since it was the second element I needed to pass the Submit() method to search Google.

    1. Thanks Keith! That must be the solution for most people with the same problem, I was able to reproduce the error from "https://google.ca". You are absolutely right with your approach. A common troubleshoot step is to use FindElementsByName().count (e.g. $Chrome.FindElementsByName("btnK").count ) to find out if there's more than one and select accordingly, this should help everyone with the same case. I also see there's 2 elements with the same name available, you'd select them from [0] to [n] as you would with an array.
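Building on Keith's fix: instead of hard-coding `[1]`, which may break again if the page layout changes, you could pick the first element that is actually visible and enabled. Selenium's elements expose `Displayed` and `Enabled` properties; `Select-FirstInteractable` below is a hypothetical helper, not part of Selenium, and it accepts the collection that `FindElementsByName` returns.

```powershell
# Hypothetical helper: returns the first element from a FindElements* result that is visible and enabled.
function Select-FirstInteractable {
    param([Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Elements)
    $match = $Elements | Where-Object { $_.Displayed -and $_.Enabled } | Select-Object -First 1
    if ($null -eq $match) { throw "None of the $($Elements.Count) element(s) is visible and enabled." }
    $match
}

# Usage (assumes the $ChromeDriver session from the post):
# (Select-FirstInteractable $ChromeDriver.FindElementsByName("btnK")).Submit()
```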
