PowerShell & Selenium: Automate Web Browser Interactions – Part II

Part deux is here! Let's pick up where we left off.

In the previous post we integrated Selenium into our PowerShell script and called its methods to open Chrome and perform a Google search, just as if we were doing it manually.

For the next proof of concept, we will build upon the previous script. This time we will save the page source of the Google search results as an HTML file for offline viewing. The purpose is to have visible evidence of what happens in the background when we ask Selenium to run in "headless" mode. Once we go headless, the browser won't pop up as it did in the previous post; we will only see the HTML file appear out of nowhere!

Now you see me…

To access the page's source code from our Google search, I'll use the .PageSource property. That's as simple as:

$ChromeDriver.PageSource

Now, for the "evidence" that something is going on behind the scenes, we will export this into an HTML file.

$ChromeDriver.PageSource | Out-File "C:\Temp\PSL\GoogleSearchResults.html" -Force
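One detail that can matter when other tools read the saved file: Windows PowerShell 5.1 writes UTF-16LE from Out-File by default, while PowerShell 7 defaults to UTF-8 without a BOM. Here is a minimal sketch of setting the encoding explicitly. The sample string stands in for $ChromeDriver.PageSource and the temp-folder path is an assumption, so it runs without a browser:

```powershell
# Stand-in for $ChromeDriver.PageSource (hypothetical sample, includes one non-ASCII character)
$PageSource = "<html><body>caf$([char]0xE9)</body></html>"

# Assumed output path for this sketch; the post writes to C:\Temp\PSL\GoogleSearchResults.html
$OutFile = Join-Path ([System.IO.Path]::GetTempPath()) 'GoogleSearchResults.html'

# -Encoding utf8 states the encoding explicitly instead of relying on the version-specific default
$PageSource | Out-File -FilePath $OutFile -Encoding utf8 -Force

# Read the file back as UTF-8 to confirm the content survived the round trip
$RoundTrip = Get-Content -Path $OutFile -Raw -Encoding utf8
```

In the real script, the only change would be adding -Encoding utf8 to the existing Out-File call.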

This is the full script so far. The only new line is the one that pipes PageSource to Out-File:

# Website and credential variables
$YourURL = "https://www.google.com" # Website we'll access

# Invoke Selenium into our script!
$env:PATH += ";C:\Temp\PSL\" # Adds the folder containing ChromeDriver.exe to the PATH environment variable
Add-Type -Path "C:\Temp\PSL\WebDriver.dll" # Loads Selenium's .NET assembly (dll) to access its classes in this PowerShell session
$ChromeDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver # Creates an instance of this class to control Selenium and stores it in an easy to handle variable

# Make use of Selenium's class methods to manage our browser at will
$ChromeDriver.Navigate().GoToURL($YourURL) # Browse to the specified website
$ChromeDriver.FindElementByName("q").SendKeys("mavericksevmont tech blog") # Finds the Google search textbox and types the search text into it
$ChromeDriver.FindElementByName("btnK").Submit() # Finds the search button and submits the form it belongs to
$ChromeDriver.PageSource | Out-File "C:\Temp\PSL\GoogleSearchResults.html" -Force # Saves the HTML of the results page to a file

# Cleaning up after ourselves!
Pause 
Function Stop-ChromeDriver {Get-Process -Name chromedriver -ErrorAction SilentlyContinue | Stop-Process -ErrorAction SilentlyContinue}
$ChromeDriver.Close() # Close selenium browser session method
$ChromeDriver.Quit() # End ChromeDriver process method
Stop-ChromeDriver # Function to make double sure the Chromedriver process is finito (double-tap!)

And here it is live:


Now that we have a good idea of what the script does, it's time…

Now you don't…going headless!

This is not what I meant by headless!

We'll do exactly the same thing, but headless. This means you can keep using your PC for whatever you like while PowerShell and Selenium get the job done in the background; the browser actions will be invisible to you.

To achieve this, we'll create a ChromeOptions object with the New-Object cmdlet and store it in a variable, call its AddArguments method to add the headless argument, and then pass $ChromeOptions to the ChromeDriver constructor. Again, it sounds more complicated than it is: we just need to add 2 lines and edit 1 line.

We'll replace this:

$ChromeDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver

With this:

$ChromeOptions = New-Object OpenQA.Selenium.Chrome.ChromeOptions
$ChromeOptions.addArguments('headless')
$ChromeDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver($ChromeOptions)
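The same options object can carry more than one Chrome argument. A minimal sketch, assuming you also want a fixed window size in headless mode: the window-size value is an illustrative assumption and not something the original script sets. The Selenium part only runs if WebDriver.dll has already been loaded with Add-Type:

```powershell
# 'headless' comes from the post; 'window-size=1920,1080' is an illustrative assumption
$ChromeArguments = @('headless', 'window-size=1920,1080')

# Only touch Selenium if its assembly is already loaded in this session
if ('OpenQA.Selenium.Chrome.ChromeOptions' -as [type]) {
    $ChromeOptions = New-Object OpenQA.Selenium.Chrome.ChromeOptions
    foreach ($Argument in $ChromeArguments) {
        $ChromeOptions.AddArgument($Argument) # Adds one Chrome argument per call
    }
    # $ChromeOptions would then be passed to the ChromeDriver constructor, as in the lines above
}
```

Keeping the arguments in an array makes it easy to comment out 'headless' while debugging and see the browser again.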

Here is the full script. The three lines that changed are the $ChromeOptions lines right after Add-Type:

# Website and credential variables
$YourURL = "https://www.google.com" # Website we'll access

# Invoke Selenium into our script!
$env:PATH += ";C:\Temp\PSL\" # Adds the folder containing ChromeDriver.exe to the PATH environment variable
Add-Type -Path "C:\Temp\PSL\WebDriver.dll" # Loads Selenium's .NET assembly (dll) to access its classes in this PowerShell session
$ChromeOptions = New-Object OpenQA.Selenium.Chrome.ChromeOptions # Creates the options object that configures how Chrome is launched
$ChromeOptions.addArguments('headless') # Adds the headless argument so no browser window is shown
$ChromeDriver = New-Object OpenQA.Selenium.Chrome.ChromeDriver($ChromeOptions) # Creates the driver instance using those options

# Make use of Selenium's class methods to manage our browser at will
$ChromeDriver.Navigate().GoToURL($YourURL) # Browse to the specified website
$ChromeDriver.FindElementByName("q").SendKeys("mavericksevmont tech blog") # Finds the Google search textbox and types the search text into it
$ChromeDriver.FindElementByName("btnK").Submit() # Finds the search button and submits the form it belongs to
$ChromeDriver.PageSource | Out-File "C:\Temp\PSL\GoogleSearchResults.html" -Force # Saves the HTML of the results page to a file

# Cleaning up after ourselves!
Pause 
Function Stop-ChromeDriver {Get-Process -Name chromedriver -ErrorAction SilentlyContinue | Stop-Process -ErrorAction SilentlyContinue}
$ChromeDriver.Close() # Close selenium browser session method
$ChromeDriver.Quit() # End ChromeDriver process method
Stop-ChromeDriver # Function to make double sure the Chromedriver process is finito (double-tap!)
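Because a headless browser is invisible, a script that fails halfway can leave browser and chromedriver processes running without you noticing. Below is a minimal sketch of one way to guard against that with try/finally. The helper name Invoke-WithCleanup is hypothetical and not part of Selenium, and the demo simulates a failure instead of driving a browser:

```powershell
# Hypothetical helper (not part of Selenium): runs $Action and always runs $Cleanup afterwards
function Invoke-WithCleanup {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        [Parameter(Mandatory)] [scriptblock] $Cleanup
    )
    try {
        & $Action
    }
    finally {
        & $Cleanup # Runs whether $Action succeeded or threw
    }
}

# Demo without a browser: the action fails, yet the cleanup still runs
$State = @{ CleanupRan = $false }
try {
    Invoke-WithCleanup -Action { throw 'simulated failure' } -Cleanup { $State.CleanupRan = $true }
}
catch {
    $CaughtMessage = $_.Exception.Message # The original error still reaches the caller
}
```

In the script above, the action would hold the Navigate, FindElementByName and Out-File lines, and the cleanup would call $ChromeDriver.Quit() followed by Stop-ChromeDriver.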

This is how it rolls:


So, the same as before, except Google Chrome doesn't open up. Magic! Pretty cool, right? Not only can you control any website or web app, but you can also keep your desktop free for other tasks while the automation runs silently, like any regular script would.

That's it for now. Next, in Part III of this series, we'll automate a page login by sending stored user/password credentials.

Go to:
PowerShell & Selenium: Automate Web Browser Interactions – Part III
PowerShell & Selenium: Automate Web Browser Interactions – Part I
