Test Category

Test Blog Post

Starter template for writing out a blog post using MDX/JSX and Next.js.

Abdullah Muhammad

Published on May 17, 2026 (5 min read)

Introduction

Today, we will dive into a fun exercise involving the Selenium WebDriver tool. We have used Selenium for test automation, but there is another fun, “hacky” way of utilizing this tool.

Web scraping, the act of surfing web pages to “scrape” or “mine” data, can be done using various APIs.

In fact, a tool I built months ago makes use of a web scraper API to extract article information from a user-requested Medium article. It then uses the article data to generate an audio file.

The process is straightforward, but it has potential drawbacks. For instance, many of these APIs are custom-built, and for more robust scraping features, you might need to pay up.

For a simple example such as this, a simple web scraping API would suffice. However, if the problem you are trying to solve is complex in nature, a more advanced scraping tool might be required.

This is where the Selenium WebDriver tool comes into play. Selenium WebDriver is primarily used for testing, but as you will see in this tutorial, it can be much more than that.

We are going to use Selenium WebDriver to implement web scraping!

Selenium WebDriver for Web Scraping

As you have seen in previous tutorials, we can incorporate the Selenium WebDriver tool to search, access, and perform actions on web elements.

The act of retrieving web elements from a page using the built-in selectors allows one to “scrape” data.

We can verify the data (as we do for testing), store it in a flat file or database, or perform ETL (extract, transform, load) operations on it, which is common practice when working with large amounts of data.
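To make the "store it in a flat file" idea concrete, here is a minimal sketch of turning a scraped price string into a number and a CSV row. The class name PriceRecord, its methods, and the sample price are all hypothetical and not part of the project; the parsing also assumes US-style formatting such as "$66,123.45".

```java
import java.math.BigDecimal;
import java.time.Instant;

// Hypothetical helper for illustration only; not part of the article's project
class PriceRecord {

	// Strip the currency symbol and thousands separators, e.g. "$66,123.45" -> 66123.45
	// Assumes US-style formatting with "." as the decimal separator
	static BigDecimal parsePrice(String scrapedText) {
		String digits = scrapedText.replaceAll("[^0-9.]", "");
		return new BigDecimal(digits);
	}

	// Build one CSV row: timestamp, coin symbol, numeric price
	static String toCsvRow(Instant time, String symbol, String scrapedText) {
		return time + "," + symbol + "," + parsePrice(scrapedText).toPlainString();
	}

	public static void main(String[] args) {
		// The price text below is made up for illustration
		System.out.println(toCsvRow(Instant.now(), "BTC", "$66,123.45"));
	}
}
```

Keeping the price as a BigDecimal rather than a double avoids rounding surprises if the values are later summed or compared.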

In this article, we will create an application which incorporates the Selenium WebDriver to scrape a cryptocurrency website.

We will extract real-time prices of the three most popular cryptocurrencies:

  • Bitcoin (BTC)
  • Ethereum (ETH)
  • Solana (SOL)

CoinGecko is the website we will be using for retrieving these prices. The application will also send an email to a desired email address containing the latest prices of the three cryptocurrencies.

Recall that we can use SMTP to send emails. We saw this in the Nodemailer tutorial where we were able to send emails from a Node.js application using SMTP and the Nodemailer NPM dependency.

Code Overview

You can follow along by cloning this repository. The directory we will work with is /demos/Demo43_Selenium_WebDriver_Web_Scraping.

Since we are working with a live site, there is no web application for this project.

We simply have a Java project that incorporates Selenium WebDriver and other dependencies to successfully scrape coin prices and send the email.

All of the code for the project resides in /selenium_webdriver_web_scraping. A .jar file containing all the project code is also available for download.

Like before, we implement the Page Object Model (POM) when working with Selenium. The project has three packages under /src/main/java:

  • pages — Contains a Java class for retrieving web elements from the CoinGecko cryptocurrency website
  • runner — Contains a Runner class which acts as the entry point for the application
  • util — Contains a utility class for working with SMTP and preparing/sending the email

We will begin by first examining the Runner class.


Web Scraping Runner Class

We mentioned the need for an entry point for the application.

The Runner class provides us that entry point and is located in the runner package (named Runner.java):

package runner;

import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Paths;
import java.time.Duration;
import java.util.Scanner;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import jakarta.mail.MessagingException;
import pages.PricesPage;

public class Runner {
	
	public static void main(String[] args) throws MessagingException, URISyntaxException {
		
		try (Scanner scanner = new Scanner(System.in)) {
		
			// Retrieve email and password from user input
			System.out.println("Please enter the email address: ");
			String email = scanner.nextLine();
			
			System.out.println("Please enter the password: ");
			String password = scanner.nextLine();
			
			// Get the URL of the chromedriver.exe file, which sits next to Runner.class
			URL chromedriverUrl = Runner.class.getResource("chromedriver.exe");
			if (chromedriverUrl == null) {
				throw new IllegalStateException("chromedriver.exe not found next to Runner.class");
			}
			
			// Obtain the absolute path dynamically; going through a URI decodes
			// characters such as %20 that URL.getPath() would leave encoded
			String chromedriverPath = Paths.get(chromedriverUrl.toURI()).toString();
			
			System.setProperty("webdriver.chrome.driver", chromedriverPath);
			WebDriver driver = new ChromeDriver();
			
			try {
				driver.manage().window().maximize();
				
				driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(2));
				driver.get("https://www.coingecko.com/"); // Navigate to the CoinGecko cryptocurrency price page
				
				// Pass in web driver to Page class; its constructor scrapes the prices and sends the email
				new PricesPage(driver, email, password);
			}
			finally {
				driver.quit(); // End the session and shut down the ChromeDriver process
			}
		}
	}
}
Runner.java is the starting point for the application where it initializes the web driver and user information

We make use of the Scanner class to obtain user input (email and password) and pass it along to the PricesPage constructor for the purposes of retrieving and processing data and preparing/sending the email.

We initialize the web driver and make use of the ChromeDriver. The chromedriver.exe file resides in the same location as this Runner.java class.
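One detail worth knowing when locating a file through a class resource: URL.getPath() returns the path in percent-encoded form, so a folder with a space in its name comes back with %20 in it. The sketch below shows the difference using only the JDK. The class ResourcePathDemo, its helper methods, and the folder name "my drivers" are invented for illustration.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Paths;

// Hypothetical demo class; the folder name used below is invented
class ResourcePathDemo {

	// Build a file: URL for the given file (the file does not need to exist)
	static URL urlOf(File file) {
		try {
			return file.toURI().toURL();
		} catch (MalformedURLException e) {
			throw new IllegalArgumentException(e);
		}
	}

	// Convert a file: URL into a decoded file-system path
	static String toFilePath(URL url) {
		try {
			return Paths.get(url.toURI()).toString();
		} catch (URISyntaxException e) {
			throw new IllegalArgumentException(e);
		}
	}

	public static void main(String[] args) {
		// A location with a space in it, similar to "Program Files" on Windows
		URL url = urlOf(new File("my drivers", "chromedriver.exe"));

		System.out.println(url.getPath());   // the space appears as %20
		System.out.println(toFilePath(url)); // the space is decoded again
	}
}
```

If the project lives in a path without spaces or special characters, both forms point to the same file, which is why the issue is easy to miss.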

We prepare the driver by setting initial properties and navigating to the CoinGecko cryptocurrency website.

Now let us examine how we retrieve coin prices.


CoinGecko Cryptocurrency Prices Page

In the pages package, you will find a lone class named PricesPage.java. The implementation of this class can be seen below:

package pages;

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.support.PageFactory;

import jakarta.mail.MessagingException;
import util.JavaMailUtil;

import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.FindBy;

public class PricesPage {
	private WebDriver driver;
	private String email;
	private String password;
	
	// Initialize the page, then retrieve the coin prices and send them by email
	public PricesPage(WebDriver driver, String email, String password) throws MessagingException {
		this.driver = driver;
		this.email = email;
		this.password = password;
		PageFactory.initElements(driver, this);
		
		findCoinPrices();
	}
			
	// Retrieving the prices of the three coins from the CoinGecko cryptocurrency price page
	@FindBy(xpath="//table[1]/tbody[1]/tr[1]/td[5]/span[1]")
	private WebElement bitcoinPriceWebElement;

	@FindBy(xpath="//table[1]/tbody[1]/tr[2]/td[5]/span[1]")
	private WebElement ethereumPriceWebElement;
		
	@FindBy(xpath="//table[1]/tbody[1]/tr[5]/td[5]/span[1]")
	private WebElement solanaPriceWebElement;
	
	// Helper methods for sending back web driver and cryptocurrency prices	
	public String getBitcoinPrice() {
		return this.bitcoinPriceWebElement.getText();
	}
		
	public String getEthereumPrice() {
		return this.ethereumPriceWebElement.getText();
	}
		
	public String getSolanaPrice() {
		return this.solanaPriceWebElement.getText();
	}
		
	public WebDriver getWebDriver() {
		return this.driver;
	}
	
	public void findCoinPrices() throws MessagingException {
		String msg = "Cryptocurrency Prices: \n";
		msg += "Bitcoin: " + getBitcoinPrice() + "\n";
		msg += "Ethereum: " + getEthereumPrice() + "\n";
		msg += "Solana: " + getSolanaPrice() + "\n";

		// Once all the data formatting is complete, pass in the msg for the email along with email and password
		JavaMailUtil.sendEmail(msg, email, password);
	}		
}
PricesPage.java class containing the relevant web elements for retrieving coin prices

Since we are concerned with retrieving only three cryptocurrency prices, we create a separate web element variable to represent each.

We make use of PageFactory, which wires each annotated field to a lazy proxy so that the element is only looked up when we call a method such as getText() on it. The three web elements are located with XPath selectors supplied through the @FindBy annotation.
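The XPath expressions above are positional: they pick a row and a column by index. To see how such an expression resolves, here is a small sketch that evaluates one with the JDK's own XPath engine instead of a browser. The markup is invented and much simpler than the real CoinGecko page (three columns instead of the live table's layout), and the class XPathDemo is hypothetical.

```java
import java.io.StringReader;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

// Hypothetical demo; the table below is invented, simplified markup
class XPathDemo {

	// Each row holds: rank, coin name, price
	static final String TABLE =
		"<html><body><table><tbody>"
		+ "<tr><td>1</td><td>Bitcoin</td><td><span>$66,000.00</span></td></tr>"
		+ "<tr><td>2</td><td>Ethereum</td><td><span>$3,000.00</span></td></tr>"
		+ "</tbody></table></body></html>";

	// Evaluate the expression against TABLE and return the text of the first match
	static String textAt(String xpathExpression) {
		XPath xpath = XPathFactory.newInstance().newXPath();
		try {
			return xpath.evaluate(xpathExpression, new InputSource(new StringReader(TABLE)));
		} catch (XPathExpressionException e) {
			throw new IllegalArgumentException(e);
		}
	}

	public static void main(String[] args) {
		// Second row, third cell, first span: the Ethereum price in this sample
		System.out.println(textAt("//table[1]/tbody[1]/tr[2]/td[3]/span[1]"));
	}
}
```

Because the selector depends on row and column order, a change in the site's ranking or table layout would make it return a different coin's price, so positional XPaths are worth re-checking whenever the page changes.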

We initialize user information (email and password) and prepare the email message using the findCoinPrices() function. This is where we make use of the web elements containing price information.

Lastly, we invoke a utility class function and pass in the email, password, and email message to complete the final step.
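If more coins were added later, the message could be built from an ordered map instead of one line per coin. The following is only a sketch of that idea: PriceMessage and build() are hypothetical names, and the prices are placeholders rather than scraped values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of the article's project
class PriceMessage {

	// Produce the same layout as findCoinPrices(): a header line, then one "Coin: price" line each
	static String build(Map<String, String> pricesByCoin) {
		StringBuilder msg = new StringBuilder("Cryptocurrency Prices: \n");
		for (Map.Entry<String, String> entry : pricesByCoin.entrySet()) {
			msg.append(entry.getKey()).append(": ").append(entry.getValue()).append("\n");
		}
		return msg.toString();
	}

	public static void main(String[] args) {
		// LinkedHashMap keeps insertion order, so the coins print in the order they were added
		Map<String, String> prices = new LinkedHashMap<>();
		prices.put("Bitcoin", "$66,000.00");  // placeholder value
		prices.put("Ethereum", "$3,000.00");  // placeholder value
		System.out.print(build(prices));
	}
}
```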


Utility Class for Preparing and Sending the Email

The utility class allows us to gather all the information to prepare and send the email.

If you are not familiar with SMTP, you can complete this article before proceeding.

Like HTTP/S, FTP, DNS, TCP, and SSH, SMTP (Simple Mail Transfer Protocol) is a network protocol. It deals with sending email messages through a mail server.

We can use Maven dependencies to work with SMTP similar to how we worked with it using Node.js and the Nodemailer dependency.

In the util package, you will find the JavaMailUtil.java class. Its implementation can be found below:

package util;

import java.util.Properties;

import jakarta.mail.Authenticator;
import jakarta.mail.Message;
import jakarta.mail.MessagingException;
import jakarta.mail.PasswordAuthentication;
import jakarta.mail.Session;
import jakarta.mail.Transport;
import jakarta.mail.internet.InternetAddress;
import jakarta.mail.internet.MimeMessage;

public class JavaMailUtil {
	
	// Verify the credentials of the email and use SMTP to send it
	// Enable STARTTLS and authentication, and use the Gmail SMTP configuration
	public static void sendEmail(String msg, String emailAddress, String password) throws MessagingException {
		Properties properties = new Properties();
		
		properties.put("mail.smtp.auth", "true");
		properties.put("mail.smtp.starttls.enable", "true");
		properties.put("mail.smtp.host", "smtp.gmail.com"); // Configurations set to GMail by default accessing port 587
		properties.put("mail.smtp.port", "587");
	
		Session session = Session.getInstance(properties, new Authenticator() {
		
			@Override
			protected PasswordAuthentication getPasswordAuthentication() {
				return new PasswordAuthentication(emailAddress, password); // Authenticate credentials here
			}
		});
		
		Message message = prepareMessage(session, emailAddress, password, msg); // Specify all the information required for messaging
		
		Transport.send(message);
		System.out.println("Message sent successfully");
	}

	// Prepare the actual email message prior to sending
	// Any failure is propagated to the caller instead of returning null, so send() never receives a null message
	public static Message prepareMessage(Session session, String emailAddress, String password, String msg) throws MessagingException {
		Message message = new MimeMessage(session);
		message.setFrom(new InternetAddress(emailAddress));
		message.setRecipient(Message.RecipientType.TO, new InternetAddress(emailAddress)); // Recipient is the same as sender
		message.setSubject("BTC, ETH, SOL Cryptocurrency Prices");
		message.setText(msg); // Populate the message

		return message;
	}
}
JavaMailUtil.java class sets the necessary properties, sessions and authentication for email delivery

We make use of the Jakarta Mail dependency to easily prepare and send the email.

In the sendEmail() function, we set properties for working with SMTP. Port 587 is the standard port for mail submission over SMTP with STARTTLS.

We enable STARTTLS and set authentication to true. We are working with Gmail accounts only, so we provide the Gmail host address as well (smtp.gmail.com).

After that, we create a session by passing in the properties set earlier. We use the Session class (provided by Jakarta Mail) and supply the user's credentials (email and password) through an anonymous class of type Authenticator (also provided by Jakarta Mail). The SMTP server verifies these credentials when the email is sent.

Once that is complete, we proceed to create the email message. We call a custom function, prepareMessage(), which returns an object of type Message (provided by Jakarta Mail) configured with all the required information.

We set the session and ensure that the recipient and sender are set to the same address. We provide a subject line and pass in the formatted text, which contains all the cryptocurrency price information, as the email text.
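If you wanted to point the utility at a different mail provider, the four settings could be grouped in a small helper. This is a sketch under assumptions: SmtpSettings and forHost() are hypothetical names that do not exist in the project, while the property keys are the same ones used in JavaMailUtil.java. It only builds the Properties object and does not open a connection.

```java
import java.util.Properties;

// Hypothetical helper; only the property keys come from the article's code
class SmtpSettings {

	// Build the SMTP properties for a given host and port, with STARTTLS and authentication enabled
	static Properties forHost(String host, int port) {
		Properties properties = new Properties();
		properties.put("mail.smtp.auth", "true");
		properties.put("mail.smtp.starttls.enable", "true");
		properties.put("mail.smtp.host", host);
		properties.put("mail.smtp.port", String.valueOf(port));
		return properties;
	}

	public static void main(String[] args) {
		// The Gmail values used in this tutorial
		Properties gmail = forHost("smtp.gmail.com", 587);
		System.out.println(gmail.getProperty("mail.smtp.host") + ":" + gmail.getProperty("mail.smtp.port"));
	}
}
```

Note that the values are stored as strings, which is how the existing code sets them as well.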

Finally, we make use of the Transport class (provided by Jakarta Mail) and the static method send() to send the email.

If all goes well, the console should print the success message.

Quite a bit is taking place here, but once you understand that most of this is setting configurations, you will realize most of this is boilerplate code.


Project Configuration File

Lastly, we look at the pom.xml file to confirm the required dependencies for working with this project. You can find the pom.xml file in /selenium_webdriver_web_scraping:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>selenium_webdriver_web_scraping</groupId>
  <artifactId>selenium_webdriver_web_scraping</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
	<!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
	<dependency>
	    <groupId>org.seleniumhq.selenium</groupId>
	    <artifactId>selenium-java</artifactId>
	    <version>4.19.1</version>
	</dependency>
	<!-- https://mvnrepository.com/artifact/com.sun.mail/jakarta.mail -->
	<dependency>
	    <groupId>com.sun.mail</groupId>
	    <artifactId>jakarta.mail</artifactId>
	    <version>2.0.1</version>
	</dependency>
  </dependencies>
</project>
pom.xml file containing the required dependencies for web scraping using Selenium

We are making use of Selenium (as seen before), but also the Jakarta Mail dependency for working with SMTP to prepare and send emails.

All of this should be pretty straightforward.

Demo Time!

Now that we have covered the codebase for web scraping, it is time for a demo!

We will be working with Gmail in this tutorial. If you do not have a Gmail account, please proceed to create one.

We will need to set some security features.

Google has an option for allowing less secure apps to access your email address. However, by fall of 2024, this will no longer be an option.

For you to use SMTP, you will need to ensure that two-factor verification is enabled on your account.

Once you have set up two-factor verification, you will need to create an app password. This is a unique code which you provide instead of your regular password to access your email via SMTP.

You can only create an app password for your account after you have turned on two-factor verification.

Ensure that you note the app password somewhere safe, as it is shown only once. Should you lose it, you can always create a new one.

When the application launches and prompts you to enter the email address and password, you will need to provide the email address and the app password you created above.
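Since the app password is typed into the console, it will be visible on screen. One possible refinement, sketched below, is to read it through java.io.Console, which hides the typed characters when a real terminal is attached. Inside an IDE such as Eclipse, System.console() typically returns null, so the sketch falls back to a Scanner. The class CredentialPrompt and its readSecret() method are hypothetical and not part of the project.

```java
import java.io.Console;
import java.util.Scanner;

// Hypothetical helper; not part of the article's project
class CredentialPrompt {

	// Read a secret without echoing it when a console is available;
	// otherwise print the prompt and read a plain line from the fallback Scanner
	static String readSecret(Console console, Scanner fallback, String prompt) {
		if (console != null) {
			char[] secret = console.readPassword("%s", prompt);
			return secret == null ? "" : new String(secret);
		}
		System.out.println(prompt);
		return fallback.hasNextLine() ? fallback.nextLine() : "";
	}

	public static void main(String[] args) {
		try (Scanner scanner = new Scanner(System.in)) {
			String password = readSecret(System.console(), scanner, "Please enter the app password: ");
			// Print only the length so the secret itself never reaches the console
			System.out.println("Read " + password.length() + " characters");
		}
	}
}
```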


Selecting the IDE and Running the Project

When working with Java projects, there are several IDEs you can choose from. Eclipse, NetBeans, and IntelliJ are some of the most common ones.

We are going to work with Eclipse in this tutorial. You will need to import the project in your own desired workspace. Proceed to select the Runner class and simply hit run.

Upon launch, you should be prompted to enter the email address and password like this:

Runner.java class live in action

After that, you should notice ChromeDriver start up and navigate to the CoinGecko website.

Ignore the warning messages for now as they are related to Chrome and the ChromeDriver versions.

When prompted to provide the email address and password, enter your Gmail address (I inked out mine for security reasons) and the app password you created earlier.

You can clearly see my app password, but this is irrelevant because by the time you view this tutorial, I will have deleted it ;)

This is for demonstrative purposes only. If all goes according to plan, you should see the success message print on the console (as seen in the screenshot).

Going to your Gmail inbox, you should see the following email:

Email successfully sent with the correct subject, recipient, and email text

The email subject, recipient, and text match what was set in the utility class. The prices should conform to what was on the CoinGecko website when the ChromeDriver navigated to it.

Let it be known that on this date (May 15, 2024), the markets rallied hard! So this will definitely be a worthwhile email notification to receive!

Conclusion

We covered web scraping using the Selenium WebDriver and looked at how we can process the retrieved data.

We brought together concepts learned in previous tutorials such as the PageFactory and POM to create a class to help with cryptocurrency price retrieval.

In addition, we worked with SMTP and the Jakarta Mail dependency to prepare and send emails on the Gmail platform.

We looked at a custom Runner class for working with web scraping as well as a utility class which handled the preparation and sending of emails.

We explored a very basic use case, but it is sufficient to provide a solid understanding of how you can use Selenium WebDriver for web scraping.

It is an awesome tool for you to use to do awesome things!

In the list below, you will find links to the GitHub repository used in this article, the official Selenium WebDriver docs, and the Jakarta Mail docs:

As always, I hope you found this article helpful and look forward to more in the future.

Thank you!
