Web crawling in Ruby with Capybara


In a Rails project, we use Capybara for feature (end-to-end) testing. However, Capybara also works well for crawling pages outside of Rails: any data that follows a specific pattern on a page can be extracted with ease.

We can build a simple web crawler in a single file using the Capybara DSL.

How To

Create a folder with a Gemfile in it, since we need multiple gems.

$ mkdir crawler
$ cd crawler

Create a Gemfile:

source "https://rubygems.org"

gem 'capybara'
gem 'selenium-webdriver'

Run bundle install after creating the Gemfile:
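
$ bundle install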

Set up your crawler:

require 'capybara'

Capybara.run_server = false
Capybara.current_driver = :selenium
Capybara.app_host = "https://google.com.tw"

You can pick other drivers from the list in the Capybara repo; just install the relevant gems if a driver requires them.
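
For example, switching to headless Chrome might look like the sketch below. This assumes reasonably recent versions of Capybara and selenium-webdriver, with Chrome and chromedriver installed locally; the :headless_chrome driver name is arbitrary:

require 'capybara'
require 'selenium-webdriver'

# Register a custom driver that runs Chrome without a visible window.
Capybara.register_driver :headless_chrome do |app|
  options = Selenium::WebDriver::Chrome::Options.new
  options.add_argument('--headless')
  Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
end

Capybara.current_driver = :headless_chrome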

After setup, create a class and include the DSL from Capybara:

module MyCapybara
  class Crawler
    include Capybara::DSL
  end
end

crawler = MyCapybara::Crawler.new
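
With Capybara::DSL included, session methods such as visit, fill_in, find, and page are available directly on the instance. As a quick sanity check (assuming the app_host configured above is reachable), you could do:

crawler.visit("/")
puts crawler.page.title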

Once the instance is created, add a method, fill in your own selectors and patterns, and process the data. The following is a complete example:

require 'capybara'

Capybara.run_server = false
Capybara.current_driver = :selenium
Capybara.app_host = "https://google.com"

module MyCapybara
  class Crawler
    include Capybara::DSL
    # "search" and "#result" are placeholder locators; adapt them to your target page.
    # Note that fill_in matches fields by id, name, or label text, so no CSS "#" prefix is needed.
    def query(params)
      visit("/")
      fill_in "search", with: params
      click_button "search"
      find("#result").text
    end
  end
end

crawler = MyCapybara::Crawler.new
result = crawler.query("capybara") # any search term works here
File.open("query.txt", "a") { |file|
  file.write("#{result}\n")
}

More complex operations can be performed with the other methods Capybara offers.
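
For example, Capybara's all method returns every node matching a selector, which is handy when scraping lists. The sketch below reopens the crawler class and uses a hypothetical a.result selector; replace it with whatever your target page actually uses:

module MyCapybara
  class Crawler
    include Capybara::DSL

    # Collect the href attribute of every link matching the (made-up) a.result selector.
    def collect_links(path)
      visit(path)
      all("a.result").map { |link| link[:href] }
    end
  end
end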