Jobs

What a job is and how Gluecrawl processes your target page.

A job is a saved scraping configuration. It combines a target URL with a description of the data you want to extract, so Gluecrawl knows exactly what to look for.

Creating a Job

When you create a job you provide two things:

  1. A URL — the web page (or first page of a paginated list) you want to scrape.
  2. An extraction definition — what data to pull from that page.

There are two ways to define the extraction:

  • Goal-based — describe what you need in plain language, e.g. "Extract all product names, prices, and ratings." Gluecrawl figures out the rest.
  • Columns-based — list the exact columns you want (name, type, and an optional description for each). This gives you precise control over the output schema.

You can also set a max pages limit to control how far Gluecrawl paginates.
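A job configuration for either mode can be pictured as a plain data structure. This is an illustrative sketch only; the field names (`url`, `goal`, `columns`, `max_pages`) are assumptions, not Gluecrawl's actual API:

```python
# Illustrative sketch of the two ways to define a job.
# Field names are assumptions, not Gluecrawl's actual API.

# Goal-based: describe the data you need in plain language.
goal_job = {
    "url": "https://example.com/products",
    "goal": "Extract all product names, prices, and ratings.",
    "max_pages": 5,  # optional cap on how far pagination goes
}

# Columns-based: list the exact columns for precise control of the schema.
columns_job = {
    "url": "https://example.com/products",
    "columns": [
        {"name": "product_name", "type": "string", "description": "Display name"},
        {"name": "price", "type": "number", "description": "Price in USD"},
        {"name": "rating", "type": "number"},  # description is optional
    ],
    "max_pages": 5,
}
```

Either shape pairs a target URL with an extraction definition, which is all a job needs.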

Processing

After a job is created, Gluecrawl's AI visits the target page and analyzes its structure. During processing, the AI:

  • Identifies where each piece of data lives on the page
  • Figures out how pagination works (next-page links, infinite scroll, etc.)
  • Detects whether the site links to detail pages with additional data
  • Determines the appropriate protection level for the site

Processing costs a fixed 10 credits and typically takes 30–90 seconds.
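The result of this analysis can be pictured as a record covering the four points above. The structure below is purely illustrative; Gluecrawl does not expose its internals in this form:

```python
# Hypothetical shape of what processing learns about a page;
# not Gluecrawl's real output format.
analysis = {
    "field_locations": {              # where each piece of data lives on the page
        "product_name": ".product h2",
        "price": ".product .price",
    },
    "pagination": "next_link",        # e.g. next-page links vs. infinite scroll
    "has_detail_pages": True,         # site links to detail pages with more data
    "protection_level": "standard",   # how defensively the site must be crawled
}

PROCESSING_COST_CREDITS = 10          # fixed cost stated in the docs
```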

Job States

A job moves through the following states:

State         Meaning
In progress   Processing is underway — the AI is analyzing the page
Ready         Processing succeeded — the job can now be run
Failed        Processing could not complete — check the error message and create a new job

Once a job reaches Ready, it stays Ready permanently. You can run it as many times as you like without repeating the processing step.
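Because Ready is terminal, a client only needs to poll until the job leaves the In progress state. A minimal polling sketch, assuming a hypothetical `get_job_state` callable (Gluecrawl's real client interface may differ):

```python
import time

# Job states from the table above.
IN_PROGRESS, READY, FAILED = "in_progress", "ready", "failed"

def wait_until_processed(get_job_state, job_id, poll_seconds=5, timeout=180):
    """Poll a job until processing resolves to Ready or Failed.

    `get_job_state` is a hypothetical callable returning one of the three
    states; processing typically takes 30-90 seconds, so the default
    timeout leaves headroom.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_job_state(job_id)
        if state != IN_PROGRESS:
            return state  # READY: run the job; FAILED: create a new job
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still processing after {timeout}s")
```

A Ready result means the job can be run immediately and on every future run without re-processing; a Failed result means checking the error message and creating a new job.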
