Jobs

What a job is and how Gluecrawl processes your target page.

A job is a saved scraping configuration. It combines a target URL with a description of the data you want to extract, so Gluecrawl knows exactly what to look for.

Creating a Job

When you create a job you provide two things:

  1. A URL — the web page (or first page of a paginated list) you want to scrape.
  2. An extraction definition — what data to pull from that page.

There are two ways to define the extraction:

  • Goal-based — describe what you need in plain language, e.g. "Extract all product names, prices, and ratings." Gluecrawl figures out the rest.
  • Columns-based — list the exact columns you want (name, type, and an optional description for each). This gives you precise control over the output schema.

You can also set a max pages limit to control how far Gluecrawl paginates.
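A job configuration for either mode can be pictured as a plain data structure. This is an illustrative sketch only; the field names (`url`, `goal`, `columns`, `max_pages`) are assumptions, not Gluecrawl's actual API:

```python
# Illustrative sketch of the two ways to define a job.
# Field names are assumptions, not Gluecrawl's actual API.

# Goal-based: describe the data you need in plain language.
goal_job = {
    "url": "https://example.com/products",
    "goal": "Extract all product names, prices, and ratings.",
    "max_pages": 5,  # optional cap on how far pagination goes
}

# Columns-based: list the exact columns for precise control of the schema.
columns_job = {
    "url": "https://example.com/products",
    "columns": [
        {"name": "product_name", "type": "string", "description": "Display name"},
        {"name": "price", "type": "number", "description": "Price in USD"},
        {"name": "rating", "type": "number"},  # description is optional
    ],
    "max_pages": 5,
}
```

Either shape pairs a target URL with an extraction definition, which is all a job needs.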

Processing

After a job is created, Gluecrawl's AI visits the target page and analyzes its structure. During processing, the AI:

  • Identifies where each piece of data lives on the page
  • Figures out how pagination works (next-page links, infinite scroll, etc.)
  • Detects whether the site links to detail pages with additional data
  • Determines the appropriate protection level for the site

Processing costs a fixed 10 credits and typically takes 30–90 seconds.
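The result of this analysis can be pictured as a record covering the four points above. The structure below is purely illustrative; Gluecrawl does not expose its internals in this form:

```python
# Hypothetical shape of what processing learns about a page;
# not Gluecrawl's real output format.
analysis = {
    "field_locations": {              # where each piece of data lives on the page
        "product_name": ".product h2",
        "price": ".product .price",
    },
    "pagination": "next_link",        # e.g. next-page links vs. infinite scroll
    "has_detail_pages": True,         # site links to detail pages with more data
    "protection_level": "standard",   # how defensively the site must be crawled
}

PROCESSING_COST_CREDITS = 10          # fixed cost stated in the docs
```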

Job States

A job moves through the following states:

State         Meaning
In progress   Processing is underway — the AI is analyzing the page
Ready         Processing succeeded — the job can now be run
Failed        Processing could not complete — check the error message and create a new job

Once a job reaches Ready, it stays Ready permanently. You can run it as many times as you like without repeating the processing step.
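Because Ready is terminal, a client only needs to poll until the job leaves the In progress state. A minimal polling sketch, assuming a hypothetical `get_job_state` callable (Gluecrawl's real client interface may differ):

```python
import time

# Job states from the table above.
IN_PROGRESS, READY, FAILED = "in_progress", "ready", "failed"

def wait_until_processed(get_job_state, job_id, poll_seconds=5, timeout=180):
    """Poll a job until processing resolves to Ready or Failed.

    `get_job_state` is a hypothetical callable returning one of the three
    states; processing typically takes 30-90 seconds, so the default
    timeout leaves headroom.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_job_state(job_id)
        if state != IN_PROGRESS:
            return state  # READY: run the job; FAILED: create a new job
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still processing after {timeout}s")
```

A Ready result means the job can be run immediately and on every future run without re-processing; a Failed result means checking the error message and creating a new job.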
