Jobs
What a job is and how Gluecrawl processes your target page.
A job is a saved scraping configuration. It combines a target URL with a description of the data you want to extract, so Gluecrawl knows exactly what to look for.
Creating a Job
When you create a job you provide two things:
- A URL — the web page (or first page of a paginated list) you want to scrape.
- An extraction definition — what data to pull from that page.
There are two ways to define the extraction:
- Goal-based — describe what you need in plain language, e.g. "Extract all product names, prices, and ratings." Gluecrawl figures out the rest.
- Columns-based — list the exact columns you want (name, type, and an optional description for each). This gives you precise control over the output schema.
You can also set a max pages limit to control how far Gluecrawl paginates.
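To make the two definition styles concrete, here is a minimal sketch of what the two kinds of job configuration could look like as request payloads. The field names (`url`, `goal`, `columns`, `max_pages`, and the per-column keys) are illustrative assumptions, not Gluecrawl's documented API schema:

```python
# Hypothetical job payloads -- field names are illustrative,
# not Gluecrawl's actual API schema.

# Goal-based: describe what you need in plain language and let
# Gluecrawl figure out the rest.
goal_job = {
    "url": "https://example.com/products",
    "goal": "Extract all product names, prices, and ratings.",
    "max_pages": 5,  # optional cap on how far pagination goes
}

# Columns-based: list the exact columns you want, for precise
# control over the output schema.
columns_job = {
    "url": "https://example.com/products",
    "columns": [
        {"name": "product_name", "type": "string", "description": "Product title"},
        {"name": "price", "type": "number", "description": "Listed price"},
        {"name": "rating", "type": "number"},  # description is optional
    ],
    "max_pages": 5,
}
```

Either way, the job pairs one target URL with one extraction definition; the `max_pages` cap applies to both styles.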
Processing
After a job is created, Gluecrawl's AI visits the target page and analyzes its structure. During processing, the AI:
- Identifies where each piece of data lives on the page
- Figures out how pagination works (next-page links, infinite scroll, etc.)
- Detects whether the site links to detail pages with additional data
- Determines the appropriate protection level for the site
Processing costs a flat 10 credits and typically takes 30–90 seconds.
Job States
A job moves through the following states:
| State | Meaning |
|---|---|
| In progress | Processing is underway — the AI is analyzing the page |
| Ready | Processing succeeded — the job can now be run |
| Failed | Processing could not complete — check the error message and create a new job |
Once a job reaches the ready state, it stays ready permanently. You can run it as many times as you like without repeating the processing step.
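Because a job is only runnable once processing finishes, client code typically polls the job state after creation. A minimal polling sketch, assuming a hypothetical `get_job_state(job_id)` helper (stubbed here with a canned sequence) and the three states from the table above:

```python
import time

# Hypothetical helper -- a real client would call Gluecrawl's API here.
# This stub replays a canned state sequence so the sketch is runnable.
_states = iter(["in_progress", "in_progress", "ready"])

def get_job_state(job_id):
    return next(_states)

def wait_until_ready(job_id, poll_interval=0.01, timeout=120):
    """Poll until the job leaves 'in_progress' (processing takes ~30-90 s)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_job_state(job_id)
        if state == "ready":
            return True  # the job can now be run any number of times
        if state == "failed":
            # Failed jobs cannot be retried; inspect the error and
            # create a new job instead.
            raise RuntimeError("Processing failed; create a new job")
        time.sleep(poll_interval)
    raise TimeoutError("Job did not become ready in time")

result = wait_until_ready("job-123")
```

Since a ready job never leaves the ready state, the poll only needs to run once per job; subsequent runs can skip straight to execution.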