Stop manual URL checks. A dedicated REST API lets you query indexing status at scale, integrate with dashboards, and catch deindexed pages before they tank your traffic. Here's the architecture, the pitfalls, and the working code.
Google Search Console gives you per-URL indexing reports, but it's not built for automation. You can't pipe 10,000 URLs through the UI, and the Search Console API doesn't expose real-time indexStatus for bulk operations. A dedicated Google Index Checker API fills that gap: you send a URL and get back INDEXED, NOT_INDEXED, PENDING, or EXCLUDED with a reason code (or ERROR if the request itself fails).
In practice, weekly checks on a 50,000-page e-commerce site typically show about 2-5% of pages dropping out of the index silently. Product pages that don't get indexed are dead inventory. We saw a client lose 30% of organic revenue over three months because a server config change blocked half their category pages. The index checker API caught it on day one.
1. Export your sitemap or crawl log. Deduplicate it and filter out non-indexable URLs (blocked by robots.txt or marked noindex).
2. POST to /api/v1/check with up to 100 URLs and your API key in the header. The response contains a jobId.
3. Poll GET /api/v1/status/{jobId} until the job completes. Average processing time is 45 seconds for 100 URLs.
4. Each URL returns a status, a reason code, and a last-indexed timestamp. Group the results by status.
5. If NOT_INDEXED exceeds 3% of the batch, push an alert to Slack, email, or your CI/CD pipeline.
6. After implementing changes (e.g., fixing blocked resources), resubmit only the affected URLs.
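The grouping and alerting steps above can be sketched in a few lines of Python. This is a minimal illustration, not the API's client library: it assumes each result is a dict with a `status` key, and the function names `summarize` and `should_alert` are hypothetical.

```python
from collections import Counter


def summarize(results):
    """Count URLs per status. Assumes each result is a dict with a 'status' key."""
    return Counter(r["status"] for r in results)


def should_alert(results, threshold=0.03):
    """Return True when the NOT_INDEXED share of the batch exceeds the threshold."""
    if not results:
        return False
    counts = summarize(results)
    return counts["NOT_INDEXED"] / len(results) > threshold
```

The 3% default mirrors the threshold in the workflow; what happens after `should_alert` returns True (Slack, email, CI/CD) is left to your own notification code.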
| Status Code | Meaning | Typical Cause | Action Required | Hidden Risk |
|---|---|---|---|---|
| INDEXED | URL is in Google's index | Page meets quality and technical requirements | None. Track in baseline report. | Can be stale if last indexed > 14 days ago. Recheck if content changed. |
| NOT_INDEXED | URL not found in index | New page, crawl budget issue, or redirect chain | Check crawlability: server response, internal links, sitemap presence. | May indicate a soft 404 or thin content. Don't assume a technical fix will work. |
| EXCLUDED | URL excluded by robots.txt or noindex | Explicit directive or meta tag | If the page should be indexed, remove the blocking directive or noindex tag, then resubmit to Google. | If you manage multiple domains, a shared robots.txt can accidentally block valid pages. |
| PENDING | Crawl scheduled but not completed | URL recently submitted or queue is busy | Wait 24-48 hours. Recheck via polling endpoint. | Recrawl delays often affect large sites. Batch your checks to avoid hitting rate limits. |
| ERROR | Request failed or malformed URL | Invalid protocol, encoding issue, or DNS failure | Validate URL format. Ensure the API endpoint is reachable. | A single malformed URL in a batch can trigger a full batch error. Validate before submission. |
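
One way to turn the table above into code is a small triage function that maps each result to a follow-up action. This is a sketch under assumptions: the `status` and `lastIndexed` field names, the use of a timezone-aware `datetime` for `lastIndexed`, and the action labels are all illustrative rather than part of the documented response.

```python
from datetime import datetime, timedelta, timezone

# Follow-up actions per status, taken from the "Action Required" column.
ACTIONS = {
    "NOT_INDEXED": "check_crawlability",
    "EXCLUDED": "review_directives",
    "PENDING": "recheck_later",
    "ERROR": "validate_url",
}


def next_action(result, now=None, stale_after=timedelta(days=14)):
    """Pick a follow-up action for one result dict (field names are assumed)."""
    now = now or datetime.now(timezone.utc)
    status = result["status"]
    if status == "INDEXED":
        last = result.get("lastIndexed")
        # An INDEXED entry older than 14 days may be stale.
        if last is not None and now - last > stale_after:
            return "recheck_if_content_changed"
        return "none"
    return ACTIONS.get(status, "unknown_status")
```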
Assume you have 500 product URLs from a WooCommerce store. You need to check them without exceeding the 100-URL batch limit.
Step 1: Partition. Split the list into 5 batches of 100. A simple array chunker in Python does it: `batches = [urls[i:i+100] for i in range(0, len(urls), 100)]`
Step 2: Authenticate. Pass the API key in the header X-Api-Key: your_key. Set the timeout to 30 seconds per request.
Step 3: Submit and Poll. For batch 1, POST to /api/v1/check and receive a jobId. Then poll /api/v1/status/{jobId} every 10 seconds until the status is complete. Expect an average of 45 seconds per batch.
Step 4: Process Results. Batch 1 returned INDEXED: 89, NOT_INDEXED: 8, EXCLUDED: 3. The 8 NOT_INDEXED URLs were all from a category with a broken pagination link. You fix the link and resubmit those 8. At about 45 seconds per batch, the total time is roughly 4 minutes for 500 URLs.
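The partition, submit, and poll steps can be sketched as follows. To keep the example independent of any HTTP library, `submit` and `get_status` are hypothetical callables that you would implement as wrappers around POST /api/v1/check and GET /api/v1/status/{jobId}; the `status` and `results` keys in the job response are also assumptions.

```python
import time


def chunk(urls, size=100):
    """Split a list of URLs into batches of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def run_batch(batch, submit, get_status, poll_interval=10, max_wait=120,
              sleep=time.sleep):
    """Submit one batch and poll until the job reports 'complete'.

    submit(batch) -> job id; get_status(job_id) -> dict with a 'status' key
    and, once complete, a 'results' list (assumed response shape).
    """
    job_id = submit(batch)
    waited = 0
    while True:
        job = get_status(job_id)
        if job["status"] == "complete":
            return job["results"]
        if waited >= max_wait:
            raise TimeoutError(f"job {job_id} not complete after {max_wait}s")
        sleep(poll_interval)
        waited += poll_interval
```

Passing `sleep` as a parameter makes the polling loop easy to test without real delays.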
| Search Console UI | Google Index Checker API | Verdict |
|---|---|---|
| One URL at a time, copy-paste | Batch 100 URLs per request | API is 100x faster for bulk |
| No programmatic access, no alerting | Integrates with CI/CD, Slack, email | API enables proactive monitoring |
| Rate limited per user session | 500 requests per hour, burst 20/min | API offers predictable limits for production |

Overall, the API wins for scale and automation.
A common situation we see is developers assuming a 404 response means the page is not indexed. That's wrong. A page can return 404 yet still be indexed if Google keeps a cached copy. Our API returns INDEXED with a warning flag if the server response is missing.
Another failure mode: URLs with tracking parameters. ?utm_source=facebook creates a different URL from the canonical. If you feed the raw URL without stripping params, you get NOT_INDEXED even though the canonical is indexed. Always normalize before checking. Filter out utm_*, fbclid, gclid.
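A minimal normalization sketch using only the Python standard library is shown below. The function name `normalize_url` is illustrative; note that `urlencode` re-encodes the remaining query values, which can change how special characters are escaped.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Exact-match tracking keys; utm_* is handled by prefix below.
TRACKING_KEYS = {"fbclid", "gclid"}


def normalize_url(url):
    """Strip utm_*, fbclid, and gclid parameters and lowercase scheme and host."""
    parts = urlsplit(url.strip())
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_") and key.lower() not in TRACKING_KEYS
    ]
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path,
        urlencode(kept),
        parts.fragment,
    ))
```

For example, `normalize_url("HTTPS://Example.com/product/123?utm_source=facebook&color=red")` keeps `color=red` and drops the tracking parameter.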
Duplicate lists are another trap. If your sitemap includes both /product/123 and /product/123?color=red, the API will count both. Deduplicate on the canonical URL. We've seen batches where 30% of requests were duplicates, wasting quota and time.
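Deduplication can be sketched as an order-preserving pass keyed on the canonical URL. How you determine the canonical is site-specific; the `strip_query` helper here is only an assumption that holds when every query-string variant declares the bare path as its canonical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Illustrative canonical rule: drop the query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def dedupe_by_canonical(urls, canonical_of=strip_query):
    """Return each canonical URL once, in first-seen order."""
    seen = set()
    unique = []
    for url in urls:
        canonical = canonical_of(url)
        if canonical not in seen:
            seen.add(canonical)
            unique.append(canonical)
    return unique
```

If your site exposes canonical tags, a lookup built from your crawl data is a more reliable `canonical_of` than a string rule.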
Normalize all URLs: strip tracking parameters, lowercase scheme/host, resolve relative paths.
Set up a rate limiter: respect 500 req/h limit. Use exponential backoff on 429 responses.
Handle timeouts: if a batch takes > 120 seconds, retry with a smaller batch (50 URLs).
Log all response codes: INDEXED is not enough. Track EXCLUDED and ERROR separately.
Alert on anomalies: if NOT_INDEXED rate jumps above 5% of your baseline, investigate.
Schedule periodic checks: daily for high-value pages, weekly for the rest of the site.
Integrate with your CI/CD: block a deployment if critical pages (home, pricing) are not indexed.
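The backoff item in the checklist above might look like the sketch below. It assumes a hypothetical zero-argument `request` callable that returns a `(status_code, body)` tuple; the retry count and base delay are illustrative defaults, not values prescribed by the API.

```python
import time

# 500 requests per hour works out to one request every 7.2 seconds on average.
MIN_INTERVAL_SECONDS = 3600 / 500


def call_with_backoff(request, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `request` on HTTP 429, doubling the delay after each attempt."""
    for attempt in range(max_retries + 1):
        status, body = request()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("rate limit still exceeded after retries")
```

Spacing calls by `MIN_INTERVAL_SECONDS` keeps a steady loop under the hourly limit, while the backoff handles bursts that still trigger a 429.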
The API accepts up to 100 URLs per batch. For agencies, we recommend creating a separate API key per client site to isolate rate limits and billing. Use a queue system that submits batches sequentially, respecting the 500 requests per hour limit. Track jobIds per client to correlate results. Avoid sending all client URLs in one loop — you'll hit rate limits and mix up responses.
The top three errors are: 1) malformed URLs with spaces or unencoded characters, 2) rate limit exceeded (429) when submitting too fast, and 3) batch timeout when URLs return slow server responses. For guest posts specifically, many URLs are on low-authority domains that have crawl delays. Set your poll timeout to 120 seconds and implement retry logic with exponential backoff.
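Malformed URLs can be filtered out before submission with a check like the one below. It is a conservative sketch, not the API's own validation: it rejects whitespace, non-ASCII characters (a proxy for "unencoded"), and anything that is not an absolute http or https URL.

```python
from urllib.parse import urlsplit


def is_submittable(url):
    """Return True for absolute http(s) URLs with no spaces or raw non-ASCII."""
    if any(ch.isspace() for ch in url) or not url.isascii():
        return False
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. a malformed IPv6 host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```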
Yes, but with caution. The API returns indexing status based on Google's crawl data. If a PBN page is indexed but later removed, the API may still show INDEXED for a few days. We recommend rechecking high-value backlinks weekly. Also note that Google may deindex PBN pages faster than normal pages. Do not rely on a single check — monitor trends over time.
Add a pre-deployment stage that checks critical pages (homepage, top 5 product pages, pricing page). Run the API call with a timeout of 30 seconds per URL. If any critical URL returns NOT_INDEXED, fail the build and alert the team. Use a separate API key for CI/CD to avoid mixing usage with manual checks. Store the API key as a secret environment variable.
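A pre-deployment gate along these lines could be a short script whose exit code fails the build. The result shape (dicts with `url` and `status` keys) and the function names are assumptions; fetching the results from the API is left out so the gate logic stays testable on its own.

```python
import sys


def failing_urls(results, critical_urls):
    """Return the critical URLs whose status is NOT_INDEXED."""
    status_by_url = {r["url"]: r["status"] for r in results}
    return [u for u in critical_urls if status_by_url.get(u) == "NOT_INDEXED"]


def gate(results, critical_urls):
    """Return 1 (fail the build) if any critical URL is NOT_INDEXED, else 0."""
    failed = failing_urls(results, critical_urls)
    for url in failed:
        print(f"NOT_INDEXED: {url}", file=sys.stderr)
    return 1 if failed else 0
```

In a pipeline step you would end the script with `sys.exit(gate(results, CRITICAL_URLS))`, where `CRITICAL_URLS` is your own list of pages such as the homepage and pricing page.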
Partition your list into batches of 100 URLs each (100 batches total). Submit one batch per minute, using a scheduler that pauses 60 seconds between batches, to stay well under the 500 requests per hour limit. If you get a 429 response, double the wait time and retry. The pauses alone add up to about 100 minutes, so expect the full check to take about 2 hours for 10,000 URLs once processing time and retries are included. For faster results, prioritize certain URL patterns or use a premium plan with higher limits.
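The scheduling arithmetic can be checked with a small helper. The function names are illustrative, and the estimate covers only the pauses between batches, not processing time or retries.

```python
import math


def plan(total_urls, batch_size=100, seconds_between_batches=60):
    """Return (number of batches, minutes spent pausing between submissions)."""
    batches = math.ceil(total_urls / batch_size)
    minutes = batches * seconds_between_batches / 60
    return batches, minutes


def wait_after_429(current_wait):
    """Double the wait time after a 429 response."""
    return current_wait * 2
```

`plan(10000)` gives 100 batches and 100 minutes of pacing, which is the baseline behind the two-hour estimate.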
First, verify the URL is in your sitemap and not blocked by robots.txt. Second, check the HTTP status code — a 301 redirect can cause indexing delays. Third, inspect the page for noindex tags or canonical tags pointing elsewhere. Fourth, look for server errors (500, 503) that Google encountered during the last crawl. Finally, check if the page has thin content or is a duplicate — the API may return NOT_INDEXED even if the page is technically crawlable.
The API checks the final destination URL after following all redirects. If URL A redirects to URL B, the API returns the indexing status of URL B, not URL A. This is important for canonical analysis — if you submit a non-canonical URL that redirects to the canonical, the API will show INDEXED if the canonical is indexed. For accurate results, always submit the canonical version of each URL. The API does not report the redirect chain, only the final status.
We offer a free tier that includes 100 requests per month with a maximum of 10 URLs per batch. Paid plans start at $29/month for 5,000 requests and scale up to enterprise plans with custom rate limits. All plans include email support. The free tier is sufficient for testing integration and checking a small site. For agencies or large sites, the Pro plan at $99/month (20,000 requests) is typically the best value.
Screaming Frog and Sitebulb are desktop tools that check indexing by analyzing HTTP headers and Google cache, not the actual Google index. Our API queries Google's index directly, giving accurate status. The trade-off: desktop tools are faster for one-time audits (no API calls), but they can't automate daily checks. For a one-time audit, use Screaming Frog. For ongoing monitoring in a pipeline, use the API.
Yes. The API returns a separate field `indexStatusDetail` that includes flags like `SOFT_404` and `META_NOINDEX`. If a page was indexed but later gets a noindex tag, the API will show `EXCLUDED` with reason `META_NOINDEX`. The response also includes the last indexed date, so you can detect if a page that was once indexed has been removed. This is critical for catching accidental noindex additions.
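Detecting an accidental noindex from these fields might look like the sketch below. The exact response shape is an assumption: it treats `indexStatusDetail` as a list of flag strings and `lastIndexed` as the last indexed date, alongside hypothetical `url` and `status` keys.

```python
def find_noindex_regressions(results):
    """Return URLs that were indexed before but are now EXCLUDED via META_NOINDEX."""
    flagged = []
    for result in results:
        if (
            result.get("status") == "EXCLUDED"
            and "META_NOINDEX" in result.get("indexStatusDetail", [])
            and result.get("lastIndexed")  # a past index date implies prior indexing
        ):
            flagged.append(result["url"])
    return flagged
```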
To ensure your pages are structured for indexing success, review Google's guidelines on structured data — properly marked-up content can improve how Google interprets your pages.
For advanced indexing tactics used by SEO professionals, the Grey Hat Protocol offers practical methods to accelerate indexing for high-priority pages, especially on sites where crawl budget is limited.