Scraper API - Instagram & Google Reviews

A comprehensive, production-ready API service that provides Instagram posts and Google Reviews on-demand for Shopify apps and websites. Built with intelligent caching, rate-limit protection, and zero-configuration scraping.

🌐 Live API: https://scraper.capula.co
📚 Documentation: https://scraper.capula.co/docs
🛍️ Shopify Guide: https://scraper.capula.co/docs/shopify
⭐ Google Reviews Guide: https://scraper.capula.co/docs/reviews


🚀 Quick Start

For Shopify Developers

This API provides two main endpoints (/api/scrape and /api/reviews) that cover three use cases:

// 1. Get Instagram Photos (no videos/reels)
fetch('https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=7&type=photos')

// 2. Get Instagram Reels Only
fetch('https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=5&type=reels')

// 3. Get Google Reviews
fetch('https://scraper.capula.co/api/reviews?organizationId=0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d&count=10')

That's it! No API keys needed on your end. Everything is handled server-side.


📸 Instagram API

Endpoint: /api/scrape

Fetch Instagram posts (photos or reels) from any public Instagram username.

URL:

GET https://scraper.capula.co/api/scrape

Parameters:

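| Parameter | Required | Type | Default | Values | Description |
|-----------|----------|------|---------|--------|-------------|
| username | Yes | string | - | Any Instagram username | Username without the @ symbol |
| count | No | integer | 7 | 1-50 | Number of posts to return |
| type | No | string | photos | photos or reels | Type of content to fetch |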

Examples:

# Get 7 photos from @pascuccicoffee
https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=7&type=photos

# Get 10 reels from @nike
https://scraper.capula.co/api/scrape?username=nike&count=10&type=reels

# Get 5 photos (type defaults to photos)
https://scraper.capula.co/api/scrape?username=starbucks&count=5
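
The parameter rules above can also be checked on the client before a request is sent. The following is a minimal sketch, not part of the service itself: the helper name build_scrape_url and the choice to raise ValueError are assumptions made for illustration, while the limits (count 1-50, type photos or reels) come from the table.

```python
from urllib.parse import urlencode

BASE_URL = "https://scraper.capula.co"
VALID_TYPES = ("photos", "reels")


def build_scrape_url(username, count=7, media_type="photos"):
    """Build a /api/scrape URL, rejecting values the parameter table rules out.

    Hypothetical client-side helper; the server performs its own validation.
    """
    # The API expects the username without the leading @
    username = username.strip().lstrip("@")
    if not username:
        raise ValueError("username is required")
    if not 1 <= count <= 50:
        raise ValueError("count must be between 1 and 50")
    if media_type not in VALID_TYPES:
        raise ValueError('type must be "photos" or "reels"')
    # urlencode percent-encodes each value, so the query string stays well-formed
    query = urlencode({"username": username, "count": count, "type": media_type})
    return f"{BASE_URL}/api/scrape?{query}"


print(build_scrape_url("@nike", 10, "reels"))
```

Validating locally avoids spending a round trip on a request that would only come back as 400 Bad Request.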

Response Format (Photos):

{
  "data": [
    {
      "media_url": "https://scraper.capula.co/media/pascuccicoffee_123456.webp",
      "permalink": "https://www.instagram.com/p/ABC123/",
      "timestamp": "2025-10-08T09:30:00Z",
      "caption": "Fresh coffee this morning! #coffee",
      "media_type": "photo"
    }
  ],
  "count": 7,
  "username": "pascuccicoffee",
  "type": "photos",
  "cached": true,
  "scraped_at": "2025-10-09T01:24:27Z",
  "cache_expires_in_hours": 18.5
}

Response Format (Reels):

{
  "data": [
    {
      "media_url": "https://scraper.capula.co/media/pascuccicoffee_123456.mp4",
      "thumbnail_url": "https://scraper.capula.co/media/pascuccicoffee_123456_thumb.webp",
      "permalink": "https://www.instagram.com/p/ABC123/",
      "timestamp": "2025-10-08T09:30:00Z",
      "caption": "Fresh coffee reel! #coffee",
      "media_type": "reel"
    }
  ],
  "count": 7,
  "username": "pascuccicoffee",
  "type": "reels",
  "cached": true,
  "scraped_at": "2025-10-09T01:24:27Z",
  "cache_expires_in_hours": 18.5
}

Response Fields:

| Field | Type | Description |
|-------|------|-------------|
| data | array | Array of Instagram posts |
| data[].media_url | string | Direct URL to media file (WebP for photos, MP4 for reels) |
| data[].thumbnail_url | string | [Reels only] Direct URL to thumbnail/cover image (WebP, optimized) |
| data[].permalink | string | Link to original Instagram post |
| data[].timestamp | string | Publication date (ISO 8601) |
| data[].caption | string | Post caption (truncated to 200 chars) |
| data[].media_type | string | "photo" or "reel" |
| count | integer | Number of posts returned |
| username | string | Instagram username |
| type | string | Content type ("photos" or "reels") |
| cached | boolean | true = from cache (no API quota used), false = fresh scrape (1 API request) |
| scraped_at | string | When data was originally fetched |
| cache_expires_in_hours | float | Hours until cache expires and data refreshes |
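
Because thumbnail_url exists only on reels, code that renders a mixed gallery has to pick the right field per item. A small sketch of that choice follows; the function names and the output shape (image, link, alt) are illustrative assumptions, and only the field names are taken from the table above.

```python
def preview_image_url(post):
    """Return the still image for a post: the cover for reels, the photo otherwise."""
    if post.get("media_type") == "reel":
        # thumbnail_url is documented as reels-only; None if it is missing
        return post.get("thumbnail_url")
    return post.get("media_url")


def to_gallery_items(payload):
    """Flatten an /api/scrape response into simple items for a template (hypothetical shape)."""
    return [
        {
            "image": preview_image_url(post),
            "link": post["permalink"],
            "alt": post.get("caption") or "",
        }
        for post in payload.get("data", [])
    ]
```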

Cache Duration: 24 hours per username + type combination


⭐ Google Reviews API

Endpoint: /api/reviews

Fetch Google Reviews for any business using its Google Maps organization ID.

URL:

GET https://scraper.capula.co/api/reviews

Parameters:

| Parameter | Required | Type | Default | Description |
|-----------|----------|------|---------|-------------|
| organizationId | Yes | string | - | Google Maps organization ID (e.g., 0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d) |
| count | No | integer | 10 | Number of reviews to return (1-50) |

🔍 How to Find organizationId

Step-by-Step:

  1. Go to Google Maps: https://maps.google.com
  2. Search for the business: Type business name + location
  3. Click on the business to open its profile
  4. Copy the URL from your browser address bar

Example URL:

https://www.google.com/maps/place/Empire+State+Plumbing+Heating+%26+Air+Conditioning/@42.9354131,-73.8094899,17z/data=!3m1!4b1!4m6!3m5!1s0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d!8m2!3d42.9354131!4d-73.806715!16s%2Fg%2F11b6g2tcd1
  5. Find the organizationId: look for 1s followed by the ID:
1s0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  6. Remove the 1s prefix:
0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d  ← This is your organizationId
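
The manual steps above can be automated with a regular expression. This is a sketch under one assumption: that the ID always appears after a "!1s" marker in the 0xHEX:0xHEX form, as it does in the example URL. The function names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Assumption: the ID follows "!1s" and has the form 0xHEX:0xHEX, as in the example URL
ORG_ID_PATTERN = re.compile(r"!1s(0x[0-9a-fA-F]+:0x[0-9a-fA-F]+)")


def extract_organization_id(maps_url):
    """Return the organizationId found in a Google Maps place URL, or None."""
    match = ORG_ID_PATTERN.search(maps_url)
    return match.group(1) if match else None


def build_reviews_url(organization_id, count=10):
    """Build a /api/reviews URL; urlencode turns the colon into %3A."""
    query = urlencode({"organizationId": organization_id, "count": count})
    return f"https://scraper.capula.co/api/reviews?{query}"


example = (
    "https://www.google.com/maps/place/Empire+State+Plumbing+Heating+%26+Air+Conditioning/"
    "@42.9354131,-73.8094899,17z/data=!3m1!4b1!4m6!3m5!1s0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d"
    "!8m2!3d42.9354131!4d-73.806715!16s%2Fg%2F11b6g2tcd1"
)
org_id = extract_organization_id(example)
print(org_id)
print(build_reviews_url(org_id))
```

If a URL does not match the pattern, the function returns None rather than guessing, so the caller can fall back to the manual steps.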

Examples:

# Get 10 reviews for Empire State Plumbing
https://scraper.capula.co/api/reviews?organizationId=0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d&count=10

# Get 20 reviews
https://scraper.capula.co/api/reviews?organizationId=0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d&count=20

# Get maximum 50 reviews
https://scraper.capula.co/api/reviews?organizationId=0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d&count=50

Response Format:

{
  "data": [
    {
      "rating": 5,
      "comment": "Excellent service! Very professional and knowledgeable staff...",
      "date": "2025-06-16T18:27:39.118Z",
      "author": null,
      "photos": [
        "https://lh3.googleusercontent.com/..."
      ],
      "owner_response": "Thank you so much for your feedback! We're thrilled..."
    }
  ],
  "count": 10,
  "organizationId": "0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d",
  "cached": true,
  "scraped_at": "2025-10-09T01:13:45Z",
  "cache_expires_in_days": 6.9
}

Response Fields:

| Field | Type | Description |
|-------|------|-------------|
| data | array | Array of Google reviews |
| data[].rating | integer | Star rating (1-5) |
| data[].comment | string | Review text/comment (full text) |
| data[].date | string | Publication date (ISO 8601) |
| data[].author | string/null | Reviewer name (may be null for privacy) |
| data[].photos | array | URLs to review photos (empty if none) |
| data[].owner_response | string/null | Business owner's response (null if no response) |
| count | integer | Number of reviews returned |
| organizationId | string | The business organization ID |
| cached | boolean | true = from cache, false = fresh scrape |
| scraped_at | string | When data was originally fetched |
| cache_expires_in_days | float | Days until cache expires |
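
Since author and owner_response may be null, consumers should not assume they are strings. The sketch below shows one way to summarize a response while tolerating those nulls; summarize_reviews and the "Anonymous" placeholder are illustrative assumptions, not part of the API.

```python
def summarize_reviews(payload):
    """Summarize an /api/reviews response, tolerating null author/owner_response."""
    reviews = payload.get("data", [])
    ratings = [review["rating"] for review in reviews]
    return {
        "count": len(reviews),
        # None when there are no reviews, to avoid dividing by zero
        "average_rating": round(sum(ratings) / len(ratings), 2) if ratings else None,
        "with_owner_response": sum(1 for r in reviews if r.get("owner_response")),
        # "Anonymous" is a placeholder chosen here for reviews whose author is null
        "authors": [r.get("author") or "Anonymous" for r in reviews],
    }
```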

Cache Duration: 7 days per organizationId


🌐 All Endpoints

API Endpoints

| Endpoint | Method | Purpose | Cache | Parameters |
|----------|--------|---------|-------|------------|
| /api/scrape | GET | Instagram photos/reels | 24h | username, count, type |
| /api/reviews | GET | Google Reviews | 7d | organizationId, count |
| /api/search-business | GET | Find organizationId helper | - | query |
| /health | GET | Health check | - | none |

Documentation Endpoints

| Endpoint | Method | Purpose |
|----------|--------|---------|
| / | GET | Main documentation (HTML) |
| /docs | GET | Main documentation (HTML) |
| /docs/shopify | GET | Shopify integration guide (HTML) |
| /docs/reviews | GET | Google Reviews guide (HTML) |

Media Endpoints

| Endpoint | Method | Purpose | Cache |
|----------|--------|---------|-------|
| /media/{filename}.webp | GET | Serve optimized images | 12h |
| /feeds | GET | List all cached feeds | - |

🔐 Authentication

No authentication required for API consumers (Shopify apps, websites).

All authentication is handled server-side with RapidAPI. The API uses:
- Instagram120 API (RapidAPI)
- Google Maps API Unofficial (RapidAPI)
- A single RapidAPI key configured on the server

For developers/admins:
- The server-side RapidAPI key is stored in the .env file
- The same key works for both the Instagram and Google Reviews APIs


💾 Caching Strategy

Instagram Caching (24-Hour)

Why 24 hours?
- Instagram users post frequently (daily or several times per day)
- Fresh content is important for engagement
- Keeps data current without excessive API calls

How it works:
1. First request: fetch from the Instagram API → save to cache → return data
2. Subsequent requests (< 24h): serve from cache (instant, no API quota)
3. After 24 hours: cache expires → next request fetches fresh data

Cache Key: username + type (photos or reels)

Example:
- Request @nike photos at 10 AM → Fresh scrape (1 API request)
- Request @nike photos at 2 PM → Cached (0 API requests)
- Request @nike photos at 10 AM next day → Fresh scrape (1 API request)
- Request @nike reels at 11 AM → Fresh scrape (different type, 1 API request)
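
The expiry behavior described above can be modeled in a few lines. This is an in-memory sketch for illustration only; the service itself stores its cache as files, and the TTLCache class and its method names are assumptions. The same class models the 7-day Google Reviews cache when constructed with timedelta(days=7).

```python
from datetime import datetime, timedelta


class TTLCache:
    """Minimal time-to-live cache keyed by, for example, (username, type)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._entries = {}  # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch, now):
        """Return (value, cached). Calls fetch() only when the entry is missing or expired."""
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1], True
        value = fetch()
        self._entries[key] = (now, value)
        return value, False


cache = TTLCache(timedelta(hours=24))
ten_am = datetime(2025, 10, 9, 10, 0)
fetch_posts = lambda: ["post"]  # stand-in for the upstream API call

print(cache.get_or_fetch(("nike", "photos"), fetch_posts, ten_am)[1])
print(cache.get_or_fetch(("nike", "photos"), fetch_posts, ten_am + timedelta(hours=4))[1])
```

Passing the current time in as an argument keeps the sketch deterministic, which makes the timeline in the example easy to replay.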

Google Reviews Caching (7-Day)

Why 7 days?
- Reviews change slowly (most businesses receive new reviews weekly, not daily)
- A longer cache means better API quota management
- Review data is less time-sensitive than social media content

How it works:
1. First request: fetch from the Google API → save to cache → return data
2. Subsequent requests (< 7 days): serve from cache (instant, no API quota)
3. After 7 days: cache expires → next request fetches fresh data

Cache Key: organizationId

Example:
- Request business reviews Monday → Fresh scrape (1 API request)
- Request same business Tuesday-Sunday → Cached (0 API requests)
- Request same business the following Monday, before the 7-day mark → Cached (0 API requests)
- Request same business 8 days later → Fresh scrape (1 API request)

Cache Location

All cached data stored in: /app/data/

Instagram:
- ig_user_{username}.json (all posts)
- ig_user_{username}_photos.json (photos only)
- ig_user_{username}_reels.json (reels only)

Google Reviews:
- google_reviews_{safe_org_id}.json

Images:
- media/{username}_{post_id}.webp


🚦 Rate Limits

RapidAPI Free Tier Limits

Instagram120 API:
- 35 requests per day (free tier)
- Each unique username + type = 1 request per 24 hours
- Cached requests = 0 API quota

Google Maps API (Unofficial):
- Varies by plan (check the RapidAPI dashboard)
- Each unique organizationId = 1 request per 7 days
- Cached requests = 0 API quota

Optimization Strategy

With caching, you can serve:
- ✅ Unlimited requests from Shopify/websites
- ✅ Only 1 API call per username + type per day (Instagram)
- ✅ Only 1 API call per business per week (Google Reviews)

Example Scenario:
- 10 Shopify stores use the API
- Each store requests photos for 3 Instagram users + reviews for 1 Google business
- Each store makes 100 requests per day (1,000 in total)

API Usage:
- Instagram: 30 unique usernames × 1 type = 30 API requests per day (within the 35 limit ✅)
- Google: 10 unique businesses = 10 API requests (for the entire week ✅)
- All other requests (about 960 on the first day, 970 on later days) = 0 API requests (cached ✅)
- Note: requesting both photos and reels for all 30 usernames would need 60 requests per day, which exceeds the 35-request free tier
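
The quota arithmetic for a scenario like this can be worked through in code. The sketch below assumes the worst case for a single day, where every cache entry is cold, and counts Instagram entries per username + type pair because that is the cache key. The function name and return shape are illustrative.

```python
def daily_quota_report(instagram_keys, google_businesses, total_requests, instagram_limit=35):
    """Estimate upstream API calls for one day, assuming every cache entry starts cold.

    instagram_keys: number of distinct username + type pairs requested that day.
    instagram_limit: the free-tier daily figure quoted in this document.
    """
    upstream = instagram_keys + google_businesses
    return {
        "instagram_calls": instagram_keys,
        "google_calls": google_businesses,
        "served_from_cache": total_requests - upstream,
        "within_instagram_limit": instagram_keys <= instagram_limit,
    }


# 10 stores x 3 usernames (photos only), 10 businesses, 10 x 100 requests per day
print(daily_quota_report(30, 10, 1000))
# Same stores requesting both photos and reels: 60 username + type pairs
print(daily_quota_report(60, 10, 1000))
```

With 30 pairs, 960 of the 1,000 requests are served from cache on a cold day; with 60 pairs the Instagram free tier is exceeded.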


❌ Error Handling

Common Errors

400 Bad Request - Missing Parameter

Instagram:

{
  "error": "Missing parameter",
  "message": "The \"username\" parameter is required",
  "example": "/api/scrape?username=pascuccicoffee&count=7&type=photos"
}

Google Reviews:

{
  "error": "Missing parameter",
  "message": "The \"organizationId\" parameter is required",
  "example": "/api/reviews?organizationId=0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d&count=10",
  "help": "Find organizationId from Google Maps URL or use Google Maps Place ID"
}

400 Bad Request - Invalid Type

{
  "error": "Invalid type",
  "message": "Type must be \"photos\" or \"reels\"",
  "example": "/api/scrape?username=pascuccicoffee&count=7&type=photos"
}

400 Bad Request - Invalid Count

{
  "error": "Invalid count",
  "message": "Count must be between 1 and 50",
  "provided": 100
}

404 Not Found - Username Not Found

{
  "error": "No posts found",
  "message": "Could not fetch posts for @invaliduser123. Username may not exist or may be private.",
  "username": "invaliduser123"
}

404 Not Found - Organization Not Found

{
  "error": "No reviews found",
  "message": "Could not fetch reviews for organization 0xinvalidid",
  "organizationId": "0xinvalidid"
}

500 Internal Server Error

{
  "error": "Internal server error",
  "message": "Detailed error message here..."
}

Error Handling Best Practices

async function fetchInstagramPosts(username, count = 7, type = 'photos', retriesLeft = 1) {
  // URLSearchParams percent-encodes each value, so unusual input cannot break the query string
  const params = new URLSearchParams({ username, count: String(count), type });

  try {
    const response = await fetch(`https://scraper.capula.co/api/scrape?${params}`);

    const data = await response.json();

    // Check for errors
    if (!response.ok) {
      console.error(`Error ${response.status}:`, data.error);
      console.error('Message:', data.message);

      // Handle specific error cases
      switch (response.status) {
        case 400:
          // Bad request - show user-friendly message
          alert(`Invalid request: ${data.message}`);
          break;
        case 404:
          // Not found - show fallback content
          console.log('Username not found, showing fallback');
          break;
        case 500:
          // Server error - retry after 5 minutes, but only while retries remain
          if (retriesLeft > 0) {
            console.log('Server error, will retry in 5 minutes');
            setTimeout(() => fetchInstagramPosts(username, count, type, retriesLeft - 1), 300000);
          }
          break;
      }

      // The current call returns null; a scheduled retry runs independently
      return null;
    }

    // Success - use the data
    console.log(`Got ${data.count} ${type} from @${username}`);
    console.log(`Cached: ${data.cached}`);

    return data.data;

  } catch (error) {
    // Covers network failures and responses that are not valid JSON
    console.error('Network error:', error);
    return null;
  }
}
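
For server-side callers, the same pattern can be sketched in Python using only the standard library. This is an illustration under stated assumptions rather than an official client: fetch_json, the set of retryable status codes, and the backoff values are all choices made here. The opener and sleep parameters exist so the function can be exercised without network access.

```python
import json
import time
import urllib.error
import urllib.request

# Assumption: only server-side failures are worth retrying; 400 and 404 will not change
RETRYABLE_STATUSES = {500, 502, 503}


def fetch_json(url, retries=2, backoff_seconds=1.0,
               opener=urllib.request.urlopen, sleep=time.sleep):
    """Return (status, payload), retrying a bounded number of times on server errors."""
    for attempt in range(retries + 1):
        try:
            with opener(url, timeout=15) as response:
                return response.status, json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError as err:
            body = err.read().decode("utf-8", errors="replace")
            try:
                payload = json.loads(body)
            except ValueError:
                # A proxy error page, for example, is not JSON
                payload = {"error": "Non-JSON error body", "message": body[:200]}
            if err.code in RETRYABLE_STATUSES and attempt < retries:
                sleep(backoff_seconds * 2 ** attempt)  # 1s, 2s, 4s, ...
                continue
            return err.code, payload
```

Network-level failures (urllib.error.URLError) are deliberately left to propagate so the caller can decide how to handle an unreachable host.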

💻 Code Examples

JavaScript (Fetch API)

Instagram Photos

async function getInstagramPhotos(username, count = 7) {
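  // encodeURIComponent keeps unexpected characters from breaking the query string
  const url = `https://scraper.capula.co/api/scrape?username=${encodeURIComponent(username)}&count=${count}&type=photos`;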

  const response = await fetch(url);
  const data = await response.json();

  if (response.ok) {
    data.data.forEach(post => {
      console.log(`Image: ${post.media_url}`);
      console.log(`Caption: ${post.caption}`);
      console.log(`Link: ${post.permalink}`);
    });
    return data.data;
  }

  return null;
}

// Usage
getInstagramPhotos('pascuccicoffee', 7);

Instagram Reels

async function getInstagramReels(username, count = 5) {
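  // encodeURIComponent keeps unexpected characters from breaking the query string
  const url = `https://scraper.capula.co/api/scrape?username=${encodeURIComponent(username)}&count=${count}&type=reels`;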

  const response = await fetch(url);
  const data = await response.json();

  if (response.ok) {
    data.data.forEach(reel => {
      console.log(`Reel: ${reel.media_url}`);
      console.log(`Caption: ${reel.caption}`);
    });
    return data.data;
  }

  return null;
}

// Usage
getInstagramReels('pascuccicoffee', 5);

Google Reviews

async function getGoogleReviews(organizationId, count = 10) {
  // encodeURIComponent encodes the colon in the ID as %3A
  const url = `https://scraper.capula.co/api/reviews?organizationId=${encodeURIComponent(organizationId)}&count=${count}`;

  const response = await fetch(url);
  const data = await response.json();

  if (response.ok) {
    data.data.forEach(review => {
      console.log(`⭐ ${review.rating}/5`);
      console.log(`💬 ${review.comment}`);
      if (review.owner_response) {
        console.log(`💼 Owner: ${review.owner_response}`);
      }
    });
    return data.data;
  }

  return null;
}

// Usage
getGoogleReviews('0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d', 10);

Python (Requests)

import requests

def get_instagram_photos(username, count=7):
    url = 'https://scraper.capula.co/api/scrape'
    params = {
        'username': username,
        'count': count,
        'type': 'photos'
    }

    # A timeout prevents the call from hanging; fresh scrapes can take 5-10 seconds
    response = requests.get(url, params=params, timeout=30)

    if response.status_code == 200:
        data = response.json()
        for post in data['data']:
            print(f"Image: {post['media_url']}")
            print(f"Caption: {post['caption']}")
        return data['data']

    return None

def get_google_reviews(organization_id, count=10):
    url = 'https://scraper.capula.co/api/reviews'
    params = {
        'organizationId': organization_id,
        'count': count
    }

    # A timeout prevents the call from hanging; fresh scrapes can take 5-10 seconds
    response = requests.get(url, params=params, timeout=30)

    if response.status_code == 200:
        data = response.json()
        for review in data['data']:
            print(f"⭐ {review['rating']}/5")
            print(f"💬 {review['comment']}")
        return data['data']

    return None

# Usage
get_instagram_photos('pascuccicoffee', 7)
get_google_reviews('0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d', 10)

PHP

<?php

function getInstagramPhotos($username, $count = 7) {
    $url = 'https://scraper.capula.co/api/scrape?' . http_build_query([
        'username' => $username,
        'count' => $count,
        'type' => 'photos'
    ]);

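    $response = file_get_contents($url);
    if ($response === false) {
        // file_get_contents returns false on network failures and HTTP error statuses
        return null;
    }
    $data = json_decode($response, true);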

    foreach ($data['data'] as $post) {
        echo "Image: {$post['media_url']}\n";
        echo "Caption: {$post['caption']}\n";
    }

    return $data['data'];
}

function getGoogleReviews($organizationId, $count = 10) {
    $url = 'https://scraper.capula.co/api/reviews?' . http_build_query([
        'organizationId' => $organizationId,
        'count' => $count
    ]);

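    $response = file_get_contents($url);
    if ($response === false) {
        // file_get_contents returns false on network failures and HTTP error statuses
        return null;
    }
    $data = json_decode($response, true);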

    foreach ($data['data'] as $review) {
        echo "⭐ {$review['rating']}/5\n";
        echo "💬 {$review['comment']}\n";
    }

    return $data['data'];
}

// Usage
getInstagramPhotos('pascuccicoffee', 7);
getGoogleReviews('0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d', 10);
?>

cURL (Command Line)

# Instagram Photos
curl "https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=7&type=photos"

# Instagram Reels
curl "https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=5&type=reels"

# Google Reviews
curl "https://scraper.capula.co/api/reviews?organizationId=0x89de0b0b3cdbe1d3%3A0x27519164cd8d3b5d&count=10"

# With jq for pretty JSON
curl -s "https://scraper.capula.co/api/reviews?organizationId=0x89de0b0b3cdbe1d3%3A0x27519164cd8d3b5d&count=10" | jq

🚀 Deployment

Current Setup

Container: instagram_scraper
Port: 5050
Domain: https://scraper.capula.co
Reverse Proxy: Nginx Proxy Manager
Networks: instagram_scraper_default, my_shared_proxy_network
Restart Policy: always (auto-restart on crash)

Docker Compose

version: '3.8'

services:
  instagram-scraper:
    build: .
    container_name: instagram_scraper
    ports:
      - "5050:5050"
    networks:
      - default
      - my_shared_proxy_network
    environment:
      - RAPIDAPI_KEY=${RAPIDAPI_KEY}
      - INSTAGRAM_USERNAME_TARGET=${INSTAGRAM_USERNAME_TARGET:-pascuccicoffee}
      - POST_COUNT=${POST_COUNT:-7}
      - BASE_URL=${BASE_URL:-https://scraper.capula.co}
    volumes:
      - ./data:/app/data
      - ./logs:/app/logs
    restart: always
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:5050/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

networks:
  my_shared_proxy_network:
    external: true

Environment Variables

Required:
- RAPIDAPI_KEY - Your RapidAPI key (works for both APIs)

Optional:
- INSTAGRAM_USERNAME_TARGET - Default Instagram username (for cron)
- POST_COUNT - Default number of posts (for cron)
- BASE_URL - Your domain for absolute media URLs
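
As an illustration of how these variables might be consumed, the sketch below reads them with the same defaults the Docker Compose file uses. It is not the service's actual startup code: load_settings and the returned dictionary keys are hypothetical, and only the variable names and default values come from this document.

```python
import os


def load_settings(env=None):
    """Read the documented environment variables, failing fast if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("RAPIDAPI_KEY", "").strip()
    if not api_key:
        raise RuntimeError("RAPIDAPI_KEY is required")
    return {
        "rapidapi_key": api_key,
        # Defaults mirror the ${VAR:-default} values in the Compose file
        "instagram_username_target": env.get("INSTAGRAM_USERNAME_TARGET", "pascuccicoffee"),
        "post_count": int(env.get("POST_COUNT", "7")),
        # Strip a trailing slash so media URLs can be joined as BASE_URL + "/media/..."
        "base_url": env.get("BASE_URL", "https://scraper.capula.co").rstrip("/"),
    }
```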

Nginx Proxy Manager Setup

Proxy Host Configuration:

Domain Names: scraper.capula.co
Scheme: http
Forward Hostname/IP: instagram_scraper
Forward Port: 5050
Block Common Exploits: Yes
Websockets Support: No

SSL:
- Force SSL: Yes
- HTTP/2 Support: Yes
- HSTS Enabled: Yes
- SSL Certificate: Let's Encrypt

Commands

# Start services
docker-compose up -d

# View logs
docker-compose logs -f

# Restart
docker-compose restart

# Rebuild (after code changes)
docker-compose down && docker-compose up -d --build

# Check health
curl https://scraper.capula.co/health

# Manual scrape (Instagram)
docker exec instagram_scraper python scraper.py

# View cached data
ls -lh /home/docker/instagram_scraper/data/

🔧 Troubleshooting

Instagram Issues

"Username not found" error:
- Verify the username exists on Instagram
- Check that the account is public (private accounts don't work)
- Try the username directly: https://instagram.com/{username}

No posts returned:
- The account may have no posts
- All posts might be videos (if requesting type=photos)
- Check the response: "count": 0 indicates no matching content

Slow response times:
- First request: 5-10 seconds (fresh scrape from Instagram)
- Cached requests: <100ms
- "cached": false in the response means the data was fetched fresh

Google Reviews Issues

"organizationId not found" error:
- Verify the organization ID format: 0xHEX:0xHEX
- Check that the Google Maps URL contains the ID
- The business must have a Google Business Profile (formerly Google My Business)

No reviews returned:
- The business may have no reviews
- Reviews may be disabled
- Check that the business exists with a Google Maps search

Wrong reviews shown:
- Verify the organizationId is correct
- Check that you copied the full ID (including both hex parts)
- URL-encode the ID if your client does not do so automatically (the colon becomes %3A)

API/Server Issues

502 Bad Gateway:
- Container not running: docker ps | grep instagram
- Not on proxy network: docker network connect my_shared_proxy_network instagram_scraper
- Check Nginx Proxy Manager settings

503 Service Unavailable:
- Container unhealthy: docker logs instagram_scraper
- Restart: docker-compose restart

SSL Certificate Error:
- Let's Encrypt not configured
- Force SSL not enabled
- Check Nginx Proxy Manager SSL settings


📊 Monitoring

Health Check

curl https://scraper.capula.co/health

Response:

{
  "status": "healthy",
  "timestamp": "2025-10-09T01:24:27Z"
}

Cache Status

Instagram (check cached field):

curl "https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=1&type=photos" | jq '{cached, cache_expires_in_hours}'

Google Reviews (check cached field):

curl "https://scraper.capula.co/api/reviews?organizationId=0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d&count=1" | jq '{cached, cache_expires_in_days}'

List All Cached Feeds

curl https://scraper.capula.co/feeds | jq

📚 Additional Documentation

Detailed Guides:
- Shopify Integration Guide (https://scraper.capula.co/docs/shopify) - Complete Shopify app integration
- Google Reviews Guide (https://scraper.capula.co/docs/reviews) - Detailed Google Reviews documentation
- Local Files - Markdown documentation files

API Reference:
- Health: GET /health
- Instagram: GET /api/scrape
- Google Reviews: GET /api/reviews
- Business Search Helper: GET /api/search-business


🎯 Use Cases

E-Commerce Stores

Shopify Apps

Marketing Websites


🔒 Security & Privacy


📈 Performance

Response Times:

| Scenario | Instagram | Google Reviews |
|----------|-----------|----------------|
| Cached (hit) | <100ms | <100ms |
| Fresh scrape | 5-10 sec | 5-10 sec |
| Not found | 3-5 sec | 3-5 sec |

Image Optimization:
- Format: WebP (modern, compressed)
- Max dimension: 1200px
- Quality: 85%
- Typical size: 50-200KB (vs 1-5MB original)
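
To make the 1200px limit concrete, here is how a resize that preserves aspect ratio could compute its target size. This is a worked arithmetic sketch, not the service's image pipeline; it assumes "max dimension" means the longer side and that smaller images are never upscaled.

```python
def fit_within(width, height, max_dimension=1200):
    """Scale (width, height) so the longer side is at most max_dimension."""
    longest = max(width, height)
    if longest <= max_dimension:
        return width, height  # assumption: smaller images are left untouched
    scale = max_dimension / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000))   # landscape photo
print(fit_within(1080, 1920))   # portrait reel cover
```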


🚧 Future Enhancements

Planned features:
- [ ] Twitter/X scraper
- [ ] TikTok scraper
- [ ] LinkedIn posts
- [ ] Yelp reviews
- [ ] Facebook reviews
- [ ] Webhook notifications
- [ ] Admin dashboard
- [ ] Analytics/usage stats


📞 Support

Live Service: https://scraper.capula.co
Health Check: https://scraper.capula.co/health
Documentation: https://scraper.capula.co/docs

For Issues:
1. Check container logs: docker logs instagram_scraper
2. Verify API status: /health endpoint
3. Review the RapidAPI dashboard for quota
4. Check network connectivity


📄 License

MIT


✨ Summary

Three simple calls. Zero configuration. Unlimited possibilities.

// Instagram Photos
fetch('https://scraper.capula.co/api/scrape?username=nike&count=10&type=photos')

// Instagram Reels
fetch('https://scraper.capula.co/api/scrape?username=nike&count=10&type=reels')

// Google Reviews
fetch('https://scraper.capula.co/api/reviews?organizationId=YOUR_ID&count=20')

Everything you need:
- ✅ Automatic caching (24h Instagram, 7d Google Reviews)
- ✅ Rate limit protection
- ✅ Error handling
- ✅ Optimized images
- ✅ CORS enabled
- ✅ SSL/HTTPS
- ✅ Auto-restart on crash
- ✅ Complete documentation

Start building today! 🚀