Instagram Scraper API - Shopify Integration Guide

🎯 Overview

This API allows your Shopify app to fetch:

- Instagram posts (photos or reels) from any public Instagram user, on demand
- Google Reviews for any business with a Google Maps profile

The API automatically caches data to stay within free API limits:

- Instagram: 24-hour cache (35 requests/day limit)
- Google Reviews: 7-day cache (500 requests/month limit)

Base URL: https://scraper.capula.co


📡 API Endpoints

1. Instagram Scraping: GET /api/scrape

Fetch Instagram photos or reels for any username.

2. Google Reviews: GET /api/reviews

Fetch Google Reviews for any business.

URL (Instagram scraping):

https://scraper.capula.co/api/scrape

📋 Request Parameters

| Parameter | Required | Type | Default | Description |
|-----------|----------|------|---------|-------------|
| username | Yes | string | - | Instagram username (without @) |
| count | No | integer | 7 | Number of posts to return (min: 1, max: 50) |
| type | No | string | photos | Type of media: photos or reels |
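
The parameter rules above can also be checked on the client before any request is sent. The following is a minimal sketch, not part of the API itself: the helper name `build_scrape_url` is hypothetical, and it only mirrors the limits listed in the table.

```python
from urllib.parse import urlencode

BASE_URL = "https://scraper.capula.co"
VALID_TYPES = ("photos", "reels")


def build_scrape_url(username: str, count: int = 7, media_type: str = "photos") -> str:
    """Validate the documented parameters and return the request URL.

    Hypothetical helper: it applies the same limits as the parameter table
    so that obviously invalid requests never leave the client.
    """
    username = username.strip().lstrip("@")  # the API expects no leading @
    if not username:
        raise ValueError("username is required")
    if not 1 <= count <= 50:
        raise ValueError("count must be between 1 and 50")
    if media_type not in VALID_TYPES:
        raise ValueError('type must be "photos" or "reels"')
    # urlencode percent-encodes each value so the query string stays valid
    query = urlencode({"username": username, "count": count, "type": media_type})
    return f"{BASE_URL}/api/scrape?{query}"


print(build_scrape_url("@pascuccicoffee"))
```

Validating locally avoids spending a round trip on a request that would only return a 400 error.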

📝 Examples

Example 1: Get 7 photos from @pascuccicoffee

GET https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=7&type=photos

Example 2: Get 10 reels from @pascuccicoffee

GET https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=10&type=reels

Example 3: Get 5 photos from @nike (default type is photos)

GET https://scraper.capula.co/api/scrape?username=nike&count=5

✅ Success Response

Status Code: 200 OK

Response Body:

{
  "data": [
    {
      "media_url": "https://scraper.capula.co/media/pascuccicoffee_123456.webp",
      "permalink": "https://www.instagram.com/p/ABC123/",
      "timestamp": "2025-10-08T09:30:00Z",
      "caption": "Fresh coffee this morning! #coffee",
      "media_type": "photo"
    },
    {
      "media_url": "https://scraper.capula.co/media/pascuccicoffee_789012.webp",
      "permalink": "https://www.instagram.com/p/DEF456/",
      "timestamp": "2025-10-07T14:20:00Z",
      "caption": "New blend available now",
      "media_type": "photo"
    }
  ],
  "count": 2,
  "username": "pascuccicoffee",
  "type": "photos",
  "cached": true,
  "scraped_at": "2025-10-09T00:36:13Z",
  "cache_expires_in_hours": 18.5
}

Response Fields:

| Field | Type | Description |
|-------|------|-------------|
| data | array | Array of Instagram posts |
| data[].media_url | string | Direct URL to the optimized image (WebP format) |
| data[].permalink | string | Link to the original Instagram post |
| data[].timestamp | string | When the post was published (ISO 8601 format) |
| data[].caption | string | Post caption text (truncated to 200 chars) |
| data[].media_type | string | Type of media: "photo" or "reel" |
| count | integer | Number of posts returned |
| username | string | Instagram username that was scraped |
| type | string | Type of media requested: "photos" or "reels" |
| cached | boolean | true if served from cache, false if freshly scraped |
| scraped_at | string | When the data was originally scraped |
| cache_expires_in_hours | float | Hours remaining until cache expires |
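
To show how these fields might be consumed, here is a small sketch that converts a response body into typed objects. The `InstagramPost` class and the `parse_posts` function are illustrative names, not part of the API, and the sample payload is a shortened copy of the response shown above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class InstagramPost:
    media_url: str
    permalink: str
    published_at: datetime
    caption: str
    media_type: str


def parse_iso8601(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_posts(payload: dict) -> List[InstagramPost]:
    """Turn the "data" array of a /api/scrape response into InstagramPost objects."""
    return [
        InstagramPost(
            media_url=item["media_url"],
            permalink=item["permalink"],
            published_at=parse_iso8601(item["timestamp"]),
            caption=item.get("caption") or "",  # assumption: caption may be empty
            media_type=item["media_type"],
        )
        for item in payload["data"]
    ]


sample = {
    "data": [
        {
            "media_url": "https://scraper.capula.co/media/pascuccicoffee_123456.webp",
            "permalink": "https://www.instagram.com/p/ABC123/",
            "timestamp": "2025-10-08T09:30:00Z",
            "caption": "Fresh coffee this morning! #coffee",
            "media_type": "photo",
        }
    ],
    "count": 1,
}

for post in parse_posts(sample):
    print(post.published_at.date(), post.media_type, post.permalink)
```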

❌ Error Responses

Missing Username (400 Bad Request)

{
  "error": "Missing parameter",
  "message": "The \"username\" parameter is required",
  "example": "/api/scrape?username=pascuccicoffee&count=7&type=photos"
}

Invalid Type (400 Bad Request)

{
  "error": "Invalid type",
  "message": "Type must be \"photos\" or \"reels\"",
  "example": "/api/scrape?username=pascuccicoffee&count=7&type=photos"
}

Invalid Count (400 Bad Request)

{
  "error": "Invalid count",
  "message": "Count must be between 1 and 50",
  "provided": 100
}

Username Not Found (404 Not Found)

{
  "error": "No posts found",
  "message": "Could not fetch posts for @invaliduser123. Username may not exist or may be private.",
  "username": "invaliduser123"
}

Server Error (500 Internal Server Error)

{
  "error": "Internal server error",
  "message": "Error details here..."
}
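
Because every error body shares the same `error` and `message` keys, a client can handle all of them in one place. The sketch below is one possible approach; `ScraperAPIError` and `check_response` are hypothetical names, and the retry hint is an assumption rather than documented API behavior.

```python
class ScraperAPIError(Exception):
    """Raised for any non-200 response from the scraper API."""

    def __init__(self, status: int, body: dict):
        self.status = status
        self.error = body.get("error", "Unknown error")
        self.message = body.get("message", "")
        super().__init__(f"{status} {self.error}: {self.message}")

    @property
    def is_client_error(self) -> bool:
        # 400 and 404 point at the request itself (bad parameter, unknown or
        # private username), so repeating it unchanged is unlikely to help.
        # Assumption: only 5xx responses are worth retrying.
        return 400 <= self.status < 500


def check_response(status: int, body: dict) -> dict:
    """Return the body on 200, otherwise raise ScraperAPIError."""
    if status == 200:
        return body
    raise ScraperAPIError(status, body)


try:
    check_response(404, {
        "error": "No posts found",
        "message": "Could not fetch posts for @invaliduser123.",
        "username": "invaliduser123",
    })
except ScraperAPIError as exc:
    print(exc.status, exc.error, "retry later" if not exc.is_client_error else "fix request")
```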

🔄 Caching Behavior (IMPORTANT!)

How Caching Works:

  1. First Request: When you request a username for the first time (or after 24 hours), the API will:
     - Fetch fresh data from Instagram
     - Save it to cache
     - Return "cached": false
     - Count this as 1 API request

  2. Subsequent Requests (within 24 hours): All requests for the same username will:
     - Serve data from cache (instant response)
     - Return "cached": true
     - NOT count as an API request

  3. After 24 Hours: The cache expires and the next request will fetch fresh data again.

Staying Within API Limits:

Example Scenario:

- Website A requests @pascuccicoffee at 9 AM → 1 API request (fresh scrape)
- Website A requests @pascuccicoffee at 3 PM → 0 API requests (cached)
- Website B requests @pascuccicoffee at 6 PM → 0 API requests (cached)
- Website A requests @nike at 10 AM → 1 API request (fresh scrape)
- Total API requests used: 2 out of 35
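
The accounting in this scenario can be reproduced with a short simulation. This is only a model of the behavior described above, assuming one cache entry per username with a 24-hour lifetime; `QuotaSimulator` is a hypothetical name and does not reflect the server's actual implementation.

```python
from datetime import datetime, timedelta

CACHE_TTL = timedelta(hours=24)


class QuotaSimulator:
    """Counts how many requests would trigger a fresh scrape."""

    def __init__(self):
        self.scraped_at = {}   # username -> time of the last fresh scrape
        self.api_requests = 0  # fresh scrapes counted against the daily limit

    def request(self, username: str, now: datetime) -> bool:
        """Return True if the request would be served from cache."""
        last = self.scraped_at.get(username)
        if last is not None and now - last < CACHE_TTL:
            return True
        self.scraped_at[username] = now
        self.api_requests += 1
        return False


sim = QuotaSimulator()
day = datetime(2025, 10, 9)
sim.request("pascuccicoffee", day.replace(hour=9))   # fresh scrape
sim.request("nike", day.replace(hour=10))            # fresh scrape
sim.request("pascuccicoffee", day.replace(hour=15))  # cached
sim.request("pascuccicoffee", day.replace(hour=18))  # cached
print(sim.api_requests)  # 2
```

Which website makes a request does not matter to the count. Only the username and the time since the last fresh scrape do.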


💻 Code Examples

JavaScript (Fetch API)

async function getInstagramPosts(username, count = 7, type = 'photos') {
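  // URLSearchParams percent-encodes each value so the query string stays valid
  const params = new URLSearchParams({ username, count, type });
  const url = `https://scraper.capula.co/api/scrape?${params}`;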

  try {
    const response = await fetch(url);
    const data = await response.json();

    if (response.ok) {
      console.log(`Got ${data.count} ${type} from @${username}`);
      console.log(`Cached: ${data.cached}`);
      console.log(`Cache expires in: ${data.cache_expires_in_hours} hours`);

      // Use the posts
      data.data.forEach(post => {
        console.log(`Image: ${post.media_url}`);
        console.log(`Caption: ${post.caption}`);
        console.log(`Link: ${post.permalink}`);
      });

      return data.data;
    } else {
      console.error('Error:', data.error, data.message);
      return null;
    }
  } catch (error) {
    console.error('Request failed:', error);
    return null;
  }
}

// Usage examples:
getInstagramPosts('pascuccicoffee', 7, 'photos');
getInstagramPosts('nike', 10, 'reels');

JavaScript (Axios)

const axios = require('axios');

async function getInstagramPosts(username, count = 7, type = 'photos') {
  try {
    const response = await axios.get('https://scraper.capula.co/api/scrape', {
      params: {
        username: username,
        count: count,
        type: type
      }
    });

    console.log(`Got ${response.data.count} ${type} from @${username}`);
    return response.data.data;

  } catch (error) {
    if (error.response) {
      // Server responded with error
      console.error('Error:', error.response.data.error);
      console.error('Message:', error.response.data.message);
    } else {
      // Network error
      console.error('Request failed:', error.message);
    }
    return null;
  }
}

// Usage:
getInstagramPosts('pascuccicoffee', 7, 'photos');

Python (Requests)

import requests

def get_instagram_posts(username, count=7, media_type='photos'):
    url = 'https://scraper.capula.co/api/scrape'
    params = {
        'username': username,
        'count': count,
        'type': media_type
    }

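    try:
        # Fresh scrapes can take 5-10 seconds, so allow a generous timeout
        response = requests.get(url, params=params, timeout=15)
        response.raise_for_status()

        data = response.json()
        print(f"Got {data['count']} {media_type} from @{username}")
        print(f"Cached: {data['cached']}")
        print(f"Cache expires in: {data['cache_expires_in_hours']} hours")

        for post in data['data']:
            print(f"Image: {post['media_url']}")
            print(f"Caption: {post['caption']}")

        return data['data']

    except requests.exceptions.HTTPError as e:
        # The API returns a JSON error body; fall back to the raw text if it is not JSON
        try:
            error_data = e.response.json()
        except ValueError:
            error_data = {'error': e.response.status_code, 'message': e.response.text}
        print(f"Error: {error_data.get('error')}")
        print(f"Message: {error_data.get('message')}")
        return None
    except Exception as e:
        # Network failure, timeout, or an unexpected response shape
        print(f"Request failed: {e}")
        return None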

# Usage:
get_instagram_posts('pascuccicoffee', 7, 'photos')

PHP

<?php

function getInstagramPosts($username, $count = 7, $type = 'photos') {
    $url = 'https://scraper.capula.co/api/scrape?' . http_build_query([
        'username' => $username,
        'count' => $count,
        'type' => $type
    ]);

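    // ignore_errors makes file_get_contents return the JSON error body on
    // 4xx/5xx responses instead of failing with a warning
    $context = stream_context_create([
        'http' => ['ignore_errors' => true, 'timeout' => 15]
    ]);
    $response = file_get_contents($url, false, $context);

    if ($response === false) {
        echo "Request failed\n";
        return null;
    }

    $data = json_decode($response, true);

    // Error responses carry an "error" key instead of "data"
    if (is_array($data) && !isset($data['error'])) {
        echo "Got {$data['count']} {$type} from @{$username}\n";
        echo "Cached: " . ($data['cached'] ? 'yes' : 'no') . "\n";

        foreach ($data['data'] as $post) {
            echo "Image: {$post['media_url']}\n";
            echo "Caption: {$post['caption']}\n";
        }

        return $data['data'];
    }

    if (is_array($data)) {
        echo "Error: {$data['error']} - {$data['message']}\n";
    }

    return null;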
}

// Usage:
getInstagramPosts('pascuccicoffee', 7, 'photos');
?>

cURL (Command Line)

# Get 7 photos from @pascuccicoffee
curl "https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=7&type=photos"

# Get 10 reels from @nike
curl "https://scraper.capula.co/api/scrape?username=nike&count=10&type=reels"

🎨 Displaying Images in Your Shopify App

The media_url field contains a direct link to an optimized WebP image. You can use it directly in HTML:

<img src="https://scraper.capula.co/media/pascuccicoffee_123456.webp" alt="Instagram post">

Image Details:

- Format: WebP (modern, compressed format)
- Max Size: 1200px (maintains aspect ratio)
- Quality: 85% (high quality, optimized file size)
- Cache: Images are served with 12-hour cache headers
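
When the markup is generated from API data rather than typed by hand, captions and URLs should be escaped before they are placed into HTML. The sketch below shows one way to do that; `render_post_html` is a hypothetical helper, and the `loading="lazy"` attribute is a suggestion rather than something the API requires.

```python
from html import escape


def render_post_html(post: dict) -> str:
    """Build an <a><img></a> snippet for one post from the "data" array."""
    # escape(..., quote=True) also escapes double quotes, which matters
    # because every value ends up inside a quoted HTML attribute.
    src = escape(post["media_url"], quote=True)
    href = escape(post["permalink"], quote=True)
    alt = escape(post.get("caption") or "Instagram post", quote=True)
    return f'<a href="{href}"><img src="{src}" alt="{alt}" loading="lazy"></a>'


post = {
    "media_url": "https://scraper.capula.co/media/pascuccicoffee_123456.webp",
    "permalink": "https://www.instagram.com/p/ABC123/",
    "caption": 'Fresh "coffee" this morning!',
}
print(render_post_html(post))
```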


🔧 Testing the API

Test Endpoint

https://scraper.capula.co/health

Response:

{
  "status": "healthy",
  "timestamp": "2025-10-09T00:36:34Z"
}

Test Scraping

https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=3&type=photos

📊 Monitoring Your Usage

Check Cache Status

The API response includes helpful fields to monitor cache usage:

{
  "cached": true,
  "scraped_at": "2025-10-09T00:36:13Z",
  "cache_expires_in_hours": 18.5
}

Best Practices for Shopify Apps

  1. Don't make repeated requests for the same username within 24 hours
  2. Store the response in your Shopify app's database/cache
  3. Check the cached field to see whether a request used API quota
  4. Monitor cache_expires_in_hours to schedule refreshes efficiently
  5. Handle errors gracefully (username not found, rate limits, etc.)
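
Practices 1, 2 and 4 can be combined in a small local cache that keeps each response for as long as the API reports it will remain cached. This is a minimal in-memory sketch under stated assumptions: `LocalPostCache` is a hypothetical name, the cache key format is the caller's choice, and a real Shopify app would more likely persist entries in its database.

```python
import time


class LocalPostCache:
    """Keeps API responses until the server-side cache is due to expire."""

    def __init__(self, clock=time.time):
        self._entries = {}   # key -> (expiry timestamp, payload)
        self._clock = clock  # injectable so the cache can be tested without waiting

    def store(self, key: str, payload: dict) -> None:
        # Assumption: fall back to the documented 24-hour lifetime
        # if the field is missing from the payload.
        ttl_seconds = float(payload.get("cache_expires_in_hours", 24)) * 3600
        self._entries[key] = (self._clock() + ttl_seconds, payload)

    def get(self, key: str):
        """Return the stored payload, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return payload


cache = LocalPostCache()
cache.store("pascuccicoffee:photos", {"cached": True, "cache_expires_in_hours": 18.5, "data": []})
print(cache.get("pascuccicoffee:photos") is not None)  # True
```

Checking this cache before calling the API means the app only sends a request once the stored copy has expired.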

⚠️ Important Notes

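1. Rate Limiting: The API limits allow 35 fresh Instagram scrapes per day and 500 Google Reviews fetches per month. Requests served from cache do not count toward these limits.

2. Data Freshness: Instagram data can be up to 24 hours old and Google Reviews up to 7 days old. Check scraped_at to see when the data was fetched.

3. Private Accounts: Only public Instagram accounts can be scraped. A private or nonexistent username returns a 404 "No posts found" error.

4. Media Types: The type parameter accepts photos (the default) or reels.

5. Count Limits: count must be between 1 and 50. Values outside that range return a 400 "Invalid count" error.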


🆘 Troubleshooting

"Username not found" error

Slow response times

"Configuration error"

Images not loading


📞 Support

API Status: https://scraper.capula.co/health
Documentation: https://scraper.capula.co/docs
Base URL: https://scraper.capula.co

For technical support or API upgrades, please contact your system administrator.


🚀 Quick Start Checklist


📈 Response Time Expectations

| Scenario | Response Time |
|----------|---------------|
| Cached data (within 24h) | <100ms |
| Fresh scrape (first time) | 5-10 seconds |
| Username not found | ~3-5 seconds |


⭐ Google Reviews Integration

Endpoint: GET /api/reviews

Fetch Google Reviews for any business with a Google Maps profile.

URL:

https://scraper.capula.co/api/reviews

Request Parameters

| Parameter | Required | Type | Default | Description |
|-----------|----------|------|---------|-------------|
| organizationId | Yes | string | - | Google Maps organization ID |
| count | No | integer | 10 | Number of reviews (min: 1, max: 50) |

How to Find organizationId

The organizationId is extracted from the Google Maps URL for your business:

Step-by-Step Guide:

  1. Go to Google Maps
  2. Search for your business name (e.g., "Empire State Plumbing NY")
  3. Click on the business to open its profile
  4. Look at the URL in the address bar
  5. Find the long code that follows !1s in the URL (it ends at the next ! character)
  6. That's your organizationId!

Example URL:

https://www.google.com/maps/place/...data=!3m1!4b1!4m6!3m5!1s0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d!...
                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                             This is the organizationId

The organizationId in this example: 0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d
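
The manual steps above can also be automated with a regular expression. This is a sketch based only on the URL shape shown in the example: it assumes the ID is a pair of hexadecimal values joined by a colon directly after `!1s`, and `extract_organization_id` is a hypothetical helper name.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Assumed shape: "!1s" followed by two 0x-prefixed hex values joined by ":"
ORG_ID_PATTERN = re.compile(r"!1s(0x[0-9a-fA-F]+:0x[0-9a-fA-F]+)")


def extract_organization_id(maps_url: str) -> Optional[str]:
    """Return the organizationId found in a Google Maps URL, or None."""
    # Copied URLs sometimes have the colon percent-encoded as %3A
    match = ORG_ID_PATTERN.search(unquote(maps_url))
    return match.group(1) if match else None


url = (
    "https://www.google.com/maps/place/...data=!3m1!4b1!4m6!3m5"
    "!1s0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d!..."
)
print(extract_organization_id(url))  # 0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d
```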

Example Request

GET https://scraper.capula.co/api/reviews?organizationId=0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d&count=10

Success Response (200 OK)

{
  "data": [
    {
      "rating": 5,
      "comment": "Excellent service! Very professional and quick response.",
      "date": "2025-10-01T14:30:00Z",
      "author": "John Smith",
      "photos": [],
      "owner_response": "Thank you for your kind words!"
    },
    {
      "rating": 4,
      "comment": "Good quality work, would recommend.",
      "date": "2025-09-28T10:15:00Z",
      "author": "Jane Doe",
      "photos": ["https://...photo1.jpg"],
      "owner_response": null
    }
  ],
  "count": 2,
  "organizationId": "0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d",
  "cached": true,
  "scraped_at": "2025-10-09T01:24:27Z",
  "cache_expires_in_days": 5.2
}

Response Fields

| Field | Type | Description |
|-------|------|-------------|
| data | array | Array of Google reviews |
| data[].rating | integer | Star rating (1-5) |
| data[].comment | string | Review text |
| data[].date | string | When the review was posted (ISO 8601) |
| data[].author | string | Reviewer's name |
| data[].photos | array | URLs of review photos (if any) |
| data[].owner_response | string or null | Business owner's response |
| count | integer | Number of reviews returned |
| organizationId | string | The organization ID that was queried |
| cached | boolean | true if served from cache, false if freshly scraped |
| scraped_at | string | When the data was originally scraped |
| cache_expires_in_days | float | Days remaining until cache expires |
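
As an illustration of working with these fields, the sketch below computes a few summary figures that a storefront widget might display. `summarize_reviews` is a hypothetical helper, and the sample data is a trimmed copy of the response shown above.

```python
def summarize_reviews(payload: dict) -> dict:
    """Compute simple aggregate figures from a /api/reviews response body."""
    reviews = payload["data"]
    ratings = [review["rating"] for review in reviews]
    return {
        "count": len(reviews),
        # None rather than 0 when there are no reviews, so "no data" stays
        # distinguishable from a genuinely low score
        "average_rating": round(sum(ratings) / len(ratings), 2) if ratings else None,
        "with_photos": sum(1 for review in reviews if review["photos"]),
        "with_owner_response": sum(
            1 for review in reviews if review["owner_response"] is not None
        ),
    }


sample_reviews = {
    "data": [
        {"rating": 5, "photos": [], "owner_response": "Thank you for your kind words!"},
        {"rating": 4, "photos": ["https://...photo1.jpg"], "owner_response": None},
    ]
}
print(summarize_reviews(sample_reviews))
```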

Caching Behavior (Google Reviews)

  1. First Request: Fetches from Google Maps API, caches for 7 days → Uses 1 API request
  2. Subsequent Requests (within 7 days): Served from cache → Uses 0 API requests
  3. After 7 Days: Cache expires, next request fetches fresh data

Why 7 days? Reviews don't change as frequently as Instagram posts, so we cache longer to save API quota.
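
The `cache_expires_in_days` field can be used to schedule the next refresh so that it falls just after the cache expires. The following sketch assumes the value is measured from the moment the response is received; `next_refresh_time` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_refresh_time(payload: dict, now: Optional[datetime] = None) -> datetime:
    """Return the earliest time at which a new request would fetch fresh reviews."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption: the remaining lifetime counts from "now", i.e. from the
    # moment the response was received.
    return now + timedelta(days=payload["cache_expires_in_days"])


received_at = datetime(2025, 10, 10, tzinfo=timezone.utc)
print(next_refresh_time({"cache_expires_in_days": 5.2}, now=received_at))
```

Requesting the same business again before that time returns the same cached data, so a scheduled job can skip it until then.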

Code Example: JavaScript

async function getGoogleReviews(organizationId, count = 10) {
  // URLSearchParams percent-encodes the ":" in the organization ID
  const params = new URLSearchParams({ organizationId, count });
  const url = `https://scraper.capula.co/api/reviews?${params}`;

  try {
    const response = await fetch(url);
    const data = await response.json();

    if (response.ok) {
      console.log(`Got ${data.count} reviews`);
      console.log(`Cached: ${data.cached}`);
      console.log(`Cache expires in: ${data.cache_expires_in_days} days`);

      data.data.forEach(review => {
        console.log(`⭐ ${review.rating}/5 - ${review.author}`);
        console.log(`Comment: ${review.comment}`);
        if (review.owner_response) {
          console.log(`Owner replied: ${review.owner_response}`);
        }
      });

      return data.data;
    } else {
      console.error('Error:', data.error, data.message);
      return null;
    }
  } catch (error) {
    console.error('Request failed:', error);
    return null;
  }
}

// Usage:
getGoogleReviews('0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d', 10);

Error Responses

Missing organizationId (400 Bad Request):

{
  "error": "Missing parameter",
  "message": "The \"organizationId\" parameter is required",
  "example": "/api/reviews?organizationId=0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d&count=10",
  "help": "Find organizationId from Google Maps URL or use Google Maps Place ID"
}

Invalid Count (400 Bad Request):

{
  "error": "Invalid count",
  "message": "Count must be between 1 and 50",
  "provided": 100
}

No Reviews Found (404 Not Found):

{
  "error": "No reviews found",
  "message": "Could not fetch reviews for organization 0xinvalid",
  "organizationId": "0xinvalid"
}

Best Practices for Reviews

  1. Find organizationId once and store it in your database
  2. Don't request the same business more than once every 7 days
  3. Cache the response in your Shopify app
  4. Check cached field to monitor API quota usage
  5. Handle errors gracefully (invalid ID, no reviews, etc.)

✨ Summary

Two main endpoints for your Shopify app:

Instagram:

GET https://scraper.capula.co/api/scrape?username={user}&count={num}&type={photos|reels}

Google Reviews:

GET https://scraper.capula.co/api/reviews?organizationId={id}&count={num}

The API handles:

- ✅ Fetching data from Instagram and Google
- ✅ Smart caching (24h for Instagram, 7d for Reviews)
- ✅ Serving optimized images
- ✅ Rate limit management
- ✅ Error handling

Your Shopify app just makes simple HTTP GET requests and receives ready-to-use data.


📚 Additional Documentation