Instagram Scraper API - Shopify Integration Guide

🎯 Overview

This API allows your Shopify app to fetch:
- Instagram posts (photos or reels) from any public Instagram user, on demand
- Google Reviews for any business with a Google Maps profile

The API automatically caches data to stay within free API limits:
- Instagram: 24-hour cache (35 requests/day limit)
- Google Reviews: 7-day cache (500 requests/month limit)

Base URL: https://scraper.capula.co

📡 API Endpoints

1. Instagram Scraping: `GET /api/scrape`

Fetch Instagram photos or reels for any username.

2. Google Reviews: `GET /api/reviews`

Fetch Google Reviews for any business.

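Instagram endpoint URL (the Google Reviews endpoint is documented in the Google Reviews Integration section):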

https://scraper.capula.co/api/scrape

📋 Request Parameters

Parameter	Required	Type	Default	Description
`username`	YES	string	-	Instagram username (without @)
`count`	No	integer	7	Number of posts to return (min: 1, max: 50)
`type`	No	string	`photos`	Type of media: `photos` or `reels`
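
The limits in the table can also be checked on the client before a request is sent, so that invalid values fail locally instead of producing a 400 response. The following is a minimal sketch, not part of the API itself: the helper name `build_scrape_url` is hypothetical, and stripping a leading "@" is a convenience assumed here rather than documented server behavior.

```python
from urllib.parse import urlencode

BASE_URL = "https://scraper.capula.co/api/scrape"
VALID_TYPES = ("photos", "reels")


def build_scrape_url(username, count=7, media_type="photos"):
    """Validate parameters locally and return the full request URL."""
    username = username.strip().lstrip("@")  # the API expects the name without "@"
    if not username:
        raise ValueError("username is required")
    if not 1 <= count <= 50:
        raise ValueError("count must be between 1 and 50")
    if media_type not in VALID_TYPES:
        raise ValueError('type must be "photos" or "reels"')
    # urlencode percent-encodes characters that are unsafe in a query string
    return BASE_URL + "?" + urlencode(
        {"username": username, "count": count, "type": media_type}
    )


print(build_scrape_url("@pascuccicoffee"))
```

With the defaults, this prints the same URL as Example 1 below.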

📝 Examples

Example 1: Get 7 photos from @pascuccicoffee

GET https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=7&type=photos

Example 2: Get 10 reels from @pascuccicoffee

GET https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=10&type=reels

Example 3: Get 5 photos from @nike (default type is photos)

GET https://scraper.capula.co/api/scrape?username=nike&count=5

✅ Success Response

Status Code: 200 OK

Response Body:

{
  "data": [
    {
      "media_url": "https://scraper.capula.co/media/pascuccicoffee_123456.webp",
      "permalink": "https://www.instagram.com/p/ABC123/",
      "timestamp": "2025-10-08T09:30:00Z",
      "caption": "Fresh coffee this morning! #coffee",
      "media_type": "photo"
    },
    {
      "media_url": "https://scraper.capula.co/media/pascuccicoffee_789012.webp",
      "permalink": "https://www.instagram.com/p/DEF456/",
      "timestamp": "2025-10-07T14:20:00Z",
      "caption": "New blend available now",
      "media_type": "photo"
    }
  ],
  "count": 2,
  "username": "pascuccicoffee",
  "type": "photos",
  "cached": true,
  "scraped_at": "2025-10-09T00:36:13Z",
  "cache_expires_in_hours": 18.5
}

Response Fields:

Field	Type	Description
`data`	array	Array of Instagram posts
`data[].media_url`	string	Direct URL to the optimized image (WebP format)
`data[].permalink`	string	Link to the original Instagram post
`data[].timestamp`	string	When the post was published (ISO 8601 format)
`data[].caption`	string	Post caption text (truncated to 200 chars)
`data[].media_type`	string	Type of media: "photo" or "reel"
`count`	integer	Number of posts returned
`username`	string	Instagram username that was scraped
`type`	string	Type of media requested: "photos" or "reels"
`cached`	boolean	true if served from cache, false if freshly scraped
`scraped_at`	string	When the data was originally scraped
`cache_expires_in_hours`	float	Hours remaining until cache expires
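
To work with these fields in typed form, the response can be converted into small objects. This is a sketch under assumptions: the `InstagramPost` class and `parse_posts` helper are hypothetical names, and treating a missing caption as an empty string is a defensive choice, not something the API is documented to require.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class InstagramPost:
    media_url: str
    permalink: str
    timestamp: datetime
    caption: str
    media_type: str


def parse_timestamp(value):
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_posts(payload):
    """Turn the "data" array of a success response into InstagramPost objects."""
    return [
        InstagramPost(
            media_url=item["media_url"],
            permalink=item["permalink"],
            timestamp=parse_timestamp(item["timestamp"]),
            caption=item.get("caption") or "",
            media_type=item["media_type"],
        )
        for item in payload["data"]
    ]


sample = {
    "data": [
        {
            "media_url": "https://scraper.capula.co/media/pascuccicoffee_123456.webp",
            "permalink": "https://www.instagram.com/p/ABC123/",
            "timestamp": "2025-10-08T09:30:00Z",
            "caption": "Fresh coffee this morning! #coffee",
            "media_type": "photo",
        }
    ],
    "count": 1,
}
posts = parse_posts(sample)
print(posts[0].timestamp.isoformat())
```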

❌ Error Responses

Missing Username (400 Bad Request)

{
  "error": "Missing parameter",
  "message": "The \"username\" parameter is required",
  "example": "/api/scrape?username=pascuccicoffee&count=7&type=photos"
}

Invalid Type (400 Bad Request)

{
  "error": "Invalid type",
  "message": "Type must be \"photos\" or \"reels\"",
  "example": "/api/scrape?username=pascuccicoffee&count=7&type=photos"
}

Invalid Count (400 Bad Request)

{
  "error": "Invalid count",
  "message": "Count must be between 1 and 50",
  "provided": 100
}

Username Not Found (404 Not Found)

{
  "error": "No posts found",
  "message": "Could not fetch posts for @invaliduser123. Username may not exist or may be private.",
  "username": "invaliduser123"
}

Server Error (500 Internal Server Error)

{
  "error": "Internal server error",
  "message": "Error details here..."
}
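
Because every error body shares the `error` and `message` fields, a client can funnel all of them through one handler. The sketch below is illustrative only: `ScraperApiError` and `raise_for_api_error` are hypothetical names, and the rule that only 5xx responses are worth retrying is an assumption, not documented API behavior.

```python
class ScraperApiError(Exception):
    """Hypothetical exception carrying the API's error fields."""

    def __init__(self, status, error, message):
        super().__init__(f"{status} {error}: {message}")
        self.status = status
        self.error = error
        self.message = message
        # Assumption: only 5xx responses are worth retrying; 400 and 404
        # point to a problem with the request itself.
        self.retryable = status >= 500


def raise_for_api_error(status, body):
    """Raise ScraperApiError for non-200 responses; return the body otherwise."""
    if status == 200:
        return body
    raise ScraperApiError(
        status,
        body.get("error", "Unknown error"),
        body.get("message", ""),
    )


try:
    raise_for_api_error(404, {
        "error": "No posts found",
        "message": "Could not fetch posts for @invaliduser123.",
        "username": "invaliduser123",
    })
except ScraperApiError as exc:
    print(exc.status, exc.error, exc.retryable)
```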

🔄 Caching Behavior (IMPORTANT!)

How Caching Works:

First Request: When you request a username for the first time (or after 24 hours), the API will:
- Fetch fresh data from Instagram
- Save it to cache
- Return "cached": false
- Count this as 1 API request
Subsequent Requests (within 24 hours): All requests for the same username will:
- Serve data from cache (instant response)
- Return "cached": true
- NOT count as an API request
After 24 Hours: The cache expires and the next request will fetch fresh data again.

Staying Within API Limits:

Free tier limit: 35 API requests per day
Cache duration: 24 hours per username
Best practice: Don't request the same username more than once per day

Example Scenario:
- Website A requests @pascuccicoffee at 9 AM → 1 API request (fresh scrape)
- Website A requests @pascuccicoffee at 3 PM → 0 API requests (cached)
- Website B requests @pascuccicoffee at 6 PM → 0 API requests (cached)
- Website A requests @nike at 10 AM → 1 API request (fresh scrape)
- Total API requests used: 2 out of 35
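
The accounting in this scenario can be reproduced with a small simulation. This is a sketch for reasoning about quota, not the server's actual implementation: the function name `count_api_requests` is hypothetical, and it models the cache as keyed by username only (case-insensitively), as the description above suggests.

```python
def count_api_requests(requests, cache_hours=24):
    """Count how many requests trigger a fresh scrape.

    `requests` is a list of (hour, username) tuples, where hour is measured
    from the start of the day.
    """
    scraped_at = {}  # username -> hour of the most recent fresh scrape
    used = 0
    for hour, username in sorted(requests):
        key = username.lower()
        last = scraped_at.get(key)
        if last is None or hour - last >= cache_hours:
            scraped_at[key] = hour  # cache miss: fresh scrape, uses quota
            used += 1
    return used


scenario = [
    (9, "pascuccicoffee"),   # Website A, fresh scrape
    (15, "pascuccicoffee"),  # Website A, cached
    (18, "pascuccicoffee"),  # Website B, cached
    (10, "nike"),            # Website A, fresh scrape
]
print(count_api_requests(scenario))  # 2
```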

💻 Code Examples

JavaScript (Fetch API)

async function getInstagramPosts(username, count = 7, type = 'photos') {
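  // URLSearchParams percent-encodes each value, so unusual characters cannot break the query string
  const params = new URLSearchParams({ username, count, type });
  const url = `https://scraper.capula.co/api/scrape?${params}`;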

  try {
    const response = await fetch(url);
    const data = await response.json();

    if (response.ok) {
      console.log(`Got ${data.count} ${type} from @${username}`);
      console.log(`Cached: ${data.cached}`);
      console.log(`Cache expires in: ${data.cache_expires_in_hours} hours`);

      // Use the posts
      data.data.forEach(post => {
        console.log(`Image: ${post.media_url}`);
        console.log(`Caption: ${post.caption}`);
        console.log(`Link: ${post.permalink}`);
      });

      return data.data;
    } else {
      console.error('Error:', data.error, data.message);
      return null;
    }
  } catch (error) {
    console.error('Request failed:', error);
    return null;
  }
}

// Usage examples:
getInstagramPosts('pascuccicoffee', 7, 'photos');
getInstagramPosts('nike', 10, 'reels');

JavaScript (Axios)

const axios = require('axios');

async function getInstagramPosts(username, count = 7, type = 'photos') {
  try {
    const response = await axios.get('https://scraper.capula.co/api/scrape', {
      params: {
        username: username,
        count: count,
        type: type
      }
    });

    console.log(`Got ${response.data.count} ${type} from @${username}`);
    return response.data.data;

  } catch (error) {
    if (error.response) {
      // Server responded with error
      console.error('Error:', error.response.data.error);
      console.error('Message:', error.response.data.message);
    } else {
      // Network error
      console.error('Request failed:', error.message);
    }
    return null;
  }
}

// Usage:
getInstagramPosts('pascuccicoffee', 7, 'photos');

Python (Requests)

import requests

def get_instagram_posts(username, count=7, media_type='photos'):
    url = 'https://scraper.capula.co/api/scrape'
    params = {
        'username': username,
        'count': count,
        'type': media_type
    }

    try:
        # Fresh scrapes can take 5-10 seconds, so set an explicit, generous timeout
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()

        data = response.json()
        print(f"Got {data['count']} {media_type} from @{username}")
        print(f"Cached: {data['cached']}")
        print(f"Cache expires in: {data['cache_expires_in_hours']} hours")

        for post in data['data']:
            print(f"Image: {post['media_url']}")
            print(f"Caption: {post['caption']}")

        return data['data']

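    except requests.exceptions.HTTPError as e:
        # The error body is normally JSON, but fall back gracefully if it is not
        try:
            error_data = e.response.json()
        except ValueError:
            error_data = {}
        print(f"Error: {error_data.get('error')}")
        print(f"Message: {error_data.get('message')}")
        return None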
    except Exception as e:
        print(f"Request failed: {e}")
        return None

# Usage:
get_instagram_posts('pascuccicoffee', 7, 'photos')

PHP

<?php

function getInstagramPosts($username, $count = 7, $type = 'photos') {
    $url = 'https://scraper.capula.co/api/scrape?' . http_build_query([
        'username' => $username,
        'count' => $count,
        'type' => $type
    ]);

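    // ignore_errors lets us read the JSON body of 4xx/5xx responses instead of
    // getting false; the timeout leaves room for a slow fresh scrape
    $context = stream_context_create([
        'http' => ['ignore_errors' => true, 'timeout' => 30]
    ]);
    $response = file_get_contents($url, false, $context);
    $data = $response === false ? null : json_decode($response, true);

    // Error responses carry "error" and "message" fields
    if (is_array($data) && isset($data['error'])) {
        echo "Error: {$data['error']} - " . ($data['message'] ?? '') . "\n";
        return null;
    }

    if (is_array($data) && isset($data['data'])) {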
        echo "Got {$data['count']} {$type} from @{$username}\n";
        echo "Cached: " . ($data['cached'] ? 'yes' : 'no') . "\n";

        foreach ($data['data'] as $post) {
            echo "Image: {$post['media_url']}\n";
            echo "Caption: {$post['caption']}\n";
        }

        return $data['data'];
    }

    return null;
}

// Usage:
getInstagramPosts('pascuccicoffee', 7, 'photos');
?>

cURL (Command Line)

# Get 7 photos from @pascuccicoffee
curl "https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=7&type=photos"

# Get 10 reels from @nike
curl "https://scraper.capula.co/api/scrape?username=nike&count=10&type=reels"

🎨 Displaying Images in Your Shopify App

The media_url field contains a direct link to an optimized WebP image. You can use it directly in HTML:

<img src="https://scraper.capula.co/media/pascuccicoffee_123456.webp" alt="Instagram post">

Image Details:
- Format: WebP (modern, compressed format)
- Max Size: 1200px (maintains aspect ratio)
- Quality: 85% (high quality, optimized file size)
- Cache: Images are cached with 12-hour headers
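
When the markup is generated on the server, captions should be HTML-escaped before they are placed in an attribute, since they contain arbitrary user text. A minimal sketch, assuming a hypothetical `render_post` helper and a fallback alt text of "Instagram post"; the `loading="lazy"` attribute is an optional addition, not something the API requires.

```python
from html import escape


def render_post(post):
    """Return an HTML snippet that links the image to its Instagram post."""
    # escape() also escapes quotes by default, so values are safe in attributes
    return (
        f'<a href="{escape(post["permalink"])}">'
        f'<img src="{escape(post["media_url"])}" '
        f'alt="{escape(post.get("caption") or "Instagram post")}" loading="lazy">'
        "</a>"
    )


sample_post = {
    "media_url": "https://scraper.capula.co/media/pascuccicoffee_123456.webp",
    "permalink": "https://www.instagram.com/p/ABC123/",
    "caption": 'Say "hi" <3',
}
print(render_post(sample_post))
```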

🔧 Testing the API

Test Endpoint

https://scraper.capula.co/health

Response:

{
  "status": "healthy",
  "timestamp": "2025-10-09T00:36:34Z"
}

Test Scraping

https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=3&type=photos

📊 Monitoring Your Usage

Check Cache Status

The API response includes helpful fields to monitor cache usage:

{
  "cached": true,
  "scraped_at": "2025-10-09T00:36:13Z",
  "cache_expires_in_hours": 18.5
}

cached: true → Data served from cache (no API request used)
cached: false → Fresh scrape (1 API request used)
cache_expires_in_hours → Time remaining before next scrape is needed
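
These fields are enough to schedule the next refresh on the client side. The sketch below assumes that `cache_expires_in_hours` is measured from the moment the response is received; the function name `next_refresh_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_refresh_time(payload, now=None):
    """Estimate when a request for this username will trigger a fresh scrape."""
    now = now or datetime.now(timezone.utc)
    return now + timedelta(hours=payload["cache_expires_in_hours"])


payload = {
    "cached": True,
    "scraped_at": "2025-10-09T00:36:13Z",
    "cache_expires_in_hours": 18.5,
}
# A fixed "now" (5.5 hours after the scrape) keeps the example deterministic
now = datetime(2025, 10, 9, 6, 6, 13, tzinfo=timezone.utc)
print(next_refresh_time(payload, now).isoformat())
```

In this example the result is exactly 24 hours after `scraped_at`, which matches the 24-hour cache duration.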

Best Practices for Shopify Apps

Don't make repeated requests for the same username within 24 hours
Store the response in your Shopify app's database/cache
Check cached field to know if you're using API quota
Monitor cache_expires_in_hours to schedule refreshes efficiently
Handle errors gracefully (username not found, rate limits, etc.)

⚠️ Important Notes

1. Rate Limiting

Maximum 35 API requests per day (free tier)
Each unique username = 1 request per 24 hours
Cached responses don't count toward the limit
Plan accordingly if you have many different usernames
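
A quick way to plan is to count the unique usernames your app refreshes each day. This is a rough sketch with assumptions stated plainly: the helper name `plan_daily_usage` is hypothetical, it assumes the 35-request quota is consumed only by your own usernames, and it assumes each username is refreshed at most once per day.

```python
DAILY_LIMIT = 35


def plan_daily_usage(usernames, daily_limit=DAILY_LIMIT):
    """Return (requests_needed, fits) for refreshing each username once a day."""
    # Deduplicate case-insensitively and ignore a leading "@"
    unique = {name.strip().lstrip("@").lower() for name in usernames}
    unique.discard("")
    needed = len(unique)
    return needed, needed <= daily_limit


print(plan_daily_usage(["nike", "@Nike", "pascuccicoffee"]))  # (2, True)
```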

2. Data Freshness

Data is cached for 24 hours
If you need fresh data more often, contact us about upgrading

3. Private Accounts

Only public Instagram accounts can be scraped
Private accounts will return a 404 error

4. Media Types

type=photos → Returns only photo posts (no videos/reels)
type=reels → Returns only Instagram Reels
Videos are excluded from photos

5. Count Limits

Minimum: 1 post
Maximum: 50 posts per request
Default: 7 posts

🆘 Troubleshooting

"Username not found" error

Check spelling (usernames are case-insensitive)
Verify account exists on Instagram
Check if account is public (private accounts won't work)

Slow response times

First request for a username takes ~5-10 seconds (live scraping)
Cached requests are instant (<100ms)

"Configuration error"

Server-side issue with RapidAPI key
Contact support

Images not loading

Images are served from https://scraper.capula.co/media/
Check CORS settings (already enabled on our end)
Verify media_url is valid

📞 Support

API Status: https://scraper.capula.co/health
Documentation: https://scraper.capula.co/docs
Base URL: https://scraper.capula.co

For technical support or API upgrades, please contact your system administrator.

🚀 Quick Start Checklist

[ ] Test the health endpoint: https://scraper.capula.co/health
[ ] Test scraping: https://scraper.capula.co/api/scrape?username=pascuccicoffee&count=3&type=photos
[ ] Integrate the API into your Shopify app
[ ] Implement caching on your end to avoid repeated requests
[ ] Handle errors (404, 400, 500) gracefully
[ ] Monitor cached field to track API usage
[ ] Deploy to production

📈 Response Time Expectations

Scenario	Response Time
Cached data (within 24h)	<100ms
Fresh scrape (first time)	5-10 seconds
Username not found	~3-5 seconds

⭐ Google Reviews Integration

Endpoint: `GET /api/reviews`

Fetch Google Reviews for any business with a Google Maps profile.

URL:

https://scraper.capula.co/api/reviews

Request Parameters

Parameter	Required	Type	Default	Description
`organizationId`	YES	string	-	Google Maps organization ID
`count`	No	integer	10	Number of reviews (min: 1, max: 50)

How to Find organizationId

The organizationId is extracted from the Google Maps URL for your business:

Step-by-Step Guide:

Go to Google Maps
Search for your business name (e.g., "Empire State Plumbing NY")
Click on the business to open its profile
Look at the URL in the address bar
Find the long code that follows !1s in the URL (it runs up to the next !)
That's your organizationId!

Example URL:

https://www.google.com/maps/place/...data=!3m1!4b1!4m6!3m5!1s0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d!...
                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                                             This is the organizationId

The organizationId in this example: 0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d
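
The manual steps above can also be automated with a regular expression. This is a sketch, assuming the ID always appears unencoded as two hexadecimal values joined by a colon directly after `!1s`, as in the example URL; `extract_organization_id` is a hypothetical helper name.

```python
import re

# Matches the "!1s<hex>:<hex>" segment of a Google Maps place URL
ORG_ID_PATTERN = re.compile(r"!1s(0x[0-9a-fA-F]+:0x[0-9a-fA-F]+)")


def extract_organization_id(maps_url):
    """Return the organizationId found in a Google Maps URL, or None."""
    match = ORG_ID_PATTERN.search(maps_url)
    return match.group(1) if match else None


url = (
    "https://www.google.com/maps/place/...data=!3m1!4b1!4m6!3m5"
    "!1s0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d!..."
)
print(extract_organization_id(url))
```

If the function returns None, fall back to the manual steps and check whether the business profile is actually open in the address bar.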

Example Request

GET https://scraper.capula.co/api/reviews?organizationId=0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d&count=10

Success Response (200 OK)

{
  "data": [
    {
      "rating": 5,
      "comment": "Excellent service! Very professional and quick response.",
      "date": "2025-10-01T14:30:00Z",
      "author": "John Smith",
      "photos": [],
      "owner_response": "Thank you for your kind words!"
    },
    {
      "rating": 4,
      "comment": "Good quality work, would recommend.",
      "date": "2025-09-28T10:15:00Z",
      "author": "Jane Doe",
      "photos": ["https://...photo1.jpg"],
      "owner_response": null
    }
  ],
  "count": 2,
  "organizationId": "0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d",
  "cached": true,
  "scraped_at": "2025-10-09T01:24:27Z",
  "cache_expires_in_days": 5.2
}

Response Fields

Field	Type	Description
`data`	array	Array of Google reviews
`data[].rating`	integer	Star rating (1-5)
`data[].comment`	string	Review text
`data[].date`	string	When the review was posted (ISO 8601)
`data[].author`	string	Reviewer's name
`data[].photos`	array	URLs of review photos (if any)
`data[].owner_response`	string or null	Business owner's response
`count`	integer	Number of reviews returned
`organizationId`	string	The organization ID that was queried
`cached`	boolean	true if served from cache, false if freshly scraped
`scraped_at`	string	When the data was originally scraped
`cache_expires_in_days`	float	Days remaining until cache expires
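
As an illustration of how these fields might be used, the sketch below computes a few display statistics from a response. The helper name `summarize_reviews` is hypothetical, and the average covers only the reviews returned in this response, not the business's overall Google rating.

```python
def summarize_reviews(payload):
    """Compute simple display statistics from a reviews response."""
    reviews = payload["data"]
    ratings = [r["rating"] for r in reviews]
    return {
        "count": len(reviews),
        "average_rating": round(sum(ratings) / len(ratings), 2) if ratings else None,
        "with_photos": sum(1 for r in reviews if r.get("photos")),
        "answered": sum(1 for r in reviews if r.get("owner_response")),
    }


sample = {
    "data": [
        {"rating": 5, "comment": "Excellent service!", "author": "John Smith",
         "photos": [], "owner_response": "Thank you for your kind words!"},
        {"rating": 4, "comment": "Good quality work.", "author": "Jane Doe",
         "photos": ["https://...photo1.jpg"], "owner_response": None},
    ],
    "count": 2,
}
print(summarize_reviews(sample))
```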

Caching Behavior (Google Reviews)

First Request: Fetches from Google Maps API, caches for 7 days → Uses 1 API request
Subsequent Requests (within 7 days): Served from cache → Uses 0 API requests
After 7 Days: Cache expires, next request fetches fresh data

Why 7 days? Reviews don't change as frequently as Instagram posts, so we cache longer to save API quota.
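
The 7-day cache and the 500 requests/month limit together bound how many businesses can be kept fresh. The estimate below is a back-of-the-envelope sketch, not an official figure: it assumes every business is re-fetched as soon as its cache expires and that the quota is used only for your own businesses. The function name `max_businesses` is hypothetical.

```python
import math


def max_businesses(monthly_limit=500, cache_days=7, days_in_month=31):
    """Conservative estimate of how many businesses fit within the quota."""
    # Worst case: a business is re-fetched as soon as its cache expires,
    # e.g. on days 0, 7, 14, 21 and 28 of a 31-day month.
    refreshes_per_business = math.ceil(days_in_month / cache_days)
    return monthly_limit // refreshes_per_business


print(max_businesses())  # 100
```

Under these assumptions, each business uses at most 5 requests in a 31-day month, so 500 requests cover about 100 businesses.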

Code Example: JavaScript

async function getGoogleReviews(organizationId, count = 10) {
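  // URLSearchParams percent-encodes the ":" in the ID and any other special characters
  const params = new URLSearchParams({ organizationId, count });
  const url = `https://scraper.capula.co/api/reviews?${params}`;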

  try {
    const response = await fetch(url);
    const data = await response.json();

    if (response.ok) {
      console.log(`Got ${data.count} reviews`);
      console.log(`Cached: ${data.cached}`);
      console.log(`Cache expires in: ${data.cache_expires_in_days} days`);

      data.data.forEach(review => {
        console.log(`⭐ ${review.rating}/5 - ${review.author}`);
        console.log(`Comment: ${review.comment}`);
        if (review.owner_response) {
          console.log(`Owner replied: ${review.owner_response}`);
        }
      });

      return data.data;
    } else {
      console.error('Error:', data.error, data.message);
      return null;
    }
  } catch (error) {
    console.error('Request failed:', error);
    return null;
  }
}

// Usage:
getGoogleReviews('0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d', 10);

Error Responses

Missing organizationId (400 Bad Request):

{
  "error": "Missing parameter",
  "message": "The \"organizationId\" parameter is required",
  "example": "/api/reviews?organizationId=0x89de0b0b3cdbe1d3:0x27519164cd8d3b5d&count=10",
  "help": "Find organizationId from Google Maps URL or use Google Maps Place ID"
}

Invalid Count (400 Bad Request):

{
  "error": "Invalid count",
  "message": "Count must be between 1 and 50",
  "provided": 100
}

No Reviews Found (404 Not Found):

{
  "error": "No reviews found",
  "message": "Could not fetch reviews for organization 0xinvalid",
  "organizationId": "0xinvalid"
}

Best Practices for Reviews

Find organizationId once and store it in your database
Don't request the same business more often than once every 7 days
Cache the response in your Shopify app
Check cached field to monitor API quota usage
Handle errors gracefully (invalid ID, no reviews, etc.)

✨ Summary

Two main endpoints for your Shopify app:

Instagram:

GET https://scraper.capula.co/api/scrape?username={user}&count={num}&type={photos|reels}

24-hour cache
35 requests/day limit
Photos and reels

Google Reviews:

GET https://scraper.capula.co/api/reviews?organizationId={id}&count={num}

7-day cache
500 requests/month limit
Star ratings, comments, owner responses

The API handles:
- ✅ Fetching data from Instagram and Google
- ✅ Smart caching (24h for Instagram, 7d for Reviews)
- ✅ Serving optimized images
- ✅ Rate limit management
- ✅ Error handling

Your Shopify app just makes simple HTTP GET requests and receives ready-to-use data.

📚 Additional Documentation

Main Documentation: https://scraper.capula.co/docs
Google Reviews Guide: https://scraper.capula.co/docs/reviews
API Health Check: https://scraper.capula.co/health

🛍️ Shopify Integration Guide