Optimize GCS Costs with Python: Intelligent Storage Class Recommendations

Google Cloud Storage (GCS) offers multiple storage classes whose pricing varies with how frequently data is accessed. While "Standard" is great for active data, long-retained, infrequently accessed data is better suited to "Nearline", "Coldline", or "Archive".

In this post, we’ll explore a Python-based tool that analyzes your GCS buckets, recommends optimal storage classes based on object age, and estimates cost savings automatically.

Overview of What This Script Does

This script:

  1. Lists all buckets in a specified GCP project.

  2. Inspects each object to determine how old it is.

  3. Suggests a better storage class if applicable.

  4. Calculates monthly and annual savings from switching classes.

  5. Fetches the current lifecycle policy and recommends a better one.

  6. Prints a detailed report and saves it as a JSON file.


🔄 Workflow Diagram

Here's a high-level view of how the tool operates: list buckets → inspect each object's age → recommend a storage class → estimate monthly and annual savings → compare lifecycle policies → print and save the report.

🧪 Use Cases

1. Cost Optimization

You might be paying for Standard storage for data that hasn't been accessed in months. This tool suggests when to move data to Nearline, Coldline, or Archive, potentially saving thousands annually.
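
As a rough illustration using the Mumbai prices defined later in this post: 10 TB left in Standard costs 10,240 GB × $0.023 ≈ $235 per month, while the same data in Archive costs 10,240 GB × $0.0025 ≈ $26 per month, a saving of roughly $2,500 per year from a single bucket.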

2. Storage Auditing

Helps you understand what's inside your buckets and how old the data is. Critical for compliance and cleanup tasks.

3. Policy Recommendations

Lifecycle rules help automate tier transitions. This script suggests sensible rules:

  • 30 days → Nearline

  • 90 days → Coldline

  • 365 days → Archive


🧰 Requirements

  • Python 3.x

  • Google Cloud SDK configured and authenticated

  • Python package: google-cloud-storage

Install the required package:

pip install google-cloud-storage

Make sure you authenticate using:

gcloud auth application-default login
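
To confirm that authentication works before running the full tool, here is a quick check from Python (the project ID is a placeholder):

from google.cloud import storage

# Uses the Application Default Credentials set up by gcloud above.
client = storage.Client(project="your-gcp-project-id")
print([bucket.name for bucket in client.list_buckets()])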

πŸ” Authenticate with Google Cloud (Service Account Method)

To enable your script to access GCS securely, follow these steps:

✅ Step 1: Create a Service Account

  1. Go to IAM & Admin → Service Accounts

  2. Click "+ CREATE SERVICE ACCOUNT"

  3. Enter a name like gcs-cost-optimizer

  4. Click Create and Continue

✅ Step 2: Assign IAM Roles

Attach one of the following roles:

  • Storage Viewer (roles/storage.viewer): the minimum for read-only analysis

  • Storage Admin (roles/storage.admin): required if you also want to manage lifecycle policies

Click Done.

✅ Step 3: Create and Download a JSON Key

  1. In the service account page, go to the Keys tab.

  2. Click "Add Key" → "Create New Key"

  3. Select JSON, download, and store it safely:

     vernal-zone-452806-a4-2fd65f98b08a.json
    

🔒 Keep this file secure; it contains credentials.
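
If you prefer the command line, Steps 1-3 can also be scripted with gcloud (the project ID and key filename below are placeholders):

gcloud iam service-accounts create gcs-cost-optimizer \
    --display-name="GCS Cost Optimizer"

gcloud projects add-iam-policy-binding your-gcp-project-id \
    --member="serviceAccount:gcs-cost-optimizer@your-gcp-project-id.iam.gserviceaccount.com" \
    --role="roles/storage.admin"

gcloud iam service-accounts keys create gcs-cost-optimizer-key.json \
    --iam-account="gcs-cost-optimizer@your-gcp-project-id.iam.gserviceaccount.com"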

✅ Step 4: Set the Environment Variable

Export the credential path to your shell:

export GOOGLE_APPLICATION_CREDENTIALS="vernal-zone-452806-a4-2fd65f98b08a.json"

Or inline:

GOOGLE_APPLICATION_CREDENTIALS="vernal-zone-452806-a4-2fd65f98b08a.json" python gcs_cost_optimizer.py
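
If you would rather not depend on the environment variable, the client can also load the key file directly; a minimal sketch, assuming the key downloaded above:

from google.cloud import storage

# Loads credentials straight from the service account key file.
client = storage.Client.from_service_account_json(
    "vernal-zone-452806-a4-2fd65f98b08a.json"
)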

💻 Key Code Snippet

Here’s how to kickstart the analysis:

from google.cloud import storage
from datetime import datetime, timezone
import json

# Mumbai region pricing per GB per month in USD
STORAGE_PRICING = {
    "Standard": 0.023,
    "Nearline": 0.016,
    "Coldline": 0.006,
    "Archive": 0.0025
}

def recommend_storage_class(last_modified):
    now = datetime.now(timezone.utc)
    age_days = (now - last_modified).days

    if age_days <= 30:
        return "Standard"
    elif 30 < age_days <= 90:
        return "Nearline"
    elif 90 < age_days <= 365:
        return "Coldline"
    else:
        return "Archive"

def calculate_cost_and_savings(size_bytes, current_class, recommended_class):
    size_gb = size_bytes / (1024 ** 3)
    current_price = STORAGE_PRICING.get(current_class, 0)
    recommended_price = STORAGE_PRICING.get(recommended_class, 0)

    current_monthly_cost = current_price * size_gb
    new_monthly_cost = recommended_price * size_gb

    if current_price <= recommended_price:
        monthly_savings = 0
        annual_savings = 0
    else:
        monthly_savings = current_monthly_cost - new_monthly_cost
        annual_savings = monthly_savings * 12

    return round(current_monthly_cost, 4), round(new_monthly_cost, 4), round(monthly_savings, 4), round(annual_savings, 4)

def suggest_lifecycle_policy():
    return {
        "rule": [
            {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"}, "condition": {"age": 30}},
            {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"}, "condition": {"age": 90}},
            {"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"}, "condition": {"age": 365}}
        ]
    }



def print_report(report):
    print(f"\nProject ID: {report['project_id']}")
    for bucket in report["buckets"]:
        print(f"\n📦 Bucket: {bucket['bucket_name']}")

        if "error" in bucket:
            print(f"  ❌ Error: {bucket['error']}")
            continue

        if bucket.get("empty_bucket"):
            print("  ⚠️  Empty bucket.")
        else:
            print("  📄 Objects:")
            for obj in bucket["objects"]:
                print(f"    • {obj['object_name']}")
                print(f"      Last Modified: {obj['last_modified']}")
                print(f"      Recommended Class: {obj['recommended_storage_class']}")
                print(f"      Monthly Cost (Current): ${obj['estimated_monthly_cost_current']}")
                print(f"      Monthly Cost (After):   ${obj['estimated_monthly_cost_after_policy']}")
                print(f"      Monthly Savings:        ${obj['estimated_monthly_savings']}")
                print(f"      Annual Savings:         ${obj['estimated_annual_savings']}")

        # Print current lifecycle policy
        print("  📋 Current Lifecycle Policy:")
        current_policy = bucket.get("current_lifecycle_policy", [])
        if current_policy:
            for i, rule in enumerate(current_policy, 1):
                action = rule.get("action", {})
                condition = rule.get("condition", {})
                print(f"    Rule {i}:")
                print(f"      Action: {action}")
                print(f"      Condition: {condition}")
        else:
            print("    No lifecycle policy set.")

        # Print suggested lifecycle policy
        print("  💡 Suggested Lifecycle Policy:")
        for rule in bucket["lifecycle_policy_suggestion"]["rule"]:
            age = rule["condition"]["age"]
            storage_class = rule["action"]["storageClass"]
            print(f"    - After {age} days → {storage_class}")


def list_buckets_and_generate_json_report(project_id, output_file="gcs_report.json"):
    storage_client = storage.Client(project=project_id)
    buckets = storage_client.list_buckets()

    report = {"project_id": project_id, "buckets": []}

    for bucket in buckets:
        bucket_info = {
            "bucket_name": bucket.name,
            "lifecycle_policy_suggestion": suggest_lifecycle_policy(),
            "objects": []
        }

        try:
            # Fetch current lifecycle policy
            bucket.reload()
            bucket_info["current_lifecycle_policy"] = list(bucket.lifecycle_rules)

            # List objects
            blobs = list(storage_client.list_blobs(bucket.name))
            if not blobs:
                bucket_info["empty_bucket"] = True
            else:
                for blob in blobs:
                    # blob.updated is the last-modified time; GCS object metadata
                    # does not expose a last-access timestamp.
                    recommended_class = recommend_storage_class(blob.updated)
                    # NOTE: the current class is assumed to be "Standard" here;
                    # blob.storage_class could be used instead for mixed-class buckets.
                    current_monthly, new_monthly, monthly_savings, annual_savings = calculate_cost_and_savings(
                        blob.size, "Standard", recommended_class
                    )
                    object_info = {
                        "object_name": blob.name,
                        "last_modified": str(blob.updated),
                        "current_storage_class": "Standard",
                        "recommended_storage_class": recommended_class,
                        "estimated_monthly_cost_current": current_monthly,
                        "estimated_monthly_cost_after_policy": new_monthly,
                        "estimated_monthly_savings": monthly_savings,
                        "estimated_annual_savings": annual_savings
                    }
                    bucket_info["objects"].append(object_info)

        except Exception as e:
            bucket_info["error"] = str(e)

        report["buckets"].append(bucket_info)

    # Print and save
    print_report(report)
    with open(output_file, "w") as f:
        json.dump(report, f, indent=4)
    print(f"\nβœ… Report saved to: {output_file}")

# Example usage
if __name__ == "__main__":
    project_id = "vernal-zone-452806-a4"
    list_buckets_and_generate_json_report(project_id)

📜 Code Walkthrough

1. Storage Class Cost Mapping

Defines per-GB monthly pricing (USD) for the Mumbai region (asia-south1):

STORAGE_PRICING = {
    "Standard": 0.023,
    "Nearline": 0.016,
    "Coldline": 0.006,
    "Archive": 0.0025
}
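
These figures are a snapshot for illustration; GCS pricing varies by region and changes over time, so check the official pricing page before relying on the estimates.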

2. Determine Best Storage Class

This function checks how old an object is and maps that age to a tier; for example, an object last modified 200 days ago falls in the 91-365 day band, so it returns "Coldline":

def recommend_storage_class(last_modified):
    now = datetime.now(timezone.utc)
    age_days = (now - last_modified).days
    ...

3. Cost Comparison

Calculates what you'd save by transitioning storage class:

def calculate_cost_and_savings(size_bytes, current_class, recommended_class):
    ...
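
For instance, a roughly 1.5 GB object moved from Standard to Archive in Mumbai costs about 1.5 × $0.023 ≈ $0.035 per month today versus 1.5 × $0.0025 ≈ $0.004 afterwards, which is the ~$0.031 monthly saving shown in the sample output below.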

4. Lifecycle Policy Suggestion

Hardcoded lifecycle suggestion for automation:

def suggest_lifecycle_policy():
    return {
        "rule": [
            {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"}, "condition": {"age": 30}},
            ...
        ]
    }
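
The script only recommends this policy; it never modifies your buckets. If you do want to apply it, a minimal sketch using the same client library might look like this (requires roles/storage.admin; note that assigning lifecycle_rules replaces any existing rules):

from google.cloud import storage

def apply_suggested_policy(bucket_name):
    """Apply the suggested rules to a bucket (replaces existing lifecycle rules)."""
    client = storage.Client()
    bucket = client.get_bucket(bucket_name)
    # Assigning to lifecycle_rules stages the policy; patch() sends it to GCS.
    bucket.lifecycle_rules = suggest_lifecycle_policy()["rule"]
    bucket.patch()

# Hypothetical usage:
# apply_suggested_policy("my-backup-data")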

5. Bucket Analysis and Report Generation

The main function:

def list_buckets_and_generate_json_report(project_id, output_file="gcs_report.json"):
    ...

📊 Output Example

Console Printout:

📦 Bucket: my-backup-data
  📄 Objects:
    • database_backup_2022.sql
      Last Modified: 2022-04-10
      Recommended Class: Archive
      Monthly Cost (Current): $0.0344
      Monthly Cost (After):   $0.0037
      Monthly Savings:        $0.0307
      Annual Savings:         $0.3684

JSON Report (partial):

{
  "project_id": "my-gcp-project",
  "buckets": [
    {
      "bucket_name": "my-backup-data",
      "objects": [
        {
          "object_name": "database_backup_2022.sql",
          "recommended_storage_class": "Archive"
        }
      ]
    }
  ]
}

🚀 How to Run It

  1. Update the project ID in the if __name__ == "__main__" block:

     project_id = "your-gcp-project-id"

  2. Run the script:

     python gcs_cost_optimizer.py

  3. Review the output:

  • Console for a human-readable report

  • gcs_report.json for structured data


✅ Final Thoughts

This script enables cloud cost optimization with minimal effort. Automating the audit to run monthly (a sample cron entry follows the list) can lead to:

  • Improved data hygiene

  • Significant cost reductions

  • Strategic policy automation
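
For instance, a minimal sketch of a monthly cron entry (the paths and key file are hypothetical placeholders):

# Run the audit at 09:00 on the 1st of every month
0 9 1 * * GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json /usr/bin/python3 /opt/scripts/gcs_cost_optimizer.py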
