# Optimize GCS Costs with Python: Intelligent Storage Class Recommendations

Google Cloud Storage (GCS) offers several storage classes, each priced according to how often the data is accessed. "Standard" suits active data, while data that is kept for a long time and rarely read is cheaper to hold in "Nearline", "Coldline", or "Archive".

In this post, we’ll explore a **Python-based tool** that analyzes your GCS buckets, recommends optimal storage classes based on object age, and estimates cost savings automatically.

## Overview of What This Script Does

This script:

1. **Lists all buckets** in a specified GCP project.
    
2. **Inspects each object** to determine how old it is.
    
3. **Suggests a better storage class** if applicable.
    
4. **Calculates monthly and annual savings** from switching classes.
    
5. **Fetches the current lifecycle policy** and recommends a better one.
    
6. **Prints a detailed report** and saves it as a JSON file.
    

---

## 🔄 Workflow Diagram

Here’s a high-level visual representation of how the tool operates:

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1747552362726/ca9ae05a-342b-4aa2-b6a3-7eb7faf8dec8.png align="center")

## 🧪 Use Cases

### 1\. **Cost Optimization**

You might be paying for `Standard` storage for data that hasn't been accessed in months. This tool suggests when to move data to `Nearline`, `Coldline`, or `Archive`, and estimates how much each move would save per month and per year.

### 2\. **Storage Auditing**

Shows what is inside your buckets and how old each object is, which is useful for compliance reviews and cleanup tasks.

### 3\. **Policy Recommendations**

Lifecycle rules help automate tier transitions. This script suggests sensible rules:

* Older than 30 days → `Nearline`
    
* Older than 90 days → `Coldline`
    
* Older than 365 days → `Archive`
    
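One way to put these thresholds into effect is through the client library. The sketch below assumes `bucket` is a `google.cloud.storage.Bucket` and relies on its `add_lifecycle_set_storage_class_rule()` and `patch()` methods. The helper name `apply_tiering_rules` and the `TIERING_RULES` constant are illustrative and not part of the script in this post.

```python
# Thresholds from the list above: (minimum age in days, target storage class).
TIERING_RULES = [(30, "NEARLINE"), (90, "COLDLINE"), (365, "ARCHIVE")]


def apply_tiering_rules(bucket, rules=TIERING_RULES, dry_run=True):
    """Queue one SetStorageClass rule per threshold on a Bucket-like object.

    `bucket` is assumed to be a google.cloud.storage.Bucket. With dry_run=True
    nothing is sent to GCS; the planned rules are only returned.
    """
    planned = [{"storage_class": cls, "age": age} for age, cls in rules]
    if dry_run:
        return planned
    for rule in planned:
        # Appends the rule to the bucket's local lifecycle configuration.
        bucket.add_lifecycle_set_storage_class_rule(
            rule["storage_class"], age=rule["age"]
        )
    # Persist the updated configuration to GCS in a single request.
    bucket.patch()
    return planned
```

With a real client, `apply_tiering_rules(storage.Client().get_bucket("my-bucket"), dry_run=False)` would append the three rules to whatever the bucket already has, so running it twice would create duplicates. Writing lifecycle rules needs more than read access, which is why the dry run is the default here.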

---

## 🧰 Requirements

* Python 3.x
    
* Google Cloud SDK, configured and authenticated
    
* Python package: `google-cloud-storage`
    

Install the required package:

```bash
pip install google-cloud-storage
```

Make sure you authenticate using:

```bash
gcloud auth application-default login
```

---

## 🔐 Authenticate with Google Cloud (Service Account Method)

To enable your script to access GCS securely, follow these steps:

### ✅ Step 1: Create a Service Account

1. Go to **IAM & Admin → Service Accounts**
    
2. Click **“+ CREATE SERVICE ACCOUNT”**
    
3. Enter a name like `gcs-cost-optimizer`
    
4. Click **Create and Continue**
    

### ✅ Step 2: Assign IAM Roles

Attach one of the following roles:

* `Storage Viewer` (`roles/storage.viewer`): minimum for read access
    
* `Storage Admin` (`roles/storage.admin`): required if you want to manage lifecycle policies
    

Click **Done**.

### ✅ Step 3: Create and Download a JSON Key

1. In the service account page, go to the **Keys** tab.
    
2. Click **“Add Key” → “Create New Key”**
    
3. Select **JSON**, download, and store it safely:
    
    ```bash
    vernal-zone-452806-a4-2fd65f98b08a.json
    ```
    

> 🔒 Keep this file secure — it contains credentials.

### ✅ Step 4: Set the Environment Variable

Export the credential path to your shell:

```bash
export GOOGLE_APPLICATION_CREDENTIALS="vernal-zone-452806-a4-2fd65f98b08a.json"
```

Or inline:

```bash
GOOGLE_APPLICATION_CREDENTIALS="vernal-zone-452806-a4-2fd65f98b08a.json" python gcs_cost_opti
```
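Before running the analysis, it can help to confirm that the variable really points at a usable key. The following is a minimal preflight sketch that uses only the standard library; the function name `check_credentials_file` is hypothetical, and it assumes a service account key file, which carries `type` and `client_email` fields.

```python
import json
import os


def check_credentials_file(env=os.environ):
    """Return the service account email if the key file looks usable.

    Raises ValueError with a readable message otherwise.
    """
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise ValueError(f"Key file not found: {path}")
    with open(path) as f:
        key = json.load(f)
    if key.get("type") != "service_account":
        raise ValueError(f"Unexpected credential type: {key.get('type')!r}")
    # Never print the private key; the email is enough to identify the account.
    return key.get("client_email")
```

Calling `check_credentials_file()` at the top of the script turns a missing or misplaced key into a clear message instead of an authentication error deep inside the client library.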

## 💻 Key Code Snippet

Here’s how to kickstart the analysis:

```python
from google.cloud import storage
from datetime import datetime, timezone
import json

# Mumbai region pricing per GB per month in USD
STORAGE_PRICING = {
    "Standard": 0.023,
    "Nearline": 0.016,
    "Coldline": 0.006,
    "Archive": 0.0025
}

def recommend_storage_class(last_modified):
    now = datetime.now(timezone.utc)
    age_days = (now - last_modified).days

    if age_days <= 30:
        return "Standard"
    elif 30 < age_days <= 90:
        return "Nearline"
    elif 90 < age_days <= 365:
        return "Coldline"
    else:
        return "Archive"

def calculate_cost_and_savings(size_bytes, current_class, recommended_class):
    size_gb = size_bytes / (1024 ** 3)
    current_price = STORAGE_PRICING.get(current_class, 0)
    recommended_price = STORAGE_PRICING.get(recommended_class, 0)

    current_monthly_cost = current_price * size_gb
    new_monthly_cost = recommended_price * size_gb

    if current_price <= recommended_price:
        monthly_savings = 0
        annual_savings = 0
    else:
        monthly_savings = current_monthly_cost - new_monthly_cost
        annual_savings = monthly_savings * 12

    return round(current_monthly_cost, 4), round(new_monthly_cost, 4), round(monthly_savings, 4), round(annual_savings, 4)

def suggest_lifecycle_policy():
    return {
        "rule": [
            {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"}, "condition": {"age": 30}},
            {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"}, "condition": {"age": 90}},
            {"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"}, "condition": {"age": 365}}
        ]
    }



def print_report(report):
    print(f"\nProject ID: {report['project_id']}")
    for bucket in report["buckets"]:
        print(f"\n📦 Bucket: {bucket['bucket_name']}")
        
        if "error" in bucket:
            print(f"  ❌ Error: {bucket['error']}")
            continue

        if bucket.get("empty_bucket"):
            print("  ⚠️  Empty bucket.")
        else:
            print("  📄 Objects:")
            for obj in bucket["objects"]:
                print(f"    • {obj['object_name']}")
                print(f"      Last Modified: {obj['last_modified']}")
                print(f"      Recommended Class: {obj['recommended_storage_class']}")
                print(f"      Monthly Cost (Current): ${obj['estimated_monthly_cost_current']}")
                print(f"      Monthly Cost (After):   ${obj['estimated_monthly_cost_after_policy']}")
                print(f"      Monthly Savings:        ${obj['estimated_monthly_savings']}")
                print(f"      Annual Savings:         ${obj['estimated_annual_savings']}")

        # Print current lifecycle policy
        print("  📋 Current Lifecycle Policy:")
        current_policy = bucket.get("current_lifecycle_policy", [])
        if current_policy:
            for i, rule in enumerate(current_policy, 1):
                action = rule.get("action", {})
                condition = rule.get("condition", {})
                print(f"    Rule {i}:")
                print(f"      Action: {action}")
                print(f"      Condition: {condition}")
        else:
            print("    No lifecycle policy set.")

        # Print suggested lifecycle policy
        print("  💡 Suggested Lifecycle Policy:")
        for rule in bucket["lifecycle_policy_suggestion"]["rule"]:
            age = rule["condition"]["age"]
            storage_class = rule["action"]["storageClass"]
            print(f"    - After {age} days → {storage_class}")


def list_buckets_and_generate_json_report(project_id, output_file="gcs_report.json"):
    storage_client = storage.Client(project=project_id)
    buckets = storage_client.list_buckets()

    report = {"project_id": project_id, "buckets": []}

    for bucket in buckets:
        bucket_info = {
            "bucket_name": bucket.name,
            "lifecycle_policy_suggestion": suggest_lifecycle_policy(),
            "objects": []
        }

        try:
            # Fetch current lifecycle policy
            bucket.reload()
            bucket_info["current_lifecycle_policy"] = list(bucket.lifecycle_rules)

            # List objects
            blobs = list(storage_client.list_blobs(bucket.name))
            if not blobs:
                bucket_info["empty_bucket"] = True
            else:
                for blob in blobs:
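                    recommended_class = recommend_storage_class(blob.updated)
                    # Use the object's actual class ("STANDARD" -> "Standard") rather than assuming Standard
                    current_class = (blob.storage_class or "STANDARD").title()
                    current_monthly, new_monthly, monthly_savings, annual_savings = calculate_cost_and_savings(
                        blob.size, current_class, recommended_class
                    )
                    object_info = {
                        "object_name": blob.name,
                        "last_modified": str(blob.updated),
                        "current_storage_class": current_class,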
                        "recommended_storage_class": recommended_class,
                        "estimated_monthly_cost_current": current_monthly,
                        "estimated_monthly_cost_after_policy": new_monthly,
                        "estimated_monthly_savings": monthly_savings,
                        "estimated_annual_savings": annual_savings
                    }
                    bucket_info["objects"].append(object_info)

        except Exception as e:
            bucket_info["error"] = str(e)

        report["buckets"].append(bucket_info)

    # Print and save
    print_report(report)
    with open(output_file, "w") as f:
        json.dump(report, f, indent=4)
    print(f"\n✅ Report saved to: {output_file}")

# Example usage
if __name__ == "__main__":
    project_id = "vernal-zone-452806-a4"
    list_buckets_and_generate_json_report(project_id)
```

## 📜 Code Walkthrough

### 1\. **Storage Class Cost Mapping**

Defines the per-GB monthly storage price (USD) of each class in the Mumbai region:

```python
STORAGE_PRICING = {
    "Standard": 0.023,
    "Nearline": 0.016,
    "Coldline": 0.006,
    "Archive": 0.0025
}
```
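To get a feel for these rates, here is a small worked example that applies the same table to a fixed amount of data. The helper `monthly_cost_by_class` is illustrative and not part of the script; the prices are the ones from the post.

```python
# Per-GB monthly prices in USD, copied from the table above.
STORAGE_PRICING = {
    "Standard": 0.023,
    "Nearline": 0.016,
    "Coldline": 0.006,
    "Archive": 0.0025,
}


def monthly_cost_by_class(size_gb, pricing=STORAGE_PRICING):
    """Return the monthly storage cost of `size_gb` in every class."""
    return {cls: round(price * size_gb, 2) for cls, price in pricing.items()}


costs = monthly_cost_by_class(500)
for cls, cost in costs.items():
    print(f"{cls:<9} ${cost:.2f}/month")
```

For 500 GB this works out to about $11.50 per month in Standard against $1.25 in Archive, which is the gap the script tries to surface object by object.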

### 2\. **Determine Best Storage Class**

This function checks how old an object is:

```python
def recommend_storage_class(last_modified):
    now = datetime.now(timezone.utc)
    age_days = (now - last_modified).days
    ...
```
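The thresholds are easiest to verify at their boundaries. The sketch below repeats the same logic but adds an optional `now` parameter, which is an addition for testing and not in the original function, so the result does not depend on the day it is run.

```python
from datetime import datetime, timedelta, timezone


def recommend_storage_class(last_modified, now=None):
    """Same thresholds as the script, with `now` injectable for testing."""
    now = now or datetime.now(timezone.utc)
    age_days = (now - last_modified).days
    if age_days <= 30:
        return "Standard"
    if age_days <= 90:
        return "Nearline"
    if age_days <= 365:
        return "Coldline"
    return "Archive"


fixed_now = datetime(2025, 1, 1, tzinfo=timezone.utc)
for days in (30, 31, 90, 91, 365, 366):
    modified = fixed_now - timedelta(days=days)
    print(days, recommend_storage_class(modified, now=fixed_now))
```

Each threshold is inclusive on the warmer side: an object exactly 30 days old stays in `Standard`, and one that is 31 days old moves to `Nearline`.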

### 3\. **Cost Comparison**

Calculates what you'd save by transitioning storage class:

```python
def calculate_cost_and_savings(size_bytes, current_class, recommended_class):
    ...
```
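The function compares storage prices only. Colder classes also charge for reading data back, so a rough net figure can be more realistic for data that is still read now and then. The sketch below is an assumption-laden extension, not part of the script: the `RETRIEVAL_FEE` values are illustrative per-GB rates that should be checked against the current GCS pricing page, and `net_monthly_savings` is a hypothetical helper.

```python
# Per-GB monthly storage prices in USD, as used in the script.
STORAGE_PRICING = {"Standard": 0.023, "Nearline": 0.016, "Coldline": 0.006, "Archive": 0.0025}

# ASSUMPTION: illustrative per-GB retrieval fees in USD; verify before relying on them.
RETRIEVAL_FEE = {"Standard": 0.0, "Nearline": 0.01, "Coldline": 0.02, "Archive": 0.05}


def net_monthly_savings(size_gb, target_class, read_gb_per_month, current_class="Standard"):
    """Storage savings minus the extra retrieval cost of the colder class."""
    storage_delta = (STORAGE_PRICING[current_class] - STORAGE_PRICING[target_class]) * size_gb
    retrieval_delta = (RETRIEVAL_FEE[target_class] - RETRIEVAL_FEE[current_class]) * read_gb_per_month
    return round(storage_delta - retrieval_delta, 4)


# 100 GB moved to Coldline, with light and heavy read patterns.
print(net_monthly_savings(100, "Coldline", read_gb_per_month=10))
print(net_monthly_savings(100, "Coldline", read_gb_per_month=100))
```

Under these assumed rates the move pays off when 10 GB is read per month but turns into a net loss when the full 100 GB is read every month, so age alone is not always enough to justify a colder class.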

### 4\. **Lifecycle Policy Suggestion**

Hardcoded lifecycle suggestion for automation:

```python
def suggest_lifecycle_policy():
    return {
        "rule": [
            {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"}, "condition": {"age": 30}},
            ...
        ]
    }
```
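Since the report holds both the current policy and the suggested one, a natural next step is to list which suggested rules a bucket is still missing. The helper below is a sketch that is not in the script; it assumes rules are dictionaries with `action` and `condition` keys, the same shape `print_report` reads.

```python
def missing_rules(current_rules, suggested_policy):
    """Return suggested rules that have no exact match in the current policy."""

    def key(rule):
        action = rule.get("action", {})
        condition = rule.get("condition", {})
        return (action.get("type"), action.get("storageClass"), condition.get("age"))

    existing = {key(rule) for rule in current_rules}
    return [rule for rule in suggested_policy["rule"] if key(rule) not in existing]


suggested = {
    "rule": [
        {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"}, "condition": {"age": 30}},
        {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"}, "condition": {"age": 90}},
        {"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"}, "condition": {"age": 365}},
    ]
}
# A bucket that already moves objects to Nearline after 30 days.
current = [{"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"}, "condition": {"age": 30}}]

for rule in missing_rules(current, suggested):
    print(rule["condition"]["age"], rule["action"]["storageClass"])
```

The match is deliberately strict: a bucket that moves objects to `NEARLINE` after 45 days would still be reported as missing the 30-day rule.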

### 5\. **Bucket Analysis and Report Generation**

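The main function lists the buckets, analyzes every object, prints the report, and writes it to a JSON file:

```python
def list_buckets_and_generate_json_report(project_id, output_file="gcs_report.json"):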
    ...
```
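For buckets with many objects, a per-class rollup can be easier to read than one entry per object, and it can be computed in a single pass without holding every blob in a list. The sketch below is one possible approach rather than part of the script; `summarize_blobs` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real blobs so it runs without GCS access.

```python
from collections import defaultdict
from types import SimpleNamespace


def summarize_blobs(blobs):
    """Aggregate object count and bytes per storage class in one pass.

    `blobs` can be any iterable of objects with `size` and `storage_class`
    attributes, such as the iterator returned by `Client.list_blobs()`.
    """
    totals = defaultdict(lambda: {"objects": 0, "bytes": 0})
    for blob in blobs:
        # The API reports classes in upper case, e.g. "STANDARD".
        cls = (blob.storage_class or "STANDARD").title()
        totals[cls]["objects"] += 1
        totals[cls]["bytes"] += blob.size or 0
    return dict(totals)


# Stand-in objects so the sketch runs without GCS access.
sample = [
    SimpleNamespace(size=1024, storage_class="STANDARD"),
    SimpleNamespace(size=2048, storage_class="STANDARD"),
    SimpleNamespace(size=4096, storage_class="ARCHIVE"),
]
print(summarize_blobs(sample))
```

Passing `storage_client.list_blobs(bucket.name)` directly, without wrapping it in `list()`, keeps memory use flat because the iterator fetches results page by page.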

---

## 📊 Output Example

### Console Printout:

```text
📦 Bucket: my-backup-data
  📄 Objects:
    • database_backup_2022.sql
      Last Modified: 2022-04-10
      Recommended Class: Archive
      Monthly Cost (Current): $0.0344
      Monthly Cost (After):   $0.0037
      Monthly Savings:        $0.0307
      Annual Savings:         $0.3684
```

### JSON Report (partial):

```json
{
  "project_id": "my-gcp-project",
  "buckets": [
    {
      "bucket_name": "my-backup-data",
      "objects": [
        {
          "object_name": "database_backup_2022.sql",
          "recommended_storage_class": "Archive"
        }
      ]
    }
  ]
}
```
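Because the report is plain JSON, it is easy to post-process. As a small sketch, the helper below (hypothetical, not part of the script) adds up `estimated_annual_savings` per bucket, using the field names the script writes.

```python
import json


def total_annual_savings(report):
    """Sum estimated annual savings per bucket from a report dict."""
    totals = {}
    for bucket in report.get("buckets", []):
        # Buckets that failed or are empty simply contribute 0.
        objects = bucket.get("objects", [])
        totals[bucket["bucket_name"]] = round(
            sum(obj.get("estimated_annual_savings", 0) for obj in objects), 4
        )
    return totals


# Usage, assuming the script has already written gcs_report.json:
# with open("gcs_report.json") as f:
#     print(total_annual_savings(json.load(f)))
```

Sorting the result by value is a quick way to find the buckets where a lifecycle policy would matter most.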

---

## 🚀 How to Run It

1. **Update the project ID** in the `if __name__ == "__main__"` block:
    

```python
project_id = "your-gcp-project-id"
```

2. **Run the script**:
    

```bash
python gcs_cost_optimizer.py
```

3. **Review output**:
    

* Console for a human-readable report
    
* `gcs_report.json` for structured data
    
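Editing the source for every project gets tedious once the script is run regularly. One option, sketched here with the standard `argparse` module, is to take the project ID and output path from the command line. The flag names `--project` and `--output` are suggestions and not something the script currently supports.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for the report script."""
    parser = argparse.ArgumentParser(
        description="Recommend GCS storage classes based on object age."
    )
    parser.add_argument("--project", required=True, help="GCP project ID to scan")
    parser.add_argument(
        "--output", default="gcs_report.json", help="Path for the JSON report"
    )
    return parser.parse_args(argv)


# Demonstration with an explicit argument list; pass nothing to read sys.argv.
args = parse_args(["--project", "your-gcp-project-id"])
print(args.project, args.output)

# In the full script the entry point would then become:
# list_buckets_and_generate_json_report(args.project, args.output)
```

With that in place, a run would look like `python gcs_cost_optimizer.py --project your-gcp-project-id --output report.json`.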

---

## ✅ Final Thoughts

This script gives you a quick, read-only audit of where GCS storage spend can be trimmed. Running it monthly can lead to:

* Improved data hygiene
    
* Significant cost reductions
    
* Strategic policy automation
