What The Heck Are These Cloud Storage Buckets?!

I enjoy using the Google Cloud Platform (GCP) for hobby projects (check out why I use GCP here) and Google’s Cloud Storage (GCS) product has made its way into my design several times. However, I quickly realized that other GCP services leverage GCS, creating buckets and filling them with objects.

At first, my use of GCS was light, and these system buckets didn’t bother me, but then I started seeing charges on my bill (albeit just a few cents), and decided it was time to understand what these buckets were for and how I could remove or reduce the charges.

Understanding these charges can be a challenge, and what creates the system buckets is only loosely documented. I have documented my experience here in the hope that it will help others avoid similar frustrations.

Below I cover my investigation process, my discoveries, and the steps I took to minimize my GCS charges. If you are just interested in the solution, you can skip to The Solution section.

WARNING: Do not alter any buckets auto-generated by GCP (or their contents) without understanding their purpose. Some are tied to active processes and altering them can cause irreversible object corruption!

The Investigation

All this started when I began incurring GCS costs, and if you are here investigating your charges, you know that there isn’t much detail in the billing dashboard. Here is what I see for February of 2021.

There is a charge of 5 cents for 1.45 gigabyte-months of storage under the US Multi-region SKU. I see two issues here:

  1. The first 5GB of storage should be free (thank you free tier!)
  2. My usage of Cloud Storage should not come close to 1.45 GB

The Google Cloud Storage pricing page addresses my first issue. The free tier only applies to certain regions.

Cloud Storage Always Free quotas apply to usage in US-WEST1, US-CENTRAL1, and US-EAST1 regions. Usage is aggregated across these 3 regions. Always Free is subject to change. Please see our FAQ for eligibility requirements and other restrictions.

Maybe I selected the wrong location type for my buckets?

Cloud Storage Browser

Bucket details can be found on the Cloud Storage browser page:

In this case, I have 6 buckets in total. Highlighted in green is an active bucket used to house some IoT device data. I did not directly create the others and some have the location type of Multi-region.
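
If you prefer the command line, the same details can be pulled with gsutil (the bucket name here is a placeholder):

# Show a bucket’s metadata, including its location type and storage class
gsutil ls -L -b gs://my-iot-bucket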

That partially explains the charges, but it’s not clear what process created these buckets and what GCP uses them for. On that note…

WARNING: Do not alter any buckets auto-generated by GCP (or their contents) without understanding their purpose. Some are tied to active processes and altering them can cause irreversible object corruption!*

*Repeated intentionally due to its importance!

The next step is to analyze bucket space utilization and understand where the costs are originating.

The Monitoring Page

Although the GCS browser does not show total space utilization by bucket, there are a few different ways of getting this information. I prefer the GCP monitoring page. Here are Google’s setup instructions for using the monitoring page for the first time:

If you have never used Cloud Monitoring, then on your first access of Monitoring in the Google Cloud Console, a Workspace is automatically created and your project is associated with that Workspace. Otherwise, if your project isn’t associated with a Workspace, then a dialog appears and you can either create a Workspace or add your project to an existing Workspace. We recommend that you create a Workspace. After you make your selection, click Add.

Once loaded, the easiest way to get an overview of GCS usage is to select it under the resource dashboard.

Expanding the legend of the Object Size graph on the resource dashboard provides a list of all buckets along with their current space utilization.

In this case, the us.artifacts bucket is responsible for 99.7% of my total storage. The main cost driver has been identified!
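
As a quick cross-check, gsutil can report per-bucket usage without the monitoring dashboard (the bucket name is a placeholder):

# Print a bucket’s total size in human-readable form
gsutil du -sh gs://us.artifacts.my-project.appspot.com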

The Solution

Of the 5 auto-generated buckets in my Google Cloud Storage, 4 are multi-region and are incurring costs. I will outline which GCP processes use each bucket and how to minimize or eliminate the costs.

The Cloud Run Buckets

The <project-id>_cloudbuild and artifacts.<project-id>.appspot.com buckets are utilized by the Google Cloud Run engine. When code is submitted to Cloud Run, the engine uses the cloudbuild bucket to stage build objects and the artifacts bucket as the artifact registry. You don’t need to know exactly what these objects are, but you should know that they are typically not needed after deployment and there is no reason for them to live in a Multi-region bucket.

The good news is that GCP lets you override these defaults with the gcloud builds submit command. Here are the steps to ensure you incur no more GCS costs from your Cloud Run deployments:

  1. Create a new bucket with your desired regional storage (e.g. gcr_store; see the gsutil sketch after this list)
  2. Create a default folder for the build objects in this bucket (e.g. source)
  3. Create a default folder for the artifact objects in this bucket (e.g. artifacts)
  4. Create a cloudbuild.yaml file in your deployment directory with something like the following (note the location mapping to the new artifacts folder and the gcr.io/cloud-builders/docker indicating what builder to use)
steps:
- name: 'gcr.io/cloud-builders/docker'
  # example build step; the image tag is a placeholder
  args: ['build', '-t', 'my-app', '.']
artifacts:
  objects:
    location: 'gs://gcr_store/artifacts'
    paths: ['*']

5. Use the --gcs-source-staging-dir flag to specify where build objects should be saved when building new Cloud Run applications, and include your config yaml file

gcloud builds submit --gcs-source-staging-dir=gs://gcr_store/source --config cloudbuild.yaml

6. Delete your auto-generated <project-id>_cloudbuild and artifacts.<project-id>.appspot.com buckets

7. (Optional) Add a lifecycle rule on your new bucket to delete objects older than X days (e.g. 7 days)
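
For reference, here is a minimal gsutil sketch of Steps 1 and 7. The bucket name, region, and 7-day age are example values; note that GCS “folders” are just object name prefixes, so Steps 2 and 3 happen automatically once the first object is written under each prefix.

# Step 1: create a regional Standard-class bucket (name and region are examples)
gsutil mb -l us-east1 -c standard gs://gcr_store

# Step 7: apply a lifecycle rule that deletes objects older than 7 days
cat > lifecycle.json <<'EOF'
{
  "lifecycle": {
    "rule": [
      {"action": {"type": "Delete"}, "condition": {"age": 7}}
    ]
  }
}
EOF
gsutil lifecycle set lifecycle.json gs://gcr_store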

Once done, you should no longer have a Multi-region bucket associated with your Cloud Run deployment process, and if you ever find the size of your custom bucket getting out of hand you can implement the optional lifecycle rule from Step 7.

The Cloud Functions Bucket

The gcf-sources-<id>-<region> bucket is used for the storage of Google Cloud Function (GCF) objects and metadata. This bucket lives in the same region as your functions and should never get very large (mine is 11 kB for 5 functions). I don’t recommend touching the contents of this bucket as it could permanently corrupt your GCF objects.

Some Cloud Functions will also use Cloud Build, which dumps artifacts into the us.artifacts.<project-id>.appspot.com bucket. See The us.artifacts Bucket section below for what can be done to address these objects.

The App Engine Buckets

The staging.<project-id>.appspot.com bucket is used by the Google App Engine for temporary storage during deployments.

App Engine also creates a bucket that it uses for temporary storage when it deploys new versions of your app. This bucket, named staging.project-id.appspot.com, is for use by App Engine only. Apps can't interact with this bucket.

You can’t get rid of this bucket, but you can reduce the number of stored objects by specifying a different bucket at build time with the --bucket flag. Here are the steps to ensure you incur minimal costs from this bucket:

  1. Create a new bucket with your desired regional storage (e.g. gae_storage) — if desired you can use a different bucket for each app
  2. Use the --bucket flag to specify where build objects should be saved when deploying your app
gcloud app deploy --bucket=gs://gae_storage

3. Delete everything in the staging.<project-id>.appspot.com bucket except for the ae/ folder
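
If you prefer to do Step 3 from the command line, here is a cautious sketch (the project ID is a placeholder, and you should always review the output of the ls | grep portion before piping it into rm):

# List top-level entries, drop the ae/ prefix, and delete the rest
gsutil ls gs://staging.my-project.appspot.com | grep -v '/ae/$' | xargs gsutil -m rm -r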

Once done, the Multi-region staging.<project-id>.appspot.com bucket will be minimally leveraged and your custom buckets will contain 99% of the objects for each app deployed.

App Engine deployments also leverage the us.artifacts.<project-id>.appspot.com bucket. See The us.artifacts Bucket section below for what can be done to address these objects.

The us.artifacts Bucket

The us.artifacts.<project-id>.appspot.com bucket is used to store container images generated by the Cloud Build service. The only processes I have observed generating objects in this bucket are Cloud Functions and App Engine builds. Objects generated by these processes are safe to remove post-deployment, as described here.

Once deployment is complete, App Engine no longer needs the container images. Note that they are not automatically deleted, so to avoid reaching your storage quota, you can safely delete any images you don’t need.

The same should apply for Cloud Function artifacts as well.

Although I do not use Firebase to deploy functions, I have come across several open tickets online indicating that the approach below might cause issues if you do. I might write another article exploring the Firebase issue and possible resolutions.

Do not delete this bucket outright, and do not follow the instructions below, if you use Firebase to deploy functions!

We cannot remove the bucket altogether, but we can follow these steps to minimize space usage.

  1. Navigate to the LIFECYCLE tab of the us.artifacts.<project-id>.appspot.com bucket
  2. Add a new lifecycle rule deleting objects that have an age greater than X days (I use 7 for mine)
  3. Delete all objects in this bucket
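
The lifecycle rule from the Cloud Run section works here too, and Step 3 can be done with a bulk delete. Here is a sketch reusing the lifecycle.json file from earlier (the project ID is a placeholder; remember the Firebase warning above):

# Steps 1-2: apply the 7-day delete rule to the bucket
gsutil lifecycle set lifecycle.json gs://us.artifacts.my-project.appspot.com

# Step 3: delete all existing objects in the bucket
gsutil -m rm 'gs://us.artifacts.my-project.appspot.com/**'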

Once done, you should see your space consumption for this bucket drop significantly. In my case, I was able to free up 85% of the utilized space, bringing the bucket down to less than 300 MB.

Conclusion

GCP is a great platform, but when it comes to the automatic storage of metadata objects and build container images, things can get complicated and messy. Through this investigation, I got a chance to learn more about how Cloud Run, App Engine, and Cloud Functions deployments are managed. I hope you learned something from this post as well, or at the very least that it helped you tidy up your GCS environment.

Good luck and happy coding!

Header photo by Pedro da Silva on Unsplash