How to remove multipart upload (MPU) data and free up bucket space

For larger file uploads, most S3 clients use the multipart upload (MPU) feature of the S3 protocol. This allows a client to break a large file into smaller chunks, upload those chunks individually, and retry any chunk that fails without having to restart the entire upload.
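Conceptually, the chunking works like this. The sketch below is illustrative only — the 8 MiB part size and the function name are our own choices, not part of any particular client (S3 requires parts of at least 5 MiB, except the last one):

```python
PART_SIZE = 8 * 1024 * 1024  # example part size: 8 MiB

def split_into_parts(total_bytes, part_size=PART_SIZE):
    """Return a list of (part_number, offset, length) tuples for an upload."""
    parts = []
    offset = 0
    part_number = 1  # S3 part numbers start at 1
    while offset < total_bytes:
        length = min(part_size, total_bytes - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts

# A 20 MiB file becomes three parts: 8 MiB, 8 MiB, and a final 4 MiB part.
parts = split_into_parts(20 * 1024 * 1024)
```

Because each part is tracked separately, a failed part can be re-sent on its own — but it also means a crashed client can leave already-uploaded parts sitting in the bucket.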

Most S3 clients are good about cleaning up MPU data they no longer need, but if a connection drops or the client crashes, this data can be left behind. It is generally never used again, yet it silently consumes additional disk space on your account until it is removed. It is worth checking for and removing leftover MPU data if your storage costs appear larger than expected.

Most S3 clients don’t offer an MPU data purge feature, so the following example uses Python and the boto library to check for and clean up this data.

Step 1 — Create a .boto file to store your keys.

View the following article for instructions on how to create a .boto config file, which will be used to store your DreamObjects keys.

After following those steps, there should be a file named .boto in your user's home directory that stores your DreamObjects keys.
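For reference, a minimal .boto file looks like the following (replace the placeholder values with your actual DreamObjects access and secret keys):

```ini
[Credentials]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
```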

Step 2 — Create the clean-up script

This script iterates over all buckets, checking each for MPU data. If any is found, it displays the object's key, the date the upload was initiated, its size, and then asks whether the data should be deleted.

Once the MPU data is deleted, it cannot be recovered. Please be sure you don’t need the data before removing it.

Clean-up script code

You do not need to adjust any of the code below since your keys are already stored in your .boto file from step 1 above.

#!/usr/bin/python

import boto

# Connect to DreamObjects
c = boto.connect_s3(host='objects-us-east-1.dream.io')

# Iterate over all buckets
for b in c.get_all_buckets():
    print '\nBucket: ' + b.name

    # Check for MPU data and calculate the total storage used
    total_size = 0
    for mpu in b.get_all_multipart_uploads():
        ptotalsize = 0
        for p in mpu.get_all_parts():
            ptotalsize += p.size
        print mpu.initiated, mpu.key_name, ptotalsize, str(round(ptotalsize * 1.0 / 1024 ** 3, 2)) + 'GB'
        total_size += ptotalsize

    print 'Total: ' + str(round(total_size * 1.0 / 1024 ** 3, 2)) + 'GB'

    # If there is any usage, prompt to delete it and do so if requested
    if total_size > 0 and raw_input('Delete MPU data? (y/n) ') == 'y':
        for mpu in b.get_all_multipart_uploads():
            mpu.cancel_upload()
        print 'MPU data deleted!'
    else:
        print 'No changes made to bucket.'
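The size shown for each upload is a simple bytes-to-gigabytes conversion. As a standalone sketch of the same arithmetic (the helper name here is ours, not part of the script):

```python
def to_gb(nbytes):
    """Convert a byte count to a gigabyte string, as the script does."""
    return str(round(nbytes * 1.0 / 1024 ** 3, 2)) + 'GB'

# One 8 MiB part plus a 4 MiB final part left behind by a failed upload:
size = to_gb(12 * 1024 * 1024)
```

Note that sizes smaller than about 0.01 GB round down and display as 0.0GB, even though the parts still occupy space.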

If path-style addressing is required (for example, if your bucket was accidentally created with uppercase characters in its name), add `calling_format=boto.s3.connection.OrdinaryCallingFormat()` to the `boto.connect_s3` call.

For example:

c = boto.connect_s3(host='objects-us-east-1.dream.io', calling_format=boto.s3.connection.OrdinaryCallingFormat())
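The difference between the two addressing styles can be illustrated with plain strings (the URLs below are illustrative only — boto builds them internally):

```python
host = 'objects-us-east-1.dream.io'
bucket = 'My-Bucket'  # bucket names with uppercase characters do not work reliably as hostnames
key = 'backups/example.zip'

# Virtual-hosted style (boto's default) puts the bucket in the hostname:
virtual_hosted = 'https://' + bucket + '.' + host + '/' + key

# Path-style (OrdinaryCallingFormat) keeps the hostname fixed and puts
# the bucket in the path, so any bucket name works:
path_style = 'https://' + host + '/' + bucket + '/' + key
```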

Clean-up script example output

Bucket: my-user-bucket
2019-02-20T19:36:21.072Z backups/example.com/02-20-2019_example.com.zip 0.1GB
Total: 0.1GB
Delete MPU data? (y/n) y
MPU data deleted!

Bucket: workbackup
Total: 0.0GB
No changes made to bucket.
