Overview
For larger file uploads, most S3 clients use the multipart upload (MPU) feature of the S3 protocol. This allows the client to break a large file into smaller chunks, upload the chunks individually, and retry any chunk that fails without having to start the entire upload over.
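For illustration, this is roughly what a multipart upload looks like when performed directly with the boto library used later in this article. It is a minimal sketch, not part of the clean-up procedure; the bucket name, file name, and 50MB chunk size are placeholders.

#!/usr/bin/python
import math
import os
import boto
from boto.s3.connection import OrdinaryCallingFormat

c = boto.connect_s3(host='objects-us-east-1.dream.io',
                    calling_format=OrdinaryCallingFormat())
b = c.get_bucket('my-user-bucket')  # placeholder bucket name

# Start a multipart upload for the destination object
mpu = b.initiate_multipart_upload('backups/large-file.zip')

source = 'large-file.zip'       # placeholder local file
chunk_size = 50 * 1024 ** 2     # 50MB chunks
file_size = os.path.getsize(source)
chunks = int(math.ceil(file_size * 1.0 / chunk_size))

with open(source, 'rb') as f:
    for i in range(chunks):
        # Each part is uploaded on its own and can be retried if it fails
        part_size = min(chunk_size, file_size - i * chunk_size)
        mpu.upload_part_from_file(f, part_num=i + 1, size=part_size)

# The object only appears in the bucket once the upload is completed;
# until then, the uploaded parts count against your storage
mpu.complete_upload()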
Most S3 clients are good about cleaning up MPU data they no longer need, but if a connection drops or the client crashes, the data can be left behind. It is generally never used again; however, it silently consumes disk space on your account until it is removed. It's worth checking for and removing leftover MPU data if your disk storage costs appear larger than expected.
Most S3 clients don’t have an MPU data purge feature, so the following example uses Python and the boto library to check for and clean up this data.
Step 1 — Create a .boto file to store your keys
View the following external instructions on how to create a .boto config file. This will be used to store your DreamObjects keys.
There should now be a file named .boto in your user's home directory, which stores your DreamObjects keys.
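Once created, the file should contain a [Credentials] section along these lines, where the two values are placeholders for your actual DreamObjects access and secret keys:

[Credentials]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY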
Step 2 — Create the clean-up script
Create a file titled mpu.py via SSH. The following article explains how to do this.
You can then add the code below to this file. The script iterates over all of your buckets checking for MPU data. If any is found, it displays the file name, the date it was uploaded, and its size, and then asks whether the data should be deleted.
Once the MPU data is deleted, it cannot be recovered. Please be sure you don’t need the data before removing it.
Clean-up script code
You do not need to adjust any of the code below since your keys are already stored in your .boto file from step 1 above.
#!/usr/bin/python
import boto
from boto.s3.connection import OrdinaryCallingFormat

# Connect to DreamObjects
c = boto.connect_s3(host='objects-us-east-1.dream.io',
                    calling_format=OrdinaryCallingFormat())

# Iterate over all buckets
for b in c.get_all_buckets():
    print '\nBucket: ' + b.name

    # Check for MPU data and calculate the total storage used
    total_size = 0
    for mpu in b.get_all_multipart_uploads():
        ptotalsize = 0
        for p in mpu.get_all_parts():
            ptotalsize += p.size
        print mpu.initiated, mpu.key_name, ptotalsize, str(round(ptotalsize * 1.0 / 1024 ** 3, 2)) + 'GB'
        total_size += ptotalsize
    print 'Total: ' + str(round(total_size * 1.0 / 1024 ** 3, 2)) + 'GB'

    # If there is any usage, prompt to delete it and do so if requested
    if total_size > 0 and str(raw_input('Delete MPU data? (y/n) ')) == 'y':
        for mpu in b.get_all_multipart_uploads():
            mpu.cancel_upload()
        print 'MPU data deleted!'
    else:
        print 'No changes made to bucket.'
Clean-up script example output
Bucket: my-user-bucket
2024-04-02T19:36:21.072Z backups/example.com/04-02-2024_example.com.zip 0.1GB
Total: 0.1GB
Delete MPU data? (y/n) y
MPU data deleted!

Bucket: workbackup
Total: 0.00GB
No changes made to bucket.
Step 3 — Run the file
While still logged into your server via SSH, run the script with the following command.

[server]$ python mpu.py
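If you only need to check a single bucket rather than iterating over all of them, the script can be trimmed down using the connection's get_bucket method. The following sketch assumes the my-user-bucket name from the example output above and only lists MPU data without deleting anything.

#!/usr/bin/python
import boto
from boto.s3.connection import OrdinaryCallingFormat

# Connect to DreamObjects (keys are read from your .boto file)
c = boto.connect_s3(host='objects-us-east-1.dream.io',
                    calling_format=OrdinaryCallingFormat())

# Look up one bucket by name and list any MPU data it holds
b = c.get_bucket('my-user-bucket')
for mpu in b.get_all_multipart_uploads():
    print mpu.initiated, mpu.key_name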