Hello there!
Privacy on the Internet is essential, yet nowadays people often overlook how much of their PII (personally identifiable information) they expose. Today I’ll build a small tool that makes it easier to strip metadata from media files before publishing them.
Before publishing my first project here, Smart Light Bulb Cop Car, I prepared a draft copy on my local disk. I used a popular document editor to write and compose the article. I added text, pictures, and video, and I even placed the media in separate directories assigned to each text block. So the tree of my guide looks, more or less, like this:
D:.
|   Smart Light Bulb Cop Car.docx
|
+---bulb
|       Screenshot_xxx-152050.jpg
|       Screenshot_xxx-152056.jpg
|       Screenshot_xxx-152102.jpg
|       Screenshot_xxx-155846.jpg
|
+---gear
|       IMG_xxx_201451.jpg
|       IMG_xxx_201451_cleaned.JPG
|       IMG_xxx_201555.jpg
|       IMG_xxx_201703.jpg
|       IMG_xxx_201921.jpg
|       IMG_xxx_201926.jpg
|       IMG_xxx_202404.jpg
|       IMG_xxx_203348.jpg
|       out.JPG
|       video.mp4
|       VID_xxx_202822.mp4
|       VID_xxx_203101.mp4
|       VID_xxx_203101_cleaned.mp4
|       VID_xxx_203142.mp4
|       VID_xxx_203208.mp4
|       VID_xxx_203238.mp4
|       VID_xxx_203320.mp4
|
+---snippets
|       police.py
|       vlc-test.py
|
\---speaker
        1.JPG
        2.JPG
        3.JPG
        4.JPG
        5.JPG
        6.JPG
        vlc-error.JPG
“What about your privacy?” popped into my head. It’s only a matter of time before I disclose something about myself, by mistake or willingly. But I was about to upload a lot of digital content to my webpage, including media captured with my phone, so I had to take care of that somehow.
Why privacy is so important
We are giving away our personal info for pennies. In 2015 and 2017, Jimmy Kimmel Live published videos in which people gave out their personal passwords in a street poll:
But it’s not limited to passwords. All kinds of metadata leak out as well:
- This is why you should never Instagram your boarding pass
- Military And Intelligence Personnel Can Be Tracked With The Untappd Beer App
The article about the Untappd beer app (which I also use…) was published in May 2020. I want to underline that it is about military and intelligence personnel, who should be especially aware of such risks. In 2014 there was an incident related to posting pictures on Instagram:
It was kind of a big thing in Europe, and afterwards I heard that Instagram now deletes all metadata on publishing. Before putting that sentence in the article I wanted to double-check it, so I found research about social media sites and how they handle metadata, which you may find interesting.
So, when I decide to put some of my content online, I would like to make sure it carries no unwanted metadata. That’s why I have written my own metadata removers:
- In Python – this one, for “on-site” purposes,
- Android – for quick mobile use-cases (e.g., before posting on social sites)
Climbing the tree
As you can see from the directory tree I shared, I mostly had images and videos. I’ll take a closer look at their metadata later.
Because I’m lazy, I would like to build two lists (images, videos) of absolute paths to the files I need to clean. Once a file is cleaned, the copy is marked with ‘_cleaned’, so I can check that every media file has a corresponding cleaned file and avoid cleaning the same file twice.
I’ve read a very extensive article, How to iterate over files in directory Python, and I was ready to go:
import os

directory = r'D:\HFOC\arts\copcar'
img_extenstions = ('.jpg', '.JPG', '.png', '.PNG')
video_extenstions = ('.avi', '.AVI', '.mp4', '.MP4')
_cleaned = '_cleaned'
all_imgs = []
all_videos = []
cleaned_imgs = []
cleaned_videos = []
need_cleaning = []

# Walk the whole tree and sort every media file into one of the two lists
for subdir, dirs, files in os.walk(directory):
    for filename in files:
        filepath = os.path.join(subdir, filename)
        if filepath.endswith(img_extenstions):
            all_imgs.append(filepath)
        elif filepath.endswith(video_extenstions):
            all_videos.append(filepath)
After checking all_imgs and all_videos, I had lists of all images and videos in my directory. Next, I want to find the ones that were already cleaned. So, as you can see in the tree, I’m looking for:
...
| IMG_xxx_201451_cleaned.JPG
...
| VID_xxx_203101_cleaned.mp4
...
Which are stored in my lists as:
D:\HFOC\arts\copcar\gear\IMG_xxx_201451_cleaned.JPG
and
D:\HFOC\arts\copcar\gear\VID_xxx_203101_cleaned.mp4
Now I go through each list and, for every path:
- Take the last element after splitting the path by os.sep,
- Split that element by ‘.’ and check whether the penultimate part ends with ‘_cleaned’,
- If yes, add its name without the ‘_cleaned’ suffix to the cleaned list, so that it matches both the original and the cleaned file.
for img in all_imgs[:]:
    if img.split(os.sep)[-1].split('.')[-2].endswith(_cleaned):
        cleaned_imgs.append(img.split(os.sep)[-1].split('.')[-2].replace(_cleaned, ''))
for video in all_videos[:]:
    if video.split(os.sep)[-1].split('.')[-2].endswith(_cleaned):
        cleaned_videos.append(video.split(os.sep)[-1].split('.')[-2].replace(_cleaned, ''))
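The chained split calls can be hard to follow, so here is a minimal sketch of the same check on one of the example paths. It uses the standard ntpath module so that the Windows-style path is parsed the same way on any operating system; the names CLEANED_SUFFIX and original_stem are made up for this illustration and are not part of the script.

```python
import ntpath  # Windows path rules, usable on any operating system

CLEANED_SUFFIX = "_cleaned"  # hypothetical name; same marker as _cleaned in the script


def original_stem(path):
    """Return the file stem without the marker if path is a cleaned file, else None."""
    filename = ntpath.basename(path)         # e.g. 'IMG_xxx_201451_cleaned.JPG'
    stem, _ext = ntpath.splitext(filename)   # e.g. ('IMG_xxx_201451_cleaned', '.JPG')
    if stem.endswith(CLEANED_SUFFIX):
        return stem[: -len(CLEANED_SUFFIX)]  # e.g. 'IMG_xxx_201451'
    return None


cleaned = original_stem(r"D:\HFOC\arts\copcar\gear\IMG_xxx_201451_cleaned.JPG")
untouched = original_stem(r"D:\HFOC\arts\copcar\gear\IMG_xxx_201555.jpg")
print(cleaned, untouched)
```

Unlike split('.'), splitext only cuts at the last dot, so a file name that contains extra dots still keeps its full stem.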
And then every file that does not match an entry in the cleaned lists is appended to the need_cleaning list:
for img in all_imgs:
    name = img.split(os.sep)[-1].split('.')[-2]
    # add each file once, and only if no cleaned name matches it
    if not any(name.startswith(cleaned) for cleaned in cleaned_imgs):
        need_cleaning.append(img)
for video in all_videos:
    name = video.split(os.sep)[-1].split('.')[-2]
    if not any(name.startswith(cleaned) for cleaned in cleaned_videos):
        need_cleaning.append(video)
And the final loop:
for media in need_cleaning:
    if media.endswith(img_extenstions):
        clear_img(media)
    elif media.endswith(video_extenstions):
        clear_vid(media)
Note: I know there are more efficient ways to do this 😉 but there is no point in ALWAYS optimizing the number of lines.
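For the curious, one possible compact variant is sketched below with pathlib. It is not the script from this post: the function name find_uncleaned and the constants are invented for the example. It also behaves slightly differently, because it compares extensions case-insensitively and looks for an exact "<stem>_cleaned" sibling in the same folder instead of using startswith.

```python
from pathlib import Path

# Hypothetical constants for this sketch; extensions are compared in lower case.
MEDIA_EXTENSIONS = {".jpg", ".png", ".avi", ".mp4"}
CLEANED_SUFFIX = "_cleaned"


def find_uncleaned(directory):
    """Return media files that have no '<stem>_cleaned' sibling yet."""
    media = [
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in MEDIA_EXTENSIONS
    ]
    # (folder, stem) pairs of originals that already have a cleaned copy
    done = {
        (p.parent, p.stem[: -len(CLEANED_SUFFIX)])
        for p in media
        if p.stem.endswith(CLEANED_SUFFIX)
    }
    return sorted(
        p for p in media
        if not p.stem.endswith(CLEANED_SUFFIX) and (p.parent, p.stem) not in done
    )
```

Using a set for the already cleaned names keeps the lookup cheap even for large folders, and a single list covers images and videos because the later dispatch only depends on the extension.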
Clearing Images
There are many ways to clear image metadata (EXIF). E.g., on Windows 10, you can right-click the file, choose Properties, go to the Details tab, and click “Remove Properties and Personal Information.”
You can also find dedicated programs and online platforms to remove it. (By the way, what a funny idea to upload a file online when you are concerned that it is a risk to your privacy.)
There are also Python libraries and scripts for this. I’ll use the Python exif library.
pip3 install exif
It’s effortless and convenient. I can write a simple function that builds the output name, opens my file, prints each metadata tag, and then deletes it:
def clear_img(filename):
    # Build the output name: <name>_cleaned<.ext> (safe for paths containing dots)
    root, ext = os.path.splitext(filename)
    file_mod = root + _cleaned + ext
    with open(filename, 'rb') as image_file:
        my_image = Image(image_file)
    # Print every attribute that holds a value, then delete it
    for element in dir(my_image):
        try:
            print(f"{element}: {my_image[element]}")
            del my_image[element]
        except Exception:
            print(f"{element} unknown")
Then I would like to save the file without those EXIF data:
    with open(file_mod, 'wb') as new_image_file:
        new_image_file.write(my_image.get_file())
    print(f'File {filename} cleared and saved as a {file_mod}')
The sample output looks like this:
clear_img(file)
_exif_ifd_pointer: 216
_gps_ifd_pointer: 752
_interoperability_ifd_Pointer: 721
_segments unknown
aperture_value: 1.53
brightness_value: -2.52
color_space: 1
(...)
datetime: 2020:05:27 20:15:56
datetime_digitized: 2020:05:27 20:15:56
datetime_original: 2020:05:27 20:15:56
exif_version: 0220
(...)
image_height: 3456
image_width: 4608
jpeg_interchange_format: 888
jpeg_interchange_format_length: 39361
light_source: 0
make: OnePlus
max_aperture_value: 1.53
metering_mode: 2
model: ONEPLUS xxx
orientation: 6
photographic_sensitivity: 500
(...)
x_resolution: 72.0
y_and_c_positioning: 1
y_resolution: 72.0
File D:\HFOC\arts\copcar\gear\IMG_xxx_201555.JPG cleared and saved as a D:\HFOC\arts\copcar\gear\IMG_xxx_201555_cleaned.JPG
After double-checking, almost no EXIF data is left; only the orientation and resolution tags remain:
get unknown
get_file unknown
has_exif unknown
orientation: 6
resolution_unit: 2
x_resolution: 72.0
y_resolution: 72.0
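As a cross-check that does not depend on any third-party library, here is a rough sketch that drops the EXIF block directly from the bytes of a JPEG file. It is a different technique from the exif library used above: in a JPEG, EXIF data is stored in APP1 segments whose payload starts with "Exif" followed by two zero bytes, so the sketch copies every segment except those. The function name strip_exif_segments is made up, and the code assumes a plain file layout in which every marker before the image data carries a length field; treat it as an illustration, not a robust tool.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker that opens every JPEG file
APP1 = 0xE1        # segment type that carries EXIF data
SOS = 0xDA         # start of scan: compressed image data follows


def strip_exif_segments(jpeg_bytes):
    """Return a copy of a JPEG byte string without its EXIF APP1 segments."""
    if not jpeg_bytes.startswith(SOI):
        raise ValueError("not a JPEG file")
    out = bytearray(SOI)
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg_bytes[pos + 1]
        if marker == SOS:
            # Everything from here on is image data; copy it unchanged.
            out += jpeg_bytes[pos:]
            return bytes(out)
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        segment = jpeg_bytes[pos:pos + 2 + length]
        is_exif = marker == APP1 and segment[4:10] == b"Exif\x00\x00"
        if not is_exif:
            out += segment
        pos += 2 + length
    out += jpeg_bytes[pos:]  # no scan found: keep whatever is left
    return bytes(out)
```

To try it, read a file with open(path, 'rb'), pass the bytes to the function, and write the result to a new ‘_cleaned’ file, the same way clear_img does.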
Clearing video
Reading video metadata is not as popular a topic on the Internet, so Google gave no immediate answer on which Python library to use. But I have found two tools that help me remove video metadata: pytaglib and shutil.
Pytaglib is an audio tagging library. It is effortless to use yet fully featured.
To my surprise, the video I recorded had no metadata at all. Compared to the images, that’s good. So I edited the file’s tags manually to prove my concept:

Then I checked if pytaglib can read them:
>>> import taglib
>>> video = r'D:\HFOC\arts\copcar\gear\video.mp4'
>>> v = taglib.File(video)
>>> v.tags
Out: {'COMMENT': ['thisiscomment'], 'TITLE': ['videofrominternet']}
Tags can be deleted using simple del, e.g.:
del v.tags['COMMENT']
Then we need to save:
v.save()
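But I want to make sure, so I first use shutil.copyfile (“Copy the contents (no metadata) of the file named src to a file named dst.”). Note that the metadata meant there is file-system metadata such as timestamps and permissions; tags embedded in the file are part of its contents and get copied, which is why I then clear them with taglib. The final code looks like this: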
def clear_vid(filename):
    # Build the output name: <name>_cleaned<.ext>
    root, ext = os.path.splitext(filename)
    file_mod = root + _cleaned + ext
    shutil.copyfile(filename, file_mod)  # copies contents only, no file-system metadata
    # embedded tags are part of the contents, so remove them explicitly:
    v = taglib.File(file_mod)
    for tag in list(v.tags.keys()):
        del v.tags[tag]
    v.save()
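To see what shutil.copyfile does and does not carry over, here is a small standard-library experiment. The file names and the fake "tag" bytes are stand-ins invented for the demo, not real video data: the point is only that the content is copied byte for byte while the modification time is not.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "video.mp4")
    dst = os.path.join(tmp, "video_cleaned.mp4")

    # Stand-in for a real file: the "tag" is just bytes inside the content.
    with open(src, "wb") as f:
        f.write(b"...TITLE=videofrominternet...")
    os.utime(src, (0, 0))  # pretend the source was last modified in 1970

    shutil.copyfile(src, dst)

    with open(src, "rb") as f_src, open(dst, "rb") as f_dst:
        same_content = f_src.read() == f_dst.read()
    same_mtime = os.path.getmtime(src) == os.path.getmtime(dst)

print(same_content, same_mtime)
```

The copy has identical content, including anything embedded in it, but a fresh timestamp. That is why the copy alone is not enough and the taglib pass is still needed.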
Getting it all together
After getting it all together, my uncomplicated script looks like this:
from exif import Image
import taglib
import shutil
import os


def clear_vid(filename):
    # Build the output name: <name>_cleaned<.ext>
    root, ext = os.path.splitext(filename)
    file_mod = root + _cleaned + ext
    shutil.copyfile(filename, file_mod)  # copies contents only, no file-system metadata
    # embedded tags are part of the contents, so remove them explicitly:
    v = taglib.File(file_mod)
    for tag in list(v.tags.keys()):
        del v.tags[tag]
    v.save()


def clear_img(filename):
    # Build the output name: <name>_cleaned<.ext>
    root, ext = os.path.splitext(filename)
    file_mod = root + _cleaned + ext
    with open(filename, 'rb') as image_file:
        my_image = Image(image_file)
    # Print every attribute that holds a value, then delete it
    for element in dir(my_image):
        try:
            print(f"{element}: {my_image[element]}")
            del my_image[element]
        except Exception:
            print(f"{element} unknown")
    with open(file_mod, 'wb') as new_image_file:
        new_image_file.write(my_image.get_file())
    print(f'File {filename} cleared and saved as a {file_mod}')


directory = r'D:\HFOC\arts\copcar'
img_extenstions = ('.jpg', '.JPG', '.png', '.PNG')
video_extenstions = ('.avi', '.AVI', '.mp4', '.MP4')
_cleaned = '_cleaned'
all_imgs = []
all_videos = []
cleaned_imgs = []
cleaned_videos = []
need_cleaning = []

# Collect every image and video under the directory
for subdir, dirs, files in os.walk(directory):
    for filename in files:
        filepath = os.path.join(subdir, filename)
        if filepath.endswith(img_extenstions):
            all_imgs.append(filepath)
        elif filepath.endswith(video_extenstions):
            all_videos.append(filepath)

# Remember the names of files that already have a cleaned copy
for img in all_imgs:
    if img.split(os.sep)[-1].split('.')[-2].endswith(_cleaned):
        cleaned_imgs.append(img.split(os.sep)[-1].split('.')[-2].replace(_cleaned, ''))
for video in all_videos:
    if video.split(os.sep)[-1].split('.')[-2].endswith(_cleaned):
        cleaned_videos.append(video.split(os.sep)[-1].split('.')[-2].replace(_cleaned, ''))

# Everything that matches no cleaned name still needs cleaning
for img in all_imgs:
    name = img.split(os.sep)[-1].split('.')[-2]
    if not any(name.startswith(cleaned) for cleaned in cleaned_imgs):
        need_cleaning.append(img)
for video in all_videos:
    name = video.split(os.sep)[-1].split('.')[-2]
    if not any(name.startswith(cleaned) for cleaned in cleaned_videos):
        need_cleaning.append(video)

for media in need_cleaning:
    if media.endswith(img_extenstions):
        clear_img(media)
    elif media.endswith(video_extenstions):
        clear_vid(media)
In the future, I can quickly run it for different directories just by modifying the directory variable.
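If editing the variable by hand becomes tedious, one option is to read the directory from the command line instead. The sketch below uses the standard argparse module; the function name parse_args and the optional positional argument are my own choices for this illustration, not part of the published script.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the directory to clean; argv=None means 'read sys.argv'."""
    parser = argparse.ArgumentParser(
        description="Remove metadata from images and videos in a directory tree."
    )
    parser.add_argument(
        "directory",
        nargs="?",            # optional: fall back to the current directory
        default=os.getcwd(),
        help="root directory to scan (defaults to the current directory)",
    )
    return parser.parse_args(argv)


# Passing a list makes the example reproducible; call parse_args() with no
# arguments in the real script to pick up the command line.
args = parse_args([r"D:\HFOC\arts\copcar"])
directory = args.directory
print(directory)
```

The rest of the script can stay as it is, because it only ever reads the directory variable.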
The source code for this project is on my GitHub: HeadFullOfCiphers/PythonExIfRemover
And that’s it for today 😉 I hope you liked this little project, and you can find a related one for Android below:
Reference list:
- Python exif library: https://pypi.org/project/exif/
- Python shutil docs: https://docs.python.org/3.8/library/shutil.html
- Python pytaglib: https://pypi.org/project/pytaglib/
- How to iterate over files in directory Python
- Jimmy Kimmel Live: What is Your Password? (2015)
- Jimmy Kimmel Live: What’s Your Password? (2017)
- This is why you should never Instagram your boarding pass
- Military And Intelligence Personnel Can Be Tracked With The Untappd Beer App
- Social Media Sites Photo Metadata Test Results 2019