I’ve had a really bad first impression of the new Photos app for OS X. My scenario is a bit of a corner case but it shouldn’t be a hard problem to solve if Photos were even a tiny bit less stupid about a couple of things.
Scenario
I don’t want to use Photos to actually organize my pictures on the file system. I also don’t want to use Photos’ iCloud sync. The reason is that I have a heterogeneous set of devices and computers at home and, as a result, I already have a system for managing my pictures that works on all these devices. If I went all in on Photos, I would have a hard time using the same library on my Linux, Android and FreeBSD systems. (I also have a bit of a philosophical issue with trusting my entire library to a single program.)
So if I already have a system, why would I want to use Photos at all? Basically, there are two things I want from Photos:
- Marking photos I love as favourites.
- Tagging photos with faces so I can find, for example, pictures of me and my fiancée for the calendar my family does every year.
These features are nice enhancements to my library but not so crucial that I want to give my entire library over to Photos.
My current system involves manually creating folders for my pictures, grouping them by capture date and naming each folder with something descriptive. It’s fairly easy to maintain and since I don’t use any special program, I can simply rsync my pictures between my different computers (and server) and all of them are able to display them natively. The only thing I’m missing is the two points above.
I’ve thought about actually copying the Masters directory in the Photos library to the other systems but the problem is that the pictures in that directory are arranged according to import time. This means that I would lose the ability to easily list all the pictures taken on a particular day or at a particular event on the non-Apple systems, which is a deal breaker for me since that’s how I usually look at my pictures.
Photos does support referenced libraries which I thought I could use to achieve what I wanted (and I could if it wasn’t such a brain-dead implementation). In a referenced library, no pictures are actually copied to the library during an import. Instead, Photos keeps a reference to where in the file system the pictures actually exist.
Problem
I launched Photos, opened its preferences and unchecked the “Copy items to the Photos library” check box. I then began importing all my pictures — a modest 66GB. At first everything seemed to go well so I left the computer and went to help some friends who were moving. When I got back a couple of hours later, Photos was done importing and had finished running facial recognition on all pictures. I opened Finder, curious to see how big the library had become (it’s just references, shouldn’t be more than a couple of gigabytes, right?).
Before I even got to the ~/Pictures folder, I noticed something wrong: I was down to under one gigabyte of disk space left. With suspicion, I clicked on the Photos Library.photoslibrary file (a fairly annoying name for a file that isn’t even a file) and waited a second for Finder to tell me how big it was.
60GB?! What on earth happened here?
After some research, I found that some other people had had the same issue. The conclusion they came to was that it seemed to be related to facial recognition in some way. The directory in the Photos library that had expanded uncontrollably was resources/modelresources. The folder contains a lot of full-size copies [1] of my pictures but cropped in some way, often cropped on people’s faces [2].
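A minimal sketch of how one could check where the space goes inside the library package, and whether the files are real copies or hard links. It is not something Photos provides; the function names `tree_usage` and `report` are made up for this example, and it measures logical file sizes rather than disk blocks.

```python
import os
from pathlib import Path


def tree_usage(root):
    """Return (total_bytes, names_with_extra_links) for all files under root.

    A file whose link count is above 1 is a hard link; its size is
    counted only once, no matter how many names point to it.
    """
    total = 0
    multi_linked = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if st.st_nlink > 1:
                multi_linked += 1
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue  # same inode already counted under another name
                seen.add(key)
            total += st.st_size
    return total, multi_linked


def report(library):
    """Print usage for each top-level directory in the library, largest first."""
    rows = []
    for entry in Path(library).iterdir():
        if entry.is_dir():
            size, linked = tree_usage(entry)
            rows.append((size, linked, entry.name))
    for size, linked, name in sorted(rows, reverse=True):
        print(f"{size / 1e9:8.2f} GB  {linked:6d} hard-linked names  {name}")
```

Calling `report(os.path.expanduser("~/Pictures/Photos Library.photoslibrary"))` would list the top-level directories by size. A count of zero hard-linked names means every file is a separate copy.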
I don’t know how things are implemented behind the scenes but apparently, none of the Apple engineers are constrained by disk space (or alternatively, they have significantly smaller libraries than I do). It’s especially frustrating since it would be fairly reasonable to assume that a person who chooses to use a referenced library already has a copy of their pictures and doesn’t necessarily have that much disk space left. Uncontrollably gobbling up disk space seems like a bad design decision in this case.
Until this is rectified, Photos is basically useless for me. I don’t want to give up that much disk space just to have the two conveniences listed above.
Thoughts About a Space Efficient Alternative Solution
As a programmer, I began thinking about the least amount of information I would need to store in order to keep track of which people are in which pictures. Note: This is a rough idea and undoubtedly, someone will find a problem with it. It also only handles face detection and doesn’t concern itself with thumbnails and other things needed in the library.
Let’s assume our facial recognition algorithm returns a list of the positions (x1,y1,x2,y2) of faces in each picture it processes. We then ask the user to identify which person belongs to which face so we can attach a name (and some form of unique identifier) to each face. This would give us the following information to keep track of for each detected face:
- A UUID representing the person this face belongs to.
- A tuple (x1,y1,x2,y2) representing the rectangle of the location of the face in the picture.
- A reference to the picture.
This would allow us to:
- List all the pictures a specific person is in.
- Show the location of each face in each picture.
- List all faces in a particular picture.
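The record and the three operations above can be sketched in a few lines. This is an in-memory illustration only; the names `Face`, `FaceIndex`, `pictures_of` and `faces_in` are invented for the example, and the picture reference is assumed to be an absolute path.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Face:
    person: uuid.UUID  # who the face belongs to
    box: tuple         # (x1, y1, x2, y2) rectangle inside the picture
    picture: str       # reference to the picture (assumed: absolute path)


class FaceIndex:
    """Keeps every face reachable both by person and by picture."""

    def __init__(self):
        self._by_person = defaultdict(list)
        self._by_picture = defaultdict(list)

    def add(self, face):
        self._by_person[face.person].append(face)
        self._by_picture[face.picture].append(face)

    def pictures_of(self, person):
        """All pictures a specific person is in, sorted, without duplicates."""
        return sorted({f.picture for f in self._by_person.get(person, [])})

    def faces_in(self, picture):
        """All faces in a picture; each face carries its own rectangle."""
        return list(self._by_picture.get(picture, []))


# Usage with made-up people and paths.
me, fiancee = uuid.uuid4(), uuid.uuid4()
index = FaceIndex()
index.add(Face(me, (10, 20, 110, 140), "/pics/2015-05-01 moving/a.jpg"))
index.add(Face(fiancee, (200, 30, 300, 150), "/pics/2015-05-01 moving/a.jpg"))
index.add(Face(me, (5, 5, 90, 100), "/pics/2015-06-12 picnic/b.jpg"))
```

Here `index.pictures_of(me)` returns both paths, while `index.faces_in("/pics/2015-05-01 moving/a.jpg")` returns the two faces in that picture along with their rectangles.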
This covers all the functionality Photos provides for faces. We’ll assume a UUID is 128 bits and each position coordinate is 64 bits (a bit overkill, yes). The reference could be an absolute path to the picture, which makes its size hard to estimate. Let’s say it’s 32 bits for each character and at most 500 characters long (the paths to my pictures would easily fit in half of that). This would mean a maximum of 128 + 4 * 64 + 32 * 500 = 16 384 bits for each face.

Allowing for an average of 3 faces in each picture (probably high for my library) and 5400 pictures, this would mean that the library would take up 3 * 5400 * 16 384 = 265 420 800 bits ≈ 33 MB with this method.
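The estimate is easy to re-run with other assumptions. The sketch below takes the stated widths at face value (128-bit UUID, four 64-bit coordinates, 32 bits per path character, 500 characters) and reads the 5400 in the formula as the number of pictures; the constant names are made up for the example.

```python
UUID_BITS = 128          # one UUID per face
COORD_BITS = 64          # each of x1, y1, x2, y2
CHAR_BITS = 32           # per character of the picture path
MAX_PATH_CHARS = 500     # upper bound on path length
FACES_PER_PICTURE = 3    # assumed average
PICTURES = 5400          # pictures in the library

bits_per_face = UUID_BITS + 4 * COORD_BITS + CHAR_BITS * MAX_PATH_CHARS
total_bits = FACES_PER_PICTURE * PICTURES * bits_per_face
total_megabytes = total_bits / 8 / 1_000_000

print(bits_per_face)              # 16384
print(total_bits)                 # 265420800
print(round(total_megabytes, 1))  # 33.2
```

Almost all of the per-face cost comes from the path: 16 000 of the 16 384 bits. Storing each path once and referring to it by a small integer would shrink the total considerably.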
Caveats
Note that this (deliberately) doesn’t say anything about the data structures used to store the information. In reality, we would need to keep it in some sort of database with indexes that allows us to look up rows by UUID or picture reference quickly. This would undoubtedly mean a larger library than I have calculated above but it would probably still be possible to stay under 1GB.
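One way such a database could look is sketched here with SQLite from Python’s standard library. The table and index names are invented for the example, and nothing here reflects how Photos actually stores its data.

```python
import sqlite3
import uuid


def open_face_db(path=":memory:"):
    """Create (or open) a face database with an index for each kind of lookup."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS faces (
            person  TEXT NOT NULL,      -- UUID of the person, as text
            picture TEXT NOT NULL,      -- reference (path) to the picture
            x1 INTEGER NOT NULL, y1 INTEGER NOT NULL,
            x2 INTEGER NOT NULL, y2 INTEGER NOT NULL
        );
        CREATE INDEX IF NOT EXISTS faces_by_person  ON faces (person);
        CREATE INDEX IF NOT EXISTS faces_by_picture ON faces (picture);
    """)
    return db


def add_face(db, person, picture, box):
    """Store one detected face; box is (x1, y1, x2, y2)."""
    db.execute(
        "INSERT INTO faces (person, picture, x1, y1, x2, y2) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (str(person), picture, *box),
    )


def pictures_of(db, person):
    """All pictures a person appears in, looked up by UUID."""
    rows = db.execute(
        "SELECT DISTINCT picture FROM faces WHERE person = ? ORDER BY picture",
        (str(person),),
    )
    return [row[0] for row in rows]


def faces_in(db, picture):
    """All (person, x1, y1, x2, y2) rows for one picture."""
    rows = db.execute(
        "SELECT person, x1, y1, x2, y2 FROM faces WHERE picture = ?",
        (picture,),
    )
    return rows.fetchall()
```

The two indexes are what cost extra space on top of the raw rows, which is why the real library would end up larger than the back-of-the-envelope number.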
[1] Yes, copies, not hard links as some forum posts claim. The link count for the files in that directory is 1, meaning each file name is the only reference pointing to its inode.
[2] This is what led people to think that modelresources has something to do with facial recognition, although not all pictures in my modelresources were of faces.