Can Google Really Tell Which of Your E-mails are Interesting?

Learn how Google decides which e-mails to store for offline viewing.
Brought to you by PCWorld

One of the most fascinating tidbits from Google's announcement last week of offline capability for Gmail was this: The company says that when it stores your messages for offline viewing, it tries to choose the most interesting ones. It's a sensible, if somewhat creepy, strategy -- after all, why waste your disk space on messages you'll never want to read again? But after looking at the results on my own e-mail account, I can't see any evidence that, in my case at least, Google is successfully separating the wheat from the chaff.

Here's exactly what Google says in its online support:

"We try to download your most recent conversations along with any conversations that seem to be important (regardless of their age). We also try not to dowload [sic] uninteresting conversations. This process is done heuristically and as with any heuristic can and will miss things. We'll continue to tune things up, but more importantly, we'll eventually provide a UI that will allow you to change the settings."

I asked a Google rep if someone could give me more detail on the process, but he declined. So we're left with this somewhat cryptic explanation. (When I first started hearing the word heuristic a few years ago, I thought it actually meant something. After hearing it applied to countless mysterious and diverse technological processes over the years, I've concluded that it's really just a polite way of saying "You wouldn't understand.")

So Google seems to be saying it analyzes your messages to figure out whether they're important or interesting (certainly not always the same thing). Does it do that by looking at the content (it already processes the content of your messages to serve up contextual ads)? Does it look at the activity a message engenders -- the number of responses, etc.? My guess is they'd use a combination of both methods.

But whatever the strategy is, it doesn't seem to be working, at least on my Gmail account. I compared what Gmail did with uninteresting, unimportant messages against what it did with very important and interesting missives. What I found is that it seems to be doing exactly the same thing with both kinds of messages.

I started by looking at all the mail in my offline cache. Basically Gmail has kept a pretty comprehensive collection of my mail back to the beginning of December 2008, about 6,500 messages. It also kept other messages that have a few select labels. (Google says that it chooses some labels that will be completely cached. Here's Google's baroque explanation of how it chooses those labels: "Additionally, we'll download any conversation marked with a label that contains less than 200 conversations, has at least one conversation that has been received in the last 30 days and also has at least one conversation that's outside the estimated time period. For many users, this list of labels will include Starred and Drafts.")
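Google's label rule is at least concrete enough to write down. The Python below is a minimal sketch of the quoted rule only, not Google's actual code: the `Conversation` class, the `fully_cached_labels` function, and the `cutoff` parameter (standing in for the start of the "estimated time period," early December 2008 in my case) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Conversation:
    received: date        # date the conversation was received
    labels: frozenset     # label names attached to the conversation


def fully_cached_labels(conversations, today, cutoff,
                        max_size=200, recent_days=30):
    """Return the labels that the quoted rule would cache completely.

    A label qualifies when it (1) holds fewer than max_size conversations,
    (2) has at least one conversation received in the last recent_days days,
    and (3) has at least one conversation older than cutoff, i.e. outside
    the estimated time period. All parameter names are assumptions.
    """
    # Group received dates by label.
    dates_by_label = {}
    for conv in conversations:
        for label in conv.labels:
            dates_by_label.setdefault(label, []).append(conv.received)

    recent_start = today - timedelta(days=recent_days)
    selected = set()
    for label, dates in dates_by_label.items():
        is_small = len(dates) < max_size
        has_recent = any(d >= recent_start for d in dates)
        has_old = any(d < cutoff for d in dates)
        if is_small and has_recent and has_old:
            selected.add(label)
    return selected
```

Under this reading, a small label such as Starred that holds both a message from last week and one from last summer would be cached in full, while a label containing only recent mail, or only old mail, would not.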

Then I looked at unimportant, uninteresting messages, such as a stream of collected business press releases that I get daily, never read and immediately archive. The result: Every one of them is preserved in my offline cache, right back to early December. I also looked at a stream of e-mail ads from Amazon about bargains. Again, these are messages I don't open, just immediately archive. Again, each one seems to be preserved in my offline cache, back to the beginning of December.

Finally, I looked at important messages from my colleagues here at PC World (some of which are actually interesting as well). These are messages that I read closely and often respond to or forward to another editor. According to Google's description of its system, I should expect that some of those messages would be preserved even if they predate the early December cutoff for the rest of my mail. (Remember: "We try to download ... any conversations that seem to be important (regardless of their age).")

But that's not the case. The only messages in my cache that are from before early December are those that have one of the labels that Gmail decided to keep a complete record of. It doesn't matter if an e-mail thread has 15 responses or includes words like "this is important" or "that's interesting." It's still not there.

So I'm forced to conclude that for my account at least, Google's heuristic isn't working. Or perhaps it's something that I just wouldn't understand.
