After reading “Benford’s Law and email subjects” on Yiorgos’ blog, I was curious whether the law holds for email sizes as well.
So I ran the experiment on a week of email traffic at a certain mail server, and here is the result:

Amazing, the law holds for sizes as well!
Cool! The next thing to do is to verify whether the law holds for INBOX sizes. On systems with email quotas I expect a distortion at the leading digit of the quota (or the digit just below it), unless the quota is far higher than the typical inbox size.
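For anyone who wants to repeat the measurement, here is a minimal sketch in Python. It is not the script used for the experiment above; it assumes you have already pulled message sizes (in bytes, as integers) out of your mail logs, and the function names are made up for illustration. Benford’s Law predicts that the leading digit d shows up with frequency log10(1 + 1/d).

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_shares(sizes):
    """Observed share of each leading digit among positive integer sizes.

    `sizes` is assumed to be an iterable of message sizes in bytes.
    Zero or negative values are skipped, since they have no leading digit.
    """
    digits = [int(str(n)[0]) for n in sizes if n > 0]
    if not digits:
        raise ValueError("no positive sizes to analyse")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


if __name__ == "__main__":
    # Hypothetical sample; replace with sizes extracted from your own logs.
    sample = [1024, 18733, 2201, 954, 130452, 3710, 1187, 46020]
    observed = leading_digit_shares(sample)
    for d, expected in benford_expected().items():
        print(f"{d}: observed {observed[d]:.3f}  expected {expected:.3f}")
```

With only a handful of messages the two columns will not match; a week of traffic from a busy server is a more sensible sample.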
I just wish we could get more Postmasters to measure these things. Then maybe we could find a practical use for this observation.
One thing that would be interesting, if you have a huge mail server (or a cluster of them), is to do this per sender or per block of senders. It may give you some idea of whether those senders send spam.
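A rough sketch of what a per-sender check could look like, in Python. Everything here is an assumption made for illustration: the `(sender, size)` record layout, the `min_messages` cutoff, and the use of mean absolute deviation as the distance from Benford’s Law. A high score only means the sender’s sizes look unusual, not that the sender is a spammer.

```python
import math
from collections import defaultdict

# Expected leading-digit shares for digits 1..9 under Benford's Law.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def deviation_by_sender(records, min_messages=100):
    """Mean absolute deviation from Benford's Law, per sender.

    `records` is assumed to be an iterable of (sender, size_in_bytes)
    pairs. Senders with fewer than `min_messages` messages are skipped,
    because a small sample says nothing about the distribution.
    """
    digits_by_sender = defaultdict(list)
    for sender, size in records:
        if size > 0:
            digits_by_sender[sender].append(int(str(size)[0]))

    scores = {}
    for sender, digits in digits_by_sender.items():
        if len(digits) < min_messages:
            continue
        observed = [digits.count(d) / len(digits) for d in range(1, 10)]
        scores[sender] = sum(
            abs(o - e) for o, e in zip(observed, BENFORD)
        ) / 9
    return scores
```

Sorting the returned dictionary by score, highest first, gives a list of senders worth a manual look during a periodic review.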
Not sure if this is going to be useful in any way, but it seems fun. :-)
It cannot be useful in «real time». However, when one reviews the information periodically, it may point to persistent spammers, and that could be used to add to the score in SpamAssassin.
I agree – it’s not useful in real time. And you need to know a bit more about the numbers involved before you can say a data-set that doesn’t follow the law is fake. The ages of my Facebook friends definitely don’t follow the law, but they’re all real friends.
I might do something where I compare sets of ham and spam and see whether subject lines, lengths, etc. follow Benford’s Law.
How about the length of the email? Or even of the subject? The number of people it was sent to? The number of emails in a conversation? Once you start power law hunting you see them everywhere.
We could turn this into a meme :-)
adamo did the subjects, I did the size, you do the nrcpts and pass it on to someone else :-)