Saturday, July 23, 2011

Tweetalytics

Until this week I thought Twitter would focus on data mining the tweetstream rather than adding features for individual users. I based this in part on Fred Wilson's mentions of work by Twitter on analytics. I've been watching for evidence of the changes I expected in the service, intending to write about them if they appeared.

Earlier this week came news of a shakeup in product management at Twitter. Jack Dorsey seems much more focused on user-visible aspects of the service, and I'm less convinced that backend analytics will be a priority now. Therefore I'm just going to write about the things I'd been watching for.

To reiterate: these are not things Twitter currently does, nor do I know whether they're looking at them. They're simply things that seemed logical to me, and that would be visible from outside the service.


Wrap all links: URLs passing through the firehose can be identified, but knowing which of them get clicked is valuable. The twitter.com web client already wraps all URLs with t.co, regardless of their length. Taking the next step and shortening every non-t.co link passing through the system would be a way to get click data on everything. The downside is the added latency of contacting the shortener, but that is a product tradeoff to be made.
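
As a rough illustration, here is a minimal Python sketch of what wrapping might look like, assuming a hypothetical shorten() call backed by a t.co-style redirect service; the function names and fake short codes are mine, not anything Twitter actually exposes.

```python
# Hypothetical sketch of wrapping every link in a tweet; shorten() stands in
# for a t.co-style service that logs each click before redirecting.
import re

URL_PATTERN = re.compile(r"https?://\S+")

def shorten(url: str) -> str:
    """Pretend call to the wrapping service; returns a short redirect URL."""
    # A real implementation would store the url -> code mapping so that
    # every click on the short link can be counted before the redirect.
    return "http://t.co/" + format(abs(hash(url)) % 16**8, "08x")

def wrap_links(tweet_text: str) -> str:
    """Replace every URL in the tweet text with a wrapped, trackable one."""
    return URL_PATTERN.sub(lambda match: shorten(match.group(0)), tweet_text)

print(wrap_links("Worth reading: http://example.com/article"))
```

The latency concern mentioned above shows up here as the extra round trip hidden inside shorten().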


Unique t.co per retweet: There is already good visibility into how tweets spread through the system, by tracking new-style retweets and using URL search for manual RTs. What is not currently visible is the point of egress from the service: which retweet actually gets clicked on. This can be useful when trying to measure a user's influence. An approximation can be made from follower counts, but that breaks down when retweeters have similar numbers of followers. Instead, each retweet could generate a new t.co entry; the specific egress point would then be known because each retweet carries a unique URL.
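
A sketch of how per-retweet links might work, again hypothetical: each retweet mints its own short code, so a click identifies the retweeter without any follower-count guesswork. The in-memory table stands in for whatever store a real service would use.

```python
# Hypothetical per-retweet short links: one unique code per retweet, so the
# egress point is known exactly when the click comes back through the redirect.
import secrets

# code -> (original tweet id, retweeter, destination URL); a real service
# would keep this in a database keyed by the short code.
link_table: dict[str, tuple[str, str, str]] = {}

def link_for_retweet(tweet_id: str, retweeter: str, target_url: str) -> str:
    """Mint a unique short URL for this particular retweet."""
    code = secrets.token_urlsafe(6)
    link_table[code] = (tweet_id, retweeter, target_url)
    return f"http://t.co/{code}"

def record_click(code: str) -> str:
    """On redirect, the code alone says whose retweet drove the click."""
    tweet_id, retweeter, url = link_table[code]
    print(f"tweet {tweet_id} clicked via @{retweeter}")
    return url

alice_link = link_for_retweet("12345", "alice", "http://example.com/story")
bob_link = link_for_retweet("12345", "bob", "http://example.com/story")
record_click(bob_link.rsplit("/", 1)[-1])  # attribution goes to bob's retweet
```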


Tracking beyond tweets: t.co tracks the first click. Once the link is expanded, there is no visibility into what happens next. Tracking a link's spread once it leaves the service would require working with individual sites, which is likely only practical for the top sites passing through the tweetstream. Tracking information could be added automatically to URLs before shortening, in a format suitable for each site's analytics. For example, a utm_medium=tweet parameter could be appended to the original URL. There might be some user displeasure at having the URL modified, which would have to be taken into account.
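
Tagging the URL before shortening is straightforward; here is a small Python sketch using only the standard library. The utm_medium=tweet parameter comes from the idea above; everything else (the function name, the example URL) is just illustration.

```python
# Sketch of adding a tracking parameter to a URL before it gets shortened,
# without clobbering any query parameters the publisher already put there.
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def add_tracking(url: str, medium: str = "tweet") -> str:
    """Append utm_medium to the URL's query string if it isn't already set."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query.setdefault("utm_medium", medium)
    return urlunparse(parts._replace(query=urlencode(query)))

print(add_tracking("http://example.com/article?id=7"))
# -> http://example.com/article?id=7&utm_medium=tweet
```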


Each of these adds more information for publishers to mine. None of them results in a user-visible feature, and I suspect that as of a couple of days ago user-visible features became a far higher priority.


footnote: this blog contains articles on a range of topics. If you want more posts like this, I suggest the Social label.