Coding Relic: social


Wednesday, March 26, 2025

Venmo Public Transactions

Venmo pushes hard for transaction activity to be Public. It doesn't say whether any past payments were actually public, and puts up an interstitial to confirm a change to Private.

This selection does have a benefit for the user, making it more straightforward for friends to find each other and to arrange payments. However, the choice has a larger impact on Venmo's user growth, and it comes with downsides for users, such as making public the activity they assumed was private.

Venmo Privacy settings page with options for Public, Friends, and Private. The current selection is Private. Below are buttons to change past transactions to Friends or to Private.

Venmo confirmation to really change past transactions to Private?

Presumably Venmo has data on how much of a network effect they get from having payment information be Public, drawing in friends and family and acquaintances and randos. Venmo appears to let this data shape its UI design, steering users toward the choice most beneficial to the company.

Tuesday, January 14, 2025

Nextdoor Emails Ads to Non-Users

Email from a Nextdoor user named Lisa: Hi Denton, Hope all is well. I'm currently trying to generate some new business. I'm hoping you'd be kind enough to give my business a Fave on Nextdoor, the neighborhood app, to help get the word out to our other neighbors. Sharing your support would be really appreciated and valuable for our growth - thanks!

I received an email from Nextdoor, clearly a paid advertisement targeted at people who live in my town.

The person who paid for the ad is asking something fairly innocuous, trying to build their business by generating enthusiasm on Nextdoor. Fake enthusiasm, but that is the way capitalism works nowadays.

I decided to obscure the name of their business because I don't really consider them to be a bad actor in this. Nextdoor is.

I am not a user of Nextdoor.
I have never been a user of Nextdoor.
I have no account there.

However, my mother was an active user and, as I learned today, likely allowed Nextdoor to access her contacts. Nextdoor's privacy policy article about this mentions names, email addresses, phone numbers, and "other information" will be harvested from uploaded contact information. Clearly it includes the postal address as well since Nextdoor targeted a geographical ad at me.

Nextdoor is selling access to me, without any kind of relationship with me and never having provided any value to me whatsoever. Someone had an account, therefore my information is free for them to monetize and do with as they please.

This stuff mostly fades into the background. Even while I was drafting this post, LinkedIn sent an email reading "Denton, this top CEO is answering your questions live," which is clearly also a paid email advertisement targeted at me. I pay for LinkedIn Premium, but my information is nonetheless still used to juice some additional revenue. I receive this stuff regularly enough that I don't even think about it, but Nextdoor stood out.

privacy@nextdoor.com

I wrote to privacy@nextdoor.com:

Referencing https://help.nextdoor.com/s/article/Information-for-people-who-don-t-use-Nextdoor-Products, may I have a copy of the information Nextdoor has for my email address?

They responded:

We’re sorry to hear that this email was unwelcome.

Upon review, there is no account associated with your email address, therefore we cannot provide a copy of your information.

From time to time, Nextdoor receives information from third parties about non-users. In your case, we received your information from a third-party partner and used this information to invite you to join your Nextdoor Neighborhood.

I can confirm that we have deleted from our systems the personal information associated with the email address that you used to contact us.

Let me know if you have any questions.

I didn't mention anything about an email. Apparently they get enough complaints about this practice that they just assume it is so.

Friday, December 30, 2011

#emotivehashtags

Earlier this week Sam Biddle of Gizmodo published "How the Hashtag Is Ruining the English Language," decrying the use of hashtags to add additional color or meaning to text. Quoth the article, "The hashtag is a vulgar crutch, a lazy reach for substance in the personal void – written clipart." #getoffhislawn

Written communication has never been as effective as in-person conversation, nor even as simple audio via telephone. Presented with plain text, we lack a huge array of additional channels for meaning: posture, facial expression, tone, cadence, gestures, etc. Smileys can be seen as an early attempt to add emotional context to online communication, albeit a limited one. #deathtosmileys

Yet language evolves to suit our needs and to fit advances in communications technology. A specific example: in the US we commonly say "Hello" as a greeting. It's considered polite, and it has always been the common practice... except that it hasn't. Hello came into common use as a greeting in the late 19th century with the invention of the telephone. The earlier custom of speaking only after a proper introduction simply didn't work on the telephone; it wasn't practical to coordinate introductions among so many people over the distances involved. Use of Hello spread from the telephone into all areas of interaction. I suspect there were people at the time who bemoaned and berated the verbal crutch of the "hello" as they watched it push aside the more finely crafted greetings of the time. #getofftheirlawn

So now we have hashtags. Spawned by the space-constrained medium of the tweet, they are now spreading to other written forms. That they find traction in longer-form media is an indication that they fill a need. They supply context, overlay emotional meaning, and convey intent, all lacking in current practice. It's easy to label hashtags as lazy or somehow vulgar. "[W]hy the need for metadata when regular words have been working so well?" questions the Gizmodo piece. Yet the sad reality is that regular words haven't been working so well. Even in the spoken word there is an enormous difference between oratory and casual conversation. A moving speech, filled with meaning in every phrase, takes a long time to prepare and rehearse. It's a rare event, not the norm day to day. The same holds true in the written word. "I apologize that this letter is so long - I lacked the time to make it short." quipped Blaise Pascal in the 17th century.

Disambiguation

Gizmodo even elicited a response from Noam Chomsky, probably via email, "Don't use Twitter, almost never see it."

What I find most interesting about Chomsky's response is that it so perfectly illustrates the problem which emotive hashtags try to solve: his phrasing is slightly ambiguous. It could be interpreted as Chomsky saying he doesn't use Twitter and so never sees hashtags, or that anyone bothered by hashtags shouldn't use Twitter so they won't see them. He probably means the former, but in an in-person conversation there would be no ambiguity. Facial expression would convey his unfamiliarity with Twitter.

For Chomsky, adding a hashtag would require extra thought and effort which could instead have gone into rewording the sentence. That, I think, is the key. For those to whom hashtags are extra work, it all seems silly and even stupid. For those whose main form of communication is short texts, it doesn't. #getoffmylawntoo

Friday, October 28, 2011

Tweetflection Point

Last week at the Web 2.0 Summit in San Francisco, Twitter CEO Dick Costolo talked about recent growth in the service and how iOS5 had caused a sudden 3x jump in signups. He also said daily Tweet volume had reached 250 million. There are many, many estimates of the volume of Tweets sent, but I know of only three which are verifiable as directly from Twitter:

50M tweets/day in March, 2010 according to a Twitter blog post.
140M tweets/day in March, 2011 according to that same Twitter blog post.
250M tweets/day in late October, 2011 according to Dick Costolo.

Graphing these on a log scale shows the rate of growth in Tweet volume, ~~roughly tripling in two years~~ almost tripling in one year.

This graph is misleading though, as we have so few data points. It is very likely that, like signups for the service, the rate of growth in tweet volume suddenly increased after iOS5 shipped. Let's assume the rate of growth also tripled for the few days after the iOS5 launch, and zoom in on the tail end of the graph. It is quite similar up until a sharp uptick at the end.

Speculative graph of average daily Tweet volume, knee of curve at iOS5 launch.

The reality is somewhere between those two graphs, but likely still steep enough to be terrifying to the engineers involved. iOS5 will absolutely have an impact on the daily volume of Tweets; it would be ludicrous to think otherwise. It probably isn't so abrupt a knee in the curve as shown here, but it has to be substantial. Tweet growth is on a new and steeper slope now. It used to triple in a bit over a year; now it will triple in well under one year.

Why this matters

Even five months ago, the traffic to carry the Twitter Firehose was becoming a challenge to handle. At that time the average throughput was 35 Mbps, with spikes up to about 138 Mbps. Scaling those numbers to today would be 56 Mbps sustained with spikes to 223 Mbps, and about one year until the spikes exceed a gigabit.
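The scaling above can be reproduced with a little arithmetic. This is a sketch under two assumptions the post does not state outright: that the "five months ago" figures correspond to the May 2011 volume of 155 million tweets/day, and that bandwidth grows in proportion to tweet count (constant bytes per tweet). The annual growth factors fed to `years_to_reach` are illustrative inputs, not measured values.

```python
import math

# Tweet volumes (millions per day) cited on this blog.
MAY_2011 = 155
OCT_2011 = 250


def scale_throughput(mbps: float, old_volume: float, new_volume: float) -> float:
    """Scale firehose bandwidth in proportion to tweet volume (assumes constant bytes per tweet)."""
    return mbps * new_volume / old_volume


def years_to_reach(current_mbps: float, target_mbps: float, growth_per_year: float) -> float:
    """Years until current_mbps reaches target_mbps at a constant annual growth factor."""
    return math.log(target_mbps / current_mbps) / math.log(growth_per_year)


avg = scale_throughput(35, MAY_2011, OCT_2011)
peak = scale_throughput(138, MAY_2011, OCT_2011)
print(f"average ~{avg:.0f} Mbps, peaks ~{peak:.0f} Mbps")

for growth in (3.0, 4.5):  # illustrative annual growth factors
    years = years_to_reach(peak, 1000, growth)
    print(f"at {growth}x/year, peaks pass 1 Gbps in ~{years:.1f} years")
```

In this model a steady 3x per year takes somewhat over a year for peaks to pass a gigabit; getting there in about one year takes roughly 4.5x annual growth, in line with the steeper post-iOS5 slope argued above.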

The indications I've seen are that the feed from Twitter is still sent uncompressed. Compressing using gzip (or Snappy) would gain some breathing room, but not solve the underlying problem. The underlying problem is that the volume of data is increasing way, way faster than the capacity of the network and computing elements tasked with handling it. Compression can reduce the absolute number of bits being sent (at the cost of even more CPU), but not reduce the rate of growth.
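To illustrate how a long-lived stream could be compressed, here is a minimal Python sketch using the standard-library `zlib` module in gzip mode. It does not describe Twitter's actual feed format. The tweet structure is made up, and flushing after every message is an assumption chosen so that each tweet can be decoded as soon as it arrives. Compression shrinks each message by a roughly constant factor, so total bytes drop but still grow at the same rate as tweet volume.

```python
import json
import zlib


def make_gzip_stream():
    """Compressor producing a gzip-format stream (wbits=31 selects the gzip wrapper)."""
    return zlib.compressobj(level=6, wbits=31)


def compress_message(compressor, message: dict) -> bytes:
    """Compress one newline-delimited JSON message and flush it for immediate delivery."""
    line = (json.dumps(message) + "\n").encode("utf-8")
    # Z_SYNC_FLUSH emits all pending output without ending the stream, so the
    # compression history (repeated field names, etc.) carries over to later messages.
    return compressor.compress(line) + compressor.flush(zlib.Z_SYNC_FLUSH)


# Hypothetical tweets with repetitive metadata, standing in for firehose records.
tweets = [
    {"id": 1000 + i, "user": {"screen_name": f"user{i % 7}", "lang": "en"},
     "text": f"Tweet number {i}", "source": "web", "retweet_count": 0}
    for i in range(200)
]

compressor = make_gzip_stream()
chunks = [compress_message(compressor, t) for t in tweets]
raw_bytes = sum(len((json.dumps(t) + "\n").encode("utf-8")) for t in tweets)
compressed_bytes = sum(len(c) for c in chunks)
print(f"raw {raw_bytes} bytes, compressed {compressed_bytes} bytes")

# Receiver side: one decompressor consumes the chunks as they arrive.
decompressor = zlib.decompressobj(wbits=31)
received = b"".join(decompressor.decompress(c) for c in chunks)
decoded = [json.loads(line) for line in received.decode("utf-8").splitlines()]
```

Sharing one compressor across the whole connection lets repeated field names compress well, at the cost of CPU on both ends for every message.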

Fundamentally, there is a limit to how fast a single HTTP stream can go. As described in the post earlier this year, we've scaled network and CPU capacity by going horizontal and spreading load across more elements. Use of a single very fast TCP flow restricts the handling to a single network link and single CPU in a number of places. The network capacity has some headroom still, particularly by throwing money at it in the form of 10G Ethernet links. The capacity of a single CPU core to process the TCP stream is the more serious bottleneck. At some point relatively soon it will be more cost effective to split the Twitter firehose across multiple TCP streams, for easier scaling. The Tweet ID (or a new sequence number) could put tweets back into an absolute order when needed.
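A minimal sketch of splitting the firehose across several streams and restoring order on the consumer side. The modulo-by-ID partitioning and the function names are hypothetical. The sketch assumes tweets arrive in ascending ID order within each stream, which is what lets `heapq.merge` interleave them back into a single ordering.

```python
import heapq
from typing import Iterable, Iterator


def shard_for(tweet_id: int, num_streams: int) -> int:
    """Pick which of num_streams connections carries this tweet (simple modulo partitioning)."""
    return tweet_id % num_streams


def split_firehose(tweets: Iterable[dict], num_streams: int) -> list[list[dict]]:
    """Producer side: distribute tweets across several independent streams."""
    streams: list[list[dict]] = [[] for _ in range(num_streams)]
    for tweet in tweets:
        streams[shard_for(tweet["id"], num_streams)].append(tweet)
    return streams


def merge_streams(streams: list[list[dict]]) -> Iterator[dict]:
    """Consumer side: restore one global order by tweet ID.
    Assumes each stream is already in ascending ID order."""
    return heapq.merge(*streams, key=lambda tweet: tweet["id"])


firehose = [{"id": i, "text": f"tweet {i}"} for i in range(1, 21)]
streams = split_firehose(firehose, num_streams=4)
merged = list(merge_streams(streams))
```

With live network streams, `heapq.merge` would block waiting on the slowest stream, so a real consumer would need buffering and some notion of how far each stream has progressed before emitting tweets.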

Unbalanced link aggregation with a single high speed HTTP firehose.

Update: My math was off. Even before the iOS5 announcement, the rate of growth was nearly tripling in one year. Corrected post.

Sunday, September 4, 2011

Telling Strangers Where You Are

foursquare Ten One Hundred badge, for a thousand checkins.

Not quite two years ago in this space I wrote about how I use foursquare. I've continued using the service since then, passing 1,000 checkins several months ago.

The first generation of location-based services like foursquare has paid a lot of attention to privacy concerns. Explicit connection to other users is required in order to allow them to see your checkins. To do otherwise would have been perceived as creepy, the go-to label for vague privacy concerns. For those who do want to make their checkins public, foursquare has an option to publish checkins to Twitter.

Yet social norms evolve, even in the span of just two years. Facebook Places and Google+ both offer checkins as a feature of their respective services. I've been periodically checking in on Google+ for several months. For routine trips I check in to a very limited circle of people, not so much out of concern about privacy as to not be spammy. For well-known venues I've been checking in publicly, and something fascinating happens: well-known venues are really well-known. Lots of people have been there, and they chime in with commentary and suggestions of things to see and do. Our trip to the Monterey Bay Aquarium was much improved by real-time suggestions from Google+ users, and pictures from the trip in turn made a couple other people think about going back.

Jeff Jarvis has long argued that publicness has benefits, and that overemphasizing concerns about privacy undermines the benefits we could get by being connected. We justify privacy with nebulous terms like creepy, and stifle discussion of the value of openness. Our brains are really good at concocting (unlikely) scenarios of the bad things which could happen from sharing information, and not so good at seeing the good which can come of it. I'm definitely seeing that effect with public checkins: it seems scary, yet there is tremendous value in sharing them widely.

footnote: this blog contains articles on a range of topics. If you want more posts like this, I suggest the social label.

Saturday, July 23, 2011

Tweetalytics

Until this week I thought Twitter would focus on datamining the tweetstream rather than adding features for individual users. I based this in part on mentions by Fred Wilson of work by Twitter on analytics. I've been watching for evidence of changes I expected to be made in the service, intending to write about it if they appeared.

Earlier this week came news of a shakeup in product management at Twitter. Jack Dorsey seems much more focussed on user-visible aspects of the service, and I'm less convinced that backend analytics will be a priority now. Therefore I'm just going to write about the things I'd been watching for.

To reiterate: these are not things Twitter currently does, nor do I know whether they're looking at them. These are things which seemed logical, and which would be visible outside the service.

Wrap all links: URLs passing through the firehose can be identified, but knowing what gets clicked is valuable. The twitter.com web client already wraps all URLs using t.co, regardless of their length. Taking the next step to shorten every non-t.co link passing through the system would be a way to get click data on everything. There is a downside in added latency to contact the shortener, but that is a product tradeoff to be made.
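To make "wrap all links" concrete, here is a hypothetical in-memory wrapper in Python. The `short.example` prefix, the class, and the random codes are illustrative assumptions, not t.co's actual design. Links already carrying the wrapper's prefix are left alone, mirroring the idea of shortening only non-t.co links.

```python
import re
import secrets

# Simplified URL matcher: a run of non-whitespace after http(s)://.
# It also swallows trailing punctuation, which a real tokenizer would trim.
URL_PATTERN = re.compile(r"https?://\S+")


class LinkWrapper:
    """Hypothetical shortener that rewrites every URL in a tweet to a tracked link."""

    def __init__(self, prefix: str = "https://short.example/"):
        self.prefix = prefix
        self.targets: dict[str, str] = {}  # short code -> original URL
        self.clicks: dict[str, int] = {}   # short code -> number of clicks

    def _new_code(self, url: str) -> str:
        code = secrets.token_urlsafe(6)
        while code in self.targets:  # retry on the (unlikely) collision
            code = secrets.token_urlsafe(6)
        self.targets[code] = url
        self.clicks[code] = 0
        return code

    def _replace(self, match: re.Match) -> str:
        url = match.group(0)
        if url.startswith(self.prefix):  # already wrapped: leave it alone
            return url
        return self.prefix + self._new_code(url)

    def wrap_text(self, text: str) -> str:
        """Rewrite every URL in the tweet, regardless of its length."""
        return URL_PATTERN.sub(self._replace, text)

    def click(self, code: str) -> str:
        """Record a click and return the URL to redirect to."""
        self.clicks[code] += 1
        return self.targets[code]


wrapper = LinkWrapper()
wrapped = wrapper.wrap_text("read https://example.com/a and https://short.example/abc")
```

The extra hop through `click` is where the added latency mentioned above would come from.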

Unique t.co per retweet: There is already good visibility into how tweets spread through the system, by tracking new-style retweets and URL search for manual RTs. What is not currently visible is the point of egress from the service: which retweet actually gets clicked on. This can be useful if trying to measure a user's influence. An approximation can be made by looking at the number of followers, but that breaks down when retweeters have a similar number of followers. Instead, each retweet could generate a new t.co entry. The specific egress point would be known because each would have a unique URL.
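The per-retweet idea can be sketched the same way: issue a fresh code every time an account tweets or retweets a URL, and attribute each click to the account that owns the code. Everything here (class name, sequential hex codes, account names) is hypothetical.

```python
import itertools
from collections import Counter


class RetweetLinkTracker:
    """Hypothetical scheme: each (re)tweet of a URL gets its own short code,
    so a click reveals which retweet it egressed through."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.code_owner: dict[str, str] = {}   # short code -> account that posted it
        self.code_target: dict[str, str] = {}  # short code -> original URL
        self.clicks_via: Counter = Counter()   # account -> clicks that left via its copy

    def issue_code(self, account: str, url: str) -> str:
        """Create a fresh code for this account's tweet or retweet of url."""
        code = format(next(self._next_id), "x")  # sequential hex, for illustration only
        self.code_owner[code] = account
        self.code_target[code] = url
        return code

    def click(self, code: str) -> str:
        """Attribute the click to the code's owner and return the redirect target."""
        self.clicks_via[self.code_owner[code]] += 1
        return self.code_target[code]


tracker = RetweetLinkTracker()
original = tracker.issue_code("alice", "https://example.com/post")
retweet = tracker.issue_code("bob", "https://example.com/post")
for _ in range(3):
    tracker.click(retweet)
tracker.click(original)
```

Two retweeters with similar follower counts can then be told apart by the clicks that actually flowed through each of them.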

Tracking beyond tweets: t.co tracks the first click. Once the link is expanded, there is no visibility into what happens. Tracking its spread once it leaves the service would require work with the individual sites, likely only practical for the top sites passing through the tweetstream. Tracking information could be automatically added to URLs before shortening, in a format suitable for the site's analytics. For example a utm_medium=tweet parameter could be added to the original URL. There might be some user displeasure at having the URL modified, which would have to be taken into account.
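Adding a parameter such as `utm_medium=tweet` before shortening needs only Python's standard `urllib.parse`. This is a sketch. Keeping an existing value rather than overwriting it is an assumption about what would cause the least user displeasure.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_tracking_param(url: str, key: str = "utm_medium", value: str = "tweet") -> str:
    """Append a tracking parameter, preserving existing query parameters and fragment."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if not any(k == key for k, _ in query):  # don't override a value already present
        query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


tagged = add_tracking_param("https://example.com/a?x=1#top")
```

Round-tripping through `parse_qsl` and `urlencode` can also normalize how existing parameters are encoded (for example `%20` becoming `+`), another way the modified URL might differ visibly from what the user pasted.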

Each of these adds more information to be datamined by publishers. They don't result in user-visible features, and I suspect that as of a couple days ago user-visible features became a far higher priority.


Saturday, July 16, 2011

Billions and Billions

In March, 2010 there were 50 million tweets per day.

In March, 2011 there were 140 million tweets per day.

In May, 2011 there were 155 million tweets per day.

Yesterday, apparently, there were 350 billion tweets per day.

350 million tweets/day would have been an astonishing 2.26x growth in just two months, where previously tweet volume had been increasing by about 3x per year. 350 billion tweets/day is an unbelievable 2258x growth in just two months.

Quite unbelievable. In fact, I don't believe it.

350 billion tweets per day means about 4 million tweets per second. With metadata, each tweet is about 2500 bytes uncompressed. In May 2011 the Tweet firehose was still sent uncompressed, as not all consumers were ready for compression. 4 million tweets per second at 2500 bytes each works out to 80 Gigabits per second. Though it's possible to build networks that fast, I'll assert without proof that it is not possible to build them in two months. Even assuming good compression is now used to get it down to ~200 bytes/tweet, that still works out to an average of 6.4 Gigabits per second. Peak tweet volumes are about 4x average, which means the peak would be about 25 Gigabits per second. 25 Gigabits per second is a lot for modern servers to handle.
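The bandwidth arithmetic above, as a short Python check. The 2500 and 200 bytes/tweet sizes and the 4x peak-to-average ratio are the figures from this post. Without first rounding to 4 million tweets per second, the results come out slightly higher: roughly 81, 6.5, and 26 Gigabits per second.

```python
SECONDS_PER_DAY = 86_400


def firehose_gbps(tweets_per_day: float, bytes_per_tweet: float) -> float:
    """Average bandwidth in gigabits per second for a daily volume and per-tweet size."""
    tweets_per_second = tweets_per_day / SECONDS_PER_DAY
    return tweets_per_second * bytes_per_tweet * 8 / 1e9


claimed = 350e9       # the claimed 350 billion tweets/day
peak_to_average = 4   # peaks run about 4x average

print(f"{claimed / SECONDS_PER_DAY:,.0f} tweets per second")
print(f"uncompressed (2500 B/tweet): {firehose_gbps(claimed, 2500):.1f} Gbps")
compressed = firehose_gbps(claimed, 200)
print(f"compressed (200 B/tweet): {compressed:.1f} Gbps average, "
      f"{peak_to_average * compressed:.1f} Gbps peak")
```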

I think TwitterEng meant to say 350 million tweets per day. That's still a breathtaking growth in the volume of data in just two months, and Twitter should be congratulated for operating the service so smoothly in the face of that growth.

Update: Daniel White and Atul Arora both noted that yesterday's tweet claimed 350 billion tweets delivered per day, where previous announcements have only discussed tweets per day. That probably means 350 billion recipients per day, or the number of tweets times the average fanout.

Update 2: In an interview on July 19, 2011 Twitter CEO Dick Costolo said 1 billion tweets are sent every 5 days, or 200 million tweets per day. This is more in line with previous growth rates.

Wednesday, July 13, 2011

Essayists and Orators

Recently Kevin Rose redirected his eponymous domain to his Google+ profile, reflecting that "G+ gives me more (real-time) feedback and engagement than my blog ever did." Earlier this year Steve Rubel deleted thousands of blog posts from older TypePad and Posterous sites, and started afresh on Tumblr.

Moving the center of one's online presence to "where the action is" is not a new phenomenon. In 2008 Robert Scoble essentially abandoned his own sites in order to spend time on FriendFeed, the hot new social networking site at that time. TechCrunch even attempted an intervention over the move. After the Facebook acquisition of FriendFeed the site gradually decayed through benign neglect. Scobleizer moved on long ago.

Why do this? Surely it's better to own your own domain and control your destiny. Or is it?

Essayists And Orators

In this discussion we'll focus on people who are online for more than just casual interaction or journaling, who have specific goals they are trying to accomplish with their online presence.

Essayists publish thoughtful prose, focussed on a particular topic. Presentation and style are important, but generally secondary to the density of ideas within. The product of their labor comes slowly, and is intended to stand for considerable time.

Orators can also deliver thoughtful ideas and spend considerable time preparing for it, but the dynamics are very different. The pace is faster, the interaction more frequent with less time to consider. The delivery and ideas can be adjusted over time, with each new presentation.

Translated to their online equivalents, I think we can still recognize the Essayist and Orator archetypes based on what they want people to find when they search. The world is a larger place now; when we want to know something outside our own knowledge, we search for it.

For an Essayist, the desired result is a post with thoughts on the topic, linked to their name. For an Orator, the desired result is a conclusion that the orator is knowledgeable about the topic.

For an Essayist it's important to keep material available for people to find, and in a form which links back to the author. Considerable effort has been spent to provide value up front. If someone needs more they can contact the author, who can provide additional help freely or with suitable compensation. Hosting on one's own site allows the linking of authorship to original material, and provides a stable contact point.

For an Orator, it's more important that people find the author's name as someone knowledgeable about the topic. An Orator seeks contact much earlier in the process than an Essayist. They want a followup search to be for their name, to find out how to contact them. This desire for contact earlier in the process implies that the Orator will interact freely on many topics. At some point, if the searcher becomes convinced they can benefit from the Orator's expertise, they may discuss terms for further help.

For an Orator, it's less important to have a stable presence online. The desired result is for someone to seek them out personally, and even if they move from one site to another search engines can be depended on to find their most recent incarnation.

I suspect this categorization paints with too broad a brush, as no one corresponds exactly to either archetype, but I'm finding it useful to consider.

Pictures of Abraham Lincoln and Frederick Douglass.

Lincoln and Douglass pictures courtesy Wikimedia Commons. Both are in the public domain in the United States.


Tuesday, July 5, 2011

On the Naming of Our Relationships

I've never been happy with the use of a single word to describe associations on a social network, whether that word is "friending" or "following." Human relationships have a huge range of possibilities, and we use subtle variations in wording and adjectives to add nuance to our descriptions of them. To take just one example, "godmother," "stepmother," and "mother" all convey a beloved relationship, yet with vastly different levels of parental (and biological) involvement.

On Twitter I've tried to use Lists to broadly categorize those I follow into groups, mostly focussed on topics. An account can have at most twenty lists, so I've tended to be sparing in creating them. As Twitter has recently de-emphasized lists in the UI, I don't anticipate much more development there.

In the past week I've spent a lot of time on the Google+ Field Trial. There are a number of things I like about it, but one favorite is that I get to name the Circles I use. I can choose the terminology to define our relationship, how I see it from my own perspective.

Friends, BFFs, Family, Extended Family circles.

I also use circles to focus on particular topics, or on groups I am associated with. I rarely post to these circles, mostly just read.

FriendFeeders, Googlers, Journalistas, Networking, Scobleizers.

It's very liberating to be able to put a name on one's associations.


Monday, May 30, 2011

Lamenting the Lack of Love for Lists

Twitter introduced Lists about a year and a half ago. When first introduced on twitter.com, Lists were shown in a drop-down menu on the profile page. You could select the dropdown and check off whichever of your lists you'd like to add them to.

New Twitter lists menu item

Some time in the last several days this changed. Lists have moved one level deeper in the menus. Selecting "Add to list" brings up a floating window of checkboxes where list memberships can be changed.

Presumably this means a relatively small percentage of Twitter users made use of the Lists feature, as removing the dedicated icon declutters the UI. It's too bad that use of Lists was not more widespread. Akin to anchor text for web pages, the names of the Lists to which one has been added are an independent signal of the content and quality of an account.
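One way to treat list names like anchor text is to count the words other people chose when they added an account to a list. This Python sketch is purely illustrative, not a Twitter API: it assumes the list names for an account have already been fetched, and splits slug-style names like "networking-geeks" into words.

```python
import re
from collections import Counter


def list_name_signal(list_names: list[str], top_n: int = 3) -> list[tuple[str, int]]:
    """Hypothetical: count the words other users chose when naming lists that include
    an account, the way anchor text describes the page it links to."""
    words: Counter = Counter()
    for name in list_names:
        # List names are often slugs like "networking-geeks"; split on anything but letters.
        words.update(w for w in re.split(r"[^a-z]+", name.lower()) if w)
    return words.most_common(top_n)


signal = list_name_signal(["networking", "Networking-Geeks", "sdn", "geeks", "openflow"], top_n=2)
```

Because each list is named by someone other than the account owner, the aggregate is harder for the owner to game than a self-written bio.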

Sunday, May 22, 2011

Tweetbots Need Exercise Too

Tweetbots repeating the same tweet over and over

Search Twitter for the phrase "Right, off to the gym and to listen to the Packetpushers Openflow podcast".

I'll wait.

Note the large number of results, at least as of May 22, 2011. Apparently Twitter is just chock-full of exercise nuts who listen to techie networking content while working out. Yet it's odd that all of them use the exact same phrase. It is also odd that a couple of accounts use the same photo.

Of course, it's not odd at all: these are all bots. Twitter bots started out very simple, harvesting a random selection of tweets from the stream and using them as their own. They've evolved and become more believable, harvesting tweets of a particular theme. In this case, they selected tweets via exercise-related keywords like "gym" and "workout." That they happened to pick up a highly unusual topic is just dumb luck.

If you examine the tweetstream of any individual bot, it's quite believable. They come across as an exercise-obsessed but otherwise normal person. The machine algorithms still fall prey to silly things like a tweet about getting up in the morning followed shortly thereafter by going to bed, but on the whole this crop of bots has advanced considerably since the last time I looked into it. The game is definitely afoot.
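The telltale sign here was identical wording across many unrelated accounts. A minimal Python sketch of that heuristic groups tweets by normalized text and flags phrases posted by several distinct accounts. The threshold of three accounts is an arbitrary assumption, and a real detector would need more signals, since people also copy text when retweeting manually.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies still match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def find_copied_phrases(tweets: list[tuple[str, str]], min_accounts: int = 3) -> dict[str, set[str]]:
    """Given (account, text) pairs, return each normalized text posted by
    at least min_accounts distinct accounts."""
    accounts_by_text = defaultdict(set)
    for account, text in tweets:
        accounts_by_text[normalize(text)].add(account)
    return {text: accounts for text, accounts in accounts_by_text.items()
            if len(accounts) >= min_accounts}


phrase = "Right, off to the gym and to listen to the Packetpushers Openflow podcast"
sample = [
    ("bot_a", phrase),
    ("bot_b", phrase + "  "),
    ("bot_c", phrase.upper()),
    ("person", "Leg day today, then a long nap"),
]
suspicious = find_copied_phrases(sample)
```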

One final note: the podcast these bots mention is Packetpushers show 40, and if you are interested in techie networking content it is a good one to listen to. Perhaps you can listen while working out.

Monday, May 9, 2011

RSS Bucket Dipped Into the Stream

Yesterday Jesse Stay observed that Twitter and Facebook have both discontinued RSS feeds from various parts of their service. Later that day he commented on the lack of blog reactions. So here you go. Some thoughts, expanded from a comment I left on the original article.

A social service is able to offer a better experience from knowing who their users are and what they are reading. Learning the users' interests allows the site to suggest related material, and also to target advertising to the specific person. Content publishers in turn can get data about who finds their material interesting: not necessarily identifying the individuals, but detailed demographics and related interests.

RSS doesn't fit into that world. The entity fetching the RSS feed is often not associated with an individual user at all, instead being an aggregator or other bit of infrastructure somewhere. Once syndicated via RSS, the originating service loses visibility into who accesses it. The aggregator might report a total number of readers, but not the same rich detail which the service would get natively. Users on RSS are thus far less valuable than users who come to the site, or tools which use the site's (authenticated) APIs. For the originator, tracked activity is much preferred over anonymous content consumption. Paywalls are another symptom of the same underlying phenomenon: anonymous content consumption isn't working for the publishers.

Sunday, May 8, 2011

On the Nature of Premium Accounts

For almost as long as it has existed, people have speculated about premium Twitter accounts as a way to monetize the service. Thus far, no such offering has appeared. Disappointment that premium accounts have never materialized was quite eloquently voiced by Suw Charman-Anderson.

I believe there are two fundamentally different models for premium accounts: either as power users of the free service, or as consumers of the data produced by the free users. Let's consider an example of each.

Evernote premium accounts get an enhanced version of the free service. They can upload and index additional file types, and have added features for collaboration.
LinkedIn premium accounts access a different type of service. They can examine everyone's connections, not just their own network. It is aimed at recruiters, salespeople, analysts, etc., looking for a contact rather than someone they personally know.

Twitter already offers paid access to its firehose of data, both directly and via Gnip. A number of brand tracking, reputation measurement, and sentiment analysis tools use this firehose. Twitter is already well down the path of offering datamining services, but has yet to introduce added features for individual users. Why not? Speculating about some of the premium features which might be offered, and attempting to analyze the impact, is illuminating.

Longer search history: Twitter search goes back only a handful of days. So far as I can tell, it searches the volume of tweets which will fit into RAM across a reasonable number of servers. Searching a much larger volume of tweets would call for a different architecture, possibly involving databases on disk and a vastly larger pool of servers to handle the load.
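The guess above is that search covers whatever fits in RAM. As a hypothetical illustration of that design, here is a single-process Python inverted index that keeps only the most recent `capacity` tweets and evicts the oldest as new ones arrive. The class name, whitespace tokenization, and capacity are assumptions; a real deployment would shard this across many servers.

```python
from collections import defaultdict, deque


class RecentTweetIndex:
    """Hypothetical in-RAM inverted index bounded by tweet count rather than by time."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.window = deque()              # (tweet_id, terms), oldest first
        self.postings = defaultdict(set)   # term -> IDs of tweets containing it

    def add(self, tweet_id: int, text: str) -> None:
        terms = set(text.lower().split())
        self.window.append((tweet_id, terms))
        for term in terms:
            self.postings[term].add(tweet_id)
        if len(self.window) > self.capacity:
            old_id, old_terms = self.window.popleft()  # evict the oldest tweet
            for term in old_terms:
                self.postings[term].discard(old_id)
                if not self.postings[term]:
                    del self.postings[term]

    def search(self, query: str) -> list[int]:
        """Return IDs of tweets containing every query term, newest first."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits, reverse=True)


index = RecentTweetIndex(capacity=2)
index.add(1, "iOS5 launch day")
index.add(2, "firehose volume up")
index.add(3, "iOS5 signups up")
```

Once tweet 1 is evicted it simply stops matching, which is the "only a handful of days" behavior; extending history means either more RAM across more servers or a disk-backed design.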

Might Twitter offer enhanced search as a paid service, stretching much further back in time and supporting additional search operations? That seems likely, but I would point out that even today search is not tied to your account. The search is of the public tweetstream, with no biasing toward those you follow. If Twitter offers an enhanced search product it could do so as a datamining feature, not tied to a premium account.

Analytics: How many people clicked on a t.co link? How many people saw a tweet (defined as their client actually fetching it)? These features appear to be tied to an account, and good material for premium features, but consider what people willing to pay for it are really trying to do: measure the effectiveness in spreading an idea, a brand, a celebrity name, etc. Knowing how many people saw their own tweet isn't enough: they need to know about retweets, and even quoted paraphrases of their tweet. Knowing how many times their own t.co link was clicked isn't enough: they want to know how many times any link to their URL was clicked on.

If you're trying to measure effectiveness, analyzing just one account isn't enough. The demand for analytics is primarily a datamining feature.

Group Messaging: Would Twitter offer a service which allowed premium accounts to send group DMs? Meaning, send a message which can be seen by multiple participants but is not publicly searchable. Presumably this would be tied in with Twitter lists. The existence of Beluga, GroupMe, Kik, etc. implies there is a demand for such a service not filled by existing tools like email.

In terms of Twitter's business, the downside of a group messaging facility is that it reduces the value of the datamining service. If taking a conversation off the record is simple, people will use it. Influencers with many contacts are perhaps even more likely to use it, and that is data which Twitter wants to be part of the zeitgeist firehose they offer.

Other features: People ask for the ability to retrieve more than the most recent 3200 of their own tweets, for higher hourly rate limits, etc. It would be quite possible to offer premium accounts with substantially higher limits. Yet consider the reaction once such accounts are available: these are features which the free accounts have, but which are artificially limited. Lifting the limit doesn't feel like a premium feature, it feels like extortion. Lifting limits isn't a good basis for a premium account, you need a strong core feature set.

A Conclusion

Offering both premium features for individual accounts and datamining services over the tweetstream is difficult, as the two are often in conflict. Individual users want to maximize their own effectiveness and, quite frankly, reserve the benefits of their use of the service to themselves by restricting access to their tweets. Removing data from the public stream reduces the value of the firehose. I suspect this is the reason Twitter has not offered such accounts, as datamining the firehose is held to be more valuable. Offering premium accounts would inevitably bring pressure to offer the features which damage the value of the firehose.

Thursday, March 10, 2011

Experiential versus Informational Search

Yesterday Louis Gray posed a set of questions which seem like they should be easy to answer, but aren't. While reading it, one of the questions stood out, for obvious reasons.

"2. When was the first time Denton Gentry left a comment on my blog?"

Louis had sent that question via email earlier in the day, and it turned out to be very difficult to answer. My profile on disqus.com shows comments going back to July 2009. That should be definitive, but unfortunately isn't correct as manual checking had already turned up earlier comments. In the end Louis answered his own question by searching his email for Disqus notifications. It was a full year before the first comment shown on my profile page. The other questions were similar: find the first citation. From a technical perspective, it should be easy to answer questions like this as all of the information is available. That it isn't easy is a reflection of economic reality. There is infrequent demand for it.

I'd like to flip it around, though: why is email able to answer questions like this? You can search email and sort it by date. You can find emails around a particular time. You can find emails which happened at about the same time as some other event which is unrelated, but intertwined in your memories. Why is email structured this way?
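Louis's workaround, searching his email for Disqus notifications, leans on exactly these date-ordered operations. Below is a minimal Python sketch of two of them: find the oldest message matching a keyword, and list messages within a window around some remembered moment. The `Email` record and its three fields are hypothetical simplifications; real mail stores carry far more metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Email:
    # Hypothetical minimal message record: send time, sender, subject only.
    sent: datetime
    sender: str
    subject: str


def earliest_matching(emails, keyword):
    """Return the oldest email whose subject contains keyword (case-insensitive), or None."""
    matches = [e for e in emails if keyword.lower() in e.subject.lower()]
    return min(matches, key=lambda e: e.sent, default=None)


def around(emails, moment, window=timedelta(days=3)):
    """Return emails sent within +/- window of moment, oldest first."""
    nearby = [e for e in emails if abs(e.sent - moment) <= window]
    return sorted(nearby, key=lambda e: e.sent)
```

Nothing here is clever; the point is that a mailbox naturally supports "oldest first" and "what else happened then", while most Informational datasets are not indexed that way.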

I suspect this is a reflection of human psychology. Email is information which we personally experienced. It exists in our own memories, albeit dimly or imperfectly. When we go to search for it, we're searching as an extension of our own memory. It's Experiential search, not Informational, and email services which don't match our expectations in this regard get less traction. This is also what makes services like Evernote so useful, letting us organize and search arbitrary information Experientially.

In comparison when searching for something we never personally experienced we're looking for information which we know must exist, and we just need to find it. Search engines are designed to this expectation.

The disconnect occurs when we want an Experiential search over an Informational dataset. Organizing arbitrary information in a way which maps to what we'd expect had we personally experienced it is an unsolved problem. It has been a rich field of speculation in science fiction, as authors have postulated implanted memories and neural interface.

Will there be developments in this area? Clearly there is at least some demand, as LexisNexis can answer such queries for the subset of publications they handle. It's something we'll need to work on if we're going to make the world even more like science fiction.

Sunday, February 20, 2011

From a Lawnchair Overlooking the Bot War

In the last several days I've noticed a large increase in the rate of bots following me on twitter. It went from perhaps one a day to dozens. The bots are following 100-200 people, and have always sent zero tweets. They sometimes have an avatar picture, and their bio always sounds convincing. The avatars and bios are probably harvested from real Twitter accounts, and sometimes they get the default avatar. They appear to avoid any bios with a link.

The email from Twitter tells you what client was used to follow you. They have cycled through various third party clients, and then Twitter's mobile web page. Most recently the email contains no mention of the client, and I don't know what that means.

What is interesting is that this influx of follow bots never appears to send any tweets. An entirely different herd of bots has started spamming via @-replies, with a link purporting to offer a free iPhone. The bots which send the spam follow zero people.

I wonder: are the botnets now fielding offensive and defensive teams? By this I mean using the offense to send spam, and watch for Twitter users who block them. They can check whether the bot can still see the tweets of the users it has spammed. Users who react by blocking are likely reporting the bot for spam, and can themselves be targeted by the defense. The defense has never sent any spam, and can report legitimate users to try to get their accounts suspended.

Wednesday, December 29, 2010

OneTrueFan Observation

OneTrueFan is a service to track web history, allowing web users to see what sites they spend the most time visiting. It also allows site operators to see their most active users, though only amongst those who have signed up for the service. You can read more about OneTrueFan here.

Users accumulate points for a site by visiting, sharing links, and other activities. OneTrueFan rate limits point increases amongst players to one point every few seconds. I have no way to tell if this was intended to discourage gaming of the system, or is a coalescing mechanism for scalability which batches site hits every few seconds. Nonetheless for any site with a large collection of pages, at present it is relatively easy to take the OneTrueFan title.
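The post can't tell whether the limit is anti-gaming or batching. Here is a minimal Python sketch of the anti-gaming interpretation only: at most one point per (user, site) pair per interval. The class name, the keying, and the 5-second interval (standing in for "every few seconds") are all assumptions, not OneTrueFan's actual implementation. The clock is injectable so the behavior can be checked without sleeping.

```python
import time
from collections import defaultdict


class PointThrottle:
    """Award at most one point per (user, site) pair per `interval` seconds.

    Hypothetical sketch; interval and keying are assumptions.
    """

    def __init__(self, interval=5.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.points = defaultdict(int)  # (user, site) -> accumulated points
        self.last_award = {}            # (user, site) -> time of last point

    def record_visit(self, user, site):
        """Return True if this visit earned a point, False if it was throttled."""
        key = (user, site)
        now = self.clock()
        last = self.last_award.get(key)
        if last is not None and now - last < self.interval:
            return False  # too soon since the last point for this pair
        self.last_award[key] = now
        self.points[key] += 1
        return True
```

A throttle like this caps the rate but not the total. On a site with many pages, anyone who keeps moving from page to page still earns points steadily (at a 5-second interval, up to 720 an hour), which fits the observation that the top spot is easy to take.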

Screenshot of louisgray.com showing me as the One True Fan

My list of shared items

Users who have installed the OTF browser extension see the web bar on every site they visit. Site owners can also install the web bar on their pages, making it visible to all visitors whether they use the browser extension or not. Hovering over the pictures in the web bar shows a list of recently shared links. My list of shared links is relatively tame. It's not difficult to imagine links to WoW Gold sales, or porn sites, or any of the other innumerable schemes which spam is used to peddle. The potential for mischief is there.

It is possible that OneTrueFan already has effective spam controls, focussed on the shared links rather than on obtaining the top spot in the fan list. I did not create a profile with spam links to check, nor do I intend to. In any case I expect they realize the importance of effective spam controls for a service which inserts content into other websites, and need to continue to focus on it.

Update: In the comments Eric Marcoullier (co-founder and co-CEO of OneTrueFan) described the current spam prevention tools in the service, and discussed some plans for the future.

Wednesday, September 29, 2010

Twitter URL Search No Worky?

Something is wonky with Twitter's search function. In May 2010 Twitter search started returning results for keywords in the original, unshortened links passing through the service. Were http://example.com/booga to be shortened and tweeted, searches for "example" and "booga" would turn up the tweet.
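To make the behavior concrete, here is a minimal Python sketch of indexing a tweet under keywords from its expanded URL. The tokenization rules, the stop-word list, and the function names are assumptions for illustration; Twitter never documented how it split URLs.

```python
import re
from urllib.parse import urlsplit

# Hypothetical low-value tokens to skip; an assumption, not Twitter's list.
STOP = {"", "www", "com", "org", "net", "http", "https"}


def url_search_terms(url):
    """Split an expanded URL's host, path, and query into lowercase keywords."""
    parts = urlsplit(url)
    text = " ".join([parts.hostname or "", parts.path, parts.query])
    tokens = re.split(r"[^a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP]


def index_tweet(index, tweet_id, expanded_urls):
    """Add tweet_id to the posting set of every keyword in its expanded URLs."""
    for url in expanded_urls:
        for term in url_search_terms(url):
            index.setdefault(term, set()).add(tweet_id)
```

With this scheme `url_search_terms("http://example.com/booga")` yields `["example", "booga"]`. The partial failures described below, where the host went unindexed but the path still matched, would correspond to dropping the hostname from `text`.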

At some point in late September 2010 this seems to have changed. Now the original URL is generally not matched, with the possible exceptions of Top Tweets, Promoted Tweets, and Twitter's own t.co shortener. The change seems to have ramped in slowly, as even three days ago I was seeing some shortened URLs turn up in search results, while others did not. There were also a couple of examples of the host portion like example.com not being indexed, but the rest of the URL triggering search results. I thought it was a glitch, but as of 9/28 it seems consistent: URLs behind third party shorteners are not being indexed.

It's possible this is just a glitch due to traffic growth. It could be a normal adjustment to the Twitter service, attempting to tweak search results for better relevance. It could be a rather bold incentive to use Twitter's URL shortener. However I cannot help but notice that it also makes room for features relating to brand management. Searching for something specific might use the regular search service, while monitoring for all links pointing to a site could be a service from a different, premium system.

Update: Apparently it's just me. Tweets sent from @dgentry containing shortened links do not appear in search results for keywords in the original URL, starting late September. For example, the following tweets did not appear in search results for codingrelic or geekhold: 1 2 3 4 5 6 7 8. I also noticed that this tweet did not appear in a search for "ifixit." I suspect that this affects any tweet I've sent since mid-September, but I didn't start checking all of them until I realized there was a problem.

Friday, August 13, 2010

Bury Brigades as the Future of Media?

I'm currently reading Cognitive Surplus, by Clay Shirky. It builds upon his earlier Here Comes Everybody, detailing how the Internet fundamentally changes the media landscape to an extent not seen since Gutenberg. Before the Internet, when the cost of distribution was non-trivial, you ended up with publishers, producers, TV networks, and a whole host of powerful institutions built upon managing the production. When the cost of distributing media drops to essentially nothing, when everybody who wants to can become a publisher without having to ask permission or convince anybody of the value of their work, it completely disrupts the models which evolved in the prior era. A lot more material will be produced. Much of it will be trash, as we've moved the filtering function away from an editor before publication and onto the audience after publication.

Something will evolve to fill an institutional role in the New Media. The current period of creative chaos is unlikely to continue forever. A portion of the population is willing to wade through the trash in order to surface the truly great, but only a small portion. The rest of us need some filtering, or curation as the cool kids seem to call it.

Warning: Speculation Ahead

Are Digg Bury Brigades early precursors to a form of New Media institution? Organized groups, loosely connected by shared interests but not centrally funded or managed, they influence the spread of material online and therefore gain some control over media distribution. Bury Brigades are negative filters, suppressing material they don't agree with rather than surfacing material they want to promote. There will equally be a role for positive filters, entities which seek out and promote material. Motivation for groups to organize as positive filters is less clear, as simple altruism and a desire for recognition only go so far.

Tuesday, July 6, 2010

The New Intelligence Agency: All of Us

Yesterday Louis Gray pieced together vague snippets of information from tweets made by the founders and investors of Foursquare and Brizzly to speculate that Foursquare was negotiating to buy Brizzly. Later that day the speculation was denied by all parties involved. To (loosely) quote Mandy Rice-Davies: "They would deny it, wouldn't they?" ... I don't think this story is over.

Nonetheless the details of the speculated transaction are not our topic today. Instead, I'd like to consider the process which led to it. For decades government (and sometimes corporate) intelligence operations have had access to reams of communications data from which to make inferences. They could see who was calling whom, where letters and packages were being delivered, and know people's movements to some extent via airline manifests. Intelligence agencies are famous for collecting massive amounts of information and using algorithms to look for patterns, to be followed up by a human analyst.

We're rapidly moving into a world where a significant amount of that information is available to anyone who cares to look for it. We're using social networks which broadcast our updates publicly, either deliberately or because we don't understand the privacy settings. We're rapidly integrating location data into online applications, which people willingly share if they see a benefit from it. As the tweets Louis quoted show, people also love making coy hints about their dealings, secure in the knowledge that nobody will figure out such a vague hint. Yet given enough vague data, particularly if one is aware of existing connections between the participants, correlations can be found. Certainly there will be false positives, but there will also be some real gems.

Systematic data mining of social networks, both their contents and the metadata they contain, in order to gain competitive advantage has enormous implications. It apparently is already happening: military raids have been cancelled due to leaks on social networks, showing that government agencies are concerned about the possibility. For the most part it won't be reported on, and will become just another part of the Internet underpinnings.

Friday, May 14, 2010

Uncanny Friending

There is an urban legend that Eskimos have many different words for snow. The truth is the Eskimo-Aleut languages have about as many words for snow as does English, but allow descriptive suffixes to be attached to any word to form countless variations.

Consider the English words we use to describe human relationships, and the distinctions they convey in meaning:

sister	stepsister	half sister
significant other	fiancée	spouse
friend	just friends	friend with benefits
peer	coworker	colleague
mother	stepmother	godmother

We use adjectives to add huge amounts of information in a single word. "fiancée" conveys one meaning, that of a beloved person. "current fiancée" conveys an entirely different meaning, a disposable relationship given a label for convenience.

Now consider the words we use to describe relationships in social networks:

friend	friend	friend
friend	friend	friend
friend	friend	friend
friend	friend	friend
friend	friend	friend

Why do we find this unsatisfying? I believe it is a corollary to the Uncanny Valley effect in robotics and computer games: "friend" is close enough to the real description of the human relationship that we find it unsettling. If the term were more inhuman, less shaded with meaning, it would not be so maddening.

The term "like" has a similar problem: who wants to like something unpleasant or unsavory? Clicking "like" is meant to express interest, but the terminology is close enough to the real intention to be maddeningly imprecise.

I also suspect this vaguely unsettling feeling will resolve itself in a few more years online: the words friend and like will simply lose all meaning. We'll know this has been achieved when people stop using air quotes to distinguish online friending versus real life friends.

The genesis of this musing came via an insightful tweet by Marshall Kirkpatrick:

told my wife that google "results from your social circle" showed me because we are friends. she insists we are more than that. true :) (Marshall Kirkpatrick, @marshallk)