Monday, March 26, 2012

We needed a better Twitter, so we made one for ourselves

Yeah! We are out of our minds, or maybe not. We felt that a lot of things could be improved in Twitter, or in micro-blogging for that matter. So here is what we wanted to improve in a micro-blogging service:

Posting:

Give me some more characters, darn it. Have you ever felt that you just need an extra 10, 20 or so characters in your tweet? If so, read on.

Why the hell do I need to go to another service to shorten my URL? That's pretty basic. A micro-blogging site should do it automatically for you.

Following:

Why not follow my interests - in a wholesome way? Following people gives me a lot of sh*t, and I have to wait for my meat. I will jump into a conversation if I find it interesting. One more thing about interests: they are not always people, but also tags. Some tags are interesting enough to follow and get a stream from.

Tagging:

This brings me to tagging. Tags are metadata, not data. Please don't clutter my post with #FOO and #BAR. I should be able to add tags separately.

Can you help by suggesting some tags for a link? We wanted an auto-tagging system. It need not be perfect (as long as I can remove suggested tags which do not make sense), but give me something.

All this can be done. We needed it for ourselves, so we did it. If you are interested, take a look at: ScoopSpot.

Friday, March 23, 2012

How we made a social news service with machine learning from scratch


How do you get to the news that interests you on Hacker News, TechCrunch, The Next Web or ReadWriteWeb? I mean, when you go to a news website, do you read all the articles, or do you scan the article titles and try to guess which ones to read based on your interests?

In my case, I have noticed that I look for the news which interests me, like startup news or Java and Ruby programming language news. So it raises the question: can I build a system which scans through all the news, classifies it, indexes all the relevant terms and serves it to me, so that I get a stream of news which I would like to know and read about?

So we (my friend and I) started building a system which, when given a website link, does the following:

It tries to get the relevant content from the page.

It uses natural language processing to understand a bunch of things, like sentences, parts of speech etc. We score the relevance of each word in the article.

We also run classification and clustering on the article, with a bunch of statistical methods, to find out which category it belongs to: for example, is it a technical, scientific or law-related article?
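The post does not spell out which NLP or statistical methods are used, so the following is only a rough sketch of the word-scoring idea, assuming plain term frequency as the relevance score. The stop-word list and the function name `suggest_tags` are made up for illustration.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {
    "a", "an", "and", "the", "is", "in", "of", "to",
    "for", "on", "with", "it", "that", "this",
}

def suggest_tags(text, max_tags=3):
    """Return the most frequent non-stop-words in `text` as candidate tags."""
    # Lowercase the text and pull out word-like tokens.
    words = re.findall(r"[a-z][a-z0-9+#]*", text.lower())
    # Count only words that are not stop words and are longer than two letters.
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(max_tags)]

article = (
    "Ruby is a programming language. The Ruby community builds startup tools, "
    "and every startup in the Ruby world loves Ruby."
)
tags = suggest_tags(article, max_tags=2)
print(tags)
```

Raw frequency is about the simplest relevance score there is; it stands in here for whatever scoring the real system does.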

Once we had the meat of the document, we tagged it with the terms identified in the steps above. Now we faced the challenge of how to make it available to ourselves. We needed a web site which would let us see the content as a stream. We should also be able to customize the stream to the user's choice. So we thought of creating a micro-blogging site; after all, we just needed to put up a post with the article title, URL and tags. Here is how a post looks today in the stream.

If you would like to see a full stream:



As you can see here, I am seeing news tagged with startup and java, because I am following news related to java and startup.
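Following tags instead of people boils down to a filter over posts. Here is a minimal sketch, assuming each post is a dictionary with a `tags` list; the `build_stream` name and the sample posts are invented for the example and are not ScoopSpot's actual code.

```python
def build_stream(posts, followed_tags):
    """Keep the posts that carry at least one of the followed tags."""
    followed = {tag.lower() for tag in followed_tags}
    # Compare tags case-insensitively so "Java" and "java" match.
    return [
        post for post in posts
        if followed & {tag.lower() for tag in post["tags"]}
    ]

# Made-up sample posts, only for illustration.
posts = [
    {"title": "New JVM release", "tags": ["Java"]},
    {"title": "Rails tips", "tags": ["Ruby"]},
    {"title": "Seed round announced", "tags": ["Startup", "Funding"]},
]
stream = build_stream(posts, ["java", "startup"])
titles = [post["title"] for post in stream]
print(titles)
```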

We noticed that we needed more room than Twitter's 140 characters, so we increased the character limit to 300. One more choice we made is adding tags separately from the actual post. This allows us to add as many tags as we want. We feel tags are more like meta information, so one should not be limited to hash tags.
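To make these two choices concrete, here is a small sketch of what such a post could look like as a data structure. The `Post` class and its field names are assumptions for illustration; only the 300-character limit and the idea of keeping tags outside the text come from the description above.

```python
from dataclasses import dataclass, field

MAX_POST_LENGTH = 300  # the character limit mentioned above

@dataclass
class Post:
    text: str
    url: str = ""
    # Tags live next to the text, not inside it, so they never
    # count against the character limit.
    tags: list = field(default_factory=list)

    def __post_init__(self):
        if len(self.text) > MAX_POST_LENGTH:
            raise ValueError(
                f"post is {len(self.text)} characters, limit is {MAX_POST_LENGTH}"
            )

post = Post(
    text="How we made a social news service",
    url="http://www.scoopspot.com",
    tags=["Startup", "Java", "Machine Learning"],
)
print(len(post.text), post.tags)
```

Because the tags are a separate field, adding ten more of them leaves the text length untouched.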

Once we had such a system, we needed a way to get the latest news from Hacker News, TechCrunch, The Next Web etc. RSS feeds came to our rescue. These websites provide RSS feeds, which we read periodically. We created an account called news in ScoopSpot which does the posting on our micro-blogging site. If you would like to see news in action, visit: http://news.scoopspot.com/.

So now, if a user comes to our web site and posts something with an article link, we are able to auto-tag it.
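Reading a feed periodically mostly means parsing the items and skipping the ones already posted. The sketch below assumes a plain RSS 2.0 feed and uses an inline sample instead of a real download; the function names and the `seen_links` set are hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed; in practice the XML would be downloaded from the site.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First story</title><link>http://example.com/1</link></item>
    <item><title>Second story</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

def parse_items(feed_xml):
    """Return (title, link) pairs for every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]

def new_items(items, seen_links):
    """Return items whose link has not been seen yet, and remember them."""
    fresh = [(title, link) for title, link in items if link not in seen_links]
    seen_links.update(link for _, link in fresh)
    return fresh

seen = set()
first_run = new_items(parse_items(SAMPLE_FEED), seen)
second_run = new_items(parse_items(SAMPLE_FEED), seen)
print(first_run, second_run)
```

Calling `new_items` on a timer gives the periodic behaviour: each run only returns stories that were not posted before.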

Here are some links to tag-based pages:

Ruby: http://www.scoopspot.com/Ruby

Startup: http://www.scoopspot.com/Startups

Steve Jobs: http://www.scoopspot.com/Steve_Jobs

If you are interested in trying it out, you can visit: www.scoopspot.com