Wednesday, August 10, 2016

Best of July



I found the blog "The Morning Paper", and it is very good. It ran a series of papers about chatbot development. The blog changes topic often, since it reviews papers from current conferences, but many posts are interesting even when they are not about chatbots. Anyway, here is the selection of "chatbot" papers:


Chatbots are not good enough (yet) to replace humans, so why not automate the human workforce instead? Scale (http://www.scaleapi.com/) is a startup that wants to do this: it lets you get short tasks done by human workers through API calls.

And finally, a marketing-related infographic from Kissmetrics that I came across during the month, on how to calculate customer lifetime value:

How To Calculate Lifetime Value – The Infographic
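To give a feel for the arithmetic involved, here is a minimal sketch of one common simplified lifetime value formula: average purchase value times purchases per year times years as a customer. This is an assumption for illustration, not necessarily the formula used in the infographic, and the function name and the numbers are made up.

```python
def simple_lifetime_value(avg_purchase_value, purchases_per_year, years_as_customer):
    """Simplified customer lifetime value: total revenue expected from one customer.

    Assumed formula (not taken from the infographic):
    average purchase value * purchases per year * years as a customer.
    """
    return avg_purchase_value * purchases_per_year * years_as_customer


# Hypothetical customer: spends 25 per purchase, buys 12 times a year, stays 5 years.
print(simple_lifetime_value(25.0, 12, 5))
```

More detailed variants also account for profit margin, retention rate, and discounting, which this sketch leaves out.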

Wednesday, July 13, 2016

A more comprehensive list of conversational datasets

The list is just a copy of the datasets listed in A survey of available corpora for building data-driven dialogue systems (Serban et al. 2015). All credit goes to them; I simply tried to make it more accessible. I couldn't find the download location for every one of them, so if you see a '?' as the URL and you know how to get the data, please write a comment about it.

(the same Google Doc as a link)

Monday, July 4, 2016

Best in June

Since I posted a video about "hyper reality" last month, it fits that I also found an article about "hyper personalization". For me this is just personalization, but whatever we call it, I agree that there is a move in this direction:
Focus on the 3 important 'W's of hyper-personalization:
  • Who you personalize
  • What you personalize and
  • Where you personalize.
Anyway, let's switch back to my favorite topic these days, chatbots :) I found a page that is not exactly about chatbots, but it shows how interesting this simple chat interface can be. It is a weekend project that posts interviews as one-to-one chat dialogs. Check out Talk turkey!

And finally I found a pretty comprehensive list of dialog datasets in this article: A survey of available corpora for building data-driven dialogue systems (Serban et al. 2015). In case you don't want to read the whole 46 pages (btw it's probably worth it if you are interested in developing chatbots): I'm still processing the list, but I will make it available in a separate post soon.

Strata + Hadoop World London took place in June. If you missed the keynote from Stuart Russell, I recommend watching it. He talks about the future of (artificial) intelligence.

#FUN: You can also check out Project Murphy (http://www.projectmurphy.net/), Microsoft's AI bot:
With the Bot Framework you can add the bot on Skype, Messenger, Telegram, etc. and ask it all life's most important questions such as: "What if Charlie Chaplin was a baby?" or "What if Beethoven was a rockstar!" Project Murphy then uses artificial intelligence to answer these questions by combining the subject's face with the object of interest, e.g. a baby's face smartly added on top of Charlie Chaplin's face.

Thursday, June 16, 2016

Best in May

In May there were just a couple of stories that grabbed my attention. Here they are:



And here is a movie about augmented reality. As I work on personalization in the retail space, it was particularly interesting for me.

Sunday, May 1, 2016

Best in April

Still chatbots and more. Sundar Pichai said this last week on Alphabet’s earnings call:
In the long run, I think we will evolve in computing from a mobile-first to an AI-first world.

Reinforcement learning toolkit for Python: OpenAI Gym

Speech KITT makes it easy to add a GUI to sites using Speech Recognition. Speech KITT provides a graphical interface for the user to start or stop Speech Recognition and see its current status. It can also help guide the user on how to interact with your site using their voice, providing instructions and sample commands.
Rod Humble had previously spent three years as the CEO of Linden Lab, and before that worked on the Sims franchise as an EVP at Electronic Arts. In June, an automated conversation company called PullString (formerly ToyTalk) hired him to create a new series of games for Facebook Messenger called Humani. Here is a nice article about this game: Humani: Jessie's Story.

Tuesday, April 19, 2016

Chatbot craze - deep learning for chatbots intro

I'm currently taking the Udacity Deep Learning course (I really like it :) and I found an article: Deep Learning for Chatbots, Part 1 – Introduction. Exactly the topic I'm really interested in.

It's a nice introduction, and it sheds some light on the capabilities of state-of-the-art systems. E.g.: "However, we’re still at the early stages of building generative models that work reasonably well. Production systems are more likely to be retrieval-based for now."

I also found interesting references about incorporating context into generative models: "Experiments in Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models and Attention with Intention for a Neural Network Conversation Model both go into that direction."

Sunday, April 17, 2016

Question Answering datasets

To extend the list of conversational datasets, here is a collection of Question Answering (QA) datasets. A question-answer pair is a very short conversation, which can also be used to train chatbots. If you want to use a chatbot to give information to customers, like automated customer support or an automated sales agent on your website, this type of dataset can be particularly useful.
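To illustrate how question-answer pairs can drive a very simple retrieval-based bot, here is a minimal sketch: it returns the stored answer whose question shares the most words with the user's question. The word-overlap (Jaccard) scoring is just a stand-in chosen to keep the example short, not a method from any of the datasets, and the support pairs and function names are invented.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def answer(question, qa_pairs):
    """Return the stored answer whose question overlaps most with the input."""
    query = tokenize(question)
    _, best_answer = max(
        qa_pairs, key=lambda pair: jaccard(query, tokenize(pair[0]))
    )
    return best_answer


# Hypothetical customer-support pairs, invented for this example.
QA_PAIRS = [
    ("What are your opening hours?", "We are open 9am to 5pm, Monday to Friday."),
    ("How can I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("Do you ship internationally?", "Yes, we ship to most countries."),
]

print(answer("I forgot my password, how do I reset it?", QA_PAIRS))
```

A real system would replace the word-overlap score with something stronger (TF-IDF or learned embeddings), but the data it needs has the same shape: a list of question-answer pairs.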

The WikiQA corpus is a new publicly available set of question and sentence pairs, collected and annotated for research on open-domain question answering.

TREC (Text REtrieval Conference) usually has a QA task with datasets associated with it. Most of the datasets focus on the factoid QA task, but the one in 2015 is a kind of live QA: the task was to answer questions on Yahoo Answers.

Manually-generated factoid question/answer pairs with difficulty ratings from Wikipedia articles. Dataset includes articles, questions, and answers.

Yahoo also provides some manually curated QA datasets from Yahoo Answers.

You can also download the Stack Overflow questions and answers. It's a domain-specific but huge dataset.

Saturday, April 9, 2016

Conversational datasets to train a chatbot

In the last two months I have read a lot about chatbots, which awakened in me the desire to develop my own. And of course the trendiest approach is some form of deep learning. That's why, as a first step, I decided to collect the available conversation datasets, which are definitely needed for training. Here is the list of English conversation datasets I found (if you know of more, please leave a comment):

Data collected from Twitter (by Chenhao Tan):

  • Argument trees, "successful persuasion" metadata, and related data from the subreddit ChangeMyView. First release 2016.
  • Multi-community engagement (users posting, or not posting, in different subreddits since Reddit's inception). Data includes the texts of posts made and associated metadata, such as the subreddit, the number of upvotes, and the time stamp. First release 2015.
  • Cornell natural-experiment tweet pairs: data for investigating whether phrasing affects message propagation, controlling for user and topic. The zip file can be retrieved from the given URL (first release 2014).
  • Supreme Court dialogs corpus: conversations and metadata (such as vote outcomes) from oral arguments before the US Supreme Court (first release 2012)
  • Wikipedia editor conversations corpus: zip file can be retrieved from the page I've linked to (first release 2012)
  • Cornell movie-dialogs corpus: conversations and metadata (IMDB rating, genre, character gender, etc.) from movie scripts (first release 2011). This corpus contains a large metadata-rich collection of fictional conversations extracted from raw movie scripts: 220,579 conversational exchanges between 10,292 pairs of movie characters.
  • Microsoft Research Social Media Conversation Corpus. A collection of 12,696 Tweet Ids representing 4,232 three-step conversational snippets extracted from Twitter logs. Each row in the dataset represents a single context-message-response triple that has been evaluated by crowdsourced annotators as scoring an average of 4 or higher on a 5-point Likert scale measuring quality of the response in the context.
  • And a conversation on Reddit about a Reddit corpus.
  • The Santa Barbara corpus is an interesting one because it's a transcription of spoken dialogues.
  • The NPS Chat Corpus is part of the Python NLTK. Release 1.0 consists of 10,567 posts out of approximately 500,000 posts we have gathered from various online chat services in accordance with their terms of service. Future releases will contain more posts from more domains. 
  • The NUS Corpus is a collection of SMS messages. There are both English and Chinese corpora.


  • Off-topic: while searching for conversation datasets I found a relatively large collection of public datasets here.

    EDIT: you can also check the collection of QA datasets.
    ALSO CHECK OUT THIS more comprehensive list of dialogue datasets.
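As a starting point for using one of these datasets, here is a minimal sketch that turns Cornell movie-dialogs style files into (message, response) training pairs. It assumes the ` +++$+++ ` field separator and the field order I remember from the corpus (movie_lines.txt: line ID, character ID, movie ID, character name, text; movie_conversations.txt: two character IDs, movie ID, list of line IDs), so verify this against the README of the files you download. The sample rows and function names are made up.

```python
import ast

SEP = " +++$+++ "  # assumed field separator, check it against the corpus README


def parse_lines(raw_lines):
    """Map line ID -> utterance text from movie_lines.txt-style rows."""
    id_to_text = {}
    for row in raw_lines:
        parts = row.rstrip("\n").split(SEP)
        if len(parts) == 5:  # line ID, character ID, movie ID, character name, text
            id_to_text[parts[0]] = parts[4]
    return id_to_text


def exchange_pairs(raw_conversations, id_to_text):
    """Yield (message, response) pairs from movie_conversations.txt-style rows."""
    for row in raw_conversations:
        parts = row.rstrip("\n").split(SEP)
        if len(parts) != 4:  # skip rows that do not match the assumed layout
            continue
        line_ids = ast.literal_eval(parts[3])  # e.g. "['L1', 'L2', 'L3']"
        for first, second in zip(line_ids, line_ids[1:]):
            if first in id_to_text and second in id_to_text:
                yield id_to_text[first], id_to_text[second]


# Made-up sample rows in the assumed format (not taken from the corpus).
SAMPLE_LINES = [
    "L1 +++$+++ u0 +++$+++ m0 +++$+++ ANNA +++$+++ Are you coming tonight?",
    "L2 +++$+++ u1 +++$+++ m0 +++$+++ BEN +++$+++ I am not sure yet.",
    "L3 +++$+++ u0 +++$+++ m0 +++$+++ ANNA +++$+++ Please try.",
]
SAMPLE_CONVERSATIONS = ["u0 +++$+++ u1 +++$+++ m0 +++$+++ ['L1', 'L2', 'L3']"]

pairs = list(exchange_pairs(SAMPLE_CONVERSATIONS, parse_lines(SAMPLE_LINES)))
print(pairs)
```

Each consecutive pair of utterances in a conversation becomes one training example, which is the usual input shape for both retrieval-based and generative chatbot models.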

    Best in March

    This month is still about AI, more specifically about chatbots. There was so much news about this:

    TechCrunch wrote that Facebook’s Messenger Bot Store could be the most important launch since the App Store.

    Tay, Microsoft's AI Twitter chatbot, got racist.

    PocketConfidant developed an AI for coaching through a chat interface.

    Robot At SXSW Says She Wants To Destroy Humans ...

    There were several companies at Y Combinator's demo day dealing with this topic:
    • Nova uses artificial intelligence to write sales emails automatically. It can search the web and social media for facts about the recipient that it can include in the email, like that they were recently the subject of a news article, or enjoy a specific hobby. Nova’s emails perform better than humans. They get a 67% open rate and 11% click through rate.
    • Msg.ai helps with operating chatbots on multiple platforms. It offers a centralized dashboard to detect trends and sentiments, and integrates with Salesforce Desk and Zendesk. With Msg.ai’s intelligence and A/B testing, businesses can maximize the benefit of their chatbots.
    • Sendbird provides a UI, SDK and backend to easily add chat functionality for websites and apps.
    • With Chatfuel, those looking to build and engage an audience can use the native interface to create bots that help facilitate conversations. More than 130,000 bots have been created on the platform. Publishers, like TechCrunch and Forbes, can build on Chatfuel and deploy to any messengers.
    • Prompt is a chatbot building platform that lets businesses build a chatbot in 15 minutes with 15 lines of code. It can then be deployed instantly to Slack, Line, WeChat, SMS, and soon Facebook Messenger.

    Sunday, March 13, 2016

    Best in February

    I read a lot of interesting news and articles on the web, and it's so hard to find them again later that I decided to collect the best ones in a post at least once a month: just a short summary and the link to the original article. If you have similar interests, it will probably be a good collection for you too.

    TensorFlow: the Python deep learning framework from Google. It's probably not as polished as Theano, a widely used Python deep learning library, but for me it's much easier to understand and write code in it. If you want to learn it, here is a great tutorial. There is also a tool that lets you use Theano or TensorFlow through the same interface.

    Zero UI is the new trend. I love it. Use chat or SMS as the communication interface. Here is a blog post and another article about this. Big companies are moving in this direction. The only obstacle slowing down progress is that machines are not yet able to understand human language. But there is so much development in this area that I expect a significant breakthrough in natural language understanding in a couple of years. (And here is where the singularity comes into the picture...)

    Just for fun: you can live inside Van Gogh's famous painting thanks to the Art Institute of Chicago.