Saturday, April 9, 2016

Conversational datasets to train a chatbot

Over the last two months I have read a lot about chatbots, which awakened in me the desire to develop my own chatbot. And of course the trendiest approach is deep learning. That's why, as a first step, I decided to collect the available conversation datasets, which are definitely needed for training. Here is the list of English conversation datasets I found (if you know of more, please leave a comment):

Data collected from Reddit and Twitter (by Chenhao Tan):

  • Argument trees, "successful persuasion" metadata, and related data from the subreddit ChangeMyView. First release 2016.

  • Multi-community engagement (users posting, or not posting, in different subreddits since Reddit's inception). Data includes the texts of posts made and associated metadata, such as the subreddit, the "number" of upvotes, and the time stamp. First release 2015.

  • Cornell natural-experiment tweet pairs: data for investigating whether phrasing affects message propagation, controlling for user and topic. A zip file can be retrieved from the given URL (first release 2014).

  • Supreme Court dialogs corpus: conversations and metadata (such as vote outcomes) from oral arguments before the US Supreme Court (first release 2012)
  • Wikipedia editor conversations corpus: a zip file can be retrieved from the page I've linked to (first release 2012).
  • Cornell movie-dialogs corpus: conversations and metadata (IMDB rating, genre, character gender, etc.) from movie scripts (first release 2011). This corpus contains a large metadata-rich collection of fictional conversations extracted from raw movie scripts: 220,579 conversational exchanges between 10,292 pairs of movie characters.
  • Microsoft Research Social Media Conversation Corpus. A collection of 12,696 Tweet Ids representing 4,232 three-step conversational snippets extracted from Twitter logs. Each row in the dataset represents a single context-message-response triple that has been evaluated by crowdsourced annotators as scoring an average of 4 or higher on a 5-point Likert scale measuring quality of the response in the context.
  • And a conversation on Reddit about a Reddit corpus.
  • The Santa Barbara corpus is an interesting one because it's a transcription of spoken dialogues.
  • The NPS Chat Corpus is part of the Python NLTK. From its description: "Release 1.0 consists of 10,567 posts out of approximately 500,000 posts we have gathered from various online chat services in accordance with their terms of service. Future releases will contain more posts from more domains."
  • NUS Corpus is a collection of SMS messages. There is both an English and a Chinese corpus.

  • Off-topic: during my research for conversation datasets I found a relatively large collection of public datasets here.

    EDIT: you can also check the collection of QA datasets.
    Also check out this more comprehensive list of dialogue datasets.
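Before any of these corpora can train a chatbot, the conversations usually have to be turned into (input, response) pairs. As a minimal sketch for the Cornell movie-dialogs corpus, the Python below assumes the layout of `movie_lines.txt` (line ID, character ID, movie ID, character name, text) and `movie_conversations.txt` (two character IDs, movie ID, and a list literal of line IDs), with fields separated by ` +++$+++ `. The function names and the sample rows are hypothetical illustrations, not part of the corpus or of any library.

```python
import ast
from typing import Dict, Iterable, List, Tuple

# Field separator assumed for the Cornell movie-dialogs text files.
SEP = " +++$+++ "


def load_lines(rows: Iterable[str]) -> Dict[str, str]:
    """Map line IDs (e.g. 'L1') to utterance text from movie_lines.txt rows."""
    lines = {}
    for row in rows:
        # Strip only the newline: an empty utterance ends in "+++$+++ ",
        # and stripping that trailing space would break the last separator.
        fields = row.rstrip("\r\n").split(SEP)
        if len(fields) == 5:  # skip malformed rows
            line_id, _char_id, _movie_id, _name, text = fields
            lines[line_id] = text
    return lines


def load_conversations(rows: Iterable[str]) -> List[List[str]]:
    """Return each conversation as an ordered list of line IDs."""
    conversations = []
    for row in rows:
        fields = row.rstrip("\r\n").split(SEP)
        if len(fields) == 4:
            # Last field is a Python-style list literal such as "['L1', 'L2']".
            conversations.append(ast.literal_eval(fields[3]))
    return conversations


def make_pairs(lines: Dict[str, str],
               conversations: List[List[str]]) -> List[Tuple[str, str]]:
    """Turn every two consecutive utterances into an (input, response) pair."""
    pairs = []
    for conv in conversations:
        for first, second in zip(conv, conv[1:]):
            if first in lines and second in lines:
                pairs.append((lines[first], lines[second]))
    return pairs


# Made-up rows in the assumed file format (not taken from the corpus).
sample_lines = [
    "L1 +++$+++ u0 +++$+++ m0 +++$+++ ANN +++$+++ Hello.",
    "L2 +++$+++ u1 +++$+++ m0 +++$+++ BOB +++$+++ Hi, how are you?",
    "L3 +++$+++ u0 +++$+++ m0 +++$+++ ANN +++$+++ Fine, thanks.",
]
sample_convs = ["u0 +++$+++ u1 +++$+++ m0 +++$+++ ['L1', 'L2', 'L3']"]

pairs = make_pairs(load_lines(sample_lines), load_conversations(sample_convs))
print(pairs)
# [('Hello.', 'Hi, how are you?'), ('Hi, how are you?', 'Fine, thanks.')]
```

On the real files, something like `with open("movie_lines.txt", encoding="iso-8859-1") as f: lines = load_lines(f)` should work; the ISO-8859-1 encoding is an assumption worth checking against your copy. Pairing only consecutive utterances is the simplest choice; a longer context could be built by joining several previous lines into the input.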


    Vishal said...

    This is so helpful ! Thanks.... I owe you at least a beer or a coffee !

    Unknown said...

    The blog was very informative, I am really crazy about chatbots. I really appreciate your work.
