Over the last two months I have read a lot about chatbots, which has awakened in me the desire to develop my own. And of course the trendiest approach is deep learning. That's why, as a first step, I decided to collect the available conversation datasets, which are definitely needed for training. Here is the list of English conversation datasets I found (if you know of more, please leave a comment):
Data collected from Twitter (by Chenhao Tan):
Argument trees, "successful persuasion" metadata, and related data from the subreddit ChangeMyView. First release 2016.
Multi-community engagement (users posting, or not posting, in different subreddits since Reddit's inception). Data includes the texts of posts made and associated metadata, such as the subreddit, the "number" of upvotes, and the time stamp. First release 2015.
Cornell natural-experiment tweet pairs: data for investigating whether phrasing affects message propagation, controlling for user and topic. The zip file can be retrieved from the given URL (first release 2014).
Supreme Court dialogs corpus: conversations and metadata (such as vote outcomes) from oral arguments before the US Supreme Court (first release 2012)
Wikipedia editor conversations corpus: zip file can be retrieved from the page I've linked to (first release 2012)
Cornell movie-dialogs corpus: conversations and metadata (IMDB rating, genre, character gender, etc.) from movie scripts (first release 2011). This corpus contains a large metadata-rich collection of fictional conversations extracted from raw movie scripts: 220,579 conversational exchanges between 10,292 pairs of movie characters.
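For training a chatbot, the exchanges in this corpus are usually turned into (utterance, reply) pairs. Below is a minimal sketch of that step. It assumes the layout described in the corpus README: fields separated by ` +++$+++ `, with `movie_lines.txt` holding line ID, character ID, movie ID, character name and text, and `movie_conversations.txt` ending in a Python-style list of line IDs. The sample rows and the helper names (`parse_lines`, `parse_conversations`, `exchange_pairs`) are made up for illustration and are not part of the corpus.

```python
import ast

# Field separator, assumed from the corpus README.
SEP = " +++$+++ "


def parse_lines(rows):
    """Map line ID -> utterance text from movie_lines.txt-style rows."""
    id2text = {}
    for row in rows:
        parts = row.rstrip("\n").split(SEP, 4)
        if len(parts) != 5:
            continue  # skip malformed rows
        line_id, _char_id, _movie_id, _name, text = parts
        id2text[line_id] = text
    return id2text


def parse_conversations(rows):
    """Return one list of line IDs per movie_conversations.txt-style row."""
    conversations = []
    for row in rows:
        parts = row.rstrip("\n").split(SEP)
        if len(parts) != 4:
            continue  # skip malformed rows
        # The last field looks like "['L1', 'L2', 'L3']".
        conversations.append(ast.literal_eval(parts[3]))
    return conversations


def exchange_pairs(id2text, conversations):
    """Pair each utterance with the one that follows it in a conversation."""
    pairs = []
    for ids in conversations:
        for first, second in zip(ids, ids[1:]):
            if first in id2text and second in id2text:
                pairs.append((id2text[first], id2text[second]))
    return pairs


# Hypothetical sample rows in the assumed format (not taken from the corpus).
SAMPLE_LINES = [
    "L1 +++$+++ u0 +++$+++ m0 +++$+++ ALICE +++$+++ Hello there.",
    "L2 +++$+++ u1 +++$+++ m0 +++$+++ BOB +++$+++ Hi. How are you?",
    "L3 +++$+++ u0 +++$+++ m0 +++$+++ ALICE +++$+++ Fine, thanks.",
]
SAMPLE_CONVERSATIONS = [
    "u0 +++$+++ u1 +++$+++ m0 +++$+++ ['L1', 'L2', 'L3']",
]

pairs = exchange_pairs(
    parse_lines(SAMPLE_LINES), parse_conversations(SAMPLE_CONVERSATIONS)
)
for utterance, reply in pairs:
    print(f"{utterance!r} -> {reply!r}")
```

With the real files, the same functions could be fed the lines of each file, opened with `encoding="iso-8859-1"` (also an assumption about the distributed files).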
Microsoft Research Social Media Conversation Corpus. A collection of 12,696 Tweet Ids representing 4,232 three-step conversational snippets extracted from Twitter logs. Each row in the dataset represents a single context-message-response triple that has been evaluated by crowdsourced annotators as scoring an average of 4 or higher on a 5-point Likert scale measuring quality of the response in the context.
And a conversation on Reddit about a Reddit corpus.
The Santa Barbara corpus is an interesting one because it's a transcription of spoken dialogues.
The NPS Chat Corpus is part of the Python NLTK. According to its description, Release 1.0 consists of 10,567 posts out of approximately 500,000 posts that its authors gathered from various online chat services in accordance with their terms of service; future releases are expected to contain more posts from more domains.
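Since this corpus ships with NLTK, it is probably the quickest one to try. The sketch below assumes NLTK is installed and that the `nps_chat` data package can be downloaded. `load_nps_posts` and `act_histogram` are hypothetical helper names introduced here, and the sample posts are invented rather than taken from the corpus.

```python
from collections import Counter


def load_nps_posts(limit=None):
    """Return (dialogue_act, text) tuples from the NPS Chat Corpus.

    NLTK is imported lazily so the rest of this sketch runs without it.
    """
    import nltk
    from nltk.corpus import nps_chat

    nltk.download("nps_chat", quiet=True)  # fetches the data if missing
    posts = nps_chat.xml_posts()
    if limit is not None:
        posts = posts[:limit]
    # Each post is an XML element; its "class" attribute is the dialogue act.
    return [(post.get("class"), post.text) for post in posts]


def act_histogram(posts):
    """Count how often each dialogue-act label occurs."""
    return Counter(act for act, _text in posts)


# Invented sample posts, used here so the example needs no download.
SAMPLE_POSTS = [
    ("Greet", "hi everyone"),
    ("Statement", "i am new here"),
    ("Greet", "hello"),
]
histogram = act_histogram(SAMPLE_POSTS)
print(histogram.most_common())
```

To run it on the real data, replace `SAMPLE_POSTS` with `load_nps_posts(limit=1000)`.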
NUS Corpus is a collection of SMS messages. It is available in both English and Chinese.
Off-topic: during my research for conversation datasets I found a relatively large collection of public datasets here.
EDIT: you can also check the collection of QA datasets.
Also check out this more comprehensive list of dialogue datasets.