I'm also looking for a decent training set for casual conversations, specifically for a language-learning chatbot.
But it seems this project only has ~200k of logs. It's a start, but...
What other sources do you know of? I'm sharing what I've found so far, and I hope others can suggest where else to look.
Cornell's ConvoKit
provides an API for some really good datasets, such as the famous movie dialogue corpus, plus a structured API for some subreddits
https://convokit.cornell.edu/
Facebook's ParlAI
has a standardized API to lots of datasets
https://parl.ai/about/
e.g. https://arxiv.org/pdf/1801.07243.pdf
Tatoeba
has a good sentence database but no conversation turns
https://tatoeba.org/eng/
I'm keeping archives of a few things I find. Here are a bunch of logs for teaching English conversation
https://github.com/dcsan/corpus/blob/master/convo/esl-china/esl06.csv
some of which could be converted for use here.
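As a rough sketch of what such a conversion might look like: the snippet below turns a CSV of dialogue lines into (prompt, response) pairs. The `speaker` and `text` column names and the sample rows are assumptions made up for illustration, not the actual layout of the file linked above, so the column handling would need adjusting to the real data.

```python
import csv
import io


def rows_to_pairs(rows):
    """Pair each utterance with the next one whenever the speaker changes.

    `rows` is any iterable of dicts with hypothetical "speaker" and "text" keys.
    """
    pairs = []
    prev = None
    for row in rows:
        text = row["text"].strip()
        if not text:
            continue  # skip empty lines in the log
        if prev is not None and prev["speaker"] != row["speaker"]:
            pairs.append((prev["text"].strip(), text))
        prev = row
    return pairs


# Made-up sample data standing in for a real log file.
sample = io.StringIO(
    "speaker,text\n"
    "A,How was your weekend?\n"
    "B,Pretty good. I went hiking.\n"
    "B,What about you?\n"
    "A,I stayed home and read.\n"
)
pairs = rows_to_pairs(csv.DictReader(sample))
for prompt, response in pairs:
    print(f"{prompt} -> {response}")
```

Consecutive lines from the same speaker are not merged here; only the last line before a speaker change is used as the prompt, which is a simplification.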
What other sources have people found for conversations?