The information gathering is actually rather simple. There are already packages available that let you do it with Python. I will use two major sources of information on the internet: the Python client for the Wolfram|Alpha API by Jason Coombs and the Python wrapper for the Wikipedia API by Jonathan Goldsmith. Both are nicely written and will help you get set up.
Now we can define some functions that take the recognized speech as input and output the relevant information. The trick here is to remove noise words like “what”, “who”, etc. This can be accomplished with the re package. Here is the wolframLookUp function:
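import re
import keyring
import wolframalpha

def wolframLookUp(a_string):
    # The app id is read from the keyring entry with service 'wolfram' and user 'app_id'
    client = wolframalpha.Client(keyring.get_password('wolfram', 'app_id'))
    # Strip punctuation and underscores
    pattern = re.compile(r'([^\s\w]|_)+')
    phrase = re.sub(pattern, '', a_string)
    # Remove the noise words "what" and "is"
    pattern = re.compile(r"\b(what|is)\W", re.I)
    phrase_noise_removed = pattern.sub("", phrase)
    try:
        # Query with the cleaned phrase and return the first result as text
        res = client.query(phrase_noise_removed)
        return next(res.results).text
    except Exception:
        # Nothing usable came back
        return "Sorry"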
As you can see, you need an application id from Wolfram. You can easily sign up with Wolfram to get one and then load it into your keyring. I also added a try/except block because sometimes (not often) the WolframAlpha API does not return anything. In that case the lookup raises an exception, and the function catches it and returns "Sorry" instead of crashing.
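To see how that fallback behaves without calling the API, here is a minimal sketch. It assumes a failed lookup shows up as an empty results iterator, which may not cover every way the real client can fail. FakePod and first_text are hypothetical names used only for this illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a result object; real ones come from the wolframalpha package.
FakePod = namedtuple("FakePod", ["text"])

def first_text(results, fallback="Sorry"):
    """Return the text of the first result, or the fallback when there is none."""
    try:
        return next(iter(results)).text
    except StopIteration:
        # An empty iterator means the query produced no usable answer.
        return fallback

print(first_text([FakePod("100 degrees Celsius")]))  # 100 degrees Celsius
print(first_text([]))                                # Sorry
```

Catching only StopIteration keeps unrelated errors (a bad app id, a network problem) visible, while the broader except in the lookup function hides them behind the same "Sorry".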
The wikipediaLookUp function is very similar:
import re
import wikipedia

def wikipediaLookUp(a_string, num_sentences):
    print(a_string)
    # Strip punctuation and underscores
    pattern = re.compile(r'([^\s\w]|_)+')
    phrase = re.sub(pattern, '', a_string)
    print(phrase)
    # Remove noise words so that only the topic remains
    pattern = re.compile(r"\b(lot|lots|a|an|who|can|you|what|is|info|somethings|whats|have|i|something|to|know|like|Id|information|about|tell|me)\W", re.I)
    phrase_noise_removed = pattern.sub("", phrase)
    print(phrase_noise_removed)
    # search() returns a list of matching page titles
    a = wikipedia.search(phrase_noise_removed)
    print(a)
    # Summarize the top search result
    the_summary = wikipedia.summary(a[0], sentences=num_sentences)
    print(the_summary)
    return the_summary
As you can see, a lot more noise words are removed in the Wikipedia lookup because Wikipedia's search is not as smart at parsing a question as WolframAlpha. There is also a second parameter, num_sentences, which lets the user decide whether they want a long or a short response.
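The noise-word stripping can be tried on its own, without any network calls. The sketch below repeats the same two regex steps in a small helper. The name strip_noise, the shortened word lists and the sample sentences are my own assumptions for illustration, not part of the functions above.

```python
import re

# Same punctuation pattern as in the lookup functions.
PUNCTUATION = re.compile(r"([^\s\w]|_)+")

def strip_noise(text, noise_words):
    """Hypothetical helper: remove punctuation, then drop every noise word
    that is followed by a non-word character."""
    noise = re.compile(r"\b(" + "|".join(noise_words) + r")\W", re.I)
    cleaned = PUNCTUATION.sub("", text)
    return noise.sub("", cleaned).strip()

wolfram_noise = ["what", "is"]
wikipedia_noise = ["a", "an", "who", "what", "is", "tell", "me", "about"]

print(strip_noise("What is the boiling point of water?", wolfram_noise))
# the boiling point of water
print(strip_noise("Tell me about Alan Turing.", wikipedia_noise))
# Alan Turing
```

One thing to watch for: the pattern requires a non-word character after the noise word, so a noise word at the very end of the input is left in place. For example, strip_noise("what is", wolfram_noise) returns "is".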