Declarative information gathering

Gathering the information is actually rather simple. There are already APIs that let you do it from Python. I will use two major sources of information on the internet: the Wolfram API wrapper by Jason Coombs and the Wikipedia API wrapper by Jonathan Goldsmith. Both are nicely written and easy to get started with.

Now we can define some functions that take the recognized speech (as a string) as input and return the relevant information. The trick here is to remove noise words like “what”, “who”, etc. This can be accomplished with the re package. Here is the wolframLookUp function:

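import re, keyring, wolframalpha

def wolframLookUp(a_string):
	# The app id is read from the keyring (service 'wolfram', user 'app_id')
	client = wolframalpha.Client(keyring.get_password('wolfram', 'app_id'))
	# Strip punctuation and underscores
	phrase = re.sub(r'([^\s\w]|_)+', '', a_string)
	# Remove noise words such as "what" and "is"
	pattern = re.compile(r"\b(what|is)\W", re.I)
	phrase_noise_removed = pattern.sub("", phrase)
	try:
		# Query with the cleaned phrase rather than the raw input
		res = client.query(phrase_noise_removed)
		return next(res.results).text
	except Exception:
		# No usable result came back, or the request failed
		return "Sorry"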

As you can see, you need an application id from Wolfram. You can easily sign up to get one and then just load it into your keyring. I also wrapped the query in a try/except block because sometimes (not often) the WolframAlpha API will not return anything, and this catches that case.
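Loading the id into the keyring is a one-time step. Here is a minimal sketch, assuming the keyring package is installed and reusing the service name 'wolfram' and user name 'app_id' that wolframLookUp reads. The helper name store_app_id is made up for this example, and the id in the usage comment is a placeholder:

```python
def store_app_id(app_id, service="wolfram", username="app_id"):
    """Save the WolframAlpha app id so keyring.get_password can find it later."""
    # Imported here so the helper can be defined even where keyring is missing.
    import keyring  # third-party package, assumed to be installed

    # set_password(service, username, password) writes to the system keyring.
    keyring.set_password(service, username, app_id)


# Run once, for example from an interactive session:
# store_app_id("YOUR-APP-ID")
```

After that, keyring.get_password('wolfram', 'app_id') should hand back the same string, so the id never has to appear in the script itself.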

The wikipediaLookUp function is very similar:

import re, wikipedia

def wikipediaLookUp(a_string, num_sentences):
	print(a_string)
	# Strip punctuation and underscores
	phrase = re.sub(r'([^\s\w]|_)+', '', a_string)
	print(phrase)
	# Remove noise words so that only the search topic remains
	pattern = re.compile(r"\b(lot|lots|a|an|who|can|you|what|is|info|somethings|whats|have|i|something|to|know|like|Id|information|about|tell|me)\W", re.I)
	phrase_noise_removed = pattern.sub("", phrase)
	print(phrase_noise_removed)
	# Take the top search hit as the page title
	a = wikipedia.search(phrase_noise_removed)
	print(a[0])
	the_summary = wikipedia.summary(a[0], sentences=num_sentences)
	print(the_summary)
	return the_summary

As you can see, a lot more noise words are removed for the Wikipedia lookup, because its search is not as smart at parsing as WolframAlpha. There is also a second parameter, num_sentences, which lets the user choose between a long and a short response.
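The noise-word stripping can be tried on its own, without calling either API. The sketch below is only an illustration: strip_noise and the shortened NOISE_WORDS list are made-up names, not part of the functions above. It also ends the pattern with a word boundary plus optional whitespace instead of \W, which is a small variation so that a noise word at the very end of the input is removed too.

```python
import re

# Hypothetical, shortened noise-word list, for illustration only.
NOISE_WORDS = ["what", "who", "is", "a", "an", "tell", "me", "about"]

# Matches runs of punctuation and underscores.
PUNCTUATION = re.compile(r"([^\s\w]|_)+")
# Matches any whole noise word, case-insensitively, plus trailing whitespace.
NOISE = re.compile(r"\b(" + "|".join(NOISE_WORDS) + r")\b\s*", re.I)


def strip_noise(text):
    """Return text without punctuation and without the noise words."""
    no_punctuation = PUNCTUATION.sub("", text)
    return NOISE.sub("", no_punctuation).strip()


print(strip_noise("What is the capital of France?"))  # the capital of France
print(strip_noise("Tell me about Alan Turing"))        # Alan Turing
```

Because the pattern only matches whole words, the "a" in "capital" and the "Al" in "Alan" are left alone.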
