If it won't be simple, it simply won't be. [Hire me, source code] by Miki Tebeka, CEO, 353Solutions

Wednesday, October 31, 2007

Quick Web Searches

A little script to search the documents in Journal of Machine Learning.

import webbrowser
from urllib import urlencode

def jmlr(words):
query = "site:http://jmlr.csail.mit.edu filetype:pdf " + " ".join(words)
url = "http://www.google.com/search?" + urlencode([("q", query)])

webbrowser.open(url)

if __name__ == "__main__":
from optparse import OptionParser

parser = OptionParser("usage: %prog WORD1 [WORD2 ...]")

opts, args = parser.parse_args()
if not args:
parser.error("wrong number of arguments") # Will exit

jmlr(args)

I have many more is these for Google search, Wikipedia search, Acronym Search ...
The trick is to do a search using the web interface and then look at the URL of the results page.

Tuesday, October 23, 2007

Minimal testing

I'm using py.test as our test suite. And found out that even the most minimal tests give me great benefits:
def test_joke():
pass
One I have that in a file call test_joke.py,
py.test will pick it up and try to run it.

The good thing to have only this minimal code is that the test will fail if you happen to introduce a syntax error or some error in the module initialization.

Of course, when I have more time I beef up the tests ;)

What make it even more useful is a continuous integration system that run the tests every time someone checks in the code.

Wednesday, October 10, 2007

Persistant ID generator

Let's say you want to map key->number, and every time you get a new key you give it a different number.
(Useful for mapping words to vector index in IR)

The easy way is:
from collections import defaultdict
from itertools import count

vector_index = defaultdict(count(0).next)
print vector_index["a"] # 0
print vector_index["a"] # 0
print vector_index["b"] # 1

Blog Archive