
For years, I’ve been using the following snippet in my solution to one of my interview questions:
anagrams = dict()
with open(WORDS_PATH) as f:
    for line in f:
        # The key is the word's letters in sorted order, e.g. "tea" -> "aet".
        key = "".join(sorted(line.strip()))
        if key not in anagrams:
            anagrams[key] = list()
            anagrams[key].append(line.strip())
        else:
            anagrams[key].append(line.strip())
Recently, I learned to use the dict.setdefault method to optimize it further, and the end result looks like this:
anagrams = dict()
with open(WORDS_PATH) as f:
    for line in f:
        key = "".join(sorted(line.strip()))
        anagrams.setdefault(key, []).append(line.strip())
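To try the setdefault version without a words file, here is a minimal self-contained sketch. It assumes the anagram key is the word's letters in sorted order joined back into a string, and it uses a hypothetical in-memory word list wrapped in `io.StringIO` as a stand-in for the file at `WORDS_PATH`; iterating over it yields newline-terminated lines just like a real file.

```python
import io


def group_anagrams(lines):
    """Group words (one per line) by their sorted letters."""
    anagrams = dict()
    for line in lines:
        word = line.strip()
        key = "".join(sorted(word))  # e.g. "tea" -> "aet"
        # setdefault returns the existing list for key, or inserts [] and returns it.
        anagrams.setdefault(key, []).append(word)
    return anagrams


# Hypothetical word list standing in for the file at WORDS_PATH.
words_file = io.StringIO("listen\nsilent\ntea\ngoogle\neat\nenlist\n")
print(group_anagrams(words_file))
# {'eilnst': ['listen', 'silent', 'enlist'], 'aet': ['tea', 'eat'], 'eggloo': ['google']}
```

One caveat worth knowing: the default argument `[]` is evaluated on every call, so a new empty list is created even when the key already exists, and it is simply discarded in that case.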
This line gets the list of anagrams for key, or sets it to [] if key is not found. Because setdefault returns the value stored in the dict, that list can be appended to in place without a second search. In other words, the end result of this line…
anagrams.setdefault(key, []).append(line.strip())
…is the same as running…
if key not in anagrams:
    anagrams[key] = list()
    anagrams[key].append(line.strip())
else:
    anagrams[key].append(line.strip())
…except that the latter code searches for key at least twice (three times if it’s not found), while setdefault does it all with a single lookup.
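One way to check these counts is to count how often the key gets hashed, since in CPython every `in`, `d[key]`, and `d[key] = value` on a dict calls the key's `__hash__`, while `setdefault` hashes the key once and reuses that hash for both the search and the insert. The sketch below uses a hypothetical `CountingKey` class whose `__hash__` increments a counter; the helper names `with_if_else`, `with_setdefault`, and `count_hashes` are also illustrative, not from any library.

```python
class CountingKey:
    """Hypothetical dict key that counts how many times it is hashed."""

    hash_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value


def with_if_else(anagrams, key, word):
    # The original approach: membership test, then one or two indexing operations.
    if key not in anagrams:
        anagrams[key] = list()
        anagrams[key].append(word)
    else:
        anagrams[key].append(word)


def with_setdefault(anagrams, key, word):
    anagrams.setdefault(key, []).append(word)


def count_hashes(update, key, word, anagrams):
    """Reset the counter, run one update, and report how often key was hashed."""
    CountingKey.hash_calls = 0
    update(anagrams, key, word)
    return CountingKey.hash_calls


key = CountingKey("aet")
print(count_hashes(with_if_else, key, "tea", {}))           # 3 (key missing)
print(count_hashes(with_if_else, key, "eat", {key: []}))    # 2 (key present)
print(count_hashes(with_setdefault, key, "tea", {}))        # 1 (key missing)
print(count_hashes(with_setdefault, key, "eat", {key: []})) # 1 (key present)
```

The pre-filled dicts such as `{key: []}` are built before `count_hashes` resets the counter, so their own hashing is not included in the reported numbers.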