As a short companion piece to my post about Shingling in XQuery, and as an exercise to keep my Python skills sharp, I rewrote my XQuery example in Python.
Character Shingling in Python
It is slightly easier to find examples of shingling in Python. For example, I found this one-line example of character shingling in this blog post: http://jutopia.tirsen.com/2011/08/26/shingles.html
[word[i:i + n] for i in range(len(word) - n + 1)]
This makes use of Python's list comprehension technique to succinctly render n-shingles for a given word. I used that as inspiration for my word-based w-shingle method.
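To see what the one-liner produces, here is a minimal sketch that wraps it in a function. The name char_shingles and the sample word are my own illustrative choices and do not come from the linked post.

```python
def char_shingles(word, n):
    """Return every contiguous run of n characters (n-shingle) in word."""
    # When n is longer than the word, range() is empty and the result is [].
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(char_shingles("shingle", 3))
# ['shi', 'hin', 'ing', 'ngl', 'gle']
```

A word of length L yields L - n + 1 shingles, which is where the range bound in the comprehension comes from.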
Word Shingling in Python
As before, I am only interested in shingles that start with what look like stop words (approximated as words of fewer than 4 characters).
theString = "the quick brown fox jumps over the lazy dog. now is the time for all good men to come to the aid of the party"
shingleLength = 3
# Split on whitespace; punctuation stays attached to its word.
tokens = theString.split()
# Keep only the shingles whose first token is shorter than 4 characters.
print([tokens[i:i+shingleLength] for i in range(len(tokens) - shingleLength + 1) if len(tokens[i]) < 4])
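The same idea can be sketched as a reusable function. The names word_shingles, shingle_length and max_stop_len are hypothetical, chosen only for this illustration, and the length check is still just a rough stand-in for a real stop-word list.

```python
def word_shingles(text, shingle_length=3, max_stop_len=3):
    """Return word shingles whose first token looks like a stop word.

    A token is treated as a stop word when it has at most max_stop_len
    characters. This is an approximation, not a real stop-word lookup.
    """
    tokens = text.split()
    return [
        tokens[i:i + shingle_length]
        for i in range(len(tokens) - shingle_length + 1)
        if len(tokens[i]) <= max_stop_len
    ]


for shingle in word_shingles("the quick brown fox jumps over the lazy dog"):
    print(" ".join(shingle))
# the quick brown
# fox jumps over
# the lazy dog
```

Note that split() leaves punctuation attached, so in the longer sentence above "dog." counts as four characters and does not start a shingle.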
Enjoy!