
added documentation #21

Open
Aashu-Adhikari wants to merge 3 commits into dorianbrown:master from Aashu-Adhikari:bm25okapi

@Aashu-Adhikari

Added inline comments and docstrings to explain what the code actually does.

@bhattbhuwan13 left a comment

Please make the suggested changes.

if tokenizer:
    corpus = self._tokenize_corpus(corpus)

nd = self._initialize(corpus)

What is nd here? You should explain it.

Comment on lines +38 to +40
Example:
corpus = [['ram', 'is', 'a', 'good', 'boy'], ['ram', 'does', 'cycling', 'and', 'racing'], ['ram', 'is', 'healthy'], ['rita', 'likes', 'shyam'], ['good', 'luck']]
nd = {'ram': 3, 'is': 2, 'a': 1, 'good': 2, 'boy': 1, 'does': 1, 'cycling': 1, 'and': 1, 'racing': 1, 'healthy': 1, 'rita': 1, 'likes': 1, 'shyam': 1, 'luck': 1}

Shorten the examples so that I don't need to scroll. The functionality can be explained using only two items in the list.
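A minimal sketch of what a shortened docstring could look like, assuming nd maps each word to the number of documents that contain it (which matches the five-document example above: 'ram' appears in three documents, so nd['ram'] == 3). The function name document_frequencies is hypothetical and not part of the PR.

```python
from typing import Dict, List


def document_frequencies(corpus: List[List[str]]) -> Dict[str, int]:
    """Map each word to the number of documents that contain it.

    Example:
        corpus = [['ram', 'is', 'good'], ['ram', 'is', 'healthy']]
        nd = {'ram': 2, 'is': 2, 'good': 1, 'healthy': 1}
    """
    nd: Dict[str, int] = {}
    for document in corpus:
        # set() so a word repeated inside one document is counted only once
        for word in set(document):
            nd[word] = nd.get(word, 0) + 1
    return nd


corpus = [['ram', 'is', 'good'], ['ram', 'is', 'healthy']]
nd = document_frequencies(corpus)
```

Two documents are enough to show both cases: words shared by every document ('ram', 'is') and words unique to one ('good', 'healthy').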

 for document in corpus:
     self.doc_len.append(len(document))
-    num_doc += len(document)
+    num_words += len(document) # total number of words in whole corpus

The purpose of the variable num_words has already been explained.
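A minimal sketch of the loop with a single explanatory comment at the declaration, assuming the running total is then divided by the number of documents to get the average document length (avgdl) that BM25 normalises against. The helper name corpus_lengths is hypothetical, and the sketch assumes a non-empty corpus.

```python
from typing import List, Tuple


def corpus_lengths(corpus: List[List[str]]) -> Tuple[List[int], int, float]:
    """Return per-document lengths, the total word count and the average length."""
    doc_len: List[int] = []
    num_words = 0  # total number of words in the whole corpus
    for document in corpus:
        doc_len.append(len(document))
        num_words += len(document)
    # Average document length; assumes corpus is non-empty.
    avgdl = num_words / len(corpus)
    return doc_len, num_words, avgdl


doc_len, num_words, avgdl = corpus_lengths(
    [['ram', 'is', 'good'], ['ram', 'is', 'healthy', 'too']]
)
```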

-frequencies = {}
+term_frequencies = (
+    {}
+) # term frequency of each word in a document........ changed frequencies to term_frequencies

You don't need to comment that you changed the name of the variable; git keeps track of it.

Comment on lines +53 to +54
if word not in term_frequencies:
    term_frequencies[word] = 0

This block can be removed by using collections.defaultdict instead of a plain dict.
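A minimal sketch of the suggestion, assuming the surrounding loop counts how often each word occurs in one document, as in the snippet on lines +53 to +54. The function name term_frequencies mirrors the renamed variable but is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


def term_frequencies(document: List[str]) -> Dict[str, int]:
    """Count how many times each word occurs in one document."""
    # defaultdict(int) returns 0 for a missing key, so the
    # "if word not in ...: ... = 0" check is no longer needed.
    counts: DefaultDict[str, int] = defaultdict(int)
    for word in document:
        counts[word] += 1
    # Return a plain dict so later lookups of unseen words do not
    # silently insert new zero entries.
    return dict(counts)


tf = term_frequencies(['ram', 'is', 'a', 'good', 'boy', 'ram'])
```

collections.Counter(document) produces the same counts in a single call and would also remove the explicit loop.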