Search String = This sentence have nothing to do with any other
Document = Who is the “He” in this sentence?
Score Calculation:
Step 1: Tokenize the search string, apply stemming, and remove stop words.
Token 1: "sentence"
Token 2: "nothing"
Step 2: For every search token obtained in Step 1, perform Steps 3-11:
Step 3: Take the sample document and remove stop words
Input Document: Who is the “He” in this sentence?
Document after stop word removal: "sentence"
Step 4: Apply Stemming
Document in Step 3: "sentence"
After stemming: "sentence"
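
To make the preprocessing in Steps 1-4 concrete, here is a minimal Python sketch of tokenization, stop word removal, and stemming. The stop word list and the identity stemmer are illustrative assumptions chosen so the output matches the tokens above; the engine's actual stop word list and stemming algorithm may differ.

import re

# Illustrative stop word list covering only the words in this example.
STOP_WORDS = {"this", "have", "to", "do", "with", "any", "other",
              "who", "is", "the", "he", "in"}

def stem(token):
    # Placeholder stemmer: both example tokens are already in base form,
    # so an identity mapping reproduces the walkthrough.
    return token

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())          # tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]   # remove stop words
    return [stem(t) for t in tokens]                      # stem

print(preprocess("This sentence have nothing to do with any other"))
# ['sentence', 'nothing']
print(preprocess('Who is the "He" in this sentence?'))
# ['sentence']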
Step 5: Calculate data.count per search token
data.count(sentence) = 1
data.count(nothing) = 1
Step 6: Calculate the total number of tokens in the document
numTokens = 1
Step 7: Calculate coefficient per search token
coeff = 0.5 * (data.count / numTokens) + 0.5
coeff(sentence) = 0.5 * (1/1) + 0.5 = 1.0
coeff(nothing) = 0.5 * (1/1) + 0.5 = 1.0
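
As a quick check of the Step 7 arithmetic, the coefficient can be written as a small Python function (the names data_count and num_tokens simply mirror data.count and numTokens from the walkthrough):

def coeff(data_count, num_tokens):
    # Step 7: coeff = 0.5 * (data.count / numTokens) + 0.5
    return 0.5 * (data_count / num_tokens) + 0.5

print(coeff(1, 1))  # 1.0, matching both coeff(sentence) and coeff(nothing)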
Step 8: Calculate the adjustment per search token (adjustment is 1 by default; only if the search text exactly matches the raw document does adjustment = 1.1)
adjustment(sentence) = 1
adjustment(nothing) = 1
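
A minimal sketch of the Step 8 rule, assuming a plain string comparison between the search text and the raw document (the exact comparison the engine uses is not specified here):

def adjustment(search_text, raw_document):
    # Step 8: 1.1 only on an exact match between the search text and the
    # raw document; otherwise the default of 1.
    return 1.1 if search_text == raw_document else 1.0

print(adjustment("This sentence have nothing to do with any other",
                 'Who is the "He" in this sentence?'))  # 1.0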
Step 9: Weight of the field (the default weight is 1)
weight = 1
Step 10: Calculate the frequency of the search token in the document (data.freq)
For the i-th occurrence of the token (counting from i = 0), the contribution is 1/(2^i); the contributions of all occurrences are summed.
a. data.freq(sentence) = 1/(2^0) = 1
b. data.freq(nothing) = 0
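
The Step 10 frequency can be sketched as follows; the hypothetical two-occurrence example at the end is only there to show how repeated occurrences are summed.

def data_freq(search_token, doc_tokens):
    # Step 10: the i-th occurrence (counting from i = 0) contributes 1/(2^i);
    # the contributions of all occurrences are summed.
    occurrences = sum(1 for t in doc_tokens if t == search_token)
    return sum(1 / (2 ** i) for i in range(occurrences))

print(data_freq("sentence", ["sentence"]))  # 1.0 -> 1/(2^0)
print(data_freq("nothing", ["sentence"]))   # 0
print(data_freq("cat", ["cat", "cat"]))     # 1.5 -> 1/(2^0) + 1/(2^1), hypothetical two-occurrence case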
Step 11: Calculate score per search token per field:
score = (weight * data.freq * coeff * adjustment);
score(sentence) = (1 * 1 * 1.0 * 1.0) = 1.0
score(nothing) = (1 * 0 * 1.0 * 1.0) = 0
Step 12: Add the individual scores for every token of the search string to get the total score
Total score = score(sentence) + score(nothing) = 1.0 + 0.0 = 1.0
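
Putting Steps 11 and 12 together, this sketch reproduces the final total of 1.0 from the per-token values worked out above:

def token_score(weight, data_freq, coeff, adjustment):
    # Step 11: score per search token per field
    return weight * data_freq * coeff * adjustment

# Step 12: total score = sum of the per-token scores, using the values
# derived in Steps 5-10 of this walkthrough.
total = (token_score(weight=1, data_freq=1.0, coeff=1.0, adjustment=1.0)     # "sentence"
         + token_score(weight=1, data_freq=0.0, coeff=1.0, adjustment=1.0))  # "nothing"
print(total)  # 1.0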