Improving the search engine

Boosting preferred books in search results

This can be done with something like the following:

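from whoosh.scoring import BM25F


class MyWeighting(BM25F):
    # Ask Whoosh to run final() on each hit's score.
    use_final = True

    def final(self, searcher, docnum, score):
        # Placeholder boost: replace with a bonus for preferred books.
        return score + docnum * 10


s = myindex.searcher(weighting=MyWeighting)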
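
The `docnum * 10` bonus above is only a placeholder. As a rough, library-independent sketch of the intended effect, the snippet below adds a fixed bonus to hits that belong to a set of preferred documents and then re-sorts them. The names `boost_favorites`, `favorite_docnums` and `bonus` are hypothetical and are not part of Whoosh.

```python
def boost_favorites(hits, favorite_docnums, bonus=10.0):
    """Return hits re-ranked so that preferred documents score higher.

    hits: list of (docnum, score) pairs, as a scorer might produce them.
    favorite_docnums: set of document numbers to promote (hypothetical).
    bonus: value added to the score of each preferred document.
    """
    boosted = [
        (docnum, score + bonus if docnum in favorite_docnums else score)
        for docnum, score in hits
    ]
    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(boosted, key=lambda hit: hit[1], reverse=True)


hits = [(3, 2.5), (7, 1.0), (1, 0.5)]
ranked = boost_favorites(hits, favorite_docnums={7})
```

In a real weighting class the same test (is this document one of the preferred ones?) would sit inside `final`, assuming the set of preferred document numbers is available there.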

Automatically generating related topics

Key terms from the term vector

According to this discussion, the following code produces a list of the most important words in a given document:

searcher = myindex.searcher()
 
# Get the document number for the forum topic
topic_docnum = searcher.document_number(id=topic_id)
 
# Get the iterator of key terms from the "content" field, or
# whatever you called the field containing the main text.
keyterms = searcher.key_terms([topic_docnum], "content", numterms=5)

Key terms without a term vector

The new search engine can extract the key terms directly from a document's text, without having to store a forward index (the term vector), by using the method Searcher.key_terms_from_text.
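
To give an intuition for how key terms can be picked from raw text alone, here is a minimal sketch that weights each word by its frequency in the text and by its rarity across the collection (a plain TF-IDF weighting). This is an assumption made for illustration only: Whoosh applies its own scoring model, and `key_terms_sketch`, `doc_freq` and `doc_count` are hypothetical names.

```python
import math
from collections import Counter


def key_terms_sketch(text, doc_freq, doc_count, numterms=5):
    """Return the top `numterms` (word, weight) pairs for `text`.

    doc_freq: dict mapping a word to the number of documents containing it.
    doc_count: total number of documents in the collection.
    """
    counts = Counter(text.lower().split())
    weights = {}
    for word, tf in counts.items():
        df = doc_freq.get(word, 0)
        # Smoothed inverse document frequency: rare words weigh more.
        idf = math.log((1 + doc_count) / (1 + df)) + 1.0
        weights[word] = tf * idf
    # Sort by weight (descending), then alphabetically for a stable order.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:numterms]


doc_freq = {"the": 100, "book": 40, "grammar": 3, "of": 100}
terms = key_terms_sketch("grammar of the book grammar", doc_freq, 100, numterms=2)
top_words = [word for word, _weight in terms]
```

Only collection-wide document frequencies are needed here, which is why no per-document term vector has to be stored.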

Cost of the term vector

A 23 MB sample of books was tested. With Whoosh 1.0, the index was 24.7 MB without term vectors and 37.7 MB with term vectors.

With Whoosh 0.3.18, the index for the same sample was 15.8 MB.
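
The figures above can be turned into relative overheads with a few lines of arithmetic. The sketch below uses only the sizes reported in this section; the variable names are ad hoc.

```python
# Sizes in megabytes, as reported above.
source_mb = 23.0
index_no_vectors_mb = 24.7    # Whoosh 1.0, without term vectors
index_with_vectors_mb = 37.7  # Whoosh 1.0, with term vectors
index_old_mb = 15.8           # Whoosh 0.3.18

# Extra space taken by term vectors, absolute and relative.
vector_cost_mb = index_with_vectors_mb - index_no_vectors_mb
vector_cost_pct = 100.0 * vector_cost_mb / index_no_vectors_mb

# Index size relative to the source text.
ratio_no_vectors = index_no_vectors_mb / source_mb
ratio_with_vectors = index_with_vectors_mb / source_mb
ratio_old = index_old_mb / source_mb

print("term vectors add %.1f MB (%.0f%%)" % (vector_cost_mb, vector_cost_pct))
```

In other words, storing term vectors grows the Whoosh 1.0 index by about 13 MB, roughly half again its size without them.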

Related topics from key terms

Next, we take those key terms and search for them in other documents, either within the same book or in other books:

# Create a query from the key terms.
# key_terms() returns (term, score) pairs in recent Whoosh versions;
# only the term is needed here.
from whoosh import query
q = query.Or([query.Term("content", kt) for kt, _score in keyterms])

# Get the results of the query
results = searcher.search(q, limit=10)

# Display the documents in the results, but skip the original topic
for i, fields in enumerate(results):
    if results.docnum(i) != topic_docnum:
        print(fields)

Suggestions while searching

Based on the answer given by the author of Whoosh in this discussion, the number of documents matching a given word can be found with:

from whoosh.query import Term

s = myindex.searcher()
print(len(set(Term(fieldname, term).docs(s))))

The Prefix query and its _words method can also be used:

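from whoosh.query import Prefix, Term

wordprefix = u"كتب"
r = th.searchEngine.indexer.reader()
# _words() is a private helper that expands the prefix into indexed words.
words = set(Prefix("content", wordprefix)._words(r))
for word in words:  # s is the searcher opened in the previous snippet
    print("%d %s" % (len(set(Term("content", word).docs(s))), word))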
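
The same idea can be sketched without any index library: keep the indexed words in a sorted list, locate the first candidate with a binary search, and rank the matches by how many documents contain them. This is a minimal illustration under assumed data; `suggest`, `sorted_terms` and `doc_freq` are hypothetical names, and the counts are made up for the example.

```python
from bisect import bisect_left


def suggest(prefix, sorted_terms, doc_freq, limit=5):
    """Return up to `limit` (word, document count) pairs starting with `prefix`.

    sorted_terms: lexicographically sorted list of indexed words.
    doc_freq: dict mapping each word to the number of documents containing it.
    """
    start = bisect_left(sorted_terms, prefix)
    matches = []
    # Words sharing a prefix are contiguous in a sorted list.
    for word in sorted_terms[start:]:
        if not word.startswith(prefix):
            break
        matches.append((word, doc_freq[word]))
    # Most frequent words first; ties are broken alphabetically.
    matches.sort(key=lambda pair: (-pair[1], pair[0]))
    return matches[:limit]


# Illustrative counts only.
doc_freq = {"كتاب": 12, "كتب": 30, "كتبة": 2, "قلم": 7}
suggestions = suggest("كتب", sorted(doc_freq), doc_freq)
```

Ranking by document count puts the completions most likely to return results at the top of the suggestion list.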