
Generate bigrams and calculate their scores with the help of Lucene4IR.


ABDULAZIZALQATAN/BigramGenerator


Bigram Generator


Short Description:

  • Bigram Generator is a Java class that extracts bigrams from a Lucene index and calculates their scores.
  • For each bigram, it outputs: (bigramID - bigram - term1(frequency) - term2(frequency) - bigram frequency - bigram score).
  • The score is calculated using the Mutual Information formula from the following book (Manning and Schütze, Foundations of Statistical Natural Language Processing):
    https://www.cs.vassar.edu/~cs366/docs/Manning_Schuetze_StatisticalNLP.pdf
    PDF page 206 (printed page 178)
  • It was developed as part of Lucene4IR.
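The mutual information score in that section of the book is I(w1, w2) = log2( P(w1 w2) / (P(w1) P(w2)) ). The sketch below computes it from raw counts, assuming maximum-likelihood estimates P(w) = C(w) / N with N the total number of tokens; the README does not say how the generator estimates these probabilities. It also formats one output line in the field order listed above. The class `BigramScore`, its methods, and the exact separators are hypothetical, not the repository's API.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: mutual information score for a single bigram,
 * assuming maximum-likelihood estimates P(w) = C(w) / N.
 * Not the repository's actual class or API.
 */
final class BigramScore {

    private BigramScore() {}

    /**
     * I(w1, w2) = log2( P(w1 w2) / (P(w1) * P(w2)) )
     *           = log2( N * C(w1 w2) / (C(w1) * C(w2)) )
     *
     * @param bigramFreq  C(w1 w2), occurrences of the bigram
     * @param term1Freq   C(w1), occurrences of the first term
     * @param term2Freq   C(w2), occurrences of the second term
     * @param totalTokens N, number of tokens in the collection (assumed estimator)
     */
    static double mutualInformation(long bigramFreq, long term1Freq,
                                    long term2Freq, long totalTokens) {
        if (bigramFreq <= 0 || term1Freq <= 0 || term2Freq <= 0 || totalTokens <= 0) {
            throw new IllegalArgumentException("all counts must be positive");
        }
        // Multiply in double precision so N * C(w1 w2) cannot overflow a long.
        double ratio = ((double) totalTokens * bigramFreq) / ((double) term1Freq * term2Freq);
        return Math.log(ratio) / Math.log(2.0);
    }

    /**
     * Formats one output line as
     * bigramID - bigram - term1(frequency) - term2(frequency) - bigram frequency - bigram score.
     * The " - " separator and 4-decimal score are assumptions.
     */
    static String formatLine(int bigramId, String term1, long term1Freq,
                             String term2, long term2Freq, long bigramFreq, double score) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
        return String.format(Locale.ROOT, "%d - %s %s - %s(%d) - %s(%d) - %d - %.4f",
                bigramId, term1, term2, term1, term1Freq, term2, term2Freq, bigramFreq, score);
    }
}
```

For example, `BigramScore.formatLine(1, "new", 10, "york", 5, 4, 2.5)` yields `1 - new york - new(10) - york(5) - 4 - 2.5000`. Working in base 2 matches the book's convention of reporting mutual information in bits.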

Process Schema:

The workflow (process schema) of the application:

[Figure: process schema of the application]

  • Steps 1 and 2: creating the index, preferably using AppIndexer from Lucene4IR.
  • Steps 3 and 4: the work performed by the generator.
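The generator's part of the workflow (extracting bigrams, then scoring them) can be illustrated with a minimal in-memory sketch. It assumes the documents are already tokenized into lists of terms; the real generator reads its terms from the Lucene index built in steps 1 and 2, which this sketch does not attempt. Bigrams are formed only from adjacent terms within one document, and each is scored with mutual information using maximum-likelihood estimates (an assumed estimator). `BigramExtractionSketch`, `extract`, and `ScoredBigram` are hypothetical names; the code needs Java 16+ for the record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory stand-in for the extraction and scoring steps.
 * Assumes pre-tokenized documents instead of reading a Lucene index.
 */
final class BigramExtractionSketch {

    /** One scored bigram, mirroring the README's output fields. */
    record ScoredBigram(int id, String term1, long term1Freq,
                        String term2, long term2Freq, long bigramFreq, double score) {}

    private BigramExtractionSketch() {}

    static List<ScoredBigram> extract(List<List<String>> docs) {
        Map<String, Long> unigramCounts = new HashMap<>();
        // LinkedHashMap keeps first-seen order, so bigram IDs are deterministic.
        Map<List<String>, Long> bigramCounts = new LinkedHashMap<>();
        long totalTokens = 0;

        for (List<String> tokens : docs) {
            for (int i = 0; i < tokens.size(); i++) {
                unigramCounts.merge(tokens.get(i), 1L, Long::sum);
                totalTokens++;
                // Pair each term with its right neighbour inside the same document only.
                if (i + 1 < tokens.size()) {
                    bigramCounts.merge(List.of(tokens.get(i), tokens.get(i + 1)), 1L, Long::sum);
                }
            }
        }

        List<ScoredBigram> result = new ArrayList<>();
        int id = 1;
        for (Map.Entry<List<String>, Long> entry : bigramCounts.entrySet()) {
            String t1 = entry.getKey().get(0);
            String t2 = entry.getKey().get(1);
            long c1 = unigramCounts.get(t1);
            long c2 = unigramCounts.get(t2);
            long c12 = entry.getValue();
            // log2( N * C(w1 w2) / (C(w1) * C(w2)) ); the token count N is used
            // for both unigram and bigram probabilities as an approximation.
            double score = Math.log((double) totalTokens * c12 / ((double) c1 * c2)) / Math.log(2.0);
            result.add(new ScoredBigram(id++, t1, c1, t2, c2, c12, score));
        }
        return result;
    }
}
```

Keeping counting and scoring as separate passes mirrors the split between extracting bigrams and calculating their scores; in a real run, the first pass would iterate over the index rather than over in-memory lists.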

Usage:

To use the class, do the following:

  1. Create the index.
  2. Place the parameter XML file.
  3. Fill in the parameters in the XML file.
  4. Run the generator.
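Steps 2 and 3 revolve around a parameter XML file. The README does not list the parameters, so the sketch below only shows one way a flat parameter file could be read with the JDK's built-in DOM parser. The root element `bigramParams` and the child elements `indexName`, `field`, and `outputFile` are hypothetical placeholders, not the generator's actual parameter names; check the class itself for those.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/**
 * Hypothetical sketch: reads a flat parameter XML file such as
 *   <bigramParams><indexName>...</indexName><outputFile>...</outputFile></bigramParams>
 * into an ordered name -> value map. Element names are placeholders.
 */
final class ParamFileSketch {

    private ParamFileSketch() {}

    static Map<String, String> readParams(InputStream xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations to rule out external-entity (XXE) tricks.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(xml);

            Map<String, String> params = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                // Keep element children only; whitespace between tags is a text node.
                if (node.getNodeType() == Node.ELEMENT_NODE) {
                    params.put(node.getNodeName(), node.getTextContent().trim());
                }
            }
            return params;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not read parameter XML", e);
        }
    }

    /** Convenience overload for XML held in a string. */
    static Map<String, String> readParams(String xml) {
        return readParams(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

For a file on disk, the `InputStream` overload can be fed from `java.nio.file.Files.newInputStream(...)` with whatever path the parameter file was placed at. A flat map is enough for a handful of scalar settings; a nested parameter layout would need a different reader.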

[Figures: Step 1 and Step 2]
