Indic Ngram Library

What is an n-gram?

An n-gram is a contiguous subsequence of n items from a given sequence. Depending on the application, the items can be phonemes, syllables, letters, words or base pairs. An n-gram of size 1 is called a "unigram", size 2 a "bigram" (or, less commonly, a "digram"), size 3 a "trigram", and size 4 or more is simply called an "n-gram". An n-gram model is a probabilistic model that predicts the next item in a sequence from the n − 1 items before it. N-grams are used in many areas of statistical natural language processing and in genetic sequence analysis.
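
To make these definitions concrete, here is a minimal Python sketch, independent of this library, that extracts contiguous n-grams from any sequence and builds a simple bigram model that predicts the next item as the most frequent follower seen in training. The names ngrams, train_bigram_model and predict_next are illustrative only and are not part of the library's API.

```python
from collections import Counter, defaultdict


def ngrams(items, n):
    """Return the contiguous n-grams of a sequence as a list of tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L has L - n + 1 contiguous n-grams (none if n > L).
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def train_bigram_model(tokens):
    """Count, for each item, how often each other item follows it."""
    followers = defaultdict(Counter)
    for prev, nxt in ngrams(tokens, 2):
        followers[prev][nxt] += 1
    return followers


def predict_next(model, item):
    """Return the most frequent follower of `item`, or None if it was never seen."""
    if item not in model:
        return None
    return model[item].most_common(1)[0][0]


if __name__ == "__main__":
    letters = list("banana")
    print(ngrams(letters, 1))  # letter unigrams
    print(ngrams(letters, 2))  # letter bigrams
    print(ngrams(letters, 3))  # letter trigrams

    words = "the cat sat on the mat and the cat slept".split()
    model = train_bigram_model(words)
    print(predict_next(model, "the"))
```

In the training sentence "the" is followed by "cat" twice and by "mat" once, so the bigram model predicts "cat" after "the".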

To use this library in your own program, refer to the JSON-RPC-based API documentation.
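
A call to the JSON-RPC API is an ordinary JSON request posted over HTTP. The Python sketch below assumes the service accepts JSON-RPC 2.0 over HTTP POST and takes positional parameters in the order given in this document's API list (arg1, then the optional n). The endpoint URL is not given here, so it is left as a parameter; build_request and call are hypothetical helper names, not part of the library. The method name modules.Ngram.wordNgram comes from the API list.

```python
import json
import urllib.request


def build_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request object (assumed protocol version)."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def call(endpoint, method, params, request_id=1):
    """POST a JSON-RPC request to `endpoint` and return the reply's "result"."""
    body = json.dumps(build_request(method, params, request_id)).encode("utf-8")
    req = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read().decode("utf-8"))
    # Both JSON-RPC 1.0 (error: null) and 2.0 (no "error" key) signal success this way.
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply.get("result")


if __name__ == "__main__":
    # Positional params: the sentence, then n (assumed order).
    payload = build_request("modules.Ngram.wordNgram", ["this is a small test", 2])
    print(json.dumps(payload, ensure_ascii=False))
    # To send it, supply the service's real endpoint URL (not given in this document):
    # print(call(ENDPOINT_URL, "modules.Ngram.wordNgram", ["this is a small test", 2]))
```

The request is built separately from the network call so the payload can be inspected or logged before anything is sent.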

Supported Languages

English, Hindi, Malayalam, Kannada, Bengali

For word n-grams, the input is a sentence. The language of each word is detected automatically, so the text can be written in any of the supported languages or in a mix of them.
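
This document does not say how per-word language detection works. One simple approach, shown here only as an assumption, is to classify each word by the Unicode block of its first recognised character and then form word n-grams of (word, language) pairs. A script only approximates a language: Devanagari is also used for Marathi and Nepali, and the Bengali script for Assamese. detect_language and tagged_word_ngrams are hypothetical names, not part of the library.

```python
# Hypothetical stand-in for per-word language detection: each word is
# classified by the Unicode block of its first recognised character.
# A script only approximates a language (Devanagari is not only Hindi).
SCRIPT_BLOCKS = [
    ("Hindi", 0x0900, 0x097F),      # Devanagari block
    ("Bengali", 0x0980, 0x09FF),    # Bengali block
    ("Kannada", 0x0C80, 0x0CFF),    # Kannada block
    ("Malayalam", 0x0D00, 0x0D7F),  # Malayalam block
]


def detect_language(word):
    """Return a language guess for `word`, or "unknown" if no character matches."""
    for ch in word:
        cp = ord(ch)
        for language, start, end in SCRIPT_BLOCKS:
            if start <= cp <= end:
                return language
        if ch.isascii() and ch.isalpha():
            return "English"  # assumption: Latin letters are treated as English
    return "unknown"


def tagged_word_ngrams(sentence, n=2):
    """Split on whitespace, tag each word with a language, and return word n-grams."""
    tagged = [(word, detect_language(word)) for word in sentence.split()]
    return [tuple(tagged[i:i + n]) for i in range(len(tagged) - n + 1)]


if __name__ == "__main__":
    for gram in tagged_word_ngrams("I speak മലയാളം and हिंदी", 2):
        print(gram)
```

Because the tag is computed per word, a mixed-language sentence needs no separate language setting for the whole input.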

Python ngram API

This service provides the following Indic n-gram methods:
  • Method: modules.Ngram.wordNgram
    • arg1 : the sentence
    • n : the n of the n-gram (optional)
    • Return : the word n-grams of the sentence
  • Method: modules.Ngram.letterNgram
    • arg1 : the word
    • n : the n of the n-gram (optional)
    • Return : the letter n-grams of the word
  • Method: modules.Ngram.syllableNgram
    • arg1 : the word
    • n : the n of the n-gram (optional)
    • Return : the n-grams of the word, with the word split into syllables rather than letters
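
The difference between letter n-grams and syllable n-grams matters for Indic scripts, where one syllable such as हिं is written with several code points. The Python sketch below assumes a "letter" is one Unicode code point and approximates a syllable as a base character followed by its combining marks (vowel signs, anusvara, virama), with a virama also joining the next consonant into a conjunct. This is a simplification (it ignores ZWJ/ZWNJ and Malayalam chillu forms, for example) and is not necessarily how syllableNgram splits words; all function names are hypothetical.

```python
import unicodedata


def is_virama(ch):
    """True for the virama sign of an Indic script (e.g. U+094D in Devanagari)."""
    return unicodedata.name(ch, "").endswith("SIGN VIRAMA")


def syllables(word):
    """Approximate orthographic syllables: a base character plus its combining
    marks, where a virama also pulls the following consonant into the cluster."""
    parts = []
    join_next = False
    for ch in word:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        if parts and (join_next or is_mark):
            parts[-1] += ch
        else:
            parts.append(ch)
        join_next = is_virama(ch)
    return parts


def ngrams(items, n):
    """Contiguous n-grams of a sequence, as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def letter_ngrams(word, n=2):
    # Assumption: a "letter" here is one Unicode code point.
    return ngrams(list(word), n)


def syllable_ngrams(word, n=2):
    return ngrams(syllables(word), n)


if __name__ == "__main__":
    word = "हिंदी"
    print(letter_ngrams(word, 2))
    print(syllables(word))
    print(syllable_ngrams(word, 2))
```

For हिंदी, the letter bigrams are pairs of code points such as (ह, ि), whereas the only syllable bigram is (हिं, दी), which is usually the more meaningful unit for Indic text.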