Popularity: 5.3 (Stable)

Activity: 0.0 (Stable)

Stars 64

Watchers 5

Forks 3

Last Commit about 7 years ago

Monthly Downloads: 8

Programming language: Elixir

License: MIT License

Tags: Natural Language Processing (NLP)

gibran alternatives and similar packages

Based on the "Natural Language Processing (NLP)" category.

Paasaa (popularity 6.5, activity 5.5): 🔤 Natural language detection for Elixir

Petrovich (popularity 5.3, activity 1.8): Elixir library to inflect Russian first, last, and middle names.

Woolly (popularity 5.1, activity 0.0): The Text Mining Elixir

Tongue (popularity 3.1, activity 0.0): Elixir port of Nakatani Shuyo's natural language detector

Cognixir (popularity 2.1, activity 0.0): Cognitive Services for Elixir

Gibran

Yesterday is but today's memory, and tomorrow is today's dream.

Gibran is an Elixir natural language processor. Lofty goals for Gibran include:

Metaphone phonetic coding system
Soundex algorithm
Porter Stemming algorithm
String similarity as described by Simon White

Currently, Gibran ships with the following features:

Token count, unique token count, and character count
Average characters per token
HashDicts of tokens and their frequencies, lengths, and densities
The longest token(s) and their length
The most frequent token(s) and their frequency
Unique tokens
Levenshtein distance algorithm
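
To make the statistics in the list above concrete, here is a minimal sketch built only on Elixir's standard library (Elixir 1.10 or later for `Enum.frequencies/1`). The `CounterSketch` module and its function names are hypothetical and are not Gibran's API. Density is assumed here to mean a token's share of all tokens, and the input list is assumed to be non-empty.

```elixir
defmodule CounterSketch do
  # Hypothetical helper module for illustration; not part of Gibran.

  # Map of token => number of occurrences.
  def frequencies(tokens), do: Enum.frequencies(tokens)

  # Map of token => share of all tokens (assumed meaning of "density").
  def densities(tokens) do
    total = length(tokens)

    tokens
    |> Enum.frequencies()
    |> Map.new(fn {token, count} -> {token, count / total} end)
  end

  # The longest token(s) and their length. Assumes a non-empty list.
  def longest(tokens) do
    max = tokens |> Enum.map(&String.length/1) |> Enum.max()
    {tokens |> Enum.uniq() |> Enum.filter(&(String.length(&1) == max)), max}
  end

  # Average number of characters per token. Assumes a non-empty list.
  def average_chars(tokens) do
    total_chars = tokens |> Enum.map(&String.length/1) |> Enum.sum()
    total_chars / length(tokens)
  end
end

CounterSketch.frequencies(["is", "but", "is"])
# => %{"but" => 1, "is" => 2}
```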

Usage

Let's start with something simple.

alias Gibran.Tokeniser
alias Gibran.Counter

str = "Yesterday is but today's memory, and tomorrow is today's dream."
Tokeniser.tokenise(str)
# => ["yesterday", "is", "but", "today's", "memory", "and", "tomorrow", "is", "today's", "dream"]

Tokeniser.tokenise(str) |> Counter.uniq_token_count
# => 8

By default Gibran uses the following regular expression to tokenise strings: ~r/[^\p{L}'-]/u. You can provide your own regular expression through the pattern option. You can combine pattern with exclude to create sophisticated tokenisation strategies.

Tokeniser.tokenise(str, exclude: &(String.length(&1) < 4)) |> Counter.token_count
# => 6
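
As a rough illustration of what the default pattern does, the sketch below lowercases the input and splits it on every character the pattern matches, that is, anything other than a letter, an apostrophe, or a hyphen. `TokeniseSketch` is a hypothetical module, and splitting with `String.split/3` is an assumption about the approach, not a description of Gibran's internals.

```elixir
defmodule TokeniseSketch do
  # Hypothetical stand-in for illustration; not Gibran's tokeniser.
  # The default pattern is the one quoted above: it matches every
  # character that is NOT a letter, an apostrophe, or a hyphen.
  def tokenise(input, pattern \\ ~r/[^\p{L}'-]/u) do
    input
    |> String.downcase()
    # trim: true drops the empty strings left between adjacent separators.
    |> String.split(pattern, trim: true)
  end
end

TokeniseSketch.tokenise("Yesterday is but today's memory, and tomorrow is today's dream.")
# => ["yesterday", "is", "but", "today's", "memory", "and", "tomorrow", "is", "today's", "dream"]

# A custom pattern that splits on whitespace only, so punctuation stays attached.
TokeniseSketch.tokenise("Sand and Foam.", ~r/\s+/)
# => ["sand", "and", "foam."]
```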

The exclude option accepts a string, a function, a regular expression, or a list combining any one or more of those types.

# Using `exclude` with a function.
Tokeniser.tokenise("Kingdom of the Imagination", exclude: &(String.length(&1) < 10))
# => ["imagination"]

# Using `exclude` with a regular expression.
Tokeniser.tokenise("Sand and Foam", exclude: ~r/and/)
# => ["foam"]

# Using `exclude` with a string.
Tokeniser.tokenise("Eye of The Prophet", exclude: "eye of")
# => ["the", "prophet"]

# Using `exclude` with a list of a combination of types.
Tokeniser.tokenise("Eye of The Prophet", exclude: ["eye", &(String.ends_with?(&1, "he")), ~r/of/])
# => ["prophet"]
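
One way to picture how a single `exclude` option can accept all four shapes is to dispatch on the filter's type. The sketch below is hypothetical (`ExcludeSketch` is not part of Gibran) and assumes, based on the examples above, that a string filter is treated as a space-separated list of tokens to drop and that tokens are already lowercased.

```elixir
defmodule ExcludeSketch do
  # Hypothetical illustration of type-based dispatch; not Gibran's code.

  # A list excludes a token when any of its filters does.
  def excluded?(token, filters) when is_list(filters),
    do: Enum.any?(filters, &excluded?(token, &1))

  # A one-argument function decides for itself.
  def excluded?(token, filter) when is_function(filter, 1), do: filter.(token)

  # A regular expression excludes every token it matches.
  def excluded?(token, %Regex{} = filter), do: Regex.match?(filter, token)

  # A string is assumed to be a space-separated list of tokens to drop.
  def excluded?(token, filter) when is_binary(filter),
    do: token in String.split(String.downcase(filter))

  # Drops every token for which the filter (or list of filters) holds.
  def reject(tokens, filters), do: Enum.reject(tokens, &excluded?(&1, filters))
end

tokens = ["eye", "of", "the", "prophet"]

ExcludeSketch.reject(tokens, "eye of")
# => ["the", "prophet"]

ExcludeSketch.reject(tokens, ["eye", &String.ends_with?(&1, "he"), ~r/of/])
# => ["prophet"]
```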

Gibran provides a shortcut for working with strings directly (instead of running them through the tokeniser first).

Gibran.from_string(str, :token_count, opts: [exclude: &String.length(&1) < 4])
# => 6

To avoid inconsistencies that arise from character-casing, Gibran normalises input before applying transformations, which is why the tokens in the examples above are all lowercase.

Levenshtein distance

Ordinary use:

iex(1)> Gibran.Levenshtein.distance("kitten", "sitting")
3

The Levenshtein distance for the same string is 0.

iex(2)> Gibran.Levenshtein.distance("snail", "snail")
0

The Levenshtein distance is case-sensitive.

iex(3)> Gibran.Levenshtein.distance("HOUSEBOAT", "houseboat")
9

The function can accept charlists as well as strings.

iex(4)> Gibran.Levenshtein.distance('jogging', 'logger')
4
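
For readers unfamiliar with the algorithm, the following is a minimal sketch of the classic row-by-row dynamic-programming formulation of Levenshtein distance. `LevenshteinSketch` is a hypothetical module written for illustration; it is not Gibran's implementation, which may be structured differently. It accepts two strings or two charlists, and compares elements exactly, so it is case-sensitive.

```elixir
defmodule LevenshteinSketch do
  # Hypothetical illustration; not Gibran.Levenshtein.

  # Strings are compared grapheme by grapheme.
  def distance(a, b) when is_binary(a) and is_binary(b),
    do: distance(String.graphemes(a), String.graphemes(b))

  # Lists (including charlists) are compared element by element.
  def distance(a, b) when is_list(a) and is_list(b) do
    # Row 0: turning the empty prefix of `a` into each prefix of `b`.
    first_row = Enum.to_list(0..length(b))

    a
    |> Enum.reduce(first_row, fn x, prev -> next_row(x, b, prev) end)
    |> List.last()
  end

  # Builds the next row from the previous one for element `x` of `a`.
  defp next_row(x, b, [p0 | prev_tail]) do
    {reversed, _diag} =
      b
      |> Enum.zip(prev_tail)
      |> Enum.reduce({[p0 + 1], p0}, fn {y, above}, {[left | _] = acc, diag} ->
        cost = if x == y, do: 0, else: 1
        # Deletion, insertion, or substitution (free when elements match).
        cell = Enum.min([above + 1, left + 1, diag + cost])
        {[cell | acc], above}
      end)

    Enum.reverse(reversed)
  end
end

LevenshteinSketch.distance("kitten", "sitting")
# => 3
LevenshteinSketch.distance(~c"jogging", ~c"logger")
# => 4
```

Only two rows of the table are alive at any time, so memory use grows with the length of the second argument instead of with the product of both lengths.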

The doctests contain extensive usage examples. Please take a look there for more details.

Gibran is an Elixir natural language processor, and a port of WordsCounted.