Description

Codepagex is an Elixir library for converting strings between UTF-8 and other encodings. It is like iconv, but written in pure Elixir.

The encodings are taken from the mapping tables published on unicode.org, and the conversion functions are generated from those tables at compile time.

Monthly Downloads: 9,534

Programming language: Elixir

License: Apache License 2.0

Tags: Translations And Internationalizations Codepage Iconv

Codepagex alternatives and similar packages

Based on the "Translations and Internationalizations" category.

gettext: Internationalization and localization support for Elixir.
Ex_Cldr: Elixir implementation of CLDR/ICU.
linguist: Elixir internationalization library.
trans: Embedded translations for Elixir.
ecto_gettext: Library for localizing Ecto validation errors using Gettext.
getatrex: Gettext automatic translator in Elixir.
exkanji: An Elixir library for translating between hiragana, katakana, romaji, kanji and sound. It uses Mecab.
exromaji: An Elixir library for translating between hiragana, katakana, romaji and sound.
parabaikElixirConverter: An Elixir version of the Parabaik converter.
free PO editor: A tool for translating PO files.


README

Codepagex

Codepagex is an Elixir library for converting strings between UTF-8 and other encodings. It is like iconv, but written in pure Elixir.

The encodings are taken from the mapping tables published on unicode.org, and the conversion functions are generated from those tables at compile time.

Note on the built-in unicode module

Erlang's built-in :unicode module has some provisions for converting between UTF-8 and Latin-1. If that is all you need, consider relying on this simpler alternative instead of codepagex.

Compared to this functionality codepagex provides:

More codepage mapping options
The ability to handle illegal encoding with custom logic
A simpler interface

Keep in mind, however, that codepagex is considerably more complex, since it makes extensive use of macros.
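
For comparison, the Latin-1 case can be handled with the :unicode module alone. The following is a minimal sketch; the module and function names (Latin1Sketch, to_latin1/1, from_latin1/1) are hypothetical helpers made up for illustration, and only :unicode.characters_to_binary/3 comes from Erlang/OTP.

```elixir
defmodule Latin1Sketch do
  # Hypothetical helper module, not part of Codepagex or Erlang/OTP.

  # UTF-8 string -> Latin-1 bytes. Fails for characters outside Latin-1.
  def to_latin1(string) when is_binary(string) do
    case :unicode.characters_to_binary(string, :utf8, :latin1) do
      bytes when is_binary(bytes) -> {:ok, bytes}
      {:error, converted, rest} -> {:error, {converted, rest}}
      {:incomplete, converted, rest} -> {:error, {converted, rest}}
    end
  end

  # Latin-1 bytes -> UTF-8 string. Every byte is a valid Latin-1 character,
  # so this direction cannot fail for binary input.
  def from_latin1(bytes) when is_binary(bytes) do
    :unicode.characters_to_binary(bytes, :latin1, :utf8)
  end
end
```

Calling Latin1Sketch.to_latin1("æøå") returns {:ok, <<230, 248, 229>>}, while a character outside Latin-1, such as "€", yields an :error tuple. Anything beyond Latin-1, or any custom handling of unmappable characters, is where codepagex becomes the better fit.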

Examples

The package is meant to be used only through the Codepagex module. The examples below call its functions unqualified, as they would appear after import Codepagex.

    iex> from_string("æøåÆØÅ", :iso_8859_1)
    {:ok, <<230, 248, 229, 198, 216, 197>>}

    iex> to_string(<<230, 248, 229, 198, 216, 197>>, :iso_8859_1)
    {:ok, "æøåÆØÅ"}

    iex> from_string!("æøåÆØÅ", :iso_8859_1)
    <<230, 248, 229, 198, 216, 197>>

    iex> to_string!(<<230, 248, 229, 198, 216, 197>>, :iso_8859_1)
    "æøåÆØÅ"

When a string or encoded binary contains invalid byte sequences, the functions do not succeed. If you still want to convert such input, you can supply a function that handles these cases. For example:

    iex> from_string("Hello æøå!", :ascii, replace_nonexistent("_"))
    {:ok, "Hello ___!", 3}

    iex> iso = "Hello æøå!" |> from_string!(:iso_8859_1)
    iex> to_string!(iso, :ascii, use_utf_replacement())
    "Hello ���!"

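A common pattern is to try the strict conversion first and fall back to a replacement only when it fails. The sketch below assumes the codepagex dependency is already available in the project and that a failed strict conversion is reported as an {:error, reason} tuple; AsciiFallbackSketch and to_ascii/1 are hypothetical names, not part of the library.

```elixir
defmodule AsciiFallbackSketch do
  # Hypothetical helper, not part of Codepagex.
  # Returns {:ok, binary, replacement_count} in both branches.
  def to_ascii(string) do
    case Codepagex.from_string(string, :ascii) do
      {:ok, binary} ->
        # Strict conversion succeeded, so nothing was replaced.
        {:ok, binary, 0}

      # Assumption: failure is reported as {:error, reason}.
      {:error, _reason} ->
        # Retry, replacing every unmappable character with "_".
        Codepagex.from_string(string, :ascii, Codepagex.replace_nonexistent("_"))
    end
  end
end
```

With this helper, plain ASCII input comes back with a count of 0, and "Hello æøå!" gives the same {:ok, "Hello ___!", 3} result as the replace_nonexistent example above.
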
Encodings

The full list of encodings is returned by encoding_list/1.

Encodings are best supplied as atoms. A string is also accepted and is converted to an atom for you, but with a somewhat less efficient function lookup. For example:

    iex> from_string("æøå", "ISO8859/8859-9")
    {:ok, <<230, 248, 229>>}

    iex> from_string("æøå", :"ISO8859/8859-9")
    {:ok, <<230, 248, 229>>}

For some encodings, an alias is set up for easier dispatch. The list of aliases is returned by aliases/1. Using an alias looks like this:

    iex> from_string!("Hello æøåÆØÅ!", :iso_8859_1)
    <<72, 101, 108, 108, 111, 32, 230, 248, 229, 198, 216, 197, 33>>
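
Because every conversion goes to or from UTF-8, converting between two non-UTF-8 encodings takes two steps. A minimal sketch, assuming the codepagex dependency is available; ReencodeSketch and reencode/3 are hypothetical names chosen for illustration, and only to_string/2 and from_string/2 come from the examples above.

```elixir
defmodule ReencodeSketch do
  # Hypothetical helper, not part of Codepagex.
  # Decodes `binary` from the `from` encoding into UTF-8, then encodes the
  # result into the `to` encoding. The first non-:ok result is returned as is.
  def reencode(binary, from, to) do
    with {:ok, utf8} <- Codepagex.to_string(binary, from),
         {:ok, encoded} <- Codepagex.from_string(utf8, to) do
      {:ok, encoded}
    end
  end
end
```

For instance, ReencodeSketch.reencode(<<230, 248, 229>>, :iso_8859_1, :"ISO8859/8859-9") mixes an alias with a full encoding name, and both encodings must be among those compiled in.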

Encoding selection

By default, all ISO-8859 encodings and ASCII are included. A few more are available, and these must be specified in the config/config.exs file. The specified encodings are then compiled. Adding many encodings may increase compilation times, particularly for the largest ones.

To specify the encodings to use, add the following lines to your config/config.exs and recompile:

    use Mix.Config
    config :codepagex, :encodings, [:ascii]

This will add only the ASCII encoding, as specified by its shorthand alias. Any number of encodings may be specified in the list this way. The list may contain strings, atoms or regular expressions that match either an alias or a full encoding name, e.g.:

    use Mix.Config
    config :codepagex, :encodings, [
      :ascii,           # by alias name
      ~r[iso8859]i,     # by a regex matching the full name
      "ETSI/GSM0338",   # by the full name as a string
      :"MISC/CP856"     # by a full name as an atom
    ]
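
On Elixir 1.9 and later, Mix.Config is deprecated in favor of the built-in Config module. Assuming a project on such a version, the same configuration could be sketched as follows; the encoding list is copied unchanged from the example above.

```elixir
# config/config.exs on Elixir 1.9 or later
import Config

config :codepagex, :encodings, [
  :ascii,           # by alias name
  ~r[iso8859]i,     # by a regex matching the full name
  "ETSI/GSM0338",   # by the full name as a string
  :"MISC/CP856"     # by a full name as an atom
]
```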

After modifying the encodings list in the configuration, always make sure to run the following or the encodings you specified will not be compiled in:

    mix deps.compile codepagex --force

This is necessary because changes to Codepagex's configuration are not picked up automatically when it is a dependency of another project. Credit for the find goes to @michalmuskala here: https://elixirforum.com/t/sharing-with-the-community-text-transcoding-libraries/17962/2

The encodings that are known to require very long compile times are:

VENDORS/MISC/KPS9566
VENDORS/MICSFT/WINDOWS/CP932
VENDORS/MICSFT/WINDOWS/CP936
VENDORS/MICSFT/WINDOWS/CP949
VENDORS/MICSFT/WINDOWS/CP950

TODO

A few encodings are not yet supported, for various reasons. In particular, the Asian and Arabic ones with left-right and up-down variations.
Test Elixir function specs
Benchmarking vs iconv native libraries
Support for iolists
When converting sections of a string that are unchanged, return the original input. Consider using iolists to return the values so that chunks may be saved continuously
Lazy converter to get n characters / codepoints
Functions to drop n characters and take n characters (and slice?)