A Haskell library for text censorship, using DansGuardian Phraselists.
I converted those Phraselists to JSON. You can see the converted Phraselists here. There are compressed versions for use in your code.
Eros is still in development and is not yet ready for real use. If you would like to contribute, please do.
To learn how to use the library, try the API documentation on Hackage. Hackage isn't terribly reliable at building the documentation, so I also publish the documentation on GitHub Pages.
This is a usage guide for version 0.5.2.0. Hopefully there will be more up-to-date usage guides as new versions come out.
To install, add eros >=0.5 && <0.6 to the build-depends field in your
library's .cabal file.
You can get all of the functions simply by importing Text.Eros.
Hackage seems to be unable to build the API documentation for Eros, but it won't hurt to check eros on Hackage. If that doesn't work, I publish the documentation here.
The basic idea is that you take a Message and check it against a PhraseMap
using messageScore. Message is just a type alias for Tl.Text (lazy Text), so
enable the OverloadedStrings extension and pretend you're using normal
strings.
In GHCi,
:set -XOverloadedStrings
import Text.Eros
In a file,
{-# LANGUAGE OverloadedStrings #-}
import Text.Eros
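To show why OverloadedStrings makes this painless, here is a minimal, self-contained sketch. It does not use Eros itself: the Message alias is redefined locally to mirror the description above, and greeting and shout are hypothetical names made up for the example.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text.Lazy as Tl
import qualified Data.Text.Lazy.IO as Tlio

-- Local stand-in mirroring the alias described above (an assumption;
-- in real code you would get Message from Text.Eros instead).
type Message = Tl.Text

-- With OverloadedStrings, this string literal is read as lazy Text.
greeting :: Message
greeting = "Hello, world"

-- Hypothetical helper: any function on Tl.Text also works on a Message.
shout :: Message -> Message
shout = Tl.toUpper

main :: IO ()
main = Tlio.putStrLn (shout greeting)
```

Without the extension, the literal would be a plain String and you would have to wrap it in Tl.pack yourself.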
A PhraseMap is just a Phraselist marshaled into the more Haskell-friendly
Ms.Map type.
Eros provides a large number of Phraselists.
data ErosList = Chat
              | Conspiracy
              | DrugAdvocacy
              | Forums
              | Gambling
              | Games
              | Gore
              | IdTheft
              | IllegalDrugs
              | Intolerance
              | LegalDrugs
              | Malware
              | Music
              | News
              | Nudism
              | Peer2Peer
              | Personals
              | Pornography
              | Proxies
              | SecretSocieties
              | SelfLabeling
              | Sport
              | Translation
              | UpstreamFilter
              | Violence
              | WarezHacking
              | Weapons
              | Webmail
  deriving (Eq)
The easiest way to marshal a Phraselist into a PhraseMap is to use the
readPhraseMap function.
readPhraseMap :: Phraselist t => t -> IO PhraseMap
Use it like this:
pornMap <- readPhraseMap Pornography
Internally, readPhraseMap reads JSON data containing the Phraselist,
marshals it into a list of PhraseAlmostTrees, converts those into a
PhraseForest, and then into a PhraseMap.
You can of course use mkMap and readPhraselist to do it yourself, but it's a
lot easier to just use readPhraseMap.
You can then use messageScore to see the Score (actually an Int) of each
message.
messageScore "Go fuck yourself." pornMap
30
messageScore is not case sensitive, so go fUck YoUrself returns the same
score as go fuck yourself, and so on.
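One way to picture case-insensitive scoring is the toy model below. It is only a sketch under stated assumptions, not how Eros is implemented: ToyPhraseMap, toyScore, and demoMap are hypothetical names, the map is a flat phrase-to-weight table, and the phrases in it are assumed to be lower case already.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Map.Strict as Ms
import qualified Data.Text.Lazy as Tl

-- Hypothetical stand-in; Eros's real PhraseMap may be shaped differently.
type ToyPhraseMap = Ms.Map Tl.Text Int

-- Lower-case the message once, then add weight * occurrences per phrase.
-- Assumes every phrase in the map is non-empty and already lower case.
toyScore :: Tl.Text -> ToyPhraseMap -> Int
toyScore msg phrases = Ms.foldrWithKey addPhrase 0 phrases
  where
    lowered = Tl.toLower msg
    addPhrase phrase weight acc =
      acc + weight * fromIntegral (Tl.count phrase lowered)

-- Made-up phrases and weights, purely for illustration.
demoMap :: ToyPhraseMap
demoMap = Ms.fromList [("foo", 10), ("bar baz", 5)]

main :: IO ()
main = do
  print (toyScore "Foo and FOO, then Bar Baz" demoMap)
  print (toyScore "foo and foo, then bar baz" demoMap)
```

Both calls print 25, since the message is normalised before any phrase is matched.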
If you want to use multiple Eros lists, do something like this:
let myLists = [Chat, Pornography, Weapons]
myMaps <- mapM readPhraseMap myLists
map (messageScore "Go fuck yourself") myMaps
[0, 30, 0]
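A bare list of scores does not say which list produced which number. The sketch below pairs each score with a label and keeps only the non-zero ones. It is a hypothetical helper, not part of Eros; plain String labels are used because ErosList is shown above as deriving only Eq, so its constructors cannot be printed directly.

```haskell
-- Pair each list name with its score and keep only the non-zero ones.
-- The String labels are hypothetical stand-ins for the list names.
flagged :: [String] -> [Int] -> [(String, Int)]
flagged names scores = filter ((> 0) . snd) (zip names scores)

main :: IO ()
main = print (flagged ["Chat", "Pornography", "Weapons"] [0, 30, 0])
```

With the scores from the example above, this prints [("Pornography",30)].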
I haven't added good support for custom phraselists yet, but basic support is there. Your phraselist needs to be in JSON, in accordance with the Phraselist schema (I'm too lazy to find a link to it). Then define a type for it and make that type an instance of Phraselist:
data MyList = MyList
instance Phraselist MyList where
  phraselistPath MyList = "/path/to/phraselist"
You can then do the normal stuff with messageScore and readPhraseMap.
I would love it if people contributed. QuickCheck tests are desperately needed.
As far as functionality goes, this library is pretty cut and dried. I have already added all of the features I envisioned.
Eros is under pretty heavy development, so the versions change quickly. I follow the
Hackage standard of major.minor.even-more-minor.trivial, where a change to major or
minor entails API-breaking changes.
In the interest of not confusing myself, I keep Eros and the Eros Client on the
same major.minor version. So, a bump in the major.minor number doesn't
necessarily mean that there's an API-breaking change.
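To make the version scheme concrete, here is a small sketch using Data.Version from base. It checks whether a version falls inside the >=0.5 && <0.6 bound from the install instructions and extracts the major.minor pair. The names apiVersion and inSupportedRange are hypothetical, and this is an illustration of the comparison, not something Eros or Cabal exposes under these names.

```haskell
import Data.Version (Version, makeVersion, versionBranch)

-- The first two components, which this README calls major.minor.
apiVersion :: Version -> [Int]
apiVersion = take 2 . versionBranch

-- True when a version satisfies the bound  >=0.5 && <0.6.
inSupportedRange :: Version -> Bool
inSupportedRange v = v >= makeVersion [0, 5] && v < makeVersion [0, 6]

main :: IO ()
main = do
  print (apiVersion (makeVersion [0, 5, 2, 0]))
  print (inSupportedRange (makeVersion [0, 5, 2, 0]))
  print (inSupportedRange (makeVersion [0, 6, 0, 0]))
```

This prints [0,5], True, and False: version 0.5.2.0 is inside the bound, while 0.6.0.0 is not.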
The best way to contact me is via IRC. I hang out on #archlinux and #haskell
on FreeNode. My handles are l0cust and isomorpheous.