Text Moderation (Alpha Release)
Sieve's text moderation provides a robust solution for moderating text in real time, designed to identify harmful, inappropriate, or unwanted content across categories such as bullying, hate speech, and sexual content. The app is highly customizable: it supports custom words, word indices for filtering or censoring content, and additional classes, giving developers the ability to adapt moderation to their specific needs.
Note: This is an experimental Alpha release. Features may change, and some functionality might not yet be fully stable.
Key Features
- Advanced Moderation: Leverages a combination of AI and algorithmic approaches to detect harmful content across multiple categories such as bullying, sexual exploitation, and violence.
- Additional Classes: Enables developers to choose specific content categories for moderation beyond the default classes. By selecting from the additional classes list, users can tailor the moderation to their platform's needs. For instance, if a developer wants to moderate political discussions, they can easily add and moderate that category, providing full control over the moderation process.
- Filters: Enables filtering capabilities for detecting and handling sensitive information such as phone numbers and addresses. Developers can select which filters to apply, and the app will return the precise location (start and end index) of the detected items within the text. This feature allows for targeted content management and protection of personal information. More details on filter usage and output can be found in the Filters section.
- Custom Words: Enables developers to provide a list of community-specific words to be filtered. More information can be found in the Custom Words section below.
- Contextual Understanding: Detects the intent behind messages even when they aren't explicitly sensitive, picking up on context such as emoji combinations or subtle language cues that may imply inappropriate content, like sexually suggestive messages, for a more nuanced approach to moderation.
- High Performance: Designed for speed and scalability, the app can efficiently handle millions of messages in real time, ensuring fast moderation without compromising accuracy.
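For orientation, here is a minimal sketch of calling the app through Sieve's Python client. The app path (`sieve/text-moderation`) and the `text` parameter name are assumptions for illustration; check the app page for the exact signature.

```python
import sieve

# Hypothetical app path and input parameter, shown for illustration;
# see the app page on Sieve for the exact signature.
text_moderation = sieve.function.get("sieve/text-moderation")

# Moderate a single message with the default classes.
result = text_moderation.run(text="you are such a loser")
print(result)  # e.g. {"classes": [{"class": "bullying", "score": 2}]}
```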
Future Work
- CSV/Text File Support: Support for CSV and text files is coming soon.
Pricing
| Number of Characters | Price |
|---|---|
| 1 million characters | $0.50 |
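For example, moderating 10 million characters would cost $5.00.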
Default Moderation Classes
By default, the app moderates messages for the following moderation classes:
| Safety Flag | Label | Description |
|---|---|---|
| Sexual | S | Classify text for sexually explicit or suggestive content |
| Violence | V | Flag content containing extreme threats of violence |
| Bullying | B | Identify bullying or abusive language in real time |
| Hate | H | Detect hate speech with high levels of granularity |
| Spam | SP | Mark language designed to take you to a different platform as spam |
| Drugs | D | Flag text that discusses or promotes the sale, possession, or usage of drugs |
| Child Exploitation | CE | Identify content that mentions or explicitly alludes to child sexual exploitation |
| Child Safety | CS | Detect threats of physical violence targeted at children in a school or school-related setting |
| Gibberish | G | Mark keyboard spam and phrases or words that are completely incomprehensible as gibberish |
| Phone Numbers | PN | Detect phone numbers in message strings, including international formats |
| Promotions | PR | Identify promotional content that redirects to another platform or requests an action such as reposting, donating, etc. |
| Weapons | W | Flag content that mentions knives, guns, personal weapons, and accessories such as ammunition, holsters, etc. |
Scoring
The scoring factor indicates the severity of the message. Some classes have multiple scores (0, 1, 2, 3), while others, such as spam, are binary (0, 3).
| Scoring Type | Class Names | Valid Scores |
|---|---|---|
| Non-Binary | Sexual, Hate, Violence, Bullying, Drugs, Weapons | 0, 1, 2, 3 |
| Binary | Custom Words, Child Exploitation, Child Safety, Self Harm, Gibberish, Spam, Promotions, Redirection, Phone Numbers | 0, 3 |
Note: Non-binary classes are scored from 0 to 3, with higher scores indicating more severe content. Binary classes are scored as 0 (no violation) or 3 (violation detected).
Examples
The app flags messages based on the detected content. If a message contains multiple moderation classes, it returns each class with a severity score:
```json
[
  {
    "classes": [
      {
        "class": "bullying",
        "score": 2
      }
    ]
  },
  {
    "classes": [
      {
        "class": "bullying",
        "score": 2
      },
      {
        "class": "sexual_exploitation",
        "score": 3
      }
    ]
  }
]
```
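One straightforward way to act on these scores is simple thresholding. The sketch below is illustrative only; the thresholds and the `triage` helper are not part of the app's API.

```python
# Illustrative thresholds for routing messages based on severity.
BLOCK_THRESHOLD = 3   # binary violations and the most severe content
REVIEW_THRESHOLD = 2  # moderately severe content goes to human review

def triage(result: dict) -> str:
    scores = [c["score"] for c in result.get("classes", [])]
    top = max(scores, default=0)
    if top >= BLOCK_THRESHOLD:
        return "block"
    if top >= REVIEW_THRESHOLD:
        return "review"
    return "allow"

results = [
    {"classes": [{"class": "bullying", "score": 2}]},
    {"classes": [{"class": "bullying", "score": 2},
                 {"class": "sexual_exploitation", "score": 3}]},
]
print([triage(r) for r in results])  # ['review', 'block']
```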
Additional Classes
If a developer wants to moderate more classes in addition to the default classes, they can use the following additional classes. To include any of the additional classes in the moderation results, simply pass a list of safety flags or labels from the table below to the `additional_classes` parameter.
| Safety Flag | Label | Description |
|---|---|---|
| Death, Harm & Tragedy | DHT | Human deaths, tragedies, accidents, disasters, and self-harm. |
| Public Safety | PS | Services and organizations that provide relief and ensure public safety. |
| Health | HL | Human health, including: health conditions, diseases, disorders, medical therapies, medication, vaccination, medical practices, and resources for healing, including support groups. |
| Religion & Belief | RB | Belief systems that deal with the possibility of supernatural laws and beings: religion, faith, belief, spiritual practice, churches, and places of worship. Includes astrology and the occult. |
| War & Conflict | WC | War, military conflicts, and major physical conflicts involving large numbers of people. Includes discussion of military services, even if not directly related to a war or conflict. |
| Finance | F | Consumer and business financial services, such as banking, loans, credit, investing, and insurance. |
| Politics | P | Political news and media; discussions of social, governmental, and public policy. |
| Legal | L | Law-related content, including law firms, legal information, primary legal materials, paralegal services, legal publications and technology, expert witnesses, litigation consultants, and other legal service providers. |
Note: Passing any safety flag or label that is not part of the table above in the `additional_classes` list will result in an exception.
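As a sketch, enabling additional classes through Sieve's Python client might look like the following. The app path and the `text` parameter name are assumptions for illustration; the labels come from the table above.

```python
import sieve

# Hypothetical app path, shown for illustration; see the app page
# on Sieve for the exact name.
text_moderation = sieve.function.get("sieve/text-moderation")

# Moderate Politics ("P") and Finance ("F") on top of the default classes.
result = text_moderation.run(
    text="wire the donation before the election deadline",
    additional_classes=["P", "F"],
)
```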
Scoring
| Scoring Type | Class Names | Valid Scores |
|---|---|---|
| Binary | Death, Harm & Tragedy, Public Safety, Health, Religion & Belief, War & Conflict, Finance, Politics, Legal | 0, 3 |
| Non-Binary | - | - |
Note: Each class is scored based on the severity of the content. For binary classes, scores are either 0 (no mention) or 3 (explicit mention).
Notes
Filters
The `filters` parameter enables granular content filtering by returning the start and end index of detected words or patterns. This allows developers to precisely identify and manage potentially harmful or sensitive content. Available filter options include:
- `None`: No filtering applied
- `all`: Apply all available filters
- `profanity`: Detect profane language
- `phone-numbers`: Identify phone number patterns
- `phone-numbers-and-addresses`: Detect both phone numbers and address formats
Developers can choose one of these options to tailor the filtering process to their specific needs.
The filter functionality is robust enough to detect obfuscated words, such as "f*ck" or "@ss", ensuring that attempts to bypass the filter are still caught.
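For instance, a call that enables a single filter might look like the sketch below. The app path and the assumption that `filters` takes one of the option strings above are illustrative; check the app page for the exact signature.

```python
import sieve

# Hypothetical app path, shown for illustration.
text_moderation = sieve.function.get("sieve/text-moderation")

# Assuming the filters parameter accepts one of the option strings above.
result = text_moderation.run(
    text="f*ck this, call me at +1 555 0100",
    filters="phone-numbers",
)
```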
Example output:
```json
{
  "filters": [
    {
      "value": "f*ck",
      "class": "profanity",
      "start_index": 0,
      "end_index": 4
    }
  ]
}
```
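The returned indices can then be used to redact spans client-side. The helper below is a sketch that assumes `end_index` is exclusive, consistent with the example above where "f*ck" spans indices 0-4.

```python
# Mask each detected span using the indices returned under "filters".
# Assumes end_index is exclusive, as in the example output above.
def redact(text: str, filters: list[dict]) -> str:
    chars = list(text)
    for f in filters:
        for i in range(f["start_index"], f["end_index"]):
            chars[i] = "*"
    return "".join(chars)

response = {
    "filters": [
        {"value": "f*ck", "class": "profanity", "start_index": 0, "end_index": 4}
    ]
}
print(redact("f*ck this chat", response["filters"]))  # **** this chat
```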
Custom Words
To filter any community-specific words, provide the list of words to the `custom_words` parameter. If any of the words are found, they are returned under the `filters` key with type `custom`. Note that the `custom_words` parameter only works if `filters` is enabled.
```json
{
  "filters": [
    {
      "value": "L-rizz",
      "type": "custom",
      "start_index": 42,
      "end_index": 49
    }
  ]
}
```
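A call combining custom words with filters might look like the sketch below; the app path and the `filters="all"` value are assumptions for illustration.

```python
import sieve

# Hypothetical app path, shown for illustration.
text_moderation = sieve.function.get("sieve/text-moderation")

# custom_words only takes effect when filters are enabled.
result = text_moderation.run(
    text="stop calling him L-rizz in chat",
    filters="all",
    custom_words=["L-rizz"],
)
```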