Detecting toxicity in text with TensorFlow.js
While working on a recent side project, Rephraser AI, I discovered that the
GPT-3 API had a safety setting for content toxicity. If user input contained
toxic content, the API would sometimes ignore the prompt altogether and return
warning messages instead of rephrases. To communicate this to users, I
developed the useToxicity
React hook for detecting toxic messages.
I will walk through each step of the process, including loading the toxicity model, creating the hook, and using it in a React component.
Set up the project
We will use create-react-app
for this demo.
npx create-react-app use-toxicity-demo
Next, navigate to the project directory and install the required dependencies:
cd use-toxicity-demo
npm install @tensorflow/tfjs @tensorflow-models/toxicity
We will use @tensorflow/tfjs
as the TensorFlow.js runtime and
@tensorflow-models/toxicity
for the pre-trained toxicity classification model.
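If you want to try the model outside of React first, here is a minimal sketch. The threshold value and the sample sentence are just placeholders, not part of the project above.

import '@tensorflow/tfjs'
import * as toxicity from '@tensorflow-models/toxicity'

// Predictions whose confidence falls below this threshold are reported as match: null.
const threshold = 0.9

// An empty label list loads all of the model's toxicity labels.
toxicity.load(threshold, []).then(async (model) => {
  const predictions = await model.classify(['you are a fool'])
  predictions.forEach((p) => {
    console.log(p.label, p.results[0].match)
  })
})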
Create the hook
Now we can create the useToxicity
hook. This hook takes a string as an
argument and returns an object with two boolean values: loading
and isToxic
.
The loading
value indicates whether the toxicity detection process is still
running, while the isToxic
value indicates whether the message is toxic or
not.
import { useEffect, useState } from 'react'
import '@tensorflow/tfjs'
import * as toxicity from '@tensorflow-models/toxicity'

const useToxicity = (text) => {
  const [loading, setLoading] = useState(true)
  const [isToxic, setIsToxic] = useState(false)

  // Minimum confidence the model needs before reporting a match.
  const threshold = 0.9

  useEffect(() => {
    const checkToxicity = async () => {
      setLoading(true)

      // An empty label list classifies against all of the model's labels.
      const model = await toxicity.load(threshold, [])
      const predictions = await model.classify(text)

      // The message counts as toxic if any label reports a match.
      const toxicPredictions = predictions.filter((p) => p.results[0].match)
      setIsToxic(toxicPredictions.length > 0)
      setLoading(false)
    }

    checkToxicity()
  }, [text])

  return { loading, isToxic }
}

export default useToxicity
To specify which categories of toxicity to classify your text against,
you can pass a list of labels as the second argument to toxicity.load()
.
However, if you only need a simple binary answer to whether a message is
toxic, you can pass an empty array, which classifies against all labels, as
done in this example.
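For instance, if you only care about a handful of categories, you could load the model against just those labels. The subset chosen below and the checkSelectedLabels helper are purely illustrative.

import * as toxicity from '@tensorflow-models/toxicity'

// Classify only against a subset of the model's labels.
// (This particular subset is just an example.)
const checkSelectedLabels = async (text) => {
  const model = await toxicity.load(0.9, ['identity_attack', 'insult', 'threat'])
  return model.classify(text)
}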
Additionally, the model.classify()
method returns an array of predictions,
each of which is an object with the following structure:
{
  label: 'identity_attack',
  results: [
    {
      match: true,
      // [probability of no match, probability of match]
      probabilities: [0.03, 0.97],
    },
  ],
}
This information can be used for more specific toxicity detection if needed.
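For example, instead of collapsing everything into a single boolean, you could collect the names of the labels that matched. A minimal sketch follows; getMatchedLabels is an illustrative helper, not part of the model's API.

// Given the predictions array returned by model.classify(),
// return the labels that matched, e.g. ['insult', 'toxicity'].
const getMatchedLabels = (predictions) =>
  predictions.filter((p) => p.results[0].match).map((p) => p.label)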
Example usage of the hook
The code below is a simple example of using the useToxicity()
hook in a React
component.
import { useState } from 'react'
import useToxicity from './useToxicity'

const TextInput = () => {
  const [inputText, setInputText] = useState('')
  const [textToCheck, setTextToCheck] = useState('')
  const { loading, isToxic } = useToxicity(textToCheck)

  // Only hand the text to the hook when the button is clicked.
  const handleButtonClick = (e) => {
    e.preventDefault()
    setTextToCheck(inputText)
  }

  return (
    <div>
      <label htmlFor="text-input">Enter your message:</label>
      <input
        type="text"
        id="text-input"
        value={inputText}
        onChange={(e) => setInputText(e.target.value)}
      />
      <button onClick={handleButtonClick}>Check toxicity</button>
      {loading && <div>Loading...</div>}
      {!loading && !isToxic && <div>Your message is clean!</div>}
      {!loading && isToxic && <div>Warning: Your message is toxic!</div>}
    </div>
  )
}

export default TextInput
Because the toxicity check can take a noticeable amount of time, the hook is
given a separate textToCheck state variable instead of the raw input value, so
detection only runs when the button is clicked rather than on every keystroke.
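If you would rather check the text as the user types, one common alternative is to debounce the input before handing it to useToxicity. Here is a rough sketch; useDebouncedValue is a generic helper I am assuming here, not part of the libraries above.

import { useEffect, useState } from 'react'

// Debounce a value so it only updates after `delay` ms of inactivity.
const useDebouncedValue = (value, delay = 500) => {
  const [debounced, setDebounced] = useState(value)

  useEffect(() => {
    const timer = setTimeout(() => setDebounced(value), delay)
    return () => clearTimeout(timer)
  }, [value, delay])

  return debounced
}

// Usage inside the component:
// const { loading, isToxic } = useToxicity(useDebouncedValue(inputText))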