How NLP is being used to keep YouTube comments safe

Natural language processing (NLP) offers many tools that help AI understand human language, and these tools are now being used to help keep YouTube comments safe. By running sentiment analysis on a YouTube comments section, organizations can see how the community receives their channel and videos. Combined with machine learning, NLP can then help create a safe comment space.
Ananya Avasthi
PUBLISHED ON
March 24, 2022
October 15, 2021

AI is everywhere: in our phones, our laptops, and the websites we visit. Natural language processing (NLP) is a branch of AI that helps computers understand human language. With the help of NLP and sentiment analysis, YouTube keeps its comments sections safer.

What is Sentiment Analysis?

Sentiment analysis is the use of natural language processing, text analysis, and related techniques to systematically identify, extract, quantify, and study affective states and subjective information. For instance, the Grammarly extension is used around the globe to check the grammar and tone of working documents. Its tone checking is an application of sentiment analysis.


Sentiment analysis helps organizations manage customer reviews on different online platforms. By analyzing the sentiment of comments, an organization can understand its users' views on its products, values, and more. Organizations can also use it to analyze trending videos by their views, likes, comments, categories, and so on.


NLP Tools for Comments Online

You don't have to look far to find a slew of offensive and negative comments online. (Read this article to learn how NLP is being used to mitigate negative comments in online gaming.) Such comments can be toxic and destructive, so it helps to filter out content that fosters negativity and hate. Google, and with it YouTube, introduced "Held Comments" as a way to filter comments on YouTube. It combines sentiment analysis, data labeling, data processing, and related techniques to filter spam comments.


Held Comments

Held Comments has become the default for YouTube comments. It flags comments and surfaces them to the creator, who can then approve, hide, or report them as needed. It uses labeled data combined with machine learning (ML) to train a model of what counts as appropriate, and it automatically flags comments that the model finds unacceptable.


The algorithm is still a work in progress, since the kinds of comments that need to be filtered or marked as spam depend on the creator. Creators can also opt out of Held Comments. The option exists because, for big channels, reviewing each comment could become an enormous task.
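
The flag-then-review flow described above can be sketched as a small review queue. This is an illustration only, not YouTube's implementation or API: the names Status, Comment, CommentQueue, is_unacceptable, and hold_enabled are all hypothetical, and the classifier is passed in as a plain function so any of the approaches described later could be plugged in.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    PUBLISHED = "published"
    HELD = "held"
    HIDDEN = "hidden"
    REPORTED = "reported"


@dataclass
class Comment:
    text: str
    status: Status = Status.PUBLISHED


@dataclass
class CommentQueue:
    # Hypothetical classifier: returns True when a comment should be held.
    is_unacceptable: Callable[[str], bool]
    # Assumed switch modelling the creator's ability to opt out.
    hold_enabled: bool = True
    comments: List[Comment] = field(default_factory=list)

    def submit(self, text: str) -> Comment:
        """Hold flagged comments for review; publish everything else."""
        held = self.hold_enabled and self.is_unacceptable(text)
        comment = Comment(text, Status.HELD if held else Status.PUBLISHED)
        self.comments.append(comment)
        return comment

    def review(self, comment: Comment, decision: str) -> None:
        """Apply the creator's decision: 'approve', 'hide', or 'report'."""
        if comment.status is not Status.HELD:
            raise ValueError("only held comments can be reviewed")
        outcomes = {
            "approve": Status.PUBLISHED,
            "hide": Status.HIDDEN,
            "report": Status.REPORTED,
        }
        comment.status = outcomes[decision]


# Toy classifier for the demo: hold anything containing the word "spam".
queue = CommentQueue(is_unacceptable=lambda text: "spam" in text.lower())
flagged = queue.submit("Buy cheap SPAM here")
normal = queue.submit("Thanks for the tutorial")
queue.review(flagged, "hide")
print(flagged.status, normal.status)
```

Setting hold_enabled to False publishes every comment immediately, which mirrors the opt-out described above.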


Sentiment Classification

Sentiment classification is the process of picking out opinions in a text and labeling them as positive, negative, or neutral, depending on the emotions expressed within them. Some NLP models are more emotionally intelligent than others, but sentiment classification for comment filtering generally relies on one of these types of algorithm:

Rule-Based Systems

Rule-based systems rely on a list of words (a lexicon) divided into two categories: positive and negative. Positive terms might include words like "good" or "insightful", while negative terms might include words like "bad" or "frustrated". This type of algorithm uses a series of hand-crafted rules to define a pattern for each tag, and it comes with a fair number of limitations. For example, it simply doesn't recognize words that aren't in the lexicon. It also treats words in isolation from their context, making it unlikely to detect things like polysemy, sarcasm, and irony.
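
As a minimal sketch of the idea, the snippet below counts lexicon hits and labels a comment by the balance. The tiny POSITIVE and NEGATIVE sets and the function name rule_based_sentiment are assumptions made for illustration; a real lexicon would be far larger and the rules more elaborate.

```python
import re

# Hypothetical toy lexicon; real systems use much larger curated lists.
POSITIVE = {"good", "insightful", "great", "helpful"}
NEGATIVE = {"bad", "frustrated", "awful", "boring"}


def rule_based_sentiment(comment: str) -> str:
    """Label a comment by counting lexicon hits, ignoring all context."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


print(rule_based_sentiment("Really good and insightful video"))  # Positive
print(rule_based_sentiment("Oh great, just what I needed"))      # Positive
print(rule_based_sentiment("This slaps"))                        # Neutral
```

The last two calls show the limitations in practice: the sarcastic comment is scored as positive because "great" is in the lexicon, and the slang comment falls back to neutral because none of its words are.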

Automated Systems (Based on Machine Learning)

During training, a machine learning model transforms text data into vectors (groups of numbers that encode information) and learns patterns that associate each vector with one of the predefined tags (Positive, Negative, Neutral). After training on large datasets, the model uses those patterns to classify unseen data. To improve accuracy, users can provide the algorithm with more tagged examples.
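
To make this concrete, here is a small sketch that uses a multinomial Naive Bayes classifier over word counts. Naive Bayes is swapped in purely as an easy-to-read example; the article does not say which model YouTube uses. The class name NaiveBayesSentiment, the tokenize helper, and the six training comments are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over bag-of-words counts (illustrative only)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # The "vector": how often each word occurs in the comment.
        vector = Counter(tokenize(text))
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # prior for this tag
            for word, count in vector.items():
                if word not in self.vocab:
                    continue  # words never seen in training are skipped
                # Add-one smoothing so a missing word does not zero the score.
                p = (self.word_counts[label][word] + 1) / (total_words + len(self.vocab))
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up tagged examples; a real system would train on a large dataset.
training = [
    ("I love this video, so helpful", "Positive"),
    ("great explanation, thank you", "Positive"),
    ("this is awful and hateful", "Negative"),
    ("terrible video, I hate it", "Negative"),
    ("uploaded on monday", "Neutral"),
    ("the video is ten minutes long", "Neutral"),
]
texts, labels = zip(*training)
model = NaiveBayesSentiment().fit(texts, labels)

print(model.predict("I love this, thank you"))   # Positive
print(model.predict("I hate this awful video"))  # Negative
```

Providing more tagged examples simply means extending the training list and calling fit again.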

Hybrid Systems

Hybrid systems combine rule-based and machine learning-based systems. First, the model learns to identify sentiment from a large number of tagged examples. Afterward, it compares its results with a lexicon to improve accuracy. The aim is to get the best of both approaches while offsetting the limitations of each.
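
One possible way to combine the two is sketched below. The combination rule (trust the model unless its confidence is low and the lexicon has a clear opinion) and the 0.7 threshold are assumptions chosen for illustration, since the article does not specify how the comparison is made. The function toy_model is a hard-coded stand-in for a trained classifier, not a real model.

```python
import re
from typing import Callable, Tuple

# Hypothetical toy lexicon, reusing the example words from the text.
POSITIVE = {"good", "insightful"}
NEGATIVE = {"bad", "frustrated"}


def lexicon_label(comment: str) -> str:
    """Rule-based label from lexicon hits."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"


def hybrid_sentiment(
    comment: str,
    model: Callable[[str], Tuple[str, float]],
    threshold: float = 0.7,  # assumed cut-off for trusting the model
) -> str:
    """Use the model's label unless it is unsure and the lexicon disagrees."""
    label, confidence = model(comment)
    lexicon = lexicon_label(comment)
    if confidence < threshold and lexicon != "Neutral":
        return lexicon
    return label


def toy_model(comment: str) -> Tuple[str, float]:
    """Stand-in for a trained classifier: returns (label, confidence)."""
    if "upload" in comment.lower():
        return ("Positive", 0.9)
    return ("Neutral", 0.4)


print(hybrid_sentiment("I am frustrated with this", toy_model))  # Negative
print(hybrid_sentiment("Nice upload", toy_model))                # Positive
```

In the first call the stand-in model is unsure, so the lexicon hit on "frustrated" decides the label; in the second the model is confident and its label is kept.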


YouTube combines NLP with machine learning to cater to the needs of creators by removing hateful and offensive comments that could spread toxicity. With the internet accessible to everyone, it is impossible to cater to every person’s needs, so some level of hateful comments is inevitable and must be managed. NLP is used to keep such comments out of view and create a safe space in the YouTube community. Contact us if you'd like to learn more about how NLP can help your organization.