Analyzing Large-Scale User Feedback with Azure Language Studio

TL;DR

  • Psychologists, particularly those working in applied or industry settings, should embrace big data and move beyond traditional tools to technologies like SQL and AI.
  • With tools like SQL and AI, analyzing user feedback from online platforms has become more feasible than ever before.

  

In psychological research, it's common to gather customer feedback on products through surveys or interviews. You might collect hundreds or even thousands of responses and analyze them with traditional psychometrics. These small-sample techniques have been well established in the field for years, and psychologists still rely on them. But what happens when you want to analyze hundreds of thousands of user comments? Traditional methods simply aren't built for that volume of data.

In real-world scenarios, product designers often seek to gain insights from user feedback to improve their products. By analyzing a large volume of user comments, designers can identify common issues, preferences, and suggestions. This valuable information can then be used to guide product enhancements, bug fixes, and the development of new features.

First and foremost, personal laptops and traditional software like SPSS are not designed to handle massive datasets. These tools are limited by their processing power and memory, making it impractical to analyze hundreds of thousands of user comments efficiently.

Let's take the SFU Opinion and Comments Corpus (SOCC) as an example. This corpus contains more than 300,000 user comments. Attempting to analyze this dataset using SPSS or jamovi would be a daunting and time-consuming task. These software packages are simply not optimized for processing such a large volume of data.

To handle a large dataset like the SOCC, we can utilize MySQL, a powerful relational database management system. MySQL is capable of efficiently storing and querying massive amounts of data, making it suitable for analyzing hundreds of thousands of user comments. To get started, you can download MySQL 8.0 and MySQL Workbench, which provides a user-friendly interface for managing your MySQL databases. Once installed, you can import the SOCC dataset into MySQL, creating a structured database that allows for quick and efficient querying and analysis.
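As a rough sketch, the import can also be scripted instead of done through Workbench. The snippet below is only a minimal example: it assumes the mysql-connector-python package, a local MySQL server, and a CSV export of the comments with id, article_id, and comment columns. The file name, credentials, and column names are placeholders you would adapt to the actual SOCC files.

import csv
import mysql.connector

# Connect to the local MySQL server (placeholder credentials)
conn = mysql.connector.connect(
    host="localhost", user="root", password="your_password", database="comments"
)
cursor = conn.cursor()

# Create the table used throughout this post
cursor.execute("""
    CREATE TABLE IF NOT EXISTS new_table (
        id INT PRIMARY KEY,
        article_id INT,
        comment TEXT
    )
""")

# Read the comments from a CSV export (placeholder file name and column names)
with open("socc_comments.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    rows = [(r["id"], r["article_id"], r["comment"]) for r in reader]

# Bulk-insert the rows and persist them
cursor.executemany(
    "INSERT INTO new_table (id, article_id, comment) VALUES (%s, %s, %s)", rows
)
conn.commit()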

I imported the data into a database called "comments", with the comments stored in a table named "new_table".

I used the command below to take a quick look at the dataset (MySQL Workbench returns only the first 1,000 rows by default):

SELECT * FROM comments.new_table;


 

Now comes the difficult part. As the volume of user comments grows, analyzing them becomes increasingly challenging. At a minimum, we would like to determine whether each comment is positive or negative, which tells us whether our customers are satisfied with the product. In the past, researchers might have relied on research assistants to manually code these comments (would we? I know some people still prefer the old-fashioned way, but come on!). That approach simply isn't feasible for large datasets: manually coding hundreds of thousands of comments would be incredibly time-consuming and resource-intensive. It's time to embrace more efficient, AI-based methods.

When faced with the challenge of analyzing a vast number of user comments, my initial thought was to leverage the power of LLMs like GPT-4, GPT-3.5, or open-source models like Mistral. These models have been trained on massive amounts of text and can be fine-tuned for specific tasks such as sentiment analysis, so they should be able to code the comments without much trouble.

My plan was to ask the model for a Python script that would establish a connection to the MySQL database, retrieve the comments, and pass them to the chosen LLM for sentiment analysis. One thing that still bugged me was crafting the prompt for the LLM: it's crucial to phrase it so that it clearly conveys the task at hand and gives the model enough context and guidance to accurately distinguish positive from negative sentiment.
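For illustration only, a classification prompt along these lines might do the job; the exact wording is my own guess, since I never ended up going down this route:

# Hypothetical prompt template for an LLM-based sentiment grader (not the approach used below)
PROMPT = (
    "You are a sentiment classifier. "
    "Label the following user comment as exactly one of: positive, negative, neutral, or mixed. "
    "Reply with the label only.\n\n"
    "Comment: {comment}"
)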

It turns out that Microsoft has already addressed this problem with a solution in Azure! Perhaps I'm too stupid to keep pace with the rapid advancements in AI, but it appears that many of the challenges I've been facing already have existing solutions. Azure even provides sample Python scripts!

Let's do this.

The tool is named "Analyze sentiment and opinions" in Azure Language Studio.

The core of the whole function takes only two lines of code! Much better than hand-crafting an LLM grader.

Simply import the client and use it:

from azure.ai.textanalytics import TextAnalyticsClient
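One step that is easy to gloss over: you need to construct the client first. Here is a minimal sketch, assuming you already have the endpoint and key from your Azure Language resource (both values below are placeholders):

from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key from your Azure Language resource
endpoint = "https://<your-resource-name>.cognitiveservices.azure.com/"
key = "<your-key>"

# Create the client used by the function below
text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))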

For example:


def analyze_sentiment(comment):
    # "comment" is a list containing a single comment string, since the service expects a list of documents
    result = text_analytics_client.analyze_sentiment(comment, show_opinion_mining=True)
    # Keep only documents that were analyzed without errors
    doc_result = [doc for doc in result if not doc.is_error]
    # Overall sentiment label: "positive", "negative", "neutral", or "mixed"
    return doc_result[0].sentiment

The function above returns a sentiment label for each comment: positive, negative, neutral, or mixed.
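A quick sanity check on a made-up comment (the expected label is my guess, not output I recorded):

# The argument is a one-element list, matching how the function is called below
print(analyze_sentiment(["The new update is fantastic and works smoothly!"]))  # should print something like "positive"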

And I further had the script write the results directly back to MySQL, into a new column (variable) called "result".

for row in records:
    sentiment = analyze_sentiment([row[1]])  # Assume row[1] is the comment text
    update_query = """UPDATE new_table SET result = %s WHERE id = %s"""
    cursor.execute(update_query, (sentiment, row[0]))  # row[0] is the comment id
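For completeness, here is a minimal sketch of the database plumbing around that loop, assuming mysql-connector-python, the placeholder credentials from earlier, the assumed "comment" column name, and that the result column does not exist yet:

import mysql.connector

# Connect to the database that holds the comments (placeholder credentials)
conn = mysql.connector.connect(
    host="localhost", user="root", password="your_password", database="comments"
)
cursor = conn.cursor()

# Add the new column for the sentiment labels (run once)
cursor.execute("ALTER TABLE new_table ADD COLUMN result VARCHAR(16)")

# Fetch the comments; LIMIT keeps the demo small, as in the post
cursor.execute("SELECT id, comment FROM new_table LIMIT 100")
records = cursor.fetchall()

# ... run the update loop shown above here ...

# Persist the updates and clean up
conn.commit()
cursor.close()
conn.close()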

Let's take a look at the results:



Amazing, isn't it? Now we can proceed with further analyses if we want: for example, which articles attract the most negative comments, or which factors are associated with negative sentiment when broken down by comment author, article id, or whatever other categories we have.
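As one example of such a follow-up, here is a quick aggregation by article. It assumes the article_id column from the earlier sketches and reuses the same cursor; this is a sketch, not output from my run:

# Share of negative comments per article (article_id is an assumed column name)
cursor.execute("""
    SELECT article_id,
           SUM(result = 'negative') AS negative_count,
           COUNT(*) AS total_comments,
           SUM(result = 'negative') / COUNT(*) AS negative_ratio
    FROM new_table
    WHERE result IS NOT NULL
    GROUP BY article_id
    ORDER BY negative_ratio DESC
""")
for article_id, negative_count, total_comments, negative_ratio in cursor.fetchall():
    print(article_id, negative_count, total_comments, round(float(negative_ratio), 2))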

Conclusion

For the purpose of this demonstration, I limited the analysis to the first 100 comments. Remarkably, the entire process took only about 10 seconds to complete, which is incredibly fast.

With such a powerful and user-friendly tool at our disposal, we can potentially bypass the need to fine-tune LLMs ourselves. However, if our goal is to create a free and open-source alternative, or to adapt this text classification functionality for other tasks like grading essays, we may still need to invest time in fine-tuning our own models. Azure also offers a customizable classification tool, but if I wanted to do it myself, I would prefer using an open-source model.

One exciting possibility is to integrate this sentiment analysis feature directly into the backend of the platform. By doing so, we can ensure that all user comments are automatically analyzed and graded in real-time as soon as they are submitted. This would provide us with instant insights into customer sentiment, enabling us to quickly identify and address any issues or concerns.

The availability of such advanced tools in Azure highlights the rapid progress being made in the field of AI and the increasing accessibility of powerful analytics capabilities. By leveraging these tools, we can greatly enhance our ability to process and derive meaningful insights from large volumes of text data, ultimately leading to better-informed decision-making and improved customer experiences.

In short, we should absolutely use this for analyzing customer feedback!

