Using LLM workflow to evaluate documents
With generative AI widely available, its productivity potential has attracted significant attention, especially among students and teachers. Given the prevalence of writing-based tasks in educational settings, the ability of large language models (LLMs) to automate aspects of writing processes has sparked a strong interest.
With the rapid development of workflow-based approaches in generative AI, simply "dumping everything" into LLMs and expecting optimal results is becoming increasingly unwise. The idea of workflow, compared to traditional zero-shot prompting approach (e.g., casual usage of Chatbot), offers many distinct advantages:
Improved Accuracy: Workflows can break down complex tasks into manageable steps, allowing LLMs to focus on specific aspects and produce more accurate results.
Flexibility in Model Selection: Each step of the workflow can be customised using different LLMs, such as free models (llama 3) or proprietary models (GPT4o), or a combination of both. This is largely used in industry because they have many models out there specialised in one type of tasks!
Enhanced Control: By defining a clear sequence of operations, workflows provide users with greater control over the AI's output, reducing the risk of unexpected or undesirable outcomes.
Increased Efficiency: Workflows can automate repetitive tasks and optimize the overall process, leading to significant time and resource savings.
Greater Flexibility: Workflows can be easily adapted and customized to suit specific needs and requirements, making them more versatile than one-size-fits-all zero-shot prompting.
Managed costs: Yes! We can choose free models, running them on our own laptops. Or, if we are rich, we can use the best model out there.
Indeed, the effective utilization of workflows necessitates a certain level of understanding and preparation. The process of breaking down tasks into discrete steps requires a structured mindset and a conceptual framework of the task at hand. Users need to visualize the overall objective, identify the key components, and determine the logical sequence of actions required to achieve the desired outcome. This ability to deconstruct and analyze complex tasks is essential for designing effective workflows and harnessing the full potential of generative AI.
Furthermore, the successful implementation of workflows often involves an iterative process of refinement and optimization. As users gain experience and feedback, they can identify areas for improvement and adjust the workflow accordingly.
For instance, consider a university student is tasked with writing an essay on intergroup relations between immigrants and local residents in their city. Simply instructing an AI chatbot to "write an essay about XXX" is unlikely to yield satisfactory results. In fact, many online complaints about AI's limitations in such tasks stem from this simplistic approach.
To utilize AI effectively, we should approach it as we would guide a fellow student. Imagine how you would approach this essay without AI. First, you'd thoroughly understand the assignment, then brainstorm and research, outline your ideas, write a draft, revise and edit, and finally, proofread. Similarly, we can "teach" the AI in a systematic manner, breaking down the task into manageable steps, providing clear instructions and feedback at each stage.
If we apply this to AI, it's not about one prompt, but a series of interactions. Below is just an example:
1. Define the topic:
"Please help me write an essay on [specific aspect of intergroup relations] between immigrants and local residents in Hong Kong. List some important aspects that need to discuss in the essay"
2. Brainstorm Ideas:
"List some key questions we should address in this essay"
If the LLM can access online content or database plugins:
"Find 10 relevant academic articles about [specific issue] in Hong Kong"
3. Outline:
"Based on the findings, create an essay outline" (You can then refine this by providing more relevant pieces)
4. Drafting:
"Write a paragraph on [specific topic from outline], incorporating [source you found]"
Repeat for each section, giving the AI guidance.
5. Revision & Editing:
"Review this paragraph for clarity and coherence, and provide some suggestions"
"Base on the suggestions, revise the paragraph"
By breaking it down, you're not just asking the AI to do the work, you're collaborating with it, guiding its output, and ultimately producing a higher-quality essay. Just like how humans.
Thanks to open-source communities, these steps can be seamlessly integrated into user-friendly applications, simplifying the workflow process. Theoretically, you can customize workflows with as many steps as needed to tailor them to your specific tasks.
As a beneficiary of these advancements, I'll demonstrate a compelling use case particularly relevant to educators: grading student assignments.
Now, let's design an assignment-grading workflow:
Workflow: Student Assignment Grading
1. Summarization & Relevance Check:
Input: Student's submitted assignment, original assignment prompt/rubric.
- Summarize the main points and arguments of the student's work.
- Compare the summary to the assignment requirements and topic.
- Assess if the essay fully addresses the prompt, partially addresses it, or is entirely off-topic.
Output:
- A concise summary of the student's work.
- A clear evaluation of topic relevance (on-topic, partially on-topic, off-topic).
2. Detailed Feedback:
Input: Student's essay, initial summary/relevance assessment.
- If on-topic, provide feedback on:
- Strength of arguments
- Quality of evidence and sources
- Clarity and organization of writing
- Grammar and mechanics
- If partially on-topic, identify areas of deviation and suggest improvements.
- If off-topic, provide constructive feedback on how to refocus the essay.
Output:
- Comprehensive feedback of the essay's strengths and weaknesses.
- Suggestions for improvement, if applicable.
3. Grading:
Input: assignment rubric, detailed feedback.
- Base on each criteria of the rubric provided, assign a numerical or letter grade based on the feedback.
Output:
- A suggested grade or grade range.
**Teacher Review & Finalization:**
(Human's work) Completed grading with personalized feedback for the student.
Now, let's implement the above workflow into Dify and execute it using a python script via API.
Dify is one of the best workflow-based Chatbot development tools, which is also open-source and feature-rich.
To run the entire process on my own computer with my preferred LLMs, I need to set up Dify using Docker Desktop. For detailed instructions, please refer to https://github.com/langgenius/dify
and https://www.docker.com/products/docker-desktop/
Once Dify is successfully deployed, I must select a model from a range of options. One possibility is to utilize open-source and free-of-charge models like Llama3 in conjunction with Ollama. However, if you're working on an older computer with limited RAM and CPU resources, it may be more practical to opt for a proprietary model from OpenAI, Google or Microsoft. In this demonstration, I will use Google Gemini as an example.
Refer to the screenshot below, which outlines my 4-step workflow for grading student essays using LLMs. In Step 1, I instruct the LLM to summarize the student's essay. Next, in Step 2, I ask the LLM to evaluate the essay's topic relevance based on the summary and topic. In Step 3, the LLM generates detailed feedback for the assignment using grading rubrics. Finally, in Step 4, it produces a numerical grade for the entire assignment.
You will notice that, in such case, we need 3 input variables: input_text (essay), topic, and evaluation rubric.
After setting up the workflow, we can simply click "Run" to conduct a test or "Publish" to access a formal user interface for running it. To fully automate the process, I will also utilize a Python script to iterate through all assignments in a folder and generate reports for each one. In this scenario, teachers will only need to click once and wait for a comprehensive report covering an entire class!
Once Dify is fully configured, we can request an API to execute our custom-designed workflow. This will enable us to utilize Python to batch-run this process for other tasks.
Just click "API Access" on the left-side panel in the workflow interface, we can easily copy the API key.
import requests
import json
url = "http://localhost/v1/workflows/run"
headers = {
'Authorization': 'Bearer API',
'Content-Type': 'application/json',
}
data = {
"inputs": {"input_text": text, "topic":topic, "rubric": rubric},
"response_mode": "blocking",
"user": "USERNAME"
}
response = requests.post(url, headers=headers, data=json.dumps(data))
json_data = response.json()
print(json.dumps(json_data, indent=2))
def import_txt(file_name):
with open(file_name) as f:
return [line.strip() for line in f]
text1 = import_txt('essay1.txt')
text = ''.join(text1)
rubrics1 = import_txt('rubrics.txt')
rubric = ''.join(rubrics1)
topic1 = import_txt('topic.txt')
topic = ''.join(topic1)
Finally, we can save the evaluation in a separate txt file (the name of the output in json object is defined in our Dify workflow):
finaltext = json_data['data']['outputs']['output']
with open("evaluation.gemini.essay1.txt", "w") as file:
file.write(finaltext)
With our custom-designed workflow now set up, we can simply run this script to obtain instant (almost) feedback on the essay. To take it a step further, we can rename the essay files with student ID numbers and automate the entire process to evaluate multiple files in a folder. For example:
def evaluate_files_in_folder(folder_path):
evaluations = []
for file_name in os.listdir(folder_path):
if file_name.endswith(".docx"):
file_path = os.path.join(folder_path, file_name)
content = read_docx(file_path)
data = {"inputs": {"input_text": content, "topic":topic, "rubric": rubric}, #define your workflow variables
"response_mode": "blocking",
"user": "USERNAME" #define your username
}
response = requests.post(url, headers=headers, data=json.dumps(data))
evaluation1 = response.json()
evaluation = evaluation1['data']['outputs']['output']
evaluations.append((file_name, evaluation))
print(f"Evaluated:{file_name}")
return evaluations
Dify also gives a summary of the tasks, including how many tokens are used.
For what is worth, it only takes 21 seconds to grade one assignment! If you have limited time but enough money, go for GPT4o.
Let's take a look at the detailed feedback generated by Gemini using a fake essay:
<h1> Grade </h1>
Here's a breakdown of the grades based on the rubric and the feedback provided:
**1. Structure and Organization:**
* **Grade: 5**
* The essay has a clear structure with an introduction, body paragraphs, and a conclusion.
* The flow is generally good, but the lack of focus on immigrant minorities weakens the overall coherence.
* The essay covers some key issues related to multiculturalism but doesn't delve deeply into the specific relationship between immigrant minorities and local residents.
**2. Clarity and Soundness of Argument:**
* **Grade: 6**
* The argument is clear and logical, demonstrating a good understanding of the concept of multiculturalism.
* The essay effectively presents both the potential benefits and challenges of multiculturalism.
* However, the lack of focus on the specific topic weakens the overall argument.
**3. Collection and Analysis of Literature and Information:**
* **Grade: 7**
* The student uses relevant and credible research to support their claims, citing several academic journals and studies on multiculturalism, prejudice reduction, and cultural competence.
* References are mostly correct and in APA-7 style.
**4. Quality of Writing:**
* **Grade: 8**
* The essay demonstrates superb use of language with appropriate word choice.
* There are no grammar or spelling mistakes.
**Overall Numerical Grade:**
Based on the individual scores, a fair overall numerical grade would be **6.5**. This reflects the strengths in research, writing quality, and overall argumentation, but also the weaknesses in focusing on the specific topic and exploring key psychological dynamics related to the relationship between immigrant minorities and local residents.
<h1> Topic Assessment </h1>
The student essay summary **partially addresses** the essay topic. While it touches on the broader concept of multiculturalism and its impact on social psychology, it doesn't fully delve into the specific relationship between immigrant minorities and local residents.
Here's a breakdown:
**Strengths:**
* **Relevant to the topic:** The summary mentions the challenges of integrating multiple cultures, including cultural misunderstandings, language barriers, and conflicting values. These are relevant to the relationship between immigrant minorities and local residents.
* **Uses research:** The summary cites relevant research on multiculturalism, prejudice reduction, and cultural competence.
* **Highlights potential issues:** It mentions the negative psychological outcomes faced by minorities in monocultural societies, which is a crucial aspect of the topic.
**Weaknesses:**
* **Lack of focus on immigrant minorities:** The summary doesn't specifically address the dynamics between immigrant minorities and local residents. It focuses on multiculturalism in general, which is a broader concept.
* **Missing key aspects:** The summary doesn't explore the specific psychological factors influencing the relationship between these groups, such as intergroup relations, prejudice, discrimination, or acculturation.
* **Limited scope:** The summary mainly focuses on the benefits and challenges of multiculturalism, without diving into the complexities of the relationship between immigrant minorities and local residents.
**Recommendations:**
To fully address the essay topic, the student should:
* **Focus on the specific relationship:** The essay should explicitly address the interactions between immigrant minorities and local residents, exploring the psychological dynamics at play.
* **Analyze research on intergroup relations:** The essay should delve into research on intergroup relations, prejudice, discrimination, and acculturation, focusing on how these concepts relate to the immigrant minority-local resident relationship.
* **Provide specific examples:** The essay should include examples of real-world situations and research findings that illustrate the complex relationship between these groups.
By incorporating these recommendations, the student can significantly improve the essay's focus and depth, ensuring it fully addresses the topic of multiculturalism and social psychology in the context of immigrant minorities and local residents.
<h1> Detailed Feedback </h1>
## Detailed Feedback on Student Essay
The student's essay summary provides a good overview of the concept of multiculturalism and its impact on social psychology. However, it falls short of fully addressing the specific relationship between immigrant minorities and local residents, which is the core focus of the assigned topic.
Here's a more detailed breakdown of the feedback, addressing the specific criteria:
**Strengths:**
* **Strong Argument:** The essay effectively presents the multifaceted nature of multiculturalism, highlighting both its potential benefits and challenges.
* **Quality of Evidence:** The student uses relevant and credible research to support their claims, citing several academic journals and studies on multiculturalism, prejudice reduction, and cultural competence.
* **Clarity and Organization:** The essay is well-organized and easy to follow, with a clear introduction, body paragraphs that develop the main points, and a concise conclusion.
* **Grammar and Mechanics:** The writing is grammatically correct and free of major errors.
**Weaknesses:**
* **Lack of Focus on Immigrant Minorities:** While the essay touches on the challenges of integrating multiple cultures, it doesn't delve deep into the specific dynamics between immigrant minorities and local residents. It primarily focuses on the broader concept of multiculturalism.
* **Missing Key Aspects:** The essay doesn't explore crucial psychological factors influencing the relationship between immigrant minorities and local residents, such as:
* **Intergroup Relations:** The essay should discuss theories and research related to how different groups interact and perceive each other.
* **Prejudice and Discrimination:** The essay should explore the role of prejudice and discrimination in shaping the relationship between these groups.
* **Acculturation:** The essay should examine the process of adapting to a new culture and how this impacts the relationship between immigrant minorities and local residents.
* **Limited Scope:** The essay primarily focuses on the benefits and challenges of multiculturalism, without providing specific examples or analyses of the complexities of the relationship between immigrant minorities and local residents.
**Recommendations for Improvement:**
To elevate the essay's quality and fully address the assigned topic, the student should:
1. **Refocus the Essay:** Shift the focus from multiculturalism in general to the specific relationship between immigrant minorities and local residents.
2. **Incorporate Relevant Research:** Explore research on intergroup relations, prejudice, discrimination, and acculturation, focusing on their impact on the immigrant minority-local resident dynamic.
3. **Provide Concrete Examples:** Use real-world examples from research studies or current events to illustrate the complexities of this relationship.
4. **Analyze the Psychological Dynamics:** Examine the psychological factors that contribute to positive or negative interactions between these groups, such as:
* **Stereotyping and Bias:** How do stereotypes and biases influence perceptions and interactions between immigrant minorities and local residents?
* **Social Identity Theory:** How does social identity theory explain the formation of in-groups and out-groups and their impact on intergroup relations?
* **Intergroup Conflict:** What are the factors that contribute to intergroup conflict, and how can it be mitigated?
5. **Consider Different Perspectives:** Explore the perspectives of both immigrant minorities and local residents, acknowledging their unique experiences and challenges.
6. **Evaluate Potential Solutions:** Discuss potential solutions for fostering positive intergroup relations, such as promoting cultural understanding, reducing bias, and creating inclusive environments.
By making these adjustments, the student can transform their essay into a compelling and insightful exploration of the complex relationship between immigrant minorities and local residents within the framework of social psychology.
The feedback generated by this process is truly impressive. Most of the suggestions are informative and actionable, providing students with valuable comments to improve their work before submission. I am confident that if students can incorporate these suggestions into their essays, they will produce high-quality work. Of course, for teachers, this will be a huge workload relief, right?
In conclusion, this workflow has demonstrated its impressive capabilities when we break down tasks in a way that mimics human thinking. By doing so, we can automate complex processes and achieve remarkable results.



Comments
Post a Comment