# Prompt Engineering

A central part of this solution is to showcase how to use an LLM to perform advanced text analysis.

Prompt design strongly affects the quality and reliability of the outputs.
The prompts below should be treated as examples of how to handle this use case. Validate them against the specific LLM you use and your own data before scaling up.


## Topic Modeling

[compute_llm_topics_raw](recipe:compute_llm_topics_raw)
Prompt:

> You are a topic-labeling assistant who concisely identifies relevant issues or thematics from reviews.
> 
> We want to identify what customers like and dislike about the products. The topics should be broad enough to be relevant to most reviews while identifying specific issues.
> 
> Topics should be limited to a single idea; avoid combining two words with "and" in a topic.
> 
> 
> Please provide five topics in the form of a list as this example: 
> ["Product Pricing", "Delivery Speed", "Build Quality", "Flavors", "Originality"]
> 
> Here is the list of the reviews that need to be used:
> {{reviews}}


For this application, we aggregate a sample of reviews to identify the relevant topics. This prompt is relatively short, so it's essential to validate that the output has the expected shape: a list of topics. It's also necessary to review the identified topics and remove duplicated or unnecessary ones.
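The shape check and de-duplication described above could be sketched as follows. This is a minimal illustration, not the solution's actual code: the function name `parse_topics` and the `expected_count` parameter are hypothetical, and it assumes the model replies with a JSON-compatible list (double-quoted strings), as in the prompt's example.

```python
import json


def parse_topics(raw_output, expected_count=5):
    """Parse the LLM reply into a clean list of topic names, or raise ValueError.

    Hypothetical helper: assumes the reply is a JSON list of strings.
    """
    try:
        topics = json.loads(raw_output.strip())
    except json.JSONDecodeError as exc:
        raise ValueError(f"Output is not a valid JSON list: {exc}") from exc

    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("Output must be a list of strings")

    cleaned, seen = [], set()
    for topic in topics:
        name = " ".join(topic.split())  # normalize whitespace
        key = name.casefold()           # case-insensitive duplicate detection
        if not name or key in seen:
            continue
        seen.add(key)
        cleaned.append(name)

    if len(cleaned) != expected_count:
        raise ValueError(
            f"Expected {expected_count} distinct topics, got {len(cleaned)}"
        )
    return cleaned


def combined_topics(topics):
    """Return topics that join two ideas with 'and', which the prompt asks to avoid."""
    return [t for t in topics if " and " in t.casefold()]
```

Exact-match de-duplication only catches trivial repeats. Near-duplicates such as "Price" and "Product Pricing" still need a manual review.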

## Topic Analysis

[compute_reviews_llm_analyzed_raw](recipe:compute_reviews_llm_analyzed_raw)
Prompt:

> You are a review analyzer, trying to define the relevant information from user reviews regarding food products.
> 
> Here is the review: {{input}}
> 
> Please summarize the average opinion of those reviews for a given product and detail the critical points according to those specific topics: {{topics}}
> 
> Try to identify the positive aspects as well as the negative ones.
> 
> The output needs to be a list of dictionaries, with each one having this data schema:
> {
> "topic_name": a few words to present the topic,
>  "rating": a numerical rating on a scale of 5 regarding the topic (1 meaning the users are very unhappy about the subject, 5 suggesting that the users loved this aspect),
>  "summary": a sentence presenting an overall opinion of the customers for this topic
>  "keywords": list of cited words from the review related to the topic
> }
> 
> 
> Example of output:
> [
>{"topic_name": "Taste and Flavor",
>"rating": 4,
>"summary": "Customers generally love the taste, finding the cookies delicious, soft, and reminiscent of homemade cookies with a good balance of sweetness and spices.",
>"keywords": ["Delicious"]
> 
>},
>{"topic_name": "Texture and Freshness",
>"rating": 4,
>"summary": "The soft and chewy texture is highly appreciated, with many customers noting the cookies maintain their freshness, especially due to individual packaging.",
>"keywords": ["Fresh", "keeps chewiness with the wrapper"]
>},
>{"topic_name": "Healthiness and Nutrition",
>"rating": 3,
>"summary": "Customers are pleased with the inclusion of whole grains and fiber, but some express concerns over calorie content and the presence of additives and sugars.",
>"keywords": ["Very caloric", "too much added sugar"]
>},
>{"topic_name": "Packaging",
>"rating": 5,
>"summary": "The individual wrapping of cookies is praised for its convenience and ability to preserve freshness, making them easy to include in lunches or for on-the-go snacking.",
>"keywords": ["good wrappers"]
> 
>},
>{"topic_name": "Value for Money",
>"rating": 4,
>"summary": "Most customers find the cookies reasonably priced and a good value for the quality and taste they offer.",
>"keywords": ["acceptable price"]
>}
> ]
> 
> if the topic is not mentioned or is inconclusive related to the review, put N/A in the rating instead of a number
> 
> Only reply with the list of dictionaries and no other sentences.


With this prompt, we ask the LLM to analyze a review against the provided topics. The prompt is more complex because the task has to be described precisely.

We explain the expected data schema of the output with a description and provide a valid example.

The output's structure is more complex, which makes it more prone to instability, especially when switching to a different LLM. Providing a second example and explicitly forbidding certain types of output might therefore be necessary.
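Because of this instability, it can help to check each reply against the data schema before using it downstream. The sketch below is one possible check, not the solution's actual code: `parse_review_analysis` and `REQUIRED_KEYS` are hypothetical names, and it assumes the model returns valid JSON with the `N/A` rating written as a quoted string. An unquoted `N/A` would fail JSON parsing and surface as an error.

```python
import json

# Keys taken from the data schema described in the prompt.
REQUIRED_KEYS = {"topic_name", "rating", "summary", "keywords"}


def parse_review_analysis(raw_output):
    """Parse the LLM reply into a list of validated topic dictionaries.

    Hypothetical helper: ratings of "N/A" are converted to None.
    """
    try:
        items = json.loads(raw_output.strip())
    except json.JSONDecodeError as exc:
        raise ValueError(f"Output is not valid JSON: {exc}") from exc

    if not isinstance(items, list):
        raise ValueError("Output must be a list of dictionaries")

    validated = []
    for item in items:
        if not isinstance(item, dict) or not REQUIRED_KEYS.issubset(item):
            raise ValueError(f"Malformed entry: {item!r}")

        rating = item["rating"]
        if isinstance(rating, str) and rating.strip().upper() == "N/A":
            rating = None  # topic not mentioned or inconclusive
        elif (
            isinstance(rating, bool)
            or not isinstance(rating, (int, float))
            or not 1 <= rating <= 5
        ):
            raise ValueError(f"Invalid rating: {item['rating']!r}")

        if not isinstance(item["keywords"], list):
            raise ValueError(f"Keywords must be a list: {item['keywords']!r}")

        validated.append({**item, "rating": rating})
    return validated
```

Replies that raise `ValueError` can be logged and retried or set aside for review, which also gives a simple measure of how stable a prompt is when the model changes.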


We recommend using Prompt Studios to refine the prompts whenever the context changes (e.g., new dataset, new model).