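Hello, everyone! I'm Xiaozhi, your AI guide, and welcome to today's AI study session. Today we'll dig into the world of AI together, work through "LLM Introductory Course for Developers: Testing and Evaluation (English Version)", and cover every point in this article. As always: "No need to travel into the unknown; just awaken your potential!" Follow along step by step, and we'll learn, put what we learn to use, and discover more of what we're capable of. Without further ado, let's get started.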
LLM Introductory Course for Developers: Testing and Evaluation (English Version)
1. Find the product and category names
```python
import utils_en

# utils_en is the course's helper module; get_products_and_category() returns
# a dict mapping each category name to the list of its product names
products_and_category = utils_en.get_products_and_category()
products_and_category
```

```
{'Computers and Laptops': ['TechPro Ultrabook',
  'BlueWave Gaming Laptop',
  'PowerLite Convertible',
  'TechPro Desktop',
  'BlueWave Chromebook'],
 'Smartphones and Accessories': ['SmartX ProPhone',
  'MobiTech PowerCase',
  'SmartX MiniPhone',
  'MobiTech Wireless Charger',
  'SmartX EarBuds'],
 'Televisions and Home Theater Systems': ['CineView 4K TV',
  'SoundMax Home Theater',
  'CineView 8K TV',
  'SoundMax Soundbar',
  'CineView OLED TV'],
 'Gaming Consoles and Accessories': ['GameSphere X',
  'ProGamer Controller',
  'GameSphere Y',
  'ProGamer Racing Wheel',
  'GameSphere VR Headset'],
 'Audio Equipment': ['AudioPhonic Noise-Canceling Headphones',
  'WaveSound Bluetooth Speaker',
  'AudioPhonic True Wireless Earbuds',
  'WaveSound Soundbar',
  'AudioPhonic Turntable'],
 'Cameras and Camcorders': ['FotoSnap DSLR Camera',
  'ActionCam 4K',
  'FotoSnap Mirrorless Camera',
  'ZoomMaster Camcorder',
  'FotoSnap Instant Camera']}
```
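The catalog is a plain dict mapping each category to a list of product names. When checking model output later, it can be handy to invert it into a product-to-category lookup. A minimal sketch, assuming the dict has exactly the shape printed above; `build_product_index` is a hypothetical helper name, and only a two-category excerpt of the catalog is used so the example runs on its own:

```python
from typing import Dict, List

def build_product_index(products_and_category: Dict[str, List[str]]) -> Dict[str, str]:
    """Hypothetical helper: invert {category: [products]} into {product: category}."""
    index: Dict[str, str] = {}
    for category, products in products_and_category.items():
        for product in products:
            # Each product name is expected to appear under exactly one category
            if product in index:
                raise ValueError(f"{product!r} appears under more than one category")
            index[product] = category
    return index

# A two-category excerpt of the catalog printed above
catalog_excerpt = {
    'Gaming Consoles and Accessories': ['GameSphere X', 'ProGamer Controller', 'GameSphere Y',
                                        'ProGamer Racing Wheel', 'GameSphere VR Headset'],
    'Cameras and Camcorders': ['FotoSnap DSLR Camera', 'ActionCam 4K', 'FotoSnap Mirrorless Camera',
                               'ZoomMaster Camcorder', 'FotoSnap Instant Camera'],
}

product_index = build_product_index(catalog_excerpt)
print(product_index['ActionCam 4K'])  # Cameras and Camcorders
print(len(product_index))             # 10
```

With the full catalog, the same lookup tells you whether a product named in a reply sits in the category the reply put it under.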
```python
def find_category_and_product_v1(user_input, products_and_category):
    """
    Extract the products and categories mentioned in the user input.

    Parameters:
    user_input: the user's query
    products_and_category: dict mapping each product category to its products
    """
    # Delimiter
    delimiter = "####"

    # System message describing the task the model should complete
    system_message = f"""
    You will be provided with customer service queries.
    The customer service query will be delimited with {delimiter} characters.
    Output a Python list of json objects, where each object has the following format:
        'category': <one of Computers and Laptops, Smartphones and Accessories, Televisions and Home Theater Systems, Gaming Consoles and Accessories, Audio Equipment, Cameras and Camcorders>,
    AND
        'products': <a list of products that must be found in the allowed products below>

    Where the categories and products must be found in the customer service query.
    If a product is mentioned, it must be associated with the correct category in the allowed products list below.
    If no products or categories are found, output an empty list.

    List out all products that are relevant to the customer service query based on how closely it relates
    to the product name and product category.
    Do not assume, from the name of the product, any features or attributes such as relative quality or price.

    The allowed products are provided in JSON format.
    The keys of each item represent the category.
    The values of each item is a list of products that are within that category.
    Allowed products: {products_and_category}
    """

    # Few-shot example
    few_shot_user_1 = """I want the most expensive computer."""
    few_shot_assistant_1 = """
    [{'category': 'Computers and Laptops',
    'products': ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook']}]
    """

    messages = [
        {'role': 'system', 'content': system_message},
        {'role': 'user', 'content': f"{delimiter}{few_shot_user_1}{delimiter}"},
        {'role': 'assistant', 'content': few_shot_assistant_1},
        {'role': 'user', 'content': f"{delimiter}{user_input}{delimiter}"},
    ]
    # get_completion_from_messages is assumed to be the course's helper that sends
    # `messages` to the chat model and returns the reply text (not defined in this section)
    return get_completion_from_messages(messages)
```
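One detail worth knowing: `{products_and_category}` inside the f-string inserts the dict's Python `repr`, which uses single quotes, even though the prompt tells the model the allowed products "are provided in JSON format". If strict JSON is wanted in the prompt, `json.dumps` is one option. A small sketch comparing the two on a one-category excerpt; whether the switch changes the model's answers is untested here:

```python
import json

# One-category excerpt of the catalog
excerpt = {'Audio Equipment': ['WaveSound Soundbar', 'AudioPhonic Turntable']}

# What an f-string such as f"Allowed products: {excerpt}" inserts: the Python repr
as_repr = f"{excerpt}"
# Strict JSON with double quotes
as_json = json.dumps(excerpt)

print(as_repr)  # {'Audio Equipment': ['WaveSound Soundbar', 'AudioPhonic Turntable']}
print(as_json)  # {"Audio Equipment": ["WaveSound Soundbar", "AudioPhonic Turntable"]}

# Only the JSON version round-trips through json.loads
assert json.loads(as_json) == excerpt
```

Inside the prompt this would read `Allowed products: {json.dumps(products_and_category)}`.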
2. Evaluate on a few queries
```python
# First evaluation query
customer_msg_0 = """Which TV can I buy if I'm on a budget?"""

products_by_category_0 = find_category_and_product_v1(customer_msg_0,
                                                      products_and_category)
print(products_by_category_0)
```

```
[{'category': 'Televisions and Home Theater Systems', 'products': ['CineView 4K TV', 'SoundMax Home Theater', 'CineView 8K TV', 'SoundMax Soundbar', 'CineView OLED TV']}]
```
```python
# Second evaluation query
customer_msg_1 = """I need a charger for my smartphone"""

products_by_category_1 = find_category_and_product_v1(customer_msg_1,
                                                      products_and_category)
print(products_by_category_1)
```

```
[{'category': 'Smartphones and Accessories', 'products': ['MobiTech PowerCase', 'MobiTech Wireless Charger', 'SmartX EarBuds']}]
```
```python
# Third evaluation query
customer_msg_2 = """
What computers do you have?"""

products_by_category_2 = find_category_and_product_v1(customer_msg_2,
                                                      products_and_category)
products_by_category_2
```

```
"\n [{'category': 'Computers and Laptops', 'products': ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook']}]"
```
```python
# Fourth query, more complex
customer_msg_3 = """
tell me about the smartx pro phone and the fotosnap camera, the dslr one.
Also, what TVs do you have?"""

products_by_category_3 = find_category_and_product_v1(customer_msg_3,
                                                      products_and_category)
print(products_by_category_3)
```

```
[{'category': 'Smartphones and Accessories', 'products': ['SmartX ProPhone']}, {'category': 'Cameras and Camcorders', 'products': ['FotoSnap DSLR Camera']}, {'category': 'Televisions and Home Theater Systems', 'products': ['CineView 4K TV', 'SoundMax Home Theater', 'CineView 8K TV', 'SoundMax Soundbar', 'CineView OLED TV']}]
```
3. A harder test case
```python
customer_msg_4 = """
tell me about the CineView TV, the 8K one, Gamesphere console, the X one.
I'm on a budget, what computers do you have?"""

products_by_category_4 = find_category_and_product_v1(customer_msg_4,
                                                      products_and_category)
print(products_by_category_4)
```

```
[{'category': 'Televisions and Home Theater Systems', 'products': ['CineView 8K TV']}, {'category': 'Gaming Consoles and Accessories', 'products': ['GameSphere X']}, {'category': 'Computers and Laptops', 'products': ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook']}]
```
4. Modify the instructions
```python
def find_category_and_product_v2(user_input, products_and_category):
    """
    Extract the products and categories mentioned in the user input.

    Changes from v1:
    - Added: do not output any extra text that is not in JSON format.
    - Added a second few-shot example, in which the user asks for the cheapest computer.
    - In both few-shot examples, the response shown is simply the full product list in JSON format.

    Parameters:
    user_input: the user's query
    products_and_category: dict mapping each product category to its products
    """
    delimiter = "####"
    system_message = f"""
    You will be provided with customer service queries.
    The customer service query will be delimited with {delimiter} characters.
    Output a Python list of JSON objects, where each object has the following format:
        'category': <one of Computers and Laptops, Smartphones and Accessories, Televisions and Home Theater Systems, Gaming Consoles and Accessories, Audio Equipment, Cameras and Camcorders>,
    AND
        'products': <a list of products that must be found in the allowed products below>
    Do not output any additional text that is not in JSON format.
    Do not write any explanatory text after outputting the requested JSON.

    Where the categories and products must be found in the customer service query.
    If a product is mentioned, it must be associated with the correct category in the allowed products list below.
    If no products or categories are found, output an empty list.

    List out all products that are relevant to the customer service query based on how closely it relates
    to the product name and product category.
    Do not assume, from the name of the product, any features or attributes such as relative quality or price.

    The allowed products are provided in JSON format.
    The keys of each item represent the category.
    The values of each item is a list of products that are within that category.
    Allowed products: {products_and_category}
    """

    few_shot_user_1 = """I want the most expensive computer. What do you recommend?"""
    few_shot_assistant_1 = """
    [{'category': 'Computers and Laptops',
    'products': ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook']}]
    """

    few_shot_user_2 = """I want the cheapest computer. What do you recommend?"""
    few_shot_assistant_2 = """
    [{'category': 'Computers and Laptops',
    'products': ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible', 'TechPro Desktop', 'BlueWave Chromebook']}]
    """

    messages = [
        {'role': 'system', 'content': system_message},
        {'role': 'user', 'content': f"{delimiter}{few_shot_user_1}{delimiter}"},
        {'role': 'assistant', 'content': few_shot_assistant_1},
        {'role': 'user', 'content': f"{delimiter}{few_shot_user_2}{delimiter}"},
        {'role': 'assistant', 'content': few_shot_assistant_2},
        {'role': 'user', 'content': f"{delimiter}{user_input}{delimiter}"},
    ]
    # Assumed course helper: sends `messages` to the chat model and returns the reply text
    return get_completion_from_messages(messages)
```
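Version 2 adds "Do not output any additional text that is not in JSON format." One way to check whether a reply actually complies is to test that, once surrounding whitespace is stripped, the whole reply parses as a single list literal. A minimal sketch using the standard library's `ast.literal_eval`, chosen because the replies use Python-style single quotes; `is_bare_list_reply` is a hypothetical name and the sample replies are made up:

```python
import ast

def is_bare_list_reply(reply: str) -> bool:
    """Hypothetical check: True if the reply is nothing but a list literal (plus whitespace)."""
    try:
        parsed = ast.literal_eval(reply.strip())
    except (ValueError, SyntaxError, TypeError):
        # Extra prose, or anything else that is not a valid Python literal
        return False
    return isinstance(parsed, list)

clean = "\n [{'category': 'Audio Equipment', 'products': ['WaveSound Soundbar']}]"
chatty = clean + "\nLet me know if you need anything else!"

print(is_bare_list_reply(clean))   # True
print(is_bare_list_reply(chatty))  # False
print(is_bare_list_reply("[]"))    # True
```

Counting how many replies fail this check across a test set gives a rough measure of how well the new instruction is followed.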
5. Further evaluation
```python
customer_msg_3 = """
tell me about the smartx pro phone and the fotosnap camera, the dslr one.
Also, what TVs do you have?"""

products_by_category_3 = find_category_and_product_v2(customer_msg_3,
                                                      products_and_category)
print(products_by_category_3)
```

```
[{'category': 'Smartphones and Accessories', 'products': ['SmartX ProPhone']}, {'category': 'Cameras and Camcorders', 'products': ['FotoSnap DSLR Camera']}, {'category': 'Televisions and Home Theater Systems', 'products': ['CineView 4K TV', 'SoundMax Home Theater', 'CineView 8K TV', 'SoundMax Soundbar', 'CineView OLED TV']}]
```
6. Regression testing
```python
customer_msg_0 = """Which TV can I buy if I'm on a budget?"""

products_by_category_0 = find_category_and_product_v2(customer_msg_0,
                                                      products_and_category)
print(products_by_category_0)
```

```
[{'category': 'Televisions and Home Theater Systems', 'products': ['CineView 4K TV', 'SoundMax Home Theater', 'CineView 8K TV', 'SoundMax Soundbar', 'CineView OLED TV']}]
```
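The regression check above reruns an earlier query by hand to confirm that version 2 did not break it. The same idea can be automated: run the old and the new function over a saved list of queries and report where their outputs differ. A sketch assuming both functions take a query string and return comparable values; `diff_versions`, `old_stub` and `new_stub` are hypothetical, and the stubs stand in for `find_category_and_product_v1`/`v2` so the example runs without an API call:

```python
from typing import Callable, List, Tuple

def diff_versions(old_fn: Callable[[str], object],
                  new_fn: Callable[[str], object],
                  queries: List[str]) -> List[Tuple[str, object, object]]:
    """Run both versions on every query; return (query, old, new) wherever they disagree."""
    changed = []
    for query in queries:
        old_out, new_out = old_fn(query), new_fn(query)
        if old_out != new_out:
            changed.append((query, old_out, new_out))
    return changed

# Hypothetical stand-ins; real use would wrap the LLM calls and parse their replies first
def old_stub(query: str) -> object:
    return ['Televisions and Home Theater Systems'] if 'TV' in query else []

def new_stub(query: str) -> object:
    if 'charger' in query:
        return ['Smartphones and Accessories']
    return old_stub(query)

queries = ["Which TV can I buy if I'm on a budget?",
           "I need a charger for my smartphone"]
for query, old_out, new_out in diff_versions(old_stub, new_stub, queries):
    print(f"{query!r}: {old_out} -> {new_out}")
# 'I need a charger for my smartphone': [] -> ['Smartphones and Accessories']
```

Because LLM replies can vary in wording from run to run, comparing parsed structures (for example sets of products per category) is likely to be less noisy than comparing raw reply strings.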
7. Automated testing
```python
msg_ideal_pairs_set = [

    # eg 0
    {'customer_msg': """Which TV can I buy if I'm on a budget?""",
     'ideal_answer': {
         'Televisions and Home Theater Systems': set(
             ['CineView 4K TV', 'SoundMax Home Theater', 'CineView 8K TV',
              'SoundMax Soundbar', 'CineView OLED TV']
         )}
     },

    # eg 1
    {'customer_msg': """I need a charger for my smartphone""",
     'ideal_answer': {
         'Smartphones and Accessories': set(
             ['MobiTech PowerCase', 'MobiTech Wireless Charger', 'SmartX EarBuds']
         )}
     },

    # eg 2
    {'customer_msg': """What computers do you have?""",
     'ideal_answer': {
         'Computers and Laptops': set(
             ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible',
              'TechPro Desktop', 'BlueWave Chromebook'])
     }
     },

    # eg 3
    {'customer_msg': """tell me about the smartx pro phone and the fotosnap camera, the dslr one.
Also, what TVs do you have?""",
     'ideal_answer': {
         'Smartphones and Accessories': set(
             ['SmartX ProPhone']),
         'Cameras and Camcorders': set(
             ['FotoSnap DSLR Camera']),
         'Televisions and Home Theater Systems': set(
             ['CineView 4K TV', 'SoundMax Home Theater', 'CineView 8K TV',
              'SoundMax Soundbar', 'CineView OLED TV'])
     }
     },

    # eg 4
    {'customer_msg': """tell me about the CineView TV, the 8K one, Gamesphere console, the X one.
I'm on a budget, what computers do you have?""",
     'ideal_answer': {
         'Televisions and Home Theater Systems': set(
             ['CineView 8K TV']),
         'Gaming Consoles and Accessories': set(
             ['GameSphere X']),
         'Computers and Laptops': set(
             ['TechPro Ultrabook', 'BlueWave Gaming Laptop', 'PowerLite Convertible',
              'TechPro Desktop', 'BlueWave Chromebook'])
     }
     },

    # eg 5
    {'customer_msg': """What smartphones do you have?""",
     'ideal_answer': {
         'Smartphones and Accessories': set(
             ['SmartX ProPhone', 'MobiTech PowerCase', 'SmartX MiniPhone',
              'MobiTech Wireless Charger', 'SmartX EarBuds'])
     }
     },

    # eg 6
    {'customer_msg': """I'm on a budget. Can you recommend some smartphones to me?""",
     'ideal_answer': {
         'Smartphones and Accessories': set(
             ['SmartX EarBuds', 'SmartX MiniPhone', 'MobiTech PowerCase',
              'SmartX ProPhone', 'MobiTech Wireless Charger']
         )}
     },

    # eg 7  # this will output a subset of the ideal answer
    {'customer_msg': """What Gaming consoles would be good for my friend who is into racing games?""",
     'ideal_answer': {
         'Gaming Consoles and Accessories': set([
             'GameSphere X',
             'ProGamer Controller',
             'GameSphere Y',
             'ProGamer Racing Wheel',
             'GameSphere VR Headset'
         ])}
     },

    # eg 8
    {'customer_msg': """What could be a good present for my videographer friend?""",
     'ideal_answer': {
         'Cameras and Camcorders': set([
             'FotoSnap DSLR Camera', 'ActionCam 4K', 'FotoSnap Mirrorless Camera',
             'ZoomMaster Camcorder', 'FotoSnap Instant Camera'
         ])}
     },

    # eg 9
    {'customer_msg': """I would like a hot tub time machine.""",
     'ideal_answer': []
     }
]
```
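Because the ideal answers are typed by hand, a typo in a product or category name would silently make an example impossible to pass. A quick consistency check against the catalog can catch this before any API calls are made. A minimal sketch, assuming each `ideal_answer` is either a dict of category to set of products or an empty list, as in the test set above; `check_ideal_answers` is a hypothetical helper, and the data is a small excerpt with one deliberate typo:

```python
from typing import Dict, List

def check_ideal_answers(pairs: List[dict], catalog: Dict[str, List[str]]) -> List[str]:
    """Hypothetical helper: describe every ideal-answer entry that is not in the catalog."""
    problems = []
    for i, pair in enumerate(pairs):
        ideal = pair['ideal_answer']
        if ideal == []:
            continue  # "no products" examples have nothing to check
        for category, products in ideal.items():
            if category not in catalog:
                problems.append(f"eg {i}: unknown category {category!r}")
                continue
            # Set difference: products in the ideal answer that the category does not contain
            for product in sorted(set(products) - set(catalog[category])):
                problems.append(f"eg {i}: {product!r} is not in {category!r}")
    return problems

catalog_excerpt = {'Cameras and Camcorders': ['FotoSnap DSLR Camera', 'ActionCam 4K',
                                              'FotoSnap Mirrorless Camera', 'ZoomMaster Camcorder',
                                              'FotoSnap Instant Camera']}
pairs_excerpt = [
    {'customer_msg': "What could be a good present for my videographer friend?",
     'ideal_answer': {'Cameras and Camcorders': {'FotoSnap DSLR Camera', 'ActionCam 4K',
                                                 'FotoSnap Mirrorless Camera', 'ZoomMaster Camcorder',
                                                 'FotoSnap Instant Camera'}}},
    {'customer_msg': "tell me about the fotosnap camera, the dslr one.",
     'ideal_answer': {'Cameras and Camcorders': {'FotoSnap DSLR Cam'}}},  # deliberate typo
    {'customer_msg': "I would like a hot tub time machine.", 'ideal_answer': []},
]
print(check_ideal_answers(pairs_excerpt, catalog_excerpt))
# ["eg 1: 'FotoSnap DSLR Cam' is not in 'Cameras and Camcorders'"]
```

Run against the real `msg_ideal_pairs_set` and `products_and_category`, an empty list means every expected product exists in its category.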
8. Compare with the ideal answer
```python
import json

def eval_response_with_ideal(response,
                             ideal,
                             debug=False):
    """
    Evaluate whether the response matches the ideal answer.

    Parameters:
    response: the content of the response
    ideal: the ideal answer
    debug: whether to print debugging information
    """
    if debug:
        print("Response:")
        print(response)

    # json.loads() only accepts double-quoted strings, so replace single quotes with double quotes
    json_like_str = response.replace("'", '"')

    # Parse into a list of dicts
    l_of_d = json.loads(json_like_str)

    # Both are empty: no products found and none expected
    if l_of_d == [] and ideal == []:
        return 1

    # Only one of the two is empty, so they cannot match
    elif l_of_d == [] or ideal == []:
        return 0

    # Count the categories answered correctly
    correct = 0

    if debug:
        print("l_of_d is")
        print(l_of_d)

    # For each category dict in the response
    for d in l_of_d:

        # Get the category and its product list
        cat = d.get('category')
        prod_l = d.get('products')
        # Both a category and a product list were found
        if cat and prod_l:
            # convert list to set for comparison
            prod_set = set(prod_l)
            # get ideal set of products
            ideal_cat = ideal.get(cat)
            if ideal_cat:
                prod_set_ideal = set(ideal.get(cat))
            else:
                if debug:
                    print(f"Category {cat} not found in the ideal answer")
                    print(f"Ideal answer: {ideal}")
                continue

            if debug:
                print("Product set:\n", prod_set)
                print()
                print("Ideal product set:\n", prod_set_ideal)

            # The products found match the ideal product set exactly
            if prod_set == prod_set_ideal:
                if debug:
                    print("correct")
                correct += 1
            else:
                print("incorrect")
                print(f"Product set: {prod_set}")
                print(f"Ideal product set: {prod_set_ideal}")
                if prod_set <= prod_set_ideal:
                    print("Response is a subset of the ideal answer")
                elif prod_set >= prod_set_ideal:
                    print("Response is a superset of the ideal answer")

    # Fraction of the response's categories that were correct
    pc_correct = correct / len(l_of_d)

    return pc_correct
```
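Replacing every single quote with a double quote works for the replies shown here, but it breaks as soon as a reply contains an apostrophe inside a string, such as a product called "Kid's Tablet" (a made-up name, as is the "Toys" category below). Since the replies are written as Python literals, the standard library's `ast.literal_eval` can parse them directly. A sketch of an alternative parsing step; `parse_reply` is a hypothetical name, not part of the course code:

```python
import ast
import json

def parse_reply(response: str) -> list:
    """Hypothetical helper: parse a Python-literal reply such as "[{'category': ...}]"."""
    # literal_eval only evaluates literals (no names or function calls), so it is safe on model output
    return ast.literal_eval(response.strip())

reply = "\n [{'category': 'Toys', 'products': [\"Kid's Tablet\"]}]"

print(parse_reply(reply))  # [{'category': 'Toys', 'products': ["Kid's Tablet"]}]

# The quote-replacement approach fails on the same reply
try:
    json.loads(reply.replace("'", '"'))
except json.JSONDecodeError as err:
    print("json.loads failed:", type(err).__name__)
```

In `eval_response_with_ideal`, the two lines that build `json_like_str` and call `json.loads` could be swapped for `l_of_d = parse_reply(response)`; the rest of the function stays the same.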
```python
print(f'Customer message: {msg_ideal_pairs_set[7]["customer_msg"]}')
print(f'Ideal answer: {msg_ideal_pairs_set[7]["ideal_answer"]}')
```

```
Customer message: What Gaming consoles would be good for my friend who is into racing games?
Ideal answer: {'Gaming Consoles and Accessories': {'ProGamer Racing Wheel', 'ProGamer Controller', 'GameSphere Y', 'GameSphere VR Headset', 'GameSphere X'}}
```

```python
response = find_category_and_product_v2(msg_ideal_pairs_set[7]["customer_msg"],
                                        products_and_category)
print(f'Response: {response}')

eval_response_with_ideal(response,
                         msg_ideal_pairs_set[7]["ideal_answer"])
```

```
Response: 
    [{'category': 'Gaming Consoles and Accessories', 'products': ['GameSphere X', 'ProGamer Controller', 'GameSphere Y', 'ProGamer Racing Wheel', 'GameSphere VR Headset']}]
1.0
```
9. Compute the fraction correct
```python
import time

score_accum = 0
for i, pair in enumerate(msg_ideal_pairs_set):
    # Pause between calls, presumably to stay under the API rate limit
    time.sleep(20)
    print(f"Example {i}")

    customer_msg = pair['customer_msg']
    ideal = pair['ideal_answer']

    # print("Customer message", customer_msg)
    # print("ideal:", ideal)
    response = find_category_and_product_v2(customer_msg,
                                            products_and_category)

    # print("products_by_category", products_by_category)
    score = eval_response_with_ideal(response, ideal, debug=False)
    print(f"{i}: {score}")
    score_accum += score

n_examples = len(msg_ideal_pairs_set)
fraction_correct = score_accum / n_examples
print(f"Fraction correct out of {n_examples}: {fraction_correct}")
```

```
Example 0
0: 1.0
Example 1
incorrect
Product set: {'MobiTech Wireless Charger', 'SmartX EarBuds', 'SmartX MiniPhone', 'SmartX ProPhone', 'MobiTech PowerCase'}
Ideal product set: {'MobiTech Wireless Charger', 'SmartX EarBuds', 'MobiTech PowerCase'}
Response is a superset of the ideal answer
1: 0.0
Example 2
2: 1.0
Example 3
3: 1.0
Example 4
incorrect
Product set: {'SoundMax Home Theater', 'CineView 8K TV', 'CineView 4K TV', 'CineView OLED TV', 'SoundMax Soundbar'}
Ideal product set: {'CineView 8K TV'}
Response is a superset of the ideal answer
incorrect
Product set: {'ProGamer Racing Wheel', 'ProGamer Controller', 'GameSphere Y', 'GameSphere VR Headset', 'GameSphere X'}
Ideal product set: {'GameSphere X'}
Response is a superset of the ideal answer
4: 0.3333333333333333
Example 5
5: 1.0
Example 6
6: 1.0
Example 7
7: 1.0
Example 8
8: 1.0
Example 9
9: 1
Fraction correct out of 10: 0.8333333333333334
```
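`eval_response_with_ideal` credits a category only when the predicted and ideal product sets are identical, so example 1 scores 0.0 even though its reply contains all three ideal products plus two extra phones. If partial credit is wanted, per-category precision and recall are one common alternative; this is a different metric from the exact-match score used above, not the course's method. A minimal sketch using the two sets printed for example 1:

```python
def precision_recall(predicted: set, ideal: set) -> tuple:
    """Precision: share of predicted products that are ideal. Recall: share of ideal products predicted."""
    hits = len(predicted & ideal)
    # An empty set on either side is scored 0.0 here; the both-empty case is handled
    # separately in eval_response_with_ideal
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(ideal) if ideal else 0.0
    return precision, recall

# Sets from example 1 in the log above
predicted = {'MobiTech Wireless Charger', 'SmartX EarBuds', 'SmartX MiniPhone',
             'SmartX ProPhone', 'MobiTech PowerCase'}
ideal = {'MobiTech Wireless Charger', 'SmartX EarBuds', 'MobiTech PowerCase'}

p, r = precision_recall(predicted, ideal)
print(p, r)  # 0.6 1.0
```

Here the two extra phones pull precision down to 3/5 = 0.6 while recall stays at 3/3 = 1.0, which separates "returned too much" from "missed something"; the exact-match score lumps both into 0.0.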
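Well, friends, that wraps up today's AI exploration, and with it everything in "LLM Introductory Course for Developers: Testing and Evaluation (English Version)". Thanks for coming along; I hope this journey has helped you understand AI better and enjoy it more. Remember: asking precise questions is the key to unlocking AI's potential! If you'd like to learn more about AI, follow our website "AI智研社"; you're sure to get a lot out of it!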