8. Formatted Output in LangChain

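Earlier we covered how to write input prompts for the LLMs in LangChain.

We also covered how to plug in and call your own model.

Next we turn to the model's output, and to the tool LangChain provides for formatting it: the output parser.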

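An output parser is a class dedicated to formatting a model's response. It has two core methods:

get_format_instructions: returns a string that is passed to the LLM to tell it how the answer should be formatted.
parse: takes the string returned by the model and parses it into a specific format.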

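There is also an optional method, parse_with_prompt, which takes both the output string and the prompt that produced it, so the parser can use that context when deciding how to parse or repair the output.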
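To make the shape of this interface concrete, here is a minimal standard-library sketch. The names SimpleOutputParser and CommaListParser are hypothetical and are not LangChain's own classes; the sketch only mirrors the three methods described above.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, TypeVar

T = TypeVar("T")


class SimpleOutputParser(ABC, Generic[T]):
    """Hypothetical stand-in for an output parser (not LangChain's base class)."""

    @abstractmethod
    def get_format_instructions(self) -> str:
        """Return text to add to the prompt so the model knows the format."""

    @abstractmethod
    def parse(self, text: str) -> T:
        """Turn the raw model output into a structured value."""

    def parse_with_prompt(self, completion: str, prompt: str) -> T:
        # Optional hook: this default ignores the prompt and just parses.
        return self.parse(completion)


class CommaListParser(SimpleOutputParser[List[str]]):
    def get_format_instructions(self) -> str:
        return "Answer with comma separated values, e.g. `foo, bar, baz`"

    def parse(self, text: str) -> List[str]:
        # Split on commas, trim whitespace, drop empty entries.
        return [item.strip() for item in text.split(",") if item.strip()]


parser = CommaListParser()
items = parser.parse("rose, lily , tulip")
```

The format instructions go into the prompt, and parse runs on whatever the model sends back, which is the same division of labor the real parsers follow.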

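The output parsers that LangChain ships with are mainly these:

List parser: used when the output should be parsed into a list.

Datetime parser: parses the output into a valid date or time.

Enum parser: parses the output into one of a set of predefined values. If the answer to a question can only be yes or no, for example, the enum parser fits.

Structured output parser: handles complex, structured output.

Pydantic parser / JSON parser: produces a Python object in a specified format.

Auto-fixing parser: automatically repairs certain kinds of errors in the model output.

Retry parser: similar to the auto-fixing parser above, but more flexible.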
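As an illustration of the enum idea from the list above, the sketch below maps free-form model text onto a fixed set of values using only the standard library. YesNo and parse_enum are hypothetical names chosen for this example, not LangChain's API.

```python
from enum import Enum


class YesNo(Enum):
    YES = "yes"
    NO = "no"


def parse_enum(text: str, enum_cls=YesNo):
    """Map raw model text onto one member of enum_cls, or raise ValueError."""
    cleaned = text.strip().lower()
    try:
        return enum_cls(cleaned)
    except ValueError:
        allowed = ", ".join(member.value for member in enum_cls)
        raise ValueError(f"Expected one of: {allowed}; got {text!r}")


answer = parse_enum(" Yes\n")
```

Anything outside the allowed values raises an error instead of slipping through, which is the point of constraining the answer to an enum.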

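The earlier ones are straightforward, and we have already used the structured output parser. Here we focus on three of the later ones:

the Pydantic parser, the auto-fixing parser, and the retry parser.

Let's start with the Pydantic parser.

It has some distinctive features. One is data validation: when you assign values to a Pydantic model, the data is validated and converted automatically. It also supports serializing the model to JSON.

We first declare a Python class that inherits from pydantic's BaseModel and wraps each field with Field.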

from pydantic import BaseModel, Field

class FlowerDescription(BaseModel):
    flower_type: str = Field(description="鲜花的种类")  # type of flower
    price: int = Field(description="鲜花的价格")  # price of the flower
    description: str = Field(description="鲜花的描述文案")  # marketing copy
    reason: str = Field(description="为什么要这样写这个文案")  # why the copy was written this way

Then we create an output parser from this class:

from langchain.output_parsers import PydanticOutputParser

output_parser = PydanticOutputParser(pydantic_object=FlowerDescription)

and get the format instructions from it:

format_instructions = output_parser.get_format_instructions()

The format instructions look like this:

The output should be formatted as a JSON instance that conforms to the JSON schema below.
As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
Here is the output schema:
{"properties": {"flower_type": {"title": "Flower Type", "description": "\u9c9c\u82b1\u7684\u79cd\u7c7b", "type": "string"}, "price": {"title": "Price", "description": "\u9c9c\u82b1\u7684\u4ef7\u683c", "type": "integer"}, "description": {"title": "Description", "description": "\u9c9c\u82b1\u7684\u63cf\u8ff0\u6587\u6848", "type": "string"}, "reason": {"title": "Reason", "description": "\u4e3a\u4ec0\u4e48\u8981\u8fd9\u6837\u5199\u8fd9\u4e2a\u6587\u6848", "type": "string"}}, "required": ["flower_type", "price", "description", "reason"]}

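Next we use these instructions to build a prompt template and have the model produce output in the format we want.

from langchain.prompts import PromptTemplate

# The template says: "You are a professional flower shop copywriter. Could you
# provide an attractive short Chinese description for {flower} priced at {price} yuan?"
prompt_template = """您是一位专业的鲜花店文案撰写员。
对于售价为 {price} 元的 {flower} ，您能提供一个吸引人的简短中文描述吗？
{format_instructions}"""

# Build the prompt from the template and pre-fill the parser's format instructions
prompt = PromptTemplate.from_template(
    prompt_template,
    partial_variables={"format_instructions": format_instructions},
)

# `model` is assumed to be the LLM instance created in the earlier sections
prompt_text = prompt.format(flower="玫瑰", price=50)
output = model(prompt_text)

parsed_output = output_parser.parse(output)
parsed_output_dict = parsed_output.dict()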

The parser turns the model's output into an instance of FlowerDescription, and converting that instance to a dict gives:

{'flower_type': 'Rose', 'price': 50, 'description': '玫瑰是最浪漫的花，它具有柔和的粉红色，有着浓浓的爱意，价格实惠，50元就可以拥有一束玫瑰。', 'reason': '玫瑰代表着爱情，是最浪漫的礼物，以实惠的价格，可以让您尽情体验爱的浪漫。'}

That is how the Pydantic parser is used.
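For intuition about what the parse step involves, here is a rough standard-library sketch: find the JSON object in the model's text, load it, check that every field is present, and coerce the types. FlowerRecord and parse_flower are hypothetical names for this illustration; this is not how PydanticOutputParser is implemented, and Pydantic itself does far more thorough validation.

```python
import json
import re
from dataclasses import dataclass, fields


@dataclass
class FlowerRecord:
    # Hypothetical stdlib stand-in for the FlowerDescription model.
    flower_type: str
    price: int
    description: str
    reason: str


def parse_flower(text: str) -> FlowerRecord:
    # Take the span from the first "{" to the last "}", since models
    # often wrap the JSON in extra explanatory text.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group(0))
    missing = [f.name for f in fields(FlowerRecord) if f.name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return FlowerRecord(
        flower_type=str(data["flower_type"]),
        price=int(data["price"]),  # coerces "50" to 50, raises on non-numeric text
        description=str(data["description"]),
        reason=str(data["reason"]),
    )


raw = 'Here you go:\n{"flower_type": "Rose", "price": "50", "description": "d", "reason": "r"}'
record = parse_flower(raw)
```

A missing field or a non-numeric price raises an error here, and that kind of failure is what the auto-fixing and retry parsers are meant to recover from.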

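Next, the auto-fixing parser in practice.

The auto-fixing parser takes the error raised by a failed parse and uses it to repair the output, so that we still end up with a correct result.

To see how it is used, we can trigger a problem by hand.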

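from typing import List

from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import OutputFixingParser, PydanticOutputParser
from pydantic import BaseModel, Field

class Flower(BaseModel):
    name: str = Field(description="name of a flower")
    colors: List[str] = Field(description="the colors of this flower")

parser = PydanticOutputParser(pydantic_object=Flower)
new_parser = OutputFixingParser.from_llm(parser=parser, llm=ChatOpenAI())

# Deliberately malformed input: single quotes are not valid JSON
misformatted = "{'name': '康乃馨', 'colors': ['粉红色','白色','红色','紫色','黄色']}"
result = new_parser.parse(misformatted)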

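The string we passed in above is wrapped in single quotes, while the Pydantic parser expects valid JSON with double quotes.

For a small problem like this, the auto-fixing parser repairs the output for us. When the wrapped parser raises an error, OutputFixingParser hands the bad output, the error, and the format instructions to the LLM we supplied (ChatOpenAI here) and parses the corrected reply. So the repair does cost one extra model call, but we never have to write the retry logic ourselves.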
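The quoting problem itself is easy to reproduce without LangChain. The snippet below uses only the standard json module to show that the single-quoted string is rejected while the double-quoted one loads fine; the variable names are just for this illustration.

```python
import json

single_quoted = "{'name': '康乃馨', 'colors': ['粉红色', '白色']}"
double_quoted = '{"name": "康乃馨", "colors": ["粉红色", "白色"]}'

try:
    json.loads(single_quoted)
    single_ok = True
    error_message = ""
except json.JSONDecodeError as err:
    # JSON requires double quotes around keys and strings.
    single_ok = False
    error_message = str(err)

flower = json.loads(double_quoted)
```

The error message captured here is the kind of information a fixing parser can pass along when asking for a corrected output.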

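The retry parser: RetryWithErrorOutputParser

The auto-fixing parser handles simple format repairs, but errors are not always about format. If the output is incomplete, for example, the output alone does not contain enough information to repair it.

That is where the retry parser comes in: it goes back to the model with the original prompt and lets it try again.

Let's look at the code.

First, we create a parser from a Pydantic class.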

from pydantic import BaseModel, Field

class Action(BaseModel):
    action: str = Field(description="action to take")
    action_input: str = Field(description="input to the action")

# Initialize an output parser from the Pydantic model Action
from langchain.output_parsers import PydanticOutputParser

parser = PydanticOutputParser(pydantic_object=Action)

# Define the prompt template used to query the model
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

prompt_value = prompt.format_prompt(query="What are the colors of Orchid?")

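Now we create a RetryWithErrorOutputParser and use it to try to get a correct answer:

from langchain.llms import OpenAI
from langchain.output_parsers import RetryWithErrorOutputParser

retry_parser = RetryWithErrorOutputParser.from_llm(
    parser=parser, llm=OpenAI(temperature=0)
)

# The completion is missing action_input, so a plain parse would fail
parse_result = retry_parser.parse_with_prompt('{"action": "search"}', prompt_value)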

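parse_result now has both fields filled in. Because the wrapped parser is a PydanticOutputParser, what comes back is an Action instance, and its content corresponds to something like:

{"action": "search", "action_input": "colors of Orchid"}

which solves the problem.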
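The retry idea can be sketched in a few lines of plain Python. This is only an illustration of the loop under stated assumptions, not RetryWithErrorOutputParser's actual implementation: parse_action, parse_with_retry, and fake_llm are hypothetical names, and fake_llm is a stub standing in for a real model call.

```python
import json
from typing import Callable


def parse_action(text: str) -> dict:
    """Strict parser: both fields must be present."""
    data = json.loads(text)
    for field in ("action", "action_input"):
        if field not in data:
            raise ValueError(f"missing field: {field}")
    return data


def parse_with_retry(completion: str, prompt: str,
                     llm: Callable[[str], str], max_retries: int = 1) -> dict:
    """Parse; on failure, ask the model again with prompt, bad output, and error."""
    for attempt in range(max_retries + 1):
        try:
            return parse_action(completion)
        except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
            if attempt == max_retries:
                raise
            retry_prompt = (
                f"Prompt:\n{prompt}\n"
                f"Completion:\n{completion}\n"
                f"Error:\n{err}\n"
                "Please answer again and satisfy the format."
            )
            completion = llm(retry_prompt)


def fake_llm(prompt: str) -> str:
    # Stub standing in for a real model call.
    return '{"action": "search", "action_input": "colors of Orchid"}'


result = parse_with_retry('{"action": "search"}',
                          "What are the colors of Orchid?", fake_llm)
```

The key design point is that the retry prompt carries the original question, so the model has what it needs to supply the missing field rather than just reformatting the broken output.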

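To sum up, we covered the structured (Pydantic) parser, the auto-fixing parser, and the retry parser.

The structured parser is the one for everyday output, while the auto-fixing and retry parsers are about recovering when a parser fails.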
