Compatibility issue when using Qwen embeddings through OpenAIEmbeddings in LangChain

id: 01K7Z1ZT200EGFCF0ATD34NSCY

When using a local embeddings model, I ran into the following problem:

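from langchain_openai import OpenAIEmbeddings

# Point the OpenAI-compatible client at the local LM Studio server
qwen3_embeddings = OpenAIEmbeddings(
    model="text-embedding-qwen3-embedding-0.6b",
    base_url="http://127.0.0.1:1234/v1",
)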

Running it fails with:

Traceback (most recent call last):
  File "/Users/zhaochunqi/ghq/github.com/zhaochunqi/ai-agents-learning/main.py", line 20, in <module>
    _ = vector_store.add_documents(documents=all_splits)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhaochunqi/ghq/github.com/zhaochunqi/ai-agents-learning/.venv/lib/python3.12/site-packages/langchain_core/vectorstores/in_memory.py", line 195, in add_documents
    vectors = self.embedding.embed_documents(texts)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhaochunqi/ghq/github.com/zhaochunqi/ai-agents-learning/.venv/lib/python3.12/site-packages/langchain_openai/embeddings/base.py", line 590, in embed_documents
    return self._get_len_safe_embeddings(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhaochunqi/ghq/github.com/zhaochunqi/ai-agents-learning/.venv/lib/python3.12/site-packages/langchain_openai/embeddings/base.py", line 480, in _get_len_safe_embeddings
    response = self.client.create(
               ^^^^^^^^^^^^^^^^^^^
  File "/Users/zhaochunqi/ghq/github.com/zhaochunqi/ai-agents-learning/.venv/lib/python3.12/site-packages/openai/resources/embeddings.py", line 132, in create
    return self._post(
           ^^^^^^^^^^^
  File "/Users/zhaochunqi/ghq/github.com/zhaochunqi/ai-agents-learning/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1259, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zhaochunqi/ghq/github.com/zhaochunqi/ai-agents-learning/.venv/lib/python3.12/site-packages/openai/_base_client.py", line 1047, in request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'error': "'input' field must be a string or an array of strings"}

Setting the check_embedding_ctx_length parameter to False fixes it:

qwen3_embeddings = OpenAIEmbeddings(
    model="text-embedding-qwen3-embedding-0.6b",
    base_url="http://127.0.0.1:1234/v1",
    check_embedding_ctx_length=False,  # Key: disable the length check for LM Studio compatibility
)
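
To confirm independently of LangChain that the local server accepts plain strings, a request can be built by hand. The following is a minimal sketch using only the standard library; the helper name build_embeddings_request is hypothetical, and it assumes the server exposes the usual OpenAI-style /embeddings route under the base URL and returns the usual OpenAI response shape.

```python
import json
import urllib.request


def build_embeddings_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Build a POST to the OpenAI-style embeddings route with a plain-string input."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_embeddings_request(
    "http://127.0.0.1:1234/v1",
    "text-embedding-qwen3-embedding-0.6b",
    "Hello world",
)

# Sending requires the local server to be running, so it is left commented out.
# The response shape below is assumed to follow the OpenAI embeddings format.
# with urllib.request.urlopen(request) as response:
#     vector = json.loads(response.read())["data"][0]["embedding"]
#     print(len(vector))
```

If this request succeeds while the LangChain call fails, the problem is in what the client sends and not in the model or the server.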

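The reason is that this parameter defaults to True. With it enabled, LangChain takes its length-safe path: it tokenizes the input with tiktoken and splits any text longer than the model's maximum context length into chunks, so what it sends to the API is arrays of token IDs, not the original strings. OpenAI's own endpoint accepts token arrays, but LM Studio only accepts a string or an array of strings, hence the 400 error.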

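How the two execution paths differ:

Default (True), the failing path:

# Input: "Hello world"
# 1. tiktoken tokenizes it: [15496, 1917] (token IDs, illustrative values)
# 2. API request: {"input": [[15496, 1917]], "model": "..."} (one token-ID list per chunk)
# 3. LM Studio receives integer arrays and rejects them: "'input' field must be a string or an array of strings"

Set to False, the working path:

# Input: "Hello world"
# 1. Sent as-is: {"input": ["Hello world"], "model": "..."}
# 2. LM Studio receives plain strings and processes them normally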
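
The two paths can be mimicked without any network access. This is only a sketch of the payload shapes, not LangChain's actual code: toy_tokenize is a hypothetical stand-in for tiktoken, and server_accepts is an assumed reading of the validation implied by the error message.

```python
from typing import Callable, List

TokenIds = List[int]


def toy_tokenize(text: str) -> TokenIds:
    """Hypothetical stand-in for tiktoken: one fake ID per whitespace-separated word."""
    return [len(word) for word in text.split()]


def build_payload(
    texts: List[str],
    model: str,
    check_ctx_length: bool,
    tokenize: Callable[[str], TokenIds] = toy_tokenize,
) -> dict:
    """Mimic the shape of the request body in each mode (not LangChain's real code)."""
    if check_ctx_length:
        # Length-safe path: texts are tokenized first, so the body carries integer lists.
        return {"input": [tokenize(t) for t in texts], "model": model}
    # Plain path: the original strings are sent unchanged.
    return {"input": texts, "model": model}


def server_accepts(payload: dict) -> bool:
    """Mimic the check implied by the error: a string or an array of strings."""
    value = payload["input"]
    if isinstance(value, str):
        return True
    return isinstance(value, list) and all(isinstance(item, str) for item in value)


model = "text-embedding-qwen3-embedding-0.6b"
default_payload = build_payload(["Hello world"], model, check_ctx_length=True)
fixed_payload = build_payload(["Hello world"], model, check_ctx_length=False)

print(server_accepts(default_payload))  # False: the input holds integer lists
print(server_accepts(fixed_payload))    # True: the input holds plain strings
```
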

Tags: langchain, embeddings