Skip to content

Writing to turbopuffer#

We recommend turbopuffer as a fast search engine that combines vector and full-text search using object storage, making all your data easily searchable.

Example#

This example does the following:

  1. Reads the Open-Orca/OpenOrca dataset from Hugging Face
  2. Computes text embeddings on the response column, using either an OpenAI model or the BAAI/bge-base-en-v1.5 open model on Hugging Face with Sentence Transformers
  3. Writes the results to turbopuffer
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
# $ pip install -U "daft[turbopuffer]"
# $ pip install -U "daft[openai]"
# OR
# $ pip install -U "daft[sentence-transformers]"
import os

import daft
from daft.functions.ai import embed_text

# Create an embedding with OpenAI, or Sentence Transformers
# Requires OPENAI_API_KEY to be set (https://platform.openai.com/settings/organization/api-keys)
if os.getenv("OPENAI_API_KEY"):
    provider = "openai"
    model = "text-embedding-3-small"
else:
    provider = "sentence_transformers"
    model = "BAAI/bge-base-en-v1.5"

turbopuffer_config = {
    "namespace": "daft-tpuf-example",
    "region": "gcp-us-central1",
    "distance_metric": "cosine_distance",
    "schema": {
        "text": {
            "type": "string",
            "full_text_search": True,
        }
    }
}

(
    daft.read_huggingface("Open-Orca/OpenOrca")
    .limit(8)
    .with_column("vector", embed_text(daft.col("response"), provider=provider, model=model))
    .write_turbopuffer(**turbopuffer_config)  # Requires TURBOPUFFER_API_KEY to be set (https://turbopuffer.com/dashboard)
)

For a more a detailed example, check out our guide on generating text embeddings.

Check out our DataFrame.write_turbopuffer documentation for more customization options.