This guide walks through the one-time GCP setup needed to use LLM-based classification via BigQuery ML. Once complete, you can run ML.GENERATE_TEXT queries against Gemini models directly from BigQuery.

Time estimate: ~15 minutes

Prerequisites

  • A GCP project with BigQuery enabled
  • roles/owner or roles/bigquery.admin + roles/aiplatform.admin on the project

Step 1: Enable the Vertex AI API

  1. Go to APIs & Services in the Google Cloud Console
  2. Select your project
  3. Search for “Vertex AI API”
  4. Click on it and click Enable (if not already enabled)
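If you prefer the command line, the same step can be sketched with the gcloud CLI (YOUR_PROJECT_ID below is a placeholder for your actual project ID):

```shell
# Enable the Vertex AI API for the project
gcloud services enable aiplatform.googleapis.com --project=YOUR_PROJECT_ID
```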

Step 2: Create a Vertex AI Connection in BigQuery

  1. In the BigQuery console explorer, click “+ Add Data”
  2. Search for “Vertex AI” and select it
  3. Select “BigQuery Federation”
  4. Name it vertex-ai-conn (or any name you prefer)
  5. Set the location to match your data (e.g., US)
  6. Click “Create Connection”
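Equivalently, the connection can be created with the bq CLI (a sketch, assuming YOUR_PROJECT_ID stands in for your project; connections used by remote models are of type CLOUD_RESOURCE):

```shell
# Create a Cloud Resource connection that BigQuery ML remote models can use
bq mk --connection \
  --location=US \
  --project_id=YOUR_PROJECT_ID \
  --connection_type=CLOUD_RESOURCE \
  vertex-ai-conn
```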

Step 3: Grant the Connection’s Service Account Vertex AI Access

  1. In BigQuery console explorer, expand External connections under your project
  2. Click on the vertex-ai-conn connection you just created
  3. Copy the service account ID (it looks like bqcx-123456789-abcd@gcp-sa-bigquery-condel.iam.gserviceaccount.com)
  4. Go to IAM & Admin in the Google Cloud Console
  5. Click “Grant Access”
  6. Paste the service account ID into the “New principals” field
  7. In “Select a role”, search for and select “Vertex AI User”
  8. Click Save
This step occasionally fails on the first attempt saying the service account doesn’t exist. If that happens, wait a minute and try again — it usually works on the second attempt.
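The lookup and grant can also be done from the command line (a sketch; YOUR_PROJECT_ID and SERVICE_ACCOUNT_EMAIL are placeholders to replace with your own values):

```shell
# Print connection details, including the service account ID
bq show --connection YOUR_PROJECT_ID.US.vertex-ai-conn

# Grant that service account the Vertex AI User role on the project
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:SERVICE_ACCOUNT_EMAIL" \
  --role="roles/aiplatform.user"
```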

Step 4: Create the Remote LLM Model

  1. In BigQuery console, click on your project in the explorer
  2. Click the three dots next to your target dataset and select “Create ML Model”
  3. Name the model (e.g., gemini_flash)
  4. Choose “Connect to Vertex AI LLM service and Cloud AI services”
  5. In “Model Options” select “Google and Partner Models”
  6. In “Model Selection” search for and select “gemini-2.5-flash-lite”
  7. For the connection, select the vertex-ai-conn connection you created in Step 2
  8. Click “Create Model”
Alternatively, run this SQL directly:
CREATE OR REPLACE MODEL `your_project.your_dataset.gemini_flash`
  REMOTE WITH CONNECTION `your_project.region.vertex-ai-conn`
  OPTIONS (ENDPOINT = 'gemini-2.5-flash-lite');
Replace your_project, your_dataset, and region with your actual values.

Step 5: Smoke Test

Run this query in the BigQuery console. If it returns a result, everything is working.
SELECT *
FROM ML.GENERATE_TEXT(
  MODEL `your_project.your_dataset.gemini_flash`,
  (SELECT 'What is 2 + 2? Reply with just the number.' AS prompt),
  STRUCT(0.0 AS temperature, 10 AS max_output_tokens, TRUE AS flatten_json_output)
);
Expected result: a row with ml_generate_text_llm_result containing 4.

Using Other Models

The same setup works for other Vertex AI models. Repeat Step 4 with a different model selection:
| Use Case | Model | BigQuery Function |
| --- | --- | --- |
| Text classification / generation | gemini-2.5-flash-lite | ML.GENERATE_TEXT |
| Text embeddings (similarity search) | text-embedding-005 | ML.GENERATE_EMBEDDING |
| Multilingual embeddings | text-multilingual-embedding-002 | ML.GENERATE_EMBEDDING |
Each model needs its own CREATE MODEL statement, but they can all share the same Vertex AI connection.
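For instance, a remote embedding model sharing the same connection could be created like this (a sketch using bq query; adjust project, dataset, and region to your own values):

```shell
# Create a remote embedding model over the existing Vertex AI connection
bq query --use_legacy_sql=false '
CREATE OR REPLACE MODEL `your_project.your_dataset.embedding_model`
  REMOTE WITH CONNECTION `your_project.region.vertex-ai-conn`
  OPTIONS (ENDPOINT = "text-embedding-005");
'
```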

Using with Malloy

There are three ways to use BigQuery ML models from Malloy, depending on your needs.

Approach 1: Inline LLM Call with sql_string()

Calls the LLM at query time for each row. The fastest way to get started — no separate pipeline needed. Requires ##! experimental{sql_functions}.
##! experimental{sql_functions}

source: feedback is my_connection.table('my_dataset.feedback') extend {
  dimension:
    ai_sentiment is sql_string("""
      JSON_EXTRACT_SCALAR(
        (SELECT ml_generate_text_result
         FROM ML.GENERATE_TEXT(
           MODEL `my_project.my_dataset.gemini_flash`,
           (SELECT CONCAT(
             'Classify this feedback as: POSITIVE, NEGATIVE, or NEUTRAL.\n',
             'Feedback: ', ${comment}
           ) AS prompt),
           STRUCT(0.0 AS temperature, 10 AS max_output_tokens)
         )),
        '$.candidates[0].content.parts[0].text'
      )
    """)

  measure:
    feedback_count is count()

  view: by_sentiment is {
    group_by: ai_sentiment
    aggregate: feedback_count
  }
}
This makes an LLM call per row at query time. Great for small datasets and prototyping. For large tables, use Approach 3 to batch-classify and join the results back.

Approach 2: Wrap ML Results as a Malloy Source with connection.sql()

Use connection.sql() to make a BigQuery ML query available as a joinable Malloy source. Useful when the ML output needs to be combined with other sources via joins.
source: item_embeddings is my_connection.sql("""
  SELECT *
  FROM ML.GENERATE_EMBEDDING(
    MODEL `my_project.my_dataset.embedding_model`,
    (SELECT item_name AS content, item_id FROM `my_project.my_dataset.items`),
    STRUCT(TRUE AS flatten_json_output, 'SEMANTIC_SIMILARITY' AS task_type)
  )
""")

source: items is my_connection.table('my_dataset.items') extend {
  join_one: item_embeddings on item_id = item_embeddings.item_id

  dimension:
    embedding is item_embeddings.ml_generate_embedding_result
}

Approach 3: Batch Classify via MalloySQL, Join Results Back

Best for classifying a large set of items once and reusing the results. Run a .malloysql file to materialize a classification table, then reference it as a Malloy source.

Step 1 — Materialize (in a .malloysql file):
-- connection:my_connection

CREATE OR REPLACE TABLE `my_project.my_dataset.item_classification` AS
SELECT
  src.item_name,
  TRIM(llm.ml_generate_text_llm_result) AS category
FROM ML.GENERATE_TEXT(
  MODEL `my_project.my_dataset.gemini_flash`,
  (SELECT *, CONCAT(
    'Classify this item as: Electronics, Clothing, Food, or Other.\n',
    'Item: "', item_name, '"\nReturn one word only.'
  ) AS prompt FROM `my_project.my_dataset.distinct_items`),
  STRUCT(0.0 AS temperature, 10 AS max_output_tokens, TRUE AS flatten_json_output)
) AS llm
JOIN `my_project.my_dataset.distinct_items` AS src ON src.item_name = llm.item_name;
Step 2 — Join into your model (in a .malloy file):
source: item_classification is my_connection.table('my_dataset.item_classification') extend {
  primary_key: item_name
}

source: orders is my_connection.table('my_dataset.orders') extend {
  join_one: item_classification with item_name

  dimension:
    category is item_classification.category ?? 'Unclassified'

  measure:
    order_count is count()

  view: by_category is {
    group_by: category
    aggregate: order_count
    order_by: order_count desc
  }
}
Items not yet in the classification table fall back to 'Unclassified' via ??.