Local Provider
Status: ✅ Production Ready
Version: 1.0.0
Database: DuckDB, SQLite
Overview
The Local provider enables rapid development and testing without cloud costs. Perfect for:
- Learning Fluid Forge
- Testing contracts before cloud deployment
- Local data analysis with DuckDB
- CI/CD testing pipelines
Quick Start
Installation
pip install fluid-forge duckdb
Minimal Contract
metadata:
  name: local-analytics
  version: 1.0.0
platform:
  provider: local
  database_path: ./analytics.duckdb
  schema: main
assets:
  - type: table
    name: customers
    query: |
      SELECT * FROM read_csv_auto('./data/customers.csv')
Execute:
fluid apply contract.yaml --provider local
Supported Features
✅ DuckDB Features
| Feature | Support | Notes |
|---|---|---|
| Tables | ✅ Full | CREATE TABLE, materialized |
| Views | ✅ Full | Standard SQL views |
| CSV/Parquet Loading | ✅ Full | Auto-schema detection |
| SQL Transformations | ✅ Full | DuckDB SQL dialect (closely follows PostgreSQL) |
| Indexes | ✅ Full | ART indexes; min-max indexes are built automatically |
| CTEs & Window Functions | ✅ Full | Advanced SQL |
| JSON/Arrays | ✅ Full | Nested data structures |
⏳ Limitations
- ❌ No IAM/authentication (local only)
- ❌ No partitioning (not needed for small data)
- ❌ No distributed queries
- ⚠️ Limited to a single machine's memory
Configuration
platform:
  provider: local
  # Database file path
  database_path: ./my_database.duckdb
  # Optional schema name
  schema: analytics  # Default: main
  # DuckDB settings
  settings:
    memory_limit: 4GB
    threads: 4
    temp_directory: /tmp/duckdb
Use Cases
1. Development & Testing
Develop contracts locally, then deploy to cloud:
# Test locally
fluid apply contract.yaml --provider local
# Deploy to GCP when ready
fluid apply contract.yaml --provider gcp --project my-project
2. Data Analysis
Analyze local CSV/Parquet files:
sources:
  - name: sales_data
    type: csv
    path: ./data/sales_*.csv
assets:
  - type: view
    name: monthly_revenue
    query: |
      SELECT
        DATE_TRUNC('month', sale_date) as month,
        SUM(amount) as revenue
      FROM read_csv_auto('${sources.sales_data.path}')
      GROUP BY month
3. CI/CD Testing
Test contracts in GitHub Actions:
# .github/workflows/test.yml
- name: Test Fluid Contract
  run: |
    fluid apply contract.yaml --provider local
    fluid verify contract.yaml --provider local
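To reproduce the CI step on a developer machine, the two commands can be wrapped in a small script that stops at the first failure. This is only a sketch: `run_checks` and its injectable `runner` are hypothetical, and the `fluid` commands are copied from the workflow above.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# The same commands the workflow step runs, in the same order.
COMMANDS: List[List[str]] = [
    ["fluid", "apply", "contract.yaml", "--provider", "local"],
    ["fluid", "verify", "contract.yaml", "--provider", "local"],
]


def run_checks(
    commands: Sequence[Sequence[str]],
    runner: Optional[Callable[[Sequence[str]], int]] = None,
) -> int:
    """Run each command in order; return the first non-zero exit code, or 0."""
    if runner is None:
        # Default runner: execute the command and report its exit code.
        def runner(cmd: Sequence[str]) -> int:
            return subprocess.run(list(cmd)).returncode

    for cmd in commands:
        code = runner(cmd)
        if code != 0:
            print(f"FAILED ({code}): {' '.join(cmd)}")
            return code
    return 0
```

A script entry point could call `sys.exit(run_checks(COMMANDS))` so that the process exit code matches what the CI job would report.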
Performance Tips
1. Use Parquet Instead of CSV
sources:
  - name: large_dataset
    type: parquet  # Columnar and typed; usually much faster to scan than CSV
    path: ./data/events.parquet
2. Create Indexes
tables:
  - name: customers
    indexes:
      - columns: [customer_id]
        unique: true
      - columns: [country, signup_date]
3. Optimize Memory
platform:
  provider: local
  settings:
    # Increase for large datasets. DuckDB defaults to 80% of available RAM
    # and treats max_memory as an alias of memory_limit, so set only one.
    memory_limit: 8GB
Example Workflows
Load and Transform CSVs
metadata:
  name: csv-pipeline
  version: 1.0.0
platform:
  provider: local
  database_path: ./analytics.duckdb
sources:
  - name: customers
    type: csv
    path: ./raw/customers.csv
  - name: orders
    type: csv
    path: ./raw/orders.csv
assets:
  - type: table
    name: customer_orders
    materialized: true
    query: |
      SELECT
        c.customer_id,
        c.name,
        c.email,
        COUNT(o.order_id) as total_orders,
        SUM(o.amount) as total_spent
      FROM read_csv_auto('${sources.customers.path}') c
      LEFT JOIN read_csv_auto('${sources.orders.path}') o
        ON c.customer_id = o.customer_id
      GROUP BY c.customer_id, c.name, c.email
Parquet Data Lake
platform:
  provider: local
  database_path: ./data_lake.duckdb
sources:
  - name: events
    type: parquet
    path: ./lake/events/**/*.parquet  # Wildcard glob
assets:
  - type: view
    name: daily_events
    query: |
      SELECT
        DATE(event_time) as date,
        event_type,
        COUNT(*) as event_count
      FROM read_parquet('${sources.events.path}')
      WHERE event_time >= CURRENT_DATE - INTERVAL '7 days'
      GROUP BY date, event_type
Querying Results
Python
import duckdb

conn = duckdb.connect('analytics.duckdb')

# Query the aggregated table; fetchdf() returns a pandas DataFrame (requires pandas)
df = conn.execute("""
    SELECT * FROM main.customer_orders
    WHERE total_spent > 1000
""").fetchdf()
print(df.head())

conn.close()
DuckDB CLI
duckdb analytics.duckdb
-- Interactive SQL
SELECT * FROM main.customers LIMIT 10;
-- Export to CSV
COPY (SELECT * FROM main.customer_orders)
TO 'export.csv' WITH (HEADER, DELIMITER ',');
-- Export to Parquet
COPY main.customer_orders TO 'export.parquet';
Cloud Migration
When ready to move to production:
1. Update contract:
# Change provider from 'local' to 'gcp'
platform:
  provider: gcp  # Changed!
  project: my-project-id
  region: us-central1
# Rest stays the same!
assets:
  - type: dataset
    name: analytics
    # ... same tables and views
2. Deploy:
fluid apply contract.yaml --provider gcp
That's it! Your local development becomes cloud production.
Next Steps
- Local Walkthrough - Complete tutorial
- GCP Walkthrough - Migrate to cloud
- CLI Reference - Local provider commands
Perfect for development. Deploy to GCP when ready.
