Pipelines work with Inventory: Read/write raw physical objects
Reports work with Catalog: Access curated, secure business views
Dashboards consume Reports: Present analyzed, formatted insights
Definition: Registry of all physical data objects accessible to the platform
Tables: BigQuery, ClickHouse, PostgreSQL tables
Files: Parquet, CSV, Excel, JSON files in cloud storage
APIs: HTTP endpoints with structured responses
Capabilities:
Automatic schema detection
Query dialect determination (BigQuery SQL, ClickHouse SQL, etc.)
Visibility controls (hide staging/internal objects)
Tenant-specific access based on subscription plan
Example Inventory entry:
- Name: sales_fact
- Type: bigquery.table
- Path: project.dataset.sales_fact
- Schema: {order_id: STRING, customer_id: STRING, amount: INT64, date: DATE}
- Visible: Yes
Definition: Curated business views of Inventory data
Functions:
Abstraction: Map s3://data/sales_2024.parquet to the logical name sales
Consolidation: Combine multiple files into a single logical table (map s3://data/sales_*.parquet to all_sales)
Transformation: Apply pre-query filters, deduplication, calculations
Redirection: Seamlessly switch underlying storage (Parquet → ClickHouse)
Security: Row-level and column-level security rules
Catalog vs. Inventory:
- User sees "sales" (Catalog)
- Catalog maps to "s3://bucket/sales.delta" (Inventory)
- Catalog applies rule: "WHERE region = <<user_region>>"
- User gets filtered, business-friendly data
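Put together, a Catalog entry for this walkthrough could be sketched as YAML. The field names (source, rules, row_filter, exclude_columns) are assumptions for illustration, not the platform's actual syntax:

```yaml
# Hypothetical Catalog entry; field names are assumed for illustration
name: sales                      # business name users query
source: s3://bucket/sales.delta  # Inventory object it resolves to
visible: true
rules:
  row_filter: "region = <<user_region>>"  # row-level security template
  exclude_columns: [cost_basis]           # column-level security
```

Because the mapping lives in the Catalog, the underlying storage can later be redirected (e.g., Parquet to ClickHouse) without users changing their queries.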
Role Hierarchy:
Viewer: Can view dashboards and reports
Analyst: Can create reports and dashboards
Designer: Can create pipelines and catalog rules
Tenant Admin: Can manage users within a specific tenant
Admin: Full system access and user management
Permission Model:
Role-based access control (RBAC)
Catalog-level data permissions
Dashboard/report sharing controls
Pipeline execution privileges
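A minimal sketch of how a role assignment might combine with Catalog-level grants, assuming a YAML configuration format with invented keys (catalog_grants, pipeline_execution):

```yaml
# Hypothetical role assignment; keys are illustrative, not the real schema
user: jane@acme.example
tenant: acme
role: analyst               # can create reports and dashboards
catalog_grants:
  - object: sales
    access: read            # Catalog row/column rules still apply on top
pipeline_execution: false   # reserved for Designer and above
```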
Global Configuration:
Default date ranges and fiscal calendars
System-wide query limits and timeouts
Email/Slack notification settings
Branding and customization options
Tenant Settings:
Available data sources and connectors
Storage quotas and processing limits
Custom business rules and validations
Localization and formatting defaults
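For illustration, global configuration and tenant overrides might be expressed together as below; every key here is an assumed name, not the real settings schema:

```yaml
# Illustrative settings file; all keys are assumptions
global:
  fiscal_year_start: "02-01"       # fiscal calendar default
  query_timeout_seconds: 300       # system-wide query limit
  notifications:
    email: alerts@example.com
    slack_channel: "#data-alerts"
tenants:
  acme:
    connectors: [bigquery, s3]     # available data sources
    storage_quota_gb: 500          # storage quota
    locale: en-US                  # formatting defaults
```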
Definition: Orchestrate multiple pipelines in sequence
Example job:
```yaml
name: nightly_sales_processing
steps:
  - pipeline: import_sales
  - pipeline: clean_sales_data
  - pipeline: update_inventory_snapshot
  - pipeline: send_daily_report
```
Scheduling: Automate job execution
Cron-based scheduling
Event-based triggers (file arrival, API call)
Manual override and immediate execution
Retry policies and failure notifications
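Extending the example job above, a hedged sketch of schedule, trigger, and retry configuration; the schedule, triggers, retry, and on_failure keys are illustrative assumptions:

```yaml
# Hypothetical scheduling block for the job above; keys are illustrative
name: nightly_sales_processing
schedule:
  cron: "0 2 * * *"              # run nightly at 02:00
triggers:
  - type: file_arrival           # event-based trigger
    path: s3://data/incoming/sales_*.csv
retry:
  max_attempts: 3                # retry policy
  backoff_seconds: 600
on_failure:
  notify: "#data-alerts"         # failure notification channel
```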
Monitoring:
Execution history and logs
Performance metrics and bottlenecks
Failure analysis and debugging
Alerting and notification system
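As one possible shape for the alerting system, a sketch assuming a declarative YAML schema (alerts, condition, channels are invented names):

```yaml
# Illustrative alert rules; the schema is assumed for this sketch
alerts:
  - name: job_failed
    condition: status == "failed"      # failure analysis hook
    channels: [email, slack]
  - name: slow_run
    condition: duration_minutes > 60   # flag performance bottlenecks
    channels: [slack]
```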
Dimensions: Hierarchical Smart Filters
Features:
Multi-level hierarchies (Category → Subcategory → Product)
Automatic roll-up and drill-down in reports
Cross-report/dashboard filtering
Time intelligence (Fiscal periods, ISO weeks)
Implementation:
Stored as Catalog entries with a special `dimension` type (see the sketch after this list)
Automatically available in Report and Dashboard filter panels
Maintained via pipelines or manual administration
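A minimal sketch of what a dimension Catalog entry could look like, assuming the `dimension` type mentioned above and illustrative field names (levels, source):

```yaml
# Hypothetical dimension entry in the Catalog; field names are illustrative
name: product_hierarchy
type: dimension
levels: [category, subcategory, product]  # roll-up / drill-down path
source: s3://data/dim_product.parquet     # maintained via a pipeline
```

Declaring the hierarchy once in the Catalog is what lets every report and dashboard pick it up automatically in their filter panels.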