Pipelines work with Inventory: Read/write raw physical objects
Reports work with Catalog: Access curated, secure business views
Dashboards consume Reports: Present analyzed, formatted insights
Definition: Registry of all physical data objects accessible to the platform
Tables: BigQuery, ClickHouse, PostgreSQL tables
Files: Parquet, CSV, Excel, JSON files in cloud storage
APIs: HTTP endpoints with structured responses
Capabilities:
Automatic schema detection
Query dialect determination (BigQuery SQL, ClickHouse SQL, etc.)
Visibility controls (hide staging/internal objects)
Tenant-specific access based on subscription plan
Example Inventory entry:
- Name: sales_fact
- Type: bigquery.table
- Path: project.dataset.sales_fact
- Schema: {order_id: STRING, customer_id: STRING, amount: INT64, date: DATE}
- Visible: Yes
Definition: Curated business views of Inventory data
Functions:
Abstraction: Map s3://data/sales_2024.parquet to the logical name sales
Consolidation: Combine multiple files into a single logical table (map s3://data/sales_*.parquet to all_sales)
Transformation: Apply pre-query filters, deduplication, calculations
Redirection: Seamlessly switch underlying storage (Parquet → ClickHouse)
Security: Row-level and column-level security rules
Catalog vs. Inventory:
- User sees "sales" (Catalog)
- Catalog maps to "s3://bucket/sales.delta" (Inventory)
- Catalog applies rule: "WHERE region = <<user_region>>"
- User gets filtered, business-friendly data
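Put together, a Catalog entry for this walkthrough could be sketched as YAML. The field names (source, rules, row_filter, exclude_columns) are assumptions for illustration, not the platform's actual syntax:

```yaml
# Hypothetical Catalog entry; field names are assumed for illustration
name: sales                      # business name users query
source: s3://bucket/sales.delta  # Inventory object it resolves to
visible: true
rules:
  row_filter: "region = <<user_region>>"  # row-level security template
  exclude_columns: [cost_basis]           # column-level security
```

Because the mapping lives in the Catalog, the underlying storage can later be redirected (e.g., Parquet to ClickHouse) without users changing their queries.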
Role Hierarchy:
Viewer: Can view dashboards and reports
Analyst: Can create reports and dashboards
Designer: Can create pipelines and catalog rules
Tenant Admin: Can manage users within a specific tenant
Admin: Full system access and user management
Permission Model:
Role-based access control (RBAC)
Catalog-level data permissions
Dashboard/report sharing controls
Pipeline execution privileges
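A minimal sketch of how a role assignment might combine with Catalog-level grants, assuming a YAML configuration format with invented keys (catalog_grants, pipeline_execution):

```yaml
# Hypothetical role assignment; keys are illustrative, not the real schema
user: jane@acme.example
tenant: acme
role: analyst               # can create reports and dashboards
catalog_grants:
  - object: sales
    access: read            # Catalog row/column rules still apply on top
pipeline_execution: false   # reserved for Designer and above
```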
Global Configuration:
Default date ranges and fiscal calendars
System-wide query limits and timeouts
Email/Slack notification settings
Branding and customization options
Tenant Settings:
Available data sources and connectors
Storage quotas and processing limits
Custom business rules and validations
Localization and formatting defaults
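For illustration, global configuration and tenant overrides might be expressed together as below; every key here is an assumed name, not the real settings schema:

```yaml
# Illustrative settings file; all keys are assumptions
global:
  fiscal_year_start: "02-01"       # fiscal calendar default
  query_timeout_seconds: 300       # system-wide query limit
  notifications:
    email: alerts@example.com
    slack_channel: "#data-alerts"
tenants:
  acme:
    connectors: [bigquery, s3]     # available data sources
    storage_quota_gb: 500          # storage quota
    locale: en-US                  # formatting defaults
```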
Definition: Orchestrate multiple pipelines in sequence
Example job:
```yaml
name: nightly_sales_processing
steps:
  - pipeline: import_sales
  - pipeline: clean_sales_data
  - pipeline: update_inventory_snapshot
  - pipeline: send_daily_report
```
Scheduling: Automate job execution
Cron-based scheduling
Event-based triggers (file arrival, API call)
Manual override and immediate execution
Retry policies and failure notifications
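Extending the example job above, a hedged sketch of schedule, trigger, and retry configuration; the schedule, triggers, retry, and on_failure keys are illustrative assumptions:

```yaml
# Hypothetical scheduling block for the job above; keys are illustrative
name: nightly_sales_processing
schedule:
  cron: "0 2 * * *"              # run nightly at 02:00
triggers:
  - type: file_arrival           # event-based trigger
    path: s3://data/incoming/sales_*.csv
retry:
  max_attempts: 3                # retry policy
  backoff_seconds: 600
on_failure:
  notify: "#data-alerts"         # failure notification channel
```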
Monitoring:
Execution history and logs
Performance metrics and bottlenecks
Failure analysis and debugging
Alerting and notification system
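As one possible shape for the alerting system, a sketch assuming a declarative YAML schema (alerts, condition, channels are invented names):

```yaml
# Illustrative alert rules; the schema is assumed for this sketch
alerts:
  - name: job_failed
    condition: status == "failed"      # failure analysis hook
    channels: [email, slack]
  - name: slow_run
    condition: duration_minutes > 60   # flag performance bottlenecks
    channels: [slack]
```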
Dimensions: Hierarchical Smart Filters
Features:
Multi-level hierarchies (Category → Subcategory → Product)
Automatic roll-up and drill-down in reports
Cross-report/dashboard filtering
Time intelligence (Fiscal periods, ISO weeks)
Implementation:
Stored as Catalog entries with a special `dimension` type (see the sketch after this list)
Automatically available in Report and Dashboard filter panels
Maintained via pipelines or manual administration
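A minimal sketch of what a dimension Catalog entry could look like, assuming the `dimension` type mentioned above and illustrative field names (levels, source):

```yaml
# Hypothetical dimension entry in the Catalog; field names are illustrative
name: product_hierarchy
type: dimension
levels: [category, subcategory, product]  # roll-up / drill-down path
source: s3://data/dim_product.parquet     # maintained via a pipeline
```

Declaring the hierarchy once in the Catalog is what lets every report and dashboard pick it up automatically in their filter panels.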