Skip to main content

Overview

Kroo transforms complex construction API data into clean, queryable tables in your data warehouse. This page explains our data modeling methodology and sync strategies.

Data Modeling Approach

The Four-Step Process

1

Fetch Raw Data

Retrieve complete JSON payloads from your construction management APIs
2

Flatten Main Resources

Convert primary objects (RFIs, change orders, etc.) into main database tables
3

Extract Nested Data

Transform nested arrays and objects into separate sub-tables
4

Create Relationships

Link sub-tables back to main tables using defined join keys for easy querying

Real Example: RFI Data Transformation

Here’s how we process a typical RFI response from Procore: Source API Response:
{
  "rfi": {
    "id": "abc",
    "due_date": "2024-11-22",
    "status": "open",
    "logs": [
      {
        "created_by": "Jonny",
        "comment": "test"
      }
    ]
  }
}
Generated Tables:
  • Main Table: kroo_procore.rfis
  • Sub-table: kroo_procore.rfi_logs
procore_id (string)due_date (date)status (string)
“abc”2024-11-22”open”
Key Relationships:
  • procore_id in the main table comes from the API’s id field
  • rfi_id in the sub-table links back to the main RFI record
  • This structure enables easy JOINs for comprehensive reporting
Why This Approach? This methodology ensures your data warehouse follows relational database best practices while maintaining all the rich detail from your construction management systems.

Understanding Internal Metadata Columns

Important: Any column in the Kroo warehouse that starts with an underscore (_) is internal metadata added by Kroo’s data pipeline. These fields did not come directly from your source system.
Common Internal Metadata Columns:
  • _partner_id - Internal identifier used by Kroo’s sync system
  • _data_type - Internal classification used for pipeline processing
  • _fivetran_deleted - Soft delete flag (exception: this field MUST be used in WHERE clauses)
Best Practice: When building queries and views, focus on the business columns that come from your source systems. The only internal metadata column you should regularly use is _fivetran_deleted = 0 to filter out soft-deleted records.

Sync Efficiency & Performance

Smart Sync Strategies

Kroo uses several optimization techniques to ensure reliable, efficient data integration:

Rate Limiting

Client-side rate limits with back-off
  • Respects source system availability
  • Maximizes sync efficiency
  • Prevents API overload

Incremental Updates

Only sync what’s changed
  • Reduces data transfer volume
  • Minimizes warehouse load
  • Faster sync completion times

System Load Management

Our approach minimizes impact on your source systems:
  • Intelligent scheduling during low-usage periods
  • Gradual back-off when systems are under load
  • Monitoring and alerts for sync health
  • Error recovery with automatic retry logic
Result: Your construction management systems continue running smoothly while Kroo keeps your data warehouse up-to-date with the latest project information.

Data Quality Assurance

Validation & Consistency

Every sync includes:
  • Schema validation against expected data structures
  • Data type checking for consistency
  • Relationship integrity between main and sub-tables
  • Duplicate detection and handling

Monitoring & Alerts

Track data health with:
  • Pipeline sync status monitoring
  • Data freshness alerts
  • Error notifications and resolution
  • Performance metrics and trends

Next Steps

Ready to monitor your data pipelines? Check out our Pipeline Status Monitoring guide to set up real-time dashboards.