Data Ingestion Engine (DIE) / Ingestion Layer

Overview

The Ingestion Layer is the first point of interaction for users looking to add their data to our system. Whether it’s tweets from Twitter, Discord messages, emails, bank statements, or any other type of data, this layer is responsible for securely fetching, validating, and formatting the data. The system supports multiple data sources and methods of ingestion to ensure flexibility and comprehensiveness.

Key functions of the Ingestion Layer include:

  • Data Collection: Aggregating data from various external services and user uploads.

  • Data Structuring: Organizing data in a consistent, accessible format.

  • Data Verification: Validating the authenticity and integrity of the data.

  • Up-to-date Synchronization: Ensuring that data remains current with real-time updates or periodic checks.
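The four functions above can be sketched as a single pass through the layer. This is a minimal, hypothetical sketch - the names `IngestedRecord`, `ingest`, `structure`, and `verify` are illustrative, not the actual implementation:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class IngestedRecord:
    """A single record after passing through the ingestion layer."""
    source: str    # e.g. "twitter", "discord"
    payload: dict  # structured, source-agnostic representation
    verified: bool # result of the authenticity/integrity check

def structure(raw: Any) -> dict:
    # Hypothetical Data Structuring step: normalize anything into a dict.
    return raw if isinstance(raw, dict) else {"body": str(raw)}

def verify(source: str, payload: dict) -> bool:
    # Hypothetical Data Verification step: a real check would validate
    # API signatures, zkTLS proofs, etc., depending on the source.
    return bool(payload)

def ingest(source: str, raw: Any) -> IngestedRecord:
    """Run one collected item through the layer's key functions in order."""
    structured = structure(raw)      # Data Structuring
    ok = verify(source, structured)  # Data Verification
    return IngestedRecord(source=source, payload=structured, verified=ok)
```

Synchronization is then a matter of re-running `ingest` on a schedule, which is the scheduler's job (described below).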

Components

Ingestion Modules

This is the core functionality of the ingestion layer. Ingestion modules are the pieces of code that ingest and verify the authenticity of a given data source. For example, ingesting tweets requires a module that knows the Twitter API response structure. Other ingestion modules may be built on newer technologies such as zkTLS.
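Since each module wraps source-specific knowledge behind the same operations, the modules can share one interface. The sketch below is an assumption about shape, not the actual codebase; `IngestionModule` and `TwitterModule` are hypothetical names, and the Twitter module returns canned data instead of calling the real API:

```python
from abc import ABC, abstractmethod

class IngestionModule(ABC):
    """Base interface every ingestion module implements (hypothetical)."""

    @abstractmethod
    def fetch(self, credentials: dict) -> list[dict]:
        """Pull raw records from the external source."""

    @abstractmethod
    def verify(self, record: dict) -> bool:
        """Check the authenticity/shape of a single record."""

class TwitterModule(IngestionModule):
    """Knows the Twitter API response structure (illustrative only)."""

    def fetch(self, credentials: dict) -> list[dict]:
        # A real module would call the Twitter API here.
        return [{"id": "1", "text": "example tweet"}]

    def verify(self, record: dict) -> bool:
        # A tweet from the API must at least carry an id and text.
        return "id" in record and "text" in record
```

A zkTLS-based module would plug into the same interface, with `verify` checking the zkTLS proof instead of the response shape.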

Currently we support the following Ingestion Methods:

  1. API Based Ingestion

  2. zkTLS Based Ingestion

  3. OAuth Based Ingestion

  4. Raw Data Upload

More methods will be added as required.
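One way to keep the method list extensible is a simple dispatch table keyed by method name. Everything below is a hypothetical sketch - the handler functions are stubs standing in for the real ingestion paths:

```python
# Stub handlers standing in for the four supported ingestion paths.
def api_ingest(config: dict) -> str: return "api"
def zktls_ingest(config: dict) -> str: return "zktls"
def oauth_ingest(config: dict) -> str: return "oauth"
def raw_upload_ingest(config: dict) -> str: return "raw_upload"

# Adding a new ingestion method is one new entry in this table.
INGESTION_METHODS = {
    "api": api_ingest,
    "zktls": zktls_ingest,
    "oauth": oauth_ingest,
    "raw_upload": raw_upload_ingest,
}

def ingest_with(method: str, config: dict) -> str:
    """Dispatch to the handler for the requested ingestion method."""
    try:
        handler = INGESTION_METHODS[method]
    except KeyError:
        raise ValueError(f"unsupported ingestion method: {method}")
    return handler(config)
```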

Scheduler

The role of the ingestion scheduler is to:

  1. Refresh OAuth tokens when necessary.

  2. Schedule fetch jobs for recurring data sources.
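Both scheduler responsibilities can be sketched with a small polling loop: before a recurring OAuth-backed job runs, its token is refreshed. The `Scheduler` and `FetchJob` names, and the injected `refresh_token` callback, are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FetchJob:
    """A recurring fetch job tracked by the scheduler (hypothetical)."""
    name: str
    interval_s: float           # how often the source should be polled
    fetch: Callable[[], None]   # the ingestion module's fetch entry point
    needs_oauth: bool = False
    next_run: float = 0.0       # timestamp when the job is next due

class Scheduler:
    def __init__(self, refresh_token: Callable[[str], None]):
        self.jobs: list[FetchJob] = []
        self.refresh_token = refresh_token  # called before OAuth-backed jobs

    def add(self, job: FetchJob) -> None:
        self.jobs.append(job)

    def tick(self, now: float) -> None:
        """Run every job that is due; refresh OAuth tokens first if needed."""
        for job in self.jobs:
            if now >= job.next_run:
                if job.needs_oauth:
                    self.refresh_token(job.name)  # role 1: keep tokens fresh
                job.fetch()                       # role 2: recurring fetching
                job.next_run = now + job.interval_s
```

In a real deployment, `tick` would be driven by a timer or a job queue rather than called manually.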

Bridges

The role of bridges is to transform data from one format to another. For example, if a source responds with CSV but the application requires JSON, a bridge is run on the data to convert it to the required format.
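The CSV-to-JSON case from the example can be written as a small standalone bridge. This is a sketch of the idea, not the actual bridge code; a real system would register many such converters, one per format pair:

```python
import csv
import io
import json

def csv_to_json_bridge(csv_text: str) -> str:
    """Bridge: convert a CSV response into the JSON the application expects.

    Each CSV row becomes one JSON object keyed by the header row.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)
```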
