Data Ingestion Engine (DIE) / Ingestion Layer
Overview
The Ingestion Layer is the first point of interaction for users looking to add their data to our system. Whether it’s tweets from Twitter, Discord messages, emails, bank statements, or any other type of data, this layer is responsible for securely fetching, validating, and formatting the data. The system supports multiple data sources and methods of ingestion to ensure flexibility and comprehensiveness.
Key functions of the Ingestion Layer include:
Data Collection: Aggregating data from various external services and user uploads.
Data Structuring: Organizing data in a consistent, accessible format.
Data Verification: Validating the authenticity and integrity of the data.
Up-to-date Synchronization: Ensuring that data remains current with real-time updates or periodic checks.
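The four key functions above form a pipeline. As a rough sketch (the `fetch`, `verify`, and `structure` callables are hypothetical placeholders, not actual system APIs):

```python
def ingest(fetch, verify, structure):
    """Wire the key functions together: collect raw data, verify it,
    then emit it in a consistent structured format."""
    raw = fetch()                        # Data Collection
    if not verify(raw):                  # Data Verification
        raise ValueError("payload failed verification")
    return structure(raw)                # Data Structuring

# Up-to-date Synchronization: a scheduler re-invokes `ingest`
# periodically (or on real-time triggers) for recurring sources.
```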
Components
Ingestion Modules
Ingestion modules are the core of the ingestion layer. Each module is a piece of code that ingests data from a specific source and verifies its authenticity. For example, a module that ingests tweets must understand the structure of the Twitter API response. Other ingestion modules may be built on newer technologies such as zkTLS.
Currently we support the following Ingestion Methods:
API Based Ingestion
zkTLS Based Ingestion
OAuth Based Ingestion
Raw Data Upload
More methods to be added as required
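One way to picture an ingestion module is as a small contract that every method (API, zkTLS, OAuth, raw upload) implements. This is a hedged sketch, not the actual interface; the class and method names are assumptions for illustration:

```python
import json
from abc import ABC, abstractmethod
from typing import Any


class IngestionModule(ABC):
    """Hypothetical base contract an ingestion module might implement."""

    @abstractmethod
    def fetch(self, source_config: dict[str, Any]) -> bytes:
        """Securely fetch raw data from the external source."""

    @abstractmethod
    def verify(self, raw: bytes) -> bool:
        """Validate authenticity and integrity of the payload."""

    @abstractmethod
    def structure(self, raw: bytes) -> dict[str, Any]:
        """Parse the provider-specific payload into a common format."""


class TwitterApiModule(IngestionModule):
    """Illustrative API-based module that knows the Twitter response shape."""

    def fetch(self, source_config):
        # In practice this would call the Twitter API with user credentials;
        # here a canned payload stands in for the network call.
        return b'{"data": [{"id": "1", "text": "hello"}]}'

    def verify(self, raw):
        # Placeholder check. A real module would verify TLS provenance,
        # an API signature, or a zkTLS proof depending on the method.
        return raw.startswith(b"{")

    def structure(self, raw):
        payload = json.loads(raw)
        return {"source": "twitter", "items": payload.get("data", [])}
```

A zkTLS-based module would override `verify` to check the transcript proof rather than trusting the API response directly.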
Scheduler
The role of the ingestion scheduler is to:
Refresh tokens when necessary (for OAuth-based sources)
Schedule fetch jobs for recurring data sources
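Both scheduler duties can be sketched in a minimal polling loop. The names (`ScheduledSource`, `IngestionScheduler`) and the one-hour token lifetime are assumptions for illustration, not the real implementation:

```python
import time
from dataclasses import dataclass


@dataclass
class ScheduledSource:
    name: str
    interval_s: float   # how often to re-fetch this source
    token_expiry: float  # epoch seconds when the OAuth token expires
    next_run: float = 0.0


class IngestionScheduler:
    """Refresh stale OAuth tokens, then report which recurring
    fetch jobs are due."""

    def __init__(self, sources):
        self.sources = sources

    def due_jobs(self, now=None):
        now = time.time() if now is None else now
        due = []
        for src in self.sources:
            if src.token_expiry <= now:
                self.refresh_token(src, now)
            if src.next_run <= now:
                due.append(src.name)
                src.next_run = now + src.interval_s
        return due

    def refresh_token(self, src, now):
        # Hypothetical: a real implementation would call the provider's
        # OAuth token endpoint with the stored refresh token.
        src.token_expiry = now + 3600
```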
Bridges
The role of a bridge is to transform data from one format to another. For example, if a source responds with CSV but the application requires JSON, a bridge is run on the data to convert it into the required format.
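The CSV-to-JSON example above can be sketched with standard-library tools (the function name is an assumption for illustration):

```python
import csv
import io
import json


def csv_to_json_bridge(csv_text: str) -> str:
    """Bridge: transform a CSV payload into the JSON array of records
    the application expects, one object per CSV row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)
```

Each bridge is a pure transformation, so new format pairs can be added without touching the ingestion modules themselves.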