Three critical functions are required in any Big Data processing platform:
Our Methodology:
Governance: Five Processes
- Auditing – of access, by resource, etc.
- Lineage – dataset traceability from origin to usage
- Metadata – discover and search for relevant data
- Data Lifecycle – ingest, process, retire & purge
- Stewardship & Curation – ownership & cataloging
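
As an illustration of the lineage item, tracing a dataset back to its origins can be sketched as a walk over a "derived from" graph. This is a minimal sketch, not any particular platform's API: the dictionary layout, the function name `upstream_sources`, and the dataset names are all hypothetical.

```python
def upstream_sources(lineage, dataset):
    """Return every dataset that `dataset` was derived from, directly or indirectly.

    `lineage` maps a dataset name to the names of its direct inputs
    (a hypothetical representation chosen for this sketch).
    """
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for parent in lineage.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical lineage records: each dataset lists its direct inputs.
lineage = {
    "daily_report": ["sessions"],
    "sessions": ["raw_clicks", "user_profiles"],
}
origins = upstream_sources(lineage, "daily_report")
```

Here `origins` holds `sessions`, `raw_clicks` and `user_profiles`: everything an auditor would need to inspect when asking where `daily_report` came from.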
Data Management: Three Processes
- Ingestion & Transmission (import / export)
- Replication (across DCs, Hot / Warm / Cold)
- Aggregation & Retention of datasets
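
The retention item can be illustrated with a small sketch that splits dataset instances into those still inside a retention window and those due for purging. The `DatasetInstance` type, the `split_by_retention` function, and the 30-day window are assumptions made for the example, not taken from any specific tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DatasetInstance:
    """One dated instance of a dataset (hypothetical model for this sketch)."""
    name: str
    created: datetime


def split_by_retention(instances, now, retention):
    """Partition instances into (kept, expired) for a given retention window."""
    cutoff = now - retention
    kept = [i for i in instances if i.created >= cutoff]
    expired = [i for i in instances if i.created < cutoff]
    return kept, expired


now = datetime(2024, 1, 31)
instances = [
    DatasetInstance("clicks", datetime(2024, 1, 30)),  # inside the window
    DatasetInstance("clicks", datetime(2023, 12, 1)),  # older than 30 days
]
kept, expired = split_by_retention(instances, now, timedelta(days=30))
```

In practice the expired list would feed an archive or purge step rather than being deleted directly, which keeps the retention decision separate from the action taken on it.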
Process Management: Three Processes
- Cluster, Feed, Process specifications
- Late Data Handling (back off, final)
- Scheduling & Dependency Management
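
Two of the items above lend themselves to a short sketch: late data handling and dependency management. The example assumes that "back off" means re-checking for late data at exponentially growing delays and that "final" means a single re-check after a cut-off; the source does not define either term, so treat these semantics and all function names as hypothetical. Dependency ordering uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def late_data_check_delays(policy, base_minutes, max_checks):
    """Return the delays (in minutes) at which to re-check for late data.

    Assumed semantics (not defined in the source):
      "backoff" - re-check at exponentially growing delays
      "final"   - a single re-check once the cut-off has passed
    """
    if policy == "backoff":
        return [base_minutes * 2 ** i for i in range(max_checks)]
    if policy == "final":
        return [base_minutes]
    raise ValueError(f"unknown policy: {policy}")


def execution_order(dependencies):
    """Order processes so that each runs after the processes it depends on.

    `dependencies` maps a process name to the set of processes it needs first.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical pipeline: report needs aggregate, which needs ingest.
deps = {"report": {"aggregate"}, "aggregate": {"ingest"}, "ingest": set()}
order = execution_order(deps)
delays = late_data_check_delays("backoff", base_minutes=10, max_checks=3)
```

With these inputs `order` is `ingest`, `aggregate`, `report`, and `delays` is 10, 20 and 40 minutes. A cyclic dependency would make `TopologicalSorter` raise `graphlib.CycleError`, which is a reasonable point at which to reject a specification.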