Experience:
- 5-10 years of experience in data engineering, with a focus on data quality, metadata management, data lineage, and data governance.
Technical Skills:
- Strong programming skills in Python, SQL, or Scala; experience with modern data frameworks such as PySpark and dbt (Data Build Tool) is a plus.
- Expertise in ETL/ELT tools such as Informatica Intelligent Data Management Cloud (IDMC), Talend, Apache NiFi, or similar tools, including cloud-native solutions.
- Proficiency with metadata management platforms such as Informatica Enterprise Data Catalog (EDC), Collibra, or Alation, including automation of metadata ingestion, classification, and lineage mapping.
- Hands-on experience with data quality tools (e.g., Informatica Data Engineering Quality (DEQ), Collibra Data Quality & Observability, Great Expectations) and custom validation scripting.
- Strong knowledge of data governance frameworks and tools (e.g., Informatica Axon, Collibra Governance, Alation Data Governance).
- Experience with cloud data platforms and databases (e.g., Snowflake, Databricks, Oracle, SQL Server), as well as data lake/lakehouse architectures.
- Familiarity with multi-cloud and hybrid environments (e.g., AWS, Azure, Google Cloud Platform) and their native data services.
- Practical experience applying Generative AI and Large Language Models (LLMs) for:
  - Metadata automation (auto-tagging, semantic enrichment, and data catalog population).
  - Anomaly detection in pipelines and datasets using AI-driven observability tools.
  - Data quality improvement through AI-based rule generation, pattern recognition, and automated remediation suggestions.