Roles and Responsibilities
1. Data Pipeline Development: Design, build, and maintain scalable data pipelines that ingest, process, and transform large datasets from diverse sources into usable formats.
2. Data Integration and Transformation: Integrate data from multiple sources, ensuring it is accurately transformed and stored in suitable formats and storage systems (e.g., Delta Lake, Redshift, S3).
3. Performance Optimization: Optimize data processing and storage systems for cost efficiency and high performance, including managing compute resources and cluster configurations.
4. Automation and Workflow Management: Automate data workflows using tools such as Airflow, Databricks APIs, and other orchestration technologies to streamline data ingestion, processing, and reporting tasks.
5. Data Quality and Validation: Implement data quality checks, validation rules, and transformation logic to ensure the accuracy, consistency, and reliability of data.
6. Cloud Platform Management: Manage and optimize cloud infrastructure (AWS, Databricks) for data storage, processing, and compute, ensuring seamless data operations.
7. Migration and Upgrades: Lead migrations from legacy data systems to modern cloud-based platforms, ensuring smooth transitions and improved scalability.
8. Cost Optimization: Implement strategies for reducing cloud infrastructure costs, such as optimizing resource usage, setting up lifecycle policies, and automating cost alerts.
9. Data Security and Compliance: Ensure secure access to data by implementing IAM roles and policies, adhering to data security best practices, and enforcing compliance with organizational standards.
10. Collaboration and Support: Work closely with data scientists, analysts, and business teams to understand data requirements and provide support for data-related tasks.