Data Engineer
DESCRIPTION
Job summary
AWS Support is one of the largest and fastest-growing business units within AWS. We are a highly technical, innovative organization that is revolutionizing customer engagement processes and offering top-notch technical support for the AWS portfolio of products and features. We are determined to redefine the word “Support” and lead the industry with best-in-class technology.
We are looking for an excellent Data Engineer who is passionate about data and the insights that large data sets can provide. You should possess both a data engineering background and the business acumen to think strategically. You will encounter a wide range of problem-solving situations requiring extensive use of data collection and analysis. The successful candidate will work in lock-step with BI Engineers, Data Scientists, ML Scientists, Business Analysts, Product Managers and other stakeholders across the organization. You will:
- Develop and improve the current data architecture, data quality, monitoring and data availability.
- Collaborate with Data Scientists to implement advanced analytics algorithms that exploit our rich data sets for statistical analysis, prediction, clustering and machine learning.
- Partner with Business Analysts across teams to build and verify hypotheses to improve the AWS Support business.
- Help continually improve ongoing reporting and analysis processes, simplifying self-service support for customers.
- Keep up to date with advances in big data technologies and run pilots to design a data architecture that scales with the growing volume of customer experience data on AWS.
BASIC QUALIFICATIONS
- Bachelor’s or Master’s degree in Computer Science or a related technical field, or equivalent work experience.
- 4+ years of work experience with ETL, Data Modeling, and Data Architecture.
- 2+ years of work experience with Python, Scala or other scripting languages.
- Knowledge of AWS services including S3, Redshift, EMR, Kinesis and RDS.
- Experience with big data technologies (Hadoop, Hive, HBase, Pig, Spark, etc.).
- Knowledge of distributed systems as they pertain to data storage and computing.
PREFERRED QUALIFICATIONS
- Experience in ETL optimization, designing, coding, and tuning big data processes using Apache Spark or similar technologies.
- Experience with building data pipelines and applications to stream and process datasets at low latencies.
- Experience handling data: tracking data lineage, ensuring data quality, and improving discoverability of data.
- Knowledge of distributed systems and data architecture (e.g., lambda architecture): ability to design and implement batch and stream data processing pipelines, and to optimize the distribution, partitioning, and massively parallel processing (MPP) of high-level data structures.
- Experience with native AWS technologies for data and analytics such as Redshift Spectrum, Athena, S3, Lambda, Glue, EMR, Kinesis, SNS, CloudWatch, etc.