The successful candidate will be a Data Engineer with experience designing and building data analytics platforms on cloud-based infrastructure. The role focuses on building data pipelines that transform and persist data for various analytics use cases while ensuring the completeness, consistency, and security of the data. The data sets will include structured and unstructured data, ranging from time series data and metadata to free text.
Experience designing and implementing large-scale data platforms in cloud-based environments for ongoing production analytics is required.
The candidate should have experience with cloud-native data tools on AWS (Kinesis, Glue, Redshift, Athena, Lambda, EMR, RDS, Aurora) and/or Google Cloud (Pub/Sub, Dataflow, Bigtable, BigQuery, Dataproc, Spanner), as well as with open source platforms such as NiFi, Kafka, Flume, Hadoop, Spark, and Hive.
The candidate must have experience developing production-ready code to perform data transformations and basic analytics in one or more programming languages, including Python. Experience working with numerical, scientific, and machine learning libraries is desired.
Experience with text-based analytics, including basic NLP techniques (tokenization, stemming, named-entity recognition, etc.), is a plus. Experience with Lucene-based search engines is also a plus, with a preference for Elasticsearch.
The candidate should have experience persisting data in multiple forms for different types of analysis. Experience transforming and persisting data to relational, NoSQL, and graph data stores is strongly desired, as is experience working with unstructured data.
The candidate should have experience migrating traditional SQL-based workloads to take advantage of cloud-native and serverless data architectures. The candidate will need to work closely with developers and perform proof-of-concept implementations for various use cases.
The candidate should have experience working collaboratively with a team, ensuring that code review, testing, and automation are in place so that updates to the platform can be continuously integrated and deployed without impacting its resiliency or consistency.

Principal Responsibilities
- Design large-scale data analytics platforms
- Design data pipelines and persistence in a cloud environment
- Build solutions to process structured and unstructured data from multiple sources
- Work with teams to build data catalogs and track lineage
- Build solutions for OLTP and OLAP
- Manage ETL at scale with open source and cloud-native tools
- Design schemas and normalization strategies based on the queries performed
- Work with Lucene-based search engines, with a preference for Elasticsearch
- Work with data transport tools and messaging buses
- Work closely with software developers, data scientists, and quantitative analysts
- Design and implement highly available, scalable, cloud-based storage solutions
- Manage, monitor, and operate production platforms
- Hands-on oversight of development work for the data analytics platforms
- Extensive experience managing data engineering for production environments
- Experience designing, building, and automating cloud environments
- Persist data on cloud-native platforms, optimizing for performance, resiliency, and cost
- Build data pipelines, transformations, and catalogs using cloud-native and open source tools
- Experience programming against cloud platform APIs
- Experience with cloud infrastructure templating tools such as CloudFormation
- Build elastically scalable environments that leverage horizontal or vertical scaling
- Optimize costs using preemptible, spot, or reserved instances
- Experience developing collaboratively, including infrastructure as code
- Excellent written and verbal communication skills, with an ability to summarize and translate between business and technical contexts
- Excellent troubleshooting and analytical skills
- Self-starter able to execute independently, with light supervision