Data Engineer

S&P Global
Cambridge, MA, United States
Permanent, Full-time
Last application date: Jan 19, 2021
Competitive
About Panjiva

Panjiva is a data-driven technology company that uses machine learning to provide powerful search, analysis, and visualizations of billions of shipping records from nearly every country in the world. More than 3,000 customers in over 100 countries, ranging from Fortune 500 companies and startups to government agencies and hedge funds, rely on our platform for supply chain intelligence. In global trade, better insight means better decision making and stronger connections between companies and governments across the globe.

Recognizing Panjiva's cutting-edge technology, S&P Global acquired Panjiva in 2018. This acquisition has grown our resources, dramatically expanded our access to data, and accelerated our growth plans. People are Panjiva's greatest strength - join our engineering team as we revolutionize a key and fascinating part of the world economy!

Job Description

As a data engineer on our team, you will play a key role in developing our next-generation data science infrastructure and underlying core technologies. You will work with Panjiva's world-class data scientists, analysts, and engineers to create products that solve important real-world business problems in a collaborative, fast-paced, and fun environment. You'll work closely with our data science team to develop new platforms, infrastructure, and tools that will allow for advanced machine learning and artificial intelligence applications at production scale over massive (and ever-growing) datasets. Using cutting-edge distributed parallel processing systems and technologies like Hadoop and Spark, you will architect and implement systems that apply complex transformations, which might otherwise take hours or days, over vast amounts of data on the scale of seconds or minutes.
You'll design and leverage distributed computing technologies, data schemas, APIs, and event-driven architectures to construct powerful data science pipelines that bring Panjiva's machine learning and NLP algorithms to life. In addition, you'll be expected to help augment our infrastructure to seamlessly integrate orders of magnitude more data through constant R&D of the technologies and systems we use. Join us in building the next generation of products as we continue to deliver valuable and actionable insights to decision-makers in the $15 trillion global trade industry.

Responsibilities
  • Architect and implement distributed systems that perform complex transformations, processing, and analysis over very large scale datasets
  • Work with our data scientists to turn large-scale, messy, diverse, and often unstructured data into a source of meaningful insights for our customers
  • Maintain data integrity across various data sources
  • Optimize slow-running database queries and data pipelines
  • Help enhance our search engine, which runs sophisticated user queries quickly and efficiently
  • Build internal tools and backend services that enable our data scientists and product engineers to work more efficiently
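To make the first responsibility concrete: the map-then-aggregate shape of the transformations described above can be sketched in miniature with Python's standard library. This is a hedged, in-memory toy (the record format and function names are hypothetical, not Panjiva's), standing in for what Spark would shard across a cluster:

```python
from concurrent.futures import ThreadPoolExecutor

def parse_record(line):
    """Map step: turn a raw shipment line into a (country, weight_kg) pair."""
    country, weight = line.split(",")
    return country, float(weight)

def total_weight_by_country(lines, workers=4):
    """Parallel map over raw records, then a reduce (aggregation) step.

    A toy stand-in: a real engine like Spark distributes both steps across
    machines and spills to disk, but the map/reduce shape is the same.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pairs = pool.map(parse_record, lines)
    totals = {}
    for country, weight in pairs:
        totals[country] = totals.get(country, 0.0) + weight
    return totals

print(total_weight_by_country(["US,120.5", "CN,300.0", "US,79.5"]))
# {'US': 200.0, 'CN': 300.0}
```

The design point is separating the embarrassingly parallel per-record work (the map) from the stateful aggregation (the reduce); that split is what lets distributed engines scale the same logic to billions of records.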
Qualifications
  • B.S., M.S., or Ph.D. in Computer Science (or a related field) or equivalent work experience
  • 5+ years of experience working with data at scale in a production environment
  • Experience designing and implementing large-scale, distributed systems
  • Experience in multi-threaded software development (or some form of parallelism)
  • Significant performance engineering experience (e.g., profiling slow code, understanding complicated query plans, etc.)
  • Solid understanding of core algorithms and data structures, including the ability to select (and apply) the optimal ones to computationally expensive operations over data-at-scale
  • Strong understanding of relational databases and proficiency with SQL
  • Familiarity working with event-driven architectures and technologies (e.g., RabbitMQ, Kafka)
  • Deep knowledge of at least one scripting language (e.g., Python, Ruby, JavaScript)
  • Deep knowledge of at least one compiled language (e.g., Scala, C++, Java, Go)
  • Experience developing software on Linux-based operating systems
  • Experience with distributed version control systems
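The performance-engineering and SQL bullets above go together in practice. As a hedged illustration (not Panjiva's stack or schema), here is the kind of before/after query-plan check the role describes, using SQLite only because it ships with Python; the table and index names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (id INTEGER PRIMARY KEY, country TEXT, weight REAL)")
conn.executemany("INSERT INTO shipments (country, weight) VALUES (?, ?)",
                 [("US", 1.0), ("CN", 2.0), ("US", 3.0)])

query = "SELECT SUM(weight) FROM shipments WHERE country = ?"

def plan(conn, query, args=()):
    """Return the query plan as a list of human-readable step descriptions."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, args).fetchall()
    return [row[3] for row in rows]  # the 4th column holds the 'detail' text

# Before any index: the plan shows a full table scan (detail contains "SCAN").
print(plan(conn, query, ("US",)))

conn.execute("CREATE INDEX idx_country ON shipments (country)")

# After the index: the plan switches to an index search on idx_country.
print(plan(conn, query, ("US",)))
```

Reading the plan before and after a change is the core loop of the "optimizing slow-running database queries" responsibility; PostgreSQL's `EXPLAIN ANALYZE` plays the same role there.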
Nice-to-Haves
  • Working knowledge of Scala, or any JVM-based programming language
  • Experience with Hadoop or frameworks/tools in its ecosystem (especially Spark)
  • Working knowledge of one or more NoSQL database systems
  • Familiarity with relational database internals (especially PostgreSQL)
  • Proficiency with cloud computing platforms, specifically AWS
  • Working knowledge of probability & statistics
  • Contributions to open-source software
  • Startup experience
  • Experience building customer-centric products
S&P Global is an equal opportunity employer committed to making all employment decisions without regard to race/ethnicity, gender, pregnancy, gender identity or expression, color, creed, religion, national origin, age, disability, marital status (including domestic partnerships and civil unions), sexual orientation, military veteran status, unemployment status, or any other basis prohibited by federal, state or local law. Only electronic job submissions will be considered for employment. If you need an accommodation during the application process due to a disability, please send an email to: EEO.Compliance@spglobal.com and your request will be forwarded to the appropriate person.

20 - Professional (EEO-2 Job Categories-United States of America), IFTECH202.2 - Middle Professional Tier II (EEO Job Group), SWP Priority - Ratings - (Strategic Workforce Planning)

Job ID: 257141
Posted On: 2020-12-15
Location: Cambridge, Massachusetts, United States