Senior Software Engineer - Communication Channels, Infrastructure & Reliability
About Communication Channels (CC):
The Communication Channels team builds products used by the Bloomberg community for real-time communication, such as exchanging price quotes, trade ideas, news, and other financial information. Our email (MSG) and instant message (IB) products deliver more than 2 billion messages across millions of chat rooms per day. We have a broad user base unlike any other company, including asset managers, brokers, traders, financial analysts, and desks across all asset classes. Our users rely on these products' real-time performance, massive scale, ironclad security, tight integration with financial data & applications on the Bloomberg Terminal, and most importantly, their singular access to the Bloomberg network of 350,000 financial professionals.
Infrastructure and Reliability Team:
Given the criticality of our products to the daily workflow of the financial community, and the scale at which they are used, the Infrastructure and Reliability Engineering team is one of the most visible teams across Bloomberg. Our products are continuously evolving and have experienced more than 100% growth in usage over the last year, which means we have very high standards for reliability, stability, and scalability. That's where you come in.
What you'll do:
- As a member of the team, you'll be trusted to ensure that our production systems are well-monitored, healthy, automated, and designed to scale.
- You'll be building and standardizing observability tools to determine the health of MSG and IB systems for engineering as well as business partners.
- We'll depend on you to improve resiliency of our infrastructure through stress tests and chaos engineering, confirming the effectiveness of failover systems and auto-recovery.
- We'll trust you to define standards & maintain libraries for monitoring, logging, alarming, and provisioning across 90+ developers.
- You'll be involved from design to deployment, to ensure our infrastructure is reliable, performant and scalable.
- As a member of the team, you'll help build and standardize our performance and capacity planning environment, and manage the capacity of our system as we continue adding features and users.
What's in it for you:
- A critical part of our mission is fostering a culture of reliability across Engineering teams in Communication Channels. You'll be able to make a significant impact on the design choices and decisions that go into developing MSG and IB infrastructure.
- This is an opportunity to forge your own path and drive the engineering culture forward. Making our infrastructure best-in-class will be your main mission, so you'll have many opportunities to create and implement your own improvements.
- We'll send you to professional conferences and meetups so you can keep up with the technology space outside Bloomberg and apply that knowledge to building and improving our processes and products.
Our projects include:
- Building downstream and upstream caller reports to quickly identify bottlenecks and dependencies of our system using Apache Spark and distributed tracing infrastructure
- Building tools and dashboards to track the availability and uptime of our products
- Creating black-box health testing frameworks to monitor the health of IB and MSG
- Building a comprehensive performance testing framework that will be utilized by all teams in Communication Channels for stress-testing and capacity measurement of key pieces of infrastructure
- Establishing standards and building dashboards, libraries and tools for metric collection, visualization and alarming
- Establishing procedures around scalability, failover, Service Level Objectives (SLOs), cluster provisioning, deployment strategies, etc. with the goal of improving the robustness of our infrastructure
You'll need to have:
- 3+ years of professional work experience in a software engineer, infrastructure or SRE role
- Proven experience with at least one object-oriented language, with a preference for C++, Python or Java
- Demonstrated experience with design and implementation of large-scale distributed systems
- Experience with one or more of: system design, production monitoring, capacity management, deployment and rollback, provisioning, configuration and orchestration
- Strong communication skills
- BA, BS, MS, or PhD in Computer Science, Engineering or a related technology field
We'd love to see:
- Experience with big data technologies like Apache Spark, Amazon S3, Kafka
- Exposure to observability tools such as Graphite, Splunk and Humio, and to distributed tracing
- Exposure to containers and orchestration frameworks
- A track record of open-source contributions
Bloomberg is an equal opportunity employer, and we value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.