Data Engineer II
United States, Texas, Irving
7000 State Highway 161
Overview

Does pioneering new and innovative ways to reimagine and transform end-user productivity across Microsoft's global workforce sound exciting to you? Are you passionate about the future of work, driving innovation, and showcasing an employee experience blueprint that inspires customers and partners as they navigate their digital transformation? If so, the Microsoft Digital (MSD) team is an excellent place to grow your career as a Data Engineer II.

MSD's mission is to power, protect, and transform the employee experience at Microsoft around the world. Come build community, explore your passions, and do your best work as part of the team. MSD innovates, creates, and delivers the vision for Microsoft's employee experience, human resources, corporate and legal affairs, and global real estate products; runs Microsoft's internal network and infrastructure; and builds campus modernization and hybrid solutions. You will leverage the latest technologies and focus on empowering Microsoft employees with the tools and services that define both the physical and digital future of work.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities

- Follows data modeling and data handling procedures to maintain compliance with applicable laws and policies across assigned workstreams.
- Works with others to tag data based on categorization (e.g., personally identifiable information [PII], pseudo-anonymized, financial).
- Helps others document data types, classifications, and lineage to ensure traceability.
- Governs accessibility of data within assigned data pipelines and/or data models.
- Contributes to the relevant data glossary to document the origin, usage, and format of data for each program.
- Applies standard modification techniques and operations (e.g., inserting, aggregating, joining) to transform raw data into a form compatible with downstream data sources, databases, and formats.
- Uses software, query languages, and computing tools to transform raw data from assigned pipelines, under direction from others.
- Assesses data quality and completeness using queries, data wrangling, and basic statistical techniques.
- Helps others merge data into distributed systems, products, or tools for further processing.
- With guidance, implements basic code to extract raw data from identified upstream sources using common query languages or standard tools, and contributes to checks that support data accuracy, validity, and reliability across a data pipeline component.
- Participates in code reviews and provides constructive feedback to team members.
- Uses knowledge of one or more use cases to implement basic orchestration techniques that automate data extraction logic from one source to another.
- Uses basic data protocols and reduction techniques to validate the quality of extracted data across specific parts of the data pipeline, consistent with the service level agreement (SLA).
- Uses existing approaches and tools to record, track, and maintain data source control and versioning.
- Applies knowledge of data to validate that the correct data is ingested and applied accurately across multiple areas of work.
- Designs and maintains assigned data tools used to transform, manage, and access data.
- Writes efficient code to test and validate the storage and availability of data platforms, and implements sustainable design patterns that make data platforms more usable and more robust to failure and change.
- Works with others to analyze relevant data sources so that others can develop insights into data architecture designs or solution fixes.
- Supports collaboration with appropriate stakeholders, and records and documents data requirements.
- Evaluates the project plan to understand data costs, access, usage, use cases, and availability for business or customer scenarios related to a product feature.
- Works with advisement to explore the feasibility of data needs and finds alternative options if requirements cannot be met.
- Supports negotiation of agreements with partners and system owners to establish project delivery, data ownership between the parties, and the shape and cadence of data extraction for an assigned feature.
- Proposes project-relevant data metrics or measures to assess data across varied service lines.
- Contributes to the appropriate data model for the project and drafts design specification documents to model the flow and storage of data for specific parts of a data pipeline.
- Works with senior engineers and appropriate stakeholders (e.g., data science specialists) to contribute basic improvements to design specifications, data models, or data schemas, so that data is easy to connect and ingest, has clear lineage, and is responsive to work with.
- Demonstrates knowledge of the tradeoff between analytical requirements and compute/storage consumption, and begins to anticipate issues in the cadence of extracting, transforming, and loading data into multiple related data products or datasets in cloud and local environments.
- Demonstrates an understanding of the costs associated with data, which are used to assess the total cost of ownership (TCO).
- Performs root cause analysis in response to detected problems/anomalies to identify the reason for alerts, and implements basic solutions that minimize points of failure.
- Implements and monitors improvements across assigned product features to retain data quality and optimal performance (e.g., latency, cost) throughout the data lifecycle.
- Uses cost analysis to suggest solutions that reduce budgetary risks.
- Works with others to document the problem and solution through postmortem reports, and shares insights with the team or leadership.
- Provides data-based insights into the health of data products owned by the team according to service level agreements (SLAs) across assigned features.
- Follows existing documentation to implement performance monitoring protocols across a data pipeline.
- Builds basic visualizations and smart aggregations (e.g., histograms) to monitor issues with data quality and pipeline health that could threaten pipeline performance.
- Contributes to troubleshooting guides (TSGs) and operating procedures for reviewing, addressing, and/or fixing basic problems/anomalies flagged by automated testing.
- Contributes to the support and monitoring of platforms.
- Embodies our culture and values.