Data Engineer in Beijing, China

Solves analytical problems with quantitative approaches that combine analytical, mathematical and technical skills. Researches, designs, implements and validates algorithms that analyze diverse data sources to achieve project-specific outcomes, drawing on statistical and predictive modeling concepts.

Leverages data science methodology to solve business problems

Creates individual algorithms based on statistical methodologies, using statistical programming languages and tools

Partners with domain experts to verify model capabilities

Implements statistical techniques to clean, prepare and profile the data prior to deeper analysis



Abstract Reasoning - Envisions a solution before implementation by analyzing data and extracting patterns and relationships to establish the feasibility of a problem or solution; develops new algorithms and analytical models, using process diagrams, flow charts and textual documentation to conceptualize and explain complex problems.

Data Mining - Identifies relationships and patterns in data using a suite of data exploration and visualization techniques and tools such as Power BI, R Shiny and SAS JMP; extracts insights from multivariate data by applying principles of multivariate data mining, small-sample inferential statistical tests and dimension reduction techniques to understand the underlying structure of the data and support sound conclusions when building models.

Data Reduction - Performs data reduction for data mining, using variable selection techniques to maximize the signal-to-noise ratio in large datasets ahead of predictive modeling.

Predictive Modeling - Develops descriptive, explanatory or predictive statistical and machine learning models, applying appropriate variable transformations, feature selection, imputation, class rebalancing and resampling strategies, and selecting suitable performance metrics.

Statistical Foundations - Builds explanatory statistical models for regression, classification, outlier and anomaly detection, and time series forecasting, drawing on foundational statistics such as null hypothesis significance testing, regression models, generalized linear models, time series analysis, rank statistics, probability distribution fitting and survival analysis, to validate hypotheses or generate predictions for a given statistical or business question.

Programming - Creates, writes and tests computer code, test scripts and build scripts, applying algorithmic analysis and design, Cummins IT processes, standards and tools, version control, and build and test automation to meet business, technical, security, governance and compliance requirements.

Customer focus - Building strong customer relationships and delivering customer-centric solutions.

Global perspective - Taking a broad view when approaching issues, using a global lens.

Collaborates - Building partnerships and working collaboratively with others to meet shared objectives.

Communicates effectively - Developing and delivering multi-mode communications that convey a clear understanding of the unique needs of different audiences.

Self-development - Actively seeking new ways to grow and be challenged using both formal and informal development channels.

Nimble learning - Actively learning through experimentation when tackling new problems, using both successes and failures as learning fodder.

Education, Licenses, Certifications

College, university or equivalent degree in statistics, information systems or a related field required. PhD or Master’s degree in Statistics, Econometrics, Computer Science or equivalent experience preferred.


Minimal relevant work experience required

Focuses on data engineering, designing data pipelines on a data lake.

Skills needed: IT, data science, and big data tools and languages


