2022 INFORMS QSR Data Challenge Competition

Sponsored by the Ford Motor Company

The Quality, Statistics, and Reliability (QSR) Section of the Institute for Operations Research and the Management Sciences (INFORMS) announces the Data Challenge Award to recognize excellence in data modeling techniques among the submissions to the 2022 QSR Data Challenge Competition. This award program brings prestige to the QSR Section as well as to the recipients honored. 

Important Note (09/06/2022): Some of the explanatory files could cause confusion about the root cause. The explanatory files have been updated; please check the Downloading section of this page for the updated versions.

Click here to download the dataset.

Important Note (08/28/2022): If any participating team has questions regarding files with missing Phase IDs associated with a root cause, then we would appreciate it if they could notify us immediately. We will be happy to address their questions.

Problem Summary

Automatic defective sample detection, i.e., anomaly detection, is a major challenge in production lines during the manufacturing process of products. A variety of quality tests are applied at the end of the production lines to ensure the quality of products. Each part’s technical and practical performances are examined through a set of comprehensive evaluative test procedures. The early detection of quality issues in the manufacturing process can prevent potential in-vehicle failures in the future, reduce warranty cost, and improve customer satisfaction. During the End of Line (EoL) test process, an extensive set of data is collected and analyzed using sensor signals (time series) and images.

The task of this QSR Data Challenge Competition is to develop an (unsupervised) machine learning model that can successfully detect anomalous patterns in multichannel time series data recorded during a manufacturing EoL quality test of a specific vehicle component (we use the terms “product”, “part”, “component”, “system”, and “unit” interchangeably to refer to a specific part of the vehicle).
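
To make the task concrete, here is a minimal unsupervised sketch in Python (not the official sample code): each unit's multichannel recording is collapsed into a per-file feature vector and scored with an Isolation Forest. The folder layout, file names, and simple per-channel summary below are illustrative assumptions, not part of the provided data description.

```python
# Minimal unsupervised sketch (assumptions: one CSV per unit, all files share
# the same numeric signal columns, "accept" files live under train/accept/).
import glob
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

def summarize(csv_path):
    """Collapse one multichannel time-series file into a fixed-length vector."""
    df = pd.read_csv(csv_path)
    signals = df.select_dtypes(include="number")
    # Simple per-channel summaries; per-phase statistics, spectral features,
    # or reconstruction errors are natural richer alternatives.
    return np.concatenate([signals.mean().to_numpy(),
                           signals.std(ddof=0).fillna(0.0).to_numpy()])

train_files = sorted(glob.glob("train/accept/*.csv"))    # assumed layout
test_files = sorted(glob.glob("test/*.csv"))             # assumed layout

X_train = np.vstack([summarize(f) for f in train_files])
X_test = np.vstack([summarize(f) for f in test_files])

detector = IsolationForest(n_estimators=200, random_state=0).fit(X_train)

# decision_function: higher = more normal; negative scores are flagged here.
scores = detector.decision_function(X_test)
flagged = [f for f, s in zip(test_files, scores) if s < 0]
print(f"{len(flagged)} of {len(test_files)} files flagged as anomalous")
```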

Important Info

Please read this section carefully for policies.

Important Dates

  • Submission instructions: July 29, 2022
  • Final Submission Deadline: September 16, 2022
  • Finalist Announcement: September 26, 2022
  • Presentation Date: October 15, 2022 (to be held during the Second QSR Workshop)

Eligibility Requirements

  • At least one member of a finalist team must be a QSR member.
  • Team submissions (with no more than four members per team) are welcome. We especially encourage the participation of industry members on the teams.
  • At least one member of each finalist team must commit to presenting their work in person during the INFORMS Workshop on Quality, Statistics, and Reliability. In the unfortunate case that the workshop must be held virtually due to COVID, we will follow INFORMS guidelines and seek alternative presentation formats.

Finalists Selection

Three to four finalists for the Data Challenge Award will be selected to make presentations at the 2nd INFORMS Workshop on QSR prior to the 2022 INFORMS Annual Meeting. The workshop will be held on October 15, 2022, in Indianapolis, IN. The winner will be announced at the QSR Business Meeting later at the Annual Meeting.

Q&A Zoom Meeting

The organizers of the QSR Data Challenge Competition will host a one-hour Q&A Zoom session on Friday, August 26, at 4:00 PM (Eastern). The details of this Zoom session are as follows:

Topic: 2022 INFORMS QSR Data Challenge Competition FAQ Session
Time: Aug 26, 2022 04:00 PM Eastern Time (US and Canada)

Join Zoom Meeting
https://purdue-edu.zoom.us/j/99945967362?pwd=VWNZTXR5REhkaStOTGNldHlrOXNDQT09
Meeting ID: 999 4596 7362
Passcode: 475609
One tap mobile
+16465588656,,99945967362#,,,,*475609# US (New York)
+16469313860,,99945967362#,,,,*475609# US
Dial by your location
+1 646 558 8656 US (New York)
+1 646 931 3860 US
+1 301 715 8592 US (Washington DC)
+1 309 205 3325 US
+1 312 626 6799 US (Chicago)
+1 669 444 9171 US
+1 669 900 6833 US (San Jose)
+1 719 359 4580 US
+1 253 215 8782 US (Tacoma)
+1 346 248 7799 US (Houston)
+1 386 347 5053 US
+1 564 217 2000 US
Meeting ID: 999 4596 7362
Passcode: 475609
Find your local number: https://purdue-edu.zoom.us/u/aOfio7Z9N

Downloading

Please click to download the detailed call for participation and problem description.

Join the Competition!

Please fill out the survey to join the INFORMS QSR Data Challenge.

FAQ

Please contact the organizers if you cannot find answers to your questions in this section.

  • 1. Can we use any supervised or unsupervised method for root cause analysis?
  • Yes, you can use any method that you want for the root cause analysis.
  • 2. In the shared code using the autoencoder and LSTM, the models are trained on the normal data. Is this model still qualified as an unsupervised model?
  • Since we know that in this problem most of the accepted files are not anomalous, the LSTM can learn normal behaviors in this way. However, you can consider it a semi-supervised model as well.
  • 3. In the dataset, the phase number is not ordered in a time sequence. For example, phase 90 may show up earlier than phase 80. Does it matter for the final analysis?
  • Phase numbers do not necessarily occur in numerical order, and their numerical order is not important here. The order of the time series is based on the time stamps.
  • 4. Each Excel file contains many rows associated with a binary label. Should we calculate the precision, recall, and F1 scores based on each row (i.e., classify the label of each row), or should we calculate them based on each Excel file (i.e., classify each file as accept/reject)?
  • You should calculate the precision, recall, and F1 scores based on each Excel file. Each file has all the information that should be processed for an accept/reject (i.e., normal/anomaly) decision to be made; a file-level scoring sketch is given after this FAQ list. Please note that there are some relations between feature pairs and that some phases are more important; these have been mentioned in the sample code.
  • 5. In the Root Cause look-up table, some Phases or Steps associated with some rejection codes are left as empty entries. What does this mean? Is it meant to convey that all Phases and Steps are problematic?
  • The empty entries actually correspond to some aspects of the data that are not easily explainable (but would be more straightforward to discuss over the Zoom meeting that is scheduled for Friday, August 26). In short, some root causes have fine-level details (including Phase and Step) and some do not. For those cases that do not, indication of the name of the feature (and/or feature and phase, if at all applicable) would be sufficient.
  • 6. Several datasets are labeled as rejected (1) in feature 55, but they are still placed in the "accept" folder instead of the "reject" folder. For example, datasets #27, #157, and #257 have feature 55 labeled as rejected (1) at the end of the datasets, but are listed in the "accept" folder. Is this a mistake?
  • This is part of the challenge that we have with real data. From time to time, some minor samples may be labeled with different rejection criteria. The Ford team tried to minimize such effects. However, this is part of the normal behavior of the manufacturing process, and so some noisy labels may be found. We expect participants to deal with noisy label removal/correction as part of the real challenge. There are 3 noisy samples in the training set, including: Sweep_trans_157_accept.csv, Sweep_trans_257_accept.csv, and Sweep_trans_27_accept.csv. There are 2 in the test set, including: Sweep_trans_56_accept.csv, and Sweep_trans_76_accept.csv.
  • 7. When checking the corresponding feature-phase-step of the root cause codes given by the ground truth algorithm (the root cause codes appear in features 50-54 in the rejection datasets), there are many root cause codes shown in features 50-54 of the rejection datasets that cannot be found in the root_cause_look_up_table.xlsx. Alternatively, for some of the root cause codes detected by the ground truth algorithm, one cannot find the corresponding feature-phase-step instruction in the root_cause_look_up_table.xlsx. For example, root cause codes 8299, 8499, 3717, 1791, 2756, 8399, and 2672 appear in features 50-54 but not in the root_cause_look_up_table.xlsx. Does this indicate a mistake with the given data?
  • Some samples have more than one rejection code. These occur due to technical engineering rules. However, the nature of those codes is usually quite similar in terms of root cause. That being said, all samples should have at least one rejection code available in the root cause chart. That code can be used as the main root cause, and the rest of the codes can be ignored (if not available in the chart).
  • 8. It is important to recognize that the ground-truth algorithm is just sample code to demonstrate the general approach and aspects of modeling the data. The code was not meant to be a gold-standard program. The participants of this competition are expected to develop their own solutions based on the challenge description, and to recognize that the sample code is just an illustration of one particular model for the data.
  • 9. Some reject samples have an extra phase step, indicated by the 0-index at the end, whereas others do not. This is again part of the manufacturing process and the inherent behavior of the data that we have.
  • 10. The participants in this competition should only focus on the relevant phase IDs as discussed in both the code and the documents, and exclude redundant phases such as this 0-index (if it exists). The latter should only be used for informational purposes.
  • 11. Some samples/components have been rejected by certain root cause codes, but their corresponding Phases/Steps are missing in the data. Why is this the case?
  • If a team finds a root cause that does not have a step or phase ID, then the team just needs to mention the name of the signal in their submission package. The reason why the step and/or phase ID are not present is because the test stand considers the signal to be corrupted.
  • 12. We found that when the rejection code in the rejected data samples refers to Phase 400, Step 500, the rejected data samples actually do not contain any data from that Phase and Step combination (i.e., Phase 400, Step 500 does not exist in those data samples). Is this an issue with the given rejection codes? Also, when our root cause analysis will be evaluated, will the test data (especially the rejection codes) have a similar type of issue?
  • In certain situations, when a failure mode is sensed in a certain phase, the reject sample will have a shutdown step ID called "500" at the end of that phase. Although the rejection code would be associated with the expected phase ID, this specific step ID may sometimes be missed. Also, not all rejections are perfectly assigned by the ground-truth rule-based method, so if your team justifies its findings (e.g., if you find the abnormal behavior to exist in another phase), then that would be a big plus, and such cases will be evaluated for extra credit.
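
As a companion to FAQ #4 above, the following sketch illustrates file-level scoring in Python: one accept/reject decision per file, with precision, recall, and F1 computed over files. The file-name convention used to infer the true label and the plug-in classifier are assumptions made only for this illustration.

```python
# File-level evaluation sketch for FAQ #4: one 0/1 decision per file, metrics
# computed over files (1 = reject/anomaly, 0 = accept/normal). The
# "_reject"/"_accept" file-name convention is an illustrative assumption.
from sklearn.metrics import precision_score, recall_score, f1_score

def true_label(path):
    """Infer the ground-truth file label from its name."""
    return 1 if "_reject" in path.lower() else 0

def evaluate(files, predict_file):
    """predict_file(path) -> 0/1 decision for the whole file."""
    y_true = [true_label(f) for f in files]
    y_pred = [predict_file(f) for f in files]
    return (precision_score(y_true, y_pred),
            recall_score(y_true, y_pred),
            f1_score(y_true, y_pred))

# Any per-file decision rule can be plugged in, e.g. a thresholded anomaly
# score (hypothetical names below):
# precision, recall, f1 = evaluate(test_files, my_file_level_classifier)
```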

Meeting Notes from Online FAQ Session (08/26/2022)

Aspects of the Data Challenge

Because this database comes directly from a real manufacturing process, we can expect noise in the data. Accordingly, there will not necessarily be a single "optimum" model or algorithm that can yield perfect results on the data.

In addition, as the data are generated from a complicated manufacturing process, there may be data labeled as "normal" but that are actually either abnormal or close to being abnormal. Furthermore, it would be extremely valuable if one could identify the truly abnormal data from the set of data labeled as "normal" via a root cause. However, any team that claims to have done so should provide a good justification if they wish to mention it in their presentation.

Originally, there were over 1000 features/signals recorded during the tests. However, this Data Challenge involves only a subset of all the possible features. Accordingly, some aspects of the process may be missed in this Data Challenge, which is acceptable.

Within each Excel file, there exists a variable called "Phase". There are multiple values in "Phase". These can be considered as different components or segments of one test or subtest. The teams can focus on individual phases if they believe them to be valuable. Also, they can ignore uncommon or redundant phases (e.g., phases with index 0), as those arise during the normal process. For example, Phase 65 is a temporary phase that can be ignored and removed from the data. We put those phases in the data that we provided in case the teams want to look at the entire signal together.
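
A minimal sketch of this phase handling in Python, assuming the "Phase" column named above; the specific file loaded is illustrative (it is one of the file names mentioned in FAQ #6).

```python
# Sketch: drop the redundant phases mentioned above (the 0-index and the
# temporary Phase 65) and split the remaining record into per-phase segments.
# The column name "Phase" is taken from the description; the file is illustrative.
import pandas as pd

df = pd.read_csv("Sweep_trans_27_accept.csv")      # one unit's test record
df = df[~df["Phase"].isin([0, 65])]                # remove redundant phases

# Each remaining phase can be analyzed as its own segment of the test.
segments = {phase: g.reset_index(drop=True) for phase, g in df.groupby("Phase")}
print(sorted(segments))
```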

Questions and Answers

1. The "input_output_relationship.csv" file seems to capture the relationships between several features, but the intention of the CSV file is not clear, nor is it clear how the teams can effectively utilize this information.

When the test is conducted in the manufacturing line, some signals stimulate the component, and other signals capture what occurs. However, some signals have mutual interactions; for example, an output signal can act as an input for another signal. Given these interactions in the file, the teams should consider temporary changes in the patterns of the output signal, and also study the input signals relevant to that output signal. This is important to recognize because the test will not always be performed in an identical manner across time. For example, there could be a warm-up period that can have an effect. When looking at the responses of this component, the teams need to consider the behaviors of the inputs as well. That is what is being captured in this CSV file.
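
One possible way to use this file is sketched below, under the assumption that it has "Output" and "Input" columns; the actual header should be checked against the downloaded file.

```python
# Sketch: build a lookup from each output signal to the input signals that
# drive it, so the relevant inputs can be inspected alongside a suspicious
# output. The column names "Output" and "Input" are assumptions.
import pandas as pd

rel = pd.read_csv("input_output_relationship.csv")
inputs_for = rel.groupby("Output")["Input"].apply(list).to_dict()

def related_inputs(output_signal):
    """Input signals to examine together with this output signal."""
    return inputs_for.get(output_signal, [])
```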

2. Is the root cause code given in the Reject dataset based on the ground-truth algorithm?

The ground-truth label does not come from any machine learning algorithm or method. Instead, it comes from a test stand device that tests the hardware. The "ground-truth algorithm" was provided by the Ford team just so that the participating teams can better understand how they should develop a solution for this challenge.

3. Our team found a dataset that has a specific root cause code, but in actuality that sample/component did not proceed to that step/phase. Why is this the case?

There exist thousands of rejection codes; we did not provide all of them, in order to keep the Data Challenge manageable. The participating teams can have up to five rejection codes for a component if it has been rejected. At the very least, a team should be able to find one of these five rejection codes in the root cause look-up table. If one cannot do so, they should inform us so that we can examine the root cause look-up table and fix any potential issue that may exist in it. Also, the teams can proceed to use the codes that are available after noting such potential issues in their submission.

4. The root cause look-up table contains very similar root cause codes that correspond to the same features of the phase. For example, 4247 and 4248. Why is this the case?

This is not an issue. This occurs because we have different versions of this component. In some submodels, the rejection code is slightly different, although the root cause is exactly the same. We wanted to establish a comprehensive root cause table so as to preemptively address potential minor differences in the root cause codes, so that we do not have to add submodel codes to the data. It is important to note that when the submitted algorithms are evaluated, we expect the teams to provide the name of the feature, the name of the phase, and the name of the step. It is also important to note that, during the test, if there is a significant issue with the component, then the test will be stopped. The phase and step will be recorded, but the full-length signal will not be captured completely.

5. Do you have general suggestions for the teams?

We won't emphasize one method over another. Having said that, unsupervised methods and algorithms could be more helpful, so we suggest that teams consider unsupervised methods. That said, a supervised method that yields good evaluation metrics would also be acceptable. Regardless of whether a submission involves a supervised or unsupervised method, we expect the method to be able to detect the anomalous cases with a reasonable root cause.

6. Which is more important for the evaluation of a submission: a good accept/reject classification or a good root cause analysis?

It depends on the context and the nature of the submission. Having said that, it would be best for a team if they provide a great confusion matrix and a good root cause analysis.

Organizers

  • Saman Alani-Azar, Ford Motor Company
  • Karunesh Arora, Ford Motor Company
  • Mohammad Babakmehr, Ford Motor Company
  • Xiaoyu Chen, University of Louisville, xiaoyu.chen@louisville.edu
  • Parinaz Farajiparvar, Ford Motor Company
  • Andrew Henry, Ford Motor Company
  • Milad Zafar Nezhad, Ford Motor Company
  • Arman Sabbaghi, Purdue University, sabbaghi@purdue.edu