
Overkill: 5 Examples of the Best Type of Yield Loss


Overkill is when a perfectly good die fails testing on the ATE (automatic test equipment) even though it is a functioning part.  This is the best type of yield loss because the yield can be recovered almost immediately, without modifying the design or changing anything in the fab.

A perfectly good device that gets thrown away (overkill) is tragic for the following 4 reasons:

  1. Die are thrown away that could otherwise be sold (duh!)
  2. Yield loss summaries like bin data, which are assumed to be defect driven, are skewed
  3. Parts submitted to Physical Failure Analysis (PFA) will result in No Defect Found because there never was a defect in the first place
  4. Fully manufactured die are trashed even though they may contain some rare earth elements (I’m from Portland, so I have to be an environmentalist)

 

There are many types of overkill.  Let’s discuss some examples:

1) Bad ATPG Patterns

Believe it or not, ATPG is not perfect.  The scan patterns generated by tools like TetraMAX and Tessent are only as good as the constraints the designer accounted for.  In the specific case of Transition Delay Fault (TDF) patterns, the designer uses the results from Static Timing Analysis (STA) to mask certain paths from being tested on the ATE.  These are non-functional paths that were never meant to be exercised in a single clock cycle.  The two main types are false paths (paths that are never exercised functionally) and multi-cycle paths (paths that are exercised functionally but are allowed multiple cycles before capture).  What happens if these paths aren’t masked?  They result in yield loss, but only when the manufacturing process is in that particular corner.  This means the yield loss won’t be obvious; it will only happen occasionally and is therefore easily missed.  In order to eliminate these issues you must first detect the systematic failures (the first failing pattern/cycle is a good way to detect it), then run STA on the systematically failing flops at all corners, then feed any failures reported by STA back into the ATPG masking process.  Not trivial, but at least you don’t have to modify the mask or alert the foundry about the problem.
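
To make the detection step concrete, here is a minimal Python sketch of the idea, assuming a hypothetical datalog export with one row per failing die; the file name, column names, and threshold are my own placeholders, not any particular tool’s interface:

```python
import csv
from collections import Counter

# Hypothetical datalog export: one row per failing die with the first
# failing TDF pattern and the capture flop reported by diagnosis.
# Assumed columns: lot_id, wafer_id, die_x, die_y, first_fail_pattern, capture_flop
def load_first_fails(path):
    with open(path) as f:
        return list(csv.DictReader(f))

def systematic_flops(rows, min_die=5):
    """Flag flops that show up as the first failing capture point on many die.

    Random defect-driven failures scatter across flops; a flop that repeats
    across min_die or more die is a systematic suspect, e.g. an unmasked
    false path or multi-cycle path.
    """
    counts = Counter(row["capture_flop"] for row in rows)
    return [(flop, n) for flop, n in counts.most_common() if n >= min_die]

if __name__ == "__main__":
    rows = load_first_fails("tdf_first_fails.csv")  # assumed export name
    for flop, n in systematic_flops(rows):
        # These flops go to multi-corner STA; any false/multi-cycle paths
        # STA reports get fed back into the ATPG masking step.
        print(f"{flop}: first fail on {n} die -> run STA at all corners")
```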

2) Power Droop

Another issue that is becoming a hot topic is the inability to supply enough power to the chip during test.  The reason this is especially important as an overkill topic is that scan testing a chip uses a lot more power than the functional operation of the chip.  This means that a perfectly good chip could fail because it’s not getting enough power during test, but has plenty of power during functional operation.  This is a very important topic, and there are many ways that people deal with it, so let’s discuss a few.

  • Slow down the shift speed
  • Wait extra cycles between the last shift cycle and the launch cycle.  This gives the chip time to quiesce before the launch event.
  • Run some dummy cycles right before the launch in order to prevent a sudden IR drop by priming the chip with activity.  Mentor Graphics (formerly LogicVision) has something called BurstMode that attempts to simulate this effect.
  • Use low-power ATPG patterns that have fewer transitions during shift (a small pattern-power sketch follows this list).  The Synopsys capability for this is called Power-Aware Test.
  • Design-for-power features, such as clock gating, also help with the testing problem
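
To make the low-power pattern bullet concrete, here is a small Python sketch that scores each scan load by its weighted transition count, a commonly used proxy for shift power; the pattern format, the weighting convention, and the flagging threshold are illustrative assumptions, not any ATPG tool’s actual metric:

```python
def weighted_transition_count(scan_load: str) -> int:
    """Weighted transition count (WTC) for one scan chain load string.

    Each 0->1 or 1->0 transition is weighted by how far it still has to
    travel down the chain, since early transitions toggle more flops while
    shifting and therefore burn more power.
    """
    n = len(scan_load)
    return sum(
        (n - 1 - i)                      # remaining shift distance
        for i in range(n - 1)
        if scan_load[i] != scan_load[i + 1]
    )

def flag_high_power(patterns, threshold_ratio=1.5):
    """Return (index, score) for patterns whose WTC is well above average."""
    scores = [weighted_transition_count(p) for p in patterns]
    avg = sum(scores) / len(scores)
    return [(i, s) for i, s in enumerate(scores) if s > threshold_ratio * avg]

if __name__ == "__main__":
    # Toy scan loads; real patterns come from the ATPG tool's output.
    patterns = ["0000111100001111", "0101010101010101", "0000000011111111"]
    for idx, score in flag_high_power(patterns):
        print(f"pattern {idx}: WTC={score} -> candidate for low-power regeneration")
```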

3) Bad limits on parametric screens

Earlier in my career I spent an unbelievable amount of time studying parametric screens applied at wafer sort.  Parametric screens are things like IDDQ, minVDD, and process speed.  In fact, my master’s thesis, ‘Minimum Testing Requirements to Screen Temperature Dependent Defects’, is almost entirely focused on parametric testing (including temperature as a parameter).  There is a large body of evidence pointing out that parametric testing can reduce the number of test escapes and reliability escapes.  The issue with overkill comes from the fact that these screens can’t be applied with surgical precision, because there are so many sources of variation in a chip.  This means that while a good number of escapes are screened, some die are also screened out that would never have gone on to become a quality or reliability escape.  In order to reduce this type of overkill:

  • Properly characterize the screening limits on EVERY product (see the limit-setting sketch after this list)
  • Regularly review the yield loss associated with the parametric screens.  An increase in fallout for a screen indicates either elevated defect levels or elevated overkill, both of which need to be fixed.
  • Understand the types of defects that are being targeted and consider whether there is an alternative strategy to capture them.  IDDQ was historically applied in order to capture stuck-at defects that were being missed because the test coverage was low.  Nowadays, with modern ATPG and the possibility of test point insertion, it may be possible to capture these defects by driving up the actual stuck-at coverage, rather than hoping to catch defects with IDDQ.
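
As a toy illustration of limit characterization, the Python sketch below sets a robust per-wafer IDDQ limit from median + k·MAD instead of one static spec limit; the data, the choice of statistic, and the value of k are assumptions for illustration, not a production recipe:

```python
import statistics

def robust_iddq_limit(iddq_readings, k=6.0):
    """Per-wafer IDDQ screen limit from median + k * MAD.

    A robust per-wafer statistic tracks normal process shifts (which move
    the whole population) while still catching defective outliers that sit
    far above their neighbors, reducing overkill from a single fixed limit.
    """
    med = statistics.median(iddq_readings)
    mad = statistics.median(abs(x - med) for x in iddq_readings)
    return med + k * mad

def screen(iddq_readings, k=6.0):
    """Return (failing die indices with readings, limit) for one wafer."""
    limit = robust_iddq_limit(iddq_readings, k)
    return [(i, x) for i, x in enumerate(iddq_readings) if x > limit], limit

if __name__ == "__main__":
    # Toy wafer: most die around 10 uA, two suspicious outliers.
    wafer_iddq_uA = [9.8, 10.1, 10.4, 9.9, 10.2, 10.0, 48.0, 10.3, 9.7, 95.0]
    fails, limit = screen(wafer_iddq_uA)
    print(f"limit = {limit:.1f} uA, failing die: {fails}")
```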

4) Poor ATE or prober calibration

When I was working in the IC Design and Test Laboratory at Portland State University with Dr. Daasch we had to (somewhat) regularly calibrate the tester.  Occasionally things get out of whack and cause the ATE to report a fail when the part should pass.  This happens more for analog tests than it does for digital testing, but it is possible for the digital pins to go nuts too.  Typically the technicians on the test floor will have a regular calibration schedule, but occasionally the ATE becomes miscalibrated outside that schedule and impacts the yield.  Since it’s a calibration issue rather than a hard fail, it is usually a subtle problem.

5) Scrapped wafers

Anytime a wafer that isn’t zero yielding gets scrapped, we consider this overkill.  The reason such a wafer is scrapped is that the remaining yielding die generally have a much higher probability of containing test escapes or reliability escapes.  What’s needed for this type of overkill is an extremely accurate scrap dispositioning strategy.  For example, being able to track individual softbins that are more likely to result in a test or reliability escape (for example, TDF failures) is more accurate than just tracking hardbins (for example, any scan fail).  The most ideal case is to have a high-confidence Pareto of the defect types (opens, bridges, resistive contacts, etc.), along with a way to measure the likelihood that each defect type will affect the Shipped Product Quality Level (SPQL).  In order to get defect-type-level data you would need to pursue a volume diagnosis strategy or closely track and understand the inline inspection results.
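
As a crude sketch of softbin-level dispositioning, the example below weights each failing softbin by an assumed escape risk to estimate expected escapes per shipped die; the bin names, risk weights, and scrap threshold are made-up placeholders, not calibrated values:

```python
# Hypothetical escape-risk weights per softbin, e.g. derived from historical
# burn-in or field-return correlation; TDF fails are assumed riskier here.
ESCAPE_RISK = {
    "SB_TDF_FAIL": 0.05,
    "SB_STUCKAT_FAIL": 0.01,
    "SB_IDDQ_FAIL": 0.02,
    "SB_PARAMETRIC_FAIL": 0.005,
}

def wafer_escape_risk(softbin_counts, passing_die):
    """Estimated escapes per shipped die, given the wafer's failing softbins.

    softbin_counts: dict of softbin name -> number of failing die.
    A real disposition would also use spatial proximity to the fails and
    inline inspection data, not just wafer-level counts.
    """
    expected_escapes = sum(
        ESCAPE_RISK.get(bin_name, 0.0) * count
        for bin_name, count in softbin_counts.items()
    )
    return expected_escapes / max(passing_die, 1)

if __name__ == "__main__":
    counts = {"SB_TDF_FAIL": 40, "SB_STUCKAT_FAIL": 15, "SB_IDDQ_FAIL": 8}
    risk = wafer_escape_risk(counts, passing_die=400)
    print(f"estimated escapes per shipped die: {risk:.4f}")
    print("scrap" if risk > 0.002 else "ship")  # threshold is illustrative
```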

