Density and Disturbances: The NAND Block

This post is part of the “Learning Flash: For Storage Engineers” series.  Click here, if you missed the intro post.

I apologize for the entirely inexcusable delay between posts in this series.  I had every intention of getting this post completed earlier this year, but life and work have both been very busy.  Better late than never, I suppose…

In the previous post, I demonstrated how the transistor has been modified to enable non-volatile data retention.  I also showed how flash cells are programmed, read and erased.  At first glance, flash seems pretty simple and, if density and cost were of no consequence, flash would have remained this simple.

If we take another look at the basic model of the flash cell, you will notice it has 2 dedicated terminals.  This sort of architecture, where each flash cell–each bit–is individually programmable, is referred to as NOR.  The granularity of addressability makes programming NOR much easier.  The dedicated terminals that provide per-cell addressability, however, also greatly increase the space, circuitry and costs required for NOR.

NOR Cell

While NOR flash is very flexible, the required size and materials make it cost-prohibitive as a disk alternative.  To replace disk, we needed to make some compromises to get cost and density down to comparable levels.  This is where NAND comes in.

The first compromise we find in NAND architectures is the loss of per-cell programming granularity.  Instead of flash cells each having their own dedicated source and drain terminals, they are instead connected in a serial string.  This is a similar concept to a string of Christmas lights, where every light in the string participates in the circuit.  For a light in the middle of the string to receive current, all other lights in the string must also receive and pass current.  If any light does not pass current, the entire string fails.  Likewise, NAND flash is formed in long strings of transistors.  Programming and reading any one cell requires the entire string to participate.

NAND is designed this way because it greatly reduces the circuitry required for flash.  Instead of placing terminals for each cell, the terminals can be placed at the ends of strings containing 32 or 64 cells.  This results in much greater density within the flash die and significantly reduced costs.  Further density is achieved by sharing control gates across rows of strings to form a block.  I’ll go into more detail on the NAND block later in this post.

NAND String

As described in my prior posts, a read operation is a basic conductivity test.  If a flash cell is programmed, the charge in the floating gate will inhibit conductivity, resulting in a “programmed” response.  Empty cells will not interfere, resulting in conductivity.  The challenge with NAND is that a read operation on any one cell in the string requires the rest of the string to be conducting.  This is accomplished by placing higher, overdrive voltages on the control gates of other cells in the string in order to force conductivity regardless of their programmed state.

Voltage thresholds are the key to flash operations.  A cell’s voltage threshold is the minimum control gate voltage required to make its channel conduct.  An empty cell has a lower voltage threshold (slightly negative, actually) than a programmed cell, because the charge stored on the floating gate screens the control gate’s field and raises the voltage needed to turn the channel on.  Therefore, a cell can be tested (read) by using a control gate voltage that is slightly higher than the empty-cell threshold, but lower than the programmed-cell threshold.  Using this same principle, any cell can be forced to conduct, regardless of programmed state, by using a high enough voltage.

In the diagram below, the second cell in the string needs to be read.  In order to test this cell, the rest of the string must be forced into conductivity.  While 0V is applied to the control gate of the cell being read, 5V is applied to all other cells to force conductivity regardless of whether they are programmed.  Since the cell being read is not programmed, it conducts and results in a current through the string, which is sensed and interpreted as a “1”.

 

NAND Read Empty

If the same read operation is performed on a cell that has been programmed, a different result is measured.  All non-read cells are still forced to conduct with 5V.  The charged floating gate on the read cell is enough to inhibit the low voltage being applied to that control gate.  As a result, the read cell does not conduct.  Without a complete conductive circuit, the string does not pass current, which is sensed and interpreted as a “0”.

Cell Read While Programmed
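To make that read behavior concrete, here is a minimal Python sketch of a string read.  The threshold and bias voltages are rough, assumed values for illustration only, not figures from any datasheet.

```python
# Illustrative model only: threshold and bias voltages are assumed SLC-like values.

ERASED_VTH = -2.0      # an erased (empty) cell conducts even at 0V on its gate
PROGRAMMED_VTH = 2.0   # a programmed cell needs a higher gate voltage to conduct
V_READ = 0.0           # applied to the control gate of the cell being read
V_PASS = 5.0           # "overdrive" voltage applied to every other cell in the string

def read_cell(string_vths, selected):
    """Return the bit stored in string_vths[selected] for a series NAND string."""
    conducts = True
    for i, vth in enumerate(string_vths):
        gate_v = V_READ if i == selected else V_PASS
        conducts &= gate_v > vth      # a cell conducts only if its gate voltage exceeds Vth
    # Current flows through the whole string only if every cell conducts.
    # Current sensed -> selected cell is erased -> "1"; no current -> programmed -> "0".
    return 1 if conducts else 0

string = [ERASED_VTH, ERASED_VTH, PROGRAMMED_VTH, ERASED_VTH]   # 4-cell string for brevity
print(read_cell(string, 1))   # 1: selected cell is empty, so the string conducts
print(read_cell(string, 2))   # 0: selected cell is programmed and blocks the string
```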

The NAND Block: Let’s get complicated!

Up until this point, I have been modeling flash architectures from the perspective of a single string.  While it is easier to learn about the fundamentals using a basic string model, now we must zoom out a bit to look at how NAND operates within the larger structure of a NAND block.

A block, also known as an erase block, is a collection of flash cells organized into a planar grid.  In one direction, the string direction, cells are connected in series along the semiconductor between source and drain terminals located at opposite ends of each string.  This direction is also referred to as the bit line.  Intersecting the bit lines are the shared control gates, which form the second set of lines used for page addressing.  Pages are also referred to as word lines.  By sharing source and drain terminals in the string direction and sharing control gates in the word direction, the circuitry within a NAND block can be kept to a minimum.

Strings (bit lines) typically contain only 64 cells, whereas control gates can connect across thousands of cells based on the page size of the NAND (4K, 8K, 16K, etc.).  Consequently, program and read operations occur at page granularity.

NAND Block
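Extending the same idea from a single string to the block: because the word lines (control gates) are shared across every string, biasing one word line for a read senses every bit line at once, which is exactly why reads happen a page at a time.  The sketch below is illustrative only; the geometry and voltages are simplified assumptions.

```python
# Simplified sketch: a NAND block as a grid of threshold voltages,
# indexed [word_line][bit_line].  Word lines are shared across all strings,
# so a read is inherently page-wide.  Values are illustrative assumptions.

ERASED_VTH, PROGRAMMED_VTH = -2.0, 2.0
V_READ, V_PASS = 0.0, 5.0

WORD_LINES = 4          # real blocks have 64+ word lines
BIT_LINES = 8           # real pages span thousands of bit lines (4K-16K bytes)

# Start with an erased block, then "program" a few cells on word line 2.
block = [[ERASED_VTH] * BIT_LINES for _ in range(WORD_LINES)]
for bl in (0, 3, 5):
    block[2][bl] = PROGRAMMED_VTH

def read_page(block, selected_wl):
    """Sense every bit line with V_READ on one word line and V_PASS on the rest."""
    page = []
    for bl in range(len(block[0])):
        conducts = all(
            (V_READ if wl == selected_wl else V_PASS) > block[wl][bl]
            for wl in range(len(block))
        )
        page.append(1 if conducts else 0)   # current -> erased -> 1, no current -> programmed -> 0
    return page

print(read_page(block, 2))   # [0, 1, 1, 0, 1, 0, 1, 1] -- the entire page comes back at once
```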

Page sizes typically increase over time as flash dies become more dense.  A 64GB drive may use a page size of 4K, whereas a 512GB drive may use a page size of 8K.  Contrast this to HDD sector granularity, which has long been 512 bytes.  This continual shift in page size as new drives are released creates foundational architecture problems for storage array manufacturers that didn’t exist with HDDs.  I’ll discuss these in more detail in a later post.
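As a quick illustration of that mismatch, here is the arithmetic for how many 512-byte host sectors land in a single flash page as page sizes grow (using the example sizes from the paragraph above):

```python
SECTOR = 512                       # bytes, the long-standing HDD granularity
for page_kib in (4, 8, 16):        # example NAND page sizes
    page_bytes = page_kib * 1024
    print(f"{page_kib}K page holds {page_bytes // SECTOR} host sectors")
# 4K page holds 8 host sectors
# 8K page holds 16 host sectors
# 16K page holds 32 host sectors
```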

In order to help orient you to the grid diagram vs. the string diagram, below is the same read operation on a programmed cell that was diagrammed previously in the string view.

NAND Block Read Programmed

The tight orientation of NAND makes it more susceptible to disturbances during read and program operations.  This is a fundamental challenge of NAND; the more tightly we pack the components for density, the greater the challenge in ensuring reliability.

In the above diagram, notice how the control gate voltages are applied across all strings, not just the selected string being read/tested.  Since all strings share the same control gates, this can’t be avoided.  Over time, the voltage pulses shared with non-selected strings cause electrons to slowly accumulate on non-programmed cells in those strings.  After hundreds of thousands of reads, weak charge build-up can cause un-programmed cells to hold some charge and begin to interfere with read accuracy.  To ensure this read disturbance never becomes a problem, reads per page must be counted and monitored.  Any blocks with page reads over safe limits should have their contents written to a new block so the old block can be erased.  Read disturbance is not permanent.  Erasing a block clears the cells with read disturbance charge build-up and resets the read counter to zero.
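Here is a hedged sketch of the bookkeeping this implies for a flash controller: count reads per block, and once a block crosses a safe limit, relocate its contents and erase it, which clears both the disturb charge and the counter.  The limit value, class names and structure are invented for illustration; real firmware is far more sophisticated.

```python
# Illustrative read-disturb tracking for a flash controller.
# READ_LIMIT is an assumed threshold, not a vendor-specified value.

READ_LIMIT = 100_000

class Block:
    def __init__(self, block_id):
        self.block_id = block_id
        self.read_count = 0
        self.data = {}          # page_number -> page contents

    def read(self, page):
        self.read_count += 1
        return self.data.get(page)

class Controller:
    def __init__(self, num_blocks):
        self.blocks = [Block(i) for i in range(num_blocks)]
        self.free = list(range(1, num_blocks))   # block 0 starts in use

    def read_page(self, block_id, page):
        block = self.blocks[block_id]
        value = block.read(page)
        if block.read_count >= READ_LIMIT:
            self.relocate_and_erase(block)       # prevent disturb errors proactively
        return value

    def relocate_and_erase(self, block):
        target = self.blocks[self.free.pop(0)]
        target.data = dict(block.data)           # copy still-valid pages elsewhere
        block.data = {}                          # erasing resets the disturbed cells...
        block.read_count = 0                     # ...and the read counter
        self.free.append(block.block_id)
```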

Cell programming is also significantly more complex within a NAND block.  High voltages are required for NAND cell programming, which create high voltage differences throughout the tightly organized NAND block.  Even more so than read disturbance, high voltage differences during programming can cause electrons to accumulate on un-programmed, un-selected cells, which later shows up as a weak charge.  Just like read disturbance, program disturbance must be carefully accounted for during program operations.

NAND Block Cell Program

To reduce disturbance, the voltage differences within the NAND block have to be partially balanced during program operations.  Today this is most often done with a concept called self-boosted inhibit.  Self-boosting involves applying a moderate voltage to the unselected control gates throughout the NAND block while also electrically isolating the unselected bit lines.  This creates a capacitive coupling that raises the channel voltage of the unselected strings and reduces the likelihood of program disturbance in those cells.
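To make the bias scheme a little more concrete, below is a small sketch of the voltages a controller might apply during a single program pulse under a self-boost inhibit scheme.  The specific voltage values are representative assumptions, not numbers from any real part.

```python
# Representative (assumed) bias voltages during a program pulse with self-boosted inhibit.
V_PGM  = 20.0   # high programming voltage on the selected word line
V_PASS = 10.0   # moderate voltage on unselected word lines (also helps boost inhibited channels)
V_CC   = 3.3    # applied to inhibited bit lines so their strings float and self-boost
V_GND  = 0.0    # selected bit line stays grounded so its cell sees the full field

def program_bias(selected_wl, selected_bl, word_lines, bit_lines):
    """Return the word-line and bit-line voltages for one program pulse."""
    wl_bias = {wl: (V_PGM if wl == selected_wl else V_PASS) for wl in range(word_lines)}
    bl_bias = {bl: (V_GND if bl == selected_bl else V_CC) for bl in range(bit_lines)}
    return wl_bias, bl_bias

wl, bl = program_bias(selected_wl=2, selected_bl=1, word_lines=4, bit_lines=4)
print(wl)   # {0: 10.0, 1: 10.0, 2: 20.0, 3: 10.0}
print(bl)   # {0: 3.3, 1: 0.0, 2: 3.3, 3: 3.3}
```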

Any remaining read or program disturbances are identified and compensated for through ECC (Error Correcting Code).
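To give a feel for what “compensated for through ECC” means, here is a toy single-error-correcting Hamming(7,4) code.  Real NAND controllers use far stronger BCH or LDPC codes over entire pages; this is only a minimal sketch of the idea that a flipped bit can be detected and flipped back.

```python
# Toy single-error correction, in the spirit of what NAND controllers do
# (real controllers use far stronger BCH/LDPC codes over whole pages).
# Hamming(7,4): encodes 4 data bits into 7 and corrects any single flipped bit.

def hamming74_encode(d):            # d: [d1, d2, d3, d4]
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4               # parity over codeword positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4               # parity over codeword positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4               # parity over codeword positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]   # codeword positions 1..7

def hamming74_correct(c):
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # checks positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # checks positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3 # equals the 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1        # flip it back
    return [c[2], c[4], c[5], c[6]] # recover d1, d2, d3, d4

codeword = hamming74_encode([1, 0, 1, 1])
codeword[4] ^= 1                    # a "disturbed" cell flips one bit
assert hamming74_correct(codeword) == [1, 0, 1, 1]
```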

Erase operations on a NAND block are relatively simple.  The semiconductor base (also known as the well) that supports the channel for each cell is actually shared across all cells.  As indicated before, the semiconductor base is where electrons migrate from during programming.  It is also where electrons migrate to during erase operations.  To erase the block, all control gates for the selected block are held at 0V while a strong positive voltage is applied to the entire well.  This creates a voltage differential that pulls the electrons from the floating gates and pushes them back into the well.

NAND Block Cell Erase

Erases are made simple because they target the entire block, not individual cells or pages.  This simplicity comes at a cost, however.  Unlike disk, the contents of a flash cell cannot be modified without first erasing that cell.  Since erases are only performed at a NAND block granularity (8+MB in size), this creates more challenges that storage controllers must compensate for when overwriting flash media.  I’ll discuss this more in a later post.
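Here is a hedged sketch of what that means for a controller: logical overwrites get redirected to fresh pages, stale copies are merely marked invalid, and a block is erased only after any still-valid pages have been copied out.  The structure, names and sizes are invented for illustration and ignore garbage collection, wear leveling and everything else a real FTL does.

```python
# Minimal, illustrative flash translation layer: overwrite-by-remap plus block erase.
# Sizes and structures are invented for illustration, not taken from any real FTL.

PAGES_PER_BLOCK = 4

class TinyFTL:
    def __init__(self, num_blocks):
        self.flash = [[None] * PAGES_PER_BLOCK for _ in range(num_blocks)]  # physical pages
        self.valid = [[False] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.map = {}                 # logical page -> (block, page)
        self.cursor = (0, 0)          # next free (erased) physical page

    def write(self, logical, data):
        if logical in self.map:       # overwrite: just invalidate the old copy
            b, p = self.map[logical]
            self.valid[b][p] = False
        b, p = self.cursor
        self.flash[b][p] = data       # program a fresh, erased page
        self.valid[b][p] = True
        self.map[logical] = (b, p)
        self.cursor = (b, p + 1) if p + 1 < PAGES_PER_BLOCK else (b + 1, 0)

    def erase_block(self, b):
        # A real FTL would make sure relocated pages land in a *different* block.
        assert self.cursor[0] != b, "would relocate into the block being erased"
        for p in range(PAGES_PER_BLOCK):
            if self.valid[b][p]:      # copy still-valid pages out before erasing
                logical = next(l for l, loc in self.map.items() if loc == (b, p))
                self.write(logical, self.flash[b][p])
        self.flash[b] = [None] * PAGES_PER_BLOCK
        self.valid[b] = [False] * PAGES_PER_BLOCK

ftl = TinyFTL(num_blocks=2)
ftl.write(0, "v1")
ftl.write(0, "v2")        # overwrite lands in a new page; the old one is just stale
print(ftl.map[0])         # (0, 1)
```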

Moar Density!

Density improvements haven’t stopped at NAND alone.  The vast majority of flash used in storage arrays today is Multi-Level-Cell (MLC) Flash.  The physical structure is nearly identical to traditional Single-Level-Cell (SLC) Flash, but MLC media stores more than one bit per cell by varying the actual amount of charge within the cell.  Whereas SLC has 2 charge states (charged and not-charged) representing 1 bit, MLC has 4 charge states (charged, mostly-charged, mostly not-charged and not-charged) representing 2 bits.  MLC provides double the density but requires much more sophisticated programming logic and is far more sensitive to electrical variations than SLC.

MLC programming requires the cell to be programmed and tested in multiple, incremental steps before achieving the desired charge level.  Read operations require multiple passes on the cell as well in order to accurately determine the cell’s charge.  This is why MLC media is slower than SLC for program and read operations.  It is also why it wears significantly faster than SLC.

Triple-Level-Cell (TLC) Flash provides triple the density of SLC by using 8 charge states (voltage levels) to store 3 bits per cell.  TLC requires even more interaction with the flash cells during programs and reads, which makes TLC the slowest of the NAND media.  It also wears significantly faster than MLC, making it suitable primarily for workloads with minimal overwrites, such as archive.
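A small sketch of why more bits per cell means more work per read: the sensed threshold voltage is compared against a ladder of reference voltages, and the resulting level is decoded into bits.  The reference values are invented placeholders, and the level-to-bits mapping ignores the Gray coding real devices use.

```python
# Illustrative decoding of a sensed threshold voltage into bits.
# The reference-voltage ladders are invented placeholders; real devices also use
# Gray coding and per-page mappings that this sketch ignores.

from bisect import bisect_left

REFS = {
    "SLC": [0.0],                                      # 2 levels -> 1 bit
    "MLC": [-1.0, 1.0, 3.0],                           # 4 levels -> 2 bits
    "TLC": [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0],      # 8 levels -> 3 bits
}

def decode(vth, cell_type):
    refs = REFS[cell_type]
    levels = len(refs) + 1
    bits = levels.bit_length() - 1                     # 1, 2 or 3 bits per cell
    level = bisect_left(refs, vth)                     # each comparison stands in for a read pass
    value = levels - 1 - level                         # lowest charge (erased) decodes to all 1s
    return format(value, f"0{bits}b")

print(decode(-2.5, "SLC"), decode(-2.5, "MLC"))                    # 1 11   (erased cell)
print(decode(2.2, "SLC"), decode(2.2, "MLC"), decode(2.2, "TLC"))  # 0 01 010
```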

The most significant breakthrough in recent NAND technology is 3-dimensional (or vertical) NAND.  Whereas NAND has historically been organized in a 2-dimensional, planar design, 3D NAND stacks flash cells vertically as well as horizontally.  It is the equivalent of replacing motels with skyscrapers.  This new dimension in NAND scaling allows individual flash cells to actually become larger while still increasing the density of the NAND overall.  Larger cells mean components are less tightly packed, which translates directly to greater reliability.  The 3-dimensional design also gives control gates greater surface area around the floating gates, which makes the flow of electrons into and out of the cells even faster.  The result is a denser, faster and more reliable NAND.  This technology is just now starting to appear in enterprise arrays and I’m excited to see how it further transforms the storage industry.

Creating a high-performance, high-density, reliable and low-cost storage medium requires a lot of inventive engineering and is not without some compromises.  I hope this series helps underscore the amount of engineering effort involved in bringing this revolutionary technology to market at a price and size where everyone can benefit.

Comments and questions are always welcome!

 

Floating Gates and Electron Tunnels: Flash Retains Data

This post is part of the “Learning Flash: For Storage Engineers” series.  Click here, if you missed the intro post.

Finally, my friends, we are getting to the good stuff.  In this post, I’ll be discussing how the greatest limitation of the transistor (data-loss after power-loss) was overcome by introducing some additional components into the transistor.  I remind you that modern day flash cells are tens of nanometers in size and the mechanics by which they store and retrieve data occur at the sub-atomic level.  Very, very impressive when compared to how mechanical storage works.  Let’s jump right in!

The first thing you should be aware of is that there are a few more components added to the transistor to enable non-volatility in NAND.  The cell structure of Single-Level-Cell (SLC) and Multi-Level-Cell (MLC) flash is identical, but I’ll only be describing SLC behavior below to keep things simple at first.  I’ll go into MLC behavior in a later post.

  • Floating Gate: Stores electrons while the cell is programmed.  Stored electrons then generate a counter electric field that partially screens the electric field produced by the control gate.
  • Tunnel Oxide: This is a weaker insulator that, given a strong enough voltage differential, can allow electrons to “tunnel” through.  It acts as a valve, allowing electrons to pass into and out of the floating gate only when desired.
  • Blocking Oxide: This isn’t necessarily new–it is a similar insulator to the one used in the basic transistor.  It is a strong insulator that prevents electrons from passing all the way through the floating gate and continuing on into the control gate.

Flash Cell

 

*Many current-generation flash cells actually have evolved into what is known as a Charge-Trap Flash design.  Charge-Trap Flash replaces the conductive floating gate with a thin charge-trapping, high-k dielectric layer that stores electrons in place of the floating gate.  This approach allows greater density and reliability in flash cell designs.  Further advancements currently underway include moving flash from a planar architecture into a 3-dimensional design.  I will go into more detail on this later.

The first operation I want to show you is actually an un-programmed (empty) cell.  I’m starting with an empty cell because it functions just like the transistor in the prior post.  The floating gate’s job is to provide interference between the control gate and the semiconductor when charged.  In an un-charged cell, there is no interference, so the control gate voltage causes the semiconductor to conduct just like a basic transistor.  This would represent a 1 in binary.

Cell Read While Empty

 

So how do we get electrons into the floating gate?  For NAND, this is done through a method known as Fowler-Nordheim Tunneling.  It is complicated, but as a storage engineer, you don’t need to know how all the sausage gets made.  Therefore, here is a simple explanation.

During a program operation, a high voltage is applied to the control gate.  This creates a high voltage differential and strong electric field.  The voltage is sufficiently high to pull electrons from the semiconductor substrate, through the tunnel oxide (weak insulator), and into the floating gate.  The blocking oxide, located between the floating gate and the control gate, prevents those electrons from passing straight through the floating gate and draining out the control gate.

Programming

 

When the program operation completes, and the voltage on the control gate is dropped, the electrons don’t have enough energy to pass back through the tunnel oxide.  They are trapped in the floating gate until a future erase operation.  Most importantly, while they are trapped within the floating gate, they generate their own electric field.  This field will later interfere with future conductivity tests to indicate the cell is programmed.

Retaining Charge

 

When programmed, the cell responds differently to read requests.  Like before, a voltage differential is applied between the source and drain and a small voltage is applied on the control gate.  If the cell were empty, the electric field generated by the control gate would be enough to cause the semiconductor to conduct.  However, the presence of electrons in the floating gate creates a counter-field (or screen), which partially blocks the electric field from the control gate.  The screened electric field is no longer sufficient to establish conductivity and the cell returns 0, indicating it is programmed.

Cell Read While Programmed

 

Do keep in mind that the floating gate screen is not absolute.  In transistors and NAND there is a concept of a voltage threshold, which is the control gate voltage required to cause conductivity.  By charging the floating gate, we increase the voltage threshold so the cell won’t conduct when a low voltage is applied.  But even with a programmed cell, we could increase the control gate voltage to a point where it would overpower the effects of the floating gate and cause the channel to conduct regardless.
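Here is a minimal model of that threshold behavior: programming raises the cell’s threshold, erasing lowers it back, and a read simply checks whether a chosen control gate voltage is enough to make the channel conduct.  All voltage values are illustrative assumptions.

```python
# Minimal single-cell model of the threshold-voltage behavior described above.
# All voltages are illustrative assumptions, not real device parameters.

class FlashCell:
    ERASED_VTH = -2.0       # conducts even with 0V on the control gate
    PROGRAMMED_VTH = 2.0    # trapped charge raises the threshold

    def __init__(self):
        self.vth = self.ERASED_VTH

    def program(self):      # tunneling pushes charge into the floating gate
        self.vth = self.PROGRAMMED_VTH

    def erase(self):        # reverse bias pulls the charge back out
        self.vth = self.ERASED_VTH

    def conducts(self, gate_voltage):
        return gate_voltage > self.vth

    def read(self, read_voltage=0.0):
        return 1 if self.conducts(read_voltage) else 0

cell = FlashCell()
print(cell.read())          # 1: empty cell conducts at the low read voltage
cell.program()
print(cell.read())          # 0: stored charge blocks conduction at the read voltage
print(cell.conducts(5.0))   # True: a high enough voltage overpowers the stored charge
cell.erase()
print(cell.read())          # 1 again
```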

MLC:  The amount of charge in a cell will also affect the threshold voltage required to allow conductivity.  Multi-Level-Cell (MLC) Flash, which allows 2 bits per cell instead of 1 bit, uses this principle.  MLC cells can contain different levels of charge within the floating gate.  The amount of charge can be tested by stepping up the control voltage until conductivity occurs.  This threshold voltage is then decoded into a binary result.  More on this later…
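A short sketch of that stepped read: raise the control gate voltage in small increments until the cell conducts, then decode the step at which it happened into two bits.  The step size, voltage range and level boundaries are all assumptions for illustration.

```python
# Illustrative MLC read by incremental voltage stepping; all values are assumptions.
LEVEL_BOUNDARIES = [-1.0, 1.0, 3.0]                    # separate the four charge states
LEVEL_TO_BITS = {0: "11", 1: "10", 2: "01", 3: "00"}   # least charge -> "11", most -> "00"

def step_read(cell_vth, v_start=-3.0, v_stop=5.0, v_step=0.25):
    """Step the control gate voltage up until the cell conducts, then decode its level."""
    v = v_start
    while v <= v_stop:
        if v > cell_vth:                               # the channel conducts at this step
            level = sum(boundary < v for boundary in LEVEL_BOUNDARIES)
            return LEVEL_TO_BITS[level]
        v += v_step
    return LEVEL_TO_BITS[len(LEVEL_BOUNDARIES)]        # never conducted: highest charge state

for vth in (-2.0, 0.0, 2.0, 4.0):                      # one example cell per charge state
    print(vth, "->", step_read(vth))
# -2.0 -> 11, 0.0 -> 10, 2.0 -> 01, 4.0 -> 00
```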

Finally, the cell can be erased by reversing the process used to program it.  The control gate voltage is held at ground while a strong voltage is applied to the semiconductor substrate (also known as the bulk terminal).  The voltage is sufficiently high to cause the electrons to tunnel back through the tunnel oxide and drain into the substrate, leaving the floating gate empty.

Erasing

 

This post describes the basics of how flash cells are programmed, read and erased.  However, these functions are not as simple as they appear.  The story of NAND is one of three conflicting goals: density, cost and reliability.  In order to provide the low-cost, highly dense and highly reliable NAND we put in our datacenters today, a lot of problems had to be worked out.

In the next post, I’ll discuss the many challenges of flash storage and how working through those challenges exposes new ones.  Thanks for reading!  Comments and questions are always welcome.

Magnets vs Electrons: The Foundation of Flash

This post is part of the “Learning Flash: For Storage Engineers” series.  Click here, if you missed the intro post.

If we are going to really understand flash, we need to start at the basics.  Back to the bits.  The storage and transmission of digital information is done through the use of bits, the smallest units of information.  A bit represents either a 1 or a 0.  Combinations of 1s and 0s form binary code, which makes up all the data in the digital universe.
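For example, the letter “A” is stored as the eight bits 01000001, which is easy to verify:

```python
print(format(ord("A"), "08b"))   # 01000001 -- the eight bits encoding the letter A
```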

The storage and transmission of 1s and 0s is done through a variety of methods, but the two most common in the datacenter today are magnetic and electric.

Magnetic field storage uses sub-micrometer magnetic regions on a material to store information.  Grains within these regions are magnetized according to the binary data being stored.  Magnetic recording is non-volatile, which means it can retain information even without power.  Tape drives, hard drives and credit cards are all forms of magnetic storage.

Electric field (Solid-State) storage uses transistors measuring a few tens of nanometers to store information.  Transistors act like digital switches.  They use an electric field to manipulate a semiconductor.  The presence and strength of the electric field affects the conductivity of the semiconductor, which is then reported as binary code.  Solid-state data storage has historically been volatile, which means the state of the transistors, and the associated information, cannot be retained without power.  CPUs, RAM and network equipment are common implementations of solid-state technology in the datacenter.  Modern day flash is a form of solid-state storage that is actually non-volatile.  I’ll discuss this in much more detail throughout this series.

Magnetic and solid-state storage each have benefits and drawbacks.  Because solid-state storage is built on silicon circuits, it doesn’t require any moving parts to access information.  This makes access times incredibly fast.  However, the density and intricacy of the embedded circuitry has historically made it a very expensive option when large amounts of data storage are required.

Magnetic storage, on the other hand, uses mechanical components to access specific regions of the medium for programming and reading.  Since circuitry isn’t required within the magnetic medium, data storage can be both very dense and inexpensive.  However, the mechanical parts cause access to be thousands of times slower than solid-state, which makes magnetic storage a poor choice for higher-performance needs.

The above qualities of both types of data storage are what have shaped modern datacenter hardware.  CPUs, RAM and network devices all require the fastest, lowest latency data processing.  They also only need to store in-flight data or small working sets.  Combined, this makes them best suited for solid-state (transistor-based) hardware.  Storage systems, however, store TBs and PBs of information.  They need that information to be retained, even through power failures.  While low latencies are preferred, the economics have driven traditional arrays to use magnetic storage as a low-cost compromise.

Flash Changes Everything

Flash promises the best of both worlds: data access speeds that are finally in line with the rest of the silicon-based infrastructure, the non-volatility required for long-term storage, and densities that will soon surpass even the largest hard disks.  The economics are already better for T1 and in line with tiered storage, especially for more I/O intensive environments.  End-to-end silicon infrastructures are finally possible.  Mechanical datacenters are living dinosaurs.

How Does Flash Work?

Not so fast!  First, we have to discuss the foundational technology of flash.  Flash memory technology is based on transistors, just like DRAM.  There are some significant differences, but let’s start with a basic MOSFET transistor commonly used in DRAM, and then I’ll describe how it has been modified for flash in the next post.  Storage is binary, so the goal of a transistor in RAM is to represent binary code.  This nanoscopic, digital switch is turned on and off to represent 1s and 0s.

There are 4 main parts to a transistor:

  • Semiconductor: A silicon substrate that is doped with impurities.  Silicon, like glass, acts as a natural insulator.  However, the impurities embedded during the doping process give it conductive properties.  Through careful doping, the silicon can be made to conduct only under certain conditions–such as the application of an electric field.
  • Source and Drain: Terminals at opposite ends of the semiconductor used to carry current to and away from the semiconductor.  These terminals are responsible for creating the voltage differential needed to test conductivity of the semiconductor.
  • Control Gate: A third terminal situated on-top of the semiconductor.  This terminal is responsible for generating the electric field which causes the semiconductor to conduct.
  • Insulator: A layer of non-conductive material that insulates the control gate from the semiconductor.  This prevents electrons from leaking into the control gate while the control gate has a voltage applied.  The gate should only generate a field, not be a conductive pathway.

Transistor Components

In a resting state, the semiconductor behaves like an insulator.  The electrons in the semiconductor are too dispersed to support conduction.  If a small voltage differential were applied between the source and drain terminals, current would not pass through the semiconductor.  This would represent a 0 in binary.  The digital switch is off.

Transistor Off

When a voltage is applied to the control gate, an electric field is generated that attracts electrons toward the gate.  Electrons then concentrate near the top of the semiconductor–a region known as the Channel.  The concentration of electrons is enough to create a conductive pathway between the source and drain terminals.  The Channel carries current through the transistor, which then represents a 1 in binary.  The digital switch is on.

Transistor ON
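That switch behavior can be captured in a couple of lines of Python; the threshold value here is an arbitrary illustrative number.

```python
# A transistor modeled as a voltage-controlled switch (threshold value is illustrative).
V_THRESHOLD = 0.7    # gate voltage needed to form the conductive channel

def transistor_bit(gate_voltage):
    """1 if the channel conducts (switch on), 0 if it does not (switch off)."""
    return 1 if gate_voltage > V_THRESHOLD else 0

print(transistor_bit(0.0))   # 0: no field, no channel, switch off
print(transistor_bit(1.2))   # 1: field forms the channel, switch on
```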

I want to stop right here for a minute to reinforce how friggin awesome this is.  Modern day transistors are tens of nanometers in size.  A strand of hair is about 100,000 nanometers in diameter and an atom of gold is about 0.33 nanometers.  A single modern processor can have over 4 billion transistors.  This should give you some perspective on what a feat of engineering modern computing really is.

Nano-scale transistor technology is manufactured in multi-billion dollar semiconductor plants through a process called nano-lithography.  These plants use nano-focused patterns of light to manipulate chemicals on the surface of silicon wafers.  The chemical patterns control how further chemical baths engrave the wafer and allow for the transistor components to be constructed.  Since a single particle of dust can span and destroy thousands of transistors during the manufacturing process, specialized clean rooms are used to keep the air as clean as possible.

Transistors themselves have one major flaw when it comes to storing information.  They require constant power.  The control gates on transistors must be powered in order to retain the 1 or 0 state they represent.  If power were cut to a DRAM bank, for example, all of the transistors would reset to a 0 state–effectively erasing the contents.  This is where flash memory comes in.

In the next post, I’ll discuss how flash memory overcomes the volatility of DRAM with a modified transistor design.  I hope you are enjoying the series; comments or corrections are always welcome!

Next Post: Floating Gates and Electron Tunnels: Flash Retains Data

Latency vs IOPs: Why Flash Matters

This post is part of the “Learning Flash: For Storage Engineers” series.  Click here, if you missed the intro post.

For us storage engineers, more has almost always been the answer to storage performance problems.  For any given performance complaint, it is often solved by adding more cache, more ports, more controllers and, especially, more . . . → Read More: Latency vs IOPs: Why Flash Matters

Learning Flash: For Storage Engineers

If you are a storage engineer and have a pulse, then you cannot deny we are in the midst of the single most defining era in data storage history.  Never before have we witnessed a quantum leap of such magnitude in storage performance, efficiency and density.  The Flash Revolution is NOW!

The great thing . . . → Read More: Learning Flash: For Storage Engineers