J.J. Woo & J.W. Kim - Delta FTL

From OpenSSDWiki

Project Information

  • Project title: DFTL (Demand-based FTL)
  • Team members: Jungjae Woo, Jinwon Kim


Goals

  • The goal of DFTL is to reduce DRAM usage by storing mapping information in nand, and to improve performance with an SRAM cache.


Motivation

  • A page-mapping FTL holds far more mapping information than a block-mapping FTL, so we have to optimize DRAM memory usage.


Design and Implementation

Overall architecture

  • Step 1 : DFTL stores mapping information in a reserved nand area; the pages holding it are called translation pages.
  • Step 2 : DFTL uses a CMT (Cached Mapping Table) to improve performance.
  • Step 3 : DFTL uses a GTD (Global Translation Directory) to record the address of each translation page.

D FTL STEP123.PNG

  • For details, see the Policies section below.

D FTL Overall.PNG

Main data structures

In DRAM

D FTL DRAM.PNG

  • GC BUF : stores the lpn list of the victim block during GC
  • VCOUNT : valid page count of each vblock, used in GC
  • TRANS BUF : holds a translation page while servicing a cache miss or evicting a CMT entry

In SRAM

D FTL SRAM.PNG

  • In CMT
    • lpn : Cached logical page number
    • d : Dirty bit. Indicates whether vpn has changed since it was cached from the translation page.
    • vpn : Physical page number of the cached lpn
    • sc : Second-chance bit
  • In GTD
    • ppn : Physical page number of each translation page
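As a rough illustration, the SRAM structures above might be declared as below. This is our sketch, not the project's actual code: all names are ours, and the GTD length follows from the 8 GB user space and 32 KB pages listed under Test Environment.

```c
#include <assert.h>
#include <stdint.h>

#define CMT_SIZE          512   /* cached mappings; 256/512/1024 in the evaluation */
#define BYTES_PER_PAGE    (32 * 1024)
#define MAPPINGS_PER_PAGE (BYTES_PER_PAGE / 4)   /* 8192 four-byte vpn entries per page */
#define NUM_TRANS_PAGES   ((8ull << 30) / BYTES_PER_PAGE / MAPPINGS_PER_PAGE) /* 32 */

typedef struct {
    uint32_t lpn;   /* cached logical page number */
    uint32_t vpn;   /* physical page number; MSB doubles as the dirty bit */
    uint8_t  sc;    /* second-chance bit for eviction */
} cmt_entry_t;

typedef struct {
    cmt_entry_t cmt[CMT_SIZE];          /* Cached Mapping Table */
    uint32_t    gtd[NUM_TRANS_PAGES];   /* ppn of each translation page */
} dftl_sram_t;
```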


Functions

  • ftl_write & ftl_read are outlined below:

D FTL readwrite.PNG

  • Address translation is one of the most important parts of DFTL. For example, get_vpn is shown below:

D FTL getvpn.PNG

  • A victim entry, selected by the second-chance algorithm, is evicted from the CMT. This is shown below:

D FTL evict.PNG
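The translation and eviction logic above can be sketched as a toy model. This is illustrative only, not the project's code: a RAM array stands in for the nand translation pages, the GTD indirection and the dirty-victim write-back are elided, and all names are ours.

```c
#include <assert.h>
#include <stdint.h>

#define CMT_SIZE 4
#define NUM_LPN  64

static uint32_t nand_map[NUM_LPN];   /* lpn -> vpn, standing in for translation pages */

typedef struct { uint32_t lpn, vpn; uint8_t valid, sc; } cmt_entry_t;
static cmt_entry_t cmt[CMT_SIZE];
static int hand;                     /* second-chance clock hand */
static int cmt_hits, cmt_misses;     /* mirror the CMT hit counters in Test outputs */

static int cmt_lookup(uint32_t lpn) {
    for (int i = 0; i < CMT_SIZE; i++)
        if (cmt[i].valid && cmt[i].lpn == lpn)
            return i;
    return -1;
}

/* second-chance victim selection: skip (and clear) entries whose sc bit is set */
static int pick_victim(void) {
    while (cmt[hand].valid && cmt[hand].sc) {
        cmt[hand].sc = 0;
        hand = (hand + 1) % CMT_SIZE;
    }
    int v = hand;
    hand = (hand + 1) % CMT_SIZE;
    return v;
}

uint32_t get_vpn(uint32_t lpn) {
    int i = cmt_lookup(lpn);
    if (i >= 0) {                    /* cache hit: refresh second chance */
        cmt_hits++;
        cmt[i].sc = 1;
        return cmt[i].vpn;
    }
    cmt_misses++;                    /* cache miss: fetch the mapping from "nand" */
    uint32_t vpn = nand_map[lpn];
    i = pick_victim();               /* a dirty victim would be written back first */
    cmt[i] = (cmt_entry_t){ lpn, vpn, 1, 1 };
    return vpn;
}
```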


Policies

In write

  • Reduced original-data load
    • A partial-page write needs a nand_page_read of the old page to fill the left and right holes.
    • A full-page write does not need to read the old page.
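A minimal sketch of this decision, assuming 64 sectors of 512 B per 32 KB page (the helper name is ours, not the project's):

```c
#include <assert.h>
#include <stdbool.h>

#define SECTORS_PER_PAGE 64   /* 32 KB page / 512 B sector */

/* A write that covers the whole page needs no read of the old data; a
   partial write must read the old page to fill the left and right holes. */
static bool need_old_page_read(unsigned offset, unsigned num_sectors) {
    bool left_hole  = offset != 0;                               /* sectors before the write */
    bool right_hole = offset + num_sectors != SECTORS_PER_PAGE;  /* sectors after it */
    return left_hole || right_hole;
}
```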


In CMT cache

  • Dirty bit
    • The dirty bit marks a cached vpn that has been modified relative to the translation mappings in nand.
    • On a write, the old vpn is replaced with the new vpn and the dirty bit is set.
    • On eviction, we check the victim's dirty bit: if it is set, we update the GTD and translation page and then evict; otherwise we only evict. This reduces the number of nand page writes.
    • We store the dirty bit in the MSB of vpn: the maximum vpn needs only 21 bits (64 GB / 32 KB = 2 M pages), so the MSB of the 32-bit vpn field is free.
  • Translation page maximum update
    • When a dirty CMT entry is evicted, DFTL loads the corresponding translation page from nand, updates the mapping, and writes the page back.
    • To reduce nand page writes for translation pages, each eviction performs a maximal update of the loaded translation page:
    • every CMT entry that belongs to the loaded translation page and has its dirty bit set is folded into that page, and its dirty bit is cleared.
  • Victim selection
    • The original DFTL paper uses segmented LRU (SLRU); we use the second-chance algorithm instead.
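The MSB dirty-bit trick and the maximum (batched) translation-page update might look like the following toy model. It is illustrative only: a RAM array replaces the nand translation page, the page index doubles as the lpn for brevity, and all names are ours.

```c
#include <assert.h>
#include <stdint.h>

#define DIRTY_BIT 0x80000000u   /* MSB of the 32-bit vpn field */
#define VPN_MASK  0x7fffffffu   /* the 21 needed bits fit comfortably here */
#define CMT_SIZE  4
#define MAPPINGS_PER_PAGE 8

static inline int      is_dirty(uint32_t v)    { return (v & DIRTY_BIT) != 0; }
static inline uint32_t set_dirty(uint32_t v)   { return v | DIRTY_BIT; }
static inline uint32_t clear_dirty(uint32_t v) { return v & VPN_MASK; }

typedef struct { uint32_t lpn, vpn; uint8_t valid; } cmt_entry_t;
static cmt_entry_t cmt[CMT_SIZE];
static uint32_t map_page[32];  /* RAM stand-in for translation pages, indexed by lpn */
static int nand_programs;      /* counts translation-page writes */

/* Maximum update: when a dirty victim forces a translation-page write,
   fold in every other dirty entry that maps to the same page, so a
   single nand program covers all of them. */
void flush_translation_page(uint32_t victim_lpn) {
    uint32_t tpage = victim_lpn / MAPPINGS_PER_PAGE;
    for (int i = 0; i < CMT_SIZE; i++)
        if (cmt[i].valid && is_dirty(cmt[i].vpn) &&
            cmt[i].lpn / MAPPINGS_PER_PAGE == tpage) {
            map_page[cmt[i].lpn] = clear_dirty(cmt[i].vpn);
            cmt[i].vpn = clear_dirty(cmt[i].vpn);   /* entry is clean now */
        }
    nand_programs++;  /* one translation-page program for the whole batch */
}
```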


In nand

  • Dividing the nand area
    • The nand space is divided into a page-mapping-table area (blocks 2 and 3) and a data (user) area for the rest, which keeps nand mapping management simple.
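A minimal helper reflecting this static split (names are ours, not the project's):

```c
#include <assert.h>

/* vblocks 2 and 3 hold translation (page-mapping) pages;
   every other vblock belongs to the data (user) area. */
#define MAP_VBLK_FIRST 2
#define MAP_VBLK_LAST  3

static inline int is_map_vblk(unsigned int vblk) {
    return vblk >= MAP_VBLK_FIRST && vblk <= MAP_VBLK_LAST;
}
```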


Not implemented

  • Power Off Recovery
  • Run-time Badblock Management
  • Lazy copying & batch updates - our attempts to fix its errors kept introducing new bugs, so it was left out.


Evaluation

Workloads

  • trace program - Web32G_NTFS Trace
  • Ubuntu install & upgrade
  • Kernel version 3.9.5 compile


Evaluation methodology

  • Test with ftl_test
    • All tests pass
  • Web32G_NTFS trace
    • Handles all requests normally (*but there is no way to verify the read/write results)
  • Install Ubuntu 12.04 i386
    • Boots on the Jasmine board; installing upgrades, web browsing, etc. all work
  • Kernel version 3.9.5 compile
    • Compiles successfully with no errors


Test Environment

  • jasmine.h options
    • Assertion and UART options turned on
    • User Space 8GB, Spare Space 384MB
      • define VBLK_PER_BANK (256 + 12) -> DFTL has trouble remapping a bad block across the vblock boundary because it does not use a block-mapping scheme.
      • define SPARE_VBLKS_PER_BANK (12)
      • define NUM_LSECTORS (16777216)
  • Test computer S/W options(ubuntu)
    • Release 10.04(lucid)
    • Kernel : Linux 2.6.32-29-generic
  • Test computer H/W options
    • Memory : 992.0 MiB
    • Processor 0 : Intel(R) Pentium(R) Dual CPU E2180 @ 2.00 GHz
    • Processor 1 : Intel(R) Pentium(R) Dual CPU E2180 @ 2.00 GHz
  • Gcc Version
    • 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
  • Connection
    • SATA 2.0


Test outputs

  • Function Calls
    • ftl_read, ftl_write - ftl_read, ftl_write
    • Nand Page Read/Programs - data_read(include partial read for partial page program), data_program
    • get/set vpn match with lpn - get_vpn, set_vpn
    • CMT Read/Write Hit - cache hit on read, cache hit on write
    • Translation table Read/Programs From Nand - mapping_read, mapping_program
    • Garbage Collections - garbage_collection
    • Nand Block Erase - erase


Results

  • Benchmark on Ubuntu 10.04 i386

Dftl-benchmark.png


  • Tracing Web32G
    • Elapsed time
  • CMT : 256

D FTL web32g 256.PNG

  • CMT : 512

D FTL web32g 512.PNG

  • CMT : 1024

D FTL web32g 1024.PNG


  • Counts of functions

D FTL NTFS1.PNG

D FTL NTFS2.PNG

(Garbage_collection and translation block erase counts are too small to show in the first image, so they appear in the second image.)

  • Write cache hit ratio :
    • 256 : 156008 / 278500 = 56.0%
    • 512 : 168303 / 278592 = 60.4%
    • 1024 : 179861 / 278538 = 64.6%
  • Read cache hit ratio :
    • 256 : 23796 / 108231 = 22.0%
    • 512 : 26896 / 108349 = 24.8%
    • 1024 : 24080 / 104767 = 23.0%
  • Program amplify ratio :
    • 256 : (264530 + 13684) / 264530 = 105.2%
    • 512 : (264622 + 7368) / 264622 = 102.8%
    • 1024 : (264568 + 3860) / 264568 = 101.5%


  • Install Ubuntu 12.04 i386 (from install through reboot; no GC occurred)

D FTL Ubuntu1.PNG

D FTL Ubuntu2.PNG

  • The result for CMT size 1024 is omitted because it was abnormal


  • Elapsed time
    • 256 : 8m 2s
    • 512 : 7m 35s
  • Write cache hit ratio :
    • 256 : 15633 / 116143 = 13.5%
    • 512 : 15018 / 115160 = 13.0%
  • Read cache hit ratio :
    • 256 : 16275 / 32965 = 49.4%
    • 512 : 17735 / 34116 = 52.0%
  • Program amplify ratio :
    • 256 : (116143 + 4893) / 116143 = 104.2%
    • 512 : (115160 + 3207) / 115160 = 102.8%
  • Linux kernel 3.9.5 compile(CMT size : 512)

D FTL Kernel1.PNG

D FTL Kernel2.PNG

  • Elapsed time : 1h 12m
  • Write cache hit ratio :
    • 43409 / 219099 = 19.8%
  • Read cache hit ratio :
    • 71517 / 196741 = 36.4%
  • Program amplify ratio :
    • (219099 + 6972) / 219099 = 103.2%


  • IO Meter Result

DFTL Max reponse time.JPG


Conclusion

  • As the CMT size increases:
    • The cache hit ratio increases.
    • Translation page reads/writes decrease.
    • Elapsed time improves.


References

  • A. Gupta, Y. Kim, and B. Urgaonkar, "DFTL: A Flash Translation Layer Employing Demand-based Selective Caching of Page-level Address Mappings," Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp. 229-240, 2009.



