Nvidia has lifted the veil on its custom 64-bit ARM processor core. Code-named "Denver," the project was first disclosed in January 2011; it uses a microcode architecture and features a new execution optimizer.
The dual-core processor, which Nvidia plans to launch this year, is an upgrade to the Tegra K1 and targets tablets. The current 32-bit Tegra K1 targets Android products and is used in Acer's Chromebook, Google's Project Tango tablet, Xiaomi's MyPad, and Nvidia's own Shield tablet.
Nvidia claims the 64-bit Tegra K1 will give mobile devices PC-class performance for gaming, business applications, and content creation. According to the company, benchmark data show Denver performing nearly on par with an Intel Haswell processor and beating Apple's A7-series processors by 10 to 25%.

The data Nvidia showed compare Denver's performance with x86 processors and 32-bit ARM processors
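However, Nvidia did not provide any performance comparison between Denver and ARM's standard 64-bit A57 core. Targeting servers and networking equipment, AMD recently began sampling processors that use the A57 core, and Applied Micro has also started sampling chips built on its custom 64-bit ARM core.
Without benchmark data against standard and custom 64-bit ARM processors, it is not yet clear whether Denver will help Nvidia improve its position in mobile devices, where it still trails market leader Qualcomm by a wide margin.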

Denver processor core architecture
Denver can execute up to seven instructions per clock and runs at up to 2.5 GHz, with a 128+64 KB L1 cache and a 2 MB, 16-way set-associative L2 cache. Its most novel feature is an optimized execution capability that replaces a full out-of-order design; it handles register renaming, loop unrolling, breaking false code dependencies, and removing unused computations.
The optimizer chains related routines and uses 128 MB of main memory that is securely partitioned before the operating system boots. "We see a 2x speed-up or better with optimized routines," said Darrell Boggs, chief architect on the project, at the recent Hot Chips conference in the United States.
Denver marks the end of Nvidia's use of a companion core, an approach the company pioneered in its early 32-bit ARM processors, while ARM continues to pursue designs that mix 32- and 64-bit cores. Other Denver features include reusing memory pipelines for integer traffic and a pre-fetch function that compensates for cache misses.
This article is an authorized translation from EE Times. All rights reserved; reproduction is prohibited.
Translated by: Judith Cheng
English original: Nvidia Flexes Custom 64-Bit ARM, by Rick Merritt
Nvidia Flexes Custom 64-Bit ARM
Rick Merritt
CUPERTINO, Calif. — Nvidia has opened the hood on its custom 64-bit ARM core first announced in January 2011. "Denver" is an ARM processor that uses microcode to enable a novel execution optimizer.
Two cores will ship this year in an SoC that is an upgrade to Nvidia's Tegra K1, targeting tablets. The existing 32-bit chip targets Android and is used in an Acer Chromebook, Google's Project Tango tablet, Xiaomi's MyPad, and Nvidia's own Shield tablet.
Nvidia claims the 64-bit Tegra K1 will sport PC-class performance in mobile systems for gaming, business apps, and content creation. In benchmarks Nvidia showed, Denver was nearly on par with an Intel Haswell processor and beat an Apple A7-series SoC by 10 to 25%.
Nvidia only showed benchmarks against the x86 and 32-bit ARM SoCs.
The company did not give any comparisons with a standard A57 64-bit core from ARM. Targeting servers and networking gear, AMD just started to sample SoCs using the A57, and Applied Micro has started sampling its custom 64-bit ARM.
Until benchmarks against standard and custom 64-bit ARM SoCs emerge, it's not clear whether Denver will help Nvidia improve its position in mobile systems, where it significantly trails leader Qualcomm.
Denver can execute as many as seven instructions per clock, running up to a 2.5 GHz rate. It packs a 128+64 kbyte L1 cache and 2 Mbyte 16-way set associative L2 cache.
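As a rough illustration of what those figures imply, the sketch below computes the theoretical peak instruction rate per core and the number of sets in the L2 cache. The issue width, clock rate, cache size, and associativity come from the article; the 64-byte cache line size is an assumption made only for this example, since no line size was given, and the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the article.
ISSUE_WIDTH = 7                  # peak instructions per clock (from the article)
MAX_CLOCK_HZ = 2.5e9             # peak clock rate in Hz (from the article)
L2_SIZE_BYTES = 2 * 1024 ** 2    # 2 Mbyte L2 cache (from the article)
L2_WAYS = 16                     # 16-way set associative (from the article)
ASSUMED_LINE_BYTES = 64          # ASSUMPTION: line size is not stated in the article


def peak_instructions_per_second(width, clock_hz):
    """Theoretical upper bound: every issue slot filled on every clock."""
    return width * clock_hz


def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    return size_bytes // (ways * line_bytes)


if __name__ == "__main__":
    peak = peak_instructions_per_second(ISSUE_WIDTH, MAX_CLOCK_HZ)
    sets = cache_sets(L2_SIZE_BYTES, L2_WAYS, ASSUMED_LINE_BYTES)
    print(f"Peak rate per core: {peak / 1e9:.1f} billion instructions/s")
    print(f"L2 sets (assuming {ASSUMED_LINE_BYTES}-byte lines): {sets}")
```

The peak rate is an upper bound only; sustained throughput depends on the workload and on how often all seven issue slots can be filled.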
The most novel aspect of Denver is an optimized execution feature used as an alternative to a full out-of-order design. It handles a variety of optimizations such as renaming registers, unrolling loops, breaking false code dependencies, and removing unused computations.
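To make the listed optimizations concrete, here is a minimal source-level sketch of three of them: removing an unused computation, renaming a reused temporary to break a false dependency, and unrolling a loop by two. It is not Nvidia's implementation; Denver applies such transformations to ARM instruction streams, and the Python functions below (hypothetical names `baseline` and `optimized`) only mirror the idea.

```python
def baseline(values):
    """Straightforward loop with an unused computation and a reused temporary."""
    total = 0
    for v in values:
        unused = v * 31   # never read afterwards: a candidate for removal
        t = v + 1         # 't' holds one value...
        total += t
        t = v * 2         # ...then is reused for an unrelated value (false dependency)
        total += t
    return total


def optimized(values):
    """Same result: unused work removed, temporaries renamed, loop unrolled by 2."""
    total = 0
    n = len(values)
    i = 0
    while i + 1 < n:
        a, b = values[i], values[i + 1]
        # Distinct names mean the four operations no longer share a temporary,
        # so none of them has to wait for another to finish with it.
        t0, t1 = a + 1, a * 2
        t2, t3 = b + 1, b * 2
        total += t0 + t1 + t2 + t3
        i += 2
    if i < n:             # leftover element when the length is odd
        a = values[i]
        total += (a + 1) + (a * 2)
    return total


if __name__ == "__main__":
    data = list(range(11))
    # Both versions must agree; the transformation changes form, not meaning.
    print(baseline(data) == optimized(data))
```

The key property is that the optimized form computes exactly the same result as the original, which is what allows such rewrites to be applied transparently.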
The optimizer chains related routines and uses 128 Mbytes of main memory, securely partitioned before an operating system boots. "We see a 2x speed-up or better with optimized routines," said Darrell Boggs, chief architect on the project, speaking in a talk at the annual Hot Chips conference here.
The new core marks the end of Nvidia's use of a companion core, something it pioneered with its early 32-bit ARM SoCs. ARM continues to pursue the approach with mixed 32- and 64-bit cores.
Among other techniques, Denver can reuse memory pipelines for integer traffic, and it has a pre-fetch to compensate for cache misses.
Denver is a microcoded seven-wide superscalar 64-bit ARM.
Editor: Quentin