Principal Investigators:
Kai Li is a professor in the Computer Science Department at Princeton University. He pioneered Distributed Shared Memory, which enables shared-memory programming on clusters of computers and won the ACM SIGOPS Hall of Fame Award, and he proposed user-level DMA, which evolved into RDMA in the InfiniBand standard. He led the PARSEC project, which became the de facto benchmark suite for multicore processors. He recently co-led the ImageNet project, which propelled the advancement of deep learning methods. He co-founded Data Domain, Inc. (now an EMC division) and led the innovation of deduplication storage system products that displaced the tape automation market. He is an ACM Fellow, an IEEE Fellow, and a member of the National Academy of Engineering.
Sebastian Seung is a professor at the Princeton Neuroscience Institute and the Department of Computer Science. Over the past decade, he has helped pioneer the new field of connectomics, developing new computational technologies for mapping the connections between neurons. His lab created EyeWire.org, a site that has recruited 200,000 players from 150 countries to play a game that maps neural connections. His book Connectome: How the Brain's Wiring Makes Us Who We Are was chosen by the Wall Street Journal as one of the Top Ten Nonfiction books of 2012. Before joining the Princeton faculty in 2014, Seung studied at Harvard University, worked at Bell Laboratories, and taught at the Massachusetts Institute of Technology.
Description:
Over the past few years, convolutional neural networks (rebranded as “deep learning”) have become the leading approach to big data. To perform well, deep learning requires large amounts of training data and substantial computing power for both training and classification. Most deep learning implementations use GPUs rather than general-purpose CPUs because the conventional wisdom is that a GPU is an order of magnitude faster than a CPU for deep learning at a similar cost. As a result, the machine learning community as well as vendors have invested considerable effort in developing deep learning packages for GPUs.
Intel® Xeon Phi™ coprocessors, based on the Many Integrated Core (MIC) architecture, offer an alternative to GPUs for deep learning: their peak floating-point performance and cost are on par with a GPU's, and they offer several advantages, such as ease of programming, binary compatibility with the host processor, and direct access to large host memory. However, it is still challenging to take full advantage of the hardware's capabilities. Doing so requires running many threads in parallel (e.g., 240+ threads on 60+ cores), executing 16 single-precision floating-point operations in parallel per vector instruction (with AVX-512), and keeping each thread's working set small (roughly 128 KB of L2 cache per thread).
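To make these requirements concrete, the following is a minimal C++/OpenMP sketch of the parallelism pattern such a chip rewards: many threads, each working on a tile small enough to stay in its share of L2, with an inner loop the compiler can map onto 16-wide AVX-512 operations. The scale_add routine and the TILE constant are illustrative assumptions for this sketch, not part of the proposed package.

// Sketch: many-thread, tiled, vectorizable loop (compile with -fopenmp).
#include <algorithm>
#include <omp.h>
#include <vector>

// TILE is a hypothetical per-thread working-set size; two float buffers
// of 8192 elements (~64 KB) fit comfortably within ~128 KB of L2 per thread.
constexpr int TILE = 8192;

void scale_add(std::vector<float>& y, const std::vector<float>& x, float a) {
    const int n = static_cast<int>(y.size());
    // Outer loop: distribute tiles across the 240+ hardware threads.
    #pragma omp parallel for schedule(static)
    for (int base = 0; base < n; base += TILE) {
        const int end = std::min(base + TILE, n);
        // Inner loop: the compiler can vectorize this into AVX-512
        // instructions that process 16 floats at a time.
        #pragma omp simd
        for (int i = base; i < end; ++i)
            y[i] += a * x[i];
    }
}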
This center will develop an efficient deep learning package for the Intel® Xeon Phi™ coprocessor. The project builds on work from Sebastian Seung's lab on ZNN, a deep learning package (https://github.com/seung-lab/znn-release) based on two key concepts, both of which leverage the advantages of CPUs. (1) FFT-based convolution becomes more efficient when FFTs are cached and reused. This trades memory for speed and is therefore well suited to the larger working memory of CPUs. (2) Task parallelism on CPUs can make more efficient use of computing resources than SIMD parallelism on GPUs. Our preliminary results with ZNN are encouraging: we have shown that, for certain network architectures, CPUs can be competitive with GPUs in deep learning speed. Furthermore, an initial port to the Intel® Xeon Phi™ coprocessor (Knights Corner) was completed quickly, supporting the idea that CPU implementations are likely to incur relatively low development cost.
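To illustrate the first concept, here is a hedged sketch using the FFTW3 library (this is not ZNN's actual code): the kernel's FFT is computed once and cached, so every subsequent convolution with that kernel pays only one forward transform, a pointwise multiply, and one inverse transform. For brevity the sketch computes a circular convolution; a real implementation would zero-pad to avoid wraparound.

// Sketch: FFT caching for repeated 2D convolutions (link with -lfftw3f).
#include <fftw3.h>
#include <complex>
#include <vector>

struct CachedKernel {
    int n0, n1;                          // spatial size of the transform
    std::vector<std::complex<float>> F;  // cached FFT of the kernel
};

// Pay the kernel's FFT cost once, up front.
CachedKernel cache_kernel(const float* kernel, int n0, int n1) {
    CachedKernel ck{n0, n1,
        std::vector<std::complex<float>>(n0 * (n1 / 2 + 1))};
    std::vector<float> tmp(kernel, kernel + n0 * n1);
    fftwf_plan p = fftwf_plan_dft_r2c_2d(
        n0, n1, tmp.data(),
        reinterpret_cast<fftwf_complex*>(ck.F.data()), FFTW_ESTIMATE);
    fftwf_execute(p);
    fftwf_destroy_plan(p);
    return ck;
}

// Convolve an n0 x n1 image in place, reusing the cached kernel FFT.
void convolve(const CachedKernel& ck, float* image) {
    const int nc = ck.n0 * (ck.n1 / 2 + 1);
    std::vector<std::complex<float>> F(nc);
    fftwf_plan fwd = fftwf_plan_dft_r2c_2d(
        ck.n0, ck.n1, image,
        reinterpret_cast<fftwf_complex*>(F.data()), FFTW_ESTIMATE);
    fftwf_execute(fwd);
    fftwf_destroy_plan(fwd);
    for (int i = 0; i < nc; ++i)
        F[i] *= ck.F[i];                 // convolution = pointwise product
    fftwf_plan inv = fftwf_plan_dft_c2r_2d(
        ck.n0, ck.n1,
        reinterpret_cast<fftwf_complex*>(F.data()), image, FFTW_ESTIMATE);
    fftwf_execute(inv);
    fftwf_destroy_plan(inv);
    const float scale = 1.0f / (float(ck.n0) * ck.n1); // FFTW is unnormalized
    for (int i = 0; i < ck.n0 * ck.n1; ++i)
        image[i] *= scale;
}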
The proposed optimizations for the future Intel® Xeon Phi™ processor family include trading memory space for computation (caching and reusing the FFTs of convolution kernels), intelligently choosing between direct and FFT-based convolution for each layer of the network, choosing the right flavor of task parallelism, intelligent tiling to optimize L2 cache performance, and careful data structure layouts to maximize the utilization of the AVX-512 vector units. We will carefully evaluate the deep learning package on the 2D ImageNet dataset, a 3D electron microscopy image dataset, and a 4D fMRI dataset. We plan to release the software package and datasets into the public domain.
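As an illustration of the per-layer choice between direct and FFT-based convolution, the sketch below uses a simple arithmetic-cost model; the model and its constants are illustrative assumptions, and a real implementation would calibrate or autotune them against measurements.

// Sketch: heuristic per-layer selection of convolution method.
#include <cmath>

enum class ConvMethod { Direct, FFT };

// n: layer width/height (assumed square), k: kernel width/height.
ConvMethod choose_method(int n, int k) {
    // Direct 2D convolution: ~2*k^2 FLOPs per output pixel.
    double direct = 2.0 * n * n * double(k) * k;
    // FFT-based (kernel FFT cached): one forward and one inverse FFT
    // (~5*N*log2(N) FLOPs each for N = n^2) plus a pointwise multiply.
    double N = double(n) * n;
    double fft = 2.0 * 5.0 * N * std::log2(N) + 6.0 * N;
    return fft < direct ? ConvMethod::FFT : ConvMethod::Direct;
}

Under this model, for example, a 3x3 kernel on a 256x256 layer favors direct convolution, while an 11x11 kernel on the same layer favors the FFT path.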