Convolutional Network Demo from 1989
Yann LeCun, 2014-06-02 | This is a demo of "LeNet 1", the first convolutional network that could recognize handwritten digits with good speed and accuracy.
It was developed in early 1989 in the Adaptive System Research Department, headed by Larry Jackel, at Bell Labs in Holmdel, NJ.
This "real time" demo ran on a DSP card sitting in a 486 PC with a video camera and frame grabber card. The DSP card had an AT&T DSP32C chip, which was the first 32-bit floating-point DSP and could reach an amazing 12.5 million multiply-accumulate operations per second.
The network was trained using the SN environment (a Lisp-based neural net simulator, the predecessor of Lush, itself a kind of ancestor to Torch7, itself the ancestor of PyTorch). We wrote a kind of "compiler" in SN that produced a self-contained piece of C code that could run the network. The network weights were array literals inside the C source code.
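The generated C code carried the trained weights along with it, baked in as array literals. As an illustrative sketch of that weight-embedding step (the function name and formatting here are invented; this is not the actual SN compiler output), a dumper might look like:

```python
# Hypothetical sketch: render a trained weight vector as a C array
# literal, the way the SN "compiler" embedded weights directly in the
# generated C source. Names and formatting are invented for illustration.

def emit_c_weights(name, weights):
    """Render a list of floats as a self-contained C array literal."""
    body = ", ".join(f"{w:.6f}f" for w in weights)
    return f"static const float {name}[{len(weights)}] = {{{body}}};"

print(emit_c_weights("layer1_w", [0.125, -0.5, 0.03125]))
# -> static const float layer1_w[3] = {0.125000f, -0.500000f, 0.031250f};
```

Emitting one such literal per layer, plus a plain-C forward pass, yields a network runner with no external dependencies, which is what made it easy to drop onto the DSP card.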
The network architecture was a ConvNet with two layers of 5x5 convolutions with stride 2, and two fully-connected layers on top. There was no separate pooling layer (it was too expensive). It had 9,760 parameters and 64,660 connections.
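The stride-2 convolutions do the subsampling themselves, which is why no separate pooling layer was needed. A generic single-channel NumPy sketch (not the network's actual connectivity, weights, or multi-map structure) shows the spatial sizes working out:

```python
import numpy as np

def conv2d(x, w, stride=2, pad=2):
    # Single-channel strided "convolution" (cross-correlation) with
    # zero-padding; a generic illustration, not LeNet-1's exact maps.
    k = w.shape[0]
    x = np.pad(x, pad)  # zero-pad all four sides
    oh = (x.shape[0] - k) // stride + 1
    ow = (x.shape[1] - k) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i*stride:i*stride+k,
                                 j*stride:j*stride+k] * w)
    return out

img = np.random.randn(16, 16)                      # 16x16 input patch
h1 = np.tanh(conv2d(img, np.random.randn(5, 5)))   # -> 8x8 feature map
h2 = np.tanh(conv2d(h1, np.random.randn(5, 5)))    # -> 4x4 feature map
```

With 5x5 kernels, stride 2, and 2 pixels of padding, a 16x16 input shrinks to 8x8 and then 4x4, so each convolution halves the resolution without any pooling stage.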
Shortly after this demo was put together, we started working with a development group and a product group at NCR (then a subsidiary of AT&T). NCR soon deployed ATM machines that could read the numerical amounts on checks, initially in Europe and then in the US. The ConvNet was running on the DSP32C card sitting in a PC inside the ATM. Later, NCR deployed a similar system in large check reading machines that banks use in their back offices. At some point in the late 90's these machines were processing 10 to 20% of all the checks in the US.
The network shown in this demo is described in our NIPS 1989 paper "Handwritten digit recognition with a back-propagation network". https://direct.mit.edu/neco/article-abstract/1/4/541/5515/Backpropagation-Applied-to-Handwritten-Zip-Code
The check reading system is described in our 1998 Proc. IEEE paper "Gradient-Based Learning Applied to Document Recognition" and in various shorter papers before that.
Thanks to Larry Jackel for digitizing and editing the old VHS tape (and for holding the camera). There are cameo appearances by Donnie Henderson (who put together much of the demo) and Rich Howard, our lab director.
Talk is here: youtu.be/DokLw1tILlw

Yann LeCun: A Path Towards Autonomous AI, Baidu 2022-02-22
Yann LeCun, 2022-02-25 | Technical talk by Yann LeCun: "A Path Towards Autonomous AI", hosted virtually by Baidu on 2022-02-22.
TL;DR:
- autonomous AI requires predictive world models
- world models must be able to perform multimodal predictions
- solution: Joint Embedding Predictive Architecture (JEPA)
- JEPA makes predictions in representation space, and can choose to ignore irrelevant or hard-to-predict details
- JEPA can be trained non-contrastively by (1) making the representations of the inputs maximally informative, (2) making the representations predictable from each other, (3) regularizing the latent variables necessary for prediction
- JEPAs can be stacked to make long-term/long-range predictions in more abstract representation spaces
- Hierarchical JEPAs can be used for hierarchical planning
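Ingredients (1) and (2) of this non-contrastive recipe correspond to the variance/covariance and invariance terms of the VICReg criterion (ingredient (3), latent-variable regularization, has no direct analogue here). A minimal NumPy sketch, with illustrative loss weights and epsilon rather than prescribed values:

```python
import numpy as np

def vicreg_loss(za, zb, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg-style criterion on two batches of embeddings, shape (n, d)."""
    n, d = za.shape
    # (2) invariance: embeddings of the two views should match
    inv = np.mean((za - zb) ** 2)
    # (1a) variance: keep each dimension informative
    #      (hinge pushing the per-dimension std above 1)
    def variance(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))
    # (1b) covariance: decorrelate dimensions by penalizing
    #      off-diagonal entries of the covariance matrix
    def covariance(z):
        zc = z - z.mean(axis=0)
        c = (zc.T @ zc) / (n - 1)
        off = c - np.diag(np.diag(c))
        return np.sum(off ** 2) / d
    return (sim_w * inv
            + var_w * (variance(za) + variance(zb))
            + cov_w * (covariance(za) + covariance(zb)))

np.random.seed(0)
za = np.random.randn(32, 16)            # embeddings of view A
zb = za + 0.1 * np.random.randn(32, 16) # embeddings of view B
loss = vicreg_loss(za, zb)
```

Because all three terms depend only on the embedding statistics of a single batch, no negative pairs are needed, which is what makes the method non-contrastive.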
Topics:
- How to get machines to learn like humans and animals?
- Challenges in AI: self-supervised learning, reasoning, hierarchical planning
- Learning models of the world
- Architecture for autonomous AI: world model, cost, actor, perception, configurator, short-term memory
- Perception-action cycle: Mode-1 (reactive) and Mode-2 (planning)
- Intrinsic Cost and Trainable Cost modules
- Building and training a world model
- Self-supervised learning (SSL)
- Energy-Based Models: contrastive and regularized training methods
- EBM architectures: Joint Embedding Predictive Architecture (JEPA)
- Contrastive methods for training JEPA (bad)
- Regularized (non-contrastive) methods for training JEPA (good)
- VICReg: Variance-Invariance-Covariance Regularization
- Hierarchical JEPA for world models
- Hierarchical planning under uncertainty with hierarchical JEPA

Real-Time Object Recognition with Convolutional Net (2008)
Yann LeCun, 2022-01-11 | A demo from 2008 of a convolutional network performing object detection and recognition in real time on a laptop. This is a regular 2008 laptop (with no GPU) and a USB camera.
The ConvNet was trained on the NORB dataset, which has 5 categories (animal, human, car, truck, airplane) and 5 object instances per category (5 toy airplanes, etc.), painted with a uniform color. There are many images of each object under different viewpoints, lightings, and backgrounds. https://cs.nyu.edu/~ylclab/data/norb-v1.0/

Yann's 60th Birthday Video from His Padawans
Yann LeCun, 2021-07-09 | A video of current and former students and postdocs of Yann LeCun wishing him a happy 60th birthday on July 8th, 2020.
Put together by Aishwarya Kamath.

Face Detector Demo with ConvNet (NEC Labs 2003)
Yann LeCun, 2015-02-14 | Demo of the convolutional network face detector built at NEC Labs in 2003 by Rita Osadchy, Matt Miller, and Yann LeCun.
M. Osadchy, Y. LeCun and M. Miller: Synergistic Face Detection and Pose Estimation with Energy-Based Models, Journal of Machine Learning Research, 8:1197-1215, May 2007
http://yann.lecun.com/exdb/publis/index.html#osadchy-07

DrLIM: Learning an Embedding with Siamese Nets (2006)
Yann LeCun, 2014-06-21 | Raia Hadsell, Sumit Chopra and Yann LeCun: Dimensionality Reduction by Learning an Invariant Mapping, Proc. Computer Vision and Pattern Recognition Conference (CVPR'06), IEEE Press, 2006

Off-Road Robot Navigation with Convolutional Networks (LAGR Project, 2008)
Yann LeCun, 2014-06-21 | Raia Hadsell, Pierre Sermanet, Marco Scoffier, Ayse Erkan, Koray Kavukcuoglu, Urs Muller and Yann LeCun: Learning Long-Range Vision for Autonomous Off-Road Driving, Journal of Field Robotics, 26(2):120-144, February 2009

Semantic Segmentation (8 categories)
Yann LeCun, 2014-06-21 | Clement Farabet, Camille Couprie, Laurent Najman and Yann LeCun: Learning Hierarchical Features for Scene Labeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, August 2013.

Semantic Segmentation (33 categories)
Yann LeCun, 2014-06-21 | Clement Farabet, Camille Couprie, Laurent Najman and Yann LeCun: Learning Hierarchical Features for Scene Labeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, August 2013.

Pedestrian Detection with Convolutional Networks, part 1 (CVPR 2013)
Yann LeCun, 2014-06-21 | Demo of ConvNet-based pedestrian detection as described in the paper: "Pedestrian Detection with Unsupervised Multi-stage Feature Learning", Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala, Yann LeCun; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 3626-3633
openaccess.thecvf.com/content_cvpr_2013/html/Sermanet_Pedestrian_Detection_with_2013_CVPR_paper.html

Pedestrian Detection with Convolutional Networks, part 2 (CVPR 2013)
Yann LeCun, 2014-06-21 | Demo of ConvNet-based pedestrian detection as described in the paper: "Pedestrian Detection with Unsupervised Multi-stage Feature Learning", Pierre Sermanet, Koray Kavukcuoglu, Soumith Chintala, Yann LeCun; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 3626-3633