1. Bibliography

Arjovsky et al., 2017

Arjovsky, M., Chintala, S., & Bottou, L. (2017, January). Wasserstein GAN. arXiv:1701.07875 [cs, stat].

Atito et al., 2021

Atito, S., Awais, M., & Kittler, J. (2021, November). SiT: Self-supervised vIsion Transformer. arXiv:2104.03602 [cs].

Ba et al., 2016

Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016, July). Layer Normalization. arXiv:1607.06450 [cs, stat].

Badrinarayanan et al., 2016

Badrinarayanan, V., Kendall, A., & Cipolla, R. (2016, October). SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. arXiv:1511.00561 [cs].

Bahdanau et al., 2016

Bahdanau, D., Cho, K., & Bengio, Y. (2016, May). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv:1409.0473 [cs, stat].

Bi & Poo, 2001

Bi, G.-q., & Poo, M.-m. (2001). Synaptic Modification by Correlated Activity: Hebb's Postulate Revisited. Annual Review of Neuroscience, 24(1), 139–166. doi:10.1146/annurev.neuro.24.1.139

Bienenstock et al., 1982

Bienenstock, E. L., Cooper, L. N., & Munro, P. W. (1982, January). Theory for the development of neuron selectivity: Orientation specificity and binocular interaction in visual cortex. Journal of Neuroscience, 2(1), 32–48.

Binder et al., 2016

Binder, A., Montavon, G., Bach, S., Müller, K.-R., & Samek, W. (2016, April). Layer-wise Relevance Propagation for Neural Networks with Local Renormalization Layers. arXiv:1604.00825 [cs].

Bojarski et al., 2016

Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., … Zieba, K. (2016, April). End to End Learning for Self-Driving Cars. arXiv:1604.07316 [cs].

Brette & Gerstner, 2005

Brette, R., & Gerstner, W. (2005, November). Adaptive Exponential Integrate-and-Fire Model as an Effective Description of Neuronal Activity. Journal of Neurophysiology, 94(5), 3637–3642. doi:10.1152/jn.00686.2005

Caron et al., 2021

Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., & Joulin, A. (2021, May). Emerging Properties in Self-Supervised Vision Transformers. arXiv:2104.14294 [cs].

Chollet, 2017a

Chollet, F. (2017). Deep Learning with Python. Manning Publications.

Chollet, 2017b

Chollet, F. (2017, April). Xception: Deep Learning with Depthwise Separable Convolutions. arXiv:1610.02357 [cs].

Chung et al., 2014

Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014, December). Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv:1412.3555 [cs].

Clopath et al., 2010

Clopath, C., Büsing, L., Vasilaki, E., & Gerstner, W. (2010, March). Connectivity reflects coding: A model of voltage-based STDP with homeostasis. Nature Neuroscience, 13(3), 344–352. doi:10.1038/nn.2479

Dayan & Abbott, 2001

Dayan, P., & Abbott, L. F. (2001, September). Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. The MIT Press.

Demircigil et al., 2017

Demircigil, M., Heusel, J., Löwe, M., Upgang, S., & Vermet, F. (2017, July). On a model of associative memory with huge storage capacity. Journal of Statistical Physics, 168(2), 288–299. arXiv:1702.01929, doi:10.1007/s10955-017-1806-y

Devlin et al., 2019

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019, May). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs].

Dosovitskiy et al., 2021

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., … Houlsby, N. (2021, June). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv:2010.11929 [cs].

Fukushima, 1980

Fukushima, K. (1980, April). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202. doi:10.1007/BF00344251

Gers & Schmidhuber, 2000

Gers, F. A., & Schmidhuber, J. (2000, July). Recurrent nets that time and count. Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium (pp. 189–194, Vol. 3). doi:10.1109/IJCNN.2000.861302

Gerstner et al., 2014

Gerstner, W., Kistler, W., Naud, R., & Paninski, L. (2014). Neuronal Dynamics: From Single Neurons to Networks and Models of Cognition. Cambridge University Press.

Gerstner & Kistler, 2002

Gerstner, W., & Kistler, W. M. (2002, December). Mathematical formulations of Hebbian learning. Biological Cybernetics, 87(5), 404–415. doi:10.1007/s00422-002-0353-y

Girshick, 2015

Girshick, R. (2015, September). Fast R-CNN. arXiv:1504.08083 [cs].

Girshick et al., 2014

Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014, October). Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv:1311.2524 [cs].

Glorot & Bengio, 2010

Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. AISTATS (p. 8).

Goodfellow et al., 2016

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.

Goodfellow et al., 2015

Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015, March). Explaining and Harnessing Adversarial Examples. arXiv:1412.6572 [cs, stat].

Goodfellow et al., 2014

Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y. (2014, June). Generative Adversarial Networks. arXiv:1406.2661 [cs].

Gou et al., 2020

Gou, J., Yu, B., Maybank, S. J., & Tao, D. (2020, June). Knowledge Distillation: A Survey. arXiv:2006.05525 [cs, stat].

Guo et al., 2017

Guo, X., Liu, X., Zhu, E., & Yin, J. (2017). Deep Clustering with Convolutional Autoencoders. In D. Liu, S. Xie, Y. Li, D. Zhao, & E.-S. M. El-Alfy (Eds.), Neural Information Processing (pp. 373–382). Cham: Springer International Publishing. doi:10.1007/978-3-319-70096-0_39

Gupta et al., 2014

Gupta, S., Girshick, R., Arbeláez, P., & Malik, J. (2014, July). Learning Rich Features from RGB-D Images for Object Detection and Segmentation. arXiv:1407.5736 [cs].

Hannun et al., 2014

Hannun, A., Case, C., Casper, J., Catanzaro, B., Diamos, G., Elsen, E., … Ng, A. Y. (2014, December). Deep Speech: Scaling up end-to-end speech recognition. arXiv:1412.5567 [cs].

Haykin, 2009

Haykin, S. S. (2009). Neural Networks and Learning Machines, 3rd Edition. Pearson.

He et al., 2018

He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2018, January). Mask R-CNN. arXiv:1703.06870 [cs].

He et al., 2015a

He, K., Zhang, X., Ren, S., & Sun, J. (2015, December). Deep Residual Learning for Image Recognition. arXiv:1512.03385 [cs].

He et al., 2015b

He, K., Zhang, X., Ren, S., & Sun, J. (2015, February). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv:1502.01852 [cs].

Higgins et al., 2016

Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., … Lerchner, A. (2016, November). beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. ICLR 2017.

Hinton & Salakhutdinov, 2006

Hinton, G. E., & Salakhutdinov, R. R. (2006 , July). Reducing the Dimensionality of Data with Neural Networks. Science, 313(5786), 504–507. doi:10.1126/science.1127647

Hinton et al., 2015

Hinton, G., Vinyals, O., & Dean, J. (2015, March). Distilling the Knowledge in a Neural Network. arXiv:1503.02531 [cs, stat].

Hinton et al., 2006

Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006, July). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554. doi:10.1162/neco.2006.18.7.1527

Hochreiter & Schmidhuber, 1997

Hochreiter, S., & Schmidhuber, J. (1997, November). Long short-term memory. Neural Computation, 9(8), 1735–1780.

Hochreiter, 1991

Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen Netzen (Diploma thesis). TU München.

Hopfield, 1982

Hopfield, J. J. (1982, April). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, 79(8), 2554–2558. doi:10.1073/pnas.79.8.2554

Hopfield et al., 1983

Hopfield, J. J., Feinstein, D. I., & Palmer, R. G. (1983, July). 'Unlearning' has a stabilizing effect in collective memories. Nature, 304(5922), 158–159. doi:10.1038/304158a0

Huang et al., 2018

Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. (2018, January). Densely Connected Convolutional Networks. arXiv:1608.06993 [cs].

Intrator & Cooper, 1992

Intrator, N., & Cooper, L. N. (1992, January). Objective function formulation of the BCM theory of visual cortical plasticity: Statistical connections, stability conditions. Neural Networks, 5(1), 3–17. doi:10.1016/S0893-6080(05)80003-6

Ioffe & Szegedy, 2015

Ioffe, S., & Szegedy, C. (2015, February). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv:1502.03167 [cs.LG].

Isola et al., 2018

Isola, P., Zhu, J.-Y., Zhou, T., & Efros, A. A. (2018, November). Image-to-Image Translation with Conditional Adversarial Networks. arXiv:1611.07004 [cs].

Izhikevich, 2003

Izhikevich, E. M. (2003, November). Simple model of spiking neurons. IEEE Transactions on Neural Networks, 14(6), 1569–1572. doi:10.1109/TNN.2003.820440

Jaeger, 2001

Jaeger, H. (2001). The "Echo State" Approach to Analysing and Training Recurrent Neural Networks (GMD Report 148). German National Research Center for Information Technology.

Joshi & Triesch, 2009

Joshi, P., & Triesch, J. (2009, June). Rules for information maximization in spiking neurons using intrinsic plasticity. 2009 International Joint Conference on Neural Networks (pp. 1456–1461). doi:10.1109/IJCNN.2009.5178625

Karras et al., 2020

Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., & Aila, T. (2020, March). Analyzing and Improving the Image Quality of StyleGAN. arXiv:1912.04958 [cs, eess, stat].

Kendall et al., 2016

Kendall, A., Grimes, M., & Cipolla, R. (2016, February). PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization. arXiv:1505.07427 [cs].

Kheradpisheh et al., 2018

Kheradpisheh, S. R., Ganjtabesh, M., Thorpe, S. J., & Masquelier, T. (2018, March). STDP-based spiking deep convolutional neural networks for object recognition. Neural Networks, 99, 56–67. doi:10.1016/j.neunet.2017.12.005

Kim, 2014

Kim, Y. (2014, September). Convolutional Neural Networks for Sentence Classification. arXiv:1408.5882 [cs].

Kingma & Ba, 2014

Kingma, D., & Ba, J. (2014). Adam: A Method for Stochastic Optimization. Proc. ICLR (pp. 1–13).

Kingma & Welling, 2013

Kingma, D. P., & Welling, M. (2013, December). Auto-Encoding Variational Bayes. arXiv:1312.6114 [cs].

Krizhevsky et al., 2012

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems (NIPS).

Krotov & Hopfield, 2016

Krotov, D., & Hopfield, J. J. (2016, September). Dense Associative Memory for Pattern Recognition. arXiv:1606.01164 [cond-mat, q-bio, stat].

Laje & Buonomano, 2013

Laje, R., & Buonomano, D. V. (2013, July). Robust timing and motor patterns by taming chaos in recurrent neural networks. Nature Neuroscience, 16(7), 925–933. doi:10.1038/nn.3405

Lapuschkin et al., 2019

Lapuschkin, S., Wäldchen, S., Binder, A., Montavon, G., Samek, W., & Müller, K.-R. (2019, March). Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1), 1096. doi:10.1038/s41467-019-08987-4

Le, 2013

Le, Q. V. (2013, May). Building high-level features using large scale unsupervised learning. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 8595–8598). Vancouver, BC, Canada: IEEE. doi:10.1109/ICASSP.2013.6639343

LeCun et al., 1998

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11), 2278–2324. doi:10.1109/5.726791

Li et al., 2018

Li, H., Xu, Z., Taylor, G., Studer, C., & Goldstein, T. (2018, November). Visualizing the Loss Landscape of Neural Nets. arXiv:1712.09913 [cs, stat].

Lillicrap et al., 2016

Lillicrap, T. P., Cownden, D., Tweed, D. B., & Akerman, C. J. (2016, November). Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications, 7(1), 1–10. doi:10.1038/ncomms13276

Linnainmaa, 1970

Linnainmaa, S. (1970). The Representation of the Cumulative Rounding Error of an Algorithm as a Taylor Expansion of the Local Rounding Errors (Master's thesis). Univ. Helsinki.

Liu et al., 2016

Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016). SSD: Single Shot MultiBox Detector. Lecture Notes in Computer Science, 9905, 21–37. arXiv:1512.02325, doi:10.1007/978-3-319-46448-0_2

Maas et al., 2013

Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier Nonlinearities Improve Neural Network Acoustic Models. ICML (p. 6).

Maass et al., 2002

Maass, W., Natschläger, T., & Markram, H. (2002, November). Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11), 2531–2560. doi:10.1162/089976602760407955

Malinowski et al., 2015

Malinowski, M., Rohrbach, M., & Fritz, M. (2015, October). Ask Your Neurons: A Neural-based Approach to Answering Questions about Images. arXiv:1505.01121 [cs].

McEliece et al., 1987

McEliece, R., Posner, E., Rodemich, E., & Venkatesh, S. (1987, July). The capacity of the Hopfield associative memory. IEEE Transactions on Information Theory, 33(4), 461–482. doi:10.1109/TIT.1987.1057328

McInnes et al., 2020

McInnes, L., Healy, J., & Melville, J. (2020, September). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426 [cs, stat].

Mikolov et al., 2013

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013, September). Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs].

Mirza & Osindero, 2014

Mirza, M., & Osindero, S. (2014, November). Conditional Generative Adversarial Nets. arXiv:1411.1784 [cs].

Nowozin et al., 2016

Nowozin, S., Cseke, B., & Tomioka, R. (2016, June). f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization. arXiv:1606.00709 [cs, stat].

Oja, 1982

Oja, E. (1982, January). A simplified neuron model as a principal component analyzer. Journal of Mathematical Biology, 15(3), 267–273.

Olshausen & Field, 1997

Olshausen, B. A., & Field, D. J. (1997, December). Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23), 3311–3325. doi:10.1016/S0042-6989(97)00169-7

Radford et al., 2015

Radford, A., Metz, L., & Chintala, S. (2015, November). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv:1511.06434 [cs].

Ramsauer et al., 2020

Ramsauer, H., Schäfl, B., Lehner, J., Seidl, P., Widrich, M., Adler, T., … Hochreiter, S. (2020, December). Hopfield Networks is All You Need. arXiv:2008.02217 [cs, stat].

Razavi et al., 2019

Razavi, A., van den Oord, A., & Vinyals, O. (2019, June). Generating Diverse High-Fidelity Images with VQ-VAE-2. arXiv:1906.00446 [cs, stat].

Redmon et al., 2016

Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016, May). You Only Look Once: Unified, Real-Time Object Detection. arXiv:1506.02640 [cs].

Redmon & Farhadi, 2016

Redmon, J., & Farhadi, A. (2016, December). YOLO9000: Better, Faster, Stronger. arXiv:1612.08242 [cs].

Redmon & Farhadi, 2018

Redmon, J., & Farhadi, A. (2018, April). YOLOv3: An Incremental Improvement. arXiv:1804.02767 [cs].

Reed et al., 2016

Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. (2016, June). Generative Adversarial Text to Image Synthesis. arXiv:1605.05396 [cs].

Ren et al., 2016

Ren, S., He, K., Girshick, R., & Sun, J. (2016, January). Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv:1506.01497 [cs].

Ronneberger et al., 2015

Ronneberger, O., Fischer, P., & Brox, T. (2015, May). U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv:1505.04597 [cs].

Rossant et al., 2011

Rossant, C., Goodman, D. F. M., Fontaine, B., Platkiewicz, J., Magnusson, A. K., & Brette, R. (2011). Fitting Neuron Models to Spike Trains. Frontiers in Neuroscience, 5. doi:10.3389/fnins.2011.00009

Rumelhart et al., 1986

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986, October). Learning representations by back-propagating errors. Nature, 323(6088), 533–536. doi:10.1038/323533a0

Salimans et al., 2016

Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016, June). Improved Techniques for Training GANs. arXiv:1606.03498 [cs].

Simoncelli & Olshausen, 2001

Simoncelli, E. P., & Olshausen, B. A. (2001, March). Natural Image Statistics and Neural Representation. Annual Review of Neuroscience, 24(1), 1193–1216. doi:10.1146/annurev.neuro.24.1.1193

Simonyan & Zisserman, 2015

Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-Scale Image Recognition. International Conference on Learning Representations (ICLR), pp. 1–14.

Sohn et al., 2015

Sohn, K., Lee, H., & Yan, X. (2015). Learning Structured Output Representation using Deep Conditional Generative Models. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 28 (pp. 3483–3491). Curran Associates, Inc.

Springenberg et al., 2015

Springenberg, J. T., Dosovitskiy, A., Brox, T., & Riedmiller, M. (2015, April). Striving for Simplicity: The All Convolutional Net. arXiv:1412.6806 [cs].

Srivastava et al., 2014

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15(56), 1929–1958.

Srivastava et al., 2015

Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015, November). Highway Networks. arXiv:1505.00387 [cs].

Sussillo & Abbott, 2009

Sussillo, D., & Abbott, L. F. (2009, August). Generating coherent patterns of activity from chaotic neural networks. Neuron, 63(4), 544–557. doi:10.1016/j.neuron.2009.07.018

Sutskever et al., 2014

Sutskever, I., Vinyals, O., & Le, Q. V. (2014, December). Sequence to Sequence Learning with Neural Networks. arXiv:1409.3215 [cs].

Szegedy et al., 2015

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2015, December). Rethinking the Inception Architecture for Computer Vision. arXiv:1512.00567 [cs].

Taigman et al., 2014

Taigman, Y., Yang, M., Ranzato, M., & Wolf, L. (2014, June). DeepFace: Closing the Gap to Human-Level Performance in Face Verification. 2014 IEEE Conference on Computer Vision and Pattern Recognition (pp. 1701–1708). Columbus, OH, USA: IEEE. doi:10.1109/CVPR.2014.220

Tanaka et al., 2019

Tanaka, G., Yamane, T., Héroux, J. B., Nakane, R., Kanazawa, N., Takeda, S., … Hirose, A. (2019, July). Recent advances in physical reservoir computing: A review. Neural Networks, 115, 100–123. doi:10.1016/j.neunet.2019.03.005

Oord et al., 2016

van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., … Kavukcuoglu, K. (2016, September). WaveNet: A Generative Model for Raw Audio. arXiv:1609.03499 [cs].

Vaswani et al., 2017

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017, June). Attention Is All You Need. arXiv:1706.03762 [cs].

Vincent et al., 2010

Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P.-A. (2010). Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. Journal of Machine Learning Research, 11, 3371–3408.

Vinyals et al., 2015

Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015, April). Show and Tell: A Neural Image Caption Generator. arXiv:1411.4555 [cs].

Vogels et al., 2011

Vogels, T. P., Sprekeler, H., Zenke, F., Clopath, C., & Gerstner, W. (2011, December). Inhibitory Plasticity Balances Excitation and Inhibition in Sensory Pathways and Memory Networks. Science, 334(6062), 1569–1573. doi:10.1126/science.1211095

Wang et al., 2018

Wang, B., Zheng, H., Liang, X., Chen, Y., Lin, L., & Yang, M. (2018, September). Toward Characteristic-Preserving Image-based Virtual Try-On Network. arXiv:1807.07688 [cs].

Werbos, 1982

Werbos, P. J. (1982). Applications of advances in nonlinear sensitivity analysis. System Modeling and Optimization: Proc. IFIP. Springer.

Wu et al., 2020

Wu, N., Green, B., Ben, X., & O'Banion, S. (2020, January). Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case. arXiv:2001.08317 [cs, stat].

Wu et al., 2016

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., … Dean, J. (2016, September). Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. arXiv:1609.08144 [cs].

Xu et al., 2015

Xu, K., Ba, J. L., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., … Bengio, Y. (2015). Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Proceedings of the 32nd International Conference on Machine Learning - Volume 37 (pp. 2048–2057). JMLR.org.

Zhou & Tuzel, 2017

Zhou, Y., & Tuzel, O. (2017, November). VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. arXiv:1711.06396 [cs].

Zhu et al., 2020

Zhu, Y., Gao, T., Fan, L., Huang, S., Edmonds, M., Liu, H., … Zhu, S.-C. (2020, February). Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense. Engineering. doi:10.1016/j.eng.2020.01.011