Three dominant themes in the discussion
- Scale‑driven breakthroughs – Modern deep‑learning successes are tied to massive compute and data, not just clever architecture.
“Don’t underestimate the massive data you need to make those networks tick. Slow training algorithms were also impracticable, whether they ran on GPUs or CPUs.” – wslh
- Historical inflection points – AlexNet (2012) ignited the CNN wave, and the 2017 Attention Is All You Need paper sparked the transformer boom.
“The inflection point was 2012, when AlexNet achieved a step‑change improvement in the ImageNet classification competition.” – pash
“Here’s where I’m missing understanding: for decades the idea of neural networks had existed with minimal attention. Then in 2017 Attention Is All You Need was released, and since then there has been an exponential explosion in deep learning.” – RyanShook
- Hardware and dataset prerequisites – Without the recent surge in GPU compute and big‑data resources, earlier theoretical ideas (e.g., transformers) could not have been realized at scale.
“The ‘bitter lesson’ is that more compute and more data eventually beats better models that don’t scale.” – pash