The recent successes of neural networks in producing human-like language have captured the attention of the general public. They have also caused a significant stir in cognitive science, with many researchers arguing that classical puzzles about human cognition and challenges to artificial intelligence are being solved by neural networks. An article recently published in Nature, covered by the journal’s media department as a “breakthrough” in AI, argues that a particular machine-learning technique has succeeded where others have failed: matching, and perhaps explaining, the human ability to reverse-engineer generative processes (rules) from only a few examples. We demonstrate that these conclusions are premature. Among other results, we find that the model’s rate of generalization success depends on which labels are attached to which meanings. This is in sharp contrast with the fact that there is no linguistic or broader cognitive benefit to calling a carbonated beverage “pop” rather than “soda,” or to calling the objects of dendrological study “trees” rather than “Bäume.” Crucially, our examples of failure lie squarely within the narrow task on which the article focuses, calling into question both its ambitious conclusions and the bullish media coverage it received.