The learning models were trained on the force-aligned stimuli from the Massive Auditory Lexical Decision (MALD) database (Tucker et al., 2019). The data shared here contain the results of training those models, which serve as the input to the subsequent statistical analyses. The study is published in the peer-reviewed journal Language & Cognition under the title "A learning perspective on the emergence of abstractions: The curious case of phone(me)s".