Introduction

Understanding others’ emotional states through their facial expressions is an important aspect of effective social interactions throughout the lifespan. Behavioral data suggest that facial emotion processing emerges very early in life1,2, as infants just months old can distinguish happy and sad faces from surprised faces3–5. However, children’s emotion recognition is substantially less accurate than that of adults, and this ability improves markedly across childhood and adolescence6–11. Although extensive research in cognitive and affective neuroscience has assessed developmental changes using behavioral and non-invasive neuroimaging approaches, our understanding of brain development related to facial expression perception remains limited.

One influential perspective on the development of face recognition is that it depends on the maturation of face-selective brain regions, including the fusiform face area (FFA), occipital face area (OFA), and posterior superior temporal sulcus (pSTS)12. Supporting this view, Gomez et al. found evidence for microstructural proliferation in the fusiform gyrus during childhood, suggesting that improvements in face recognition are a product of an interplay between structural and functional changes in the cortex13. Additionally, monkeys raised without exposure to faces fail to develop normal face-selective patches, suggesting that face experience is necessary for the development of the face-processing network14. The gradual maturation of pSTS and FFA, two early sensory areas involved in the processing of facial expressions15,16, likely contributes to improved facial expression recognition over development. Yet, few studies have investigated the development of the neural representation of emotional facial expressions in FFA and pSTS from early childhood to adulthood in humans.

Besides the visual processing of facial configurations, understanding the emotional meaning of faces requires awareness and interpretation of the other person’s emotional state, which is significantly shaped by life experience17,18. Thus, some researchers have proposed that the maturation of emotional information processing is related to a progressive increase in functional activity in the prefrontal cortex6,19,20. With development, greater engagement of the prefrontal cortex may facilitate top-down modulation of activity in more primitive subcortical and limbic regions, such as the amygdala21–23. Despite these theoretical advances, the functional changes in the prefrontal cortex during the perceptual processing of emotional facial expressions over development remain largely unknown.

Here, we analyze intracranial EEG (iEEG) data collected from a childhood group (5–10 years old) and a post-childhood group (13–55 years old) while participants watched a short audiovisual film. In our results, children’s dorsolateral prefrontal cortex (DLPFC) shows minimal involvement in processing facial expressions, unlike the post-childhood group. In contrast, in both children and post-childhood individuals, facial expression information is encoded in the posterior superior temporal cortex (pSTC), a brain region that contributes to the perceptual processing of facial expressions. Furthermore, the encoding of complex emotions in the pSTC increases with age. These data imply that social and emotional experiences shape the prefrontal cortex’s involvement in processing the emotional meaning of faces throughout development, probably through top-down modulation of early sensory areas.

Results

Using AI and encoding models to study the neural representation of facial expression

In this study, we analyzed intracranial EEG (iEEG) data collected from a large group of human neurosurgical patients while they watched a short audiovisual film at the University Medical Center Utrecht24. The movie consisted of 13 interleaved blocks of videos accompanied by speech or music, 30 seconds each (Figure 1A). To characterize the neural representation of facial expression in the prefrontal cortex and low-level sensory areas across development, we analyzed iEEG data from 9 children (5–10 years old) and 31 post-childhood individuals (13–55 years old) who had electrode coverage in DLPFC, pSTC, or both. First, Hume AI facial expression models were used to continuously extract facial emotion features from the movie (Figure 1B). Then, we tested how well encoding models constructed from the 48 facial emotion features (e.g., fear, joy) predict cortical high-frequency band (HFB) activity (110–140 Hz) induced by the presented movie (Figure 1B). Model performance was quantified as the correlation between the predicted and measured HFB activity, referred to as prediction accuracy.
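As a concrete illustration, the sketch below computes this prediction-accuracy metric for a pair of simulated time courses; it is a minimal example with arbitrary array lengths, not the analysis code used for the dataset.

```python
# Minimal sketch: prediction accuracy as the Pearson correlation between a
# measured and a model-predicted HFB time course (toy data only).
import numpy as np

def prediction_accuracy(measured: np.ndarray, predicted: np.ndarray) -> float:
    """Pearson r between measured and predicted HFB activity."""
    m = measured - measured.mean()
    p = predicted - predicted.mean()
    return float(np.dot(m, p) / (np.linalg.norm(m) * np.linalg.norm(p)))

# Toy example with simulated signals
rng = np.random.default_rng(0)
measured = rng.standard_normal(720)                    # e.g., ~6 min at 2 Hz
predicted = 0.3 * measured + rng.standard_normal(720)  # imperfect prediction
print(f"prediction accuracy r = {prediction_accuracy(measured, predicted):.3f}")
```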

Task design and analysis methods.

(A) Movie structure. A 6.5-minute short film was created by editing fragments from Pippi on the Run into a coherent narrative. The movie consisted of 13 interleaved blocks of videos accompanied by speech or music. (B) Data analysis schematic. Standard analysis pipeline for extracting emotion features from the movie and constructing encoding models to predict iEEG responses while participants watched the short film.

Differential representation of facial expression in children’s DLPFC

Using the analysis approach described above, we examined how facial emotion information is represented in the DLPFC (Figure 2A) while participants watched videos accompanied by speech (i.e., the speech condition) in the childhood and post-childhood groups. The prediction accuracy of the encoding model was significantly greater than zero in the post-childhood group (Figure 2B, P=0.0096, two-tailed permutation test), suggesting that neural responses in the DLPFC were dynamically modulated by the facial emotion features in the movie. However, facial emotion features were not encoded in children’s DLPFC (Figure 2B, P=0.825, two-tailed permutation test). Moreover, the prediction accuracy in children’s DLPFC was significantly lower than in the post-childhood group (P=0.0114, two-tailed permutation test). These findings show that the DLPFC dynamically encodes facial expression information in post-childhood individuals but not in young children.

Prediction performance of encoding models in DLPFC.

(A) Spatial distribution of electrodes in DLPFC. Electrodes from all participants in each group are projected onto MNI space and shown on the average brain. Red shaded areas indicate the middle frontal cortex as defined by the FreeSurfer Desikan-Killiany atlas25. Electrodes outside DLPFC are not shown. (B) Average prediction accuracy across participants for the speech condition. The performance of the encoding model is measured as the Pearson correlation coefficient (r) between measured and predicted brain activity. (C) Difference in prediction accuracy between the speech and music conditions for each group. Error bars are standard error of the mean. *P<0.05; **P<0.01.

To further understand the functional development of children’s DLPFC, we compared the effect of the human voice on the representation of facial expression in DLPFC between the two groups. The effect of the human voice was quantified as the difference in prediction accuracy between the speech and music conditions. Our results showed that the human voice influences facial expression representation in the DLPFC differently across development (Figure 2C, P=0.0034, two-tailed permutation test). The presence of the human voice enhances facial expression representation in the DLPFC of post-childhood individuals but impairs it in children.

Taken together, these results reveal significant developmental changes in the DLPFC’s involvement in facial expression perception.

The neural representation of facial expression in young children’s pSTC

After identifying developmental differences in the involvement of high-level brain areas in processing facial expression, we next examined the neural representation of facial expression in children’s early sensory areas. As an area in the core face network, the posterior superior temporal sulcus (pSTS) has been associated with early stages of the facial expression processing stream12,15,26,27. Although previous studies suggested that the development of face recognition depends on the maturation of face-selective brain regions13,14, it is still unclear how facial expression information is encoded in children’s pSTS. Here, we examined the performance of the facial expression encoding model in a rare sample of two children (S19, 8 years old, and S39, 5 years old) with electrode coverage in pSTC (Figure 3A). In both cases, the encoding model significantly predicted the HFB neural signals in the pSTC under the speech condition (Figure 3B, S19 speech: P=0.0014, r=0.1951; S39 speech: P=0.0183, r=0.15). The prediction accuracy was reduced when the human voice was absent from the video (S19 music: P=0.0313, r=0.1674; S39 music: P=0.3688, r=0.0574). Similarly, group-level results showed that model performance was significantly greater than zero in the pSTC of post-childhood individuals (N=25, Figure 3C and 3D, P=0.003, two-tailed permutation test) and that this neural representation of facial expression information was significantly reduced when the human voice was absent (paired t-test, t(24)=2.897, P=0.0079). These results provide evidence that children’s sensory areas encode facial emotion features from a high-dimensional, continuous space in a manner similar to that of post-childhood individuals.

Prediction performance of encoding models in pSTC.

(A) Electrode distribution in the two children (S19 and S39). Electrodes in pSTC are shown in green. (B) Prediction accuracy of the encoding models in the two children. (C) Spatial distribution of recording contacts in post-childhood participants’ pSTC. The pSTC electrodes identified in individual space are projected onto MNI space and shown on the average brain. Contacts outside pSTC are not shown. Blue shaded areas indicate the superior temporal cortex as defined by the FreeSurfer Desikan-Killiany atlas25. (D) Average prediction accuracy across post-childhood participants. Error bars are standard error of the mean. **P<0.01.

The complexity of facial expression encoding in the pSTC increases across development

To understand how facial expression representation in pSTC changes across development, we examined the feature weights of the facial expression encoding models in all participants with significant prediction accuracy (10 post-childhood individuals and 2 children). The weight for each feature represents its relative contribution to predicting the neural response. First, we calculated the encoding weights for complex emotions (averaging guilt, embarrassment, pride, and envy, which were selected as the most representative complex emotions based on previous studies28–30) and basic emotions (averaging joy, sadness, fear, anger, disgust, and surprise). Then, we calculated their correlations with age separately. Our results showed that the encoding weight of complex emotions was significantly positively correlated with age (r(12)=0.8512, P=0.004, Figure 4A left). No significant correlation between the encoding weight of basic emotions and age was observed (r(12)=0.3913, P=0.2085, Figure 4A right). In addition, we computed Pearson correlations between each individual feature weight and age, ranking the r values from largest to smallest (Figure 4B). The highest correlations were found for embarrassment, guilt, pride, interest, and envy, all of which are considered complex emotions. Among them, the weights for embarrassment, guilt, pride, and interest showed significant positive correlations with age (Figure 4C, embarrassment: r=0.7666, P=0.0036; pride: r=0.6773, P=0.0155; guilt: r=0.6421, P=0.0244; interest: r=0.6377, P=0.0257, uncorrected for multiple comparisons), suggesting that the encoding of these complex emotions in pSTC increases with age. Thus, our results suggest that as development progresses, the pSTC becomes increasingly engaged in encoding complex emotions, which require representing others’ mental states and emerge later in development31–33.

Correlation between encoding weights and age.

(A) Left: Correlation between the averaged encoding weights of five complex emotions and age. Right: Correlation between the averaged encoding weights of six basic emotions and age. (B) Pearson correlation coefficients between the encoding weights of the 48 facial expression features and age, ranked from largest to smallest. Significant correlations are noted with *(P<0.05, uncorrected) or **(P<0.01, uncorrected). (C) Correlations between age and the encoding weights of embarrassment, pride, guilt, and interest (N=12).

Methods

In this study, iEEG data from an open multimodal iEEG-fMRI dataset were analyzed24.

Participants and electrode distribution

Given the aims of the current study, only participants who had at least four electrode contacts in either DLPFC or pSTC were included in the data analysis (Table 1 and Table 2). Nine children (5–10 years old, 5 females) and thirty-one post-childhood individuals (13–55 years old, 18 females) were included in the present study. In the childhood group, eight participants had sufficient electrodes implanted in the DLPFC, and two had sufficient electrodes implanted in the pSTC. In the post-childhood group, thirteen participants had sufficient electrodes implanted in the DLPFC, and twenty-five had sufficient electrodes implanted in the pSTC.

Demographic information of childhood group.

Demographic information of post-childhood group.

Experimental procedures

A 6.5-minute short film was created by editing fragments from Pippi on the Run into a coherent narrative. The film is structured into 13 interleaved 30-second blocks of either speech or music, with seven blocks featuring background music only and six blocks retaining the original dialogue and voices from the video. Patients were asked to watch the movie while intracranial EEG signals were recorded. No fixation cross was displayed on the screen. The movie was presented using Presentation software (Neurobehavioral Systems, Berkeley, CA), and the sound was synchronized with the neural recordings. Further details on data acquisition can be found in Berezutskaya et al.24.

iEEG data processing

Electrode contacts and epochs contaminated by excessive artifacts or epileptiform activity were removed from the data analysis by visual inspection. Raw data were filtered with a 50-Hz notch filter and re-referenced to the common average reference. For each electrode contact in each patient, the preprocessed data were band-pass filtered (110–140 Hz, 4th-order Butterworth), and the Hilbert transform was then applied to extract the analytic amplitude. Each block was extracted in the 0 to 30 s time window relative to its onset. The fifth music block was excluded, as no faces were presented on screen. Subsequently, the data were down-sampled to 400 Hz and square-root transformed. Finally, the data were normalized by z-scoring with respect to baseline periods (−0.2 to 0 s before stimulus onset).
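For illustration, a minimal sketch of this preprocessing chain is given below. It is not the authors' code: it assumes a contacts-by-samples array for one epoch, an original sampling rate set to a placeholder value, and an epoch that starts 0.2 s before block onset so the baseline can be taken from the first samples; artifact rejection and epoching are not reproduced.

```python
# Minimal preprocessing sketch (assumptions: `raw` is a (n_contacts, n_samples)
# array for one epoch starting 0.2 s before block onset; `fs` is a placeholder
# for the dataset's actual sampling rate).
import numpy as np
from fractions import Fraction
from scipy.signal import butter, filtfilt, hilbert, iirnotch, resample_poly

def preprocess_hfb(raw, fs=2000.0, target_fs=400.0, baseline_s=0.2):
    # 50-Hz notch filter
    b_n, a_n = iirnotch(w0=50.0, Q=30.0, fs=fs)
    x = filtfilt(b_n, a_n, raw, axis=-1)
    # Re-reference to the common average across contacts
    x = x - x.mean(axis=0, keepdims=True)
    # 4th-order Butterworth band-pass, 110-140 Hz
    b_bp, a_bp = butter(4, [110.0, 140.0], btype="bandpass", fs=fs)
    x = filtfilt(b_bp, a_bp, x, axis=-1)
    # Analytic amplitude (HFB envelope) via the Hilbert transform
    hfb = np.abs(hilbert(x, axis=-1))
    # Down-sample to 400 Hz; clip tiny negatives from the anti-aliasing filter,
    # then square-root transform
    ratio = Fraction(int(target_fs), int(fs))
    hfb = resample_poly(hfb, ratio.numerator, ratio.denominator, axis=-1)
    hfb = np.sqrt(np.clip(hfb, 0.0, None))
    # z-score with respect to the pre-stimulus baseline (-0.2 to 0 s)
    n_base = int(baseline_s * target_fs)
    base = hfb[:, :n_base]
    return (hfb - base.mean(axis=-1, keepdims=True)) / base.std(axis=-1, keepdims=True)
```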

Contact Location and Regions of Interest

We identified electrode contacts in pSTC in individual brains using individual anatomical landmarks (i.e., gyri and sulci). Superior temporal sulci and lateral sulci were used as boundaries, and a coronal plane through the posterior tip of the hippocampus served as the anterior/posterior boundary. To identify electrode contacts in DLPFC, we projected the electrode contact positions provided by the open dataset onto Montreal Neurological Institute (MNI-152) template space using FreeSurfer. DLPFC was defined based on the following HCP-MMP1 atlas34 labels on both the left and right hemispheres: 9-46d, 46, a9-46v, and p9-46v.
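As a schematic example, the snippet below filters a hypothetical contact table by this DLPFC label set; the contact names and labels are invented for illustration, and the actual label assignment was performed with FreeSurfer on the MNI-projected coordinates.

```python
# Illustrative only: select DLPFC contacts from a hypothetical contact table
# that already carries one HCP-MMP1 label per contact (contact names and
# labels below are made up for the example).
import pandas as pd

DLPFC_LABELS = {"9-46d", "46", "a9-46v", "p9-46v"}  # same set for both hemispheres

contacts = pd.DataFrame({
    "contact":    ["C01", "C02", "C03", "C04"],
    "hemisphere": ["L", "L", "R", "R"],
    "hcp_label":  ["46", "p9-46v", "V1", "a9-46v"],
})

dlpfc_contacts = contacts[contacts["hcp_label"].isin(DLPFC_LABELS)]
print(dlpfc_contacts)
```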

Emotion feature extraction

Hume AI (http://www.hume.ai/) was used to extract the facial emotion features from the video. When multiple faces appeared in the movie, the maximum score across all faces was used for each facial emotion feature. All time courses of facial emotion features were resampled to 2 Hz. No facial emotion features were extracted for the fifth music block due to the absence of faces. The full list of the 48 facial emotion features is shown in Figure 4B.
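A minimal post-processing sketch is given below. It assumes the Hume AI outputs have already been parsed into per-frame, per-face score dictionaries (the API call itself is not shown), the frame rate is a placeholder, and the resampling simply averages frames within 0.5-s bins, which is one reasonable way to obtain 2-Hz feature time courses.

```python
# Sketch under stated assumptions: `frame_scores` holds, for every video frame,
# a list of score dicts (one per detected face) over the 48 emotion categories,
# as parsed from Hume AI output. The Hume API call is not shown.
import numpy as np
import pandas as pd

VIDEO_FPS = 25.0  # assumed frame rate of the film

def to_feature_timecourses(frame_scores, categories, fps=VIDEO_FPS, target_hz=2.0):
    rows = []
    for faces in frame_scores:
        if faces:   # take the maximum score across faces for each category
            rows.append({c: max(face[c] for face in faces) for c in categories})
        else:       # frames without faces (e.g., the fifth music block) stay NaN
            rows.append({c: np.nan for c in categories})
    df = pd.DataFrame(rows)
    df.index = pd.to_timedelta(np.arange(len(df)) / fps, unit="s")
    # Average frames within each 0.5-s bin to obtain 2-Hz feature time courses
    return df.resample(pd.Timedelta(seconds=1.0 / target_hz)).mean()
```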

Encoding model fitting

To model iEEG responses to emotion, we used a linear regression approach with the 48 facial emotion features extracted by Hume AI. Time-lagged versions of each feature (0, 0.5, and 1 s delays) were used in the model fitting. For each participant, HFB responses from all electrode contacts within each area were concatenated. To match the temporal resolution of the emotion feature time courses, the HFB responses were binned into 500 ms windows. We then modeled the processed HFB response for each participant, brain area, and condition (speech vs. music) using ridge regression. The optimal regularization parameter was assessed using 5-fold cross-validation, with 20 regularization parameters log-spaced between 10 and 10,000. To keep the scale of the weights consistent, a single best overall value of the regularization coefficient was used for all areas in both the speech and music conditions in all patients. We used a cross-validation iterator to fit the model and test it on held-out data. Model performance was evaluated by calculating Pearson correlation coefficients between the measured and predicted HFB responses of individual brain areas. The mean prediction accuracy (r value) of the encoding model across the 5-fold cross-validation was then calculated.
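The sketch below illustrates this fitting procedure for one brain area with toy data. It is a simplified reconstruction under assumptions (a single example alpha rather than the full grid search, and a plain unshuffled K-fold split), not the exact analysis code.

```python
# Encoding-model sketch (toy data; not the exact analysis code). `features` is
# a (n_bins, 48) array of 2-Hz emotion features and `hfb` the matching
# (n_bins,) HFB response for one brain area.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def add_lags(features, n_lags=3):
    """Stack time-lagged copies (0, 0.5, 1 s delays = 0, 1, 2 bins at 2 Hz)."""
    lagged = []
    for lag in range(n_lags):
        shifted = np.roll(features, lag, axis=0)
        shifted[:lag] = 0.0          # zero out samples that wrapped around
        lagged.append(shifted)
    return np.hstack(lagged)

def fit_encoding_model(features, hfb, alpha, n_splits=5):
    X = add_lags(features)
    accs, weights = [], []
    for train, test in KFold(n_splits=n_splits).split(X):
        model = Ridge(alpha=alpha).fit(X[train], hfb[train])
        r, _ = pearsonr(hfb[test], model.predict(X[test]))
        accs.append(r)
        weights.append(model.coef_)
    # Mean prediction accuracy and mean weights across the folds
    return float(np.mean(accs)), np.mean(weights, axis=0)

# Toy usage; alpha would be chosen from 20 log-spaced values between 10 and 10000
rng = np.random.default_rng(1)
feats = rng.random((720, 48))                                 # ~6 min at 2 Hz
resp = feats @ rng.standard_normal(48) + rng.standard_normal(720)
acc, w = fit_encoding_model(feats, resp, alpha=100.0)
print(f"mean cross-validated prediction accuracy: {acc:.3f}")
```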

Non-parametric permutation tests were used to test whether the encoding model performance was significantly greater than zero and whether there was a significant difference between groups. Specifically, we shuffled the facial emotion feature data in time and then conducted the standard data analysis steps (described above) using the shuffled facial emotion features. This shuffle procedure was repeated 5000 times to generate a null distribution, and P-values were calculated as the proportion of results from shuffled data more extreme than the observed real value. A two-sided paired t-test was used to examine differences in encoding accuracy between the speech and music conditions in the post-childhood group (Figure 3D).
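A sketch of this permutation scheme is shown below. It reuses fit_encoding_model() from the previous sketch, and the shuffle is implemented as a random permutation of the feature rows in time, which is one plausible reading of the shuffling procedure.

```python
# Permutation-test sketch (reuses fit_encoding_model from the previous sketch).
# The shuffle here is a random permutation of feature rows in time; the exact
# shuffling scheme is an assumption.
import numpy as np

def permutation_pvalue(features, hfb, alpha, observed_r, n_perm=5000, seed=0):
    rng = np.random.default_rng(seed)
    null_r = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(features, axis=0)   # shuffle rows in time
        null_r[i], _ = fit_encoding_model(shuffled, hfb, alpha)
    # Two-tailed: proportion of null accuracies at least as extreme as observed
    return float((np.abs(null_r) >= abs(observed_r)).mean())
```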

Weight analysis

To examine the correlation between encoding model weights and age, we obtained the 48 encoding model weights from all folds of cross-validation for all participants whose pSTC significantly encoded facial expression (i.e., the P-value of the prediction accuracy was less than 0.05). Thus, 10 post-childhood individuals and 2 children were included in the weight analysis. The weight for each feature represents its relative contribution to predicting the neural response. A higher weight indicates that the corresponding feature has a stronger influence on neural activity, meaning that variations in this feature more strongly affect the predicted response. We used the absolute values of the weights and therefore did not discriminate whether facial emotion features were mapped to an increase or decrease in the HFB response.
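The snippet below sketches this weight analysis. It assumes the per-participant absolute weights have already been averaged over lags and cross-validation folds into a length-48 vector, and the category label strings are placeholders for the corresponding Hume AI feature names.

```python
# Weight-analysis sketch. Assumptions: `abs_weights` maps each included
# participant to a length-48 vector of absolute encoding weights (averaged over
# lags and folds), `ages` maps participants to age, and `feature_names` lists
# the 48 categories in matching order; the label strings are placeholders.
import numpy as np
from scipy.stats import pearsonr

COMPLEX_EMOTIONS = ["Guilt", "Embarrassment", "Pride", "Envy"]
BASIC_EMOTIONS = ["Joy", "Sadness", "Fear", "Anger", "Disgust", "Surprise"]

def emotion_set_weight(abs_weights, feature_names, emotions):
    """Average the absolute weights of a set of emotion features per participant."""
    idx = [feature_names.index(e) for e in emotions]
    return {p: w[idx].mean() for p, w in abs_weights.items()}

def correlate_with_age(set_weights, ages):
    """Pearson correlation between per-participant set weights and age."""
    participants = sorted(set_weights)
    w = np.array([set_weights[p] for p in participants])
    a = np.array([ages[p] for p in participants])
    return pearsonr(w, a)   # returns (r, P-value)
```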

Discussion

The current study examines functional changes in both low-level and high-level brain areas across development to provide valuable insights into the neural mechanisms underlying the maturation of facial expression perception. Based on our findings, we propose that young children rely primarily on early sensory areas rather than the prefrontal cortex for facial emotion processing. As development progresses, the prefrontal cortex becomes increasingly involved, perhaps serving to modulate responses in early sensory areas based on emotional context and enabling them to process complex emotions. This developmental progression ultimately enables the full comprehension of facial emotions in adulthood.

Behavioral results suggest that infants as young as 7–8 months can categorize some emotions3–5. However, sensitivity to facial expressions in young children does not mean that they understand the meaning of the corresponding affective state. For example, Kaneshige and Haryu (2014)35 found that although 4-month-old infants could discriminate facial configurations of anger and happiness, they responded positively to both, suggesting that at this stage, they may lack knowledge of the affective meaning behind these expressions. This underscores the idea that additional processes must develop before children can fully grasp the emotional content conveyed by facial expressions. Although the neural mechanism behind this development is still unclear, a reasonable perspective is that it requires both visual processing of facial features and emotion-related processing that supports awareness of the other person’s emotional state17,36,37. Indeed, growing evidence suggests that the prefrontal cortex plays an important role in integrating prior knowledge with incoming sensory information, allowing interpretation of the current situation in light of past emotional experience19,38.

In the current study, we observed differential representation of facial expressions in the DLPFC between children and post-childhood individuals. First, in post-childhood individuals, neural activity in the DLPFC encodes high-dimensional facial expression information, whereas this encoding is absent in children. Second, while the human voice enhances the representation of facial expressions in the DLPFC of post-childhood individuals, it instead reduces this representation in children. These results suggest that the DLPFC undergoes developmental changes in how it processes facial expressions. The absence of high-dimensional facial expression encoding in children implies that the DLPFC may not yet be fully engaged in emotional interpretation at an early age. Additionally, the opposite effects of the human voice on facial expression representation indicate that multimodal integration of social cues develops over time. In post-childhood individuals, voices may enhance emotional processing by providing congruent information39–41, whereas in children, the presence of a voice might interfere with or redirect attentional resources away from facial expression processing39,42,43.

There have been few neuroimaging studies directly examining the functional role of young children’s DLPFC in facial emotion perception. Some evidence suggests that the prefrontal cortex continues to develop until adulthood to achieve its mature function in emotion perception9,20,44, and for some emotion categories, this development may extend across the lifespan. For example, prefrontal cortex activation while viewing fearful faces increases with age18,19,45. As there were not enough participants to calculate the correlation between encoding model performance in the DLPFC and age, it remains unclear whether the representation of facial expression in the DLPFC increases linearly with age. One possibility is that the representation of facial expressions in the DLPFC gradually increases with age until it reaches an adult-like level. This would suggest a continuous developmental trajectory, in which incremental improvements in neural processing accumulate over time. Another possibility is that development follows a more nonlinear pattern, with prominent changes at specific ages. Interestingly, research has shown that performance on matching emotional expressions improves steadily over development, with notable gains in accuracy occurring between 9 and 10 years and again between 13 and 14 years, after which performance reaches adult-like levels6.

Although only two children in our sample had enough electrodes in pSTC, our results clearly showed that facial expression is encoded in each child’s pSTC. Moreover, the prediction accuracy of the encoding model in the two children was comparable to or higher than the average level in the post-childhood group. In the child (S19) who had electrode coverage in both DLPFC and pSTC, facial expressions were represented in the pSTC but not in the DLPFC. This rare and fortunate sampling allows us to rule out the possibility that the low prediction accuracy of the facial expression encoding model in the DLPFC is due to reduced engagement in the movie-watching task for children. Consistent with our findings, previous studies have shown that the fusiform and superior temporal gyri are involved in emotion-specific processing in 10-year-old children46. Meanwhile, other researchers found that responses to facial expression in the amygdala and posterior fusiform gyri decreased as people got older20, whereas the use of frontal regions increased with age44. Therefore, we propose that early sensory areas like the fusiform and superior temporal gyri play a key role in facial expression processing in children, but their contribution may shift with age as frontal regions become more involved. Consistent with this perspective, our results revealed that the encoding weights for complex emotions in pSTC increased with age, suggesting a developmental trajectory in the neural representation of complex emotions in pSTC. This finding aligns with previous behavioral studies showing that recognition of complex social emotions does not fully mature until young adulthood31–33. In fact, our results suggest that the representation of complex facial expressions in pSTC continues to develop over the lifespan. As for the correlation between basic emotion encoding and age, the lack of a significant effect in our study does not necessarily indicate an absence of developmental change but may instead be due to the limited sample size.

In summary, our study provides novel insights into the neural mechanisms underlying the development of facial expression processing. As with any study, several limitations should be acknowledged. First, most electrode coverage in our study was in the left hemisphere, potentially limiting our understanding of lateralization effects. Second, while our results provide insights into the role of the DLPFC during development, we were unable to examine other prefrontal regions, such as the orbitofrontal cortex (OFC) and anterior cingulate cortex (ACC), and their unique contributions to emotion processing. Lastly, due to sample size constraints, we were unable to divide participants into more granular developmental stages, such as early childhood, adolescence, and adulthood, which could provide a more detailed characterization of the neural mechanisms underlying the development of facial expression processing. Future studies using non-invasive methods with more age-diverse samples will be essential for refining our understanding of how facial emotion processing develops across the lifespan.

Acknowledgements

We would like to thank Dr. Julia Berezutskaya for providing the audiovisual film that was used for iEEG data collection. This work was supported by funding from the United States National Institutes of Health (R01-MH127006).

Additional information

Author contributions

X.F. and A.T. performed the data analysis. X.F. and K.R.B. wrote the paper.