My research lies at the intersection of artificial intelligence and materials science, focusing on developing innovative methods and agents for automated materials discovery. I have contributed to research on three main topics: (1) generative models for the inverse design of functional materials, (2) agents based on large language models for synthesis prediction of crystal structures and scientific formula discovery, and (3) interpretable machine learning for catalyst design.

I have published 16 peer-reviewed SCI papers with an h-index of 11, receiving total . Among these, I am the first author on 7 papers, including 3 in Nature Communications, 1 in Materials Horizons, 1 in Advanced Functional Materials, and 1 in The Journal of Physical Chemistry Letters. Additionally, I have three important first-authored manuscripts currently under review, available as preprints.

My interdisciplinary work has led to the development of several innovative frameworks including MAGECS (Material Generation with Efficient Global Chemical Space Search) for guiding generative models to globally explore chemical space, SSAGEN (Stability and Symmetry-Assured GENerative framework) for crystal structure generation with inherent stability and symmetry, CSLLM (Crystal Synthesis Large Language Models) for accurate synthesizability prediction of crystal structures, and LLM-Feynman for universal scientific formula and theory discovery. My research has substantially improved discovery efficiency and success rates for novel functional materials.

🎖 Honors and Awards

2020.01 Top Prize in “HUAWEI Cup” The 16th Chinese Post-Graduate Mathematical Contest in Modeling, Award ratio: 1.3% (188 teams out of 14,014)

📖 Education

2021.09 - 2025.06, Ph.D. in Physics, School of Physics, Southeast University, Nanjing
2018.09 - 2021.06, Master in Physics, Soochow Institute for Energy and Materials InnovationS, Soochow University, Suzhou
2014.09 - 2018.06, Bachelor in New Energy Science and Technology, Nanjing Tech University, Nanjing
2011.09 - 2014.06, High School, Suzhou No.10 High School, Suzhou
2008.09 - 2011.06, Middle School, Suzhou Zhenhua Middle School, Suzhou

💬 Talks

2024.06, Oral Presentation at the 34th Academic Meeting of the Chinese Chemical Society, Guangzhou, China [Photo] [Abstract]
2025.03, Oral Presentation at the American Chemical Society Spring 2025 Meeting, San Diego, USA [Photo] [Abstract]

💼 Work Experience

2025.06 - present, Junior Research Fellow, Department of Advanced Materials, Suzhou National Laboratory, Suzhou, China
2024.07 - 2024.09, Research Intern, Shanghai Artificial Intelligence Laboratory, Shanghai, China
2019.10 - 2020.01, Research Intern, National Institute for Materials Science (NIMS), Tsukuba, Japan

📝 Key Research Contributions

Nat. Commun. 6, 1, 6530 (2025)

Accurate prediction of synthesizability and precursors of 3D crystal structures via large language models

Zhilong Song, Shuaihua Lu, Minggang Ju, Qionghua Zhou, Jinlan Wang

Accessing the synthesizability of crystal structures is crucial for transforming theoretical materials into real-world applications. Nevertheless, there is a significant gap between actual synthesizability and thermodynamic or kinetic stability commonly used to screen synthesizable structures. Herein, we develop the Crystal Synthesis Large Language Models (CSLLM) framework, which utilizes three specialized LLMs to predict the synthesizability of arbitrary 3D crystal structures, possible synthetic methods, and suitable precursors, respectively. We construct a comprehensive dataset including synthesizable/non-synthesizable crystal structures and develop an efficient text representation for crystal structures to fine-tune LLMs. Our Synthesizability LLM achieves state-of-the-art accuracy (98.6%), significantly outperforming traditional synthesizability screening based on thermodynamic and kinetic stability. Its outstanding generalization ability is further demonstrated in experimental structures with complexity considerably exceeding that of the training data. Furthermore, both the Method and Precursor LLMs exceed 90% accuracy in classifying possible synthetic methods and identifying solid-state synthetic precursors for common binary and ternary compounds, respectively. Leveraging CSLLM, tens of thousands of synthesizable theoretical structures are successfully identified, with their 23 key properties predicted using accurate graph neural network models.

Nat. Commun. 6, 1, 1053 (2025)

Inverse design of promising electrocatalysts for CO₂ reduction via generative models and bird swarm algorithm

Zhilong Song, Linfeng Fan, Shuaihua Lu, Chongyi Ling, Qionghua Zhou, Jinlan Wang

Directly generating material structures with optimal properties is a long-standing goal in material design. Traditional generative models often struggle to efficiently explore the global chemical space, limiting their utility to localized space. Here, we present a framework named Material Generation with Efficient Global Chemical Space Search (MAGECS) that addresses this challenge by integrating the bird swarm algorithm and supervised graph neural networks, enabling effective navigation of generative models in the immense chemical space towards materials with target properties. Applied to the design of alloy electrocatalysts for CO₂ reduction (CO₂RR), MAGECS generates over 250,000 structures, achieving a 2.5-fold increase in high-activity structures (35%) compared to random generation. Five predicted alloys— CuAl, AlPd, Sn₂Pd₅, Sn₉Pd₇, and CuAlSe₂ are synthesized and characterized, with two showing around 90% Faraday efficiency for CO₂RR. This work highlights the potential of MAGECS to revolutionize functional material development, paving the way for fully automated, artificial intelligence-driven material design.

arXiv: 2507.19307

Stability and Symmetry-Assured Crystal Structure Generation for Inverse Design of Photocatalysts in Water Splitting

Zhilong Song, Chongyi Ling, Qiang Li, Qionghua Zhou, Jinlan Wang

Generative models are revolutionizing materials discovery by enabling inverse design-direct generation of structures from desired properties. However, existing approaches often struggle to ensure inherent stability and symmetry while precisely generating structures with target compositions, space groups, and lattices without fine-tuning. Here, we present SSAGEN (Stability and Symmetry-Assured GENerative framework), which overcomes these limitations by decoupling structure generation into two distinct stages: crystal information (lattice, composition, and space group) generation and coordinate optimization. SSAGEN first generates diverse yet physically plausible crystal information, then derives stable and metastable atomic positions through universal machine learning potentials, combined global and local optimization with symmetry and Wyckoff position constraints, and dynamically refined search spaces. Compared to prior generative models such as CDVAE, SSAGEN improves the thermodynamic and kinetic stability of generated structures by 148% and 180%, respectively, while inherently satisfying target compositions, space groups, and lattices. Applied to photocatalytic water splitting (PWS), SSAGEN generates 200,000 structures-81.2% novel-with 3,318 meeting all stability and band gap criteria. Density functional theory (DFT) validation confirms 95.6% structures satisfy PWS requirements, with 24 optimal candidates identified through comprehensive screening based on electronic structure, thermodynamic, kinetic, and aqueous stability criteria. SSAGEN not only precisely generates materials with desired crystal information but also ensures inherent stability and symmetry, establishing a new paradigm for targeted inverse design of functional materials.

arXiv: 2503.06512

LLM-Feynman: Leveraging Large Language Models for Universal Scientific Formula and Theory Discovery

Zhilong Song, Minggang Ju, Chunjin Ren, Qiang Li, Chongyi Li, Qionghua Zhou, Jinlan Wang

Distilling underlying principles from data has historically driven scientific breakthroughs. However, conventional data‐driven machine learning often produces complex models that lack interpretability and generalization due to insufficient domain expertise. Here, we present LLM-Feynman, a novel agent that leverages large language models (LLMs) alongside systematic optimization to derive concise, interpretable formulas from data and domain knowledge. Our method integrates automated feature engineering, LLM-guided symbolic regression with self-evaluation, and Monte Carlo tree search to reduce LLM hallucination, thereby enhancing formula discovery. The embedding of domain knowledge simplifies the formula, while self-evaluation based on this knowledge further minimizes prediction errors, surpassing conventional symbolic regression in accuracy and interpretability. Validation on datasets from Feynman physics lectures confirms that LLM-Feynman can rediscover over 90% real physical formulas. Moreover, when applied to four key materials science tasks – from classifying the synthesizability of 2D and perovskite structures to predicting ionic conductivity in lithium solid-state electrolytes and GW bandgaps in 2D materials – LLM-Feynman consistently yields interpretable formula with accuracy exceeding 90% and R2 values above 0.8. By transcending mere data fitting through the integration of deep domain knowledge, LLM-Feynman establishes a new paradigm for the automated discovery of generalizable scientific formula and theory across disciplines.

arXiv: 2407.06489

T2MAT (text-to-materials): A universal agent for generating material structures with goal properties from a single sentence

Zhilong Song, Shuaihua Lu, Qionghua Zhou, Jinlan Wang

Artificial Intelligence-Generated Content (AIGC)—content autonomously produced by AI systems without human intervention—has significantly boosted efficiency across various fields. However, AIGC in material science faces challenges in efficiently discovering novel materials that surpass existing databases, while ensuring the invariance and stability of crystal structures. To address these challenges, we develop T2MAT (text-to-material), an end-to-end agent that transforms user-input text into the inverse design of novel material structures with target properties beyond existing database, enabled by comprehensive exploration of chemical space and fully automated first-principles validation. Furthermore, we propose CGTNet (Crystal Graph Transformer NETwork), a graph neural network specifically designed to capture long-range interactions, which dramatically improves the accuracy and data efficiency of property predictions and thereby strengthens the reliability of inverse design. Through these contributions, T2MAT reduces the reliance on human expertise and accelerates the discovery of high-performance functional materials, paving the way for truly autonomous material design.

Nat. Commun. 11, 1, 3513 (2020)

Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts

Baicheng Weng#, Zhilong Song#, Rilong Zhu, Qingyu Yan, Qingde Sun, Corey G Grice, Yanfa Yan, Wan-Jian Yin

Symbolic regression (SR) is an approach of interpretable machine learning for building mathematical formulas that best fit certain datasets. In this work, SR is used to guide the design of new oxide perovskite catalysts with improved oxygen evolution reaction (OER) activities. A simple descriptor, μ/t, where μ and t are the octahedral and tolerance factors, respectively, is identified, which accelerates the discovery of a series of new oxide perovskite catalysts with improved OER activity. We successfully synthesise five new oxide perovskites and characterise their OER activities. Remarkably, four of them, Cs_0.4La_0.6Mn_0.25Co_0.75O₃, Cs_0.3La_0.7NiO₃, SrNi_0.75Co_0.25O₃, and Sr_0.25Ba_0.75NiO₃, are among the oxide perovskite catalysts with the highest intrinsic activities. Our results demonstrate the potential of SR for accelerating the data-driven design and discovery of new materials with improved properties.

Mater. Horiz. 10, 5, 1651-1660 (2023)

Distilling universal activity descriptors for perovskite catalysts from multiple data sources via multi-task symbolic regression

Zhilong Song, Xiao Wang, Fangting Liu, Qionghua Zhou, Wan-Jian Yin, Hao Wu, Weiqiao Deng, Jinlan Wang

Developing activity descriptors via data-driven machine learning (ML) methods can speed up the design of highly active and low-cost electrocatalysts. Despite the fact that a large amount of activity data for electrocatalysts is stored in the literature, data from different publications are not comparable due to different experimental or computational conditions. In this work, an interpretable ML method, multi-task symbolic regression, was adopted to learn from data in multiple experiments. A universal activity descriptor to evaluate the oxygen evolution reaction (OER) performance of oxide perovskites free of calculations or experiments was constructed and reached high accuracy and generalization ability. Utilizing this descriptor with Bayesian-optimized parameters, a series of compelling double perovskites with excellent OER activity were predicted and further evaluated using first-principles calculations. Finally, the two ML-predicted nickel-based perovskites with the best OER activity were successfully synthesized and characterized experimentally. This work opens a new way to extend machine-learning material design by utilizing multiple data sources.

J. Phys. Chem. Lett. 14, 14, 3594-3601(2023)

Adaptive Design of Alloys for CO₂ Activation and Methanation via Reinforcement Learning Monte Carlo Tree Search Algorithm

Zhilong Song, Qionghua Zhou, Shuaihua Lu, Sae Dieb, Chongyi Ling, Jinlan Wang

Data-driven machine learning (ML) has earned remarkable achievements in accelerating materials design, while it heavily relies on high-quality data acquisition. In this work, we develop an adaptive design framework for searching for optimal materials starting from zero data and with as few DFT calculations as possible. This framework integrates automatic density functional theory (DFT) calculations with an improved Monte Carlo tree search via reinforcement learning algorithm (MCTS-PG). As a successful example, we apply it to rapidly identify the desired alloy catalysts for CO₂ activation and methanation within 200 MCTS-PG steps. To this end, seven alloy surfaces with high theoretical activity and selectivity for CO₂ methanation are screened out and further validated by comprehensive free energy calculations. Our adaptive design framework enables the fast computational exploration of materials with desired properties via minimal DFT calculations.

📚 Publications

Total: 20 papers, Google Scholar Profile | | h-index: 10

First Author Papers (5 published + 3 preprints)

Z. Song, S. Lu, M. Ju, et al. “Accurate prediction of synthesizability and precursors of 3D crystal structures via large language models.” Nature Communications, 2025, 16(1): 6530.
Z. Song, L. Fan, S. Lu, et al. “Inverse design of promising alloys for electrocatalytic CO₂ reduction via generative graph neural networks combined with bird swarm algorithm.” Nature Communications, 2025, 16(1): 1053.
Z. Song, X. Wang, F. Liu, et al. “Distilling universal activity descriptors for perovskite catalysts from multiple data sources via multi-task symbolic regression.” Materials Horizons, 2023, 10(5): 1651-1660.
Z. Song, Q. Zhou, S. Lu, et al. “Adaptive design of alloys for CO₂ activation and methanation via reinforcement learning Monte Carlo tree search algorithm.” The Journal of Physical Chemistry Letters, 2023, 14(14): 3594-3601.
Z. Song, C. Ling, Q. Li, et al. “Stability and Symmetry-Assured Crystal Structure Generation for Inverse Design of Photocatalysts in Water Splitting.” 2025, arXiv:2507.19307. (Under review)
Z. Song, M. Ju, C. Ren, et al. “LLM-Feynman: Leveraging Large Language Models for Universal Scientific Formula and Theory Discovery.” 2025, arXiv:2503.06512. (Under review)
Z. Song, S. Lu, Q. Zhou, et al. “T2MAT (text-to-materials): A universal agent for generating material structures with goal properties from a single sentence.” 2024, arXiv:2407.06489. (Under review)
Z. Song, X. Chen, F. Meng, et al. “Machine learning in materials design: Algorithm and application.” Chinese Physics B, 2020, 29(11): 116103.

Co-first Author Papers (2 papers)

B. Weng#, Z. Song#, R. Zhu, et al. “Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts.” Nature Communications, 2020, 11(1): 3513. (Equal contribution)
M. Wu#, Z. Song#, Y. Cui, et al. “Machine learning-assisted design of nitrogen-rich covalent triazine frameworks photocatalysts.” Advanced Functional Materials, 2024: 2413453. (Equal contribution)

Co-author Papers (9 papers)

Y. Su, Z. Song, W. Zhu, et al. “Visible-light photocatalytic CO₂ reduction using metal-organic framework derived Ni(OH)₂ nanocages: a synergy from multiple light reflection, static charge transfer, and oxygen vacancies.” ACS Catalysis, 2020, 11(1): 345-354.
Z. Sun, Z. Song, W.J. Yin. “Going beyond the d-band center to describe CO₂ activation on single-atom alloys.” Advanced Energy and Sustainability Research, 2022, 3(2): 2100152.
S. Dieb, Z. Song, W.J. Yin, et al. “Optimization of depth-graded multilayer structure for x-ray optics using machine learning.” Journal of Applied Physics, 2020, 128(7): 074901.
X. Chen, Z. Song, S. Lu, et al. “AI-driven materials design: paradigm shift from small data to big data.” SCIENTIA SINICA Chimica, 2025, 55(6): 1648-1659.
W. Lin, F. Liu, Z. Song, et al. “Feature-Extended Descriptor Construction for Prediction of Consecutive Elementary Reaction Energies in Methane Oxidation.” Chemistry of Materials, 2025, 37(12): 4499–4510.
S. Lu, Q. Zhou, X. Chen, Z. Song, et al. “Inverse design with deep generative models: next step in materials discovery.” National Science Review, 2022, 9(8): nwac111.
H. Lin, J. Mao, M. Qin, Z. Song, et al. “Single-phase alkylammonium cesium lead iodide quasi-2D perovskites for color-tunable and spectrum-stable red LED.” Nanoscale, 2019, 11(36): 16907-16918.
H.L. Zhu, H. Lin, Z. Song, et al. “Achieving high-quality Sn–Pb perovskite films on complementary metal-oxide-semiconductor-compatible metal/silicon substrates for efficient imaging array.” ACS Nano, 2019, 13(10): 11800-11808.
X. Gao, Y. Wu, Y. Zhang, X. Chen, Z. Song, et al. “How the spacer influences the stability of 2D perovskites?.” Small Methods, 2024: 2401172.

Conference Papers (1 paper)

Z. Song, X. Chen, S. Dieb, et al. “Design of thermodynamically stable perovskites using machine learning.” The 67th JSAP Spring Meeting 2020, The Japan Society of Applied Physics, 2020: 3632-3632.

💻 Technical Expertise

Theoretical Foundations

Physics & Chemistry: Solid theoretical foundation in solid-state physics, quantum chemistry, density functional theory (DFT), and theoretical catalysis
Computer Science: Deep understanding of interpretable models, generative models， graph neural networks (GNNs) and autoregressive models.

Programming

Languages: Proficient in Python, Shell scripting (Bash, Zsh, Perl), Julia, MATLAB and Fortran
AI-Powered Development: Proficient in AI-assisted development tools like Cursor and Claude code to enhance productivity and achieve vibe coding.

Machine Learning

General Frameworks: Expert in PyTorch and TensorFlow
Materials-Specific ML:
- Generative models (GAN, VAE, Diffusion, Flow) for materials structure generation - model development and innovation
- Graph Neural Networks (GNN) for materials property prediction - model development and innovation
- Universal machine learning potentials - application and development
Large Language Models: Full parameter and LoRA fine-tuning, agent construction, and reinforcement learning fine-tuning
Interpretable ML: Development and application of interpretable ML algorithms such as symbolic regression
Infrastructure: Materials database development and deployment

Computational Materials Science

First-Principles Calculations: Expert in VASP, CP2K, Quantum ESPRESSO for material property calculations
Machine Learning Potentials: Proficient in GPUMD and DeePMD-kit for force field training and development
Specialized Tools: Expert in Pymatgen and ASE (Atomic Simulation Environment) for general-purpose materials data manipulation

🎯 Hobbies

Beyond my research, I’m passionate about various activities that keep me balanced and inspired:

🏎️ Driving

I’m passionate about driving and got my license right after graduating from high school. During my undergraduate years, I completed two long self-driving adventures across China:

Summer 2015: ~5,500 km journey in 15 days, traversing 11 provinces across central and southern China
Summer 2016: ~9,100 km journey in 23 days, traversing 15 provinces across northern, western, and central China

My passion for driving extends to the virtual world - I was once a sim racer with a complete racing simulator setup, including a force feedback steering wheel, shifter, handbrake, and three-pedal system (throttle, brake, and clutch). During high school, I set:

7 track world records in DiRT 3 🏆
20 track world records in WRC 3 🏆

Throughout my undergraduate years, I enjoyed racing simulators like Assetto Corsa and Forza Motorsport for relaxation, and I retired from competitive sim racing after starting my master’s degree. These experiences have taught me to embrace competition fearlessly - even when facing intense competition and pressure, I maintain 100% confidence and give my all to pursue opportunities, regardless of the outcome.

🎱 Cue Sports

I’m an amateur billiards player with a particular fondness for snooker and Chinese eight-ball:

Snooker: Personal best break of 48 points
Chinese Eight-ball: Occasional table clearances

During high school and undergraduate years, I played weekly, though this reduced to at most once a month during my master’s and Ph.D. studies. These sports taught me the importance of precision, strategy, and patience - qualities that serve me well in research.

🏸 Badminton

I enjoy playing badminton at a recreational level. I can manage basic shots like clears, net drops, and smashes, but I can’t do proper footwork and my movement speed is slow. During high school, I played frequently with friends, though this became less regular during my undergraduate years. Throughout my master’s and PhD studies, I still play occasionally as a fun way to stay active and take a break from research.

💡 Thank you for taking the time to learn about my work and interests! I’m always open to discussions, collaborations, and new connections. Feel free to reach out （email: zhilong@seu.edu.cn, zhilong@email.cn, songzl@szlab.ac.cn）– I look forward to hearing from you!

Zhilong Song (宋志龙)