Thomas Hikaru Clark’s Page
Hi! I’m Thomas :)

Interests
I am interested in the principles underlying human language and cognition, and how the tools of NLP, information theory, and probabilistic inference can shed light on these principles. My primary line of research involves building algorithmic models of “noisy-channel language processing”, explaining how humans are able to extract meaning from anomalous or erroneous utterances in a cognitively plausible and resource-rational way.
I’ve also done projects investigating several questions. One is what factors influence how speakers choose between two possible ways of saying the same thing, using the Russian comparative alternation as a case study via both corpus analysis and behavioral experimentation. Others are what makes some sentences more memorable than others, what might explain why languages have the word orders that they do, and how speakers modulate the way they talk to emphasize surprising words in conversation.
My other interests include urban design, AI ethics, language learning, and science communication.
Curriculum Vitae
Thomas Hikaru Clark CV
Publications and Presentations
Publications
- Clark, T. H., Poliak, M., Regev, T., Haskins, A. J., Gibson, E., & Robertson, C. (2025, preprint). The relationship between surprisal, prosody, and backchannels in conversation reflects intelligibility-oriented pressures. PsyArXiv.
- Clark, T. H., Meister, C., Pimentel, T., Hahn, M., Cotterell, R., Futrell, R., & Levy, R. (2023). A Cross-Linguistic Pressure for Uniform Information Density in Word Order. Transactions of the Association for Computational Linguistics.
- Clark, T., Wilcox, E. G., Gibson, E., & Levy, R. (2022). Evidence for Availability Effects on Speaker Choice in the Russian Comparative Alternation. Proceedings of the Annual Meeting of the Cognitive Science Society.
- Meister, C., Pimentel, T., Clark, T. H., Cotterell, R., & Levy, R. (2022). Analyzing Wrap-Up Effects through an Information-Theoretic Lens. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics.
- Clark, T., Conforti, C., Liu, F., Meng, Z., Shareghi, E., & Collier, N. (2021). Integrating Transformers and Knowledge Graphs for Twitter Stance Detection. Proceedings of the Seventh Workshop on Noisy User-Generated Text (W-NUT 2021).
- Katsos, N., Banerjee, E., Chang, Y. J., Clark, T., Cowan, J., Williamson, T. R., & Witkowska, Z. (2021). Experimental Pragmatics: The Making of a Cognitive Science. Journal of Pragmatics.
- Meng, Z., Liu, F., Clark, T. H., Shareghi, E., & Collier, N. (2021). Mixture-of-partitions: Infusing large biomedical knowledge graphs into BERT. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.
Oral Presentations
- “Evidence for an Availability-Based Production Account of the Russian Comparative Alternation via Corpus Analysis”, 35th Annual Conference on Human Sentence Processing.
Poster Presentations
- “A Model of Approximate and Incremental Noisy-Channel Language Processing”, 47th Annual Meeting of the Cognitive Science Society (upcoming).
- “Modeling Noisy-Channel Language Processing with Incremental and Approximate Probabilistic Inference”, 38th Annual Conference on Human Sentence Processing.
- “Meaning distinctiveness predicts sentence memorability”, 38th Annual Conference on Human Sentence Processing.
- “Inferring Errors and Intended Meanings with a Generative Model of Language Production in Aphasia”, 46th Annual Meeting of the Cognitive Science Society.
- “A Cross-Linguistic Pressure for Uniform Information Density in Word Order”, EMNLP 2023.
- “Context-sensitive features predict sentence memorability in the absence of memorable words”, 45th Annual Meeting of the Cognitive Science Society.
- “Word Frequency Predicts Word Errors in Stroke-Induced Aphasia”, 36th Annual Conference on Human Sentence Processing.
- “Evidence for Syntax-Lexicon Trade-off in Stroke-Induced Aphasia Patients”, 36th Annual Conference on Human Sentence Processing.
- “A Cross-Linguistic Pressure for Uniform Information Density in Word Order”, 36th Annual Conference on Human Sentence Processing.
- “Evidence for Availability Effects on Speaker Choice in the Russian Comparative Alternation”, 44th Annual Meeting of the Cognitive Science Society.
Education
I am currently a PhD candidate in the Department of Brain and Cognitive Sciences at MIT, where I am a member of TedLab and the Computational Psycholinguistics Lab. I received my undergraduate degree in Computer Science from Princeton University, where I also earned certificates in Linguistics and Russian Language & Culture. After college, I earned an M.Ed. from the University of Notre Dame as part of the 25th cohort of the ACE Teaching Fellows, teaching high school Computer Science and Math in Jacksonville, FL. I then returned to grad school for an MPhil in Theoretical and Applied Linguistics at the University of Cambridge, where I was involved in the Language Technology Lab, before continuing on to a PhD in a different city named Cambridge.
Other Experience
Before starting my PhD, I interned at Vimeo on the Machine Learning Research team and at IBM Watson on the Speech to Text team. Earlier, I did iOS development for the Paideia Institute in Rome, Italy, and took part in volunteer service projects in Russia and Japan. During the summer of 2021, I was an instructor for a summer startup camp at the Cambridge Center for International Research, teaching machine learning and data science principles to students from around the world.