News
Beyond a Single Extractor: Re-thinking HTML-to-Text Extraction for LLM Pretraining
2+ hour, 14+ min ago (310+ words) Apple Machine Learning Research. One of the first pre-processing steps for constructing web-scale LLM pretraining datasets involves extracting text from HTML....
Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
6+ day, 21+ hour ago (319+ words) Apple Machine Learning Research. Authors: Zhen Yang, Zi-Yi Dou, Di Feng, Forrest Huang, Anh Nguyen, Keen You, Omar Attia, Yuhao Yang, Michael Feng, Haotian…
Asynchronous Verified Semantic Caching for Tiered LLM Architectures
1+ week, 1+ day ago (354+ words) Apple Machine Learning Research. Authors: Asmit Kumar Singh, Haozhe Wang, Laxmi Naga Santosh Attaluri, Tak Chiam, Weihua Zhu. Large language models (LLMs) now sit in the…
Mapping the Design Space of User Experience for Computer Use Agents
1+ week, 4+ day ago (337+ words) Apple Machine Learning Research. Authors: Ruijia Cheng, Jenny T. Liang, Eldon Schoop, Jeffrey Nichols. Large language model (LLM)-based computer use agents…