Mobile and wearable devices can provide continuous, granular, and longitudinal data on an individual’s physiological state and behaviors. Examples include step counts, raw sensor measurements such as heart rate variability, sleep duration, and more. Individuals can use these data for personal health monitoring as well as to motivate healthy behavior. This represents an exciting area in which generative AI models can be used to provide additional personalized insights and recommendations to an individual to help them reach their health goals. To do so, however, models must be able to reason about personal health data comprising complex time series and sporadic information (like workout logs), contextualize these data using relevant personal health domain knowledge, and produce personalized interpretations and recommendations grounded in an individual’s health context.

Consider a common health query, “How can I get better sleep?” Though a seemingly straightforward question, arriving at a response that is customized to the individual involves performing a series of complex analytical steps, such as: checking data availability, calculating average sleep duration, identifying sleep pattern anomalies over a period of time, contextualizing these findings within the individual's broader health, integrating knowledge of population norms of sleep, and offering tailored sleep improvement recommendations.

Recently, we showed how building on Gemini models’ advanced capabilities in multimodality and long-context reasoning could enable state-of-the-art performance on a diverse set of medical tasks. However, such tasks rarely make use of the complex data sourced from mobile and wearable devices that are relevant for personal health monitoring. Building on the next-generation capabilities of Gemini models, we present research that highlights two complementary approaches to providing accurate personal health and wellness information with LLMs.
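The analytical steps behind the sleep query can be sketched in code. The following is a minimal toy illustration, not the method from either paper: the data, thresholds, and function name are all hypothetical, and a real system would operate on far richer sensor streams.

```python
from statistics import mean, stdev

# Hypothetical nightly sleep durations (hours) from a wearable, oldest first.
sleep_hours = [7.1, 6.8, 7.4, 5.2, 7.0, 6.9, 4.8, 7.2, 6.7, 5.0, 7.3, 6.6, 7.0, 5.1]

def analyze_sleep(durations, min_nights=7, z_threshold=1.5):
    """Toy version of the analytical steps: availability check, average
    duration, anomaly flagging, and comparison to a population norm."""
    # Step 1: check data availability.
    if len(durations) < min_nights:
        return {"status": "insufficient_data"}
    # Step 2: calculate average sleep duration.
    avg = mean(durations)
    sd = stdev(durations)
    # Step 3: flag nights that deviate far from the personal baseline.
    anomalies = [i for i, h in enumerate(durations)
                 if abs(h - avg) > z_threshold * sd]
    # Step 4: contextualize against a commonly cited adult norm (7-9 hours).
    return {"status": "ok",
            "avg_hours": round(avg, 2),
            "anomalous_nights": anomalies,
            "within_adult_norm": 7.0 <= avg <= 9.0}

print(analyze_sleep(sleep_hours))
```

The remaining steps, integrating broader health context and producing tailored recommendations, are exactly where LLM reasoning is needed beyond what scripted analysis like this can offer.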
The first paper, “Towards a Personal Health Large Language Model”, demonstrates that LLMs fine-tuned on expert analysis and self-reported outcomes can successfully contextualize physiological data for personal health tasks. The second paper, “Transforming Wearable Data into Personal Health Insights Using Large Language Model Agents”, emphasizes the value of code generation and agent-based workflows for accurately analyzing behavioral health data through natural language queries. We believe that bringing these ideas together, enabling interactive computation and grounded reasoning over personal health data, will be critical for developing truly personalized health assistants. Alongside these two papers, we curate new benchmark datasets spanning a range of personal health tasks, which help evaluate the effectiveness of these models.
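The code-generation idea behind the agent-based workflow can be illustrated with a minimal, hypothetical loop: a planner emits Python that is executed against the user's wearable data, and the computed result grounds the natural language answer. Here the "planner" is a hard-coded stub standing in for an LLM; the data, routing logic, and function names are illustrative and not taken from the paper.

```python
# Hypothetical step counts pulled from a wearable device.
daily_steps = {"2024-05-01": 8200, "2024-05-02": 10400, "2024-05-03": 6100,
               "2024-05-04": 12050, "2024-05-05": 9300}

def generate_code(query):
    """Stub planner: map a query to analysis code. A real agent would
    prompt an LLM to write this program from the query and a data schema."""
    if "average" in query and "steps" in query:
        return "result = sum(daily_steps.values()) / len(daily_steps)"
    raise ValueError("query not supported by this toy planner")

def run_agent(query):
    code = generate_code(query)            # code-generation step
    scope = {"daily_steps": daily_steps}
    exec(code, scope)                      # a real system would sandbox this
    return f"Your average was {scope['result']:.0f} steps per day."

print(run_agent("What was my average steps this week?"))
```

The appeal of this pattern is that numeric claims in the final answer come from executed code over the user's actual data rather than from the model's token-level arithmetic.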