Senior Research Scientist (Multimodal Large Language Model)

Senior Research Scientist (Multimodal Large Language Model) - PICO

Location:

San Jose

Team:

Technology

Employment Type:

Regular

Job Code:

A57637

Responsibilities

About the Team

The PICO-MR team is dedicated to pioneering core technologies for intelligent human-computer interaction in MR environments, with a focus on integrating multimodal large language models (MLLMs) and tool-use capabilities to redefine user experiences. Our R&D directions cover cutting-edge fields including multimodal scene understanding, MLLM-based agent systems, tool-augmented MR interaction, 3D environment perception, and AIGC-driven content generation. Within MR scenarios, our work spans MLLM optimization and adaptation for MR, intelligent task execution with tool use, multimodal scene understanding (vision, point clouds, text), AIGC-based scene generation, depth estimation (Mono/Stereo/MVS), 3D environment perception, large-scale 3D scene reconstruction (3DGS, NeRF, etc.), visual localization, and lighting estimation. This work encompasses both fundamental research breakthroughs and industrial-grade solution deployment.

Responsibilities:
1. Lead the R&D of multimodal large language models (MLLMs) tailored for MR scenarios, integrating vision, point clouds, text, and other modalities. This includes model architecture optimization, cross-modal alignment, data construction, evaluation system enhancement, and end-to-end training/inference acceleration.
2. Drive the research and implementation of MLLM tool-use capabilities in MR environments, enabling models to use specialized spatial interaction and spatial computing tools proficiently, support tool calls in both single-turn and multi-turn conversations, and solve complex user tasks through interaction.
3. Address key challenges in long-horizon, multi-turn tool-augmented tasks in MR, such as context memory management, tool selection strategy, and error correction mechanisms.
4. Keep abreast of cutting-edge developments in MLLMs, multimodal intelligence, and tool-use research, and lead the application and deployment of innovative technologies in PICO's MR products.
5. Collaborate with cross-functional teams (including software engineering, product design, and hardware development) to translate research outcomes into practical features that enhance user experience.

Qualifications

Minimum Qualifications
1. Master's or Ph.D. degree in Computer Science, Electrical Engineering, Machine Learning, Artificial Intelligence, or a related quantitative field.
2. Expertise in multimodal large model pre-training, post-training, fine-tuning, or cross-modal fusion, with hands-on experience in model optimization, training workflow design, and performance tuning.
3. Proven research experience in LLM tool use, reinforcement learning, LLM agents, or interactive learning, with a deep understanding of single-turn and multi-turn interaction mechanisms.
4. Proficiency in core 2D/3D computer vision tasks, including detection, segmentation, depth estimation, image matching, and 3D scene perception.
5. Strong programming skills in Python and C++, with experience developing large-scale models using mainstream deep learning frameworks (PyTorch/TensorFlow).
6. Excellent problem-solving and independent research abilities, with the capacity to address complex technical challenges in integrating MLLM tool use with MR.

Preferred Qualifications
1. Publications at AI/ML/CV conferences (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ACL, EMNLP) focusing on multimodal large models, LLM tool use, or agent systems.
2. Hands-on experience building large-scale MLLM training pipelines, tool-use evaluation systems, or multimodal agent platforms.
3. Familiarity with MR/AR/VR technologies, spatial computing, or 3D scene reconstruction (3DGS, NeRF, etc.) is a strong plus.
4. Experience addressing long-horizon reasoning or asynchronous agent behavior challenges is highly valued.
5. Award winners in competitions such as ACM-ICPC, NOI/IOI, TopCoder, or AI/ML contests (e.g., Kaggle) are preferred.
6. Strong collaboration and communication skills, with the ability to lead research initiatives and drive cross-team technical alignment.

Job Information

【For Pay Transparency】Compensation Description (Annually)

The base salary range for this position in the selected city is $212,800 to $450,000 annually.
Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills, competencies, experience, and location. Base pay is one part of the Total Package that is provided to compensate and recognize employees for their work, and this role may be eligible for additional discretionary bonuses/incentives and restricted stock units.
Benefits may vary depending on the nature of employment and the country of the work location. Employees have day-one access to medical, dental, and vision insurance, a 401(k) savings plan with company match, paid parental leave, short-term and long-term disability coverage, life insurance, and wellbeing benefits, among others. Employees also receive 10 paid holidays per year, 10 paid sick days per year, and 17 days of Paid Personal Time (prorated upon hire, with accruals increasing by tenure).
The Company reserves the right to modify or change these benefits programs at any time, with or without notice.
For Los Angeles County (unincorporated) Candidates:
Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state, and local laws, including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. Our company believes that criminal history may have a direct, adverse, and negative relationship with the following job duties, potentially resulting in the withdrawal of the conditional offer of employment:
1. Interacting and occasionally having unsupervised contact with internal/external clients and/or colleagues;
2. Appropriately handling and managing confidential information, including proprietary and trade secret information, and access to information technology systems; and
3. Exercising sound judgment.

About Pico

Founded in April 2015, Pico is a VR company committed to developing immersive and interactive VR experiences for people around the world. Pico also provides tailor-made solutions for our enterprise clients in the fields of education and healthcare.

Why Join ByteDance

Inspiring creativity is at the core of ByteDance's mission. Our innovative products are built to help people authentically express themselves, discover, and connect, and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity, and enrich life: a mission we work towards every day.
As ByteDancers, we strive to do great things with great people. We lead with curiosity, humility, and a desire to make an impact in a rapidly growing tech company. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our Company, and our users. When we create and grow together, the possibilities are limitless. Join us.
Diversity & Inclusion
ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

Reasonable Accommodation

ByteDance is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs, or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us at https://tinyurl.com/RA-request