
Responsibilities
About the Team
Seed Global Data is a team focused on producing international data for LLMs. For the training of large models, data is the lifeline of model quality, and the Global Data team works closely with technical, product, and operations teams to ensure effective data production strategies and execution management.

As a project intern, you will have the opportunity to engage in impactful short-term projects that give you a glimpse of real-world professional experience. You will gain practical skills through on-the-job learning in a fast-paced work environment and develop a deeper understanding of your career interests.

Applications will be reviewed on a rolling basis, so we encourage you to apply early. Successful candidates must be able to commit to an internship period of at least 3 months.

Your Role Will Involve:
1. Support multiple coding-focused LLM training projects, ensuring that timelines, quality standards, and objectives are met. Identify risks and propose corrective actions as required to keep projects on track. You may also support general or reasoning-related LLM projects.
2. Establish and maintain strong relationships with product managers, project owners, researchers, and other supporting project members. Communicate project updates and concerns promptly so that project owners or project managers can follow up quickly.
3. Develop scripts for diverse project-related purposes, such as automating key processes, conducting data analysis, and converting file types and formats to meet the specific requirements of various platforms. Support general annotation-operations improvement initiatives across multiple data domains. Create and maintain technical guidelines and casebooks to support consistent, high-quality data production by external parties.
4. Analyze annotation quality, model performance, and dataset coverage through statistical, visual, and programmatic methods. Use tools such as Python (Pandas, NumPy, Matplotlib) and SQL to generate actionable insights, monitor the health of the data pipeline, and support model training operations. Collaborate with model trainers and researchers to inform training strategies and guide data-centric iterative improvements.
Qualifications
Minimum Qualifications:
1. Currently pursuing a Bachelor's or Master's degree in Computer Science or a related technical field, or equivalent practical experience.
2. Experience in project or operations management roles on software engineering teams, with strong project management skills to design, manage, and proactively optimize complex workflows, while balancing independent judgment with collaborative teamwork in a fast-paced, project-based environment.
3. Experience with programming languages such as Python, Java, Go, C, or C++, gained through coding projects or technical roles.
4. Problem-solving skills, with the ability to explain technical concepts effectively to non-technical audiences.
5. Deep interest in LLMs and computational thinking, and the ability to adapt to a high-intensity work environment with an objective-driven mindset.
6. Familiarity with, and ability to work with, markup and typesetting languages: HTML, LaTeX, and Markdown.
7. Exceptional proficiency in both English and Mandarin, with the strong written and oral communication skills required to collaborate with internal teams and stakeholders across English- and Mandarin-speaking regions.

Preferred Qualifications:
1. Experience in competitive programming, such as Codeforces or CPC, at the regional or international level.
2. Experience in LLM annotation and evaluation processes, including technical projects with leading AI/LLM companies.
3. Experience with codebases and an understanding of software development processes, coding best practices, and version control systems (e.g., Git). Familiarity with full-stack concepts, including front-end interfaces, back-end logic, and database integration.
4. Enthusiasm for learning, engaging with diverse technical case studies, and working with global teams, as well as comfort using and developing simple technology tools that enhance project efficiency.
By submitting an application for this role, you accept and agree to our global applicant privacy policy, which may be accessed here: https://jobs.bytedance.com/en/legal/privacy. If you have any questions, please reach out to us at apac-earlycareers@bytedance.com.
Job Information
About Doubao (Seed)
Founded in 2023, the ByteDance Doubao (Seed) Team is dedicated to pioneering advanced AI foundation models. Our goal is to lead in cutting-edge research and drive technological and societal advancements.
With a strong commitment to AI, our research areas span deep learning, reinforcement learning, language, vision, audio, AI infrastructure, and AI safety. Our team has labs and research positions across China, Singapore, and the US.
Why Join ByteDance
Inspiring creativity is at the core of ByteDance's mission. Our innovative products are built to help people authentically express themselves, discover, and connect, and our global, diverse teams make that possible. Together, we create value for our communities, inspire creativity, and enrich life: a mission we work towards every day.
As ByteDancers, we strive to do great things with great people. We lead with curiosity, humility, and a desire to make an impact in a rapidly growing tech company. By constantly iterating and fostering an "Always Day 1" mindset, we achieve meaningful breakthroughs for ourselves, our Company, and our users. When we create and grow together, the possibilities are limitless. Join us.
Diversity & Inclusion
ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At ByteDance, our mission is to inspire creativity and enrich life. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.