Bengali Linguist (Contract)

Job type: Contract · Department: Data Operations · Work type: Remote

Palo Alto, California, United States

About Poseidon

Poseidon is building the data infrastructure the next generation of AI will depend on. Foundation models are not limited by compute. They are bottlenecked by rare, high-quality, IP-safe data that actually improves performance: the long-tail, edge-case, multi-modal datasets that cannot be scraped or synthetically generated.

We are creating a data layer connecting AI companies with the datasets they need. Poseidon is the infrastructure that makes scalable, compliant, demand-driven data sourcing possible.

Backed by a16z, we are early, moving fast, and looking for mission-driven teammates to shape this category.

The Role

We are looking for a Bengali Linguist to help build a large-scale Bengali audio dataset for AI training. This is a contract role working closely with Poseidon's data operations team to ensure the quality, accuracy, and linguistic integrity of our Bengali audio corpus.

You will be a core part of getting this dataset across the finish line: reviewing transcripts, flagging errors, and serving as the subject-matter expert our team relies on when Bengali-specific questions arise. This role requires someone who is deeply fluent in Bengali, comfortable with ambiguity, and able to operate with a high degree of independence.

What You'll Do

Review and quality-check Bengali audio transcripts for accuracy, completeness, and adherence to transcription guidelines
Listen to and evaluate Bengali audio files for speech quality, naturalness, speaker diversity, and dialect consistency
Identify and flag linguistic issues including mispronunciations, code-switching, dialect variation, transliteration errors, and transcription inconsistencies
Develop and refine transcription guidelines and annotation standards for Bengali
Serve as the internal linguistic authority for Bengali, advising the team on dialect considerations, script conventions, and language-specific edge cases
Collaborate with data operations and engineering teams to scope and define quality benchmarks
Support onboarding and quality calibration of Bengali transcriptionists and annotators as needed

What We're Looking For

Native or near-native fluency in spoken and written Bengali (Standard/Kolkata Bengali)
Fluency in English
Strong familiarity with Bengali script, transliteration conventions, and dialectal variation between West Bengal and Bangladesh
Prior experience in transcription, annotation, translation, or linguistic quality review
Meticulous attention to detail with the ability to maintain consistency across large volumes of audio and text
Comfortable working asynchronously and independently in a fast-moving remote environment
Based in the United States

Nice to Have

Experience with ASR (automatic speech recognition) datasets or TTS (text-to-speech) training data
Familiarity with AI data pipelines, labeling tools, or annotation platforms
Academic or professional background in linguistics, South Asian languages, or a related field
Exposure to Indic language NLP or speech AI projects

Why Join Now

This is an opportunity to play a direct role in building one of the most important Bengali speech datasets in existence, one that will meaningfully improve how AI understands and speaks one of the world's most widely spoken languages. You'll work with a lean, mission-driven team at a venture-backed company at the center of the AI data economy.