Transcription & Document Parsing, Simplified.
A FastAPI service that handles speech-to-text, document parsing, and OCR in one place. Upload any supported file and get the extracted text back as JSON.
Core Features
Everything you need to process content from any source.
Audio/Video Transcription
High-quality speech-to-text using the blazing-fast Faster-Whisper engine.
Document Parsing
Extracts text from `.pdf`, `.docx`, `.xlsx`, `.csv`, and `.txt` files seamlessly.
Image OCR
Reads text from scanned images like `.png` or `.jpg` using Tesseract.
Supported File Types
A wide range of formats is supported out of the box.
Category | Formats |
---|---|
Audio / Video | `.mp3`, `.wav`, `.mp4`, `.m4a` |
Documents | `.pdf`, `.docx`, `.xlsx`, `.csv`, `.txt` |
Images (OCR) | `.png`, `.jpg`, `.jpeg` |
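A client can check a file's extension against this list before uploading. The snippet below is a minimal sketch of such a pre-check; the category keys and the `classify` helper are illustrative names, and this is not necessarily how the service itself dispatches files.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the supported-formats table.
# The category keys are illustrative names, not identifiers used by the API.
SUPPORTED_FORMATS = {
    "audio_video": {".mp3", ".wav", ".mp4", ".m4a"},
    "document": {".pdf", ".docx", ".xlsx", ".csv", ".txt"},
    "image": {".png", ".jpg", ".jpeg"},
}


def classify(filename: str) -> Optional[str]:
    """Return the category for a filename, or None if its extension is not listed."""
    suffix = Path(filename).suffix.lower()  # compare case-insensitively
    for category, extensions in SUPPORTED_FORMATS.items():
        if suffix in extensions:
            return category
    return None


if __name__ == "__main__":
    for name in ("meeting.MP3", "report.pdf", "scan.jpeg", "archive.zip"):
        print(name, "->", classify(name))
```

Rejecting unsupported extensions on the client side avoids an upload that the server would likely refuse anyway.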
Deploy Your Instance
Get the API running in minutes with Docker.
Instructions
Follow these steps to build and run the Docker container on your local machine. This is ideal for development and testing.
# 1. Clone the repository
git clone https://github.com/shahzaibtkturners/transcription-api.git
cd transcription-api
# 2. Build the Docker image
docker build -t transcription-api .
# 3. Run the container
docker run -d -p 8000:8000 --name transcription-service transcription-api
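The container may need a moment before it accepts requests. One way to wait for it is to poll the `/docs` page that the service exposes; the sketch below assumes the port mapping from the `docker run` command above, and `wait_until_ready` is a hypothetical helper, not part of this project.

```python
import time
from urllib import request


def wait_until_ready(
    url: str = "http://localhost:8000/docs",
    timeout: float = 30.0,
    interval: float = 1.0,
) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections, and socket timeouts
            # are all OSError subclasses; treat them as "not ready yet".
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    print("ready" if wait_until_ready() else "service did not come up in time")
```

If the check keeps failing, `docker logs transcription-service` shows the container output.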
API Usage
Once running, send a POST request to the `/upload/` endpoint.
Example Request
Use `curl` or any HTTP client to send a `multipart/form-data` request with your file.
curl -X POST "http://localhost:8000/upload/" \
-H "accept: application/json" \
-H "Content-Type: multipart/form-data" \
-F "file=@/path/to/your/sample.pdf"
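The same request can be sketched in Python using only the standard library. The helper names `build_multipart` and `upload` are illustrative and not part of this project; the form field name `file`, the `/upload/` path, and the `localhost:8000` address are taken from the `curl` example above.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from typing import Tuple
from urllib import request


def build_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type header value)."""
    boundary = uuid.uuid4().hex
    # Guess the part's media type from the extension, with a generic fallback.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload(path: str, base_url: str = "http://localhost:8000") -> dict:
    """POST the file at `path` to /upload/ and return the decoded JSON response."""
    file_path = Path(path)
    body, content_type = build_multipart("file", file_path.name, file_path.read_bytes())
    req = request.Request(
        f"{base_url}/upload/",
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With the container running, `upload("/path/to/your/sample.pdf")` should return the parsed JSON body as a dictionary.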
Example Response
You'll receive a JSON response containing the extracted text and file metadata.
{
  "filename": "sample.pdf",
  "content_type": "application/pdf",
  "text": "This is the extracted text content..."
}
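On the client side, the response can be turned into a typed object. This is a minimal sketch that assumes the response always carries the three fields shown in the example; `ExtractionResult` and `parse_response` are hypothetical names, not part of the API.

```python
import json
from dataclasses import dataclass


@dataclass
class ExtractionResult:
    """The three fields shown in the example response."""
    filename: str
    content_type: str
    text: str


def parse_response(payload: str) -> ExtractionResult:
    """Parse a JSON response body and fail loudly if an expected field is absent."""
    data = json.loads(payload)
    missing = [key for key in ("filename", "content_type", "text") if key not in data]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return ExtractionResult(data["filename"], data["content_type"], data["text"])


if __name__ == "__main__":
    sample = (
        '{"filename": "sample.pdf", "content_type": "application/pdf", '
        '"text": "This is the extracted text content..."}'
    )
    result = parse_response(sample)
    print(result.filename, result.content_type, len(result.text))
```

Checking for missing fields up front turns an unexpected response, such as an error payload, into a clear exception instead of a `KeyError` later on.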
The interactive API documentation is also available at `/docs` on your server.
Contribute & Support
This project is open-source. We welcome contributions and support from the community!