Case Study

Building a Scalable Online Coding and Interview Platform Using Django Rest Framework and Kubernetes

A technology startup, CodeCraft, set out to create an online platform to streamline coding interviews and assessments for software developers. The platform needed to handle user authentication, coding problem design, code execution, assessment management, and media recording for interviews. To build a robust and scalable backend, CodeCraft used Django Rest Framework (DRF) for the API services, containerized each service, and deployed the containers on Kubernetes, with Helm managing releases. The platform was hosted on AWS Elastic Kubernetes Service (EKS), with observability provided by Grafana, Prometheus, and Loki.

Quick Summary

Team Size

6 Members

Duration

3 months

Platform

React Native, Firebase

The Challenges

Complex Microservices Architecture

The platform required multiple backend services, including user authentication, code execution, problem design, and media services, all of which had to communicate reliably with one another.

Scalability and Performance

The system had to handle high traffic, particularly during peak interview seasons, while providing real-time coding execution and video interview recording.

Service Observability and Logging

To keep the platform running reliably, the team needed real-time monitoring of every service and efficient log management for debugging and performance analysis.

Cross-Platform Video Interviewing

The platform required a media service that could record live coding interviews between candidates and interviewers.

Solution: Key Components of the Architecture

Django Rest Framework (DRF) for Backend Services

Authentication Service

Built using DRF, the authentication service managed user registration, login, password resets, and user roles. JSON Web Tokens (JWTs) were used for secure, stateless authentication, so each microservice could verify a user's identity from the token itself rather than from a shared session store.
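
To illustrate what stateless verification means in practice, here is a minimal sketch of issuing and checking an HS256-signed JWT using only the Python standard library. It is not CodeCraft's implementation, which is not shown in this case study; the function names, the claim names beyond the standard "sub", "iat", and "exp", and the 15-minute lifetime are assumptions chosen for illustration. A production DRF service would normally rely on a maintained JWT library instead of hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(secret: bytes, user_id: str, role: str,
                ttl_seconds: int = 900, now: Optional[float] = None) -> str:
    """Create a signed token carrying the user id and role (hypothetical claims)."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "role": role,
               "iat": issued_at, "exp": issued_at + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(secret: bytes, token: str, now: Optional[float] = None) -> dict:
    """Return the claims if the signature and expiry are valid, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims["exp"] <= current:
        raise ValueError("token expired")
    return claims
```

Any service that holds the signing secret can call verify_token without contacting the authentication service, which is what makes the scheme stateless.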

Coding Problem Design Service

This service allowed administrators and content creators to define coding problems, categorize them by difficulty, and provide sample input/output data. DRF’s flexible serialization helped create, update, and manage problem metadata efficiently.

Code Execution Service

The core feature of the platform, this service allowed users to execute code in multiple programming languages in a sandboxed environment. Code execution was handled by isolated Docker containers for security, with the service providing results such as test case success or failure.
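
The case study does not describe how the sandbox containers were configured, so the following is only a sketch of one common approach: launching each submission with "docker run" under tight resource limits. The image names, the /code mount point, the limit values, and both function names are assumptions made for illustration.

```python
import subprocess

# Hypothetical mapping from language to (image, command); not taken from the case study.
RUNTIMES = {
    "python": ("python:3.12-slim", ["python", "/code/main.py"]),
    "javascript": ("node:20-slim", ["node", "/code/main.js"]),
}


def build_sandbox_command(language: str, code_dir: str,
                          memory: str = "256m", cpus: str = "0.5") -> list:
    """Build a docker run command that isolates one submission."""
    image, run_cmd = RUNTIMES[language]
    return [
        "docker", "run", "--rm",
        "-i",                          # keep stdin open so test input can be piped in
        "--network", "none",           # no network access for untrusted code
        "--memory", memory,            # hard memory cap
        "--cpus", cpus,                # CPU share cap
        "--pids-limit", "64",          # guard against fork bombs
        "--read-only",                 # read-only root filesystem
        "--cap-drop", "ALL",           # drop all Linux capabilities
        "-v", f"{code_dir}:/code:ro",  # mount the submission read-only
        image, *run_cmd,
    ]


def run_submission(language: str, code_dir: str, stdin_text: str,
                   timeout_seconds: float = 10.0) -> dict:
    """Run the submission and report its status, stdout, and stderr."""
    command = build_sandbox_command(language, code_dir)
    try:
        completed = subprocess.run(command, input=stdin_text, capture_output=True,
                                   text=True, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # The timeout stops the docker CLI process; a real service would also
        # need to stop the container itself (for example by naming it).
        return {"status": "timeout", "stdout": "", "stderr": ""}
    status = "ok" if completed.returncode == 0 else "runtime_error"
    return {"status": status, "stdout": completed.stdout, "stderr": completed.stderr}
```

Keeping command construction separate from execution makes the isolation flags easy to review and unit-test without a Docker daemon.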

Assessment Service

This service handled creating and managing coding assessments, assigning problems to candidates, and tracking their performance in real time. It also recorded test submissions and graded them automatically.
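
Automatic grading of this kind usually comes down to running a submission against each test case and comparing its output with the expected output. The sketch below shows that idea under stated assumptions: the ProblemCase type, the whitespace normalization rule, and the percentage score are hypothetical and are not described in the case study.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ProblemCase:
    """One test case: the input fed to the program and the output it should print."""
    stdin: str
    expected_stdout: str


def normalize(output: str) -> str:
    """Ignore trailing whitespace on each line and trailing blank lines."""
    lines = [line.rstrip() for line in output.splitlines()]
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines)


def grade(cases: Sequence[ProblemCase], run: Callable[[str], str]) -> dict:
    """Run the submission (via the run callable) on every case and summarize."""
    results = [
        normalize(run(case.stdin)) == normalize(case.expected_stdout)
        for case in cases
    ]
    passed = sum(results)
    total = len(results)
    score = round(100 * passed / total) if total else 0
    return {"passed": passed, "total": total, "score": score, "results": results}
```

Here run stands in for whatever executes the candidate's code, such as a call into the code execution service.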

Media Service (Video Interview Recording)

Leveraging AWS Chime SDK, this service enabled real-time video conferencing between candidates and interviewers. It supported recording and storing video/audio streams for future review. AWS Chime handled the video processing while the backend managed interview session metadata and file storage.

Containerization with Docker

Consistency

Each service (authentication, problem design, code execution, assessment, and media) was containerized using Docker.
Docker allowed the team to manage dependencies for each service independently, ensuring that updates to one service didn’t affect others.

Kubernetes for Container Orchestration

AWS EKS (Elastic Kubernetes Service)

All microservices were deployed on AWS EKS, which provided the scalability and resilience required for the platform. Kubernetes allowed automatic scaling of the code execution and assessment services based on traffic loads, particularly during high-traffic periods like university recruiting seasons.
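
The case study does not say which scaling mechanism was used. Assuming the standard Kubernetes Horizontal Pod Autoscaler, the replica count follows the documented rule desired = ceil(current replicas x current metric / target metric), skipped when the ratio is within a tolerance of 1 (0.1 by default). The sketch below reproduces that arithmetic; the function name and the example numbers are illustrative only.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Apply the Horizontal Pod Autoscaler scaling rule to one metric."""
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, four code execution pods averaging 90% CPU against a 60% target give ceil(4 x 1.5) = 6 pods.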

Helm for Deployment

Helm charts were used to manage Kubernetes deployments. Each service had its own Helm chart, enabling streamlined updates and rollbacks. Helm also facilitated the management of configurations, secrets, and environment variables.

Persistent Storage and Load Balancing

Load balancing was handled through AWS EKS's integration with AWS load balancers, while persistent data lived in managed services outside the cluster: databases ran on AWS RDS for PostgreSQL, and media files were stored in AWS S3.

Service Monitoring with Grafana and Prometheus

Real-Time Alerts and Metrics

Prometheus was integrated with the Kubernetes cluster to collect real-time metrics on CPU usage, memory consumption, request latency, and service health across the microservices.
Grafana dashboards were used for real-time visual monitoring of service performance, helping the DevOps team quickly identify and resolve performance bottlenecks or service disruptions.
Alerts were configured for key metrics such as high memory usage in the code execution service, ensuring prompt action could be taken before issues escalated.
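
The alert rules themselves are not included in this case study. In Prometheus, a rule typically fires only after its condition has held continuously for a set duration (the "for" clause), which avoids paging on brief spikes. The following sketch mimics that behavior in plain Python for a memory-usage ratio; the Sample type, the 0.9 threshold, and the 90-second hold are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One scraped measurement: a timestamp in seconds and a metric value."""
    timestamp: float
    value: float


def alert_firing(samples: Sequence[Sample], threshold: float,
                 hold_seconds: float) -> bool:
    """Return True if the value has stayed above threshold for hold_seconds."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    breach_started = None
    for sample in ordered:
        if sample.value > threshold:
            if breach_started is None:
                breach_started = sample.timestamp  # start of the current breach
        else:
            breach_started = None                  # any dip resets the clock
    if breach_started is None:
        return False
    return ordered[-1].timestamp - breach_started >= hold_seconds
```

With samples every 30 seconds, memory above 90% from t=30 through t=120 satisfies a 90-second hold, while a single dip inside that window resets it.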

Logging and Observability with Loki

Centralized Logging

Loki was used to aggregate and manage logs across the entire platform. Logs from all services (e.g., user actions, code submissions, API errors, and system events) were streamed to Loki, where they could be searched and analyzed in real time.
Centralized logging allowed the team to debug issues quickly, trace specific requests, and identify patterns in errors or performance issues.
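
Tracing a specific request across services is much easier when every log line is structured and carries a request identifier. The sketch below shows one way to emit JSON log lines with Python's standard logging module; the field names, the request_id attribute, and the assumption that a log collector ships the container's output to Loki are illustrative and not taken from the case study.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service  # hypothetical service label, e.g. "code-execution"

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "service": self.service,
            # request_id is attached per call through the `extra` argument.
            "request_id": getattr(record, "request_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_logger(service: str) -> logging.Logger:
    """Create a logger that writes JSON lines to the process's stderr stream."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service))
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as build_logger("code-execution").info("submission received", extra={"request_id": "req-123"}) then produces a line that can be filtered by service or request_id once it reaches the log store.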

API Schema and Documentation with Swagger

Good Documentation

All REST API services followed the JSON:API specification to ensure consistency and compatibility across endpoints, with predictable request and response structures.
The API endpoints were fully documented using Swagger, which automatically generated API documentation that developers could use to test and interact with the services. Swagger’s UI provided easy access to all endpoints, including authentication, problem management, and assessment workflows.
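
For readers unfamiliar with JSON:API, the specification wraps each resource in a "data" object with a "type", a string "id", and an "attributes" map, and reports failures in a top-level "errors" array. The sketch below builds both shapes for a coding problem; the "problems" type name and the attribute keys are hypothetical, since the actual schema is not shown in this case study.

```python
import json
from typing import Mapping


def problem_document(problem_id: int, attributes: Mapping[str, object]) -> dict:
    """Wrap one coding problem in a JSON:API single-resource document."""
    return {
        "data": {
            "type": "problems",          # hypothetical resource type
            "id": str(problem_id),       # JSON:API requires ids to be strings
            "attributes": dict(attributes),
        }
    }


def error_document(status: int, title: str, detail: str) -> dict:
    """Build a JSON:API error document with a single error object."""
    return {
        "errors": [
            {"status": str(status), "title": title, "detail": detail}
        ]
    }


if __name__ == "__main__":
    doc = problem_document(17, {"title": "Two Sum", "difficulty": "easy"})
    print(json.dumps(doc, indent=2))
```

Responses in this format are served with the media type application/vnd.api+json, which is part of what makes the structure predictable for client code.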

Results

Cross-Platform Availability and Scalability

CodeCraft successfully delivered a fully functional platform available on both web and mobile devices, with the backend efficiently handling thousands of concurrent users during peak periods.
Kubernetes auto-scaling on AWS EKS allowed the platform to handle fluctuating loads, especially during company-wide assessments or when many coding interviews were scheduled simultaneously.

Efficient Code Execution and Real-Time Feedback

The platform’s code execution service provided immediate feedback on coding submissions, offering detailed results on runtime, errors, and test case success or failure. Running each submission in its own Docker container kept untrusted code isolated from the rest of the platform and limited the impact of potential exploits.

Seamless Video Interview Integration with AWS Chime

The media service, integrated with AWS Chime SDK, allowed for real-time video interviews between candidates and interviewers. This feature, paired with the ability to record and store interviews, made the platform a one-stop solution for conducting and reviewing technical interviews.

Service Observability and Proactive Issue Management

With Prometheus and Grafana providing real-time metrics, the team could monitor system performance at a granular level, making proactive adjustments to ensure service stability. Alerts configured for critical metrics allowed for quick action, reducing system downtime.
Loki’s centralized logging streamlined troubleshooting and debugging, significantly reducing the time spent identifying issues.

Cost-Effective and Scalable Architecture

By containerizing services and using Kubernetes for orchestration, CodeCraft was able to optimize resource utilization. AWS EKS provided scalability without requiring significant manual intervention, allowing the platform to grow efficiently as its user base expanded.

API Consistency and Usability

The use of JSON:API standards and Swagger documentation ensured that the API was easy to use and integrate with. Third-party developers and clients using the API for external integrations found the documentation and predictable API structure highly useful.

Conclusion

By leveraging Django Rest Framework, Kubernetes, and AWS EKS, CodeCraft successfully built a scalable, secure, and feature-rich platform for coding assessments and interviews. The use of Docker for containerization, Helm for deployment, and a microservices architecture enabled the platform to scale efficiently and handle complex workflows like real-time code execution and video interviews. With strong monitoring, observability, and logging through Prometheus, Grafana, and Loki, CodeCraft ensured high availability and quick recovery from potential issues. This architecture provided a robust, cost-effective solution for the company’s technical interviewing and assessment needs.