Senior ML Engineer @ J-Squared Technologies · Toronto

Jaskaran Singh

I make AI run anywhere.

Machine Learning Engineer with a Software Engineer's backbone. I take models from research papers to production: Agentic Pipelines, LLM Inference, and Computer Vision that ships on real hardware for real clients.

Edge AI · LLM Inference · Agentic AI · Computer Vision · Full-Stack · AWS
Speaking at AI conferences
4+ years
Production software & ML engineering
JP Morgan Alum
Fintech-grade engineering discipline
UofT MScAC
Master's in Applied Computing
CANSEC 2026
Speaker · FalconVeo video RAG
About

Engineer first, researcher close second.

I'm a Senior Machine Learning Engineer at J-Squared Technologies, where I take AI from research papers to rugged hardware — designing agentic data-generation pipelines, building privacy-preserving video RAG, and optimizing vision models until they hit single-digit-millisecond budgets on NVIDIA Jetson and Hailo-8 for clients across the retail-mining, manufacturing and defence sectors.

Before that I shipped customer-facing microservices at JP Morgan Chase and growth tooling at Scaler, so I'm as at home in React, .NET and Rails as I am in CUDA and Rust. I did my master's in Applied Computing at the University of Toronto with a perfect 4.0 GPA, and I write about ML systems on Medium.

The thread through all of it: I like making powerful models small, fast and local — and building the systems engineering around them that turns a demo into a product.

University of Toronto

MSc in Applied Computing (MScAC) — Computer Science

2022 – 2023 · GPA 4.0 / 4.0 · A+ in ML, Deep Learning, NLP & Computational Imaging

Thapar Institute of Engineering & Technology

B.E. in Computer Engineering

2018 – 2022 · GPA 9.55 / 10

Edge AI & Model Optimization

Quantization (PTQ + QAT), pruning, distillation and custom CUDA / TensorRT kernels — detection, segmentation, re-ID and pose models meeting sub-5 ms budgets on Jetson and Hailo-8.

LLM Systems & Agentic AI

RAG over knowledge graphs, MCP servers for real tools, multi-agent annotation workflows, and local quantized inference — Ollama, Candle (Rust), vLLM, TensorRT-LLM.

Full-Stack ML Engineering

Lock-free C++ IPC backbones, Rust inference services, REST APIs, React frontends and AWS architecture — the production plumbing that makes models actually usable.

Experience

Where I've shipped.

J-Squared Technologies

Senior Machine Learning Engineer

May 2023 — Present
  • Agentic annotation at scale — combined diffusion models, Grounding DINO + SAM, and LLM decision agents (Llama 3.2-3B, Phi-3) into an autonomous generation-and-annotation workflow: 100K+ samples produced, manual labeling cut by 90%.
  • FalconVeo, agentic video RAG — on-device video search (CLIP + MS-TEMBA + Qwen3.5-VL) for privacy-preserving industrial clip retrieval; presenting at CANSEC 2026, one of the world's largest defence conferences.
  • Sprint-summarization RAG — Notion ingestion via MCP, Neo4j knowledge-graph reasoning, and GPT-OSS-20B served locally through Ollama (native MXFP4); cross-team reporting went from 2+ hours to under 30 minutes.
  • Vivado-MCP — internal MCP server exposing 67 Xilinx Vivado TCL tools to local LLMs, powering AI-driven HDL, testbench and constraint generation for the FPGA team — fully on-device.
  • GPU kernel & model optimization — PTQ + QAT, pruning, distillation and custom CUDA / TensorRT kernels across detection, segmentation, re-ID and pose; sub-5 ms latency on Hailo-8 and Jetson for retail-mining, manufacturing and defence clients.
  • Systems work — lock-free shared-memory ring buffer in C++ powering 4× concurrent-stream vision inference on Jetson AGX at <25 ms end-to-end, and a memory-optimized Rust LLM inference service (Candle + Actix-Web) with 2× faster cold starts than Python baselines.
PyTorch · CUDA · TensorRT · Rust · C++ · Ollama · MCP · Neo4j · Jetson · Hailo-8

JP Morgan Chase

Software Engineer

Jan 2022 — Aug 2022
  • Migrated customer-facing CIB microservices from Angular to React (Hooks + Redux); A/B-tested rollout drove +10% engagement and faster page loads, earning a team recognition award.
  • Designed and shipped .NET Core REST APIs for production financial transaction workflows, hardened with rate limiting, URL escaping, CORS and CSRF protection.
React · Redux · .NET Core · Microservices

Scaler Academy

Software Development Engineer Intern

Jul 2021 — Dec 2021
  • Rebuilt the referral and newsletter dashboards plus embeddable marketing widgets in React; the redesigned referral flow doubled the conversion rate.
  • Built Rails REST APIs and ActiveRecord models for referral tracking and growth analytics, with SQL migrations and Jenkins CI/CD on AWS.
React · Ruby on Rails · AWS · Jenkins
Projects

Selected builds.

Research projects, production systems and the occasional hackathon artifact — most with code on GitHub.

● LIVE

Fragivo — AI Fragrance Platform

LLM- and vision-powered fragrance discovery on AWS: OAuth, Google-Search-grounded analysis, prompt-engineered recommendations.

LLMs · AWS · RecSys

LLMs for Medical Text Summarization

Fine-tuned and benchmarked GPT-3/4, T5, BART and Pegasus on medical summarization — ROUGE, BERTScore and inference cost head-to-head. Published as a preprint.

LLMs · NLP · Benchmarking

MedGANs — Medical Image Enhancement

End-to-end GAN pipeline for medical image denoising and enhancement, with single-shot HDR and edge-enhancement post-processing. Published in IJESE (2025).

GANs · Imaging · Published

Active Learning for Paper Classification

Supervised, active and semi-supervised learning combined to classify a mixed labeled/unlabeled set of research papers on a minimal annotation budget.

Active Learning · NLP

Cassava Leaf Disease Prediction

Deep-learning classifier identifying viral diseases from cassava leaf photos — computer vision for low-resource agriculture.

CV · CNNs

Disaster Response Pipeline

Production-grade ETL + ML pipeline classifying real disaster messages in real time, served behind an API.

NLP · Data Engineering

Plagiarism Detection

Binary classifier scoring text similarity against source documents using containment and LCS features, deployed on SageMaker.

ML · SageMaker

Recommendation Engine

Rank-based, collaborative-filtering and SVD recommenders built on IBM Watson Studio interaction data.

RecSys · SVD

Hate Speech Detection

NLP classifier flagging racist and hateful tweets — text preprocessing, embeddings and supervised classification on real Twitter data.

NLP · Classification
Research & Speaking

Publications, preprints & talks.

Journal

An End-to-End Pipeline for Medical Image Enhancement Using GANs Architecture

Singh, J., Patel, T., & Dankar, A. · International Journal of Emerging Science and Engineering, Vol. 13(3), pp. 32–39 · 2025

Preprint

Performance Analysis of LLMs for Medical Text Summarization

OSF Preprints — GPT-3/4, T5, BART and Pegasus compared on quality (ROUGE / BERTScore) and inference cost.

Preprint

Empirical Study of Supervised & Active Learning for Classifying Research Papers

OSF Preprints — conventional ML combined with active and semi-supervised learning for efficient classification of partially-labeled corpora.

Speaker · 2026

FalconVeo — Agentic Video RAG for Privacy-Preserving Industrial Clip Retrieval

CANSEC 2026, Ottawa — presenting J-Squared's on-device video search system (CLIP + MS-TEMBA + Qwen3.5-VL) at one of the world's largest defence & security conferences.

Speaking

Topics I speak and write about

Edge AI deployment in the real world · LLM inference optimization — quantization, pruning, distillation and custom CUDA/TensorRT kernels · on-device agents and privacy-preserving RAG · Rust vs Python for ML serving.

Photos

On stage & on the demo floor.

Moments from conferences and showcases — presenting Edge AI and LLM work.

Credentials

Certifications & achievements.

AWS Certified ×3

  • Machine Learning Engineer
  • AI Practitioner
  • Cloud Practitioner

Hackathon Wins

  • Code For Good — JP Morgan · Winner
  • Smart India Hackathon — Runner-up
  • Startup Punjab Hackathon — Winner

Udacity Nanodegrees

  • Machine Learning Engineer
  • Data Scientist
  • Machine Learning with TensorFlow

Coursera Specializations

  • Deep Learning with PyTorch — IBM
  • AI in Marketing & Finance — UPenn
  • NLP — DeepLearning.AI
  • Algorithmic Toolbox — UC San Diego
Contact

Let's build something that ships.

Whether it's edge AI, agentic systems, or an idea you want a second brain on — my inbox is open.

Toronto, Ontario, Canada · +1 (437) 986-0064