J-Squared Technologies
Senior Machine Learning Engineer
- Agentic annotation at scale — combined diffusion models, Grounding DINO + SAM, and LLM decision agents (Llama 3.2-3B, Phi-3) into an autonomous generation-and-annotation workflow: 100K+ samples produced, manual labeling cut by 90%.
- FalconVeo, agentic video RAG — on-device video search (CLIP + MS-TEMBA + Qwen3.5-VL) for privacy-preserving industrial clip retrieval; presenting at CANSEC 2026, one of the world's largest defence conferences.
- Sprint-summarization RAG — Notion ingestion via MCP, Neo4j knowledge-graph reasoning, and GPT-OSS-20B served locally through Ollama (native MXFP4); cross-team reporting went from 2+ hours to under 30 minutes.
- Vivado-MCP — internal MCP server exposing 67 Xilinx Vivado TCL tools to local LLMs, powering AI-driven HDL, testbench and constraint generation for the FPGA team — fully on-device.
- GPU kernel & model optimization — PTQ + QAT, pruning, distillation and custom CUDA / TensorRT kernels across detection, segmentation, re-ID and pose; sub-5 ms latency on Hailo-8 and Jetson for retail-mining, manufacturing and defence clients.
- Systems work — lock-free shared-memory ring buffer in C++ powering 4× concurrent-stream vision inference on Jetson AGX at <25 ms end-to-end, and a memory-optimized Rust LLM inference service (Candle + Actix-Web) with 2× faster cold starts than Python baselines.
PyTorch, CUDA, TensorRT, Rust, C++, Ollama, MCP, Neo4j, Jetson · Hailo-8