ALBERTO FERRER
Linux Engineer & AI Architect
For contact information, please visit the Contact page.
Results-oriented IT solutions professional with a solid background in project management, regulatory compliance, and process improvement. Excellent communication skills. Experienced in working with teams across several companies to develop and deploy technical solutions. Demonstrated success in identifying areas for improvement and delivering project results that contribute to the company's overall objectives.
LLM Inference Engines:
- vLLM (PagedAttention, Continuous Batching)
- SGLang (RadixAttention, Prefix Caching)
- TensorRT-LLM (NVIDIA Optimizations)
- NVIDIA NIM (Containerized Microservices)
AI Frameworks:
- TensorFlow / PyTorch
- Hugging Face Transformers
- vLLM / SGLang
- llama.cpp / Unsloth
- NVIDIA Triton Server / TensorRT / NIM
- OpenAI APIs / Claude APIs
- OpenWebUI
- PostgreSQL + pgvector
- ChromaDB
- Milvus / Qdrant
- FAISS
- Elasticsearch Vector
- LangChain / LlamaIndex
Frameworks: CrewAI, LangGraph ReAct Agents, Agno Agents, FastAPI
Projects: Python PoCs, Document Extraction with VLMs, AI System Integrations
- Fine-tuning & Continual Pretraining
- GPT/Transformer Model Creation
- Classification Model Development
- MLflow / Kubeflow
- Ray / Apache Airflow
- Custom Model Publishing
AI Model Publishing & Contributions:
- Published custom models and datasets on Hugging Face Hub
- Developed PoC applications with FastAPI and Python
- Created document extraction systems using Vision Language Models (VLMs)
- Built agentic AI solutions and multi-agent systems
LLM Inference Optimization:
- Production deployment of vLLM, SGLang, TensorRT-LLM
- Distributed KV Cache Management (LMCache, Mooncake, NVIDIA NIXL, Redis)
- Disaggregated Prefill/Decode Architecture (1p1d pattern)
- Performance benchmarking: TTFT (4-500ms), ITL, throughput (up to 12K tokens/sec)
- Multi-node inference orchestration with NVIDIA Dynamo and Grove
GPU Optimization & Hardware:
- NVIDIA GPU platforms: H100, H200, A100, L40S optimization
- GPU memory optimization (90% utilization, FP8/INT8 quantization)
- CUDA optimization, Tensor Core utilization
- NCCL, RDMA, NVLink for inter-GPU communication
- Multi-Instance GPU (MIG), GPU persistence mode
- NVIDIA GPU Operator management
- KEDA auto-scaling for LLM workloads
- Helm charts for AI deployments
- Custom Resource Definitions (CRDs) for AI infrastructure
- LoRA adapter management with AIBrix
- OS tuning: Transparent Huge Pages (THP), BBR congestion control
- NUMA balancing, CPU governor optimization
- Memory management: dirty ratios, swappiness, overcommit
- I/O scheduler tuning (NVMe, SSD optimization)
- Network stack optimization for distributed inference
- Single-node multi-GPU with NIXL (<5us KV cache latency)
- Disaggregated prefill-decode with Mooncake (cost-optimized)
- Multi-cloud datacenter with NVIDIA Dynamo (1.2M tokens/sec)
- Enterprise AI platform with VMware Private AI Foundation
- Development-to-production scaling with vLLM Production Stack
- Vector databases: pgvector, MongoDB Atlas Vector Search, RediSearch
- Hybrid search: text + vector combination
- RAG architecture: document indexing, retrieval, embedding storage
- Knowledge graphs: Apache AGE, RedisGraph
- Session management and LLM response caching
- Model management: validation, versioning, model gallery
- LoRA adapter batching and caching
- Prometheus/Grafana monitoring for LLM metrics
- Auto-scaling based on KV cache, queue depth, latency
- CI/CD for model deployment
- VMware Cloud Foundation integration (VKS, NSX, multi-tenant)
- High Availability: control plane HA, disaster recovery
- Security: network segmentation, secrets management, model signing
- Compliance: model validation, drift prevention, governance
- Systems Management
- Database Administration
- Technical Support
- System Hardening
- Kernel Management
- Systems Security
- Server Administration
CI/SRE: CI/SRE Generation, Process Documentation, Process Auditing, RPM/DEB
Languages & Tools: *NIX Distribution Creation, Multiple Linux Languages
- Advanced LAMP/LEMP
- Nginx Clusters
- PHP-FPM
- MySQL HA Clusters
- Apache Optimization
- SSL/TLS Implementation
VMware Cloud Foundation (VCF):
- VMware vSphere / ESXi administration
- VMware vSAN storage management
- NSX-T network virtualization
- vRealize Suite (Automation, Operations, Log Insight)
- vCenter Server management
- Tanzu Kubernetes Grid (TKG) / vSphere with Tanzu
- VMware Private AI Foundation integration
- Multi-tenant isolation and resource pools
- Supervisor clusters and vSphere Namespaces
- Enterprise HA and disaster recovery
Container Orchestration & Containerization:
- Kubernetes (K8s) administration and architecture
- Docker containerization and image management
- Helm charts development and deployment
- Kubernetes operators and Custom Resource Definitions (CRDs)
- NVIDIA GPU Operator for AI workloads
- KEDA auto-scaling
- Service mesh (Istio, Linkerd)
- CNI plugins (Calico, Flannel, NSX-T)
- Persistent storage (CSI drivers, StatefulSets)
- Monitoring stack (Prometheus, Grafana, ELK)
Cloud Platforms:
- VMware Cloud on AWS
- Microsoft Azure (VMs, AKS, Storage)
- Amazon Web Services (EC2, EKS, S3)
- Multi-cloud orchestration
Virtualization Technologies:
- KVM/QEMU
- vSphere / vCenter
- Hyper-V
- Proxmox
AI Team Leader - Newly formed AI division
- Lead Engineer for Run:ai and their products (SME)
- Lead Engineer for NVIDIA and their AI platform
- Lead Engineer for Applied AI PoC program
- Prototyping of Applications and integrations
- Customer configuration analysis (SWOT)
- Vector Databases implementation
- Embeddings and ML workflows
- Dell hardware implementation for NVIDIA
- NVIDIA Triton Server, TensorRT, NIM
- Kubernetes for AI (GPU & others)
- Docker images for AI-related tools
- vLLM, SGLang, TensorRT-LLM production deployments
- Distributed KV caching with Mooncake, LMCache, Redis
- OS tuning for AI workloads (NUMA, THP, RDMA)
- Performance benchmarking and optimization
- AIBrix orchestration and LoRA management
- NVIDIA Dynamo multi-datacenter inference
- VMware Private AI Foundation integration
- Lead Engineer & Trainer on the Escalations Team
- Technical support ownership for customer base
- Advanced troubleshooting and OS-level issue resolution
- Customer loyalty through exceptional service delivery
- Issue escalation management and resolution
- Training and mentoring of Rackers
- Collaboration with CSM, Account Managers, and Incident Management
- Security remediation via CrowdStrike with malware analysis
- Ansible automation applications development
Custom Tools Developed:
- MRMF: Python-based Malware Scanner for LAMP Stack
- Scanware: Rust Application Scanner with plugins
- Traffic Analyzer: Python 3 port with enhanced features
Career Progression: L1 > L2 > L3 > Linux Engineer
- Full Stack Linux System Administration
- Customized support for Enterprise Level accounts
- Account technical expert on the Rackspace side
- Technical point of contact and liaison
- Infrastructure documentation preparation
- Project assistance based on customer needs
- Infrastructure recommendations and consultancy
- Continuous infrastructure supervision and monitoring
- Proactive issue identification and resolution
- Company support process management
- Technological implementations
- Documentation writing and maintenance
- Software architecture design
- Company founding and management
- Software development and programming
- Server architecture design and implementation
- Software architecture and system design
- Linux distribution development based on IPCop & Red Hat Linux
- CentOS variant implementation
- Wikipedia-documented project
- RPM package updates and maintenance
- Private repository creation and management
- SRPM tree upgrades and updates
- Technical articles writing (barrahome.org)
- Community support on Freenode
- Multiple Linux distributions support
- Documentation and knowledge sharing
Red Hat, CentOS, Debian, Ubuntu, Unix (Solaris), FreeBSD, Windows
Python, Perl, Bash, PHP, C/C++, Java, PowerShell, Rust
- Cisco Routers and Switches
- LAN/WAN Configuration
- VPN Implementation
- TCP/IP Protocol Suite
- Firewall Configuration
- Intrusion Detection Systems
- Apache/Nginx (10+ years)
- MySQL/MariaDB Clustering
- PHP-FPM
- cPanel/WHM (10+ years)
- SSL/TLS Implementation
- DNS Management
- Red Hat Security Reports: Contributed several kernel failure reports to Red Hat
- Bugzilla Contributions: Bug #455833
- Linux Security Forum: Active contributor to gmane.linux.lfs.security
- Community Support: 25+ years providing Linux support and documentation
- High-Performance Computing (HPC)
- Distributed Systems Architecture
- Performance Engineering and Benchmarking
- Cost Optimization for AI Workloads
- Enterprise Compliance and Governance
- Machine Learning Operations (MLOps)
- Site Reliability Engineering (SRE)
- Spanish (Native)
- English (Professional Working Proficiency)
Last Updated: February 2026