Ameer Haj Ali
Ameer is the Head of the Platform and Infrastructure Engineering organizations at Anyscale, Inc. (a $1.1B startup that provides a unified compute platform for running any AI application. Here is me demoing our product). The Platform team is responsible for Ray Serve, the fastest-growing open-source ML production serving library, and builds everything related to the lifecycle of Ray workloads: Ray Client, Jobs, highly available Services, observability, running Ray locally, and runtime environments.
The Infrastructure team builds the infrastructure for cluster orchestration, billing, monitoring, and autoscaling that Anyscale and its customers run on. The team also maintains the Ray Cluster and Autoscaler (video), Ray Client, Cloud providers, and Ray on Kubernetes in open-source Ray. The team also develops the core engine of Anyscale’s product for providing the infinite laptop serverless experience, used by every user of Ray and the product.
Ameer completed his Ph.D. in Electrical Engineering and Computer Science at UC Berkeley in two years (summa cum laude, the fastest in the department), working in the ADEPT Lab and RISE Lab, where he was advised by Professors Ion Stoica and Krste Asanovic. His research focused on Compiling, Auto-Tuning, Code Optimization, Machine Learning, Reinforcement Learning, and hardware for machine learning. At UC Berkeley, Ameer helped bring up or led many projects spanning machine learning in compiler optimization and hardware-software co-design. These include Gemmini, AutoPhase, NeuroVectorizer, ProTuner, Ansor, AutoCkt, and more.
Ameer finished his M.Sc. studies (summa cum laude, the valedictorian) at the Technion in 2018, where he worked with Professor Shahar Kvatinsky on using emerging memory technologies to enhance the performance of modern computer systems and published multiple journal and conference papers. He also finished four years of undergraduate studies in computer engineering at the Technion in only three years, graduating summa cum laude and receiving the valedictorian honor. During Summer 2019, Ameer worked as a research scientist at Intel Labs in the Brain Inspired Computing Lab, where he explored deep reinforcement learning in system optimization and built NeuroVectorizer and RLDRM (which received a Best Paper Award). During his undergraduate studies, Ameer worked at Mellanox Technologies as a chip designer, focusing on creating design and automation tools that facilitated the formal and dynamic verification process.
Ameer was granted O1 and EB1 (Einstein) US visas, which are granted to individuals who possess extraordinary ability.
In his free time, Ameer volunteers on the board of directors of the American Technion Society (ATS) and promotes the underprivileged Arab minority in Israel.
Contact
Email: hajali.<firstname> AT gmail DOT com
Google Scholar: ameerhajali
LinkedIn: ameerhajali
Twitter: @aha_ml
Education
University of California, Berkeley, 2018 - 2020
Ph.D. student, working with Prof. Krste Asanovic and Prof. Ion Stoica.
Finished 5-year track summa cum laude in two years.
Thesis: Machine Learning in Compiler Optimization
Research Interests: Compiling, Auto-Tuning, Code Optimization, Machine Learning, Reinforcement Learning, and hardware for machine learning.
Technion, 2016 - 2018, The Valedictorian
M.Sc., Electrical Engineering.
Finished 4-year track summa cum laude (top 3%) in three years.
Member of the President’s List of highest honors for excellent scholastic achievements every semester.
Technion, 2013 - 2016, The Valedictorian
B.Sc., Computer Engineering.
Finished 4-year track summa cum laude (top 3%) in three years.
Member of the President’s List of highest honors for excellent scholastic achievements every semester.
Professional Experience
Industry
Head of Infrastructure and Platform Engineering Organizations, Anyscale, USA, 01/01/2022-present
- Grew and supported two high-performing organizations from 0 to 30 engineers (L3-L7, TLs, Managers, PMs) in less than 1 year.
Engineering Manager, Anyscale, USA, 09/23/2021-01/01/2022
- The objective owner and project manager of the Anyscale General Availability (GA) project. Responsible for coordinating the sales, marketing, engineering, and product management departments to deliver the Anyscale GA product (40 Eng/TL/EM, 4 PMs, 3 SAs, 2 Security, 4 marketing/sales. Ex. Google/Uber/Microsoft/Facebook/LinkedIn/Databricks/Amazon/etc).
- Leading the cloud platform engineering (1 Tech Lead/Architect, 2 PMs, 2 Managers, 11 L3-L6 SWEs), which builds the foundational blocks of Anyscale's serverless infrastructure end-to-end. This includes cluster orchestration, autoscaling, logging, metrics, billing, and a multi-cloud, multi-region architecture that provides a reliable and scalable managed Ray experience for Anyscale customers.
- Project manager of Anyscale's All-In-Our-Account (AIOA) and All-In-Customer-Account (AICA) projects. AIOA is a new K8s-based infrastructure that is completely managed by Anyscale (Anyscale Cloud) that customers can run on. AICA is a new K8s-based infrastructure that is managed by Anyscale but runs in the customer's account.
Tech Lead Manager, Anyscale, USA, 03/03/2021-09/23/2021
- Led Tiger Team 1, Tiger Team 2, and the most influential projects at Anyscale (at that time). I oversaw 25+ Engineers, 7 Managers, 5 PMs, SREs, and 3 Product Designers. These projects productionized multiple new Anyscale products in "the fastest execution seen at Anyscale". This includes a new UI, frontend, backend, CLI, compute environments, data pipeline, and transitioning from open source Ray to the product with zero code changes.
A demo video of the final product was broadcast at Ray Summit 2021.
- Led the Serverless team (7 Eng, 1 PM, ex. Google/Uber/MS/FB/Amazon/etc), which develops and maintains the Ray Cluster and Autoscaler (video), Ray Client, Cloud providers, and Ray on Kubernetes. The team develops the core engine of Anyscale's product for providing the infinite laptop serverless experience, used by every user of Ray and the proprietary product.
- Led the development of the C++ API for Ray to allow Anyscale’s users to run distributed C++ applications. This is being used by multiple companies including Intel, Ant Financial, and ByteDance.
- Led multiple projects requested by Anyscale’s early customers to get them on board. For example, I built a cluster management system to allow them to run Ray on their on-premises clusters.
- My team works closely with customers to address all their needs and concerns as fast as possible.
- Led the development of features that reduced the company’s annual cloud providers’ bill by more than $0.5M (growing linearly with the number of employees).
Tech Lead, 12/11/2020-03/03/2021
- Built the new autoscaling infrastructure in Anyscale's product.
- Led the development of the open source Ray K8s operator.
Sr. Software Engineer, Anyscale, USA, 11/15/2019-12/11/2020
- Maintained the Ray autoscaler.
- Implemented scalable Scikit-learn on top of Ray that runs on large clusters.
- Led the implementation of the C++ client API for Ray.
- Built a cloud gateway that enables plugging in any type of remote cluster, including on-prem.
- Contributed to RLlib, a scalable reinforcement learning library.
- Directed the first Ray meetup in Israel in 2019.
Board Member, American Technion Society, USA, 2019 - present
AI Research Intern, Intel Labs (Brain Inspired Computing Lab), USA, Summer 2019
- NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning.
Published in CGO 2020 (the premier conference in compilers).
Open sourced under Intel GitHub repository.
- A View on Deep Reinforcement Learning in System Optimization.
Available on arXiv.
- RLDRM: Closed Loop Dynamic Intel RDT Resource Allocation with Deep Reinforcement Learning.
Published in NetSoft 2020 (received Best Paper Award).
Chip Designer, Mellanox Technologies (R&D), Yokneam, 2015 - 2016
- Worked on creating design and automation tools that facilitated the formal and dynamic verification process. Worked especially with Python, scripting languages, C++, and Verilog.
Graduate Teaching Assistance
University of California, Berkeley, 2019 - present
Graduate Student Instructor (GSI), Introduction to Machine Learning (CS 189/289A).
Technion, 2015 - 2018
Head Teaching Assistant, Circuit Theory (700+ students, 044105).
Head Teaching Assistant, Electronic Switching Circuits (300+ students, 044147).
Supervisor of B.Sc. projects, VLSI Lab and Parallel Systems Lab (044167).
Teaching Assistant, MATLAB.
Awards and Fellowships
The person of the year in my home city (45,000 residents), Shefaraam, 2022.
Granted the EB1 + Green Card (Einstein Visa for Extraordinary Ability), USA, 2021.
Granted the O1 extraordinary ability Visa, USA, 2020.
The Valedictorian Honor (M.Sc.), Technion, 2019.
Open Gateway Fellowship, UC Berkeley, 2018.
The William Oldham Fellowship, UC Berkeley, 2018.
The Valedictorian Honor (B.Sc.), Technion, 2017.
Dean’s scholarship for excellent graduate students, Technion, 2016.
Full tuition scholarship for M.Sc. studies, Technion, 2016-2018.
The System Architecture Labs Cluster Prize for outstanding undergraduate projects (received twice), Technion, 2016.
Excellence award from Apple for excellent scholastic achievements, Technion, 2016.
Member of the President’s List of highest honors for excellent scholastic achievements in all undergraduate semesters (top 3%), Technion, 2013-2016.
Full tuition scholarship for B.Sc. studies, Technion, 2013-2016.
Advised Students
University of California, Berkeley
Chloe Liu (First employment: graduate student at Stanford).
Ian Galbraith (First employment: software engineer at Twilio)
Fang Shuo Deng (First employment: software engineer at Abnormal Security)
Israel Institute of Technology, Technion
Stav Belogolovsky (Test and DFT Engineer at Arbe)
Amnon Wahle (Algorithm Research at BeyondMinds)
Publications
Machine Learning in Compiler Optimization, PhD Thesis.
Ameer Haj-Ali.
[thesis]
Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration. Nominated for Best Paper Award.
Hasan Genc, Seah Kim, Alon Amid, Ameer Haj-Ali, Vighnesh Iyer, Pranav Prakash, Jerry Zhao, Daniel Grubb, Harrison Liew, Howard Mao, Albert Ou, Colin Schmidt, Samuel Ste, John Wright, Ion Stoica, Jonathan Ragan-Kelley, Krste Asanovic, Borivoje Nikolic, Yakun Sophia Shao
58th ACM/ESDA/IEEE Design Automation Conference (DAC 2021), December 2021. [paper]
TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers
Lianmin Zheng, Ruochen Liu, Junru Shao, Tianqi Chen, Joseph E. Gonzalez, Ion Stoica, Ameer Haj Ali
35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS 2021), December 2021. [paper]
ProTuner: Tuning Programs with Monte Carlo Tree Search
Ameer Haj-Ali, Hasan Genc, Qijing Huang, William Moses, John Wawrzynek, Krste Asanović, Ion Stoica.
arXiv preprint, 2020. [paper]
Ansor: Generating High-Performance Tensor Programs for Deep Learning
Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph Gonzalez, Ion Stoica.
The 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2020. [paper]
NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning. Work done in a summer internship at Intel Labs.
Ameer Haj-Ali, Nesreen Ahmed, Ted Willke, Sophia Shao, Krste Asanovic, Ion Stoica.
International Symposium on Code Generation and Optimization (CGO), 2020. [paper][code][video]
AutoPhase: Juggling HLS Phase Orderings in Random Forests with Deep Reinforcement Learning.
Ameer Haj-Ali*, Qijing Huang*, William Moses, John Xiang, Krste Asanovic, John Wawrzynek, Ion Stoica.
Proceedings of Machine Learning and Systems (MLSys), 2020. [paper][code][video]
AutoCkt: Deep Reinforcement Learning of Analog Circuit Designs.
Keertana Settaluri, Ameer Haj-Ali, Qijing Huang, Suhong Moon, Kourosh Hakhamaneshi, Ion Stoica, Krste Asanovic, Borivoje Nikolic.
Design Automation and Test in Europe (DATE), 2020. [paper][code]
RLDRM: Closed Loop Dynamic Cache Allocation with Deep Reinforcement Learning for Network Function Virtualization. Work done in a summer internship at Intel Labs. Best Paper Award.
Bin Li, Yipeng Wang, Ren Wang, Charlie Tai, Ravi Iyer, Zhu Zhou, Andrew Herdrich, Tong Zhang, Ameer Haj-Ali, Ion Stoica, Krste Asanovic.
IEEE Conference on Network Softwarization (NetSoft), 2020. [paper][code]
A View on Deep Reinforcement Learning in System Optimization. Work done in a summer internship at Intel Labs.
Ameer Haj-Ali, Nesreen Ahmed, Ted Willke, Joseph Gonzalez, Krste Asanovic, Ion Stoica.
arXiv preprint, 2019. [paper]
Learning to Vectorize Using Deep Reinforcement Learning. Work done in a summer internship at Intel Labs.
Ameer Haj-Ali, Nesreen Ahmed, Ted Willke, Sophia Shao, Krste Asanovic, Ion Stoica.
Workshop on ML for Systems at NeurIPS, 2019. [paper][code]
Gemmini: An Agile Systolic Array Generator Enabling Systematic Evaluations of Deep Learning Architectures.
Ameer Haj-Ali*, Hasan Genc*, Vighnesh Iyer*, Alon Amid*, Howard Mao, John Wright, Colin Schmidt, Jerry Zhao, Albert Ou, Max Banister, Yakun Sophia Shao, Borivoje Nikolic, Ion Stoica, Krste Asanovic.
arXiv preprint, 2019. [paper][code]
AutoPhase: Compiler Phase-Ordering for HLS with Deep Reinforcement Learning.
Ameer Haj-Ali*, Qijing Huang*, William Moses, John Xiang, Ion Stoica, Krste Asanovic, John Wawrzynek.
FCCM, 2019. [paper][code][video]
Performing Image Processing in Memristive Memory. Nominated for Cadence Academic Master Thesis Award.
Ameer Haj-Ali.
M.Sc. Thesis. [thesis]
Memristor-Based Processing-in-Memory and Its Application On Image Processing.
Ameer Haj-Ali, Ronny Ronen, Rotem Ben-Hur, Nimrod Wald, and Shahar Kvatinsky.
Elsevier, 2020. [chapter]
mMPU - a Real Processing-in-Memory Architecture to Combat the von Neumann Bottleneck.
Nishil Talati, Rotem Ben-Hur, Nimrod Wald, Ameer Haj-Ali, John Reuben, and Shahar Kvatinsky.
Springer, 2020. [chapter]
SIMPLER MAGIC: Synthesis and Mapping of In-Memory Logic Executed in a Single Row to Improve Throughput.
Rotem Ben-Hur, Ronny Ronen, Ameer Haj-Ali, Debjyoti Bhattacharjee, Adi Eliahu, Natan Peled, and Shahar Kvatinsky.
TCAD, 2019. [paper]
Supporting the Momentum Training Algorithm Using a Memristor-Based Synapse.
Tzofnat Greenberg-Toledo, Roee Mazor, Ameer Haj-Ali, and Shahar Kvatinsky.
TCAS-I, 2019. [paper]
Not in Name Alone: a Memristive Memory Processing Unit for Real In-Memory Processing.
Ameer Haj-Ali, Rotem Ben-Hur, Nimrod Wald, Ronny Ronen, and Shahar Kvatinsky.
IEEE Micro, 2018. [paper]
IMAGING: In-Memory AlGorithms for Image processiNG.
Ameer Haj-Ali, Rotem Ben-Hur, Nimrod Wald, Ronny Ronen, and Shahar Kvatinsky.
TCAS-I, 2018. [paper]
Efficient Algorithms for In-memory Fixed Point Multiplication Using MAGIC.
Ameer Haj-Ali, Rotem Ben-Hur, Nimrod Wald, and Shahar Kvatinsky.
ISCAS, 2018. [paper]
Practical Challenges in Delivering the Promises of Real Processing-in-Memory Machines.
Nishil Talati, Ameer Haj-Ali, Rotem Ben-Hur, Nimrod Wald, Ronny Ronen, Pierre-Emmanuel Gaillardon, and Shahar Kvatinsky.
DATE, 2018. [paper]
Memristive Logic: A Framework for Evaluation and Comparison.
John Reuben, Rotem Ben-Hur, Nimrod Wald, Nishil Talati, Ameer Haj-Ali, Pierre-Emmanuel Gaillardon, and Shahar Kvatinsky.
PATMOS, 2017. [paper]
A Taxonomy and Evaluation Framework for Memristive Logic.
John Reuben, Rotem Ben-Hur, Nimrod Wald, Nishil Talati, Ameer Haj-Ali, Pierre-Emmanuel Gaillardon, and Shahar Kvatinsky.
Springer, 2017. [chapter]