Apr.2024-Present | Huawei Research UK

I am a Principal Performance Modelling Engineer at Huawei Research (UK), focusing on modelling and simulation of upcoming CPU and NPU processors for future AI use-cases. I lead architectural studies and software/hardware co-optimization to define the requirements and new features of these processors.

 

 

Previous Experience

Jan.2024-Mar.2024 | Freelance AI Consultant

Exploring next career steps and AI freelance consulting.

 

 

Apr.2022-Dec.2023 | Samsung Research UK

Between 2022 and 2023, as Tech Lead and Principal Engineer at Samsung Research UK (SRUK), I worked with Chris Alder and Ben Duckworth, leading an On-Device AI Applied Research team and collaborating with cross-functional local and remote teams to drive on-device AI vision and speech applied research projects. I focused on hardware-aware ML solutions: framework and neural network model optimisation for the next generation of Samsung products, from proof-of-concept to commercial products in fast-paced projects.

 

• Speech Separation and Enhancement (2022-2023)

Designed and developed efficient real-time RNN architectures that distinguish and separate the primary voice from other sounds. The key is to take advantage of Neural Processing Units to enhance sound in real time, bringing out finer detail by analysing content scene by scene and accentuating audio elements including human voices, background music and sound effects in next-generation products.

** Outstanding R&D Award for performance & continuous contributions (12/2023) **
(Results also presented at CES 2024)

Keywords: Deep Neural Networks, PyTorch, TensorFlow, JAX, RNN, Transformers, Speech Processing, Speaker Source Separation, NPU: Neural Processing Units, DSP: Digital Signal Processing Units, Tizen, Real-time Processing

• Face and Speaker Detection & Verification (2022-2023)

Optimized deep neural network architectures to detect the faces and lips of characters displayed on screen in real time, taking advantage of Neural Processing Units to automatically associate their voices with their position in the image, for increased realism and a more immersive sound experience in next-generation products. Also explored experimental Speaker Verification for parental-control use-cases.

Keywords: Deep Neural Networks, PyTorch, TensorFlow, Object Detection, NPU: Neural Processing Units, DSP: Digital Signal Processing Units, Tizen, Real-time Processing

 

 

Jan.2018-Mar.2022 | Arm
(Machine Learning Group)

Between 2018-2022, as a Principal Engineer in Arm's Machine Learning Group, I worked with David Mansell and Ian Bratt, analysing neural networks of future use-cases, identifying the most relevant operations and data patterns, capturing key insights with data science techniques to advance real-world performance of Arm's new software and hardware solutions.

 

• Predictive Models for Kernel Selection Heuristics (2020-2022)

Designed and developed an innovative predictive model for CPUs and GPUs that selects high-performance kernel implementations of GEMM (General Matrix Multiply) and convolutions within machine learning use-cases. Choosing the right implementation within microseconds is crucial, which calls for fast and general heuristics.

Keywords: Deep Neural Networks, TensorFlow, Keras, Convolutions, General Matrix Multiply, Decision Trees, Random Forest, Classification, Regression, AutoML, Data Augmentation, Performance Analysis

• New Armv9-A ML CPU ISA extensions (SME) (2020-2022)

Investigated a novel CPU architecture feature called SME (Scalable Matrix Extension), a new instruction set extension designed to optimize the performance and power efficiency of AI applications. SME enables hardware acceleration of outer-product operations, for the fast general matrix multiplication used in core AI operations. Through hardware simulations and benchmarking, I demonstrated concrete performance gains for AI applications to CPU architects and hardware engineers.

Keywords: Matrix multiplication, Outer product, Modelling, CPU Architecture, Deep Neural Network, Performance Analysis
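As an illustration of the outer-product formulation of matrix multiplication mentioned above, the following is a minimal sketch in plain Python. It is not SME code and does not model any SME instruction; the function name `matmul_outer_product` and the list-of-lists layout are assumptions made for this example. It only shows the arithmetic idea: C = A x B can be accumulated as a sum of rank-1 updates, one per column of A and matching row of B.

```python
def matmul_outer_product(a, b):
    """Compute C = A @ B as a sum of rank-1 outer products.

    `a` is an m x k matrix and `b` a k x n matrix, both given as lists of rows.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    n = len(b[0])

    # Accumulator tile: every outer product is added into this m x n array.
    c = [[0.0] * n for _ in range(m)]

    for p in range(k):
        col = [a[i][p] for i in range(m)]  # p-th column of A
        row = b[p]                         # p-th row of B
        # Rank-1 update: C += col (outer) row
        for i in range(m):
            for j in range(n):
                c[i][j] += col[i] * row[j]
    return c


if __name__ == "__main__":
    print(matmul_outer_product([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each iteration of the outer loop touches every element of the accumulator once, which is the access pattern that makes a hardware outer-product unit attractive for GEMM.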

• Future Use-Cases & Deep Neural Networks Performance (2018-2022)

Investigated deep neural networks for future use-cases, analysing them and breaking them down to distil the most relevant operations and data patterns, capturing key insights with data science techniques, to then drive new software and hardware solutions.

Keywords: Deep Neural Networks, Convolutions, General Matrix Multiply, Classification, Regression, LSTM, Transformers, Clustering, Ensemble Learning, Data Augmentation, Performance Analysis

 

Sep.2012-Dec.2017 | Arm
(Architecture & Technology Group)

Between 2015 and 2017, I worked with Paul Hughes in Arm's Architecture and Technology Group, as a member of the Intelligent Machines Future System Design team, analysing and prototyping computer vision and ML subsystems for ADAS.

 

• ADAS & Computational Biology use-cases (2016-2017)

Analysed and prototyped machine learning subsystems for ADAS (Advanced Driver-Assistance System) and Computational Biology applications, to identify which computational patterns could be optimized with the SVE (Scalable Vector Extension) instruction set.

Keywords: Image Segmentation, SLAM, Stereo matching, Object Detection, Graph/Network Structures, Deep Neural Networks, System Architecture, Memory management, Virtualization, Software Modelling, Performance Analysis

• New Armv8-A CPU ISA extensions (SVE) (2015-2016)

Investigated the impact of SVE (Scalable Vector Extension), a new instruction set extension, on computer vision workloads, exploiting as much data-level parallelism as possible with new instructions and vector lengths that scale from 128 to 2048 bits. Through hardware simulations and benchmarking, I demonstrated concrete performance gains in computer vision applications to CPU architects and hardware engineers.

Keywords: Single Instruction, Multiple Data (SIMD) programming model, Vector Instructions, Parallel Programming, Modelling, CPU Architecture, Computer Vision, Keypoint and Features Detection, Assembly Language, Performance Analysis
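To make the idea of vector lengths that scale from 128 to 2048 bits more concrete, here is a minimal sketch in plain Python of a vector-length-agnostic loop. It is a conceptual model only, not SVE assembly or intrinsics: the function name `vla_saxpy`, its parameters and the explicit per-lane predicate list are assumptions made for this example. The point it illustrates is that the same loop produces the same result for any vector length, because a predicate switches off the lanes that fall past the end of the array.

```python
def vla_saxpy(alpha, x, y, vector_bits=256, element_bits=32):
    """Return alpha * x + y, processed in chunks of one 'vector' at a time."""
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("vector_bits must be a multiple of 128 in [128, 2048]")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    lanes = vector_bits // element_bits  # elements handled per iteration
    n = len(x)
    out = list(y)

    for base in range(0, n, lanes):
        # Predicate: a lane is active only while its index is inside the array,
        # so the final partial chunk needs no separate scalar tail loop.
        predicate = [base + lane < n for lane in range(lanes)]
        for lane, active in enumerate(predicate):
            if active:
                i = base + lane
                out[i] = alpha * x[i] + y[i]
    return out


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]
    # Same answer regardless of the modelled vector length.
    print(vla_saxpy(2, xs, ys, vector_bits=128))
    print(vla_saxpy(2, xs, ys, vector_bits=2048))
```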

From 2012 to 2014, I was a Staff Engineer in Arm's Processor Division, working on system architecture and heterogeneous computing projects with Charles Garcia-Tobin and Jason Parker.

 

• Heterogeneous Computing and GPU Coherency (2012-2014)

Modelled and prototyped low-level software, and analysed heterogeneous compute use-cases to incorporate Shared Virtual Memory (SVM) support between different types of processors (CPU & GPU). SVM lets them share data as simply as passing a pointer, which greatly simplifies the software and delivers more power-efficient, higher-performance applications than software-managed cache synchronization mechanisms.

Keywords: Memory Models, Cache Memory, Interconnect, Heterogeneous Computing, System Architecture, Modelling, System Programming, Simulation Waveform, FPGA prototyping

 

Personal Projects

 

Honors and awards

2023 Samsung Electronics:
Outstanding R&D Award for performance & continuous contributions [Dec. 2023]

2021 University of Girona:
Interview & career recognition [book release: Nov. 2021]

Career recognition by the board of trustees of the Polytechnic School Patronage (University of Girona) in 25th anniversary book, by founding members, industrialists, representatives of institutions, Chamber of Commerce, delegates from different government departments and presidents and deans of professional associations from University of Girona. 25th Anniversary Book: Polytechnic School Patronage

2013 Best Ph.D. Thesis Award of the School of Computer Science

Parallel spatial data structures for interactive rendering, PhD Thesis, defended October 2012
Publication September 2013
University of Girona

2007 Best Computer Science Project Award

Generation and real-time visualization of 3D vegetation
University of Girona, Spain
Patronat Award 12th Edition winner
Supervised by Dr. Gustavo Patow and Prof. Mateu Sbert

Technical skills

Languages

English (Read, Write and Speak: Fluent), Spanish (Native), Catalan (Read, Write and Speak: Fluent), Korean (Beginner).

Programming Languages

Python, C, C++, Objective-C, OpenCL, CUDA, AArch64 Assembly (NEON, SVE), JavaScript, R, LaTeX, MATLAB, HLSL, GLSL, SQL, NoSQL, HTML, CSS, PHP.

Programming frameworks, tools and APIs

TensorFlow, PyTorch, JAX, Airflow, RStudio, Mathematica, GitHub Copilot, Jupyter Notebook, Caffe2, OpenCV, DirectX, OpenGL, OpenMP, Unity, Android NDK, LLVM, AWS (EC2, EFS, S3), Google Cloud Platform, Docker, Kubernetes, Spark, Hadoop, Git, Confluence, JIRA, SharePoint, Visual Studio, GDB, 3ds Max, Maya, ZBrush, Blender, Photoshop, GIMP, Inkscape.

Full-Stack Development

OSX, Linux, Android, Tizen, Windows, Firmware and bare-metal.

Keywords: parallel computing, multi-threaded design, algorithm design, numerical methods, data visualization, machine learning, computer vision, 3d graphics and game engine programming, white papers, technical papers, GPU, CPU and FPGA development

Education and training

Ph.D. in Computer Science

Dissertation: Parallel spatial data structures for interactive rendering
BR PhD Fellowship from the University of Girona

Master in Computing

From the University of Girona, and the UPC Barcelona Tech

Computer Engineering

University of Girona, Spain

Academic research

Selected Research Publications

I have a Ph.D. in Computer Science from the University of Girona. My research was concerned with efficient parallel data structures for data visualization, geometric modeling, image-based representations and ray-tracing.

My previous research explored practical applications in a variety of areas in computer graphics, including real-time rendering, GPU efficient data-structures for geometry processing and texturing, and dynamic parallel data-structures for ray-tracing and general-purpose GPU applications.

 

• Parallel spatial data structures for interactive rendering (2013)

PhD Thesis, defended October 2012 [pdf] (Best Ph.D. Thesis Award, School of Computer Science, University of Girona, 2013)

The focus of this study is to design and provide time- and space-efficient parallel data structures and algorithms for real-time rendering and general-purpose GPU applications. A large number of operations in computer graphics are concerned with the process of collecting spatial data in a computer’s memory, in such a way that the information can be subsequently recovered as quickly as possible in order to be processed and generate a screen image in real-time. In this context, it is important to retain and organize the spatial data in such a way that fast retrieval and evaluation are possible. This thesis introduces three specific representations of spatial data with efficient parallel random-access for interactive rendering applications. Surface and volume representations of different topology and sparsity are handled with efficient encoding and rendering algorithms, where the key idea is to create a mapping of the input data to a virtual grid, which is naturally suited to parallel graphics processing units with a Single Instruction, Multiple Data (SIMD) programming model. The proposed approaches create a coarse lattice in which each cell contains a local description of surface and volume information, required for rendering such regions of the domain. This low-bandwidth localized memory access pattern is increasingly advantageous in many-core architectures, where the usage of random-access parallel data structures is crucial to provide fast rendering speed and good visual quality.
   

Keywords: Spatial data, Dense and Sparse Data Structures, Parallel Computing, Spatial Hashing, Surface Parameterization, Subdivision Surfaces, Surface Simplification

• Interactive Applications for Sketch-Based Editable Polycube Map (2013)

IEEE Transactions on Visualization and Computer Graphics (Volume 19, Issue 7, July 2013); Ismael Garcia, Jiazhi Xia, Ying He, Shi-Qing Xin, Gustavo Patow [pdf]

In this paper we propose a sketch-based editable polycube mapping method that, given a general mesh and a simple polycube that coarsely resembles the shape of the object, plus sketched features indicating relevant correspondences between the two, provides a uniform, regular and user-controllable quads-only mesh that can be used as a basis structure for subdivision. Large scale models with complex geometry and topology can be processed efficiently with simple, intuitive operations. We show that the simple, intuitive nature of the polycube map is a substantial advantage from the point of view of the interface by demonstrating a series of applications, including kit-basing, shape morphing, painting over the parameterization domain, and GPU-friendly tessellated subdivision displacement, where the user is also able to control the number of patches in the base mesh by the construction of the base polycube.
   

Keywords: Digital Geometry Processing, Surface Parameterization, Polycube Map, GPU Subdivision Surface

• A Runtime Cache for Interactive Procedural Modeling (2012)

SMI 2012: Shape Modeling International, Computer & Graphics; Tim Reiner, Sylvain Lefebvre, Lorenz Diener, Ismael Garcia, Bruno Jobard, Carsten Dachsbacher [project page]

We present an efficient runtime cache to accelerate the display of procedurally displaced and textured implicit surfaces, exploiting spatio-temporal coherence between consecutive frames. We cache evaluations of implicit textures covering a conceptually infinite space. Rotating objects, zooming onto surfaces, and locally deforming shapes now requires minor cache updates per frame and benefits from mostly cached values, avoiding expensive re-evaluations. A novel parallel hashing scheme supports arbitrarily large data records and allows for an automated deletion policy: new information may evict information no longer required from the cache, resulting in an efficient usage. This sets our solution apart from previous caching techniques, which do not dynamically adapt to view changes and interactive shape modifications. We provide a thorough analysis on cache behavior for different procedural noise functions to displace implicit base shapes, during typical modeling operations.
   

Keywords: Parallel Hashing, Runtime Cache, Interactive Shape Modeling, Implicit Surface Rendering, Procedural Textures

• Coherent parallel hashing (2011)

ACM Transactions on Graphics, Proceedings of SIGGRAPH Asia, Vol. 30(6), 2011; Ismael Garcia, Sylvain Lefebvre, Samuel Hornus, Anass Lasram [project page]

Recent spatial hashing schemes hash millions of keys in parallel, compacting sparse spatial data in small hash tables while still allowing for fast access from the GPU. Unfortunately, available schemes suffer from two drawbacks: Multiple runs of the construction process are often required before success, and the random nature of the hash functions decreases access performance. We introduce a new parallel hashing scheme which reaches high load factor with a very low failure rate. In addition our scheme has the unique advantage to exploit coherence in the data and the access patterns for faster performance. Compared to existing approaches, it exhibits much greater locality of memory accesses and consistent execution paths within groups of threads. This is especially well suited to Computer Graphics applications, where spatial coherence is common. In absence of coherence our scheme performs similarly to previous methods, but does not suffer from construction failures. Our scheme is based on the Robin Hood scheme modified to quickly abort queries of keys that are not in the table, and to preserve coherence. We demonstrate our scheme on a variety of data sets. We analyze construction and access performance, as well as cache and threads behavior.
   

Keywords: Spatial Data, Parallel Computing, Coherent Memory, Cache Memory, Hashing, Sparse Data
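The abstract above builds on Robin Hood hashing with quick rejection of keys that are not in the table. As a rough illustration of that underlying idea only, here is a minimal sequential sketch in plain Python using linear probing. It is not the parallel GPU scheme from the paper and does not attempt to preserve coherence; the class name `RobinHoodTable`, the per-slot age field and the use of Python's built-in `hash` are assumptions made for this example.

```python
class RobinHoodTable:
    """Open-addressing hash table with Robin Hood insertion (linear probing)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Each slot is None or (key, value, age), where age is the distance
        # of the entry from its home slot.
        self.slots = [None] * capacity
        self.size = 0

    def _home(self, key):
        return hash(key) % self.capacity

    def insert(self, key, value):
        # Kept simple for the sketch: a full table rejects every insert.
        if self.size >= self.capacity:
            raise RuntimeError("table is full")
        pos, age = self._home(key), 0
        while True:
            slot = self.slots[pos]
            if slot is None:
                self.slots[pos] = (key, value, age)
                self.size += 1
                return
            if slot[0] == key:
                self.slots[pos] = (key, value, slot[2])  # update in place
                return
            if slot[2] < age:
                # The resident is closer to home than the incoming entry:
                # take its slot and carry the evicted entry onwards.
                self.slots[pos] = (key, value, age)
                key, value, age = slot
            pos = (pos + 1) % self.capacity
            age += 1

    def lookup(self, key, default=None):
        pos, age = self._home(key), 0
        while age < self.capacity:
            slot = self.slots[pos]
            # Early abort: an empty slot, or a resident younger than our
            # current probe distance, proves the key is not in the table.
            if slot is None or slot[2] < age:
                return default
            if slot[0] == key:
                return slot[1]
            pos = (pos + 1) % self.capacity
            age += 1
        return default


if __name__ == "__main__":
    table = RobinHoodTable(8)
    for k in (1, 0, 8, 16):
        table.insert(k, str(k))
    print(table.lookup(16), table.lookup(24, "absent"))
```

The early abort works because insertion keeps entries ordered by age along every probe sequence, so a lookup never has to scan past the point where the key would have been placed.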

• Editable Polycube Map for GPU-based Subdivision Surfaces (2011)

I3D 2011: Proceedings of Symposium on Interactive 3D Graphics & Games; Jiazhi Xia, Ismael Garcia, Ying He, Shi-Qing Xin, Gustavo Patow [project page]

In this paper we propose an editable polycube mapping method that, given an arbitrary high-resolution polygonal mesh and a simple polycube representation plus optional sketched features indicating relevant correspondences between the two, provides a uniform, regular and artist-controllable quads-only mesh with a parameterized subdivision scheme. The method introduces a global parameterization, based on a divide and conquer strategy, which allows to create polycube-maps with a much smaller number of patches, and gives the user much more control over the quality of the induced subdivision surface. All this makes it a practical method for real-time rendering on modern hardware (e.g. OGL 4.1 and D3D11 tessellation hardware). By sketching these correspondence features, processing large-scale models with complex geometry and topology is now feasible. This is crucial for obtaining watertight displaced Catmull-Clark subdivision surfaces and high-quality texturing on real-time applications.
   

Keywords: Digital Geometry Processing, Surface Parameterization, Polycube Map, GPU Subdivision Surface

• IGT: Inverse Geometric Textures (2008)

ACM Transactions on Graphics, Proceedings of SIGGRAPH Asia, Vol. 27(5), 2008; Ismael Garcia, Gustavo Patow [project page]

Preserving details from a high resolution reference model onto lower resolution models is a complex, and sometimes daunting, task as manual intervention is required to correct texture misplacements. Inverse Geometric Textures (IGT) is a parameterization independent texturing technique that allows preservation of texture details from a high resolution reference model onto lower resolutions, generated with a given simplification method. IGT uses a parameterization defined on the reference model to generate an inversely parameterized texture that stores, for each texel, a list of all triangles that mapped onto it. This way, for any valid texture coordinate, IGT can know the point and the triangle of the detailed model that was projected, allowing application of details from the reference model onto the fragment from the low-resolution model. IGT is encoded in compact data structures and can be evaluated quickly. Furthermore, the high resolution model can have its own independent, secondary parameterization, so that no additional effort is required to directly use artist-designed content.
   

Keywords: Appearance Preserving Simplification, Detail-Recovery, Computer Games, Texturing, Parameterizations, Level-of-Detail

 

Selected Undergraduate Research Publications

I was introduced to the Computer Graphics field by my advisor, Prof. Mateu Sbert. In 2005 I did a research internship collaborating with Prof. Laszlo Szirmay-Kalos at the Technical University of Budapest Computer Graphics Research Group. In 2007 I started my Ph.D. in Computer Graphics, joining the GGG research group under the supervision of Dr. Gustavo Patow, with doctoral research internships at the ALICE Project-Team (INRIA Nancy), collaborating with Dr. Sylvain Lefebvre.

 

• Generation & interactive visualization of 3D vegetation (2007)

Master thesis in Computing, 2007; from the University of Girona and the UPC Barcelona Tech; advised by Dr. Gustavo Patow and Prof. Mateu Sbert [pdf]

• Multi-layered indirect texturing for tree rendering (2007)

Eurographics Workshop on Natural Phenomena 2007; Ismael Garcia, Gustavo Patow, Laszlo Szirmay-Kalos, Mateu Sbert [project page]

This paper presents a technique to render in real time complex trees using billboard clouds as an impostor simplification for the original polygonal tree, combined with a new texture-based representation for the foliage. The technique provides several new contributions with respect to previous approaches. The new algorithm allows progressive level of detail both at the geometric and at the shader levels. It also preserves the parallax effects of the original polygonal model keeping leaf positions, orientations, and preserving the overlapping of the leaves as seen from any view point. In addition, the texture-based representation provides high-definition close views without introducing high memory requirements. We adapted a realistic lighting model with soft shadows and a global illumination precomputation, allowing to render highly complex scenes with thousands of trees in real time.
   

Keywords: Image Generation, Clustering, 3D Graphics, Realism

• Leaf cluster impostors for tree rendering with parallax (2005)

Short Paper of Eurographics (Dublin, Ireland), pp. 69-72, 2005; Ismael Garcia, Mateu Sbert, Laszlo Szirmay-Kalos

This paper presents a simple method to render complex trees on high frame rates while maintaining parallax effects. Based on the recognition that a planar impostor is accurate if the represented polygon is in its plane, we find an impostor for each of those groups of tree leaves that lie approximately in the same plane. The groups are built automatically by a clustering algorithm. Unlike billboards, these impostors are not rotated when the camera moves, thus the expected parallax effects are provided. On the other hand, clustering allows the replacement of a large number of leaves by a single semi-transparent quadrilateral, which improves rendering time considerably. Our impostors well represent the tree from any direction and provide accurate depth values, thus the method is also good for shadow computation.
   

Keywords: Image Generation, Clustering, 3D Graphics, Realism

 

Academic teaching

From 2010 to 2011 Multimedia and computing technologies,

Technical Computer Engineering (University of Girona)
Teaching assistant

From 2009 to 2010 Multimedia and computing technologies,

Technical Computer Engineering (University of Girona)
Teaching assistant

From 2008 to 2009 Information Technologies,

Environmental Science (University of Girona)
Teaching assistant

From 2007 to 2008 Effective audio-visual presentations,

Law (University of Girona)
Teaching assistant

From 2007 to 2008 Information Technologies,

Environmental Science (University of Girona)
Teaching assistant

Conference talks

2021: Modelling Machine Learning networks with Scalable Matrix Extensions

Global Engineering Conference
Arm internal engineering conference

2019: Data-driven analytics to advance Arm ML-solutions

Global Engineering Conference
Arm internal engineering conference

2018: Improving ML solutions with Data Science

Data + Insights
Arm internal Data Science Conference

2011: Event Lab Talk

Parallel computing for data processing, rendering & interaction
Event Lab Invited Speaker
University of Barcelona, Barcelona, Spain

2011: Coherent parallel hashing

SIGGRAPH Asia 2011
Hong Kong, China

2011: Editable Polycube Map for GPU-based Subdivision Surfaces

Symposium on Interactive 3D Graphics and Games 2011
San Francisco, USA

2008: IGT: Inverse Geometric Textures

SIGGRAPH Asia 2008
Singapore

2007: Multi-layered indirect texturing for tree rendering

Eurographics Workshop on Natural Phenomena 2007
Prague, Czech Republic

2005: Leaf cluster impostors for tree rendering with parallax

Eurographics 2005
Dublin, Ireland

Supervised Bachelor Students:

2017 Gershom Akoli Agim

Deep Neural Networks on Arm Cortex-A CPUs: Analysis of CPU Inference on Vision workloads
MEng Electrical & Electronic Engineering Hons Project
Heriot-Watt University, United Kingdom

2016 Jan-Peter Larsson

3D Reconstruction using Stereo Matching Techniques on Scalable Vector Processors
MEng Electrical Hons Project
Edinburgh University, United Kingdom

2011 Enrique Nuzete

Interactive polycubemap editor
BEng Project, Technical Computer Engineering
University of Girona, Spain

2011 Tania Mendes

Modelling and visualization of skeleton-based animations
BEng Project, Technical Computer Engineering
University of Girona, Spain

2007 Verena Skuk

Procedural modelling and rendering of vegetation
Student Research Project
University of Girona, Spain

2007 Isaac Moles

Real-time rendering of large forest
BEng Project, Technical Computer Engineering
University of Girona, Spain

Other professional activities

Program Committee Member

CGVCVIP Computer Graphics, Visualization, Computer Vision & Image Processing

2012 IADIS International Conference Computer Graphics Visualization and Image Processing
Journal program committee member

CEIG:

CEIG 2015
Conference program committee member

Reviewer

Siggraph Asia:

Siggraph Asia 2015
Technical papers reviewer
Siggraph Asia 2014
Technical papers reviewer

CGI Computer Graphics International:

CGI 2012
Technical papers reviewer

I3D Interactive 3D Graphics and Games:

I3D 2012
Technical papers reviewer

EG Eurographics:

EG 2013
Technical papers reviewer

EG 2008
Technical papers reviewer

Computer Animation and Virtual Worlds Journal:

2014 Computer Animation and Virtual Worlds Journal
Technical papers reviewer

CEIG Congreso Español de Informática Gráfica:

2015 CEIG 2015
Technical papers reviewer

2009 CEIG 2009
Technical papers reviewer

2008 CEIG 2008
Technical papers reviewer

Conference Organizer

2009 Eurographics Symposium on Rendering, EGSR
Girona, Spain
Local organizer

Industry research collaborations

2010 GPU Mesh Processing tools
NVIDIA Mutual Non-Disclosure Agreement

Academic undergraduate experience

From June 2011 to December 2013 Advances in virtual reality for cutting edge applications

Spanish Ministry of Science and Technology Project (TIN2010-20590-C02-02)
Research developer

From November 2007 to July 2010 CALBaD: Computer Aided Light Based Design

Spanish Ministry of Science and Technology Project (TIN2007-67120)
Research developer

From October 2009 to December 2009 Research internship - INRIA Nancy (France), Alice project-team

PhD thesis research internship under supervision of Dr. Sylvain Lefebvre
Mobility grant TME2008-00961 from Alice project-team
INRIA Nancy, France
Research developer

From May 2007 to May 2011 PhD research fellowship BR

University of Girona
Research PhD student

From September 2004 to April 2007 Gametools Project

European Union Project (IST-2-004363)
Research developer

From February 2010 to April 2010 Research internship - INRIA Nancy (France), Alice project-team

PhD thesis research internship under supervision of Dr. Sylvain Lefebvre, INRIA contract Alice project-team
INRIA Nancy, France
Research developer

From February 2005 to May 2005 Undergraduate research internship

Undergraduate final project under supervision of Prof. László Szirmay-Kalos, Technical University of Budapest
Erasmus programme grant
Research developer

From June 2003 to March 2006 El Baúl S.A. – Grupo editorial el baúl

Redesign and implementation of online classified advertisement and community website. elbaul.com
Web developer

From July 2002 to December 2002 Institut d’Informàtica i Aplicacions, Universitat de Girona

Review and performance analysis of the Sony PlayStation 2 Linux Development Kit. Project number TIC2001-2416-C03-01, financed by the Spanish Ministry of Science and Technology [pdf]
Software developer

From June 2000 to September 2000 Igm Web S.L.

Design and implementation of several corporate websites.
Igm Web S.L.
Web developer

From June 1999 to September 1999 Kripton Networks

Writing technical reviews of videogames as a freelance web journalist. Iespana/Informatica
Web technical writer