Our May 23 event will feature an in-person presentation by Mark Kurtz, machine learning lead at Neural Magic, at Kensho's office in Cambridge, MA. Doors open at 5:30 pm for networking and snacks, and our program will start at 6 pm. You may also attend via Zoom: respond here and we'll send you a link before the program.
In the presentation "Optimizing Transformer Models for Performance," Mark Kurtz will walk through current state-of-the-art research on compressing transformers, which yields smaller and faster models that are just as accurate. The techniques include structured pruning, unstructured pruning, distillation, and quantization. After a brief background on these techniques, he will cover newly published research that expands on them: the effects of optimizations on models of different sizes, the results of applying optimizations during pretraining vs. fine-tuning, and the surprising ways that sparse networks outperform dense ones for out-of-domain transfer and zero-shot use cases. He will close with a demo showing the results in action and how you can leverage and build upon the research, followed by Q&A.