Releases: apple/coremltools
coremltools 8.1
Release Notes
- Python Support
  - Compatible with Python 3.12.
- Added support for additional PyTorch operations
  - `torch.clamp_max`, `torch.rand_like`, `torch.all`, `torch.linalg_inv`, `torch.nan_to_num`, `torch.cumprod`, and `torch.searchsorted` ops are now supported.
- Increased conversion support coverage for models produced by `torch.export`
  - Op translation support is at 68% parity with our mature `torch.jit.trace` converter.
  - Supports enumerated shape models.
  - Supports `ImageType` inputs (see the sketch below).
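As a rough illustration of the new `ImageType` support in the `torch.export` path, the sketch below converts an exported torchvision model with an image input. The model choice, preprocessing scale, and deployment target are assumptions for illustration, not values from the release notes.

```python
import torch
import torchvision

import coremltools as ct

# Export a vision model with torch.export.
torch_model = torchvision.models.mobilenet_v2(weights="IMAGENET1K_V1").eval()
example_inputs = (torch.rand(1, 3, 224, 224),)
exported_program = torch.export.export(torch_model, example_inputs)

# Convert with an image input instead of a tensor input.
coreml_model = ct.convert(
    exported_program,
    inputs=[ct.ImageType(shape=(1, 3, 224, 224), scale=1 / 255.0)],
    minimum_deployment_target=ct.target.iOS17,
)
```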
- Added Python bindings for the following classes:
- Various other bug fixes, enhancements, clean ups and optimizations.
- Favor bool mask in scaled dot product attention
- Fix quantization crash with bool mask
- Special thanks to our external contributors for this release: @M-Quadra @benjaminkech @guru-desh
coremltools 8.0
Release Notes
Compared to 7.2 (including features from 8.0b1 and 8.0b2)
- Support for Latest Dependencies
  - Compatible with the latest `protobuf` python package, which improves serialization latency.
  - Supports `torch 2.4.0`, `numpy 2.0`, `scikit-learn 1.5`.
- Support stateful Core ML models
  - Updates to the converter to produce Core ML models with the State Type (new type introduced in iOS18/macOS15).
  - Adds a toy stateful attention example model to show how to use in-place kv-cache.
- Increased conversion support coverage for models produced by `torch.export`
  - Op translation support is at 56% parity with our mature `torch.jit.trace` converter.
  - Representative deep learning models (MobileBERT, DeepLab, EDSR, MobileNet, ViT, Inception, ResNet, Wav2Letter, Emformer) are supported.
  - Representative foundation models (Llama, Stable Diffusion) are supported.
  - A model quantized by `ct.optimize.torch` can be exported by `torch.export` and then converted.
- New Compression Features
  - `coremltools.optimize`
    - Support compression with more granularities: blockwise quantization, grouped channel wise palettization (see the sketch after this list)
    - 4 bit weight quantization and 3 bit palettization
    - Support joint compression modes (8 bit look-up-tables for palettization, pruning + quantization/palettization)
    - Vector palettization by setting `cluster_dim > 1` and palettization with per channel scale by setting `enable_per_channel_scale=True`.
    - Experimental activation quantization (take a W16A16 Core ML model and produce a W8A8 model)
    - API updates for `coremltools.optimize.coreml` and `coremltools.optimize.torch`
  - Support some models quantized by `torchao` (including the ops produced by torchao such as `_weight_int4pack_mm`).
  - Support more ops in the `quantized_decomposed` namespace, such as `embedding_4bit`, etc.
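A hedged sketch of the new blockwise quantization granularity in `coremltools.optimize.coreml`; the model path, dtype, and block size below are illustrative assumptions rather than values from the release notes.

```python
import coremltools as ct
import coremltools.optimize as cto

# Start from an existing mlprogram model (hypothetical path).
mlmodel = ct.models.MLModel("model.mlpackage")

# 4-bit linear quantization of weights with per-block granularity.
op_config = cto.coreml.OpLinearQuantizerConfig(
    mode="linear_symmetric",
    dtype="int4",
    granularity="per_block",
    block_size=32,
)
config = cto.coreml.OptimizationConfig(global_config=op_config)
compressed_mlmodel = cto.coreml.linear_quantize_weights(mlmodel, config)
compressed_mlmodel.save("model_w4.mlpackage")
```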
- Support new ops and fix bugs in existing ops
  - compression related ops: `constexpr_blockwise_shift_scale`, `constexpr_lut_to_dense`, `constexpr_sparse_to_dense`, etc.
  - updates to the GRU op
  - SDPA op `scaled_dot_product_attention`
  - `clip` op
- Updated the model loading API
  - Support `optimizationHints`.
  - Support loading specific functions for prediction (see the sketch below).
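A minimal sketch of loading a specific function from a multifunction model; the file name, function name, and input dictionary are illustrative assumptions rather than a documented example.

```python
import coremltools as ct

# Load one function out of a multifunction .mlpackage for prediction.
model = ct.models.MLModel("multifunction_model.mlpackage", function_name="encoder")

# The input names and shapes depend on the model; these values are placeholders.
prediction = model.predict({"input": [1.0, 2.0, 3.0]})
```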
- New utilities in `coremltools.utils`
  - `coremltools.utils.MultiFunctionDescriptor` and `coremltools.utils.save_multifunction`, for creating an mlprogram with multiple functions in it that can share weights (see the sketch below).
  - `coremltools.models.utils.bisect_model` can break a large Core ML model into two smaller models with similar sizes.
  - `coremltools.models.utils.materialize_dynamic_shape_mlmodel` can convert a flexible input shape model into a static input shape model.
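A hedged sketch of combining two models into one multifunction mlpackage with the utilities above; the file and function names are assumptions for illustration.

```python
from coremltools.utils import MultiFunctionDescriptor, save_multifunction

# Merge two single-function mlpackages into one multifunction mlpackage
# so that identical weights can be shared between the functions.
desc = MultiFunctionDescriptor()
desc.add_function("encoder.mlpackage", src_function_name="main", target_function_name="encoder")
desc.add_function("decoder.mlpackage", src_function_name="main", target_function_name="decoder")
desc.default_function_name = "encoder"
save_multifunction(desc, "combined.mlpackage")
```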
- Various other bug fixes, enhancements, clean ups and optimizations
- Special thanks to our external contributors for this release: @sslcandoit @FL33TW00D @dpanshu @timsneath @kasper0406 @lamtrinhdev @valfrom @teelrabbit @igeni @Cyanosite
coremltools 8.0b2
Release Notes
- Support for Latest Dependencies
  - Compatible with the latest `protobuf` python package: improves serialization latency.
  - Compatible with `numpy 2.0`.
  - Supports `scikit-learn 1.5`.
- New Core ML model utils
  - `coremltools.models.utils.bisect_model` can break a large Core ML model into two smaller models with similar sizes.
  - `coremltools.models.utils.materialize_dynamic_shape_mlmodel` can convert a flexible input shape model into a static input shape model.
- New compression features in `coremltools.optimize.coreml`
  - Vector palettization: By setting `cluster_dim > 1` in `coremltools.optimize.coreml.OpPalettizerConfig`, you can do vector palettization, where each entry in the lookup table is a vector of length `cluster_dim` (see the sketch below).
  - Palettization with per channel scale: By setting `enable_per_channel_scale=True` in `coremltools.optimize.coreml.OpPalettizerConfig`, weights are normalized along the output channel using per channel scales before being palettized.
  - Joint compression: A new pattern is supported, where weights are first quantized to int8 and then palettized into an n-bit look-up table with int8 entries.
  - Support conversion of palettized models with 8-bit LUT produced from `coremltools.optimize.torch`.
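A hedged sketch of vector palettization with per-channel scale via `coremltools.optimize.coreml`; the model path, bit width, and `cluster_dim` value are illustrative assumptions.

```python
import coremltools as ct
import coremltools.optimize as cto

mlmodel = ct.models.MLModel("model.mlpackage")  # hypothetical path

# 4-bit k-means palettization with 2-element LUT entries and per-channel scales.
op_config = cto.coreml.OpPalettizerConfig(
    nbits=4,
    mode="kmeans",
    cluster_dim=2,
    enable_per_channel_scale=True,
)
config = cto.coreml.OptimizationConfig(global_config=op_config)
palettized_mlmodel = cto.coreml.palettize_weights(mlmodel, config)
```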
- New compression features / bug fixes in `coremltools.optimize.torch`
  - Added conversion support for Torch models jointly compressed using the training time APIs in `coremltools.optimize.torch`.
  - Added vector palettization support to `SKMPalettizer`.
  - Fixed bug in construction of weight vectors along output channel for vector palettization with `PostTrainingPalettizer` and `DKMPalettizer`.
  - Deprecated `cluster_dtype` option in favor of `lut_dtype` in `ModuleDKMPalettizerConfig`.
  - Added support for quantizing `ConvTranspose` modules with `PostTrainingQuantizer` and `LinearQuantizer`.
  - Added static grouping for activation heuristic in `GPTQ`.
  - Fixed bug in how quantization scales are computed for `Conv2D` layers with per-block quantization in `GPTQ`.
  - Can now perform activation-only quantization with `QAT` APIs.
- Experimental `torch.export` conversion support
  - Support conversion of stateful models with mutable buffers.
  - Support conversion of dynamic input shape models.
  - Support conversion of 4-bit weight compression models.
  - Support new torch ops: `clip`.
- Various other bug fixes, enhancements, clean ups and optimizations.
- Special thanks to our external contributors for this release: @dpanshu, @timsneath, @kasper0406, @lamtrinhdev, @valfrom
Appendix
- Example code of converting a stateful `torch.export` model:
```python
import torch

import coremltools as ct


class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.register_buffer("state_1", torch.tensor([0.0, 0.0, 0.0]))

    def forward(self, x):
        # In place update of the model state
        self.state_1.mul_(x)
        return self.state_1 + 1.0


source_model = Model()
source_model.eval()

example_inputs = (torch.tensor([1.0, 2.0, 3.0]),)
exported_model = torch.export.export(source_model, example_inputs)
coreml_model = ct.convert(exported_model, minimum_deployment_target=ct.target.iOS18)
```
- Example code of converting `torch.export` models with dynamic input shapes:
```python
import torch

import coremltools as ct


class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(3, 5)

    def forward(self, x):
        y = self.linear(x)
        return y


source_model = Model()
source_model.eval()

example_inputs = (torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),)
dynamic_shapes = {"x": {0: torch.export.Dim(name="batch_dim")}}
exported_model = torch.export.export(source_model, example_inputs, dynamic_shapes=dynamic_shapes)
coreml_model = ct.convert(exported_model)
```
- Example code of converting `torch.export` with 4-bit weight compression:
```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

import coremltools as ct


class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(3, 5)

    def forward(self, x):
        y = self.linear(x)
        return y


source_model = Model()
source_model.eval()

example_inputs = (torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),)

pre_autograd_graph = capture_pre_autograd_graph(source_model, example_inputs)
quantization_config = get_symmetric_quantization_config(weight_qmin=-7, weight_qmax=8)
quantizer = XNNPACKQuantizer().set_global(quantization_config)
prepared_graph = prepare_pt2e(pre_autograd_graph, quantizer)
converted_graph = convert_pt2e(prepared_graph)

exported_model = torch.export.export(converted_graph, example_inputs)
coreml_model = ct.convert(exported_model, minimum_deployment_target=ct.target.iOS17)
```
coremltools 8.0b1
For all the new features, find the updated documentation in the docs-guides
- New utilities
  - `coremltools.utils.MultiFunctionDescriptor()` and `coremltools.utils.save_multifunction`, for creating an `mlprogram` with multiple functions in it that can share weights. Updated the model loading API to load specific functions for prediction.
- Stateful Core ML models: updates to the converter to produce Core ML models with the State Type (new type introduced in iOS18/macOS15).
- `coremltools.optimize`
  - Updates to model representation (`mlprogram`) pertaining to compression:
    - Support compression with more granularities: blockwise quantization, grouped channel wise palettization
    - 4 bit weight quantization (in addition to 8 bit quantization that was already supported)
    - 3 bit palettization (in addition to 1, 2, 4, 6, 8 bit palettization that was already supported)
    - Support joint compression modes:
      - 8 bit look-up-tables for palettization
      - ability to combine weight pruning and palettization
      - ability to combine weight pruning and quantization
  - API updates:
    - `coremltools.optimize.coreml`
      - Updated existing APIs to account for features mentioned above
      - Support joint compression by applying compression techniques on an already compressed model
      - A new API to support activation quantization using calibration data, which can be used to take a W16A16 Core ML model and produce a W8A8 model: `ct.optimize.coreml.experimental.linear_quantize_activations` (to be upgraded from experimental to the official namespace in a future release; see the sketch after this list)
    - `coremltools.optimize.torch`
      - Updated existing APIs to account for features mentioned above
      - Added new APIs for data free compression (`PostTrainingPalettizer`, `PostTrainingQuantizer`)
      - Added new APIs for calibration data based compression (`SKMPalettizer` for the sensitive k-means palettization algorithm, `layerwise_compression` for the GPTQ/SparseGPT quantization/pruning algorithms)
      - Updated the APIs and the `coremltools.convert` implementation, so that for converting torch models compressed with `ct.optimize.torch` there is no longer a need to provide additional pass pipeline arguments.
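A hedged sketch of the experimental activation quantization flow referenced above. The config class name, call signature, and calibration data format are assumptions drawn from the coremltools documentation, and the model path and input name are placeholders.

```python
import numpy as np

import coremltools as ct
import coremltools.optimize as cto

# Hypothetical W16A16 mlprogram model with one input named "x" of shape (1, 3, 224, 224).
mlmodel = ct.models.MLModel("model.mlpackage")

# Calibration data: a list of input dictionaries keyed by the model's input names.
sample_data = [{"x": np.random.rand(1, 3, 224, 224).astype(np.float32)} for _ in range(8)]

# Experimental activation quantization (A16 -> A8).
act_op_config = cto.coreml.experimental.OpActivationLinearQuantizerConfig(mode="linear_symmetric")
act_config = cto.coreml.OptimizationConfig(global_config=act_op_config)
w16a8_mlmodel = cto.coreml.experimental.linear_quantize_activations(mlmodel, act_config, sample_data)

# Quantize the weights as well to obtain a W8A8 model.
weight_config = cto.coreml.OptimizationConfig(
    global_config=cto.coreml.OpLinearQuantizerConfig(mode="linear_symmetric")
)
w8a8_mlmodel = cto.coreml.linear_quantize_weights(w16a8_mlmodel, weight_config)
```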
- iOS18 / macOS15 ops
  - compression related ops: `constexpr_blockwise_shift_scale`, `constexpr_lut_to_dense`, `constexpr_sparse_to_dense`, etc.
  - updates to the GRU op
  - PyTorch op `scaled_dot_product_attention`
- Experimental `torch.export` conversion support:
```python
import torch
import torchvision

import coremltools as ct

torch_model = torchvision.models.vit_b_16(weights="IMAGENET1K_V1")

x = torch.rand((1, 3, 224, 224))
example_inputs = (x,)
exported_program = torch.export.export(torch_model, example_inputs)

coreml_model = ct.convert(exported_program)
```
- Various other bug fixes, enhancements, clean ups and optimizations
Known Issues
- Conversion will fail when using certain palettization modes (e.g. int8 LUT, vector palettization) with torch models using `ct.optimize.torch`.
- Some of the joint compression modes, when used with the training time APIs in `ct.optimize.torch`, will result in a torch model that is not correctly converted.
- The post-training palettization config for mlpackage models (`ct.optimize.coreml.OpPalettizerConfig`) does not yet have all the arguments that are supported in the `cto.torch.palettization` APIs (e.g. `lut_dtype` to get an int8 dtyped LUT, `cluster_dim` to do vector palettization, `enable_per_channel_scale` to apply per-channel scales, etc.).
- Applying symmetric quantization using the GPTQ algorithm with `ct.optimize.torch.layerwise_compression.LayerwiseCompressor` will not produce the correct quantization scales, due to a known bug. This may lead to poor accuracy for the quantized model.
Special thanks to our external contributors for this release: @teelrabbit @igeni @Cyanosite
coremltools 7.2
- New Features
  - Supports ExecuTorch 0.2 (see the ExecuTorch docs for examples)
    - Core ML Partitioner: If a PyTorch model is only partially supported by Core ML, the Core ML partitioner determines the supported subgraph and has ExecuTorch delegate it to Core ML.
    - Core ML Quantizer: Quantizes PyTorch models in a Core ML favored scheme.
- Enhancements
  - Improved model conversion speed
  - Expanded operation translation coverage:
    - add `torch.narrow`
    - add `torch.adaptive_avg_pool1d` and `torch.adaptive_max_pool1d`
    - add `torch.numpy_t` (i.e. the numpy-style transpose operator `.T`)
    - enhance `torch.clamp_min` for integer data types
    - enhance `torch.add` for complex data types
    - enhance `tf.math.top_k` when `k` is variable
Thanks to our ExecuTorch partners and our open-source community: @KrassCodes @M-Quadra @teelrabbit @minimalic @alealv @ChinChangYang @pcuenca
coremltools 7.1
- New Features:
  - Supports Torch 2.1
    - Includes experimental support for the `torch.export` API, limited to the EDGE dialect.
    - Example usage:

```python
import torch
from torch.export import export
from executorch.exir import to_edge

import coremltools as ct

example_args = (torch.randn(*size),)
aten_dialect = export(AnyNNModule(), example_args)
edge_dialect = to_edge(aten_dialect).exported_program()
edge_dialect._dialect = "EDGE"

mlmodel = ct.convert(edge_dialect)
```
- Enhancements:
  - API: `ct.utils.make_pipeline` now allows specifying compute_units (see the sketch below).
  - New optimization passes:
    - Folds selective data movement ops like reshape and transpose into adjacent constant compressed weights
    - Casts int32 → int16 dtype for all intermediate tensors when compute precision is set to fp16
  - PyTorch op `multinomial`: adds lowering for it to Core ML
  - Type related refinements on Pad and Gather/Gather-like ops
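A hedged sketch of chaining models with `ct.utils.make_pipeline` while pinning the compute units; the file names are placeholders and the compute unit choice is an assumption.

```python
import coremltools as ct

# Chain two mlpackage models into a single pipeline model.
model_1 = ct.models.MLModel("first.mlpackage")
model_2 = ct.models.MLModel("second.mlpackage")

pipeline = ct.utils.make_pipeline(model_1, model_2, compute_units=ct.ComputeUnit.CPU_AND_NE)
pipeline.save("pipeline.mlpackage")
```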
- Bug Fixes:
  - Fixes coremltools build issue related to the kmeans1d package
  - Minor fixes in lowering of PyTorch ops: masked_fill & randint
- Various other bug fixes, enhancements, clean ups and optimizations.
coremltools 7.0
- New submodule `coremltools.optimize` for model quantization and compression
  - `coremltools.optimize.coreml` for compressing Core ML models, in a data free manner. `coremltools.compression_utils.*` APIs have been moved here.
  - `coremltools.optimize.torch` for compressing torch models with training data and fine-tuning. The fine-tuned torch model can then be converted using `coremltools.convert`.
- The default neural network backend is now `mlprogram` for iOS15/macOS12. Previously, calling `coremltools.convert()` without providing the `convert_to` or the `minimum_deployment_target` arguments used the lowest deployment target (iOS11/macOS10.13) and the `neuralnetwork` backend. Now the conversion process defaults to iOS15/macOS12 and the `mlprogram` backend. You can change this behavior by providing a `minimum_deployment_target` or `convert_to` value (see the sketch below).
- Python 3.11 support.
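For illustration only, a minimal sketch of overriding the new default; the toy traced model and target values are assumptions, not taken from the release notes.

```python
import torch

import coremltools as ct

# A trivial traced torch model, used only to illustrate the convert() call.
traced_model = torch.jit.trace(torch.nn.ReLU().eval(), torch.rand(1, 3))

# Default behavior: an mlprogram targeting iOS15/macOS12 or newer.
mlprogram_model = ct.convert(traced_model, inputs=[ct.TensorType(shape=(1, 3))])

# Opt back into the older neuralnetwork backend by asking for it explicitly.
neuralnetwork_model = ct.convert(
    traced_model,
    inputs=[ct.TensorType(shape=(1, 3))],
    convert_to="neuralnetwork",
    minimum_deployment_target=ct.target.iOS14,
)
```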
- Support for new PyTorch ops: `repeat_interleave`, `unflatten`, `col2im`, `view_as_real`, `rand`, `logical_not`, `fliplr`, `quantized_matmul`, `randn`, `randn_like`, `scaled_dot_product_attention`, `stft`, `tile`
- A `pass_pipeline` parameter has been added to `coremltools.convert` to allow control over which optimizations are performed.
- MLModel batch prediction support.
- Support for converting statically quantized PyTorch models.
- Prediction from compiled model (`.mlmodelc` files): get compiled model files from an `MLModel` instance, plus a Python API to explicitly compile a model (see the sketch below).
- Faster weight palettization for large tensors.
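A hedged sketch of the compiled model prediction mentioned above; the model path and input dictionary are placeholders.

```python
import coremltools as ct

# Compile an MLModel and then predict directly from the compiled artifact.
mlmodel = ct.models.MLModel("model.mlpackage")  # hypothetical path
compiled_path = mlmodel.get_compiled_model_path()

compiled_model = ct.models.CompiledMLModel(compiled_path)

# Input names and shapes depend on the model; these values are placeholders.
prediction = compiled_model.predict({"input": [1.0, 2.0, 3.0]})
```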
- New utility method for getting weight metadata: `coremltools.optimize.coreml.get_weights_metadata`. This information can be used to customize optimization across ops when using `coremltools.optimize.coreml` APIs (see the sketch below).
- New and updated MIL ops for iOS17/macOS14/watchOS10/tvOS17.
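A brief sketch of inspecting weight metadata with the utility above; the model path and the size threshold are illustrative assumptions.

```python
import coremltools as ct
from coremltools.optimize.coreml import get_weights_metadata

mlmodel = ct.models.MLModel("model.mlpackage")  # hypothetical path

# Collect metadata for weights larger than a chosen element-count threshold,
# e.g. to decide which ops to target in an OptimizationConfig.
weight_metadata = get_weights_metadata(mlmodel, weight_threshold=2048)
for weight_name, metadata in weight_metadata.items():
    print(weight_name, metadata)
```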
- `coremltools.compression_utils` is deprecated.
- Changes default I/O type for Neural Networks to FP16 for iOS16/macOS13 or later when the `mlprogram` backend is used.
- Changes upper input range behavior when the backend is `mlprogram` (see the sketch below):
  - If `RangeDim` is used and no upper bound is set (with a positive number), an exception will be raised.
  - If the user does not use the `inputs` parameter but there are undetermined dims in the input shape (for example, TF with "None" in an input placeholder), the dim will be sanitized to a finite number (default_size + 1) and a warning will be raised.
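A minimal sketch of providing an explicit upper bound with `RangeDim` for an `mlprogram` conversion; the toy model and bound values are assumptions for illustration.

```python
import torch

import coremltools as ct

# Toy traced model with a flexible batch dimension.
traced_model = torch.jit.trace(torch.nn.ReLU().eval(), torch.rand(1, 3))

# With the mlprogram backend, RangeDim needs a finite, positive upper bound.
flexible_shape = ct.Shape(shape=(ct.RangeDim(lower_bound=1, upper_bound=64, default=1), 3))
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(name="x", shape=flexible_shape)],
    convert_to="mlprogram",
)
```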
- Various other bug fixes, enhancements, clean ups and optimizations.
Special thanks to our external contributors for this release: @fukatani, @pcuenca, @KWiecko, @comeweber, @sercand, @mlaves, @cclauss, @smpanaro, @nikalra, @jszaday
coremltools 7.0b2
- The default neural network backend is now `mlprogram` for iOS15/macOS12. Previously, calling `coremltools.convert()` without providing the `convert_to` or the `minimum_deployment_target` arguments used the lowest deployment target (iOS11/macOS10.13) and the `neuralnetwork` backend. Now the conversion process defaults to iOS15/macOS12 and the `mlprogram` backend. You can change this behavior by providing a `minimum_deployment_target` or `convert_to` value.
- Changes default I/O type for Neural Networks to FP16 for iOS16/macOS13 or later when the `mlprogram` backend is used.
- Changes upper input range behavior when the backend is `mlprogram`:
  - If `RangeDim` is used and no upper bound is set (with a positive number), an exception will be raised.
  - If the user does not use the `inputs` parameter but there are undetermined dims in the input shape (for example, TF with "None" in an input placeholder), the dim will be sanitized to a finite number (default_size + 1) and a warning will be raised.
- New utility method for getting weight metadata: `coremltools.optimize.coreml.get_weights_metadata`. This information can be used to customize optimization across ops when using `coremltools.optimize.coreml` APIs.
- Support for new PyTorch ops: `repeat_interleave` and `unflatten`.
- New and updated iOS17/macOS14 ops: `batch_norm`, `conv`, `conv_transpose`, `expand_dims`, `gru`, `instance_norm`, `inverse`, `l2_norm`, `layer_norm`, `linear`, `local_response_norm`, `log`, `lstm`, `matmul`, `reshape_like`, `resample`, `resize`, `reverse`, `reverse_sequence`, `rnn`, `rsqrt`, `slice_by_index`, `slice_by_size`, `sliding_windows`, `squeeze`, `transpose`.
- Various other bug fixes, enhancements, clean ups and optimizations.
Special thanks to our external contributors for this release: @fukatani, @pcuenca, @KWiecko, @comeweber and @sercand
coremltools 7.0b1
- New submodule `coremltools.optimize` for model quantization and compression
  - `coremltools.optimize.coreml` for compressing Core ML models, in a data free manner. `coremltools.compression_utils.*` APIs have been moved here.
  - `coremltools.optimize.torch` for compressing torch models with training data and fine-tuning. The fine-tuned torch model can then be converted using `coremltools.convert`.
- Updated MIL ops for iOS17/macOS14/watchOS10/tvOS17.
- A `pass_pipeline` parameter has been added to `coremltools.convert` to allow control over which optimizations are performed.
- Python 3.11 support.
- MLModel batch prediction support.
- Support for converting statically quantized PyTorch models.
- New Torch layer support: `randn`, `randn_like`, `scaled_dot_product_attention`, `stft`, `tile`
- Faster weight palettization for large tensors.
- `coremltools.models.ml_program.compression_utils` is deprecated.
- Various other bug fixes, enhancements, clean ups and optimizations.
Core ML tools 7.0 guide: https://coremltools.readme.io/v7.0/
Special thanks to our external contributors for this release: @fukatani, @pcuenca, @mlaves, @cclauss, @smpanaro, @nikalra, @jszaday
coremltools 6.3
Core ML Tools 6.3 Release Note
- Torch 2.0 Support
- TensorFlow 2.12.0 Support
- Removed Python 3.6 support
- Functionality for controlling graph passes/optimizations; see the `pass_pipeline` parameter of `coremltools.convert`.
- A utility function for easily creating pipelines; see `utils.make_pipeline`.
- A debug utility function for extracting submodels; see `converters.mil.debugging_utils.extract_submodel` (a sketch follows below).
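A hedged sketch of extracting a submodel for debugging; the model path, the intermediate tensor name, and the keyword usage are assumptions for illustration.

```python
import coremltools as ct
from coremltools.converters.mil.debugging_utils import extract_submodel

mlmodel = ct.models.MLModel("model.mlpackage")  # hypothetical path

# Cut the program at an intermediate tensor to inspect just that portion.
submodel = extract_submodel(mlmodel, outputs=["intermediate_tensor_name"])
```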
- Various other bug fixes, enhancements, clean ups and optimizations.
Special thanks to our external contributors for this release: @fukatani, @nikalra and @kevin-keraudren.