
Releases: apple/coremltools

coremltools 8.1

20 Nov 04:56
29ad374

Release Notes

  • Python Support
    • Compatible with Python 3.12.
  • Added support for additional PyTorch operations
    • torch.clamp_max, torch.rand_like, torch.all, torch.linalg_inv, torch.nan_to_num, torch.cumprod, torch.searchsorted ops are now supported.
  • Increased conversion support coverage for models produced by torch.export
    • Op translation support is at 68% parity with our mature torch.jit.trace converter.
    • Support enumerated shape models.
    • Support ImageType inputs (see the example in the Appendix below).
  • Added Python bindings for the following classes:
  • Various other bug fixes, enhancements, clean ups and optimizations.
    • Favor bool mask in scaled dot product attention
    • Fix quantization crash with bool mask
  • Special thanks to our external contributors for this release: @M-Quadra @benjaminkech @guru-desh
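
Appendix

  • Example code of converting a torch.export model with an enumerated shape ImageType input (a minimal sketch, not from the original notes; the model, shapes, and scale are illustrative, and the combination assumes the documented ct.EnumeratedShapes / ct.ImageType APIs):
import torch
import coremltools as ct

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 8, 3)

    def forward(self, x):
        return self.conv(x)

model = Net().eval()
example_inputs = (torch.rand(1, 3, 256, 256),)

# Export with dynamic spatial dims so the enumerated shapes below are valid
height = torch.export.Dim(name="height", min=64, max=512)
width = torch.export.Dim(name="width", min=64, max=512)
exported_program = torch.export.export(model, example_inputs, dynamic_shapes={"x": {2: height, 3: width}})

# Restrict the flexible dims to an enumerated set of shapes, and consume images directly
shapes = ct.EnumeratedShapes(shapes=[(1, 3, 256, 256), (1, 3, 512, 512)])
coreml_model = ct.convert(
    exported_program,
    inputs=[ct.ImageType(shape=shapes, scale=1 / 255.0)],
)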

coremltools 8.0

16 Sep 20:50
7b13371
Compare
Choose a tag to compare

Release Notes

Compared to 7.2 (including features from 8.0b1 and 8.0b2)

  • Support for Latest Dependencies
    • Compatible with the latest protobuf Python package, which improves serialization latency.
    • Support torch 2.4.0, numpy 2.0, scikit-learn 1.5.
  • Support stateful Core ML models
    • Updates to the converter to produce Core ML models with the State Type (new type introduced in iOS18/macOS15).
    • Adds a toy stateful attention example model to show how to use an in-place KV cache.
  • Increased conversion support coverage for models produced by torch.export
    • Op translation support is at 56% parity with our mature torch.jit.trace converter.
    • Representative deep learning models (mobilebert, deeplab, edsr, mobilenet, vit, inception, resnet, wav2letter, emformer) are now supported.
    • Representative foundation models (llama, stable diffusion) are now supported.
    • Models quantized by ct.optimize.torch can be exported by torch.export and then converted.
  • New Compression Features
    • coremltools.optimize
      • Support compression with more granularities: blockwise quantization, grouped channel-wise palettization (see the quantization example in the Appendix below)
      • 4-bit weight quantization and 3-bit palettization
      • Support joint compression modes (8-bit look-up tables for palettization, pruning combined with quantization or palettization)
      • Vector palettization by setting cluster_dim > 1 and palettization with per-channel scale by setting enable_per_channel_scale=True.
      • Experimental activation quantization (take a W16A16 Core ML model and produce a W8A8 model)
      • API updates for coremltools.optimize.coreml and coremltools.optimize.torch
    • Support some models quantized by torchao (including the ops produced by torchao such as _weight_int4pack_mm).
    • Support more ops in quantized_decomposed namespace, such as embedding_4bit, etc.
  • Support for new ops and bug fixes for existing ops
    • compression-related ops: constexpr_blockwise_shift_scale, constexpr_lut_to_dense, constexpr_sparse_to_dense, etc.
    • updates to the GRU op
    • SDPA op scaled_dot_product_attention
    • clip op
  • Updated the model loading API
    • Support optimizationHints.
    • Support loading specific functions for prediction.
  • New utilities in coremltools.utils
    • coremltools.utils.MultiFunctionDescriptor and coremltools.utils.save_multifunction, for creating an mlprogram with multiple functions that can share weights (see the multifunction example in the Appendix below).
    • coremltools.models.utils.bisect_model can break a large Core ML model into two smaller models with similar sizes.
    • coremltools.models.utils.materialize_dynamic_shape_mlmodel can convert a flexible input shape model into a static input shape model.
  • Various other bug fixes, enhancements, clean ups and optimizations
  • Special thanks to our external contributors for this release: @sslcandoit @FL33TW00D @dpanshu @timsneath @kasper0406 @lamtrinhdev @valfrom @teelrabbit @igeni @Cyanosite
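
Appendix

  • Example code of creating and loading a multifunction mlprogram (a minimal sketch, not from the original notes; the mlpackage paths and function names are illustrative):
import coremltools as ct

# Describe which function from each source model goes into the combined model
desc = ct.utils.MultiFunctionDescriptor()
desc.add_function("model_a.mlpackage", src_function_name="main", target_function_name="func_a")
desc.add_function("model_b.mlpackage", src_function_name="main", target_function_name="func_b")
desc.default_function_name = "func_a"

# Weights shared by the source models are stored only once in the saved mlpackage
ct.utils.save_multifunction(desc, "combined.mlpackage")

# Load a specific function for prediction
model_b = ct.models.MLModel("combined.mlpackage", function_name="func_b")
  • Example code of blockwise 4-bit weight quantization (a minimal sketch, not from the original notes; the arguments follow the documented cto.coreml.OpLinearQuantizerConfig options and the mlpackage path is illustrative):
import coremltools as ct
import coremltools.optimize as cto

mlmodel = ct.models.MLModel("model.mlpackage")

# Quantize each weight in blocks of 32 values along the input channel axis
op_config = cto.coreml.OpLinearQuantizerConfig(
    mode="linear_symmetric", dtype="int4", granularity="per_block", block_size=32
)
config = cto.coreml.OptimizationConfig(global_config=op_config)
compressed_mlmodel = cto.coreml.linear_quantize_weights(mlmodel, config)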

coremltools 8.0b2

16 Aug 01:02
5e2460f
Pre-release

Release Notes

  • Support for Latest Dependencies
    • Compatible with the latest protobuf Python package, which improves serialization latency.
    • Compatible with numpy 2.0.
    • Supports scikit-learn 1.5.
  • New Core ML model utils
    • coremltools.models.utils.bisect_model can break a large Core ML model into two smaller models with similar sizes.
    • coremltools.models.utils.materialize_dynamic_shape_mlmodel can convert a flexible input shape model into a static input shape model.
  • New compression features in coremltools.optimize.coreml
    • Vector palettization: By setting cluster_dim > 1 in coremltools.optimize.coreml.OpPalettizerConfig, you can perform vector palettization, where each entry in the look-up table is a vector of length cluster_dim (see the example in the Appendix below).
    • Palettization with per-channel scale: By setting enable_per_channel_scale=True in coremltools.optimize.coreml.OpPalettizerConfig, weights are normalized along the output channel using per-channel scales before being palettized.
    • Joint compression: A new pattern is supported, where weights are first quantized to int8 and then palettized into an n-bit look-up table with int8 entries.
    • Support conversion of palettized models with 8-bit LUTs produced from coremltools.optimize.torch.
  • New compression features / bug fixes in coremltools.optimize.torch
    • Added conversion support for Torch models jointly compressed using the training-time APIs in coremltools.optimize.torch.
    • Added vector palettization support to SKMPalettizer.
    • Fixed a bug in the construction of weight vectors along the output channel for vector palettization with PostTrainingPalettizer and DKMPalettizer.
    • Deprecated the cluster_dtype option in favor of lut_dtype in ModuleDKMPalettizerConfig.
    • Added support for quantizing ConvTranspose modules with PostTrainingQuantizer and LinearQuantizer.
    • Added static grouping for the activation heuristic in GPTQ.
    • Fixed a bug in how quantization scales are computed for Conv2D layers with per-block quantization in GPTQ.
    • Activation-only quantization can now be performed with the QAT APIs.
  • Experimental torch.export conversion support
    • Support conversion of stateful models with mutable buffers.
    • Support conversion of models with dynamic input shapes.
    • Support conversion of models with 4-bit weight compression.
  • Support for new torch ops: clip.
  • Various other bug fixes, enhancements, clean ups and optimizations.
  • Special thanks to our external contributors for this release: @dpanshu, @timsneath, @kasper0406, @lamtrinhdev, @valfrom

Appendix

  • Example code of converting a stateful torch.export model
import torch
import coremltools as ct

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.register_buffer("state_1", torch.tensor([0.0, 0.0, 0.0]))

    def forward(self, x):
        # In place update of the model state
        self.state_1.mul_(x)
        return self.state_1 + 1.0

source_model = Model()
source_model.eval()

example_inputs = (torch.tensor([1.0, 2.0, 3.0]),)
exported_model = torch.export.export(source_model, example_inputs)
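# Stateful models require the iOS18/macOS15 deployment target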
coreml_model = ct.convert(exported_model, minimum_deployment_target=ct.target.iOS18)
  • Example code of converting torch.export models with dynamic input shapes
import torch
import coremltools as ct

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(3, 5)

    def forward(self, x):
        y = self.linear(x)
        return y

source_model = Model()
source_model.eval()

example_inputs = (torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),)
dynamic_shapes = {"x": {0: torch.export.Dim(name="batch_dim")}}
exported_model = torch.export.export(source_model, example_inputs, dynamic_shapes=dynamic_shapes)
coreml_model = ct.convert(exported_model)
  • Example code of converting a torch.export model with 4-bit weight compression
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
import coremltools as ct

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(3, 5)
    def forward(self, x):
        y = self.linear(x)
        return y

source_model = Model()
source_model.eval()

example_inputs = (torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),)

pre_autograd_graph = capture_pre_autograd_graph(source_model, example_inputs)
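# Symmetric 4-bit weight range (16 quantization levels)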
quantization_config = get_symmetric_quantization_config(weight_qmin=-7, weight_qmax=8)
quantizer = XNNPACKQuantizer().set_global(quantization_config)
prepared_graph = prepare_pt2e(pre_autograd_graph, quantizer)
converted_graph = convert_pt2e(prepared_graph)

exported_model = torch.export.export(converted_graph, example_inputs)
coreml_model = ct.convert(exported_model, minimum_deployment_target=ct.target.iOS17)
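  • Example code of vector palettization with a per-channel scale (a minimal sketch, not from the original notes; it uses the cluster_dim and enable_per_channel_scale options described above, and the mlpackage path is illustrative):
import coremltools as ct
import coremltools.optimize as cto

mlmodel = ct.models.MLModel("model.mlpackage")

# Each look-up-table entry is a vector of length cluster_dim; weights are
# normalized with per-channel scales before palettization
op_config = cto.coreml.OpPalettizerConfig(
    mode="kmeans", nbits=4, cluster_dim=2, enable_per_channel_scale=True
)
config = cto.coreml.OptimizationConfig(global_config=op_config)
palettized_mlmodel = cto.coreml.palettize_weights(mlmodel, config)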

coremltools 8.0b1

10 Jun 19:09
f391218
Pre-release

For all the new features, see the updated documentation in the docs-guides

  • New utilities coremltools.utils.MultiFunctionDescriptor() and coremltools.utils.save_multifunction, for creating an mlprogram with multiple functions that can share weights. Updated the model loading API to load specific functions for prediction.
  • Stateful Core ML models: updates to the converter to produce Core ML models with the State Type (new type introduced in iOS18/macOS15).
  • coremltools.optimize
    • Updates to model representation (mlprogram) pertaining to compression:
      • Support compression with more granularities: blockwise quantization, grouped channel-wise palettization
      • 4-bit weight quantization (in addition to the already supported 8-bit quantization)
      • 3-bit palettization (in addition to the already supported 1-, 2-, 4-, 6-, and 8-bit palettization)
      • Support joint compression modes:
        • 8-bit look-up tables for palettization
        • ability to combine weight pruning and palettization
        • ability to combine weight pruning and quantization
    • API updates:
      • coremltools.optimize.coreml
        • Updated existing APIs to account for features mentioned above
        • Support joint compression by applying compression techniques on an already compressed model
        • A new API to support activation quantization using calibration data, which can be used to take a W16A16 Core ML model and produce a W8A8 model: ct.optimize.coreml.experimental.linear_quantize_activations (see the example below)
          • (to be upgraded from experimental to the official namespace in a future release)
      • coremltools.optimize.torch
        • Updated existing APIs to account for features mentioned above
        • Added new APIs for data-free compression (PostTrainingPalettizer, PostTrainingQuantizer)
        • Added new APIs for calibration data based compression (SKMPalettizer for sensitive k-means palettization algorithm, layerwise_compression for GPTQ/sparseGPT quantization/pruning algorithm)
        • Updated the APIs and the coremltools.convert implementation so that converting torch models compressed with ct.optimize.torch no longer requires additional pass pipeline arguments.
  • iOS18 / macOS15 ops
    • compression-related ops: constexpr_blockwise_shift_scale, constexpr_lut_to_dense, constexpr_sparse_to_dense, etc.
    • updates to the GRU op
    • PyTorch op scaled_dot_product_attention
  • Experimental torch.export conversion support
import torch
import torchvision

import coremltools as ct

torch_model = torchvision.models.vit_b_16(weights="IMAGENET1K_V1")

x = torch.rand((1, 3, 224, 224))
example_inputs = (x,)
exported_program = torch.export.export(torch_model, example_inputs)

coreml_model = ct.convert(exported_program)
  • Various other bug fixes, enhancements, clean ups and optimizations
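  • Example code of the experimental activation quantization API (a minimal sketch, not from the original notes; the input name "x" and the random calibration data are illustrative, and real calibration data should be representative model inputs):
import numpy as np
import coremltools as ct
import coremltools.optimize as cto

mlmodel = ct.models.MLModel("model.mlpackage")

# Calibration data: a list of {input_name: value} dictionaries
sample_data = [{"x": np.random.rand(1, 3, 224, 224)} for _ in range(10)]

act_config = cto.coreml.OptimizationConfig(
    global_config=cto.coreml.experimental.OpActivationLinearQuantizerConfig(mode="linear_symmetric")
)
# Quantizes activations using the calibration data (W16A8); apply
# cto.coreml.linear_quantize_weights afterwards to obtain a W8A8 model
mlmodel_a8 = cto.coreml.experimental.linear_quantize_activations(mlmodel, act_config, sample_data)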

Known Issues

  • Conversion will fail when using certain palettization modes (e.g. int8 LUT, vector palettization) with torch models using ct.optimize.torch
  • Some of the joint compression modes, when used with the training time APIs in ct.optimize.torch, will result in a torch model that is not correctly converted
  • The post-training palettization config for mlpackage models (ct.optimize.coreml.OpPalettizerConfig) does not yet have all the arguments that are supported in the cto.torch.palettization APIs (e.g. lut_dtype to get an int8-dtyped LUT, cluster_dim for vector palettization, enable_per_channel_scale to apply per-channel scales, etc.)
  • Applying symmetric quantization using the GPTQ algorithm with ct.optimize.torch.layerwise_compression.LayerwiseCompressor will not produce the correct quantization scales, due to a known bug. This may lead to poor accuracy for the quantized model

Special thanks to our external contributors for this release: @teelrabbit @igeni @Cyanosite

coremltools 7.2

22 Apr 23:27
7521b68
  • New Features
    • Supports ExecuTorch 0.2 (see the ExecuTorch doc for examples)
      • Core ML Partitioner: If a PyTorch model is only partially supported by Core ML, the Core ML partitioner determines the supported subgraph and has ExecuTorch delegate it to Core ML.
      • Core ML Quantizer: Quantizes PyTorch models in a Core ML-favored scheme
  • Enhancements
    • Improved Model Conversion Speed
    • Expanded Operation Translation Coverage
      • add torch.narrow (see the example below)
      • add torch.adaptive_avg_pool1d and torch.adaptive_max_pool1d
      • add torch.numpy_t (i.e. the numpy-style transpose operator .T)
      • enhance torch.clamp_min for integer data type
      • enhance torch.add for complex data type
      • enhance tf.math.top_k when k is variable
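  • Example code of converting a model that uses the newly supported torch.narrow op (a minimal sketch, not from the original notes):
import torch
import coremltools as ct

class Net(torch.nn.Module):
    def forward(self, x):
        # narrow(dim, start, length): take 2 columns starting at column 0
        return torch.narrow(x, 1, 0, 2)

traced = torch.jit.trace(Net().eval(), torch.rand(3, 4))
mlmodel = ct.convert(traced, inputs=[ct.TensorType(name="x", shape=(3, 4))])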

Thanks to our ExecuTorch partners and our open-source community: @KrassCodes @M-Quadra @teelrabbit @minimalic @alealv @ChinChangYang @pcuenca

coremltools 7.1

01 Nov 14:46
dbb0094
  • New Features:

    • Supports Torch 2.1
      • Includes experimental support for the torch.export API, limited to the EDGE dialect.

      • Example usage:

import torch
from torch.export import export
from executorch.exir import to_edge

import coremltools as ct

# AnyNNModule and size are placeholders for your own torch.nn.Module and input shape
example_args = (torch.randn(*size),)
aten_dialect = export(AnyNNModule(), example_args)
edge_dialect = to_edge(aten_dialect).exported_program()
edge_dialect._dialect = "EDGE"

mlmodel = ct.convert(edge_dialect)
  • Enhancements:

    • API - ct.utils.make_pipeline - now allows specifying compute_units (see the example below)
    • New optimization passes:
      • Folds selective data movement ops (such as reshape and transpose) into adjacent constant compressed weights
      • Casts int32 → int16 dtype for all intermediate tensors when compute precision is set to fp16
    • PyTorch op multinomial: added lowering to Core ML
    • Type-related refinements on Pad and Gather/Gather-like ops
  • Bug Fixes:

    • Fixes coremltools build issue related to kmeans1d package
    • Minor fixes in lowering of PyTorch ops: masked_fill & randint
  • Various other bug fixes, enhancements, clean ups and optimizations.
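  • Example code of ct.utils.make_pipeline with compute_units (a minimal sketch, not from the original notes; the mlpackage paths are illustrative, and the pipeline runs model_1 followed by model_2):
import coremltools as ct

model_1 = ct.models.MLModel("model_1.mlpackage")
model_2 = ct.models.MLModel("model_2.mlpackage")

# Chain the two models into a single pipeline model pinned to the CPU
pipeline = ct.utils.make_pipeline(model_1, model_2, compute_units=ct.ComputeUnit.CPU_ONLY)
pipeline.save("pipeline.mlpackage")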

coremltools 7.0

18 Sep 21:47
e4b0d63
  • New submodule coremltools.optimize for model quantization and compression
    • coremltools.optimize.coreml for compressing Core ML models in a data-free manner. The coremltools.compression_utils.* APIs have been moved here.
    • coremltools.optimize.torch for compressing torch models with training data and fine-tuning. The fine-tuned torch model can then be converted using coremltools.convert.
  • The default neural network backend is now mlprogram for iOS15/macOS12. Previously, calling coremltools.convert() without providing the convert_to or the minimum_deployment_target arguments used the lowest deployment target (iOS11/macOS10.13) and the neuralnetwork backend. Now the conversion process defaults to iOS15/macOS12 and the mlprogram backend. You can change this behavior by providing a minimum_deployment_target or convert_to value.
  • Python 3.11 support.
  • Support for new PyTorch ops: repeat_interleave, unflatten, col2im, view_as_real, rand, logical_not, fliplr, quantized_matmul, randn, randn_like, scaled_dot_product_attention, stft, tile
  • pass_pipeline parameter has been added to coremltools.convert to allow control over which optimizations are performed.
  • MLModel batch prediction support.
  • Support for converting statically quantized PyTorch models.
  • Prediction from compiled models (.mlmodelc files). Get compiled model files from an MLModel instance. Python API to explicitly compile a model. (See the example below.)
  • Faster weight palettization for large tensors.
  • New utility method for getting weight metadata: coremltools.optimize.coreml.get_weights_metadata. This information can be used to customize optimization across ops when using coremltools.optimize.coreml APIs.
  • New and updated MIL ops for iOS17/macOS14/watchOS10/tvOS17
  • coremltools.compression_utils is deprecated.
  • Changes default I/O type for Neural Networks to FP16 for iOS16/macOS13 or later when mlprogram backend is used.
  • Changes upper input range behavior when backend is mlprogram:
    • If RangeDim is used and no upper bound is set (with a positive number), an exception will be raised.
    • If the user does not use the inputs parameter but there are undetermined dims in the input shape (for example, TF with "None" in an input placeholder), the dim will be sanitized to a finite number (default_size + 1) and a warning will be raised.
  • Various other bug fixes, enhancements, clean ups and optimizations.
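  • Example code of predicting from a compiled model (a minimal sketch, not from the original notes; the input name and values are illustrative):
import coremltools as ct

# Obtain the compiled model files (.mlmodelc) from an existing MLModel instance
mlmodel = ct.models.MLModel("model.mlpackage")
compiled_path = mlmodel.get_compiled_model_path()

# Loading a compiled model is much faster than loading an mlpackage
compiled_model = ct.models.CompiledMLModel(compiled_path)
prediction = compiled_model.predict({"x": [1.0, 2.0, 3.0]})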

Special thanks to our external contributors for this release: @fukatani, @pcuenca, @KWiecko, @comeweber, @sercand, @mlaves, @cclauss, @smpanaro, @nikalra, @jszaday

coremltools 7.0b2

15 Aug 16:46
5765495
Pre-release
  • The default neural network backend is now mlprogram for iOS15/macOS12. Previously, calling coremltools.convert() without providing the convert_to or the minimum_deployment_target arguments used the lowest deployment target (iOS11/macOS10.13) and the neuralnetwork backend. Now the conversion process defaults to iOS15/macOS12 and the mlprogram backend. You can change this behavior by providing a minimum_deployment_target or convert_to value.
  • Changes default I/O type for Neural Networks to FP16 for iOS16/macOS13 or later when mlprogram backend is used.
  • Changes upper input range behavior when backend is mlprogram (see the example below):
    • If RangeDim is used and no upper bound is set (with a positive number), an exception will be raised.
    • If the user does not use the inputs parameter but there are undetermined dims in the input shape (for example, TF with "None" in an input placeholder), the dim will be sanitized to a finite number (default_size + 1) and a warning will be raised.
  • New utility method for getting weight metadata: coremltools.optimize.coreml.get_weights_metadata. This information can be used to customize optimization across ops when using coremltools.optimize.coreml APIs.
  • Support for new PyTorch ops: repeat_interleave and unflatten.
  • New and updated iOS17/macOS14 ops: batch_norm, conv, conv_transpose, expand_dims, gru, instance_norm, inverse, l2_norm, layer_norm, linear, local_response_norm, log, lstm, matmul, reshape_like, resample, resize, reverse, reverse_sequence, rnn, rsqrt, slice_by_index, slice_by_size, sliding_windows, squeeze, transpose.
  • Various other bug fixes, enhancements, clean ups and optimizations.
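  • Example code of setting an upper bound on a flexible input dimension (a minimal sketch, not from the original notes; the mlprogram backend now requires a positive upper_bound for RangeDim):
import torch
import coremltools as ct

class Net(torch.nn.Module):
    def forward(self, x):
        return x * 2.0

traced = torch.jit.trace(Net().eval(), torch.rand(1, 3))

# A positive upper_bound must be provided for flexible dimensions
batch_dim = ct.RangeDim(lower_bound=1, upper_bound=16, default=1)
mlmodel = ct.convert(traced, inputs=[ct.TensorType(shape=ct.Shape((batch_dim, 3)))])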

Special thanks to our external contributors for this release: @fukatani, @pcuenca, @KWiecko, @comeweber and @sercand

coremltools 7.0b1

05 Jun 22:41
b5ba7e1
Pre-release
  • New submodule coremltools.optimize for model quantization and compression
    • coremltools.optimize.coreml for compressing Core ML models in a data-free manner. The coremltools.compression_utils.* APIs have been moved here.
    • coremltools.optimize.torch for compressing torch models with training data and fine-tuning. The fine-tuned torch model can then be converted using coremltools.convert.
  • Updated MIL ops for iOS17/macOS14/watchOS10/tvOS17
  • pass_pipeline parameter has been added to coremltools.convert to allow control over which optimizations are performed.
  • Python 3.11 support.
  • MLModel batch prediction support (see the example below).
  • Support for converting statically quantized PyTorch models
  • New Torch layer support: randn, randn_like, scaled_dot_product_attention, stft, tile
  • Faster weight palettization for large tensors.
  • coremltools.models.ml_program.compression_utils is deprecated.
  • Various other bug fixes, enhancements, clean ups and optimizations.
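  • Example code of MLModel batch prediction (a minimal sketch, not from the original notes; the input name and values are illustrative):
import coremltools as ct

mlmodel = ct.models.MLModel("model.mlpackage")

# Pass a list of input dictionaries; a list of output dictionaries is returned
batch = [{"x": [1.0, 2.0, 3.0]}, {"x": [4.0, 5.0, 6.0]}]
predictions = mlmodel.predict(batch)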

Core ML tools 7.0 guide: https://coremltools.readme.io/v7.0/

Special thanks to our external contributors for this release: @fukatani, @pcuenca, @mlaves, @cclauss, @smpanaro, @nikalra, @jszaday

coremltools 6.3

03 Apr 16:26
d9123f2

Core ML Tools 6.3 Release Note

  • Torch 2.0 Support
  • TensorFlow 2.12.0 Support
  • Removed Python 3.6 support
  • Functionality for controlling graph passes/optimizations, see the pass_pipeline parameter to coremltools.convert (example below)
  • A utility function for easily creating pipelines, see: utils.make_pipeline
  • A debug utility function for extracting submodels, see: converters.mil.debugging_utils.extract_submodel (example below)
  • Various other bug fixes, enhancements, clean ups and optimizations.
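  • Example code of controlling graph passes with pass_pipeline (a minimal sketch, not from the original notes; the pass name shown is one of the standard MIL pass identifiers):
import torch
import coremltools as ct

class Net(torch.nn.Module):
    def forward(self, x):
        return x + 1.0

traced = torch.jit.trace(Net().eval(), torch.rand(2, 3))

# Start from the default pipeline and skip one optimization pass
pipeline = ct.PassPipeline()
pipeline.remove_passes({"common::fuse_conv_batchnorm"})
mlmodel = ct.convert(traced, inputs=[ct.TensorType(shape=(2, 3))], pass_pipeline=pipeline)
  • Example code of extracting a submodel, reusing the mlmodel from the previous example (a minimal sketch, not from the original notes; "var_1" is a hypothetical intermediate output name taken from the converted MIL program):
from coremltools.converters.mil.debugging_utils import extract_submodel

submodel = extract_submodel(mlmodel, outputs=["var_1"])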

Special thanks to our external contributors for this release: @fukatani, @nikalra and @kevin-keraudren.