Performance Optimizations

As Claro's builtin copy(...) performs a deep copy, performance becomes an important consideration when data can become arbitrarily large (whether as a result of a deeply nested type or not). Fortunately, Claro is able to perform one significant optimization that can have an incredible effect on the runtime performance of copying large data structures.

Claro's copy(...) is Aware of Mutability

The key observation that enables this performance optimization is that, as Claro does not expose a value's memory location to users, if a piece of data is deeply-immutable (and in a few other situations that Claro takes advantage of), there is no possible way to distinguish between the two situations below:

  1. having equal values located at different addresses in memory
  2. having "shared references" to the exact same value in memory

Claro takes advantage of this fact to generate the most efficient possible code to copy the specific type in question. It does so by eliminating any actual copying of deeply immutable data found nested anywhere within a copied value.

For example, take the below mutable list containing immutable lists. When it is copied, a new mutable list must be initialized to represent the outer list so that the original and copied values may be mutated independently. However, the internal immutable lists can just be referenced directly in the copied list (thus establishing what are known as "shared references" to the underlying memory).

Fig 1:


var original = mut [[1, 2, 3], [4, 5], [6]];
var copied = copy(original);

print(original);
print(copied);

Output:

mut [[1, 2, 3], [4, 5], [6]]
mut [[1, 2, 3], [4, 5], [6]]

Demonstrating the Performance Win

Again, I'll reiterate that it's impossible to directly observe from Claro code itself that this optimization has taken place as Claro doesn't provide any mechanism for actually checking a value's memory address. So, instead, I'll try to demonstrate indirectly that this optimization must actually be occurring.

The below example sets up an experiment where a very large, nested list is populated and then copied twice. The first copy is done manually using list comprehension. Then, the second copy uses the builtin copy(...). Each copy is timed to get a sense of the impact of this optimization.

To make things interesting, the outermost level of the list is mutable so that the overall copy is not a no-op. However, the performance gain comes from being able to avoid the unnecessary copies all of the inner lists.

Note: I'm not claiming that this is a rigorous "benchmark" of any sort - just that this broadly demonstrates the claim.

Fig 2:


# Claro's list comprehension needs to support unused elem variable: [someExpr | _ in coll]
function discardFirst<A,B>(a: A, b: B) -> B {
  _ = a;
  return b;
}
var numbers: mut [int] = mut [];
lists::ensureCapacity(numbers, 1000);
var i = 0;
while (i++ < 1000) {
  lists::add(numbers, i);
}
var GIANT_TEST_LIST: mut [[[int]]] = mut [];
repeat (100) {
  var innerList = [discardFirst(unused, [x | x in numbers]) | unused in numbers];
  lists::add(GIANT_TEST_LIST, innerList);
}

# Compute the number of ints in the test list.
print("GIANT_TEST_LIST dimensions: {len(GIANT_TEST_LIST)}x{len(GIANT_TEST_LIST[0])}x{len(GIANT_TEST_LIST[0][0])}\n");

# Now, manually copy the test lest using list comprehension.
var firstTestStart = instant::now();
var manuallyCopied = mut [[[x | x in l2] | l2 in l1] | l1 in GIANT_TEST_LIST];
var firstTestEnd = instant::now();

# Now, copy using the builtin `copy(...)` function.
var secondTestStart = instant::now();
var efficientlyCopied = copy(GIANT_TEST_LIST);
var secondTestEnd = instant::now();

# Let's see approximately how much time each took!
var MILLIS_PER_SECOND = 1000.0;
var NANOS_PER_SECOND = 1000000000.0;
duration::between(firstTestStart, firstTestEnd)
  |> duration::toMillis(^)
  |> print("Manual copy time:  {^/MILLIS_PER_SECOND} seconds");
duration::between(secondTestStart, secondTestEnd)
  |> duration::toNanos(^)
  |> print("Builtin copy time: {^/NANOS_PER_SECOND} seconds");

# Now just to really finish the demonstration, let's confirm that these copies actually contain equal elements to the
# giant copied list.
print("\nmanuallyCopied == GIANT_TEST_LIST:    {manuallyCopied == GIANT_TEST_LIST}");
print("efficientlyCopied == GIANT_TEST_LIST: {efficientlyCopied == GIANT_TEST_LIST}");

Output:

GIANT_TEST_LIST dimensions: 100x1000x1000

Manual copy time:  2.526 seconds
Builtin copy time: 1.02E-4 seconds

manuallyCopied == GIANT_TEST_LIST:    true
efficientlyCopied == GIANT_TEST_LIST: true