[GSoC 2019] WEEK 2!

Hello everyone! I am glad to present some of my progress during this past week.

  • Overview

I was mainly working on the sparse aray issue over this week. A sparse array is, by a brief definition, an array in which most of the elements are zero. In order to save the memory cost, sparse array is distinguished from the dense array(which means most of elements are non-zero) and then is stored and manipulated in a different way. The default data format in SymPy for sparse array is called Dictionary of keys (DOK), where the non-zero value is stored in a dictionary.

However, the sparse array is cast to dense array in various ways. So what I am trying to do is to free these sparse arrays and nake good use of them for cases like large scale sparse array operations.

  • PRs

The PR about derive_by_array #16937 is not yet merged. Thank to the guidance of my mentor Francesco, the PR is being ameliorated and now has satisfactory result. Besides the operation of derive_by_array, I have updated as well the way Array module perform a equality test and apply function. To show the result of these changes, let’s see what the behavior of derive_by_array was:

>>> a = MutableSparseNDimArray.zeros(10000, 20000)
>>> a[1, 1] = i
>>> a[1, 2] = j
>>> d = derive_by_array(a, i)
MemoryError
>>> d = derive_by_array(a, (i, j))
MemoryError

The sparse array is cast to a list which leads to the MemoryError. Now, we can have:

>>> a = MutableSparseNDimArray.zeros(10000, 20000)
>>> a[1, 1] = i
>>> derive_by_array(a, i) = ImmutableSparseNDimArray({20001:1},(10000, 20000))
True
>>> a[1, 2] = j
>>> derive_by_array(a, (i, j)) == ImmutableSparseNDimArray({20001: 1, 200020002: 1},(2, 10000, 20000))
True

The operation is much more rapid and can save a lot of spaces of memory.

Francesco also helped to improve the quality of codes. For example, to perform the deriative of a sparse array, he suggested me to use a dictionary comprehension instead of a for loop, so an extra function is replaced by one single line of code.

A new PR #16994 about __mul__ and__rmul__ is opened, which solves the problem of casting in these operators. Same as another new PR #17000 for tensorproduct. However, these PRs cannot pass the test for now because the issue about equality test is not yet fixed and merged into master of SymPy.

A new PR #16969 about adding functions and tests for ArrayComprehension is merged, which added functions like .tomatrix(), .tolist() for this class.

  • Issues

I have opened a issue #16984 about equality test of Dict and dict objects. I ran into this case while comparing the dictionaries of sparse array. Fortunately, this issue is recently fixed.

  • Reviews

No reviews for other contributors’ PR. I have participated in the review of my own PRs since there are some errors have been found after the creation of PR.

  • Related works

I have participated in the discussion about the nature of ArrayComprehension in the community this week. Comprehension is a new class that I implemented last week(See last post for more details) This discussion helped me better understand the difference between a multipledimensional list and an array: an array is suppposed to perform vectorized operations while list is mosyly element-wise.

Leave a comment

Your email address will not be published. Required fields are marked *