[GSoC 2019] WEEK 4 and 5!

Hello, the first phase has ended and I am happy to have passed the first evaluation. During the last two weeks I was struggling with my academic projects and final exams. After I talked with my mentors about my difficulty finding time for my project, Francesco allowed me to take a one-week break on the condition that I make up for that week in the next phases. The goal is to average 40 hours of work per week by the end of the program.

Thanks to my mentor's understanding, I passed my exams. I am going to work more during the second phase in order to contribute more to the community. 🙂

[GSoC 2019] WEEK 3!

Hello, the third week has ended! This week I was a bit too busy to contribute to my GSoC project as much as in the first two weeks. Fortunately, thanks to the work of the past two weeks and the help of my mentors, I have become more productive, which more or less compensates for the lack of time.

  • Overview

The task on sparse arrays, which is the main goal of the first phase, is nearly finished. One case still needs to be fixed, but most of the fixes have been merged into SymPy's master and are ready to be used. I have also started preparing for the next task: implementing more data structures for sparse arrays. This work will start next week and should be finished by the beginning of phase 2.
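As an illustration of what "another data structure" for sparse arrays can look like, here is a minimal sketch (not the design chosen for SymPy) that converts a 2-D dictionary-of-keys array into compressed sparse row (CSR) form. It assumes the DOK dictionary is keyed by flat row-major index, and the function name `dok_to_csr` is hypothetical.

```python
def dok_to_csr(entries, shape):
    """Convert a 2-D DOK dict {flat_index: value} into CSR arrays (hypothetical helper).

    Returns (indptr, indices, data): the non-zeros of row r are
    data[indptr[r]:indptr[r + 1]], located in columns indices[indptr[r]:indptr[r + 1]].
    """
    rows, cols = shape
    indptr = [0] * (rows + 1)
    indices, data = [], []
    # Sorting the flat keys visits the entries row by row, column by column.
    for key in sorted(entries):
        r, c = divmod(key, cols)
        indices.append(c)
        data.append(entries[key])
        indptr[r + 1] += 1  # count non-zeros per row for now
    # Turn the per-row counts into cumulative offsets.
    for r in range(rows):
        indptr[r + 1] += indptr[r]
    return indptr, indices, data


print(dok_to_csr({1: 5, 6: 7}, (3, 3)))  # ([0, 1, 1, 2], [1, 0], [5, 7])
```

The conversion costs a sort over the non-zeros; in exchange, reading a whole row becomes a simple slice.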

  • PRs

Several PRs for sparse array operations were merged this week. #16937 was finally merged, with code much simpler than its first version. #16994 about __mul__ and __rmul__ was closed and replaced by #17014 because of a rebase mistake. #17026, which adds the __div__ and __neg__ operators for sparse arrays, was opened and merged. #17035 about permutedims and transpose for sparse arrays was opened and reviewed by my mentor.
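To give an idea of how permutedims and transpose can work without densifying, here is a minimal sketch (not the code from #17035) that only remaps the keys of the non-zero entries. It assumes flat row-major keys and a NumPy-style permutation, where output axis k is input axis perm[k]; all helper names are hypothetical.

```python
def unflatten(key, shape):
    """Turn a flat row-major index into a multi-index tuple."""
    index = []
    for dim in reversed(shape):
        key, rem = divmod(key, dim)
        index.append(rem)
    return tuple(reversed(index))


def flatten(index, shape):
    """Turn a multi-index tuple into a flat row-major index."""
    key = 0
    for i, dim in zip(index, shape):
        key = key * dim + i
    return key


def permute_sparse(entries, shape, perm):
    """Permute the axes of a DOK array; output axis k is input axis perm[k]."""
    new_shape = tuple(shape[p] for p in perm)
    new_entries = {}
    for key, value in entries.items():
        index = unflatten(key, shape)
        new_index = tuple(index[p] for p in perm)
        new_entries[flatten(new_index, new_shape)] = value
    return new_entries, new_shape


# Transposing a 2 x 3 array: entry (0, 1) moves to (1, 0), entry (1, 0) to (0, 1).
print(permute_sparse({1: 10, 3: 20}, (2, 3), (1, 0)))  # ({2: 10, 1: 20}, (3, 2))
```

With this approach the work is proportional to the number of non-zero entries rather than to the full size of the array.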

  • Issues

No new issues were opened. The list in #16941 is continuously being updated.

  • Reviews

No reviews this week due to lack of time.

  • Related works

Practiced rebasing and merging in a specific case: updating my local repository from the upstream master after opening a PR. Even though I broke PR #16994 in the process, I am still happy to have learned some new tricks.

Preparing a blog post presenting sparse arrays and my related work in SymPy, so that my contributions can help people who need this class but don't know how to use it or what its advantages are.

[GSoC 2019] WEEK 2!

Hello everyone! I am glad to present some of my progress during this past week.

  • Overview

I mainly worked on the sparse array issue this week. A sparse array is, briefly, an array in which most of the elements are zero. To save memory, sparse arrays are distinguished from dense arrays (in which most elements are non-zero) and are stored and manipulated differently. The default data format for sparse arrays in SymPy is called Dictionary of Keys (DOK), where only the non-zero values are stored in a dictionary, keyed by their position.
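The DOK idea fits in a few lines of plain Python. This is a toy class, not SymPy's implementation: the name `DOKArray` is hypothetical, and it assumes that non-zero values are keyed by their flat row-major index, which matches keys such as `20001` for position (1, 1) of a 10000 x 20000 array in the examples below.

```python
class DOKArray:
    """Toy dictionary-of-keys array: only non-zero entries are stored."""

    def __init__(self, *shape):
        self.shape = shape
        self.entries = {}  # flat row-major index -> non-zero value

    def _flat(self, index):
        # (i, j) in shape (m, n) maps to i * n + j, and so on for more axes.
        if len(index) != len(self.shape):
            raise IndexError(index)
        key = 0
        for i, dim in zip(index, self.shape):
            if not 0 <= i < dim:
                raise IndexError(index)
            key = key * dim + i
        return key

    def __getitem__(self, index):
        # Missing keys are implicit zeros.
        return self.entries.get(self._flat(index), 0)

    def __setitem__(self, index, value):
        key = self._flat(index)
        if value == 0:
            self.entries.pop(key, None)  # never store explicit zeros
        else:
            self.entries[key] = value


a = DOKArray(10000, 20000)  # 200,000,000 logical entries, none stored yet
a[1, 1] = 5
print(a.entries)  # {20001: 5}
print(a[0, 0])    # 0
```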

However, in various operations a sparse array is cast to a dense array. What I am trying to do is to remove these casts so that sparse arrays can be used efficiently, for example in large-scale sparse array operations.

  • PRs

PR #16937 about derive_by_array is not merged yet. Thanks to the guidance of my mentor Francesco, the PR has been improved and now gives satisfactory results. Besides derive_by_array, I have also updated the way the Array module performs equality tests and applies functions. To show the effect of these changes, here is how derive_by_array used to behave:

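>>> from sympy import symbols
>>> from sympy.tensor.array import MutableSparseNDimArray, ImmutableSparseNDimArray, derive_by_array
>>> i, j = symbols('i j')
>>> a = MutableSparseNDimArray.zeros(10000, 20000)
>>> a[1, 1] = i
>>> a[1, 2] = j
>>> d = derive_by_array(a, i)
MemoryError
>>> d = derive_by_array(a, (i, j))
MemoryError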

Previously, the sparse array was cast to a list of all 10000 × 20000 = 200,000,000 elements, which led to the MemoryError. Now we get:

>>> a = MutableSparseNDimArray.zeros(10000, 20000)
>>> a[1, 1] = i
>>> derive_by_array(a, i) == ImmutableSparseNDimArray({20001: 1}, (10000, 20000))
True
>>> a[1, 2] = j
>>> derive_by_array(a, (i, j)) == ImmutableSparseNDimArray({20001: 1, 200020002: 1}, (2, 10000, 20000))
True

The operation is now much faster and uses far less memory, since only the non-zero entries are processed.

Francesco also helped improve the quality of the code. For example, to compute the derivative of a sparse array, he suggested using a dictionary comprehension instead of a for loop, so an extra function was replaced by a single line of code.
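Here is a minimal sketch of that idea in plain Python; it is not SymPy's actual code. The name `derive_sparse_by_array` is hypothetical, and `diff` is passed in as a parameter so the example runs without SymPy (with SymPy one would pass `sympy.diff`; the `toy_diff` stand-in below only handles bare symbol names). For each variable, a dictionary comprehension visits only the stored entries, and the results are stacked along a new leading axis, with the k-th block offset by k times the array size. That offset is how the key `200020002` (1 × 200,000,000 + 20002) arises in the example above.

```python
from math import prod


def derive_sparse_by_array(entries, shape, variables, diff):
    """Differentiate a DOK array {flat_index: expr} w.r.t. several variables.

    The result has shape (len(variables),) + shape; the block for the k-th
    variable is offset by k * size, where size is the number of elements.
    """
    size = prod(shape)
    result = {}
    for k, x in enumerate(variables):
        # Only stored (non-zero) entries are visited; zero derivatives are dropped.
        result.update({
            k * size + key: d
            for key, expr in entries.items()
            if (d := diff(expr, x)) != 0
        })
    return result, (len(variables),) + tuple(shape)


def toy_diff(expr, x):
    """Toy stand-in for sympy.diff, valid only when expr is a bare symbol name."""
    return 1 if expr == x else 0


entries = {20001: "i", 20002: "j"}  # a[1, 1] = i, a[1, 2] = j
print(derive_sparse_by_array(entries, (10000, 20000), ["i", "j"], toy_diff))
# ({20001: 1, 200020002: 1}, (2, 10000, 20000))
```

Differentiating with respect to a single symbol, as in `derive_by_array(a, i)`, keeps the original shape; this sketch always stacks, so it corresponds to the tuple case.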

A new PR, #16994, about __mul__ and __rmul__ was opened; it removes the cast to dense arrays in these operators. The same goes for another new PR, #17000, for tensorproduct. However, these PRs cannot pass the tests for now, because the fix for the equality test has not yet been merged into SymPy's master.
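The same principle applies to the tensor product: the product of two sparse arrays only needs the pairs of non-zero entries. Below is a minimal sketch assuming flat row-major keys; the function name is hypothetical and this is not the code of #17000.

```python
from math import prod


def sparse_tensorproduct(a_entries, a_shape, b_entries, b_shape):
    """Tensor product of two DOK arrays; the result shape is a_shape + b_shape.

    Entry (ia..., ib...) equals a[ia...] * b[ib...]; in flat row-major order
    its key is key_a * size(b) + key_b.
    """
    b_size = prod(b_shape)
    entries = {
        ka * b_size + kb: va * vb
        for ka, va in a_entries.items()
        for kb, vb in b_entries.items()
    }
    return entries, tuple(a_shape) + tuple(b_shape)


# a = [0, 2], b = [3, 0, 5]  ->  a (x) b = [[0, 0, 0], [6, 0, 10]]
print(sparse_tensorproduct({1: 2}, (2,), {0: 3, 2: 5}, (3,)))  # ({3: 6, 5: 10}, (2, 3))
```

The cost grows with nnz(a) × nnz(b) instead of size(a) × size(b), which is what makes avoiding the dense cast worthwhile.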

A new PR, #16969, adding functions and tests for ArrayComprehension was merged; it adds methods such as .tomatrix() and .tolist() to this class.

  • Issues

I opened issue #16984 about equality tests between Dict and dict objects. I ran into this case while comparing the dictionaries of sparse arrays. Fortunately, this issue has recently been fixed.
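For context, an equality test between sparse arrays can compare the shapes and the stored non-zeros without densifying anything. The sketch below is hypothetical and is not the fix for #16984; converting both mappings to a plain `dict` is simply one way to avoid comparing two different mapping types, the kind of mismatch described in that issue.

```python
def sparse_equal(a_entries, a_shape, b_entries, b_shape):
    """Compare two DOK arrays without expanding them to dense form (hypothetical helper)."""
    if tuple(a_shape) != tuple(b_shape):
        return False
    # Normalize to plain dicts, whatever the input mapping type, and drop explicit zeros.
    a = {k: v for k, v in dict(a_entries).items() if v != 0}
    b = {k: v for k, v in dict(b_entries).items() if v != 0}
    return a == b


print(sparse_equal({1: 2}, (2, 2), {1: 2, 3: 0}, [2, 2]))  # True
print(sparse_equal({1: 2}, (2, 2), {1: 2}, (4,)))          # False
```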

  • Reviews

No reviews of other contributors' PRs this week. I did take part in reviewing my own PRs, since some errors were found after the PRs were created.

  • Related works

This week I took part in a community discussion about the nature of ArrayComprehension, a new class I implemented last week (see my last post for more details). This discussion helped me better understand the difference between a multidimensional list and an array: an array is supposed to perform vectorized operations, while a list is mostly manipulated element by element.
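A tiny plain-Python illustration of that difference (no SymPy involved): `+` on lists treats them as containers and concatenates them, so an element-wise sum has to be written out entry by entry. SymPy's Array, by contrast, adds two arrays of the same shape element by element directly.

```python
a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]

# On lists, + concatenates the outer lists.
print(a + b)  # [[1, 2], [3, 4], [10, 20], [30, 40]]

# An element-wise (vectorized-style) sum must be spelled out entry by entry.
elementwise = [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]
print(elementwise)  # [[11, 22], [33, 44]]
```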

[GSoC 2019] WEEK 1!

Hello everyone! The first coding week has come to an end. This post presents the progress I made during the past week.

  • PRs

This week the PR about ArrayComprehension (#16845) was merged. This class is designed as an extension of Python's list comprehension that allows symbolic dimensions.
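For concrete bounds, the idea can be compared with an ordinary nested list comprehension. In SymPy, a call of the form `ArrayComprehension(10*i + j, (i, 1, 4), (j, 1, 3))` takes inclusive bounds, and with numeric bounds `.doit()` should produce the same nested values as the plain Python below. The point of the class is that the bounds may also be symbols, in which case the result stays unevaluated until they are known; that part has no plain-Python equivalent.

```python
# Plain-Python analogue of a comprehension over i = 1..4 and j = 1..3
# (inclusive bounds, hence the + 1 in range).
result = [[10 * i + j for j in range(1, 3 + 1)] for i in range(1, 4 + 1)]
print(result)  # [[11, 12, 13], [21, 22, 23], [31, 32, 33], [41, 42, 43]]
```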

A new PR, #16937, about sparse arrays being cast to dense arrays in derive_by_array, was opened, but some improvements are still needed. The challenge is to find an appropriate method to override so that the derivative is computed correctly.

  • Issues

Issue #16911 was opened. It is in fact a discussion about implementing NumPy-like behavior in the Array module. It is interesting because my mentor was at first not in favor of this implementation, but some differences in behavior between SymPy's Array module and NumPy changed his mind. So this implementation stays in my proposal and will be one of my tasks for the next phase.

Another issue, #16941, was opened this week to list all the cases where a sparse array is cast to a dense array. Besides serving as a discussion, this issue gives an overview of the problems so that PRs like #16937 can refer to it. A checklist is kept up to date to show the progress.

Please feel free to comment on the issues or the PRs that I opened. I would love to listen to your opinions.

  • Reviews

After ArrayComprehension was merged, another student, Gagandeep Singh, opened a new PR to improve the code quality of this class. I took part as a reviewer to offer some help, and I also learned a lot from the discussion.

  • Related works

Not much, due to the workload of my college projects.