[GSoC 2019] WEEK 9!

I am so happy to pass the second evaluation!

This week is dedicated to the implementation of new iteration functionalities in Array module. Since the behaviour of __getitem__ is changed last week, we need to find a way to replace the old way of iterating arrays.

For this purpose, a generator is implemented to enable the iteration over each element, which was the old way in SymPy to iterate the array. This functionality is equivalent to flattening the array and then visiting one by one these elements, so the new class is named as Flatten. The advantage of generator will contribute to less memeory cose while iterating. This implementation corresponds as well the lazy-evaluation that I planned to implement in my proposal.

So the implementation is in #17308

[GSoC 2019] Week 8!

This is the week of second evaluation.

I have been working on a new issue, which is about modifying the behavior of Array module to follow the example of NumPy. As I discussed with my mentor at the very beginning of the project, the Array module is newly created and has fewer users at present, so it is still possible to standardize the behaviour of this module. Following a mature library such as NumPy will be a good idea because it would not only increase the consistency between Python’s libraries but also increase the efficiency of code in many cases.

The first step is modifying __getitem__ this function. What we had before is to access the value with a absolute index in the flatten array, while the standard in NumPy is that a index that is not complete will be added slice of `:` to extract all available values.

In order to do so, an new PR is opened and merged: #17226

[GSoC 2019] WEEK 7!

This week is also dedicated to the implementation of new format for sparse array. #17160 I have figured out the way to dynamically handle the dimension problem for different formats.

The most difficult part is for Compressed Sparse Row (CSR) format. This format is used by Matlab and Julia as a default format. (In fact the exact foramt is Compressed Sparse Column, which is transposed storage for CSR format.) CSR format used 3 list for storage: a flattened list for all non-zero value, a list of column index for each non-zero value and a list of pointer for first non-zero value in each row. In a two-dimension matrix, the notion of row is straight forward. But how to define a row in a multidimensional array?

In order solve this problem, I have defined the last dimension of each array as a row and flattened the first n-1 dimensions, with n as the rank of the array. I this way, a sparse array can be stored in 3 lists as the way CSR proposes.

This format is said to have a very good performance in item accessing and slicing, as well as vector product. Unfortunately, I haven’t figured out how to implement the __mul__ function to make good use of this format.

By the end of this week, 3 formats are basically implemented: List Of List(LIL), Coordiante list(COO) and Compressed Sparse Row(CSR).

[GSoC 2019] WEEK 6!

Hello everyone, this week I have started some preparation work for new formats of sparse array. The default format for sparse array in SymPy is Dictionary of keys(DOK) which means that the non-zero values are stored in a dictionary with the index as key and non-zero value as value. But this format has some disadvantages, i.e. very slow iterations due to random order of keys. So in order to improve the performance for some Array operations, I started thinking about infroducing new formats to sparse array module.

The format are widely used in matrix but not in array. Staying in tow dimension matrix makes some special format straight forward: Coordinate list (COO) for example, has one list for non-zero values and two more for columns and rows. But how to deal with the case of Array where the dimension(or rank) is variant is one obstacle.

So this week I have been mostly searching for solutions in order to dynamically handle the dimension for each format.

A new PR is opened to firstly implement the algorithm of each format. Link And a basic implementation is done to show the structure of LIL format.

As a finalization of last phase, most of sparse array casting into dense array issues have been resolved.

[GSoC 2019] WEEK 4 and 5!

Hello, the first phase is ended and I am happy to pass the first evaluation. I was struggling with my academic projects and final exams during the last two weeks. After talking about my difficulty of spending time contributing on my project with my mentors, Francesco allowed me to have a one-week break in condition that I should make up one week in the next phases. The goal is to have 40 hours work per week on average by the end of this program.

Thanks to the comprehension of my mentor, I could successfully pass the exams. I am going to work more over the second phase in order to have more contributions to the community. 🙂

[GSoC 2019] WEEK 3!

Hello, the third week ends! I am a little bit too busy to keep contributing on my GSoC project as the first two weeks. Fortunately, thank to the work of the past 2 weeks and the help of my mentors, I become more productive which compensates, more or less, the lack of time.

  • Overview

The task about sparse arrays nearly comes to an end which is the main goal for the first phase. I still have one case to be fixed but most of them are merged in to Sympy’s master and are ready to be used. I started as well preparing for the next task: implementing more data structure for sparse arrays. This would be started in the next week and is supposed to be ended by the begining of phase 2.

  • PRs

Several PR are merged this week for sparse arrays operations. #16937 is finally merged with a code style much simpler than its first version. #16994 about __mul__ and __rmul__ is closed and replaced by #17014 due to a mistake about rebase. #17026 about __div__ and __neg__ operators for sparse arrays is opened and merged. #17035 about permutedims and transpose for sparse arrays is opened and reviewed by my mentor.

  • Issues

No specific issue opened. The list in #16941 is continously being updated.

  • Reviews

No review due to the lack of time.

  • Related works

Pratice of rebasing and merging for a specific case: updated local repository from the upstream master after opening a PR. Even though I have broken the PR #16994, I am still happy to learn some new tricks that I didn’t know before.

Preparing a blog about presenting sparse array associated with my work in SymPy so that my contribution can indeed help people when they need to use this class but don’t really know how to do it and what the advantage would be .

[GSoC 2019] WEEK 2!

Hello everyone! I am glad to present some of my progress during this past week.

  • Overview

I was mainly working on the sparse aray issue over this week. A sparse array is, by a brief definition, an array in which most of the elements are zero. In order to save the memory cost, sparse array is distinguished from the dense array(which means most of elements are non-zero) and then is stored and manipulated in a different way. The default data format in SymPy for sparse array is called Dictionary of keys (DOK), where the non-zero value is stored in a dictionary.

However, the sparse array is cast to dense array in various ways. So what I am trying to do is to free these sparse arrays and nake good use of them for cases like large scale sparse array operations.

  • PRs

The PR about derive_by_array #16937 is not yet merged. Thank to the guidance of my mentor Francesco, the PR is being ameliorated and now has satisfactory result. Besides the operation of derive_by_array, I have updated as well the way Array module perform a equality test and apply function. To show the result of these changes, let’s see what the behavior of derive_by_array was:

>>> a = MutableSparseNDimArray.zeros(10000, 20000)
>>> a[1, 1] = i
>>> a[1, 2] = j
>>> d = derive_by_array(a, i)
MemoryError
>>> d = derive_by_array(a, (i, j))
MemoryError

The sparse array is cast to a list which leads to the MemoryError. Now, we can have:

>>> a = MutableSparseNDimArray.zeros(10000, 20000)
>>> a[1, 1] = i
>>> derive_by_array(a, i) = ImmutableSparseNDimArray({20001:1},(10000, 20000))
True
>>> a[1, 2] = j
>>> derive_by_array(a, (i, j)) == ImmutableSparseNDimArray({20001: 1, 200020002: 1},(2, 10000, 20000))
True

The operation is much more rapid and can save a lot of spaces of memory.

Francesco also helped to improve the quality of codes. For example, to perform the deriative of a sparse array, he suggested me to use a dictionary comprehension instead of a for loop, so an extra function is replaced by one single line of code.

A new PR #16994 about __mul__ and__rmul__ is opened, which solves the problem of casting in these operators. Same as another new PR #17000 for tensorproduct. However, these PRs cannot pass the test for now because the issue about equality test is not yet fixed and merged into master of SymPy.

A new PR #16969 about adding functions and tests for ArrayComprehension is merged, which added functions like .tomatrix(), .tolist() for this class.

  • Issues

I have opened a issue #16984 about equality test of Dict and dict objects. I ran into this case while comparing the dictionaries of sparse array. Fortunately, this issue is recently fixed.

  • Reviews

No reviews for other contributors’ PR. I have participated in the review of my own PRs since there are some errors have been found after the creation of PR.

  • Related works

I have participated in the discussion about the nature of ArrayComprehension in the community this week. Comprehension is a new class that I implemented last week(See last post for more details) This discussion helped me better understand the difference between a multipledimensional list and an array: an array is suppposed to perform vectorized operations while list is mosyly element-wise.

[GSoC 2019] WEEK 1!

Hello everyone! Here comes the end of the first coding week. This blog is dedicated to present the progress that I have had during the past week.

  • PRs

This week the PR about ArrayComprehension is merged. #16845. This class is designed as an extension of list comprehension in Python, which enables the list comprehension with symbolic dimensions.

A new PR #16937 about the sparse array being cast to dense array in the case of derive_by_array is opened. But somes improvements are still needed. The challenge is to find an approriate method to override and perform a derivative correctly.

  • Issues

An issue #16911 is opened. It is in fact a discussion about implementing a NumPy-Like behavior in Array module. It is interesting because my mentor was, at first, not in favor of this implementation. But some differences of behavior between the Array module in SymPy and the one in NumPy did change his mind. So this implementation can be kept in my proposal and will be my tasks for the next phase.

Another issue #16941 whose aim is to list out all cases where a sparse array is cast to dense arrays is opened this week. Not only as a discussion, this issue would also be a overview of the problems so that the PRs like #16937 can refer to it. A checklist is being updated to show the avancement.

Please feel free to comment on the issues or the PRs that I opened. I would love to listen to your opinions.

  • Reviews

After the merge of ArrayComprehension, another student Gagandeep Singh opened a new PR to ameliorate the code quality of this class. I have also participated as a reviewer to offer some help. I have also leant a lot from the discussion.

  • Related works

Not much, due to the burden of projects in college.

[GSoC 2019]Brief report upon the progress over the community bonding period

Linear Algebra : Tensor Core

Overview

After the official announcement of selection result, I was happy to be one of the students to work with SymPy over this summer program. A first greeting between the the mentors and me was efficient. Since then, I have been preparing for the launch of the program: creating my personal blog, reading the codebase of SymPy, discussing with my mentors about some details of the proposal, contributing to a new class named ArrayComprehension. Thank to the help of my mentors and other contributors in the community, I understand better the workflow and some coding conventions. Besides, due to the constraint of spare time, I have had some difficulties at the beginning to arrange my timetable. Now I have a more reliable estimation about how many hours I can spend on the program while the college semester is not yet finished.

Progress regarding the project

  • My personal blog

Every student is supposed to maintain a blog on which one should update the progress of each week. I think it is also time for me to create a personal site as an access to introduce myself to the world. I used WordPress as the CMS(Content Management System). I wanted to have a first blog as a global presentation of GSoC, unfortunately I was too busy to do it.

  • Codebase of Sympy

I have read carefully the code of the Array module and have browsed the entire Tensor module. As for other modules, I only had a look at those that I would probably use in the during this project. By carefully, I would like to say that I tested each method by instantiating an example. Some debuggings were also helpful when I tried to understand the purpose of some codes.

However, I have to admit that I still have a lot of things to learn and to get familiar with. For example, I didn’t notice the coding convention of the constructor which passes the parameters to superclass in order to make sure the possibility of rebuilding the object. Because if the lack of experience, I may need help and guidance, in the most of time, from other contributors, especially from my mentors.

  • Discussion of proposal

I have created a documentation in which I specify some detailed questions about my proposal. For example, the NumPy-like syntax was something that I considered working on. But it turns out to be unnecessary to do so. Doc of questions This documentation will be continuously updated. It can serve as a tract of our discussion and some decisions that we make together.

Even though I didn’t have enough time to explore the task like Codegen and Integration over array, I hope that I can have a better view of these tasks before its beginning. After a discussion with me mentors, I can make sure that I have at least a clear understanding of the tasks over phase 1.

  • ArrayComprehension class

#16845 This class is not really planned in my proposal, yet it helps me a lot regarding the understanding of the codebase and some coding conventions. Its creation comes from the need of a symbolic list comprehension of two other students in the stats groupe. To be honest, these two students set me a good example during this period. Their dynamicity in contributing to SymPy becomes not only a stress but also an encouragement to me.

For now, the basic structure of ArrayComprehension is established, thanks to the review of Francesco and other contributors in the community. I will continue to working on it so that it would help the students of stats get over the problem.

Estimation of availability during phase 1

After the practice of the past three weeks, I think I found a feasible solution to schedule the work of GSoC and my regular timetable. I have to say that I recently changed the college, this brand-new life gave me as great a challenge as the one of GSoC. Moreover, I have 3 academic projects to do and 4 final exams to prepare. Thus, I would like to reduce my planning working time for the first phase, if I can have the agreement of my mentors. I will make sure the advancement of the project(as it is planned in the proposal), but I cannot be engaged with too much extra work during this period. Once the semester is over, I will make up for the shortage of working hours.

Monday Tuesday Wednesday Thursday FridaySaturdaySunday
2h
22:00 *-
24:00
4h
20:00 –
24:00

4h
20:00 –
24:00
4h
20:00-
24:00
4h
20:00 –
24:00
6h
3h + 3h
6h
3h + 3h

*Based on UTC+2
The average working hours would be 30h per week. Then I will work 50h per week next phase to ensure the 40 hours in average. That is totally possible since I will be on vacation.

Ameliorations and conclusion

Since the project will officially start in a very soon future, I would like to propose some possible ameliorations about myself, based on what I have done in the past three weeks.

  • Be more active in contributing

I have seen the two students in stats sent many PRs and contributed as well in reviewing other’s PRs. That would be a good example for me to follow. I can surely learn a lot from reviering other’s code.

  • Ask well-considered question but don’t be afraid to say it out

I always want to maintain a good impression of myself to others, which makes me hesitate a lot when I want to ask a question. I think it is better to face the weak points of myself rather than hide it. I am a newbie to the Open Source world, there are surely a lot of things to learn. Please forgive me, Francesco and Sartaj, if my questions seem silly to you.

  • Prepare for the next phase once it is possible

Due to my constraint of time, I could not prepare very well for the next phase, which makes me worried. It would be good if I can finalize the work of this phase 1 earlier so that I would have some extra time to think about the phase 2 and 3.