This is the week of second evaluation.
I have been working on a new issue, which is about modifying the behavior of
Array module to follow the example of NumPy. As I discussed with my mentor at the very beginning of the project, the
Array module is newly created and has fewer users at present, so it is still possible to standardize the behaviour of this module. Following a mature library such as NumPy will be a good idea because it would not only increase the consistency between Python’s libraries but also increase the efficiency of code in many cases.
The first step is modifying
__getitem__ this function. What we had before is to access the value with a absolute index in the flatten array, while the standard in NumPy is that a index that is not complete will be added slice of `:` to extract all available values.
In order to do so, an new PR is opened and merged: #17226
This week is also dedicated to the implementation of new format for sparse array. #17160 I have figured out the way to dynamically handle the dimension problem for different formats.
The most difficult part is for Compressed Sparse Row (CSR) format. This format is used by Matlab and Julia as a default format. (In fact the exact foramt is Compressed Sparse Column, which is transposed storage for CSR format.) CSR format used 3 list for storage: a flattened list for all non-zero value, a list of column index for each non-zero value and a list of pointer for first non-zero value in each row. In a two-dimension matrix, the notion of row is straight forward. But how to define a row in a multidimensional array?
In order solve this problem, I have defined the last dimension of each array as a row and flattened the first n-1 dimensions, with n as the rank of the array. I this way, a sparse array can be stored in 3 lists as the way CSR proposes.
This format is said to have a very good performance in item accessing and slicing, as well as vector product. Unfortunately, I haven’t figured out how to implement the __mul__ function to make good use of this format.
By the end of this week, 3 formats are basically implemented: List Of List(LIL), Coordiante list(COO) and Compressed Sparse Row(CSR).
Hello everyone, this week I have started some preparation work for new formats of sparse array. The default format for sparse array in SymPy is Dictionary of keys(DOK) which means that the non-zero values are stored in a dictionary with the index as key and non-zero value as value. But this format has some disadvantages, i.e. very slow iterations due to random order of keys. So in order to improve the performance for some Array operations, I started thinking about infroducing new formats to sparse array module.
The format are widely used in matrix but not in array. Staying in tow dimension matrix makes some special format straight forward: Coordinate list (COO) for example, has one list for non-zero values and two more for columns and rows. But how to deal with the case of Array where the dimension(or rank) is variant is one obstacle.
So this week I have been mostly searching for solutions in order to dynamically handle the dimension for each format.
A new PR is opened to firstly implement the algorithm of each format. Link And a basic implementation is done to show the structure of LIL format.
As a finalization of last phase, most of sparse array casting into dense array issues have been resolved.