Path optimization docs#64
Conversation
…ted imports, fix docstring
|
Moved optional umap import to module level so that the recursive function call does not involve the overhead of repetitive imports. |
There was a problem hiding this comment.
Pull request overview
This pull request adds comprehensive docstrings and type hints to path optimization functions in the lmd.path module, improving code documentation and type safety. The PR also includes code cleanup by removing debug print statements and reorganizing the UMAP dependency check.
Changes:
- Added type hints to all path optimization functions (
calc_len,tsp_hilbert_solve,_get_closest,_tsp_greedy_solve,_get_nodes,tsp_greedy_solve) - Added detailed docstrings with Args and Returns sections for all functions
- Moved UMAP import check to the beginning of
_tsp_greedy_solvefor earlier failure detection - Removed debug print statements and unnecessary code (e.g.,
p = pno-op)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """Calculate the length of a path based on a list of coordinates | ||
|
|
||
| Args: | ||
| data (np.array): Array of shape `(N, 2)` containing a list of coordinates |
There was a problem hiding this comment.
The type annotation in the Args section uses np.array, but the function signature uses np.ndarray. These should be consistent. Use np.ndarray in the docstring to match the type hint.
| data (np.array): Array of shape `(N, 2)` containing a list of coordinates | |
| data (np.ndarray): Array of shape `(N, 2)` containing a list of coordinates |
| data (np.array): Array of shape `(N, 2)` containing a list of coordinates | ||
|
|
||
| Returns: | ||
| Length of path |
There was a problem hiding this comment.
The Returns section is incomplete. It should specify the return type more formally, such as "float: The total length of the path".
| Length of path | |
| float: The total length of the path. |
| p: Iterations to use in constructing the Hilbert curve. | ||
|
|
||
| Returns: | ||
| Ordered indices of data of the shape `(N,)` according to their position along the Hilbert curve. |
There was a problem hiding this comment.
The Returns section format is inconsistent with the other functions. The return type and description should be on the same line or follow a consistent pattern. Consider formatting it as "np.ndarray: Ordered indices of data..." to match documentation conventions.
| Ordered indices of data of the shape `(N,)` according to their position along the Hilbert curve. | |
| np.ndarray: Ordered indices of data of the shape `(N,)` according to their position along the Hilbert curve. |
| Returns: | ||
| `return_sorted=True`: Array of sorted nodes | ||
| `return_sorted=False`: Ordered indices of nodes |
There was a problem hiding this comment.
The Returns section format is inconsistent. It describes two different return cases using the parameter name (return_sorted) rather than clearly indicating the return type. Consider reformatting this as:
"Returns:
np.ndarray: When return_sorted is False, returns ordered indices of nodes.
When return_sorted is True, returns array of sorted nodes."
| T: First element in `choices` that is not in `used` | ||
| None: If all nearest neighbors have been visited, or `-1` in choices. |
There was a problem hiding this comment.
The Returns section of the docstring lists two separate items (T and None) on different lines, which doesn't align with typical docstring conventions. Consider reformatting as:
"Returns:
T | None: First element in choices that is not in used, or None if all nearest neighbors have been visited or if -1 is encountered in choices."
| T: First element in `choices` that is not in `used` | |
| None: If all nearest neighbors have been visited, or `-1` in choices. | |
| T | None: First element in `choices` that is not in `used`, or None if all nearest neighbors have been visited or if `-1` is encountered in choices. |
| k: K-Nearest neighbors selection | ||
|
|
||
| Returns: | ||
| Ordered indices of data of the shape `(N,)`. |
There was a problem hiding this comment.
The Returns section states it returns "Ordered indices of data of the shape (N,)", but examining the function implementation shows it actually returns the sorted data array (coordinates), not indices. At line 175, it concatenates data[nodes] and the recursive result, which are coordinate arrays. This docstring should be corrected to indicate it returns sorted coordinates, not indices.
| Ordered indices of data of the shape `(N,)`. | |
| np.ndarray: Array of shape `(N, 2)` containing the input coordinates ordered | |
| along the greedy TSP path. |
| # TODO: Add docstrings | ||
| # return the first element not present in a list | ||
| def _get_closest(used, choices, world_size): | ||
| def _get_closest(used: list[T], choices: list[T], world_size: Any) -> Union[T, None]: |
There was a problem hiding this comment.
The type hint for the choices parameter indicates list[T], but the function is called with a numpy array (see line 160 where knn_index[current_node] is passed). While the function works due to duck typing, the type hint should be more accurate. Consider using list[T] | np.ndarray or a more general sequence type to reflect actual usage.
| def _get_closest(used: list[T], choices: list[T], world_size: Any) -> Union[T, None]: | |
| def _get_closest(used: list[T], choices: Union[list[T], np.ndarray], world_size: Any) -> Union[T, None]: |
| def tsp_greedy_solve(node_list, k=100, return_sorted=False): | ||
| def tsp_greedy_solve( | ||
| node_list: np.ndarray, k: int = 100, return_sorted: bool = False | ||
| ) -> Union[np.ndarray, tuple[np.ndarray, np.ndarray]]: |
There was a problem hiding this comment.
The return type annotation indicates tuple[np.ndarray, np.ndarray] as one possibility, but the function never returns a tuple. When return_sorted=False, it returns the result of _get_nodes which is a list[int]. The return type should be np.ndarray | list[int] instead.
| ) -> Union[np.ndarray, tuple[np.ndarray, np.ndarray]]: | |
| ) -> Union[np.ndarray, list[int]]: |
| typically yields a shorter traversal path than a random ordering, but does | ||
| not guarantee an optimal Traveling Salesperson solution. | ||
|
|
||
|
|
There was a problem hiding this comment.
There's an extra blank line in the docstring. Remove one of the blank lines to maintain consistent formatting with other docstrings in the file.
| # TODO: Add type hints | ||
| # TODO: Add docstrings |
There was a problem hiding this comment.
The TODOs indicate that type hints have been added, but the TODOs are still present in the code. These TODO comments should be removed since the tasks have been completed.
| # TODO: Add type hints | |
| # TODO: Add docstrings |
| """Calculate the length of a path based on a list of coordinates | ||
|
|
||
| Args: | ||
| data (np.array): Array of shape `(N, 2)` containing a list of coordinates |
There was a problem hiding this comment.
Lukas Heumos gave me the best practice advice that if type hints are used in the function declaration they should not also be included in the docstrings as this is redundant information that makes it harder to read and also to maintain up to date.
| def tsp_greedy_solve( | ||
| node_list: np.ndarray, k: int = 100, return_sorted: bool = False | ||
| ) -> Union[np.ndarray, tuple[np.ndarray, np.ndarray]]: | ||
| """Find an approximation of the closest path through a list of coordinates |
There was a problem hiding this comment.
| """Find an approximation of the closest path through a list of coordinates | |
| """Find an approximation of the shortest path through a list of coordinates |
There was a problem hiding this comment.
I think shortest is more accurate than closest. I'm not sure why that was used in the docs in the first place.
Add docstrings to path optimization methods.