-
Notifications
You must be signed in to change notification settings - Fork 579
feat: Add connector-ops skill and SharePoint/OneDrive connector #1722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Important Review skippedReview was skipped due to path filters ⛔ Files ignored due to path filters (1)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including You can disable this status message by setting the
WalkthroughAdds a connector-ops skill (documentation, templates, scripts), a new SharePoint/OneDrive fsspec filesystem connector with JSON schema and tests, frontend Microsoft OAuth support, backend Azure AD tenant auth config and connector OAuth handling tweaks, plus dependency, sample env, and minor fixes. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant SharePointFS as SharePointFS (Connector)
participant Graph as Graph API (Office365)
participant Drive as Drive/Library
participant fsspec as fsspec (File API)
Client->>SharePointFS: get_fsspec_fs()
Note over SharePointFS: lazy, fork-safe init
SharePointFS->>Graph: obtain authenticated context (client creds or OAuth token)
Graph-->>SharePointFS: authenticated client
SharePointFS->>Graph: resolve drive/site
Graph-->>SharePointFS: drive reference
SharePointFS-->>Client: SharePointFileSystem instance
Client->>fsspec: read_bytes(path)
fsspec->>SharePointFS: _open(path) / _download_file
SharePointFS->>Graph: fetch item metadata / bytes (range)
Graph-->>SharePointFS: metadata / bytes
SharePointFS-->>fsspec: SharePointFile (provides bytes)
fsspec-->>Client: file content
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🧹 Nitpick comments (9)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/static/json_schema.json (1)
5-8: Consider stricter validation for authentication requirements.Currently only
connectorNameandclient_idare required, but a valid SharePoint connection also needs authentication credentials. The schema should enforce that users provide either:
client_secret(for app-only/client credentials flow, which also requirestenant_id), ORaccess_token(for user-delegated flow)Without this validation, users could create a connector missing required auth fields.
🔎 Suggested approach using oneOf
{ "title": "SharePoint / OneDrive", "type": "object", "allOf": [ { "required": ["connectorName", "client_id"], "properties": { ... } }, { "oneOf": [ { "title": "App-Only (Client Credentials)", "required": ["tenant_id", "client_secret"], "properties": { ... } }, { "title": "User-Delegated (OAuth)", "required": ["access_token"], "properties": { ... } } ] } ] }.claude/skills/connector-ops/assets/templates/test_mock_template.py (1)
208-231: Consider documenting the__new__pattern usage.Using
__new__to create uninitialized instances for testingsql_to_db_mappingis valid but unusual. Consider adding a comment explaining why this pattern is used (to test the method without requiring a full connection configuration).🔎 Suggested documentation
def test_sql_to_db_mapping_string(self): """Test type mapping for strings.""" + # Use __new__ to create instance without calling __init__ + # since sql_to_db_mapping doesn't require connection state connector = {ClassName}.__new__({ClassName}) result = connector.sql_to_db_mapping("test") self.assertIsInstance(result, str).claude/skills/connector-ops/assets/templates/test_integration_template.py (1)
91-133: SQL f-strings are safe here but pattern could mislead.The f-string SQL queries using
test_tableare safe since the variable is hardcoded in the test. However, this pattern could mislead developers adapting the template to use user input in SQL, creating injection vulnerabilities. Consider adding a comment noting this is safe only because the table name is controlled.🔎 Suggested documentation
def test_table_operations(self): """Test create, insert, select, drop operations.""" connector = {ClassName}(self.config) engine = connector.get_engine() + # Note: Using f-string for table name is safe here because test_table + # is a hardcoded constant. Never use f-strings with user input in SQL. test_table = "_unstract_integration_test".claude/skills/connector-ops/assets/templates/queue_template.py (1)
107-116: Redundantping()call intest_credentials.
get_engine()already callsping()at line 108 to test the connection. Callingping()again intest_credentials()at line 130 is redundant.🔎 Proposed simplification
def test_credentials(self) -> bool: """ Test queue credentials. Returns: True if connection successful Raises: ConnectorError: If connection fails """ try: - client = self.get_engine() - client.ping() + self.get_engine() # Already pings during connection return True except Exception as e: raise ConnectorError( f"Connection test failed: {str(e)}", treat_as_user_message=True ) from eAlso applies to: 128-136
.claude/skills/connector-ops/scripts/verify_connector.py (1)
247-256: Path resolution fallback may silently use an incorrect directory.When the script cannot find
unstract/connectorsrelative to itself, it falls back toPath.cwd(). Ifsrc/unstract/connectorsdoesn't exist there either, it exits. However, if it does exist but is a different project, the script would proceed with the wrong codebase.🔎 Proposed improvement
# Find base path (unstract/connectors) script_dir = Path(__file__).parent base_path = script_dir.parent.parent.parent.parent / "unstract/connectors" if not base_path.exists(): # Try relative to current working directory base_path = Path.cwd() if not (base_path / "src/unstract/connectors").exists(): - print("Could not find connectors base path") + print("Could not find connectors base path.") + print("Run this script from the unstract/connectors directory or ensure") + print("the repository structure is correct.") sys.exit(1) + print(f"Warning: Using CWD as base path: {base_path}").claude/skills/connector-ops/references/test_patterns.md (1)
7-23: Add language specifier to fenced code block.The fenced code block showing directory structure should have a language specifier for consistent formatting.
🔎 Proposed fix
-``` +```text tests/ ├── __init__.py ├── test_connectorkit.py # Registry testsunstract/connectors/tests/filesystems/test_sharepoint_fs.py (1)
57-65: Consider testing through public interfaces.Lines 63-65 access internal attributes (
_site_url,_tenant_id,_client_id) to verify initialization. While acceptable for unit tests, consider whether testing through public behavior would provide better encapsulation.unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py (2)
51-53: Document the upload stub method.The
finalparameter is unused because uploads are handled bySharePointFileSystem.write_bytes(). Consider adding a docstring to clarify this is a required interface method.Suggested documentation
def _upload_chunk(self, final: bool = False) -> bool: - """Upload is handled by write_bytes.""" + """Upload is handled by write_bytes. + + This method is required by AbstractBufferedFile interface but not used + since SharePointFileSystem.write_bytes() handles all uploads directly. + + Args: + final: Unused, required by interface + + Returns: + Always True + """ return True
195-197: Uselogging.exceptionfor better error context.Replace
logging.errorwithlogging.exceptionto automatically include the traceback, which will aid debugging.Proposed fix
return results except Exception as e: - logger.error(f"Error listing path {path}: {e}") + logger.exception(f"Error listing path {path}: {e}") raise
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (3)
backend/uv.lockis excluded by!**/*.lockfrontend/public/icons/connector-icons/SharePoint.pngis excluded by!**/*.pngunstract/connectors/uv.lockis excluded by!**/*.lock
📒 Files selected for processing (18)
.claude/skills/connector-ops/SKILL.md.claude/skills/connector-ops/assets/templates/database_template.py.claude/skills/connector-ops/assets/templates/filesystem_template.py.claude/skills/connector-ops/assets/templates/init_template.py.claude/skills/connector-ops/assets/templates/json_schema_template.json.claude/skills/connector-ops/assets/templates/queue_template.py.claude/skills/connector-ops/assets/templates/test_integration_template.py.claude/skills/connector-ops/assets/templates/test_mock_template.py.claude/skills/connector-ops/references/connector_patterns.md.claude/skills/connector-ops/references/json_schema_examples.md.claude/skills/connector-ops/references/test_patterns.md.claude/skills/connector-ops/scripts/fetch_logo.py.claude/skills/connector-ops/scripts/verify_connector.pyunstract/connectors/pyproject.tomlunstract/connectors/src/unstract/connectors/filesystems/sharepoint/__init__.pyunstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.pyunstract/connectors/src/unstract/connectors/filesystems/sharepoint/static/json_schema.jsonunstract/connectors/tests/filesystems/test_sharepoint_fs.py
🧰 Additional context used
🧬 Code graph analysis (7)
.claude/skills/connector-ops/assets/templates/test_integration_template.py (3)
.claude/skills/connector-ops/assets/templates/database_template.py (4)
test_credentials(132-153)get_engine(87-130)execute(155-175)sql_to_db_mapping(177-204).claude/skills/connector-ops/assets/templates/filesystem_template.py (1)
test_credentials(123-142).claude/skills/connector-ops/assets/templates/queue_template.py (2)
test_credentials(118-136)get_engine(85-116)
unstract/connectors/tests/filesystems/test_sharepoint_fs.py (1)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py (15)
SharePointFS(386-603)get_id(422-423)get_name(426-427)get_description(430-431)can_read(452-453)can_write(448-449)requires_oauth(456-457)python_social_auth_backend(460-461)get_json_schema(438-445)get_icon(434-435)is_dir_by_metadata(546-555)extract_metadata_file_hash(519-544)extract_modified_date(557-592)get_connector_root_dir(595-603)ls(180-197)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/__init__.py (1)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py (1)
SharePointFS(386-603)
.claude/skills/connector-ops/scripts/verify_connector.py (4)
unstract/connectors/src/unstract/connectors/connectorkit.py (2)
connectors(22-23)get_connectors_list(74-113).claude/skills/connector-ops/assets/templates/database_template.py (1)
get_id(46-47).claude/skills/connector-ops/assets/templates/filesystem_template.py (1)
get_id(51-52).claude/skills/connector-ops/assets/templates/queue_template.py (1)
get_id(44-45)
.claude/skills/connector-ops/assets/templates/queue_template.py (3)
unstract/connectors/src/unstract/connectors/queues/unstract_queue.py (1)
lindex(90-91)unstract/connectors/src/unstract/connectors/exceptions.py (1)
ConnectorError(18-35)unstract/core/src/unstract/core/cache/redis_queue_client.py (2)
lpush(64-74)brpop(133-147)
.claude/skills/connector-ops/assets/templates/filesystem_template.py (2)
unstract/connectors/src/unstract/connectors/filesystems/unstract_file_system.py (1)
UnstractFileSystem(17-353)unstract/connectors/src/unstract/connectors/exceptions.py (1)
ConnectorError(18-35)
.claude/skills/connector-ops/assets/templates/database_template.py (2)
unstract/connectors/src/unstract/connectors/databases/unstract_db.py (1)
UnstractDB(16-340)unstract/connectors/src/unstract/connectors/exceptions.py (1)
ConnectorError(18-35)
🪛 Checkov (3.2.334)
.claude/skills/connector-ops/assets/templates/json_schema_template.json
[medium] 24-25: Basic Auth Credentials
(CKV_SECRET_4)
🪛 markdownlint-cli2 (0.18.1)
.claude/skills/connector-ops/references/test_patterns.md
7-7: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.10)
.claude/skills/connector-ops/assets/templates/test_integration_template.py
17-17: Expected an identifier
(invalid-syntax)
17-17: Expected an identifier
(invalid-syntax)
17-17: Expected an identifier
(invalid-syntax)
17-17: Simple statements must be separated by newlines or semicolons
(invalid-syntax)
17-17: Expected one or more symbol names after import
(invalid-syntax)
25-25: Expected :, found {
(invalid-syntax)
25-25: Simple statements must be separated by newlines or semicolons
(invalid-syntax)
25-26: Expected an expression
(invalid-syntax)
26-26: Unexpected indentation
(invalid-syntax)
218-218: Expected a statement
(invalid-syntax)
218-218: Expected :, found {
(invalid-syntax)
218-218: Simple statements must be separated by newlines or semicolons
(invalid-syntax)
218-219: Expected an expression
(invalid-syntax)
219-219: Unexpected indentation
(invalid-syntax)
244-244: Expected a statement
(invalid-syntax)
.claude/skills/connector-ops/assets/templates/init_template.py
10-10: Expected import, found {
(invalid-syntax)
10-10: Simple statements must be separated by newlines or semicolons
(invalid-syntax)
10-10: Expected one or more symbol names after import
(invalid-syntax)
.claude/skills/connector-ops/assets/templates/test_mock_template.py
15-15: Expected an identifier
(invalid-syntax)
15-15: Expected an identifier
(invalid-syntax)
15-15: Expected an identifier
(invalid-syntax)
15-15: Simple statements must be separated by newlines or semicolons
(invalid-syntax)
15-15: Expected one or more symbol names after import
(invalid-syntax)
19-19: Expected :, found {
(invalid-syntax)
19-20: Expected an expression
(invalid-syntax)
20-20: Unexpected indentation
(invalid-syntax)
233-233: Expected a statement
(invalid-syntax)
.claude/skills/connector-ops/scripts/fetch_logo.py
43-43: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
44-44: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
69-69: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
70-70: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
98-98: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
99-99: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
126-126: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
127-127: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
153-153: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
154-154: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.
(S310)
unstract/connectors/tests/filesystems/test_sharepoint_fs.py
252-252: Do not catch blind exception: Exception
(BLE001)
.claude/skills/connector-ops/scripts/verify_connector.py
134-134: Do not catch blind exception: Exception
(BLE001)
137-137: Do not catch blind exception: Exception
(BLE001)
172-172: Do not catch blind exception: Exception
(BLE001)
192-192: subprocess call: check for execution of untrusted input
(S603)
216-216: subprocess call: check for execution of untrusted input
(S603)
.claude/skills/connector-ops/assets/templates/queue_template.py
21-21: Expected an identifier
(invalid-syntax)
21-22: Expected an expression
(invalid-syntax)
22-22: Unexpected indentation
(invalid-syntax)
96-96: Expected one or more symbol names after import
(invalid-syntax)
99-99: Expected an identifier
(invalid-syntax)
.claude/skills/connector-ops/assets/templates/filesystem_template.py
25-25: Expected an identifier
(invalid-syntax)
25-26: Expected an expression
(invalid-syntax)
26-26: Unexpected indentation
(invalid-syntax)
105-105: Expected a module name
(invalid-syntax)
105-105: Simple statements must be separated by newlines or semicolons
(invalid-syntax)
105-105: Expected one or more symbol names after import
(invalid-syntax)
.claude/skills/connector-ops/assets/templates/database_template.py
21-21: Expected an identifier
(invalid-syntax)
21-22: Expected an expression
(invalid-syntax)
22-22: Unexpected indentation
(invalid-syntax)
95-95: Expected one or more symbol names after import
(invalid-syntax)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py
51-51: Unused method argument: final
(ARG002)
106-106: Local variable credentials is assigned to but never used
Remove assignment to unused variable credentials
(F841)
125-128: Avoid specifying long messages outside the exception class
(TRY003)
180-180: Unused method argument: kwargs
(ARG002)
194-194: Consider moving this statement to an else block
(TRY300)
196-196: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
242-242: Unused method argument: kwargs
(ARG002)
253-253: Unused method argument: kwargs
(ARG002)
257-257: Consider moving this statement to an else block
(TRY300)
258-258: Do not catch blind exception: Exception
(BLE001)
266-266: Do not catch blind exception: Exception
(BLE001)
274-274: Do not catch blind exception: Exception
(BLE001)
277-277: Unused method argument: kwargs
(ARG002)
326-326: Unused method argument: kwargs
(ARG002)
334-334: Unused method argument: kwargs
(ARG002)
344-344: Unused method argument: recursive
(ARG002)
344-344: Unused method argument: kwargs
(ARG002)
366-366: Do not catch blind exception: Exception
(BLE001)
499-502: Avoid specifying long messages outside the exception class
(TRY003)
500-500: Use explicit conversion flag
Replace with conversion flag
(RUF010)
512-512: Consider moving this statement to an else block
(TRY300)
514-517: Avoid specifying long messages outside the exception class
(TRY003)
515-515: Use explicit conversion flag
Replace with conversion flag
(RUF010)
534-534: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
595-595: Unused static method argument: kwargs
(ARG004)
🔇 Additional comments (24)
.claude/skills/connector-ops/SKILL.md (2)
1-11: Well-structured skill documentation.The skill document provides comprehensive guidance for connector lifecycle management with clear sections for ADD, REMOVE, and MODIFY operations. The architecture overview and key files table are helpful for understanding the connector ecosystem.
459-467: Helpful operational notes.The important notes section covers critical concerns like fork safety for gRPC, UUID consistency, schema backwards compatibility, and icon naming conventions. These are valuable guardrails for connector development.
.claude/skills/connector-ops/assets/templates/json_schema_template.json (1)
1-68: Good JSON Schema template structure.The template correctly uses
allOfwithoneOfto offer mutually exclusive connection methods (URL vs parameters). Theformat: "password"is appropriately applied to sensitive fields. Placeholders ({display_name},{protocol},{default_port}) are clearly marked for substitution.The Checkov CKV_SECRET_4 warning about line 24-25 is a false positive—the description merely documents the URL format pattern, not actual credentials.
.claude/skills/connector-ops/assets/templates/init_template.py (1)
1-18: Template structure is correct.The template provides a clear pattern for connector
__init__.pyfiles with appropriate placeholders. The docstring documents all substitution points. The Ruff syntax errors are expected false positives since{connector_name}and{ClassName}are intentional placeholders, not valid Python identifiers..claude/skills/connector-ops/references/json_schema_examples.md (1)
590-609: Helpful field format and type reference tables.The reference tables at the end provide a quick lookup for UI rendering formats and JSON Schema types. This is valuable for developers creating new connector schemas.
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/static/json_schema.json (1)
16-20: Good handling of personal vs business account distinction.The
is_personalboolean with clear description helps users understand the difference between OneDrive Personal and SharePoint/OneDrive for Business configurations..claude/skills/connector-ops/assets/templates/test_mock_template.py (1)
1-18: Comprehensive mock test template.The template provides good coverage of connector functionality including static methods, initialization with both config styles, connection handling, credential validation, and query execution. The placeholder documentation is clear.
The Ruff syntax errors are expected false positives since the placeholders are intentional.
.claude/skills/connector-ops/assets/templates/test_integration_template.py (1)
1-25: Well-designed integration test template.The template uses
@unittest.skipUnlesswith environment variable checks appropriately, allowing tests to be skipped in CI while running locally when services are available. The environment variable documentation in the class docstring is helpful..claude/skills/connector-ops/scripts/fetch_logo.py (2)
26-28: Good normalization function.The
normalize_namefunction provides consistent name handling across all logo sources by removing non-alphanumeric characters and lowercasing.
168-208: Well-structured main function with graceful degradation.The script tries multiple sources in priority order and provides helpful guidance when no logo is found. The exit codes (0 for success, 1 for failure) are appropriate.
The Ruff S310 warnings about URL scheme auditing are low risk since all URLs are constructed from hardcoded
https://base URLs combined with normalized service names. This is a developer utility script, not production code handling untrusted input..claude/skills/connector-ops/scripts/verify_connector.py (3)
28-54: LGTM!The directory structure validation is thorough, checking for the connector directory and all required files including the JSON schema.
57-92: LGTM!The metadata validation correctly checks for required fields and ensures
is_activeisTruefor the connector to be registered.
95-140: LGTM!The connector class validation properly checks for required static methods and attempts to invoke them, which will catch common implementation errors early.
.claude/skills/connector-ops/assets/templates/filesystem_template.py (1)
92-121: LGTM!The
get_fsspec_fsimplementation correctly uses lazy initialization with double-checked locking for thread safety and fork safety (importing inside the method). This follows the patterns documented inconnector_patterns.md..claude/skills/connector-ops/assets/templates/database_template.py (1)
87-131: LGTM!The
get_engineimplementation correctly uses lazy imports for fork safety and handles both connection URL and individual parameter modes. The SSL configuration is properly guarded.unstract/connectors/src/unstract/connectors/filesystems/sharepoint/__init__.py (1)
1-11: LGTM!The module correctly exports
SharePointFSand defines the requiredmetadatadictionary with all fields needed for Connectorkit registration (name,version,connector,description,is_active)..claude/skills/connector-ops/references/test_patterns.md (1)
27-173: LGTM!The mock-based test template is comprehensive, covering static methods, ID format validation, JSON schema validation, initialization modes, engine retrieval, credential testing, query execution, and type mapping—all essential areas for connector testing.
.claude/skills/connector-ops/references/connector_patterns.md (1)
1-376: LGTM!Excellent reference documentation covering essential patterns (fork-safe initialization, connection URL vs parameters, SSL/TLS, credential testing) and important anti-patterns (module-level heavy imports, hardcoded credentials, changing connector IDs). The code examples are practical and follow established best practices.
unstract/connectors/tests/filesystems/test_sharepoint_fs.py (1)
1-275: Comprehensive test coverage for SharePoint connector.The test suite provides excellent coverage of both unit and integration scenarios:
- Static metadata validation (ID, name, description, capabilities)
- JSON schema completeness
- Multiple initialization modes (client credentials, OAuth, personal OneDrive)
- Metadata extraction helpers (hash, modified date, directory detection)
- Path formatting edge cases
- Integration tests properly gated by environment variables
The bare Exception catch at line 252 is appropriate for gracefully skipping tests when files don't exist.
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py (5)
344-347: Clarify or implement recursive deletion behavior.The
recursiveparameter is accepted but never used in the implementation. Verify whether the SharePoint API automatically handles recursive deletion, or if this needs explicit implementation.If SharePoint automatically handles recursive deletion when deleting a folder, document this behavior:
def rm(self, path: str, recursive: bool = False, **kwargs: Any) -> None: - """Remove file or directory.""" + """Remove file or directory. + + Args: + path: Path to remove + recursive: Ignored - SharePoint API always recursively deletes folders + **kwargs: Additional arguments (unused) + """ item = self._get_item_by_path(path) item.delete_object().execute_query()Otherwise, implement the check:
def rm(self, path: str, recursive: bool = False, **kwargs: Any) -> None: """Remove file or directory.""" + if not recursive and self.isdir(path): + # Check if directory is empty + contents = self.ls(path, detail=False) + if contents: + raise ValueError(f"Directory {path} is not empty. Use recursive=True") + item = self._get_item_by_path(path) item.delete_object().execute_query()
399-420: Clean initialization and settings management.The lazy initialization pattern with proper locking is well-implemented. Settings are extracted and validated upfront, with the actual filesystem created on-demand.
463-504: Fork-safe and thread-safe lazy initialization.The double-checked locking with
_SHAREPOINT_INIT_LOCKproperly ensures thread-safety, and importing the Office365 library within the lock provides fork-safety. The tenant resolution logic for personal vs. business accounts is correct.
519-592: Robust metadata extraction with graceful fallbacks.The metadata extraction methods (
extract_metadata_file_hash,is_dir_by_metadata,extract_modified_date) handle multiple field names and data types with appropriate fallbacks:
- Hash extraction tries multiple fields (quickXorHash, sha256Hash, cTag, eTag) with proper eTag cleanup
- Date extraction handles both datetime objects and ISO strings with timezone normalization
- Logging provides helpful debugging context when fields are missing
594-603: Appropriate path normalization for SharePoint.The root directory formatting correctly strips leading slashes and adds a trailing slash, which aligns with SharePoint path conventions.
.claude/skills/connector-ops/assets/templates/test_integration.py.template
Show resolved
Hide resolved
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py
Show resolved
Hide resolved
chore: remove claude artifacts from gitignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (4)
.claude/skills/connector-ops/assets/templates/filesystem.py.template (1)
144-190: LGTM for existing methods; missingextract_modified_datealready flagged.The
extract_metadata_file_hashandis_dir_by_metadatamethods handle multiple storage backend patterns appropriately. The missingextract_modified_dateabstract method was already identified in a previous review..claude/skills/connector-ops/assets/templates/database.py.template (1)
177-204: Previous concern about missing abstract methods may still apply.As noted in an earlier review, this template may be missing required abstract methods (
prepare_multi_column_migrationandget_create_table_base_query) from theUnstractDBbase class. If these are indeed required, connectors generated from this template will fail at instantiation.Verify whether these methods are abstract in
UnstractDB:#!/bin/bash # Check if prepare_multi_column_migration and get_create_table_base_query are abstract methods rg -nP -A3 '@(abc\.)?abstractmethod' unstract/connectors/src/unstract/connectors/databases/unstract_db.py | rg -A3 '(prepare_multi_column_migration|get_create_table_base_query)'.claude/skills/connector-ops/assets/templates/queue.py.template (1)
1-199: Previous concern about missing abstract methods may still apply.As noted in an earlier review, this template may be missing several required abstract methods from
UnstractQueue:lset,llen,lindex,keys,lrange, anddequeue_batch. If these are indeed abstract requirements, connectors generated from this template will fail at instantiation.Verify whether these methods are abstract in
UnstractQueue:#!/bin/bash # Check if the mentioned methods are abstract in UnstractQueue base class rg -nP -A3 '@(abc\.)?abstractmethod' unstract/connectors/src/unstract/connectors/queues/unstract_queue.py | rg -A3 '(lset|llen|lindex|keys|lrange|dequeue_batch)'.claude/skills/connector-ops/SKILL.md (1)
274-279: Nested code block syntax issue remains unresolved.As noted in a previous review, the bash code block nested inside the report template will cause malformed markdown rendering.
🧹 Nitpick comments (2)
.claude/skills/connector-ops/assets/templates/filesystem.py.template (1)
66-74: Specify explicit encoding when opening files.The
open()call should specifyencoding="utf-8"to avoid potential issues on systems with different default encodings, and it's a best practice for reading JSON files.🔎 Proposed fix
@staticmethod def get_json_schema() -> str: schema_path = os.path.join( os.path.dirname(__file__), "static", "json_schema.json" ) - with open(schema_path, "r") as f: + with open(schema_path, "r", encoding="utf-8") as f: return f.read().claude/skills/connector-ops/SKILL.md (1)
32-40: Add language identifiers to fenced code blocks for better rendering.Several fenced code blocks throughout the document lack language identifiers (
bash,text, ormarkdown), which can affect syntax highlighting and rendering in some viewers.🔎 Examples of affected locations
- Line 32: Directory structure block (use
textorbash)- Line 254: Report template block (use
textormarkdown)- Line 280: Closing fence of report template
- Line 333: Report template for REMOVE operation (use
textormarkdown)- Line 416: Report template for MODIFY operation (use
textormarkdown)Also applies to: 254-256, 280-281, 333-335, 416-417
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (8)
.claude/skills/connector-ops/SKILL.md.claude/skills/connector-ops/assets/templates/database.py.template.claude/skills/connector-ops/assets/templates/filesystem.py.template.claude/skills/connector-ops/assets/templates/init.py.template.claude/skills/connector-ops/assets/templates/queue.py.template.claude/skills/connector-ops/assets/templates/test_integration.py.template.claude/skills/connector-ops/assets/templates/test_mock.py.template.gitignore
💤 Files with no reviewable changes (1)
- .gitignore
✅ Files skipped from review due to trivial changes (1)
- .claude/skills/connector-ops/assets/templates/test_integration.py.template
🧰 Additional context used
🪛 Checkov (3.2.334)
.claude/skills/connector-ops/assets/templates/test_mock.py.template
[medium] 33-34: Basic Auth Credentials
(CKV_SECRET_4)
🪛 markdownlint-cli2 (0.18.1)
.claude/skills/connector-ops/SKILL.md
32-32: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
254-254: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
280-280: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
333-333: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
416-416: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (8)
.claude/skills/connector-ops/assets/templates/filesystem.py.template (4)
1-23: LGTM!The docstring clearly documents all placeholders, and the imports are appropriate for the template's functionality.
32-48: Good pattern: Lazy initialization for fork safety.The approach of storing settings in
__init__without initializing clients, combined with the thread lock for lazy initialization, correctly avoids gRPC/Celery fork issues.
92-121: LGTM!The double-checked locking pattern with thread safety is correctly implemented, and the dynamic import inside the method properly handles fork safety for Celery workers.
123-142: LGTM!The credential test approach of listing the bucket/root is a straightforward and effective way to validate access.
.claude/skills/connector-ops/assets/templates/init.py.template (1)
1-18: LGTM! Clean and consistent template structure.The template provides a clear, minimal structure for connector initialization modules with proper metadata exposure. The placeholder documentation is helpful for template users.
.claude/skills/connector-ops/assets/templates/database.py.template (1)
87-130: Good fork-safe connection pattern.The lazy import and comprehensive error handling with
ConnectorErrorprovide robust connection management. The support for both connection URL and individual parameter modes offers flexibility..claude/skills/connector-ops/assets/templates/test_mock.py.template (1)
1-234: Comprehensive test template with proper mocking patterns.The template provides excellent coverage across all connector capabilities: static metadata, initialization modes, connection handling, credential validation, query execution, and type mapping. The use of
@patchdecorators and mock objects follows best practices for unit testing..claude/skills/connector-ops/assets/templates/queue.py.template (1)
85-116: Good connection management with client pooling.The lazy initialization pattern with
_clientcaching and fork-safe imports provides efficient connection reuse. The test ping before returning the client ensures connection validity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py (3)
188-190: Uselogging.exceptionfor better error context.When logging before re-raising an exception, use
logging.exceptioninstead oflogging.errorto automatically include the stack trace.🔎 Proposed fix
except Exception as e: - logger.error(f"Error listing path {path}: {e}") + logger.exception(f"Error listing path {path}: {e}") raise
337-340: Consider validatingrecursiveparameter.The
recursiveparameter is accepted but not validated. Ifrecursive=Falseis passed and the path is a non-empty directory, the delete may fail or unexpectedly succeed depending on the SharePoint API behavior. Consider validating this parameter or documenting the behavior.
526-532: Minor optimization: Usedict.getfor cleaner key checking.The conditional
if field in metadata and metadata[field]:can be simplified usingdict.get.🔎 Proposed fix
for field in hash_fields: - if field in metadata and metadata[field]: - value = metadata[field] + if value := metadata.get(field): if isinstance(value, str): # Remove quotes and version info from eTags return value.strip('"').split(",")[0] return str(value)
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to Reviews > Disable Cache setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (1)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py
🧰 Additional context used
🧬 Code graph analysis (1)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py (2)
unstract/connectors/src/unstract/connectors/exceptions.py (1)
ConnectorError(18-35)unstract/connectors/src/unstract/connectors/filesystems/unstract_file_system.py (1)
UnstractFileSystem(17-353)
🪛 Ruff (0.14.10)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py
51-51: Unused method argument: final
(ARG002)
118-121: Avoid specifying long messages outside the exception class
(TRY003)
173-173: Unused method argument: kwargs
(ARG002)
187-187: Consider moving this statement to an else block
(TRY300)
189-189: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
235-235: Unused method argument: kwargs
(ARG002)
246-246: Unused method argument: kwargs
(ARG002)
250-250: Consider moving this statement to an else block
(TRY300)
251-251: Do not catch blind exception: Exception
(BLE001)
259-259: Do not catch blind exception: Exception
(BLE001)
267-267: Do not catch blind exception: Exception
(BLE001)
270-270: Unused method argument: kwargs
(ARG002)
319-319: Unused method argument: kwargs
(ARG002)
327-327: Unused method argument: kwargs
(ARG002)
337-337: Unused method argument: recursive
(ARG002)
337-337: Unused method argument: kwargs
(ARG002)
359-359: Do not catch blind exception: Exception
(BLE001)
492-495: Avoid specifying long messages outside the exception class
(TRY003)
493-493: Use explicit conversion flag
Replace with conversion flag
(RUF010)
505-505: Consider moving this statement to an else block
(TRY300)
507-510: Avoid specifying long messages outside the exception class
(TRY003)
508-508: Use explicit conversion flag
Replace with conversion flag
(RUF010)
527-527: Unnecessary key check before dictionary access
Replace with dict.get
(RUF019)
588-588: Unused static method argument: kwargs
(ARG004)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (13)
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py (13)
1-28: LGTM! Clean module structure with appropriate thread safety.The module header, imports, and global initialization lock are well-organized. The thread-safe initialization pattern using
_SHAREPOINT_INIT_LOCKis appropriate for preventing concurrent client creation issues.
51-53: Unusedfinalparameter is acceptable.The
finalparameter is unused but likely required by theAbstractBufferedFileinterface. This is acceptable and doesn't need to be changed.
91-125: Good thread-safe initialization pattern.The double-checked locking pattern is correctly implemented for lazy initialization of the SharePoint context. The support for both client credentials and OAuth token flows is well-structured.
153-171: Path normalization logic is correct.The path handling correctly normalizes empty and root paths to the "root" string expected by the SharePoint API.
246-268: Exception handling in existence checks is appropriate.The
exists,isdir, andisfilemethods intentionally catch all exceptions and returnFalse, which is the expected behavior for these predicate methods. The unusedkwargsparameters are likely required by the fsspec interface.
270-302: Directory creation logic is correct.The
mkdirmethod properly handles parent directory creation whencreate_parents=True, and_create_foldercorrectly uses the Office365 API to create folders.
304-335: File operations implemented correctly.The file read/write operations properly interface with the SharePoint API. Unused
kwargsparameters are acceptable as they're likely required by the fsspec interface.
346-376: Walk implementation is correct.The recursive directory traversal properly handles
maxdepthand gracefully skips inaccessible directories by catching exceptions. This is the expected behavior for a walk operation.
414-454: Connector metadata methods are well-defined.The static methods provide appropriate metadata for the connector. The clarification that
requires_oauthreturnsFalsebecause both OAuth and client credentials are supported is helpful.
464-497: Lazy initialization with proper thread safety.The double-checked locking with the global
_SHAREPOINT_INIT_LOCKensures thread-safe initialization. The logic for determining the tenant ("consumers" for personal, tenant_id or "common" for business) is correct.
499-510: Credential testing approach is sound.Testing credentials by attempting to list the root directory is an effective way to verify both authentication and basic access permissions.
550-585: Robust date extraction with proper timezone handling.The
extract_modified_datemethod correctly tries multiple date fields and ensures all returned datetimes are timezone-aware and converted to UTC. The handling of ISO format with "Z" suffix is appropriate.
587-596: Root directory normalization is correct for SharePoint.The implementation correctly removes leading slashes and adds trailing slashes, which aligns with SharePoint path conventions. This differs from other filesystems and is appropriate.
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py
Show resolved
Hide resolved
gaya3-zipstack
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
…e data extraction (#1751) * changes for sharepoint * changes for sharepoint * changes for sharepoint * final changes * removing print * update readme * UI change * uv lock * Pr review ui * Pr review logs * Pr review BE * updating skill
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
frontend/src/components/input-output/add-source-modal/AddSourceModal.jsx (1)
57-69: Inconsistent metadata reset values and missing timeout cleanup.Two concerns with this effect:
Inconsistent reset values:
setMetadata({})here (line 64), butsetMetadata(null)inhandleBack(line 155) andonCancel(line 182). This inconsistency could cause subtle bugs if any code path checks metadata differently (e.g.,if (metadata)vsif (Object.keys(metadata).length)).Missing cleanup: The
setTimeoutisn't cleared if the component unmounts oropenchanges rapidly, potentially causing state updates on an unmounted component.🔧 Proposed fix
useEffect(() => { + let timeoutId; if (!open) { - setTimeout(() => { + timeoutId = setTimeout(() => { // A delay added in order to avoid glitch in the UI when the modal is closed. setSelectedSourceId(null); setEditItemId(null); // Clear metadata to prevent stale data when adding a new connector - setMetadata({}); + setMetadata(null); }, 500); } getListOfSources(); + return () => { + if (timeoutId) { + clearTimeout(timeoutId); + } + }; }, [open]);Alternatively, if
{}is the intended initial state, updatehandleBackandonCancelto also usesetMetadata({})for consistency.backend/connector_v2/views.py (1)
89-100: Require metadata when OAuth key is absent for OAuth-capable connectors.On Line 92, OAuth-capable connectors can now proceed without
oauth_key. If the request also omitsconnector_metadata,_get_connector_metadatareturnsNone, andperform_createwill persist an invalid connector. Add an explicit guard so OAuth-capable connectors must provide eitheroauth_keyorconnector_metadata.🔧 Proposed fix
- oauth_key = self.request.query_params.get(ConnectorAuthKey.OAUTH_KEY) + oauth_key = self.request.query_params.get(ConnectorAuthKey.OAUTH_KEY) + supports_oauth = ConnectorInstance.supportsOAuth(connector_id=connector_id) - if ConnectorInstance.supportsOAuth(connector_id=connector_id) and oauth_key: + if supports_oauth and oauth_key: connector_metadata = ConnectorAuthHelper.get_oauth_creds_from_cache( cache_key=oauth_key, delete_key=False, # Don't delete yet - wait for successful operation ) if connector_metadata is None: raise MissingParamException(param=ConnectorAuthKey.OAUTH_KEY) else: connector_metadata = self.request.data.get(CIKey.CONNECTOR_METADATA) + if supports_oauth and not connector_metadata: + raise MissingParamException(param=CIKey.CONNECTOR_METADATA)
🤖 Fix all issues with AI agents
In `@backend/connector_processor/connector_processor.py`:
- Around line 139-145: The code calls
ConnectorAuthHelper.get_oauth_creds_from_cache(...) and may assign None to
credentials; update the oauth_key branch in connector_processor.py so that after
calling ConnectorAuthHelper.get_oauth_creds_from_cache(cache_key=oauth_key,
delete_key=False) you check if the returned credentials is None and immediately
raise OAuthTimeOut (or another clear user-facing exception) with a descriptive
message (e.g., "OAuth credentials expired or missing, re-authentication
required"); ensure you reference oauth_key, credentials, and
ConnectorAuthHelper.get_oauth_creds_from_cache in the change and import or use
the existing OAuthTimeOut exception class.
In `@frontend/src/components/input-output/configure-ds/ConfigureDs.jsx`:
- Around line 59-69: isOAuthMethodSelected currently reads component state
(formData) and can be stale when handleTestConnection is given updatedFormData;
change isOAuthMethodSelected to accept a data argument (e.g.,
isOAuthMethodSelected(data)) and use that instead of formData, then update calls
from handleTestConnection (and the other call at lines ~165-169) to pass the
submitted updatedFormData so the OAuth check uses the latest payload rather than
stale state.
In
`@unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py`:
- Around line 103-118: The OAuth validation requires both self._access_token and
self._refresh_token (has_oauth) but _get_context creates a token_provider that
only returns access_token and never uses refresh_token; either implement refresh
logic in _get_context/token_provider to accept and use self._refresh_token
(e.g., call GraphClient with a provider that can refresh tokens when expired) —
referencing _get_context, token_provider, GraphClient, self._access_token and
self._refresh_token — or relax the validation to set has_oauth =
bool(self._access_token) (and update any docs/comments) if refresh is managed
externally; choose one approach and make the corresponding change so validation
and token usage are consistent.
🧹 Nitpick comments (4)
frontend/src/components/input-output/add-source-modal/AddSourceModal.css (1)
1-10: Consider responsiveness and hardcoded values.The fixed 750px height and hardcoded 55px header subtraction work for the current layout but have limitations:
- On smaller viewports (laptops, tablets), 750px may exceed available height
- The 55px header assumption is tied to the current Ant Design modal header styling and could break on library updates
Consider adding a media query fallback or using
max-heightwith viewport units for better cross-device support.♻️ Suggested improvement for responsiveness
/* Prevent modal from shifting when switching between auth methods */ .add-source-modal .ant-modal-content { - height: 750px; + height: 750px; + max-height: 85vh; } .add-source-modal .ant-modal-body { height: calc(750px - 55px); /* Subtract header height */ + max-height: calc(85vh - 55px); overflow-y: auto; overflow-x: hidden; }.claude/skills/connector-ops/SKILL.md (1)
32-40: Missing language specifier on fenced code block.The directory structure code block should have a language specified for consistent rendering. Consider using
textorplaintextas the language identifier.🔧 Suggested fix
-``` +```text connector_name/ ├── __init__.py # Metadata dict with is_active flag.claude/skills/connector-ops/references/test_patterns.md (1)
7-23: Missing language specifier on directory structure code block.The directory structure code block should specify a language for consistent rendering.
🔧 Suggested fix
-``` +```text tests/ ├── __init__.pyunstract/connectors/tests/filesystems/test_sharepoint_fs.py (1)
268-289: Avoid persistent artifacts in integration tests.The integration test writes a fixed folder/file name and never cleans up, which can clutter real SharePoint/OneDrive drives and cause collisions across runs. Consider using a unique folder name and a
finallycleanup that removes the folder after the test.🧹 Example cleanup pattern
+ import uuid # Use simple folder name for easy verification - folder_name = "test-unit" + folder_name = f"test-unit-{uuid.uuid4().hex[:8]}" file_name = "sample.pdf" file_path = f"{folder_name}/{file_name}" @@ - try: + try: # Create folder (this will happen automatically when writing file) logger.info("Creating file: %s", file_path) fs.write_bytes(file_path, pdf_content) logger.info("Successfully wrote file to: %s", file_path) @@ self.assertTrue(fs.exists(file_path)) logger.info("Verified file exists") - - except Exception as e: - self.fail(f"Failed to write file: {e}") + except Exception as e: + self.fail(f"Failed to write file: {e}") + finally: + try: + fs.rm(folder_name, recursive=True) + except Exception: + pass
unstract/connectors/src/unstract/connectors/filesystems/sharepoint/sharepoint.py
Show resolved
Hide resolved
Test ResultsSummary
Runner Tests - Full Report
SDK1 Tests - Full Report
|
|



What
connector-ops)Why
connector-opsskill provides Claude with structured guidance for developing, testing, and maintaining connectors in the Unstract ecosystemHow
Connector-Ops Skill
.claude/skills/connector-ops/SKILL.mdSharePoint Connector
SharePointFSclass extendingConnectorBaseget_fsspec_fs(),test_credentials(),get_connector_root_dir()Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
No - This PR only adds new functionality:
.claude/directory and only affects Claude Code sessionsDatabase Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
office365-rest-python-client>=2.5.0- Added tounstract/connectors/pyproject.tomlNotes on Testing
pytest unstract/connectors/tests/filesystems/test_sharepoint_fs.pyScreenshots
N/A - Backend/skill changes only
Checklist
I have read and understood the Contribution Guidelines.
🤖 Generated with Claude Code