Add dropbox document loader docs
Ismail-Bashir committed Nov 30, 2024
1 parent 9efc17e commit b2a1dda
Showing 1 changed file with 138 additions and 1 deletion.
@@ -1 +1,138 @@
// documentation goes here
---
hide_table_of_contents: true
sidebar_class_name: node-only
---

# Dropbox Loader

The `DropboxLoader` allows you to load documents from Dropbox into your LangChain applications. It retrieves files or directories from your Dropbox account and converts them into documents ready for processing.

## Overview

Dropbox is a file hosting service that brings all your files (traditional documents, cloud content, and web shortcuts) together in one place. With the `DropboxLoader`, you can retrieve those files directly from your projects.

## Setup

1. Create a Dropbox app using the [Dropbox App Console](https://www.dropbox.com/developers/apps/create).
2. Ensure the app has the `files.metadata.read` and `files.content.read` scope permissions.
3. Generate an access token from the Dropbox App Console.
4. Set up Unstructured and make it available at a URL endpoint. It can also be configured to run locally. See the [Unstructured documentation](https://docs.unstructured.io/) for information on how to do that.
5. Install the necessary packages:

```bash npm2yarn
npm install @langchain/community @langchain/core dropbox
```

## Usage

### Loading Specific Files

To load specific files from Dropbox, specify the file paths:

```typescript
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox";

const loader = new DropboxLoader({
  clientOptions: {
    accessToken: "your-dropbox-access-token",
  },
  unstructuredOptions: {
    apiUrl: "http://localhost:8000/general/v0/general", // Replace with your Unstructured API URL
  },
  filePaths: ["/path/to/file1.txt", "/path/to/file2.pdf"], // Replace with file paths on Dropbox
});

const docs = await loader.load();
console.log(docs);
```

### Loading Files from a Directory

To load all files from a specific directory, provide the `folderPath` and set the `mode` to `"directory"`. Set `recursive` to `true` to traverse subdirectories:

```typescript
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox";

const loader = new DropboxLoader({
  clientOptions: {
    accessToken: "your-dropbox-access-token",
  },
  unstructuredOptions: {
    apiUrl: "http://localhost:8000/general/v0/general",
  },
  folderPath: "/path/to/folder",
  recursive: true, // Load documents found in subdirectories
  mode: "directory",
});

const docs = await loader.load();
console.log(docs);
```
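How the loader enumerates a folder internally is not shown on this page. As a rough sketch of the general Dropbox API pattern for listing a folder, assuming the SDK's `filesListFolder` and `filesListFolderContinue` methods and their `cursor`/`has_more` pagination fields, a directory walk could look like the following. The names `DropboxEntry`, `ListFolderPage`, `FolderLister`, and `listFilePaths` are hypothetical and exist only for this illustration.

```typescript
// Minimal shapes for the parts of the Dropbox list-folder response used below.
interface DropboxEntry {
  ".tag": "file" | "folder" | "deleted";
  path_lower?: string;
}

interface ListFolderPage {
  result: { entries: DropboxEntry[]; cursor: string; has_more: boolean };
}

// Structural stand-in (an assumption) for the two client methods this sketch calls.
interface FolderLister {
  filesListFolder(arg: { path: string; recursive?: boolean }): Promise<ListFolderPage>;
  filesListFolderContinue(arg: { cursor: string }): Promise<ListFolderPage>;
}

// Collects the paths of all files under `folderPath`, following the
// pagination cursor until the API reports that no more pages remain.
async function listFilePaths(
  client: FolderLister,
  folderPath: string,
  recursive = false
): Promise<string[]> {
  const paths: string[] = [];
  let page = await client.filesListFolder({ path: folderPath, recursive });
  for (;;) {
    for (const entry of page.result.entries) {
      // Skip folders and deleted entries; keep only files with a known path.
      if (entry[".tag"] === "file" && entry.path_lower) {
        paths.push(entry.path_lower);
      }
    }
    if (!page.result.has_more) break;
    page = await client.filesListFolderContinue({ cursor: page.result.cursor });
  }
  return paths;
}
```

With `recursive` set to `true`, the listing includes entries from subfolders, which matches the behavior the `recursive` option describes.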

### Streaming Documents

To process large datasets efficiently, use the `loadLazy` method to stream documents asynchronously:

```typescript
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox";

const loader = new DropboxLoader({
  clientOptions: {
    accessToken: "your-dropbox-access-token",
  },
  unstructuredOptions: {
    apiUrl: "http://localhost:8000/general/v0/general",
  },
  folderPath: "/large/dataset",
  recursive: true,
  mode: "directory",
});

for await (const doc of loader.loadLazy()) {
  // Process each document as it's loaded
  console.log(doc);
}
```
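When the downstream step works better on groups of documents (for example, bulk inserts), the stream can be consumed in batches. The following is a minimal sketch, not part of LangChain: `inBatches` is a hypothetical helper that works on any async iterable, and `vectorStore` in the usage comment is a placeholder.

```typescript
// Hypothetical helper: groups items from any async iterable into arrays
// of at most `size` items, yielding each array as soon as it is full.
async function* inBatches<T>(
  source: AsyncIterable<T>,
  size: number
): AsyncGenerator<T[]> {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  let batch: T[] = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  // Flush the final, possibly smaller, batch.
  if (batch.length > 0) {
    yield batch;
  }
}

// Usage with a loader configured as in the example above (assumed in scope):
// for await (const docs of inBatches(loader.loadLazy(), 50)) {
//   await vectorStore.addDocuments(docs); // `vectorStore` is a placeholder
// }
```

Only one batch is held in memory at a time, so the batch size bounds memory use regardless of how many files the folder contains.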

### Authentication with Environment Variables

You can set the `DROPBOX_ACCESS_TOKEN` environment variable instead of passing the access token in `clientOptions`:

```bash
export DROPBOX_ACCESS_TOKEN=your-dropbox-access-token
```

Then initialize the loader without specifying `accessToken`:

```typescript
import { DropboxLoader } from "@langchain/community/document_loaders/web/dropbox";

const loader = new DropboxLoader({
  clientOptions: {},
  unstructuredOptions: {
    apiUrl: "http://localhost:8000/general/v0/general",
  },
  filePaths: ["/important/notes.txt"],
});

const docs = await loader.load();
console.log(docs[0].pageContent);
```
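If you would like a clear error at startup when no token is available from either source, a small check can run before the loader is constructed. This is a sketch under stated assumptions: `resolveAccessToken` and `TokenEnv` are hypothetical names, not part of LangChain or the Dropbox SDK, and the environment is passed in as a plain object so the function is easy to test.

```typescript
// Hypothetical helper: prefers an explicitly supplied token, then falls
// back to DROPBOX_ACCESS_TOKEN from the supplied environment map.
type TokenEnv = Record<string, string | undefined>;

function resolveAccessToken(env: TokenEnv, explicit?: string): string {
  // Empty or whitespace-only values are treated as missing.
  const token = explicit?.trim() || env["DROPBOX_ACCESS_TOKEN"]?.trim();
  if (!token) {
    throw new Error(
      "No Dropbox access token found: pass `accessToken` or set DROPBOX_ACCESS_TOKEN."
    );
  }
  return token;
}

// Usage in Node.js:
// const accessToken = resolveAccessToken(process.env);
```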

## Configuration Options

Here are the configuration options for the `DropboxLoader`:

| Option | Type | Description |
| --------------------- | ------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `clientOptions` | `DropboxOptions` | Configuration options for initializing the Dropbox client, including authentication details. Refer to the [Dropbox SDK Documentation](https://dropbox.github.io/dropbox-sdk-js/Dropbox.html#Dropbox__anchor) for more information. |
| `unstructuredOptions` | `UnstructuredLoaderOptions` | Options for the `UnstructuredLoader` used to process downloaded files. Includes the `apiUrl` for your Unstructured server. |
| `folderPath` | `string` (optional) | The path to the folder in Dropbox from which to load files. Defaults to the root folder (`""`) if not specified. |
| `filePaths` | `string[]` (optional) | An array of specific file paths in Dropbox to load. Required if `mode` is set to `"file"`. |
| `recursive` | `boolean` (optional) | Indicates whether to recursively traverse folders when `mode` is `"directory"`. Defaults to `false`. |
| `mode` | `"file"` or `"directory"` (optional) | The mode of operation. Set to `"file"` to load specific files or `"directory"` to load all files in a directory. Defaults to `"file"`. |
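
The defaults and the `filePaths` requirement from the table can be restated as code. The sketch below only mirrors the table; it is not the loader's source, and `DropboxLoaderOptionsSketch` and `applyDocumentedDefaults` are hypothetical names. `clientOptions` and `unstructuredOptions` are omitted because their types come from other packages.

```typescript
// Illustrative only: mirrors the documented options, not the loader's implementation.
type LoaderMode = "file" | "directory";

interface DropboxLoaderOptionsSketch {
  folderPath?: string;
  filePaths?: string[];
  recursive?: boolean;
  mode?: LoaderMode;
}

// Fills in the documented defaults and enforces the one documented
// requirement: `filePaths` must be provided when `mode` is "file".
function applyDocumentedDefaults(
  options: DropboxLoaderOptionsSketch
): Required<DropboxLoaderOptionsSketch> {
  const mode: LoaderMode = options.mode ?? "file";
  const filePaths = options.filePaths ?? [];
  if (mode === "file" && filePaths.length === 0) {
    throw new Error('`filePaths` is required when `mode` is "file"');
  }
  return {
    mode,
    filePaths,
    folderPath: options.folderPath ?? "", // "" is the root folder
    recursive: options.recursive ?? false,
  };
}
```

Because `mode` defaults to `"file"`, an options object with neither `mode` nor `filePaths` is rejected by this check.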

## API References

- [Dropbox SDK for JavaScript](https://github.com/dropbox/dropbox-sdk-js)
