feat(community): Added Reddit integration with tool and document loader #7300

Open · wants to merge 2 commits into base: main
2 changes: 1 addition & 1 deletion .github/workflows/unit-tests-integrations.yml
@@ -44,7 +44,7 @@ jobs:
needs: get-changed-files
runs-on: ubuntu-latest
env:
-PACKAGES: "anthropic,azure-openai,cloudflare,cohere,core,community,exa,google-common,google-gauth,google-genai,google-vertexai,google-vertexai-web,google-webauth,groq,mistralai,mongo,nomic,openai,pinecone,qdrant,redis,textsplitters,weaviate,yandex,baidu-qianfan"
+PACKAGES: "anthropic,azure-openai,cloudflare,cohere,core,community,exa,google-common,google-gauth,google-genai,google-vertexai,google-vertexai-web,google-webauth,groq,mistralai,mongo,nomic,openai,pinecone,qdrant,redis,textsplitters,weaviate,yandex,baidu-qianfan,reddit"
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
matrix_length: ${{ steps.set-matrix.outputs.matrix_length }}
6 changes: 6 additions & 0 deletions docs/core_docs/.gitignore
@@ -280,6 +280,12 @@ docs/integrations/retrievers/bm25.md
docs/integrations/retrievers/bm25.mdx
docs/integrations/retrievers/bedrock-knowledge-bases.md
docs/integrations/retrievers/bedrock-knowledge-bases.mdx
docs/integrations/toolkits/vectorstore.md
docs/integrations/toolkits/vectorstore.mdx
docs/integrations/toolkits/sql.md
docs/integrations/toolkits/sql.mdx
docs/integrations/toolkits/openapi.md
docs/integrations/toolkits/openapi.mdx
docs/integrations/text_embedding/togetherai.md
docs/integrations/text_embedding/togetherai.mdx
docs/integrations/text_embedding/pinecone.md
@@ -0,0 +1,13 @@
---
hide_table_of_contents: true
---

# Reddit

This example shows how to load the text of posts from subreddits or from Reddit users.
You will need to create a [Reddit Application](https://www.reddit.com/prefs/apps/) and initialize the loader with your Reddit API credentials.

import CodeBlock from "@theme/CodeBlock";
import Example from "@examples/document_loaders/reddit.ts";

<CodeBlock language="typescript">{Example}</CodeBlock>
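The page above only says to initialize the loader with Reddit API credentials. As a hedged illustration of what those credentials are for (we have not checked how `RedditPostsLoader` authenticates internally), Reddit's application-only OAuth flow exchanges the client ID and secret for a bearer token by POSTing `grant_type=client_credentials` to `https://www.reddit.com/api/v1/access_token` with HTTP Basic auth. The sketch below only builds that request and does not send it; `buildTokenRequest` and its types are hypothetical names, not part of `@langchain/community`.

```typescript
// Hypothetical helper: builds (but does not send) the token request for
// Reddit's application-only OAuth flow. Not part of @langchain/community.
interface RedditCredentials {
  clientId: string;
  clientSecret: string;
  userAgent: string;
}

interface TokenRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildTokenRequest(creds: RedditCredentials): TokenRequest {
  // HTTP Basic auth header value: base64("clientId:clientSecret")
  const basic = Buffer.from(`${creds.clientId}:${creds.clientSecret}`).toString("base64");
  return {
    url: "https://www.reddit.com/api/v1/access_token",
    method: "POST",
    headers: {
      Authorization: `Basic ${basic}`,
      "Content-Type": "application/x-www-form-urlencoded",
      // Reddit asks every client to send a unique, descriptive User-Agent
      "User-Agent": creds.userAgent,
    },
    // Form-encoded body: grant_type=client_credentials
    body: new URLSearchParams({ grant_type: "client_credentials" }).toString(),
  };
}

const req = buildTokenRequest({
  clientId: "id",
  clientSecret: "secret",
  userAgent: "node:demo:v0.1.0 (by /u/example)",
});
console.log(req.headers.Authorization); // Basic aWQ6c2VjcmV0
```

The returned object can be passed to `fetch(req.url, req)`; the JSON response is expected to carry an `access_token` to send as `Authorization: bearer <token>` on later API calls.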
12 changes: 12 additions & 0 deletions docs/core_docs/docs/integrations/tools/reddit.mdx
@@ -0,0 +1,12 @@
---
hide_table_of_contents: true
---

# Reddit

This example shows how to retrieve posts from a subreddit or from a particular user.
You will need to create a [Reddit Application](https://www.reddit.com/prefs/apps/) and initialize the tool with your Reddit API credentials and user agent. See the [Reddit Data API Wiki](https://support.reddithelp.com/hc/en-us/articles/16160319875092-Reddit-Data-API-Wiki) for the required user agent format.

import CodeBlock from "@theme/CodeBlock";
import Example from "@examples/tools/reddit.ts";
<CodeBlock language="typescript">{Example}</CodeBlock>
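Reddit's API rules ask for a User-Agent of the form `<platform>:<app ID>:<version string> (by /u/<reddit username>)`. A minimal sketch for assembling that string is shown below; `buildRedditUserAgent` and its field names are hypothetical, and the wiki linked above remains the authority on the format.

```typescript
// Hypothetical helper for the User-Agent format described in Reddit's API rules:
//   <platform>:<app ID>:<version string> (by /u/<reddit username>)
interface UserAgentParts {
  platform: string; // e.g. "node" or "web"
  appId: string; // a unique identifier for your application
  version: string; // e.g. "v1.0.0"
  username: string; // Reddit username to contact, with or without "/u/"
}

function buildRedditUserAgent(parts: UserAgentParts): string {
  for (const [key, value] of Object.entries(parts) as [string, string][]) {
    // ":" separates the fields, so reject empty parts and parts containing it
    if (value.trim() === "" || value.includes(":")) {
      throw new Error(`Invalid user agent part "${key}": ${JSON.stringify(value)}`);
    }
  }
  // Accept "example", "u/example", or "/u/example"
  const username = parts.username.replace(/^\/?u\//, "");
  return `${parts.platform}:${parts.appId}:${parts.version} (by /u/${username})`;
}

console.log(
  buildRedditUserAgent({
    platform: "node",
    appId: "langchain-reddit-demo",
    version: "v0.1.0",
    username: "example_user",
  })
); // node:langchain-reddit-demo:v0.1.0 (by /u/example_user)
```

The resulting string is what would go into `REDDIT_USER_AGENT` or the `userAgent` option.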
3 changes: 3 additions & 0 deletions examples/.env.example
@@ -51,6 +51,9 @@ CLICKHOUSE_PORT=ADD_YOURS_HERE
CLICKHOUSE_USERNAME=ADD_YOURS_HERE
CLICKHOUSE_PASSWORD=ADD_YOURS_HERE
REDIS_URL=ADD_YOURS_HERE
REDDIT_CLIENT_ID=ADD_YOURS_HERE #https://www.reddit.com/prefs/apps
REDDIT_CLIENT_SECRET=ADD_YOURS_HERE #https://www.reddit.com/prefs/apps
REDDIT_USER_AGENT=ADD_YOURS_HERE #https://support.reddithelp.com/hc/en-us/articles/16160319875092-Reddit-Data-API-Wiki
SINGLESTORE_HOST=ADD_YOURS_HERE
SINGLESTORE_PORT=ADD_YOURS_HERE
SINGLESTORE_USERNAME=ADD_YOURS_HERE
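The three `REDDIT_*` variables above are what the examples suggest reading from `process.env`. A hedged sketch of a helper that collects them and fails fast with one error naming every missing variable (treating the `ADD_YOURS_HERE` placeholder as missing); `readRedditCredentials` is a hypothetical name, not a library export.

```typescript
// Hypothetical helper: reads the variables listed in .env.example and throws a
// single error naming every one that is unset or still the placeholder.
interface RedditCredentials {
  clientId: string;
  clientSecret: string;
  userAgent: string;
}

const REQUIRED_VARS = {
  clientId: "REDDIT_CLIENT_ID",
  clientSecret: "REDDIT_CLIENT_SECRET",
  userAgent: "REDDIT_USER_AGENT",
} as const;

function readRedditCredentials(env: Record<string, string | undefined>): RedditCredentials {
  const missing = Object.values(REQUIRED_VARS).filter(
    (name) => !env[name] || env[name] === "ADD_YOURS_HERE"
  );
  if (missing.length > 0) {
    throw new Error(`Missing Reddit credentials: ${missing.join(", ")}`);
  }
  return {
    clientId: env[REQUIRED_VARS.clientId]!,
    clientSecret: env[REQUIRED_VARS.clientSecret]!,
    userAgent: env[REQUIRED_VARS.userAgent]!,
  };
}

// In Node you would pass process.env; a literal object is used here for clarity.
const creds = readRedditCredentials({
  REDDIT_CLIENT_ID: "id",
  REDDIT_CLIENT_SECRET: "secret",
  REDDIT_USER_AGENT: "node:demo:v0.1.0 (by /u/example)",
});
console.log(creds.clientId); // id
```

The returned object has the same `clientId`, `clientSecret`, and `userAgent` keys used in the loader and tool examples, so it can be spread into their options.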
29 changes: 29 additions & 0 deletions examples/src/document_loaders/reddit.ts
@@ -0,0 +1,29 @@
import { RedditPostsLoader } from "@langchain/community/document_loaders/web/reddit";

// load using 'subreddit' mode
const loader = new RedditPostsLoader({
clientId: "REDDIT_CLIENT_ID", // or load it from process.env.REDDIT_CLIENT_ID
clientSecret: "REDDIT_CLIENT_SECRET", // or load it from process.env.REDDIT_CLIENT_SECRET
userAgent: "REDDIT_USER_AGENT", // or load it from process.env.REDDIT_USER_AGENT
searchQueries: ["LangChain", "Langchaindev"],
mode: "subreddit",
categories: ["hot", "new"],
numberPosts: 5
});
const docs = await loader.load();
console.log({ docs });

// // or load using 'username' mode
// const loader = new RedditPostsLoader({
// clientId: "REDDIT_CLIENT_ID", // or load it from process.env.REDDIT_CLIENT_ID
// clientSecret: "REDDIT_CLIENT_SECRET", // or load it from process.env.REDDIT_CLIENT_SECRET
// userAgent: "REDDIT_USER_AGENT", // or load it from process.env.REDDIT_USER_AGENT
// searchQueries: ["AutoModerator"],
// mode: "username",
// categories: ["hot", "new"],
// numberPosts: 2
// });
// const docs = await loader.load();
// console.log({ docs });

// Note: each category must be one of "controversial", "hot", "new", "rising", or "top".
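The note above restricts `categories` to five values. A hedged sketch of encoding that list as a TypeScript union with a runtime check, so invalid values (for example from a CLI flag) are rejected before the loader is constructed; `parseCategories` and `RedditCategory` are hypothetical names, and the loader's own option type may differ.

```typescript
// The five category values listed in the note above.
const REDDIT_CATEGORIES = ["controversial", "hot", "new", "rising", "top"] as const;
type RedditCategory = (typeof REDDIT_CATEGORIES)[number];

function isRedditCategory(value: string): value is RedditCategory {
  return (REDDIT_CATEGORIES as readonly string[]).includes(value);
}

// Hypothetical helper: validates user-supplied categories before they are
// passed as the `categories` option, reporting every invalid entry at once.
function parseCategories(values: string[]): RedditCategory[] {
  const invalid = values.filter((v) => !isRedditCategory(v));
  if (invalid.length > 0) {
    throw new Error(
      `Unsupported categories: ${invalid.join(", ")} (expected one of ${REDDIT_CATEGORIES.join(", ")})`
    );
  }
  return values as RedditCategory[];
}

console.log(parseCategories(["hot", "new"])); // [ 'hot', 'new' ]
```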
34 changes: 34 additions & 0 deletions examples/src/tools/reddit.ts
@@ -0,0 +1,34 @@
import RedditSearchRun from "@langchain/community/tools/reddit";

// Retrieve a post from a subreddit

// See the Reddit Data API Wiki for the required userAgent format:
// https://support.reddithelp.com/hc/en-us/articles/16160319875092-Reddit-Data-API-Wiki
// clientId, clientSecret, and userAgent can also be supplied through environment variables.
const search = new RedditSearchRun({
sortMethod: "relevance",
time: "all",
subreddit: "dankmemes",
limit: 1,
clientId: "REDDIT_CLIENT_ID", // or load from process.env.REDDIT_CLIENT_ID
clientSecret: "REDDIT_CLIENT_SECRET", // or load from process.env.REDDIT_CLIENT_SECRET
userAgent: "REDDIT_USER_AGENT" // or load from process.env.REDDIT_USER_AGENT
});

const post = await search.invoke("College");
console.log(post);

// Retrieve a post from a user

// const search = new RedditSearchRun({
// sortMethod: "relevance",
// time: "all",
// subreddit: "dankmemes",
// limit: 1,
// clientId: "REDDIT_CLIENT_ID", // or load from process.env.REDDIT_CLIENT_ID
// clientSecret: "REDDIT_CLIENT_SECRET", // or load from process.env.REDDIT_CLIENT_SECRET
// userAgent: "REDDIT_USER_AGENT" // or load from process.env.REDDIT_USER_AGENT
// });

// const post = await search.fetchUserPosts("REDDIT USER TO RETRIEVE POST FROM", 1, "all");
// console.log(post);
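We have not confirmed which endpoint `RedditSearchRun` calls. As an assumption-labeled illustration, Reddit's `GET /r/{subreddit}/search` listing accepts `q`, `sort`, `t`, `limit`, and `restrict_sr`, which line up with the tool's `sortMethod`, `time`, `subreddit`, and `limit` options. The hypothetical `buildSubredditSearchUrl` below shows that mapping; it is not the tool's implementation.

```typescript
// Hypothetical mapping from the tool options above to Reddit's
// GET /r/{subreddit}/search endpoint; not RedditSearchRun's actual code.
type SortMethod = "relevance" | "hot" | "top" | "new" | "comments";
type TimeFilter = "hour" | "day" | "week" | "month" | "year" | "all";

interface SearchOptions {
  subreddit: string;
  sortMethod: SortMethod;
  time: TimeFilter;
  limit: number;
}

function buildSubredditSearchUrl(query: string, opts: SearchOptions): string {
  // Reddit listings return at most 100 items per request
  if (!Number.isInteger(opts.limit) || opts.limit < 1 || opts.limit > 100) {
    throw new RangeError(`limit must be an integer between 1 and 100, got ${opts.limit}`);
  }
  const url = new URL(`https://oauth.reddit.com/r/${encodeURIComponent(opts.subreddit)}/search`);
  url.searchParams.set("q", query);
  url.searchParams.set("sort", opts.sortMethod);
  url.searchParams.set("t", opts.time);
  url.searchParams.set("limit", String(opts.limit));
  url.searchParams.set("restrict_sr", "1"); // search only inside this subreddit
  return url.toString();
}

console.log(
  buildSubredditSearchUrl("College", {
    subreddit: "dankmemes",
    sortMethod: "relevance",
    time: "all",
    limit: 1,
  })
);
// https://oauth.reddit.com/r/dankmemes/search?q=College&sort=relevance&t=all&limit=1&restrict_sr=1
```

Requests to `oauth.reddit.com` need the bearer token from the OAuth flow plus the User-Agent header.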
128 changes: 128 additions & 0 deletions langchain-core/src/utils/async_caller.js
@@ -0,0 +1,128 @@
"use strict";
var __importDefault = (this && this.__importDefault) || function (mod) {
return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
exports.AsyncCaller = void 0;
const p_retry_1 = __importDefault(require("p-retry"));
const p_queue_1 = __importDefault(require("p-queue"));
const STATUS_NO_RETRY = [
400,
401,
402,
403,
404,
405,
406,
407,
409, // Conflict
];
// eslint-disable-next-line @typescript-eslint/no-explicit-any
const defaultFailedAttemptHandler = (error) => {
if (error.message.startsWith("Cancel") ||
error.message.startsWith("AbortError") ||
error.name === "AbortError") {
throw error;
}
// eslint-disable-next-line @typescript-eslint/no-explicit-any
if (error?.code === "ECONNABORTED") {
throw error;
}
const status =
// eslint-disable-next-line @typescript-eslint/no-explicit-any
error?.response?.status ?? error?.status;
if (status && STATUS_NO_RETRY.includes(+status)) {
throw error;
}
// eslint-disable-next-line @typescript-eslint/no-explicit-any
if (error?.error?.code === "insufficient_quota") {
const err = new Error(error?.message);
err.name = "InsufficientQuotaError";
throw err;
}
};
/**
* A class that can be used to make async calls with concurrency and retry logic.
*
* This is useful for making calls to any kind of "expensive" external resource,
* be it because it's rate-limited, subject to network issues, etc.
*
* Concurrent calls are limited by the `maxConcurrency` parameter, which defaults
* to `Infinity`. This means that by default, all calls will be made in parallel.
*
* Retries are limited by the `maxRetries` parameter, which defaults to 6. This
* means that by default, each call will be retried up to 6 times, with an
* exponential backoff between each attempt.
*/
class AsyncCaller {
constructor(params) {
Object.defineProperty(this, "maxConcurrency", {
enumerable: true,
configurable: true,
writable: true,
value: void 0
});
Object.defineProperty(this, "maxRetries", {
enumerable: true,
configurable: true,
writable: true,
value: void 0
});
Object.defineProperty(this, "onFailedAttempt", {
enumerable: true,
configurable: true,
writable: true,
value: void 0
});
Object.defineProperty(this, "queue", {
enumerable: true,
configurable: true,
writable: true,
value: void 0
});
this.maxConcurrency = params.maxConcurrency ?? Infinity;
this.maxRetries = params.maxRetries ?? 6;
this.onFailedAttempt =
params.onFailedAttempt ?? defaultFailedAttemptHandler;
const PQueue = "default" in p_queue_1.default ? p_queue_1.default.default : p_queue_1.default;
this.queue = new PQueue({ concurrency: this.maxConcurrency });
}
// eslint-disable-next-line @typescript-eslint/no-explicit-any
call(callable, ...args) {
return this.queue.add(() => (0, p_retry_1.default)(() => callable(...args).catch((error) => {
// eslint-disable-next-line no-instanceof/no-instanceof
if (error instanceof Error) {
throw error;
}
else {
throw new Error(error);
}
}), {
onFailedAttempt: this.onFailedAttempt,
retries: this.maxRetries,
randomize: true,
// If needed we can change some of the defaults here,
// but they're quite sensible.
}), { throwOnTimeout: true });
}
// eslint-disable-next-line @typescript-eslint/no-explicit-any
callWithOptions(options, callable, ...args) {
// Note: this does not cancel the underlying request.
// When available, prefer the underlying call's own signal option.
if (options.signal) {
return Promise.race([
this.call(callable, ...args),
new Promise((_, reject) => {
options.signal?.addEventListener("abort", () => {
reject(new Error("AbortError"));
});
}),
]);
}
return this.call(callable, ...args);
}
fetch(...args) {
return this.call(() => fetch(...args).then((res) => (res.ok ? res : Promise.reject(res))));
}
}
exports.AsyncCaller = AsyncCaller;
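The retry policy of `AsyncCaller` has two parts: which HTTP statuses are never retried (`STATUS_NO_RETRY`), and how long it waits between attempts. A hedged sketch of both is shown below, assuming p-retry's default options (`factor: 2`, `minTimeout: 1000` ms, `maxTimeout: Infinity`) and the delay formula of the `retry` package it builds on, where `randomize: true` multiplies each delay by a random factor in [1, 2). `isRetryableStatus` and `retryDelays` are hypothetical names. Note that the real handler also rethrows aborts, `ECONNABORTED`, and `insufficient_quota` errors, which this sketch ignores.

```typescript
// Mirrors STATUS_NO_RETRY from the compiled AsyncCaller above.
const STATUS_NO_RETRY = [400, 401, 402, 403, 404, 405, 406, 407, 409];

function isRetryableStatus(status: number): boolean {
  return !STATUS_NO_RETRY.includes(status);
}

// Delay before retry i (0-based), assuming the `retry` package formula:
//   round(random * minTimeout * factor ** i), capped at maxTimeout.
// random is 1 when randomize is false and lies in [1, 2) when it is true.
function retryDelays(
  retries: number,
  { minTimeout = 1000, factor = 2, maxTimeout = Infinity, random = 1 } = {}
): number[] {
  return Array.from({ length: retries }, (_, i) =>
    Math.min(Math.round(random * minTimeout * factor ** i), maxTimeout)
  );
}

console.log(isRetryableStatus(429)); // true (rate limiting is retried)
console.log(isRetryableStatus(404)); // false
console.log(retryDelays(6)); // [ 1000, 2000, 4000, 8000, 16000, 32000 ]
```

Under these assumptions, a call that fails on every attempt with the default `maxRetries` of 6 spends 63 seconds in backoff before randomization, and up to roughly twice that with `randomize: true`.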
3 changes: 3 additions & 0 deletions langchain/.env.example
@@ -51,6 +51,9 @@ CLICKHOUSE_USERNAME=ADD_YOURS_HERE
CLICKHOUSE_PASSWORD=ADD_YOURS_HERE
FIGMA_ACCESS_TOKEN=ADD_YOURS_HERE
REDIS_URL=ADD_YOURS_HERE
REDDIT_CLIENT_ID=ADD_YOURS_HERE
REDDIT_CLIENT_SECRET=ADD_YOURS_HERE
REDDIT_USER_AGENT=ADD_YOURS_HERE
ROCKSET_API_KEY=ADD_YOURS_HERE
# defaults to "usw2a1" (oregon)
ROCKSET_REGION=ADD_YOURS_HERE
4 changes: 4 additions & 0 deletions libs/langchain-community/langchain.config.js
@@ -53,6 +53,7 @@ export const config = {
"tools/google_places": "tools/google_places",
"tools/google_routes": "tools/google_routes",
"tools/ifttt": "tools/ifttt",
"tools/reddit": "tools/reddit",
"tools/searchapi": "tools/searchapi",
"tools/searxng_search": "tools/searxng_search",
"tools/serpapi": "tools/serpapi",
@@ -288,6 +289,7 @@ export const config = {
"document_loaders/web/notionapi": "document_loaders/web/notionapi",
"document_loaders/web/pdf": "document_loaders/web/pdf",
"document_loaders/web/recursive_url": "document_loaders/web/recursive_url",
"document_loaders/web/reddit": "document_loaders/web/reddit",
"document_loaders/web/s3": "document_loaders/web/s3",
"document_loaders/web/sitemap": "document_loaders/web/sitemap",
"document_loaders/web/sonix_audio": "document_loaders/web/sonix_audio",
@@ -338,6 +340,7 @@ export const config = {
"tools/discord",
"tools/gmail",
"tools/google_calendar",
"tools/reddit",
"agents/toolkits/aws_sfn",
"agents/toolkits/stagehand",
"callbacks/handlers/llmonitor",
@@ -506,6 +509,7 @@ export const config = {
"document_loaders/web/taskade",
"document_loaders/web/notionapi",
"document_loaders/web/recursive_url",
"document_loaders/web/reddit",
"document_loaders/web/s3",
"document_loaders/web/sitemap",
"document_loaders/web/sonix_audio",
@@ -0,0 +1,47 @@
import { expect, test } from "@jest/globals";
import { Document } from "@langchain/core/documents";
import { RedditPostsLoader } from "../web/reddit.js";

test.skip("Test RedditPostsLoader in subreddit mode", async () => {
const loader = new RedditPostsLoader({
clientId: process.env.REDDIT_CLIENT_ID!,
clientSecret: process.env.REDDIT_CLIENT_SECRET!,
userAgent: process.env.REDDIT_USER_AGENT!,
searchQueries: ["LangChain"],
mode: "subreddit",
categories: ["new"],
numberPosts: 2,
});
const documents = await loader.load();
expect(documents).toHaveLength(2);
expect(documents[0]).toBeInstanceOf(Document);
expect(documents[0].metadata.post_subreddit).toMatch("LangChain");
expect(documents[0].metadata.post_category).toMatch("new");
expect(documents[0].metadata.post_title).toBeTruthy();
expect(documents[0].metadata.post_score).toBeGreaterThanOrEqual(0);
expect(documents[0].metadata.post_id).toBeTruthy();
expect(documents[0].metadata.post_author).toBeTruthy();
expect(documents[0].metadata.post_url).toMatch(/^http/);
});

test.skip("Test RedditPostsLoader in username mode", async () => {
const loader = new RedditPostsLoader({
clientId: process.env.REDDIT_CLIENT_ID!,
clientSecret: process.env.REDDIT_CLIENT_SECRET!,
userAgent: process.env.REDDIT_USER_AGENT!,
searchQueries: ["AutoModerator"],
mode: "username",
categories: ["hot", "new"],
numberPosts: 5,
});
const documents = await loader.load();
expect(documents).toHaveLength(10);
expect(documents[0]).toBeInstanceOf(Document);
expect(documents[0].metadata.post_author).toMatch("AutoModerator");
expect(documents[0].metadata.post_category).toMatch("hot");
expect(documents[0].metadata.post_title).toBeTruthy();
expect(documents[0].metadata.post_score).toBeGreaterThanOrEqual(0);
expect(documents[0].metadata.post_id).toBeTruthy();
expect(documents[0].metadata.post_subreddit).toBeTruthy();
expect(documents[0].metadata.post_url).toMatch(/^http/);
});
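The two tests assert exact lengths of 2 and 10, which suggests the loader returns up to `numberPosts` documents for each (search query, category) pair: 1 × 1 × 2 = 2 and 1 × 2 × 5 = 10. That reading is inferred from the assertions, not from the loader source, so treat it as an assumption. The hypothetical helper below makes the bound explicit; an exact-length assertion can fail for a subreddit or user with fewer posts than requested.

```typescript
// Hypothetical helper: upper bound on the number of documents returned,
// assuming up to `numberPosts` posts per (search query, category) pair.
interface LoaderCountOptions {
  searchQueries: string[];
  categories: string[];
  numberPosts: number;
}

function maxExpectedDocuments(opts: LoaderCountOptions): number {
  return opts.searchQueries.length * opts.categories.length * opts.numberPosts;
}

// Subreddit-mode test: 1 query x 1 category x 2 posts
console.log(maxExpectedDocuments({ searchQueries: ["LangChain"], categories: ["new"], numberPosts: 2 })); // 2
// Username-mode test: 1 query x 2 categories x 5 posts
console.log(
  maxExpectedDocuments({ searchQueries: ["AutoModerator"], categories: ["hot", "new"], numberPosts: 5 })
); // 10
```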