Merged
5 changes: 5 additions & 0 deletions apps/docs/components/ui/icon-mapping.ts
@@ -264,6 +264,8 @@ export const blockTypeToIconMap: Record<string, IconComponent> = {
extend_v2: ExtendIcon,
fathom: FathomIcon,
file: DocumentIcon,
file_v2: DocumentIcon,
file_v3: DocumentIcon,
file_v4: DocumentIcon,
findymail: FindymailIcon,
firecrawl: FirecrawlIcon,
@@ -309,6 +311,7 @@ export const blockTypeToIconMap: Record<string, IconComponent> = {
iam: IAMIcon,
identity_center: IdentityCenterIcon,
image_generator: ImageIcon,
image_generator_v2: ImageIcon,
imap: MailServerIcon,
incidentio: IncidentioIcon,
infisical: InfisicalIcon,
@@ -341,6 +344,7 @@ export const blockTypeToIconMap: Record<string, IconComponent> = {
microsoft_planner: MicrosoftPlannerIcon,
microsoft_teams: MicrosoftTeamsIcon,
mistral_parse: MistralIcon,
mistral_parse_v2: MistralIcon,
mistral_parse_v3: MistralIcon,
monday: MondayIcon,
mongodb: MongoDBIcon,
@@ -421,6 +425,7 @@ export const blockTypeToIconMap: Record<string, IconComponent> = {
vercel: VercelIcon,
video_generator: VideoIcon,
video_generator_v2: VideoIcon,
video_generator_v3: VideoIcon,
vision: EyeIcon,
vision_v2: EyeIcon,
wealthbox: WealthboxIcon,
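The docs map registers every versioned block type (for example `file_v2` through `file_v4`, `image_generator_v2`, `video_generator_v3`) explicitly against the same icon. As a minimal sketch of how a caller might consume such a map, the hypothetical `resolveBlockIcon` helper below looks up the exact type first and, purely as an illustrative assumption, falls back to the unversioned base type by stripping a trailing `_vN` suffix. `IconComponent` is stubbed as a string so the snippet runs on its own; the real map holds icon components.

```typescript
// Stand-in for the real icon component type (assumption for this sketch).
type IconComponent = string

// Small excerpt of the mapping from the diff, with icons stubbed as strings.
const blockTypeToIconMap: Record<string, IconComponent> = {
  file: 'DocumentIcon',
  file_v4: 'DocumentIcon',
  image_generator_v2: 'ImageIcon',
  video_generator_v3: 'VideoIcon',
}

// Own-property lookup, so keys like "toString" never resolve to prototype members.
function lookupOwn(map: Record<string, IconComponent>, key: string): IconComponent | undefined {
  return Object.prototype.hasOwnProperty.call(map, key) ? map[key] : undefined
}

// Hypothetical helper: exact match first, then the base type without a `_vN` suffix.
function resolveBlockIcon(
  blockType: string,
  map: Record<string, IconComponent>,
): IconComponent | undefined {
  return lookupOwn(map, blockType) ?? lookupOwn(map, blockType.replace(/_v\d+$/, ''))
}

console.log(resolveBlockIcon('file_v4', blockTypeToIconMap)) // DocumentIcon
console.log(resolveBlockIcon('file_v5', blockTypeToIconMap)) // DocumentIcon (falls back to `file`)
console.log(resolveBlockIcon('video_generator', blockTypeToIconMap)) // undefined
```

Listing each version explicitly, as the diff does, keeps lookups exact and lets a future version get its own icon; the suffix fallback is only one possible alternative, not how Sim resolves icons.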
72 changes: 42 additions & 30 deletions apps/docs/content/docs/en/tools/image_generator.mdx
@@ -6,63 +6,75 @@ description: Generate images
import { BlockInfoCard } from "@/components/ui/block-info-card"

<BlockInfoCard
type="image_generator"
type="image_generator_v2"
color="#4D5FFF"
/>

{/* MANUAL-CONTENT-START:intro */}
[DALL-E](https://openai.com/dall-e-3) is OpenAI's advanced AI system designed to generate realistic images and art from natural language descriptions. As a state-of-the-art image generation model, DALL-E can create detailed and creative visuals based on text prompts, allowing users to transform their ideas into visual content without requiring artistic skills.
The Image Generator block creates images from text prompts using leading image generation providers. Choose OpenAI for GPT Image models, Google Gemini for Nano Banana models, or Fal.ai for a multi-model catalog that includes Nano Banana, GPT Image, Seedream, FLUX, and Grok Imagine.

With DALL-E, you can:
Use it to:

- **Generate realistic images**: Create photorealistic visuals from textual descriptions
- **Design conceptual art**: Transform abstract ideas into visual representations
- **Produce variations**: Generate multiple interpretations of the same prompt
- **Control artistic style**: Specify artistic styles, mediums, and visual aesthetics
- **Create detailed scenes**: Describe complex scenes with multiple elements and relationships
- **Visualize products**: Generate product mockups and design concepts
- **Illustrate ideas**: Turn written concepts into visual illustrations
- **Generate production images**: Create polished visuals from workflow prompts
- **Choose the right provider**: Route requests to OpenAI, Gemini, or Fal.ai based on model availability and cost
- **Control output shape**: Set provider-specific size, aspect ratio, resolution, quality, background, and output format options
- **Use advanced Fal.ai features**: Configure safety tolerance, safety checking, web search grounding, seeds, and thinking level when supported
- **Pass generated files downstream**: Use the returned image file or URL in later workflow steps

In Sim, the DALL-E integration enables your agents to generate images programmatically as part of their workflows. This allows for powerful automation scenarios such as content creation, visual design, and creative ideation. Your agents can formulate detailed prompts, generate corresponding images, and incorporate these visuals into their outputs or downstream processes. This integration bridges the gap between natural language processing and visual content creation, enabling your agents to communicate not just through text but also through compelling imagery. By connecting Sim with DALL-E, you can create agents that produce visual content on demand, illustrate concepts, generate design assets, and enhance user experiences with rich visual elements - all without requiring human intervention in the creative process.
In Sim, the Image Generator block lets agents create visual assets programmatically as part of automated workflows. This is useful for content creation, design mockups, product visuals, creative ideation, and any flow that needs generated imagery without a manual handoff.
{/* MANUAL-CONTENT-END */}


## Usage Instructions

Integrate Image Generator into the workflow. Can generate images using DALL-E 3, GPT Image 1, or GPT Image 2.
Generate images using OpenAI GPT Image, Google Nano Banana, or Fal.ai image models.



## Tools

### `openai_image`
### `image_generate`

Generate images using OpenAI
Generate images with OpenAI GPT Image, Google Nano Banana, or Fal.ai image models

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `model` | string | Yes | The model to use \(dall-e-3, gpt-image-1, or gpt-image-2\) |
| `prompt` | string | Yes | A text description of the desired image |
| `size` | string | Yes | Image size. dall-e-3: 1024x1024, 1024x1792, or 1792x1024. gpt-image-1: auto, 1024x1024, 1536x1024, or 1024x1536. gpt-image-2: auto or any size with edges ≤3840px and multiples of 16 \(e.g. 1024x1024, 1536x1024, 1024x1536, 2560x1440, 3840x2160\). |
| `quality` | string | No | Quality. dall-e-3: standard\|hd. gpt-image-1/gpt-image-2: auto\|low\|medium\|high |
| `style` | string | No | The style of the image \(vivid or natural\), only for dall-e-3 |
| `background` | string | No | Background. gpt-image-1: auto\|transparent\|opaque. gpt-image-2: auto\|opaque \(transparent not supported\) |
| `outputFormat` | string | No | Output image format \(png, jpeg, webp\), only for gpt-image-1 and gpt-image-2 |
| `moderation` | string | No | Moderation level \(auto or low\), only for gpt-image-1 and gpt-image-2 |
| `n` | number | No | The number of images to generate \(1-10\) |
| `apiKey` | string | Yes | Your OpenAI API key |
| `provider` | string | Yes | Image generation provider: openai, gemini, or falai |
| `apiKey` | string | Yes | Provider API key |
| `model` | string | Yes | Provider model ID, such as gpt-image-1.5, gemini-3.1-flash-image-preview, or nano-banana-2 |
| `prompt` | string | Yes | Text prompt describing the image to generate |
| `size` | string | No | Provider-specific image size |
| `aspectRatio` | string | No | Aspect ratio, such as auto, 1:1, 16:9, or 9:16 |
| `resolution` | string | No | Provider-specific image resolution, such as 1K, 2K, 4K, 1k, or 2k |
| `quality` | string | No | Provider-specific image quality |
| `background` | string | No | Background setting when supported |
| `outputFormat` | string | No | Output image format: png, jpeg, or webp where supported |
| `moderation` | string | No | OpenAI moderation level: auto or low |
| `safetyTolerance` | string | No | Fal.ai safety tolerance when supported |
| `numImages` | number | No | Number of images to generate, subject to provider limits |
| `seed` | number | No | Random seed when supported |
| `enableSafetyChecker` | boolean | No | Enable the Fal.ai safety checker when supported |
| `enableWebSearch` | boolean | No | Enable web search grounding when supported by the selected Fal.ai model |
| `thinkingLevel` | string | No | Fal.ai thinking level when supported: minimal or high |
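The new `image_generate` input table mixes shared parameters with options described as Fal.ai-only (`safetyTolerance`, `enableSafetyChecker`, `enableWebSearch`, `thinkingLevel`) or OpenAI-only (`moderation`). The sketch below assumes a hypothetical `ImageGenerateParams` interface that simply mirrors the documented parameter names, plus a hypothetical `pruneForProvider` helper that drops options that do not apply to the selected provider. Whether Sim strips or ignores such fields itself is not stated in the docs.

```typescript
// Provider values documented for the `provider` parameter.
type ImageProvider = 'openai' | 'gemini' | 'falai'

// Hypothetical shape mirroring the documented `image_generate` inputs.
interface ImageGenerateParams {
  provider: ImageProvider
  apiKey: string
  model: string
  prompt: string
  size?: string
  aspectRatio?: string
  resolution?: string
  quality?: string
  background?: string
  outputFormat?: 'png' | 'jpeg' | 'webp'
  moderation?: 'auto' | 'low'
  safetyTolerance?: string
  numImages?: number
  seed?: number
  enableSafetyChecker?: boolean
  enableWebSearch?: boolean
  thinkingLevel?: 'minimal' | 'high'
}

// Options the table describes as Fal.ai-specific or OpenAI-specific.
const FALAI_ONLY = ['safetyTolerance', 'enableSafetyChecker', 'enableWebSearch', 'thinkingLevel'] as const
const OPENAI_ONLY = ['moderation'] as const

// Hypothetical helper: copy the params and remove options the provider does not use.
function pruneForProvider(params: ImageGenerateParams): ImageGenerateParams {
  const result: ImageGenerateParams = { ...params }
  if (params.provider !== 'falai') {
    for (const key of FALAI_ONLY) delete result[key]
  }
  if (params.provider !== 'openai') {
    for (const key of OPENAI_ONLY) delete result[key]
  }
  return result
}

const request = pruneForProvider({
  provider: 'gemini',
  apiKey: 'YOUR_GEMINI_API_KEY', // placeholder
  model: 'gemini-3.1-flash-image-preview',
  prompt: 'A watercolor lighthouse at dusk',
  aspectRatio: '16:9',
  moderation: 'low', // OpenAI-only, removed for Gemini
  enableWebSearch: true, // Fal.ai-only, removed for Gemini
})

console.log(Object.keys(request)) // ['provider', 'apiKey', 'model', 'prompt', 'aspectRatio']
```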

#### Output

| Parameter | Type | Description |
| --------- | ---- | ----------- |
| `success` | boolean | Operation success status |
| `output` | object | Generated image data |
| ↳ `content` | string | Image URL or identifier |
| ↳ `image` | string | Base64 encoded image data |
| ↳ `metadata` | object | Image generation metadata |
| ↳ `model` | string | Model used for image generation |
| `content` | string | Generated image URL or identifier |
| `image` | file | Generated image file |
| `imageUrl` | string | Generated image URL |
| `provider` | string | Provider used |
| `model` | string | Model used |
| `metadata` | json | Generation metadata |
| ↳ `provider` | string | Provider used |
| ↳ `model` | string | Model used |
| ↳ `description` | string | Provider description |
| ↳ `revisedPrompt` | string | Revised prompt |
| ↳ `seed` | number | Seed used for generation |
| ↳ `jobId` | string | Provider job ID |
| ↳ `contentType` | string | Image MIME type |
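The new output exposes both `imageUrl` and `content`, plus a `metadata.contentType` MIME type. As a minimal sketch of passing a result downstream, the hypothetical `imageReference` helper below prefers `imageUrl`, falls back to `content`, and, as an assumption, defaults the MIME type to `image/png` when none is reported. The `ImageGenerateOutput` interface is a hypothetical subset of the documented fields; the `image` file field is omitted because its object shape is not described here.

```typescript
// Hypothetical subset of the documented `image_generate` outputs.
interface ImageGenerateOutput {
  content: string
  imageUrl?: string
  provider: string
  model: string
  metadata?: {
    provider?: string
    model?: string
    revisedPrompt?: string
    seed?: number
    jobId?: string
    contentType?: string
  }
}

// Hypothetical helper: pick a usable URL and MIME type for later workflow steps.
function imageReference(output: ImageGenerateOutput): { url: string; mimeType: string } {
  // Prefer `imageUrl`; an empty or missing value falls back to `content`.
  const url = output.imageUrl || output.content
  // Assumption: treat a missing content type as PNG.
  const mimeType = output.metadata?.contentType ?? 'image/png'
  return { url, mimeType }
}

// Placeholder values for illustration only.
const sample: ImageGenerateOutput = {
  content: 'https://example.com/generated/lighthouse.png',
  provider: 'falai',
  model: 'nano-banana-2',
  metadata: { contentType: 'image/webp' },
}

console.log(imageReference(sample))
// { url: 'https://example.com/generated/lighthouse.png', mimeType: 'image/webp' }
```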


32 changes: 16 additions & 16 deletions apps/docs/content/docs/en/tools/video_generator.mdx
@@ -6,37 +6,35 @@ description: Generate videos from text using AI
import { BlockInfoCard } from "@/components/ui/block-info-card"

<BlockInfoCard
type="video_generator_v2"
type="video_generator_v3"
color="#181C1E"
/>

{/* MANUAL-CONTENT-START:intro */}
Create videos from text prompts using cutting-edge AI models from top providers. Sim's Video Generator brings powerful, creative video synthesis capabilities to your workflow—supporting diverse models, aspect ratios, resolutions, camera controls, native audio, and advanced style and consistency features.
Create videos from text prompts using leading AI video providers. Sim's Video Generator supports direct provider integrations for Runway, Google Veo, Luma, and MiniMax, plus a Fal.ai multi-model provider for newer and specialized models.

**Supported Providers & Models:**

- **[Runway Gen-4](https://research.runwayml.com/gen2/)** (Runway ML):
Runway is a pioneer in text-to-video generation, known for powerful models like Gen-2, Gen-3, and Gen-4. The latest [Gen-4](https://research.runwayml.com/gen2/) model (and Gen-4 Turbo for faster results) supports more realistic motion, greater world consistency, and visual references for character, object, style, and location. Supports 16:9, 9:16, and 1:1 aspect ratios, 5–10 second durations, up to 4K resolution, style presets, and direct upload of reference images for consistent generations. Runway powers creative tools for filmmakers, studios, and content creators worldwide.
- **[Runway Gen-4](https://docs.dev.runwayml.com/)**: Generate image-to-video clips with a required reference image, 5 or 10 second durations, and landscape, portrait, or square output.

- **[Google Veo](https://deepmind.google/technologies/veo/)** (Google DeepMind):
[Veo](https://deepmind.google/technologies/veo/) is Google’s next-generation video generation model, offering high-quality, native-audio videos up to 1080p and 16 seconds. Supports advanced motion, cinematic effects, and nuanced text understanding. Veo can generate videos with built-in sound—activating native audio as well as silent clips. Options include 16:9 aspect, variable duration, different models (veo-3, veo-3.1), and prompt-based controls. Ideal for storytelling, advertising, research, and ideation.
- **[Google Veo](https://ai.google.dev/gemini-api/docs/video)**: Generate text-to-video clips with Veo 3 and Veo 3.1 models, portrait or landscape aspect ratios, 4, 6, or 8 second durations, and 720p or 1080p output.

- **[Luma Dream Machine](https://lumalabs.ai/dream-machine)** (Luma AI):
[Dream Machine](https://lumalabs.ai/dream-machine) delivers jaw-droppingly realistic and fluid video from text. It incorporates advanced camera control, cinematography prompts, and supports both ray-1 and ray-2 models. Dream Machine supports precise aspect ratios (16:9, 9:16, 1:1), variable durations, and the specification of camera paths for intricate visual direction. Luma is renowned for breakthrough visual fidelity and is backed by leading AI vision researchers.
- **[Luma Dream Machine](https://docs.lumalabs.ai/docs/video-generation)**: Generate Ray 2 videos with 5 or 9 second durations, common aspect ratios, multiple resolutions, and optional camera concept controls.

- **[MiniMax Hailuo-02](https://minimax.chat/)** (via [Fal.ai](https://fal.ai/)):
[MiniMax Hailuo-02](https://minimax.chat/) is a sophisticated Chinese generative video model, available globally through [Fal.ai](https://fal.ai/). Generate videos up to 16 seconds in landscape or portrait format, with options for prompt optimization to improve clarity and creativity. Pro and standard endpoints available, supporting high resolutions (up to 1920×1080). Well-suited for creative projects needing prompt translation and optimization, commercial storytelling, and rapid prototyping of visual ideas.
- **[MiniMax Hailuo](https://platform.minimax.io/docs/api-reference/video-generation-t2v)**: Generate Hailuo 2.3 or Hailuo-02 videos through MiniMax's platform API, with standard or pro quality endpoints and prompt optimization.

- **[Fal.ai Multi-Model](https://fal.ai/docs/model-api-reference/video-generation-api/overview)**: Access Veo 3.1, Sora 2, Seedance 2.0, Kling 3.0 and O3, MiniMax Hailuo 2.3, WAN 2.2, LTX 2.3, and previously supported Fal.ai models from one provider option.

**How to Choose:**
Pick your provider and model based on your needs for quality, speed, duration, audio, cost, and unique features. Runway and Veo offer world-leading realism and cinematic capabilities; Luma excels in fluid motion and camera control; MiniMax is ideal for Chinese-language prompts and offers fast, affordable access. Consider reference support, style presets, audio requirements, and pricing when selecting your tool.
Pick the provider and model based on quality, speed, duration, audio support, reference image needs, resolution, and cost. Runway is best when you have a visual reference, Veo and Luma are strong general text-to-video options, MiniMax offers a direct Hailuo API path, and Fal.ai is the best choice when you need access to the broadest model catalog.

For more details on features, restrictions, pricing, and model advances, see each provider’s official documentation above.
{/* MANUAL-CONTENT-END */}
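The updated provider summaries each list a fixed set of clip durations (Runway 5 or 10 seconds, Veo 4, 6, or 8, Luma 5 or 9, and MiniMax 6 or 10 per its input table), while Fal.ai varies by model. A minimal sketch of checking a requested duration before submitting a job might look like this; the provider keys `runway`, `veo`, and `luma` are assumptions (only `minimax` and `falai` appear as provider values in the tables), and `isSupportedDuration` is a hypothetical helper.

```typescript
// Durations in seconds as listed in the docs; Fal.ai varies by model and is omitted.
// Provider keys other than `minimax` are assumed names for this sketch.
const SUPPORTED_DURATIONS = {
  runway: [5, 10],
  veo: [4, 6, 8],
  luma: [5, 9],
  minimax: [6, 10],
} as const

type DirectVideoProvider = keyof typeof SUPPORTED_DURATIONS

// Hypothetical helper: true when the provider lists this exact duration.
function isSupportedDuration(provider: DirectVideoProvider, seconds: number): boolean {
  const allowed: readonly number[] = SUPPORTED_DURATIONS[provider]
  return allowed.includes(seconds)
}

console.log(isSupportedDuration('veo', 8)) // true
console.log(isSupportedDuration('runway', 6)) // false
```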


## Usage Instructions

Generate high-quality videos from text prompts using leading AI providers. Supports multiple models, aspect ratios, resolutions, and provider-specific features like world consistency, camera controls, and audio generation.
Generate high-quality videos from text prompts using leading AI providers. Supports Runway, Google Veo, Luma, MiniMax, and Fal.ai multi-model generation with provider-specific durations, aspect ratios, resolutions, prompt optimization, and native audio controls.



@@ -141,9 +139,10 @@ Generate videos using MiniMax Hailuo through MiniMax Platform API with advanced
| --------- | ---- | -------- | ----------- |
| `provider` | string | Yes | Video provider \(minimax\) |
| `apiKey` | string | Yes | MiniMax API key from platform.minimax.io |
| `model` | string | No | MiniMax model: hailuo-02 \(default\) |
| `model` | string | No | MiniMax model: hailuo-2.3 \(default\) or hailuo-02 |
| `prompt` | string | Yes | Text prompt describing the video to generate |
| `duration` | number | No | Video duration in seconds \(6 or 10, default: 6\) |
| `endpoint` | string | No | Quality endpoint: standard \(768P\) or pro \(1080P for 6s videos\) |
| `promptOptimizer` | boolean | No | Enable prompt optimization for better results \(default: true\) |
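The MiniMax table documents three defaults: `model` is `hailuo-2.3`, `duration` is 6, and `promptOptimizer` is `true`. The sketch below fills those in for a hypothetical `MiniMaxVideoParams` shape that mirrors the table; `withMiniMaxDefaults` is a hypothetical helper, not part of Sim. No default is stated for `endpoint`, so it is left unset, and because the table only says the pro endpoint gives 1080P for 6-second videos, the sketch does not validate the pro plus 10-second combination.

```typescript
// Hypothetical shape mirroring the MiniMax input table.
interface MiniMaxVideoParams {
  provider: 'minimax'
  apiKey: string
  model?: 'hailuo-2.3' | 'hailuo-02'
  prompt: string
  duration?: 6 | 10
  endpoint?: 'standard' | 'pro'
  promptOptimizer?: boolean
}

// Same fields with the documented defaults applied.
interface ResolvedMiniMaxVideoParams {
  provider: 'minimax'
  apiKey: string
  model: 'hailuo-2.3' | 'hailuo-02'
  prompt: string
  duration: 6 | 10
  endpoint?: 'standard' | 'pro'
  promptOptimizer: boolean
}

// Hypothetical helper: `??` keeps explicit values (including `false`) and fills gaps.
function withMiniMaxDefaults(params: MiniMaxVideoParams): ResolvedMiniMaxVideoParams {
  return {
    ...params,
    model: params.model ?? 'hailuo-2.3',
    duration: params.duration ?? 6,
    promptOptimizer: params.promptOptimizer ?? true,
  }
}

const resolved = withMiniMaxDefaults({
  provider: 'minimax',
  apiKey: 'YOUR_MINIMAX_API_KEY', // placeholder
  prompt: 'A paper boat drifting down a rainy street',
})

console.log(resolved.model, resolved.duration, resolved.promptOptimizer) // hailuo-2.3 6 true
```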

#### Output
@@ -161,20 +160,21 @@ Generate videos using MiniMax Hailuo through MiniMax Platform API with advanced

### `video_falai`

Generate videos using Fal.ai platform with access to multiple models including Veo 3.1, Sora 2, Kling 2.5, MiniMax Hailuo, and more
Generate videos using Fal.ai with access to Veo 3.1, Sora 2, Seedance 2.0, Kling 3.0, MiniMax Hailuo 2.3, WAN 2.2, LTX 2.3, and previously supported models

#### Input

| Parameter | Type | Required | Description |
| --------- | ---- | -------- | ----------- |
| `provider` | string | Yes | Video provider \(falai\) |
| `apiKey` | string | Yes | Fal.ai API key |
| `model` | string | Yes | Fal.ai model: veo-3.1 \(Google Veo 3.1\), sora-2 \(OpenAI Sora 2\), kling-2.5-turbo-pro \(Kling 2.5 Turbo Pro\), kling-2.1-pro \(Kling 2.1 Master\), minimax-hailuo-2.3-pro \(MiniMax Hailuo Pro\), minimax-hailuo-2.3-standard \(MiniMax Hailuo Standard\), wan-2.1 \(WAN T2V\), ltxv-0.9.8 \(LTXV 13B\) |
| `model` | string | Yes | Fal.ai model: veo-3.1, veo-3.1-fast, sora-2, sora-2-pro, seedance-2.0, seedance-2.0-fast, kling-v3-pro, kling-v3-4k, kling-o3-pro, kling-o3-4k, minimax-hailuo-2.3-pro, minimax-hailuo-2.3-standard, wan-2.2-a14b-turbo, ltx-2.3, ltx-2.3-fast, plus previously supported model IDs |
| `prompt` | string | Yes | Text prompt describing the video to generate |
| `duration` | number | No | Video duration in seconds \(varies by model\) |
| `aspectRatio` | string | No | Aspect ratio \(varies by model\): 16:9, 9:16, 1:1 |
| `resolution` | string | No | Video resolution \(varies by model\): 540p, 720p, 1080p |
| `resolution` | string | No | Video resolution \(varies by model\): 480p, 580p, 720p, 1080p, true_1080p, 1440p, 2160p, 4k |
| `promptOptimizer` | boolean | No | Enable prompt optimization for MiniMax models \(default: true\) |
| `generateAudio` | boolean | No | Generate native audio when supported by the selected Fal.ai model |
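The `video_falai` table lists a wider set of resolutions and notes that `promptOptimizer` applies to MiniMax models. The sketch below turns the documented resolution list into a type guard and adds a hypothetical `buildFalaiVideoRequest` helper that drops `promptOptimizer` for non-MiniMax models. It assumes MiniMax model IDs start with `minimax-`, which holds for the IDs listed in the table; the `FalaiVideoParams` interface is likewise hypothetical. Which resolutions a given model accepts still varies by model.

```typescript
// Resolutions listed for `video_falai`; availability varies by model.
const FALAI_RESOLUTIONS = ['480p', '580p', '720p', '1080p', 'true_1080p', '1440p', '2160p', '4k'] as const
type FalaiResolution = (typeof FALAI_RESOLUTIONS)[number]

// Type guard for untrusted input such as a user-entered string.
function isFalaiResolution(value: string): value is FalaiResolution {
  return (FALAI_RESOLUTIONS as readonly string[]).includes(value)
}

// Hypothetical shape mirroring the documented `video_falai` inputs.
interface FalaiVideoParams {
  provider: 'falai'
  apiKey: string
  model: string
  prompt: string
  duration?: number
  aspectRatio?: string
  resolution?: FalaiResolution
  promptOptimizer?: boolean
  generateAudio?: boolean
}

// Hypothetical helper: keep `promptOptimizer` only for MiniMax model IDs
// (assumes those IDs start with "minimax-").
function buildFalaiVideoRequest(params: FalaiVideoParams): FalaiVideoParams {
  const request: FalaiVideoParams = { ...params }
  if (!params.model.startsWith('minimax-')) delete request.promptOptimizer
  return request
}

const kling = buildFalaiVideoRequest({
  provider: 'falai',
  apiKey: 'YOUR_FAL_KEY', // placeholder
  model: 'kling-v3-pro',
  prompt: 'A hummingbird in slow motion',
  resolution: '1080p',
  promptOptimizer: true,
  generateAudio: true,
})

console.log('promptOptimizer' in kling) // false
console.log(isFalaiResolution('true_1080p'), isFalaiResolution('360p')) // true false
```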

#### Output

4 changes: 2 additions & 2 deletions apps/sim/app/(landing)/integrations/data/icon-mapping.ts
@@ -298,7 +298,7 @@ export const blockTypeToIconMap: Record<string, IconComponent> = {
hunter: HunterIOIcon,
iam: IAMIcon,
identity_center: IdentityCenterIcon,
image_generator: ImageIcon,
image_generator_v2: ImageIcon,
imap: MailServerIcon,
incidentio: IncidentioIcon,
infisical: InfisicalIcon,
@@ -398,7 +398,7 @@ export const blockTypeToIconMap: Record<string, IconComponent> = {
typeform: TypeformIcon,
upstash: UpstashIcon,
vercel: VercelIcon,
video_generator_v2: VideoIcon,
video_generator_v3: VideoIcon,
vision_v2: EyeIcon,
wealthbox: WealthboxIcon,
webflow: WebflowIcon,
10 changes: 5 additions & 5 deletions apps/sim/app/(landing)/integrations/data/integrations.json
@@ -6642,11 +6642,11 @@
"tags": ["enrichment", "sales-engagement"]
},
{
"type": "image_generator",
"type": "image_generator_v2",
"slug": "image-generator",
"name": "Image Generator",
"description": "Generate images",
"longDescription": "Integrate Image Generator into the workflow. Can generate images using DALL-E 3, GPT Image 1, or GPT Image 2.",
"longDescription": "Generate images using OpenAI GPT Image, Google Nano Banana, or Fal.ai image models.",
"bgColor": "#4D5FFF",
"iconName": "ImageIcon",
"docsUrl": "https://docs.sim.ai/tools/image_generator",
@@ -14015,14 +14015,14 @@
"tags": ["cloud", "ci-cd"]
},
{
"type": "video_generator_v2",
"type": "video_generator_v3",
"slug": "video-generator",
"name": "Video Generator",
"description": "Generate videos from text using AI",
"longDescription": "Generate high-quality videos from text prompts using leading AI providers. Supports multiple models, aspect ratios, resolutions, and provider-specific features like world consistency, camera controls, and audio generation.",
"longDescription": "Generate high-quality videos from text prompts using leading AI providers. Supports Runway, Google Veo, Luma, MiniMax, and Fal.ai multi-model generation with provider-specific durations, aspect ratios, resolutions, prompt optimization, and native audio controls.",
"bgColor": "#181C1E",
"iconName": "VideoIcon",
"docsUrl": "https://docs.sim.ai/tools/video-generator",
"docsUrl": "https://docs.sim.ai/tools/video_generator",
"operations": [],
"operationCount": 0,
"triggers": [],
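In `integrations.json`, the video entry's `docsUrl` now ends in `video_generator`, matching the `video_generator.mdx` page and the underscore form already used by the image entry (`image_generator`), while the landing-page `slug` keeps its hyphenated form. A minimal sketch of a consistency check, assuming the docs URL should always equal `https://docs.sim.ai/tools/` plus the block type without its `_vN` suffix; `docsUrlForBlockType` and `findDocsUrlMismatches` are hypothetical helpers, not necessarily how Sim generates this file.

```typescript
const DOCS_BASE_URL = 'https://docs.sim.ai/tools'

// Hypothetical helper: derive the docs URL from a (possibly versioned) block type.
function docsUrlForBlockType(blockType: string): string {
  const baseType = blockType.replace(/_v\d+$/, '')
  return `${DOCS_BASE_URL}/${baseType}`
}

// Minimal subset of an integrations.json entry.
interface IntegrationEntry {
  type: string
  docsUrl: string
}

// Hypothetical check: return the types whose docsUrl does not match the derived URL.
function findDocsUrlMismatches(entries: IntegrationEntry[]): string[] {
  return entries.filter((e) => e.docsUrl !== docsUrlForBlockType(e.type)).map((e) => e.type)
}

console.log(docsUrlForBlockType('video_generator_v3')) // https://docs.sim.ai/tools/video_generator

// The pre-change video entry would have been flagged.
console.log(
  findDocsUrlMismatches([
    { type: 'image_generator_v2', docsUrl: 'https://docs.sim.ai/tools/image_generator' },
    { type: 'video_generator_v2', docsUrl: 'https://docs.sim.ai/tools/video-generator' },
  ]),
) // ['video_generator_v2']
```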