Dataset Viewer

Browse and analyze Hugging Face datasets with features like search, filtering, statistics, and data export

Dataset Viewer MCP Server

An MCP server for interacting with the Hugging Face Dataset Viewer API, providing capabilities to browse and analyze datasets hosted on the Hugging Face Hub.

Features

Resources

- Uses the `dataset://` URI scheme for accessing Hugging Face datasets
- Supports dataset configurations and splits
- Provides paginated access to dataset contents
- Handles authentication for private datasets
- Supports searching and filtering dataset contents
- Provides dataset statistics and analysis

Tools

The server provides the following tools:

validate
- Check if a dataset exists and is accessible
- Parameters:
  - `dataset`: Dataset identifier (e.g. 'stanfordnlp/imdb')
  - `auth_token` (optional): For private datasets

get_info
- Get detailed information about a dataset
- Parameters:
  - `dataset`: Dataset identifier
  - `auth_token` (optional): For private datasets

get_rows
- Get paginated contents of a dataset
- Parameters:
  - `dataset`: Dataset identifier
  - `config`: Configuration name
  - `split`: Split name
  - `page` (optional): Page number (0-based)
  - `auth_token` (optional): For private datasets

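
The `page` parameter of `get_rows` is zero-based, while the Hugging Face Dataset Viewer API's `/rows` endpoint takes `offset` and `length`. The sketch below shows one way the translation could work. It assumes a page size of 100 rows (the API's per-request maximum); the helper name `rows_url` and the `PAGE_SIZE` constant are hypothetical and not taken from this project's source.

```python
from urllib.parse import urlencode

API_BASE = "https://datasets-server.huggingface.co"
PAGE_SIZE = 100  # assumption: one page = 100 rows, the API's per-request maximum


def rows_url(dataset: str, config: str, split: str, page: int = 0) -> str:
    """Build a /rows URL for a zero-based page number (hypothetical helper)."""
    if page < 0:
        raise ValueError("page must be >= 0")
    params = {
        "dataset": dataset,
        "config": config,
        "split": split,
        "offset": page * PAGE_SIZE,  # page 0 -> rows 0..99, page 1 -> rows 100..199
        "length": PAGE_SIZE,
    }
    return f"{API_BASE}/rows?{urlencode(params)}"


# Third page (page=2) of the IMDB training split
print(rows_url("stanfordnlp/imdb", "plain_text", "train", page=2))
```

The server's actual page size may differ; only the offset arithmetic is the point here.
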
get_first_rows
- Get first rows from a dataset split
- Parameters:
  - `dataset`: Dataset identifier
  - `config`: Configuration name
  - `split`: Split name
  - `auth_token` (optional): For private datasets

get_statistics
- Get statistics about a dataset split
- Parameters:
  - `dataset`: Dataset identifier
  - `config`: Configuration name
  - `split`: Split name
  - `auth_token` (optional): For private datasets

search_dataset
- Search for text within a dataset
- Parameters:
  - `dataset`: Dataset identifier
  - `config`: Configuration name
  - `split`: Split name
  - `query`: Text to search for
  - `auth_token` (optional): For private datasets

filter
- Filter rows using SQL-like conditions
- Parameters:
  - `dataset`: Dataset identifier
  - `config`: Configuration name
  - `split`: Split name
  - `where`: SQL WHERE clause (e.g. "score > 0.5")
  - `orderby` (optional): SQL ORDER BY clause
  - `page` (optional): Page number (0-based)
  - `auth_token` (optional): For private datasets

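
Because `where` and `orderby` contain spaces and comparison operators, they must be URL-encoded before they reach the Dataset Viewer API's `/filter` endpoint. The following is a minimal sketch of that step, assuming the server forwards these parameters unchanged; `filter_url` and its `page_size` default are hypothetical names chosen for illustration.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

API_BASE = "https://datasets-server.huggingface.co"


def filter_url(
    dataset: str,
    config: str,
    split: str,
    where: str,
    orderby: Optional[str] = None,
    page: int = 0,
    page_size: int = 100,  # assumption: rows per page
) -> str:
    """Build a /filter URL; urlencode() escapes the SQL fragments."""
    params = {"dataset": dataset, "config": config, "split": split, "where": where}
    if orderby:  # optional parameters are only sent when provided
        params["orderby"] = orderby
    params["offset"] = page * page_size
    params["length"] = page_size
    return f"{API_BASE}/filter?{urlencode(params)}"


url = filter_url("stanfordnlp/imdb", "plain_text", "train", where="score > 0.5")
print(url)
# Decoding the query string recovers the original clause
print(parse_qs(urlsplit(url).query)["where"])
```
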
get_parquet
- Download entire dataset in Parquet format
- Parameters:
  - `dataset`: Dataset identifier
  - `auth_token` (optional): For private datasets

Installation

Prerequisites

- Python 3.12 or higher
- uv: a fast Python package installer and resolver

Setup

Clone the repository:

```
git clone https://github.com/privetin/dataset-viewer.git
cd dataset-viewer
```

Create a virtual environment and install:

```
# Create virtual environment
uv venv

# Activate virtual environment
# On Unix:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

# Install in development mode
uv add -e .
```

Configuration

Environment Variables

- `HUGGINGFACE_TOKEN`: Your Hugging Face API token for accessing private datasets
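
As an illustration of how the token might be used, the sketch below turns either an explicit `auth_token` argument or the `HUGGINGFACE_TOKEN` environment variable into the `Authorization: Bearer` header that the Hugging Face API expects. The helper name `auth_headers` and the precedence (explicit argument first, then environment) are assumptions, not confirmed behavior of this server.

```python
import os
from typing import Dict, Optional


def auth_headers(auth_token: Optional[str] = None) -> Dict[str, str]:
    """Return request headers for a token (hypothetical helper).

    Assumption: an explicit auth_token takes precedence over HUGGINGFACE_TOKEN.
    """
    token = auth_token or os.environ.get("HUGGINGFACE_TOKEN")
    if not token:
        return {}  # public datasets need no Authorization header
    return {"Authorization": f"Bearer {token}"}
```

With no token available the function returns an empty dict, so the same request code can serve public and private datasets.
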

Claude Desktop Integration

Add the following to your Claude Desktop config file:

On Windows:

%APPDATA%\Claude\claude_desktop_config.json

On macOS:

~/Library/Application Support/Claude/claude_desktop_config.json

```
{
  "mcpServers": {
    "dataset-viewer": {
      "command": "uv",
      "args": [
        "run",
        "dataset-viewer"
      ]
    }
  }
}
```
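
If the config file already lists other servers, the new entry has to be merged in rather than pasted over the file. Here is a small sketch of that merge on an in-memory dict; `merge_server_entry` is a hypothetical helper, and `other-server` is a placeholder standing in for whatever the file already contains. Reading and writing the actual file (with `json.load` / `json.dump`) is left out.

```python
import json

# Entry from the snippet above
SERVER_ENTRY = {"command": "uv", "args": ["run", "dataset-viewer"]}


def merge_server_entry(config: dict) -> dict:
    """Return a copy of config with the dataset-viewer entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers["dataset-viewer"] = SERVER_ENTRY
    merged["mcpServers"] = servers
    return merged


# Placeholder for an existing config file's contents
existing = {"mcpServers": {"other-server": {"command": "npx", "args": ["other"]}}}
print(json.dumps(merge_server_entry(existing), indent=2))
```
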

Usage Examples

Validate a dataset:

```
{
  "dataset": "stanfordnlp/imdb"
}
```

Get dataset information:

```
{
  "dataset": "stanfordnlp/imdb"
}
```

Search dataset contents:

```
{
  "dataset": "stanfordnlp/imdb",
  "config": "plain_text",
  "split": "train",
  "query": "great movie"
}
```

Filter and sort rows:

```
{
  "dataset": "stanfordnlp/imdb",
  "config": "plain_text",
  "split": "train",
  "where": "label = 1",
  "orderby": "text DESC",
  "page": 0
}
```

Get dataset statistics:

```
{
  "dataset": "stanfordnlp/imdb",
  "config": "plain_text",
  "split": "train"
}
```
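
The JSON objects above are tool arguments. An MCP client normally wraps them for you, but for reference, the Model Context Protocol sends them inside a JSON-RPC `tools/call` request. The sketch below builds such a message; `tool_call_request` is a hypothetical helper, and the request `id` is simply a local counter.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request ids; any unique value works


def tool_call_request(name: str, arguments: dict) -> dict:
    """Wrap tool arguments in an MCP tools/call JSON-RPC request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tool_call_request(
    "search_dataset",
    {
        "dataset": "stanfordnlp/imdb",
        "config": "plain_text",
        "split": "train",
        "query": "great movie",
    },
)
print(json.dumps(request, indent=2))
```
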

License

MIT License - see LICENSE for details
