Browse and analyze Hugging Face datasets with features like search, filtering, statistics, and data export
Dataset Viewer MCP Server
An MCP server for interacting with the Hugging Face Dataset Viewer API, providing capabilities to browse and analyze datasets hosted on the Hugging Face Hub.
Features
Resources
- Uses the `dataset://` URI scheme for accessing Hugging Face datasets
- Supports dataset configurations and splits
- Provides paginated access to dataset contents
- Handles authentication for private datasets
- Supports searching and filtering dataset contents
- Provides dataset statistics and analysis
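The exact layout of a `dataset://` URI is not spelled out in this README. As a rough illustration only, assuming a hypothetical form `dataset://<owner>/<name>?config=<config>&split=<split>&page=<n>`, such a URI could be parsed with the standard library. The `DatasetRef` class and `parse_dataset_uri` function are made-up names for this sketch.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlsplit


@dataclass(frozen=True)
class DatasetRef:
    """Pieces of a dataset URI (hypothetical layout, see the text above)."""
    dataset: str
    config: Optional[str] = None
    split: Optional[str] = None
    page: int = 0


def parse_dataset_uri(uri: str) -> DatasetRef:
    parts = urlsplit(uri)
    if parts.scheme != "dataset":
        raise ValueError(f"expected a dataset:// URI, got {uri!r}")
    # urlsplit puts the owner in netloc and the dataset name in path.
    dataset = f"{parts.netloc}{parts.path}".strip("/")
    if not dataset:
        raise ValueError("the URI does not name a dataset")
    query = parse_qs(parts.query)

    def first(key: str) -> Optional[str]:
        values = query.get(key)
        return values[0] if values else None

    return DatasetRef(
        dataset=dataset,
        config=first("config"),
        split=first("split"),
        page=int(first("page") or 0),  # pages are 0-based
    )
```

For example, `parse_dataset_uri("dataset://stanfordnlp/imdb?config=plain_text&split=train")` would yield the dataset identifier, configuration and split used elsewhere in this README.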
Tools
The server provides the following tools:
- `validate`
  - Check if a dataset exists and is accessible
  - Parameters:
    - `dataset`: Dataset identifier (e.g. 'stanfordnlp/imdb')
    - `auth_token` (optional): For private datasets
- `get_info`
  - Get detailed information about a dataset
  - Parameters:
    - `dataset`: Dataset identifier
    - `auth_token` (optional): For private datasets
- `get_rows`
  - Get paginated contents of a dataset
  - Parameters:
    - `dataset`: Dataset identifier
    - `config`: Configuration name
    - `split`: Split name
    - `page` (optional): Page number (0-based)
    - `auth_token` (optional): For private datasets
- `get_first_rows`
  - Get first rows from a dataset split
  - Parameters:
    - `dataset`: Dataset identifier
    - `config`: Configuration name
    - `split`: Split name
    - `auth_token` (optional): For private datasets
- `get_statistics`
  - Get statistics about a dataset split
  - Parameters:
    - `dataset`: Dataset identifier
    - `config`: Configuration name
    - `split`: Split name
    - `auth_token` (optional): For private datasets
- `search_dataset`
  - Search for text within a dataset
  - Parameters:
    - `dataset`: Dataset identifier
    - `config`: Configuration name
    - `split`: Split name
    - `query`: Text to search for
    - `auth_token` (optional): For private datasets
- `filter`
  - Filter rows using SQL-like conditions
  - Parameters:
    - `dataset`: Dataset identifier
    - `config`: Configuration name
    - `split`: Split name
    - `where`: SQL WHERE clause (e.g. "score > 0.5")
    - `orderby` (optional): SQL ORDER BY clause
    - `page` (optional): Page number (0-based)
    - `auth_token` (optional): For private datasets
- `get_parquet`
  - Download entire dataset in Parquet format
  - Parameters:
    - `dataset`: Dataset identifier
    - `auth_token` (optional): For private datasets
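The tools listed above line up closely with endpoints of the public Dataset Viewer API. The sketch below shows how a request URL for each tool could be assembled. The tool-to-endpoint mapping, the fixed `PAGE_SIZE` and the `build_request_url` helper are assumptions made for illustration, not this server's actual code.

```python
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.huggingface.co"
PAGE_SIZE = 100  # assumed page size; the API caps `length` at 100 rows

# Assumed mapping from tool names to Dataset Viewer API endpoints.
ENDPOINTS = {
    "validate": "/is-valid",
    "get_info": "/info",
    "get_rows": "/rows",
    "get_first_rows": "/first-rows",
    "get_statistics": "/statistics",
    "search_dataset": "/search",
    "filter": "/filter",
    "get_parquet": "/parquet",
}
PAGINATED = {"get_rows", "filter"}  # tools that accept a 0-based `page`


def build_request_url(tool: str, page: int = 0, **params: str) -> str:
    """Build the API URL for a tool call (the token travels in a header, not here)."""
    query = {key: value for key, value in params.items() if value is not None}
    if tool in PAGINATED:
        # Translate the 0-based page number into the API's offset/length pair.
        query["offset"] = page * PAGE_SIZE
        query["length"] = PAGE_SIZE
    return f"{BASE_URL}{ENDPOINTS[tool]}?{urlencode(query)}"


url = build_request_url(
    "get_rows", page=2, dataset="stanfordnlp/imdb", config="plain_text", split="train"
)
print(url)
```

Turning `page` into `offset` and `length` keeps the tool interface simple while matching the row-window parameters the API expects.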
Installation
Prerequisites
- Python 3.12 or higher
- uv - Fast Python package installer and resolver
Setup
- Clone the repository:
    git clone https://github.com/privetin/dataset-viewer.git
    cd dataset-viewer
- Create a virtual environment and install:
    # Create virtual environment
    uv venv

    # Activate virtual environment
    # On Unix:
    source .venv/bin/activate
    # On Windows:
    .venv\Scripts\activate

    # Install in development mode
    uv add -e .
Configuration
Environment Variables
- `HUGGINGFACE_TOKEN`: Your Hugging Face API token for accessing private datasets
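A minimal sketch of how a token might be resolved and turned into request headers. It assumes that an explicit `auth_token` argument takes precedence over `HUGGINGFACE_TOKEN`; that precedence and the `auth_headers` helper are illustrative guesses, while the bearer header format is the one the Hugging Face Hub accepts.

```python
import os
from typing import Optional


def auth_headers(auth_token: Optional[str] = None) -> dict:
    """Return HTTP headers for the request, empty if no token is available."""
    # Assumption: a per-call token wins over the environment variable.
    token = auth_token or os.environ.get("HUGGINGFACE_TOKEN")
    return {"Authorization": f"Bearer {token}"} if token else {}
```

Public datasets need no token, so an empty header dictionary is a valid result.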
Claude Desktop Integration
Add the following to your Claude Desktop config file:
On Windows:
%APPDATA%\Claude\claude_desktop_config.json
On macOS:
~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "dataset-viewer": {
      "command": "uv",
      "args": [
        "run",
        "dataset-viewer"
      ]
    }
  }
}
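If the config file already lists other servers, the entry has to be merged into the existing `mcpServers` object rather than pasted over it. The following is a small sketch using only the standard library; `add_server_entry` is a hypothetical helper, and the caller supplies the path to the config file.

```python
import json
from pathlib import Path


def add_server_entry(config_path: Path, name: str = "dataset-viewer") -> dict:
    """Merge the dataset-viewer entry into a Claude Desktop config file."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Keep any servers that are already configured and add (or replace) ours.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": "uv", "args": ["run", "dataset-viewer"]}
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Claude Desktop typically needs a restart before it picks up changes to this file.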
Usage Examples
- Validate a dataset:
{ "dataset": "stanfordnlp/imdb" }
- Get dataset information:
{ "dataset": "stanfordnlp/imdb" }
- Search dataset contents:
{ "dataset": "stanfordnlp/imdb", "config": "plain_text", "split": "train", "query": "great movie" }
- Filter and sort rows:
{ "dataset": "stanfordnlp/imdb", "config": "plain_text", "split": "train", "where": "label = 1", "orderby": "text DESC", "page": 0 }
- Get dataset statistics:
{ "dataset": "stanfordnlp/imdb", "config": "plain_text", "split": "train" }
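Payloads like the ones above can be checked against the parameter lists in the Tools section before they are sent. The table below is transcribed from this README; the `check_arguments` helper is a hypothetical convenience and not part of the server.

```python
# (required, optional) parameter names per tool, as listed under Tools.
SPLIT = {"dataset", "config", "split"}
TOOL_PARAMS = {
    "validate": ({"dataset"}, {"auth_token"}),
    "get_info": ({"dataset"}, {"auth_token"}),
    "get_rows": (SPLIT, {"page", "auth_token"}),
    "get_first_rows": (SPLIT, {"auth_token"}),
    "get_statistics": (SPLIT, {"auth_token"}),
    "search_dataset": (SPLIT | {"query"}, {"auth_token"}),
    "filter": (SPLIT | {"where"}, {"orderby", "page", "auth_token"}),
    "get_parquet": ({"dataset"}, {"auth_token"}),
}


def check_arguments(tool: str, arguments: dict) -> list:
    """Return a list of problems with a tool payload (empty means it looks fine)."""
    required, optional = TOOL_PARAMS[tool]
    given = set(arguments)
    problems = [f"missing required parameter: {name}" for name in sorted(required - given)]
    problems += [f"unknown parameter: {name}" for name in sorted(given - required - optional)]
    return problems


payload = {"dataset": "stanfordnlp/imdb", "config": "plain_text", "split": "train", "query": "great movie"}
print(check_arguments("search_dataset", payload))
```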
License
MIT License - see LICENSE for details