Configuration
GPlay Scraper provides flexible configuration options to customize behavior.
Default Settings
The library comes with sensible defaults:
from gplay_scraper import Config
print(f"Timeout: {Config.DEFAULT_TIMEOUT}s") # 30 seconds
print(f"Rate limit: {Config.RATE_LIMIT_DELAY}s") # 1.0 seconds
print(f"ASO keywords: {Config.ASO_TOP_KEYWORDS}") # 20
print(f"Min word length: {Config.ASO_MIN_WORD_LENGTH}") # 3
Network Configuration
Timeout Settings
Control request timeout:
from gplay_scraper import GPlayScraper
scraper = GPlayScraper()
# Access current timeout
print(f"Current timeout: {scraper.scraper.timeout}s")
# Modify timeout (affects all future requests)
scraper.scraper.timeout = 45 # 45 seconds
Rate Limiting
Configure delays between requests:
from gplay_scraper import GPlayScraper
# Default rate limiting (1 second)
scraper = GPlayScraper()
# Custom rate limiting
scraper = GPlayScraper()
scraper.scraper.rate_limit_delay = 2.0 # 2 seconds between requests
# Disable rate limiting (not recommended)
scraper.scraper.rate_limit_delay = 0
User Agent
Customize the user agent string:
from gplay_scraper import Config
# Get default headers
headers = Config.get_headers()
print(headers['User-Agent'])
# Custom user agent
custom_headers = Config.get_headers("MyApp/1.0")
print(custom_headers['User-Agent'])
ASO Configuration
Keyword Analysis
Configure ASO analysis parameters:
from gplay_scraper import GPlayScraper, Config
scraper = GPlayScraper()
# Use default settings (20 top keywords)
data = scraper.analyze("com.hubolabs.hubo")
# Custom keyword count
# Note: This requires modifying the analyzer directly
scraper.aso_analyzer.analyze_app_text(
app_data,
top_n=50 # Get top 50 keywords instead of 20
)
Stop Words
Customize stop words for keyword analysis:
from gplay_scraper.core.aso_analyzer import AsoAnalyzer
analyzer = AsoAnalyzer()
# View default stop words
print(f"Stop words count: {len(analyzer.default_stop_words)}")
# Add custom stop words
custom_stop_words = analyzer.default_stop_words.copy()
custom_stop_words.update(['custom', 'words', 'to', 'exclude'])
# Use custom stop words
tokens = analyzer.tokenize_text("Your text here", stop_words=custom_stop_words)
Caching Configuration
The library includes built-in caching for performance:
from gplay_scraper import GPlayScraper
scraper = GPlayScraper()
# First call - fetches from Play Store
data1 = scraper.analyze("com.hubolabs.hubo")
# Second call - uses cached data (much faster)
data2 = scraper.analyze("com.hubolabs.hubo")
# Clear cache if needed
scraper._cache.clear()
Environment Variables
You can use environment variables for configuration:
import os
from gplay_scraper import GPlayScraper
# Set environment variables
os.environ['GPLAY_TIMEOUT'] = '45'
os.environ['GPLAY_RATE_LIMIT'] = '2.0'
# Use in your application
timeout = int(os.environ.get('GPLAY_TIMEOUT', 30))
rate_limit = float(os.environ.get('GPLAY_RATE_LIMIT', 1.0))
scraper = GPlayScraper()
scraper.scraper.timeout = timeout
scraper.scraper.rate_limit_delay = rate_limit
Advanced Configuration
Custom Request Configuration
For advanced users who need full control:
from gplay_scraper import Config
import requests
# Get request configuration
config = Config.get_request_config()
# Modify configuration
config['timeout'] = 60
config['headers']['Accept-Language'] = 'en-US,en;q=0.9'
# Use with requests directly (advanced usage)
response = requests.get('https://example.com', **config)
Logging Configuration
Configure logging for debugging:
import logging
from gplay_scraper import GPlayScraper
# Enable debug logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('gplay_scraper')
# Now you'll see detailed logs
scraper = GPlayScraper()
data = scraper.analyze("com.hubolabs.hubo")
Error Handling
Handle errors gracefully:
from gplay_scraper import GPlayScraper, AppNotFoundError, NetworkError
scraper = GPlayScraper()
try:
data = scraper.app_analyze("com.whatsapp")
except AppNotFoundError:
print("App not found")
except NetworkError:
print("Network error occurred")
except Exception as e:
print(f"Unexpected error: {e}")
When to Use Each Method
App Methods
app_analyze()
- Need all data for comprehensive analysisapp_get_field()
- Need just one specific valueapp_get_fields()
- Need several specific fields (more efficient)app_print_field()
- Quick debugging/console outputapp_print_fields()
- Quick debugging of multiple valuesapp_print_all()
- Explore available data structure
Search Methods
search_analyze()
- Need complete data for all search resultssearch_get_field()
- Need just one field from all resultssearch_get_fields()
- Need specific fields from all resultssearch_print_all()
- Explore available data structure
Reviews Methods
reviews_analyze()
- Need complete review data for analysisreviews_get_field()
- Need just one field (e.g., all scores)reviews_get_fields()
- Need specific fields (more efficient)reviews_print_all()
- Explore available data structure
Developer Methods
developer_analyze()
- Need complete data for all appsdeveloper_get_field()
- Need just one field from all appsdeveloper_get_fields()
- Need specific fields from all appsdeveloper_print_all()
- Explore available data structure
List Methods
list_analyze()
- Need complete data for all chart appslist_get_field()
- Need just one field from all appslist_get_fields()
- Need specific fields from all appslist_print_all()
- Explore available data structure
Similar Methods
similar_analyze()
- Need complete data for similar appssimilar_get_field()
- Need just one field from all appssimilar_get_fields()
- Need specific fields from all appssimilar_print_all()
- Explore available data structure
Suggest Methods
suggest_analyze()
- Need list of suggestionssuggest_nested()
- Need suggestions for suggestionssuggest_print_all()
- Quick console outputsuggest_print_nested()
- Quick nested output
Best Practices
- Rate Limiting
Always use appropriate rate limiting to avoid being blocked. The default 1-second delay is recommended.
- Error Handling
Always wrap scraper calls in try-catch blocks for production use.
- HTTP Client Selection
Start with default (requests), install 2-3 backup clients for reliability.
- Method Selection
Use get_fields() instead of multiple get_field() calls for better performance.
- Timeouts
Adjust timeouts based on your network conditions and requirements.
- Logging
Enable logging in development to understand what’s happening under the hood.
- Multi-Region
Use lang and country parameters to get localized data for different markets.